Skip to content

topmark.pipeline.runner

topmark / pipeline / runner

Run the TopMark V2 header processing pipeline for a single file.

This module defines the HeaderProcessor protocol interface, a registry system for associating file extensions with processor implementations, and helper functions for processor lookup and registration. It enables extensible, comment-style-based header processing for different file types.

run

run(ctx, steps, *, prune_views=True, keep_diff_view=False)

Execute the pipeline sequentially.

Parameters:

Name Type Description Default
ctx ProcessingContext

Mutable processing context.

required
steps Sequence[Step[ProcessingContext]]

Ordered sequence of pipeline steps. Each step takes and returns a context.

required
prune_views bool

Trim the views at the end of a run to reduce memory usage (default: True).

True
keep_diff_view bool

Whether to preserve the diff view.

False

Returns:

Type Description
ProcessingContext

The final processing context after all steps have run.

Source code in src/topmark/pipeline/runner.py
def run(
    ctx: ProcessingContext,
    steps: Sequence[Step[ProcessingContext]],
    *,
    prune_views: bool = True,
    keep_diff_view: bool = False,
) -> ProcessingContext:
    """Execute the pipeline sequentially.

    Args:
        ctx: Mutable processing context.
        steps: Ordered sequence of pipeline steps. Each step takes and returns a context.
        prune_views: Trim the views at the end of a run to reduce memory usage (default: `True`).
        keep_diff_view: Whether to preserve the diff view.

    Returns:
        The final processing context after all steps have run.
    """
    for step in steps:
        ctx = step(ctx)

    if prune_views is True:
        # Drops large in-memory buffers from heavy in-memory views;
        # retains only summary-friendly data.
        logger.debug("Trimming views; keep_diff_view: %r", keep_diff_view)
        ctx.views.release_all(keep_diff_view=keep_diff_view)

    return ctx