topmark.pipeline.context.model

Processing context model for the TopMark pipeline.

This module defines the core data structures used to represent the state of a single file as it flows through the TopMark pipeline. The central type is ProcessingContext, which carries configuration, status, diagnostics, and view data between steps.

Sections

- ProcessingContext: High-level container that represents the per-file processing state and exposes convenience helpers for policy checks, feasibility decisions, and view access.
- HaltState: Small helper dataclass that records why and where processing was halted for a given file.

HaltState `dataclass`

HaltState(*, reason_code='', step_name='')

Information about a terminal halt for a single file.

Instances of this dataclass describe why and where the pipeline decided to stop processing a file. A non-empty step_name implies that a step requested an early, graceful halt.

Attributes:

Name	Type	Description
`reason_code`	`str`	Short machine-friendly reason code explaining why processing was halted (for example, `"unsupported"` or `"unchanged-summary"`). Intended for internal use and machine-readable output.
`step_name`	`str`	Name of the pipeline step that requested the halt. An empty string indicates that no explicit halt has been recorded.

ProcessingContext `dataclass`

ProcessingContext(
    *,
    config,
    run_options,
    path,
    policy_registry,
    timestamp=None,
    steps=(lambda: [])(),
    resolution_probe=None,
    file_type=None,
    status=ProcessingStatus(),
    halt_state=None,
    header_processor=None,
    leading_bom=False,
    has_shebang=False,
    is_effectively_empty=False,
    is_logically_empty=False,
    newline_hist=(lambda: {})(),
    dominant_newline=None,
    dominance_ratio=None,
    mixed_newlines=None,
    newline_style="\n",
    ends_with_newline=None,
    pre_insert_capability=InsertCapability.UNEVALUATED,
    pre_insert_reason=None,
    pre_insert_origin=None,
    diagnostics=MutableDiagnosticLog(),
    diagnostic_hints=HintLog(),
    views=Views(),
)

Context for header processing in the TopMark pipeline.

A ProcessingContext instance represents the complete, mutable state for a single file as it flows through the pipeline. It holds configuration, per-axis status, diagnostics, and view data, and it exposes helpers for policy- and feasibility-related decisions.

Attributes:

Name	Type	Description
`config`	`FrozenConfig`	Effective layered configuration for this file.
`run_options`	`RunOptions`	Invocation-wide execution-only runtime options for the current run.
`path`	`Path`	The file path to process (absolute or relative to the working directory).
`policy_registry`	`PolicyRegistry`	The policy registry (global policy plus file-type-specific overrides).
`timestamp`	`datetime \| None`	Modification timestamp of the file at `path`. This is distinct from `run_options.started_at`, which records when the invocation began.
`steps`	`list[Step[ProcessingContext]]`	Ordered list of pipeline steps that have been executed for this context.
`resolution_probe`	`ResolutionProbeResult \| None`	Probe result explaining file type and processor resolution for the current file path.
`file_type`	`FileType \| None`	Resolved file type for the file (for example, a Python or Markdown file type), if applicable.
`status`	`ProcessingStatus`	Aggregated status for each pipeline axis, kept as the single source of truth for per-axis outcomes.
`halt_state`	`HaltState \| None`	Information about an early, terminal halt for this file. `None` means processing has not been halted.
`header_processor`	`HeaderProcessor \| None`	Header processor instance responsible for this file type, if any.
`leading_bom`	`bool`	True if the original file began with a UTF-8 BOM (`"\ufeff"`). The reader sets this flag and strips the BOM from the in-memory image; the writer re-attaches it to the final output.
`has_shebang`	`bool`	True if the first logical line starts with `"#!"` (post-BOM normalization).
`is_effectively_empty`	`bool`	Whether the decoded, BOM-stripped text image contains no non-whitespace characters. Newlines and other whitespace are allowed. This is the broad notion of "empty" used for most policy decisions.
`is_logically_empty`	`bool`	Whether the file is "logically empty": after BOM stripping, it contains optional horizontal whitespace and at most one trailing newline sequence (LF/CRLF/CR), and nothing else. This is a stricter subset of `is_effectively_empty` and helps keep round-trips stable for files that are effectively placeholders.
`newline_hist`	`dict[str, int]`	Histogram of newline styles detected in the file image.
`dominant_newline`	`str \| None`	Dominant newline sequence detected in the file (for example, `"\n"` or `"\r\n"`), if any.
`dominance_ratio`	`float \| None`	Ratio of the dominant newline's occurrences to the total number of newline occurrences.
`mixed_newlines`	`bool \| None`	True if multiple newline styles were detected, False if a single style was found, or None if not evaluated yet.
`newline_style`	`str`	Normalized newline style used when writing output; defaults to `"\n"`.
`ends_with_newline`	`bool \| None`	True if the file ends with a newline sequence, False if it does not, or None if unknown.
`pre_insert_capability`	`InsertCapability`	Advisory from the sniffer about pre-insert checks (for example, spacers or an empty body); defaults to `InsertCapability.UNEVALUATED`.
`pre_insert_reason`	`str \| None`	Human-readable reason why insertion may be problematic.
`pre_insert_origin`	`str \| None`	Origin of the pre-insertion diagnostic (typically a step or subsystem name).
`diagnostics`	`MutableDiagnosticLog`	Collected diagnostics (info, warning, and error) produced during processing.
`diagnostic_hints`	`HintLog`	Non-binding hints supplied by steps to explain decisions; used primarily for summarization.
`views`	`Views`	Bundle that carries the image/header/build/render/updated/diff views for this file. The runner may prune heavy views after processing.
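The newline attributes above can be illustrated with a small, self-contained sketch. It is not TopMark's sniffer: `newline_stats` is a hypothetical helper that assumes the only newline sequences are LF, CRLF, and lone CR, and it treats a file without any newline as "not mixed" (an assumption; the real `mixed_newlines` field may remain `None` in that case).

```python
import re
from collections import Counter

# Hypothetical helper (not part of TopMark). CRLF is listed first so that it is
# counted as one sequence instead of a CR followed by an LF.
_NEWLINE_RE = re.compile(r"\r\n|\r|\n")


def newline_stats(text: str) -> dict[str, object]:
    """Derive newline_hist-style statistics from a decoded text image."""
    hist = Counter(_NEWLINE_RE.findall(text))
    total = sum(hist.values())
    if total == 0:
        # Assumption: no newlines means "not mixed" and no dominant style.
        return {
            "newline_hist": {},
            "dominant_newline": None,
            "dominance_ratio": None,
            "mixed_newlines": False,
            "ends_with_newline": False,
        }
    # most_common() breaks ties by first occurrence in the text.
    dominant, count = hist.most_common(1)[0]
    return {
        "newline_hist": dict(hist),
        "dominant_newline": dominant,
        "dominance_ratio": count / total,
        "mixed_newlines": len(hist) > 1,
        "ends_with_newline": text.endswith(("\n", "\r")),
    }


print(newline_stats("a\r\nb\r\nc\n"))
```

Counting with a single alternation regex keeps the histogram, the dominant style, and the ratio consistent with each other, since all three derive from the same list of matches.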

is_empty_like `property`

is_empty_like

Return True if the file contains no meaningful content.

This is True when the file is either:

- physically empty on disk (0 bytes), or
- effectively empty after decoding (only whitespace).

This helper is intended for convenience checks in pipeline steps and should not replace explicit emptiness distinctions in policy evaluation.
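The emptiness notions used here and in the attribute table can be sketched as plain functions. This is an illustrative approximation, not TopMark's code: the function names and the `is_empty_like(raw, decoded)` signature are hypothetical (the real property reads this information from the context), and only spaces and tabs are assumed to count as horizontal whitespace.

```python
import re

BOM = "\ufeff"
# Optional spaces/tabs followed by at most one newline sequence, nothing else.
# Treating only space and tab as "horizontal whitespace" is an assumption.
_LOGICALLY_EMPTY_RE = re.compile(r"[ \t]*(?:\r\n|\n|\r)?")


def strip_bom(text: str) -> str:
    return text[len(BOM):] if text.startswith(BOM) else text


def is_effectively_empty(text: str) -> bool:
    # Only whitespace (newlines included) remains after BOM stripping.
    # The BOM must be removed first: U+FEFF is not whitespace for str.strip().
    return strip_bom(text).strip() == ""


def is_logically_empty(text: str) -> bool:
    # Stricter subset of is_effectively_empty.
    return _LOGICALLY_EMPTY_RE.fullmatch(strip_bom(text)) is not None


def is_empty_like(raw: bytes, decoded: str) -> bool:
    # Physically empty on disk (0 bytes) or effectively empty after decoding.
    return len(raw) == 0 or is_effectively_empty(decoded)


print(is_effectively_empty("\n\n"), is_logically_empty("\n\n"))
```

A file containing two blank lines is therefore effectively empty but not logically empty, which is exactly the distinction the two flags are meant to preserve.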

step_axes `property`

step_axes

Map each executed step to the axes it may write.

The keys are step names (e.g. "SnifferStep"), and the values are lists of axis names (e.g. ["fs", "content"]). This is derived from the axes_written contract of each step instance in self.steps.

Combined with self.steps (execution order) and self.status.to_dict() (per-axis final status), this provides a complete view of the step/axis/status relationship without duplicating status payloads.
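A minimal sketch of that derivation, assuming each step exposes a `name` string and an iterable `axes_written` of axis names. `FakeStep`, `PlannerStep`, and the `"plan"` axis are hypothetical, and the real axes may be enum members rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStep:
    """Hypothetical stand-in exposing only what step_axes relies on."""

    name: str
    axes_written: tuple[str, ...]


def step_axes(steps: list[FakeStep]) -> dict[str, list[str]]:
    # Dicts keep insertion order, so keys follow execution order.
    # Two steps sharing a name would collapse into one entry.
    return {step.name: list(step.axes_written) for step in steps}


executed = [
    FakeStep("SnifferStep", ("fs", "content")),
    FakeStep("PlannerStep", ("plan",)),  # hypothetical step and axis names
]
print(step_axes(executed))
```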

is_halted `property`

is_halted

Return True if a step has requested an early halt for this file.

Returns:

Type	Description
`bool`	`True` when `halt_state` is not `None`, meaning that the pipeline should not execute any further steps for this file.

get_effective_policy

get_effective_policy()

Return the effective policy for this processing context.

The effective policy is derived from the global configuration and any file-type-specific overrides via the shared PolicyRegistry. This method does not perform any merging at runtime; all policies are resolved at MutableConfig.freeze() time.

Per-type policies are keyed by canonical qualified file type identifiers such as `topmark:python`, not local identifiers such as `python`.

Returns:

Type	Description
`FrozenPolicy`	The effective policy for this context.

Source code in src/topmark/pipeline/context/model.py

def get_effective_policy(self) -> FrozenPolicy:
    """Return the effective policy for this processing context.

    The effective policy is derived from the global configuration and any
    file-type-specific overrides via the shared
    [`PolicyRegistry`][topmark.config.policy.PolicyRegistry]. This method
    does not perform any merging at runtime; all policies are resolved at
    [`MutableConfig.freeze()`][topmark.config.model.MutableConfig.freeze] time.

    Per-type policies are keyed by canonical qualified file type identifiers
    such as `topmark:python`, not local identifiers such as `python`.

    Returns:
        The effective policy for this context.
    """
    qualified_key: str | None = (
        self.file_type.qualified_key if self.file_type is not None else None
    )
    return self.policy_registry.for_type(qualified_key)

request_halt

request_halt(reason, at_step)

Request a graceful, terminal stop for the rest of the pipeline.

This method records a HaltState on the context so that subsequent steps and the runner can avoid further processing for this file.

The context's halt_state field is updated in place.

Parameters:

Name	Type	Description	Default
`reason`	`str`	Short machine-friendly reason code for halting the pipeline (for example, `"unsupported"`).	required
`at_step`	`Step[ProcessingContext]`	Step instance requesting the halt.	required

Source code in src/topmark/pipeline/context/model.py

def request_halt(self, reason: str, at_step: Step[ProcessingContext]) -> None:
    """Request a graceful, terminal stop for the rest of the pipeline.

    This method records a ``HaltState`` on the context so that subsequent
    steps and the runner can avoid further processing for this file.

    The context's ``halt_state`` field is updated in place.

    Args:
        reason: Short machine-friendly reason code for halting the
            pipeline (for example, ``"unsupported"``).
        at_step: Step instance requesting the halt.
    """
    logger.info(
        "🛑 Processing halted in %s: %s",
        at_step.name,
        reason,
        stacklevel=2,
    )
    self.halt_state = HaltState(reason_code=reason, step_name=at_step.name)

iter_image_lines

iter_image_lines()

Iterate over the current file image's lines without materializing them.

This accessor hides the underlying representation (list-backed, mmap-backed, or generator-based) and returns an iterator over logical lines with original newline sequences preserved.

Returns:

Type	Description
`Iterable[str]`	An iterator over the file's lines. If no image is present, an empty iterator is returned.

Source code in src/topmark/pipeline/context/model.py

def iter_image_lines(self) -> Iterable[str]:
    """Iterate the current file image without materializing.

    This accessor hides the underlying representation (list-backed, mmap-backed,
    or generator-based) and returns an iterator over logical lines with original
    newline sequences preserved.

    Returns:
        An iterator over the file's lines. If no image is present,
        an empty iterator is returned.
    """
    if self.views.image is not None:
        return self.views.image.iter_lines()
    return iter(())  # empty

image_line_count

image_line_count()

Return the number of logical lines without materializing.

Returns:

Type	Description
`int`	Total number of lines in the current image, or `0` if no image is present.

Source code in src/topmark/pipeline/context/model.py

def image_line_count(self) -> int:
    """Return the number of logical lines without materializing.

    Returns:
        Total number of lines in the current image, or ``0`` if no image is present.
    """
    if self.views.image is not None:
        return self.views.image.line_count()
    return 0
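One way a view that is not list-backed can answer `line_count()` without materializing is to stream its lines and count them. `StreamingImageView` is hypothetical. Storing a factory instead of a single generator is a design choice that lets the view be traversed more than once, at the cost of re-reading the underlying source on every pass.

```python
from collections.abc import Callable, Iterator


class StreamingImageView:
    """Hypothetical generator-backed view that stores a line factory."""

    def __init__(self, make_lines: Callable[[], Iterator[str]]) -> None:
        self._make_lines = make_lines

    def iter_lines(self) -> Iterator[str]:
        # A fresh generator per call, so the view can be read repeatedly.
        return self._make_lines()

    def line_count(self) -> int:
        # Counts by streaming; no list of lines is ever built.
        return sum(1 for _ in self._make_lines())


view = StreamingImageView(lambda: (f"line {i}\n" for i in range(1000)))
print(view.line_count())
```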

iter_updated_lines

iter_updated_lines()

Iterate the updated file image lines, if present.

Returns:

Type	Description
`Iterable[str]`	Iterator over updated lines. If no updated image is available (no planner/strip output), an empty iterator is returned.

Source code in src/topmark/pipeline/context/model.py

def iter_updated_lines(self) -> Iterable[str]:
    """Iterate the updated file image lines, if present.

    Returns:
        Iterator over updated lines. If no updated image is available (no planner/strip output),
        returns an empty iterator.
    """
    uv: UpdatedView | None = self.views.updated
    if not uv or uv.lines is None:
        return iter(())
    seq_or_it: Sequence[str] | Iterable[str] = uv.lines
    # If it's already a concrete sequence, avoid copying:
    if isinstance(seq_or_it, list | tuple):
        return iter(seq_or_it)
    # Fallback: it's an arbitrary iterable (possibly a generator)
    return iter(seq_or_it)

materialize_image_lines

materialize_image_lines()

Return the original file image as a materialized list of lines.

Returns:

Type	Description
`list[str]`	List of logical lines from the current image view. An empty list is returned if no image is available.

Source code in src/topmark/pipeline/context/model.py

def materialize_image_lines(self) -> list[str]:
    """Return the original file image as a materialized list of lines.

    Returns:
        List of logical lines from the current image view. An empty list is returned if no image
        is available.
    """
    return list(self.iter_image_lines())

materialize_updated_lines

materialize_updated_lines()

Return the updated file image as a materialized list of lines.

Returns:

Type	Description
`list[str]`	List of updated lines if present, otherwise an empty list.

Source code in src/topmark/pipeline/context/model.py

def materialize_updated_lines(self) -> list[str]:
    """Return the updated file image as a materialized list of lines.

    Returns:
        List of updated lines if present, otherwise an empty list.
    """
    uv: UpdatedView | None = self.views.updated
    if not uv or uv.lines is None:
        return []
    seq_or_it: Sequence[str] | Iterable[str] = uv.lines
    return seq_or_it if isinstance(seq_or_it, list) else list(seq_or_it)

to_dict

to_dict()

Return a machine-readable representation of this processing result.

The schema is intended for CLI/CI consumption and avoids color or formatting concerns. View details are delegated to self.views.as_dict() to keep this method small and consistent with the Views bundling.

Returns:

Type	Description
`dict[str, object]`	A JSON-serializable mapping describing the context, including path, file type, step statuses, views summary, diagnostics, and high-level outcome flags.

Source code in src/topmark/pipeline/context/model.py

def to_dict(self) -> dict[str, object]:
    """Return a machine-readable representation of this processing result.

    The schema is intended for CLI/CI consumption and avoids color or
    formatting concerns. View details are delegated to
    ``self.views.as_dict()`` to keep this method small and consistent with
    the ``Views`` bundling.

    Returns:
        A JSON-serializable mapping describing the context, including path, file type,
        step statuses, views summary, diagnostics, and high-level outcome flags.
    """
    views_summary: dict[str, object] = self.views.as_dict()

    return {
        "path": str(self.path),
        "file_type": {
            "qualified_key": (self.file_type.qualified_key),
            "description": (self.file_type.description),
        }
        if self.file_type
        else None,
        "steps": [s.name for s in self.steps],
        "step_axes": self.step_axes,
        "status": self.status.to_dict(),
        "views": views_summary,
        "diagnostics": [
            {"level": d.level.value, "message": d.message} for d in self.diagnostics
        ],
        "diagnostic_counts": self.diagnostics.to_dict(),
        "pre_insert_check": {
            "capability": self.pre_insert_capability.name,
            "reason": self.pre_insert_reason,
            "origin": self.pre_insert_origin,
        },
        "outcome": {
            "would_change": would_change(self),
            "can_change": can_change(self),
            "permitted_by_policy": check_permitted_by_policy(self),
            "check": {
                "would_add_or_update": would_add_or_update(self),
                "effective_would_add_or_update": effective_would_add_or_update(self),
            },
            "strip": {
                "would_strip": would_strip(self),
                "effective_would_strip": effective_would_strip(self),
            },
        },
    }
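Since the mapping is meant for CLI/CI consumers, a consumer would typically serialize it with `json.dumps` and inspect the `outcome` block. The payload below is a hand-written, trimmed example rather than real TopMark output, and `needs_update` is a hypothetical CI gate. The exact value types (for example, whether `would_change` can be `None`) are not specified here, hence the `bool()` coercion.

```python
import json

# Hypothetical payload trimmed to a few of the keys to_dict() documents;
# all values are invented for illustration.
payload: dict[str, object] = {
    "path": "src/example.py",
    "file_type": {"qualified_key": "topmark:python", "description": "Python"},
    "steps": ["SnifferStep"],
    "diagnostics": [{"level": "warning", "message": "example warning"}],
    "outcome": {"would_change": True, "can_change": True, "permitted_by_policy": True},
}


def needs_update(result: dict) -> bool:
    # Hypothetical CI gate: flag files that would change and are allowed to.
    outcome = result["outcome"]
    return bool(outcome["would_change"]) and bool(outcome["permitted_by_policy"])


text = json.dumps(payload, indent=2)  # plain JSON, no color or formatting codes
print(needs_update(json.loads(text)))
```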

hint

hint(
    *,
    axis,
    code,
    message,
    detail=None,
    cluster=None,
    terminal=False,
    reason=None,
    meta=None,
)

Create and attach a normalized Hint to this context.

This is a convenience façade around make_hint and HintLog.add, allowing pipeline steps to emit structured, non-binding diagnostics without depending on the underlying HintLog representation.

The new hint is appended to this context's hint log.

Parameters:

Name	Type	Description	Default
`axis`	`Axis`	Axis emitting the hint.	required
`code`	`KnownCode \| str`	Stable machine key for the condition.	required
`message`	`str`	Human-readable short summary line.	required
`detail`	`str \| None`	Optional extended diagnostic text rendered at higher verbosity (e.g., multi-line config snippets or rationale).	`None`
`cluster`	`Cluster \| str \| None`	Optional grouping key; defaults to `code`.	`None`
`terminal`	`bool`	Whether this condition is terminal.	`False`
`reason`	`str \| None`	Optional detail string.	`None`
`meta`	`dict[str, object] \| None`	Optional extensibility bag.	`None`

Example

ctx.hint(axis=Axis.PLAN, code=KnownCode.PLAN_INSERT, message="would insert header")

Source code in src/topmark/pipeline/context/model.py

def hint(
    self,
    *,
    axis: Axis,
    code: KnownCode | str,
    message: str,
    detail: str | None = None,
    cluster: Cluster | str | None = None,
    terminal: bool = False,
    reason: str | None = None,
    meta: dict[str, object] | None = None,
) -> None:
    """Create and attach a normalized `Hint` to this context.

    This is a convenience façade around `make_hint` and `HintLog.add`,
    allowing pipeline steps to emit structured, non-binding diagnostics
    without depending on the underlying `HintLog` representation.

    The new hint is appended to this context's hint log.

    Args:
        axis: Axis emitting the hint.
        code: Stable machine key for the condition.
        message: Human-readable short summary line.
        detail: Optional extended diagnostic text rendered at higher verbosity
            (e.g., multi-line config snippets or rationale).
        cluster: Optional grouping key; defaults to ``code``.
        terminal: Whether this condition is terminal.
        reason: Optional detail string.
        meta: Optional extensibility bag.

    Example:
        ```python
        ctx.hint(axis=Axis.PLAN, code=KnownCode.PLAN_INSERT, message="would insert header")
        ```
    """
    self.diagnostic_hints.add(
        make_hint(
            axis=axis,
            code=code,
            message=message,
            detail=detail,
            cluster=cluster,
            terminal=terminal,
            reason=reason,
            meta=meta,
        )
    )

bootstrap `classmethod`

bootstrap(
    *,
    path,
    config,
    run_options,
    policy_registry_override=None,
)

Create a fresh context with no derived state.

Parameters:

Name	Type	Description	Default
`path`	`Path`	File system path for the file to process.	required
`config`	`FrozenConfig`	Effective layered configuration to attach to the context.	required
`run_options`	`RunOptions`	Invocation-wide execution-only runtime options.	required
`policy_registry_override`	`PolicyRegistry \| None`	Optional precomputed policy registry for the supplied effective config. When omitted, the registry is derived from `config` during bootstrap.	`None`

Returns:

Type	Description
`ProcessingContext`	Newly created context instance.

Source code in src/topmark/pipeline/context/model.py

@classmethod
def bootstrap(
    cls,
    *,
    path: Path,
    config: FrozenConfig,
    run_options: RunOptions,
    policy_registry_override: PolicyRegistry | None = None,
) -> ProcessingContext:
    """Create a fresh context with no derived state.

    Args:
        path: File system path for the file to process.
        config: Effective layered configuration to attach to the context.
        run_options: Invocation-wide execution-only runtime options.
        policy_registry_override: Optional precomputed policy registry for the
            supplied effective config. When omitted, the registry is derived
            from `config` during bootstrap.

    Returns:
        Newly created context instance.
    """
    if policy_registry_override is None:
        reg: PolicyRegistry = make_policy_registry(config)
    else:
        reg = policy_registry_override
    return cls(
        path=path,
        config=config,
        run_options=run_options,
        policy_registry=reg,
    )
