topmark.pipeline.context.model

Processing context model for the TopMark pipeline.

This module defines the core data structures used to represent the state of a single file as it flows through the TopMark pipeline. The central type is ProcessingContext, which carries configuration, status, diagnostics, and view data between steps.

Sections

ProcessingContext: High-level container that represents the per-file processing state and exposes convenience helpers for policy checks, feasibility decisions, and view access.

HaltState: Small helper dataclass that records why and where processing was halted for a given file.

HaltState dataclass

HaltState(*, reason_code='', step_name='')

Information about a terminal halt for a single file.

Instances of this dataclass describe why and where the pipeline decided to stop processing a file. A non-empty step_name implies that a step requested an early, graceful halt.

Attributes:

Name Type Description
reason_code str

Short machine-friendly reason code explaining why processing was halted (for example, "unsupported" or "unchanged-summary"). Intended for internal use and machine-readable output.

step_name str

Name of the pipeline step that requested the halt. An empty string indicates that no explicit halt has been recorded.

ProcessingContext dataclass

ProcessingContext(
    *,
    config,
    run_options,
    path,
    policy_registry,
    timestamp=None,
    steps=(lambda: [])(),
    resolution_probe=None,
    file_type=None,
    status=ProcessingStatus(),
    halt_state=None,
    header_processor=None,
    leading_bom=False,
    has_shebang=False,
    is_effectively_empty=False,
    is_logically_empty=False,
    newline_hist=(lambda: {})(),
    dominant_newline=None,
    dominance_ratio=None,
    mixed_newlines=None,
    newline_style="\n",
    ends_with_newline=None,
    pre_insert_capability=InsertCapability.UNEVALUATED,
    pre_insert_reason=None,
    pre_insert_origin=None,
    diagnostics=MutableDiagnosticLog(),
    diagnostic_hints=HintLog(),
    views=Views(),
)

Context for header processing in the TopMark pipeline.

A ProcessingContext instance represents the complete, mutable state for a single file as it flows through the pipeline. It holds configuration, per-axis status, diagnostics, and view data, and it exposes helpers for policy- and feasibility-related decisions.

Attributes:

Name Type Description
config FrozenConfig

Effective layered configuration for this file.

run_options RunOptions

Invocation-wide execution-only runtime options for the current run.

path Path

The file path to process (absolute or relative to the working directory).

policy_registry PolicyRegistry

The policy registry (global + file type specific overrides).

timestamp datetime | None

The file's modification timestamp. This is distinct from run_options.started_at, which records when the invocation began.

steps list[Step[ProcessingContext]]

Ordered list of pipeline steps that have been executed for this context.

resolution_probe ResolutionProbeResult | None

Probe result explaining file type and processor resolution for the current file path.

file_type FileType | None

Resolved file type for the file (for example, a Python or Markdown file type), if applicable.

status ProcessingStatus

Aggregated status for each pipeline axis, kept as the single source of truth for per-axis outcomes.

halt_state HaltState | None

Information about an early, terminal halt for this file. None means processing has not been halted.

header_processor HeaderProcessor | None

Header processor instance responsible for this file type, if any.

leading_bom bool

True if the original file began with a UTF-8 BOM ("\ufeff"). The reader sets this flag and strips the BOM from the in-memory image; the writer re-attaches it to the final output.

has_shebang bool

True if the first logical line starts with "#!" (post-BOM normalization).

is_effectively_empty bool

Whether the decoded, BOM-stripped text image contains no non-whitespace characters. Newlines and other whitespace are allowed. This is the broad notion of "empty" used for most policy decisions.

is_logically_empty bool

Whether the file is "logically empty": after BOM stripping, it contains optional horizontal whitespace and at most one trailing newline sequence (LF/CRLF/CR), and nothing else. This is a stricter subset of is_effectively_empty and is useful to preserve stable round-trips for files that are effectively placeholders.

newline_hist dict[str, int]

Histogram of newline styles detected in the file image.

dominant_newline str | None

Dominant newline sequence detected in the file (for example, "\n" or "\r\n"), if any.

dominance_ratio float | None

Ratio of dominant newline occurrences versus total newline occurrences.

mixed_newlines bool | None

True if multiple newline styles were detected, False if a single style was found, or None if not evaluated yet.

newline_style str

Normalized newline style used when writing output; defaults to "\n".

ends_with_newline bool | None

True if the file ends with a newline sequence, False if it does not, or None if unknown.

pre_insert_capability InsertCapability

Advisory from the sniffer about pre-insert checks (for example, spacers or empty body), defaults to InsertCapability.UNEVALUATED.

pre_insert_reason str | None

Human-readable reason why insertion may be problematic.

pre_insert_origin str | None

Origin of the pre-insertion diagnostic (typically a step or subsystem name).

diagnostics MutableDiagnosticLog

Collected diagnostics (info, warning, and error) produced during processing.

diagnostic_hints HintLog

Non-binding hints supplied by steps to explain decisions; used primarily for summarization.

views Views

Bundle that carries image/header/build/render/updated/diff views for this file. The runner may prune heavy views after processing.
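
The reader's actual detection logic is not shown on this page. As a rough sketch, the four newline fields (newline_hist, dominant_newline, dominance_ratio, mixed_newlines) could be derived from a decoded text image as follows. The function name `newline_stats` and the choice to report `None` values for text without any newline are assumptions made for this example.

```python
import re
from collections import Counter

NEWLINE_RE = re.compile(r"\r\n|\r|\n")  # try CRLF before a lone CR or LF


def newline_stats(text):
    """Return (histogram, dominant, ratio, mixed) for the newline styles in text."""
    hist = Counter(NEWLINE_RE.findall(text))
    if not hist:
        # Assumption: with no newline at all there is nothing to report.
        return {}, None, None, None
    dominant, count = hist.most_common(1)[0]
    ratio = count / sum(hist.values())
    return dict(hist), dominant, ratio, len(hist) > 1


hist, dominant, ratio, mixed = newline_stats("a\r\nb\r\nc\n")
print(hist, repr(dominant), round(ratio, 2), mixed)
```

Here two of the three line endings are CRLF, so CRLF is dominant with a ratio of 2/3, and the file counts as having mixed newlines.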

is_empty_like property

is_empty_like

Return True if the file contains no meaningful content.

This is True when the file is either physically empty on disk (0 bytes) or effectively empty after decoding (only whitespace).

This helper is intended for convenience checks in pipeline steps and should not replace explicit emptiness distinctions in policy evaluation.
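
To make the three notions of emptiness concrete, here is a small sketch based only on the attribute descriptions above. The function names and signatures are hypothetical, and treating "horizontal whitespace" as spaces and tabs is an assumption; TopMark's own checks may differ.

```python
import re

# Optional spaces/tabs followed by at most one newline sequence (CRLF, CR, or LF).
LOGICALLY_EMPTY_RE = re.compile(r"[ \t]*(?:\r\n|\r|\n)?")


def is_effectively_empty(text: str) -> bool:
    """True when the BOM-stripped text has no non-whitespace characters."""
    return text.strip() == ""


def is_logically_empty(text: str) -> bool:
    """True for optional spaces/tabs plus at most one trailing newline sequence."""
    return LOGICALLY_EMPTY_RE.fullmatch(text) is not None


def is_empty_like(size_on_disk: int, text: str) -> bool:
    """True for a 0-byte file or an effectively empty decoded image."""
    return size_on_disk == 0 or is_effectively_empty(text)


for sample in ("", "  \n", "\n\n", "x\n"):
    print(repr(sample), is_effectively_empty(sample), is_logically_empty(sample))
```

The sample "\n\n" shows why the logical check is the stricter subset: it is effectively empty, but it holds more than one newline sequence, so it is not logically empty.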

step_axes property

step_axes

Map each executed step to the axes it may write.

The keys are step names (e.g. "SnifferStep"), and the values are lists of axis names (e.g. ["fs", "content"]). This is derived from the axes_written contract of each step instance in self.steps.

Combined with self.steps (execution order) and self.status.to_dict() (per-axis final status), this provides a complete view of the step/axis/status relationship without duplicating status payloads.
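
The derivation described above can be sketched with a stand-in step type. `StepSketch`, the step name "ResolverStep", and the axis name "resolve" are hypothetical, and exposing axes_written as a plain tuple attribute is an assumption; only "SnifferStep", "fs", and "content" come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical step exposing a name and the axes it may write."""

    name: str
    axes_written: tuple[str, ...]


def step_axes(steps: list[StepSketch]) -> dict[str, list[str]]:
    """Map each executed step name to a list of axis names."""
    return {step.name: list(step.axes_written) for step in steps}


executed = [
    StepSketch("ResolverStep", ("resolve",)),
    StepSketch("SnifferStep", ("fs", "content")),
]
mapping = step_axes(executed)
print(mapping)
```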

is_halted property

is_halted

Return True if a step has requested an early halt for this file.

Returns:

Type Description
bool

True when halt_state is not None, meaning that the pipeline should not execute any further steps for this file.

get_effective_policy

get_effective_policy()

Return the effective policy for this processing context.

The effective policy is derived from the global configuration and any file-type-specific overrides via the shared PolicyRegistry. This method does not perform any merging at runtime; all policies are resolved at MutableConfig.freeze() time.

Per-type policies are keyed by canonical qualified file type identifiers such as topmark:python, not local identifiers such as python.

Returns:

Type Description
FrozenPolicy

The effective policy for this context.

Source code in src/topmark/pipeline/context/model.py
def get_effective_policy(self) -> FrozenPolicy:
    """Return the effective policy for this processing context.

    The effective policy is derived from the global configuration and any
    file-type-specific overrides via the shared
    [`PolicyRegistry`][topmark.config.policy.PolicyRegistry]. This method
    does not perform any merging at runtime; all policies are resolved at
    [`MutableConfig.freeze()`][topmark.config.model.MutableConfig.freeze] time.

    Per-type policies are keyed by canonical qualified file type identifiers
    such as `topmark:python`, not local identifiers such as `python`.

    Returns:
        The effective policy for this context.
    """
    qualified_key: str | None = (
        self.file_type.qualified_key if self.file_type is not None else None
    )
    return self.policy_registry.for_type(qualified_key)
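
The lookup above passes either a qualified key or None to the registry. A toy registry illustrating that contract might look like the sketch below. `PolicySketch` and `RegistrySketch` are hypothetical stand-ins, and the fallback to a global policy for None or unknown keys is an assumption about how such a registry could behave, not a description of PolicyRegistry internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical frozen policy, reduced to a label for illustration."""

    label: str


@dataclass
class RegistrySketch:
    """Hypothetical registry: one global policy plus per-type overrides."""

    global_policy: PolicySketch
    by_type: dict[str, PolicySketch] = field(default_factory=dict)

    def for_type(self, qualified_key: str | None) -> PolicySketch:
        # None (unresolved file type) and unknown keys use the global policy.
        if qualified_key is None:
            return self.global_policy
        return self.by_type.get(qualified_key, self.global_policy)


registry = RegistrySketch(
    global_policy=PolicySketch("global"),
    by_type={"topmark:python": PolicySketch("python-override")},
)
print(registry.for_type("topmark:python").label)
print(registry.for_type("python").label)  # a local identifier is not a valid key
```

Keying on the qualified identifier means that a lookup with the local name "python" does not pick up the override in this sketch.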

request_halt

request_halt(reason, at_step)

Request a graceful, terminal stop for the rest of the pipeline.

This method records a HaltState on the context so that subsequent steps and the runner can avoid further processing for this file.

The context's halt_state field is updated in place.

Parameters:

Name Type Description Default
reason str

Short machine-friendly reason code for halting the pipeline (for example, "unsupported").

required
at_step Step[ProcessingContext]

Step instance requesting the halt.

required
Source code in src/topmark/pipeline/context/model.py
def request_halt(self, reason: str, at_step: Step[ProcessingContext]) -> None:
    """Request a graceful, terminal stop for the rest of the pipeline.

    This method records a ``HaltState`` on the context so that subsequent
    steps and the runner can avoid further processing for this file.

    The context's ``halt_state`` field is updated in place.

    Args:
        reason: Short machine-friendly reason code for halting the
            pipeline (for example, ``"unsupported"``).
        at_step: Step instance requesting the halt.
    """
    logger.info(
        "🛑 Processing halted in %s: %s",
        at_step.name,
        reason,
        stacklevel=2,
    )
    self.halt_state = HaltState(reason_code=reason, step_name=at_step.name)
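
How a runner might honor a recorded halt can be sketched with simplified stand-ins. Everything below is hypothetical: the real method takes a Step instance rather than a step name, and the real runner is not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HaltSketch:
    reason_code: str = ""
    step_name: str = ""


@dataclass
class ContextSketch:
    halt_state: HaltSketch | None = None
    executed: list[str] = field(default_factory=list)

    @property
    def is_halted(self) -> bool:
        return self.halt_state is not None

    def request_halt(self, reason: str, step_name: str) -> None:
        # Record why and where processing stopped; the field is updated in place.
        self.halt_state = HaltSketch(reason_code=reason, step_name=step_name)


def run(ctx: ContextSketch, steps: list[str], unsupported_at: str) -> ContextSketch:
    """Run steps in order and stop once a step has requested a halt."""
    for name in steps:
        if ctx.is_halted:
            break
        ctx.executed.append(name)
        if name == unsupported_at:
            ctx.request_halt("unsupported", name)
    return ctx


result = run(ContextSketch(), ["resolve", "sniff", "plan"], unsupported_at="sniff")
print(result.executed, result.halt_state)
```

The halting step itself still completes; only the steps after it are skipped.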

iter_image_lines

iter_image_lines()

Iterate over the current file image without materializing it.

This accessor hides the underlying representation (list-backed, mmap-backed, or generator-based) and returns an iterator over logical lines with original newline sequences preserved.

Returns:

Type Description
Iterable[str]

An iterator over the file's lines. If no image is present, an empty iterator is returned.

Source code in src/topmark/pipeline/context/model.py
def iter_image_lines(self) -> Iterable[str]:
    """Iterate the current file image without materializing.

    This accessor hides the underlying representation (list-backed, mmap-backed,
    or generator-based) and returns an iterator over logical lines with original
    newline sequences preserved.

    Returns:
        An iterator over the file's lines. If no image is present,
        an empty iterator is returned.
    """
    if self.views.image is not None:
        return self.views.image.iter_lines()
    return iter(())  # empty

image_line_count

image_line_count()

Return the number of logical lines without materializing.

Returns:

Type Description
int

Total number of lines in the current image, or 0 if no image is present.

Source code in src/topmark/pipeline/context/model.py
def image_line_count(self) -> int:
    """Return the number of logical lines without materializing.

    Returns:
        Total number of lines in the current image, or ``0`` if no image is present.
    """
    if self.views.image is not None:
        return self.views.image.line_count()
    return 0
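
The two accessors above delegate to the image view and fall back to an empty result when no image is present. The sketch below imitates that pattern with a hypothetical list-backed view (`ListImageSketch`); using `str.splitlines(keepends=True)` to build the lines is an assumption chosen for brevity, not the reader's actual implementation.

```python
from __future__ import annotations

from collections.abc import Iterator


class ListImageSketch:
    """Hypothetical list-backed image view with the two accessors used above."""

    def __init__(self, text: str) -> None:
        # keepends=True keeps each line's original newline sequence.
        # Note: splitlines() also splits on a few other separators (e.g. "\x0c").
        self._lines = text.splitlines(keepends=True)

    def iter_lines(self) -> Iterator[str]:
        return iter(self._lines)

    def line_count(self) -> int:
        return len(self._lines)


def iter_image_lines(image: ListImageSketch | None) -> Iterator[str]:
    """Yield lines from the image, or nothing when no image is present."""
    return image.iter_lines() if image is not None else iter(())


original = "#!/bin/sh\r\necho hi\n"
image = ListImageSketch(original)
rejoined = "".join(iter_image_lines(image))
print(image.line_count(), rejoined == original)
```

Because the newline sequences are preserved, joining the lines reproduces the original text, including the mixed CRLF and LF endings.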

iter_updated_lines

iter_updated_lines()

Iterate the updated file image lines, if present.

Returns:

Type Description
Iterable[str]

Iterator over updated lines. If no updated image is available (no planner/strip output), returns an empty iterator.

Source code in src/topmark/pipeline/context/model.py
def iter_updated_lines(self) -> Iterable[str]:
    """Iterate the updated file image lines, if present.

    Returns:
        Iterator over updated lines. If no updated image is available (no planner/strip output),
        returns an empty iterator.
    """
    uv: UpdatedView | None = self.views.updated
    if not uv or uv.lines is None:
        return iter(())
    seq_or_it: Sequence[str] | Iterable[str] = uv.lines
    # If it's already a concrete sequence, avoid copying:
    if isinstance(seq_or_it, list | tuple):
        return iter(seq_or_it)
    # Fallback: it's an arbitrary iterable (possibly a generator)
    return iter(seq_or_it)

materialize_image_lines

materialize_image_lines()

Return the original file image as a materialized list of lines.

Returns:

Type Description
list[str]

List of logical lines from the current image view. An empty list is returned if no image is available.

Source code in src/topmark/pipeline/context/model.py
def materialize_image_lines(self) -> list[str]:
    """Return the original file image as a materialized list of lines.

    Returns:
        List of logical lines from the current image view. An empty list is returned if no image
        is available.
    """
    return list(self.iter_image_lines())

materialize_updated_lines

materialize_updated_lines()

Return the updated file image as a materialized list of lines.

Returns:

Type Description
list[str]

List of updated lines if present, otherwise an empty list.

Source code in src/topmark/pipeline/context/model.py
def materialize_updated_lines(self) -> list[str]:
    """Return the updated file image as a materialized list of lines.

    Returns:
        List of updated lines if present, otherwise an empty list.
    """
    uv: UpdatedView | None = self.views.updated
    if not uv or uv.lines is None:
        return []
    seq_or_it: Sequence[str] | Iterable[str] = uv.lines
    return seq_or_it if isinstance(seq_or_it, list) else list(seq_or_it)
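
The last line of this method has two consequences for callers. The sketch below isolates that expression in a hypothetical helper (`materialize`) to show them: a list-backed view is returned as the same object without copying, while a generator-backed view can only be consumed once.

```python
from collections.abc import Iterable


def materialize(lines: Iterable[str]) -> list[str]:
    """Return lines as a list, reusing the object when it already is one."""
    return lines if isinstance(lines, list) else list(lines)


as_list = ["# header\n", "body\n"]
as_gen = (line for line in as_list)

same_object = materialize(as_list) is as_list  # no copy is made for a list
first_pass = materialize(as_gen)               # consumes the generator
second_pass = materialize(as_gen)              # the generator is now exhausted

print(same_object, first_pass, second_pass)
```

A caller that mutates the returned list therefore also changes a list-backed view, and a caller that needs the lines twice should keep the first materialized result.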

to_dict

to_dict()

Return a machine-readable representation of this processing result.

The schema is intended for CLI/CI consumption and avoids color or formatting concerns. View details are delegated to self.views.as_dict() to keep this method small and consistent with the Views bundling.

Returns:

Type Description
dict[str, object]

A JSON-serializable mapping describing the context, including path, file type, step statuses, views summary, diagnostics, and high-level outcome flags.

Source code in src/topmark/pipeline/context/model.py
def to_dict(self) -> dict[str, object]:
    """Return a machine-readable representation of this processing result.

    The schema is intended for CLI/CI consumption and avoids color or
    formatting concerns. View details are delegated to
    ``self.views.as_dict()`` to keep this method small and consistent with
    the ``Views`` bundling.

    Returns:
        A JSON-serializable mapping describing the context, including path, file type,
        step statuses, views summary, diagnostics, and high-level outcome flags.
    """
    views_summary: dict[str, object] = self.views.as_dict()

    return {
        "path": str(self.path),
        "file_type": {
            "qualified_key": (self.file_type.qualified_key),
            "description": (self.file_type.description),
        }
        if self.file_type
        else None,
        "steps": [s.name for s in self.steps],
        "step_axes": self.step_axes,
        "status": self.status.to_dict(),
        "views": views_summary,
        "diagnostics": [
            {"level": d.level.value, "message": d.message} for d in self.diagnostics
        ],
        "diagnostic_counts": self.diagnostics.to_dict(),
        "pre_insert_check": {
            "capability": self.pre_insert_capability.name,
            "reason": self.pre_insert_reason,
            "origin": self.pre_insert_origin,
        },
        "outcome": {
            "would_change": would_change(self),
            "can_change": can_change(self),
            "permitted_by_policy": check_permitted_by_policy(self),
            "check": {
                "would_add_or_update": would_add_or_update(self),
                "effective_would_add_or_update": effective_would_add_or_update(self),
            },
            "strip": {
                "would_strip": would_strip(self),
                "effective_would_strip": effective_would_strip(self),
            },
        },
    }
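
As an illustration of CI-side consumption, the sketch below builds a one-line summary from a mapping shaped like the one returned above. The payload is hypothetical: its keys follow the source listing, but every value is invented, it includes only a subset of the keys, and treating would_change as a boolean is an assumption.

```python
import json

# Hypothetical payload: keys follow the mapping above, values are invented.
payload = {
    "path": "src/example.py",
    "file_type": {"qualified_key": "topmark:python", "description": "Python"},
    "steps": ["SnifferStep"],
    "diagnostics": [{"level": "warning", "message": "mixed newlines"}],
    "outcome": {"would_change": True, "can_change": True},
}


def summarize(result: dict) -> str:
    """Build a one-line summary from a to_dict()-style mapping."""
    file_type = result["file_type"]  # may be None when no type was resolved
    key = file_type["qualified_key"] if file_type else "unknown"
    state = "would change" if result["outcome"]["would_change"] else "unchanged"
    count = len(result["diagnostics"])
    return f'{result["path"]} [{key}]: {state}, {count} diagnostic(s)'


# Round-trip through JSON to confirm that the structure is serializable.
line = summarize(json.loads(json.dumps(payload)))
print(line)
```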

hint

hint(
    *,
    axis,
    code,
    message,
    detail=None,
    cluster=None,
    terminal=False,
    reason=None,
    meta=None,
)

Create and attach a normalized Hint to this context.

This is a convenience façade around make_hint and HintLog.add, allowing pipeline steps to emit structured, non-binding diagnostics without depending on the underlying HintLog representation.

The new hint is appended to this context's hint log.

Parameters:

Name Type Description Default
axis Axis

Axis emitting the hint.

required
code KnownCode | str

Stable machine key for the condition.

required
message str

Human-readable short summary line.

required
detail str | None

Optional extended diagnostic text rendered at higher verbosity (e.g., multi-line config snippets or rationale).

None
cluster Cluster | str | None

Optional grouping key; defaults to code.

None
terminal bool

Whether this condition is terminal.

False
reason str | None

Optional detail string.

None
meta dict[str, object] | None

Optional extensibility bag.

None
Example
ctx.hint(axis=Axis.PLAN, code=KnownCode.PLAN_INSERT, message="would insert header")
Source code in src/topmark/pipeline/context/model.py
def hint(
    self,
    *,
    axis: Axis,
    code: KnownCode | str,
    message: str,
    detail: str | None = None,
    cluster: Cluster | str | None = None,
    terminal: bool = False,
    reason: str | None = None,
    meta: dict[str, object] | None = None,
) -> None:
    """Create and attach a normalized `Hint` to this context.

    This is a convenience façade around `make_hint` and `HintLog.add`,
    allowing pipeline steps to emit structured, non-binding diagnostics
    without depending on the underlying `HintLog` representation.

    The new hint is appended to this context's hint log.

    Args:
        axis: Axis emitting the hint.
        code: Stable machine key for the condition.
        message: Human-readable short summary line.
        detail: Optional extended diagnostic text rendered at higher verbosity
            (e.g., multi-line config snippets or rationale).
        cluster: Optional grouping key; defaults to ``code``.
        terminal: Whether this condition is terminal.
        reason: Optional detail string.
        meta: Optional extensibility bag.

    Example:
        ```python
        ctx.hint(axis=Axis.PLAN, code=KnownCode.PLAN_INSERT, message="would insert header")
        ```
    """
    self.diagnostic_hints.add(
        make_hint(
            axis=axis,
            code=code,
            message=message,
            detail=detail,
            cluster=cluster,
            terminal=terminal,
            reason=reason,
            meta=meta,
        )
    )

bootstrap classmethod

bootstrap(
    *,
    path,
    config,
    run_options,
    policy_registry_override=None,
)

Create a fresh context with no derived state.

Parameters:

Name Type Description Default
path Path

File system path for the file to process.

required
config FrozenConfig

Effective layered configuration to attach to the context.

required
run_options RunOptions

Invocation-wide execution-only runtime options.

required
policy_registry_override PolicyRegistry | None

Optional precomputed policy registry for the supplied effective config. When omitted, the registry is derived from config during bootstrap.

None

Returns:

Type Description
ProcessingContext

Newly created context instance.

Source code in src/topmark/pipeline/context/model.py
@classmethod
def bootstrap(
    cls,
    *,
    path: Path,
    config: FrozenConfig,
    run_options: RunOptions,
    policy_registry_override: PolicyRegistry | None = None,
) -> ProcessingContext:
    """Create a fresh context with no derived state.

    Args:
        path: File system path for the file to process.
        config: Effective layered configuration to attach to the context.
        run_options: Invocation-wide execution-only runtime options.
        policy_registry_override: Optional precomputed policy registry for the
            supplied effective config. When omitted, the registry is derived
            from `config` during bootstrap.

    Returns:
        Newly created context instance.
    """
    if policy_registry_override is None:
        reg: PolicyRegistry = make_policy_registry(config)
    else:
        reg = policy_registry_override
    return cls(
        path=path,
        config=config,
        run_options=run_options,
        policy_registry=reg,
    )
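
One plausible use of policy_registry_override is to derive the registry once and share it across many files that use the same effective config. The sketch below shows that pattern with hypothetical stand-ins (`RegistrySketch`, `make_registry_sketch`, `ContextSketch`) and plain strings in place of the real config and path types.

```python
from __future__ import annotations

from dataclasses import dataclass

calls = {"n": 0}  # counts how often the stand-in registry is derived


@dataclass(frozen=True)
class RegistrySketch:
    source: str


def make_registry_sketch(config: str) -> RegistrySketch:
    """Hypothetical stand-in for deriving a registry from a config."""
    calls["n"] += 1
    return RegistrySketch(source=config)


@dataclass
class ContextSketch:
    path: str
    config: str
    policy_registry: RegistrySketch

    @classmethod
    def bootstrap(
        cls,
        *,
        path: str,
        config: str,
        policy_registry_override: RegistrySketch | None = None,
    ) -> ContextSketch:
        if policy_registry_override is None:
            reg = make_registry_sketch(config)  # derived per call when omitted
        else:
            reg = policy_registry_override
        return cls(path=path, config=config, policy_registry=reg)


shared = make_registry_sketch("frozen-config")  # derived once, up front
contexts = [
    ContextSketch.bootstrap(path=p, config="frozen-config", policy_registry_override=shared)
    for p in ("a.py", "b.py", "c.py")
]
print(calls["n"], all(c.policy_registry is shared for c in contexts))
```

In this sketch the registry is built a single time for three files; omitting the override would build it once per bootstrap call.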