Pipelines (Concepts)

TopMark processes files through explicit, immutable pipelines composed of small, single-responsibility steps. Each pipeline represents a supported execution intent (scan, check, strip, apply, patch) and defines exactly which steps run and in which order.

A dedicated probe pipeline exists for resolution diagnostics (topmark probe). Probe orchestration also reports explicit inputs that were filtered out before file-type probing; it represents them with synthetic probe contexts.

Pipelines do not make high-level decisions themselves. Instead:

  • Each step mutates a strictly defined set of status axes
  • Steps may halt execution when required by policy or safety rules
  • Final outcomes (changed, unchanged, skipped, unsupported, error, ...) are derived centrally by the CLI and views from accumulated statuses and hints

This design guarantees predictability, debuggability, and idempotence.
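
The division of labour above can be illustrated with a small sketch. It is not TopMark's implementation: `Context`, `run_pipeline`, `derive_outcome`, and the two toy steps are hypothetical names, and the classification rules are invented for the example. Only the overall shape (steps write statuses and may halt, the outcome is derived afterwards) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Hypothetical per-file processing context."""

    path: str
    statuses: dict[str, str] = field(default_factory=dict)
    hints: list[str] = field(default_factory=list)
    halted: bool = False


Step = Callable[[Context], None]


def resolve(ctx: Context) -> None:
    # Toy rule: only .py files count as supported in this sketch.
    if ctx.path.endswith(".py"):
        ctx.statuses["resolve"] = "RESOLVED"
    else:
        ctx.statuses["resolve"] = "UNSUPPORTED"
        ctx.hints.append("no file type matched")
        ctx.halted = True  # the step halts; it does not choose an outcome


def compare(ctx: Context) -> None:
    # Toy rule: pretend every resolved file already has the expected header.
    ctx.statuses["comparison"] = "UNCHANGED"


def run_pipeline(steps: tuple[Step, ...], ctx: Context) -> Context:
    """Run steps in order and stop as soon as one of them halts."""
    for step in steps:
        if ctx.halted:
            break
        step(ctx)
    return ctx


def derive_outcome(ctx: Context) -> str:
    """Classify centrally from accumulated statuses, outside any step."""
    if ctx.statuses.get("resolve") == "UNSUPPORTED":
        return "unsupported"
    if ctx.statuses.get("comparison") == "CHANGED":
        return "changed"
    if ctx.statuses.get("comparison") == "UNCHANGED":
        return "unchanged"
    return "skipped"


for name in ("a.py", "b.bin"):
    ctx = run_pipeline((resolve, compare), Context(name))
    print(name, derive_outcome(ctx), ctx.hints)
```

Because no step returns an outcome, changing how results are reported only touches `derive_outcome`, never the steps.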

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

Pipeline execution consumes an immutable FrozenConfig plus runtime options assembled from the TOML → FrozenConfig → runtime flow documented in Architecture and Configuration discovery.

Pipeline execution also consumes a selected processing path. File-list resolution performs filesystem-identity evaluation before ordinary pipeline execution begins.

Filesystem-identity normalization collapses equivalent path spellings, such as symlinks, into a selected processing path. Filesystem-identity eligibility checks determine whether selected processing paths are safe to process. Pipeline steps therefore operate on processing paths rather than preserving original CLI, configuration, glob, or symlink spellings.
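
As a rough illustration of the normalization described above, the sketch below collapses several spellings of one file into a single resolved path using the standard library. The function name `select_processing_paths` is hypothetical, and TopMark's actual resolver may apply additional rules; the example also assumes a platform that permits creating symlinks.

```python
import os
import tempfile
from pathlib import Path


def select_processing_paths(inputs: list[str]) -> list[Path]:
    """Collapse equivalent spellings to one resolved path each, in first-seen order."""
    seen: set[Path] = set()
    selected: list[Path] = []
    for raw in inputs:
        resolved = Path(raw).resolve()  # follows symlinks, drops "." segments
        if resolved not in seen:
            seen.add(resolved)
            selected.append(resolved)
    return selected


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "real.py"
    target.write_text("print('hi')\n")
    link = Path(tmp) / "alias.py"
    os.symlink(target, link)  # needs a platform that permits symlinks

    spellings = [str(link), str(target), os.path.join(tmp, ".", "real.py")]
    selected = select_processing_paths(spellings)
    print(len(spellings), "spellings ->", len(selected), "processing path")
```

The original spellings are deliberately discarded here, which mirrors the statement that pipeline steps see processing paths only.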

Hard-linked selected processing paths are handled by an invocation-wide engine guard before ordinary per-file pipeline execution. If multiple selected paths refer to the same filesystem object through hard links, every affected path is blocked as an unsupported processing target; no source, target, winner, or loser path is selected.
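
A guard of this kind could look like the sketch below. It assumes that hard links can be recognised by a shared (device, inode) pair, which holds on POSIX-like systems, and that the input paths have already been normalized so that no path appears twice. `file_identity` and `partition_hard_links` are hypothetical names, not TopMark APIs.

```python
import os
from collections import defaultdict
from typing import Callable, Hashable, Iterable


def file_identity(path: str) -> Hashable:
    """Default identity: (device, inode), shared by hard links to one file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def partition_hard_links(
    paths: Iterable[str],
    identity: Callable[[str], Hashable] = file_identity,
) -> tuple[list[str], list[str]]:
    """Return (eligible, blocked); every member of a shared-identity group is blocked."""
    ordered = list(paths)
    groups: dict[Hashable, list[str]] = defaultdict(list)
    for p in ordered:
        groups[identity(p)].append(p)
    blocked_set = {
        p for members in groups.values() if len(members) > 1 for p in members
    }
    eligible = [p for p in ordered if p not in blocked_set]
    blocked = [p for p in ordered if p in blocked_set]
    return eligible, blocked


# Injected identities keep the demo independent of the filesystem.
fake_ids = {"a.py": 1, "b.py": 1, "c.py": 2}
eligible, blocked = partition_hard_links(fake_ids, identity=fake_ids.__getitem__)
print("eligible:", eligible)
print("blocked:", blocked)
```

Note that the function never picks a representative from a group: both `a.py` and `b.py` end up blocked, and `c.py` proceeds unaffected.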

Source-local TOML options such as [config].root and strict are resolved before pipeline execution. They influence configuration discovery and staged config-loading validation behavior, but do not become layered configuration fields.

Note

[config].strict is a TOML-source-local strictness preference controlling staged configuration-loading validation for the current TOML source.

Effective strictness is evaluated across:

  • TOML-source diagnostics;
  • merged-config diagnostics;
  • runtime applicability diagnostics.

When strict validation fails, TopMark exits with CONFIG_ERROR. The diagnostics that triggered the failure remain visible in human-readable and machine-readable output formats.

strict is resolved during TOML loading and does not become a layered configuration field.
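
The strictness rule can be sketched as a pure function over collected diagnostics. The types `Stage` and `Diagnostic`, the function `validate`, and the `"OK"` label are hypothetical; the sketch also assumes that errors always fail validation and that strictness only changes how warnings are treated.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    TOML_SOURCE = "toml-source"
    MERGED_CONFIG = "merged-config"
    RUNTIME_APPLICABILITY = "runtime-applicability"


@dataclass(frozen=True)
class Diagnostic:
    stage: Stage
    level: str  # "warning" or "error"
    message: str


def validate(diagnostics: list[Diagnostic], strict: bool) -> tuple[str, list[Diagnostic]]:
    """Return (exit_label, diagnostics); diagnostics are always passed through."""
    # Assumption: errors always fail; warnings fail only under strict checking.
    failing_levels = {"error", "warning"} if strict else {"error"}
    failed = any(d.level in failing_levels for d in diagnostics)
    return ("CONFIG_ERROR" if failed else "OK", diagnostics)


warnings = [Diagnostic(Stage.MERGED_CONFIG, "warning", "unknown key 'colour'")]
print(validate(warnings, strict=False)[0])
print(validate(warnings, strict=True)[0])
```

Returning the diagnostics alongside the label reflects the requirement that they stay visible in the output even when validation fails.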

Concepts vs Reference

This page explains how the pipelines work and how the CLI composes them. For the canonical, API-backed definitions of pipelines, steps, and enums, see the generated API documentation.

Step names and enum names on this page are written as MkDocStrings/AutoRefs links, for example topmark.pipeline.steps.resolver.ResolverStep. MkDocs resolves these references through the generated API documentation.


Pipeline Overview

All pipelines are built from the same core phases:

  1. Input selection - discover files, evaluate filesystem identity, normalize equivalent path spellings, enforce processing-target eligibility, and select processing paths
  2. Discovery - identify file type and viability for each processing path
  3. Inspection - read content and detect existing headers
  4. Evaluation - generate and compare expected headers
  5. Mutation (optional) - plan, patch, and/or write changes

The probe pipeline is an exception: it runs only file type and processor resolution (ProberStep) and stops immediately after producing probe results.

Input selection happens before ordinary pipeline execution. Filesystem-identity normalization handles symlink behavior: file symlink spellings and their targets are collapsed to the resolved processing target before pipeline steps run. Filesystem-identity eligibility checks handle safety policy such as hard-link detection: hard-linked selected processing paths are blocked before ordinary step execution, while unrelated selected paths continue through the requested pipeline. Synthetic probe contexts for filtered or missing explicit inputs preserve diagnostic input information only for those paths that never became normal processing paths.

Unified Pipeline Flow

flowchart TD

  subgraph Probing
    O[<tt>ProberStep</tt>]
  end

  subgraph Discovery
    R[<tt>ResolverStep</tt>]
    S[<tt>SnifferStep</tt>]
    D[<tt>ReaderStep</tt>]
    N[<tt>ScannerStep</tt>]

    R --> S --> D --> N
  end

  subgraph Check
    B[<tt>BuilderStep</tt>]
    T[<tt>RendererStep</tt>]

    N --> B --> T
  end

  subgraph Strip
    X[<tt>StripperStep</tt>]

    N --> X
  end

  subgraph Comparison
    C[<tt>ComparerStep</tt>]
  end

  subgraph Mutation
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    W[<tt>WriterStep</tt>]

    C --> P
    P -->|patch| H
    P -->|apply| W
  end

  T --> C
  X ---> C

Not all pipelines traverse all phases. Each variant selects a strict subset of steps.


Pipeline guarantees

TopMark pipelines are:

  • deterministic
  • step-ordered
  • side-effect constrained
  • idempotent
  • processing-path based
  • presentation-independent

Pipeline steps mutate processing context state. CLI views, API DTOs, and machine-readable output classify final outcomes from accumulated statuses and hints.

Some intermediate data is stored in phase-scoped pipeline views, such as the original file image, detected header data, generated fields, rendered headers, updated content, and unified diffs. Steps that read these views declare their dependencies via consumes_views. When runtime view pruning is enabled, the runner uses those declarations to release consumed view payloads after the last remaining consumer has run, while preserving requested output such as retained diffs.

For filesystem inputs, the processing context path is the selected processing path. It may differ from the path spelling supplied on the command line or in configuration when symlinks or equivalent relative spellings are involved.

For hard-linked filesystem inputs, selected processing paths remain separate results but are blocked before ordinary per-file pipeline execution. The engine does not collapse the hard-link group into a preferred source, target, winner, or loser path.


Available Pipelines

Pipelines are defined in src/topmark/pipeline/pipelines.py and exposed via topmark.pipeline.pipelines.Pipeline.

The CLI selects among these immutable pipeline variants based on command intent and flags such as --patch and --apply.
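
The selection logic can be pictured as a small mapping from command and flags to one of the variant names listed below. This is a sketch only: `select_pipeline` is a hypothetical helper that returns names as strings, and the real CLI may resolve variants differently.

```python
def select_pipeline(command: str, *, patch: bool = False, apply: bool = False) -> str:
    """Map command intent plus flags to a pipeline variant name from this page."""
    if command == "probe":
        return "PROBE"  # assumption: probe ignores --patch and --apply
    if command not in ("check", "strip"):
        raise ValueError(f"unknown command: {command!r}")
    base = command.upper()
    if apply and patch:
        return f"{base}_APPLY_PATCH"
    if apply:
        return f"{base}_APPLY"
    if patch:
        return f"{base}_PATCH"
    return base


print(select_pipeline("check"))
print(select_pipeline("check", patch=True))
print(select_pipeline("strip", apply=True, patch=True))
```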

PROBE

Purpose: Explain file type and processor resolution

Mutation: ❌ none

Steps:

flowchart TD

O[<tt>ProberStep</tt>]

End states:

  • Resolution status (resolved, unsupported, no_processor, filtered)

  • Selected file type and processor (if any)

  • Full candidate set with match signals

  • Explicit inputs filtered before file-type probing are represented by synthetic probe results with status="filtered" and reasons such as excluded_by_path_filter, excluded_by_file_type_filter, or excluded_by_discovery_filter.

This pipeline powers topmark probe and topmark.api.probe() and is intentionally resolution-only.

It halts immediately after probing and does not perform inspection, comparison, or mutation. Discovery-level filtering is reported by orchestration via synthetic probe results for explicitly requested paths that did not reach probing.

Probe results that do reach runtime probing report processing paths. They should not be interpreted as a lossless echo of the original invocation spelling.

Hard-linked selected processing paths also remain visible in probe output. Each affected path is reported independently as unsupported with the stable reason string hard_link_duplicate.
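
The way orchestration fills in results for inputs that never reached probing might be sketched as follows. `ProbeResult` and `merge_probe_results` are hypothetical, and keying results by the input string is a simplification, since real probe results report processing paths. The `"filtered"` status and the reason strings are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeResult:
    path: str
    status: str  # e.g. "resolved", "unsupported", "filtered"
    reason: Optional[str] = None
    synthetic: bool = False  # True when no probing step produced the result


def merge_probe_results(
    explicit_inputs: list[str],
    probed: dict[str, ProbeResult],
    filter_reasons: dict[str, str],
) -> list[ProbeResult]:
    """Return one result per explicit input, synthesizing entries for filtered paths."""
    results: list[ProbeResult] = []
    for path in explicit_inputs:
        if path in probed:
            results.append(probed[path])  # produced by runtime probing
        else:
            # Never reached probing: keep the input visible with its filter reason.
            results.append(
                ProbeResult(path, "filtered", filter_reasons.get(path), synthetic=True)
            )
    return results


probed = {"src/app.py": ProbeResult("src/app.py", "resolved")}
reasons = {"build/out.log": "excluded_by_path_filter"}
for result in merge_probe_results(["src/app.py", "build/out.log"], probed, reasons):
    print(result)
```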

SCAN

Purpose: Detect file type and existing TopMark headers

Mutation: ❌ none

Steps:

flowchart TD

R[<tt>ResolverStep</tt>]
S[<tt>SnifferStep</tt>]
D[<tt>ReaderStep</tt>]
N[<tt>ScannerStep</tt>]

R --> S --> D --> N

End states:

  • Header detected / missing / malformed
  • File unsupported, unreadable, binary, or blocked by policy
  • Hard-linked processing target blocked before ordinary scan steps run

This pipeline is the foundation for the check and strip pipelines.


CHECK_RENDER

Purpose: Generate the expected header without comparison

Mutation: ❌ none

Steps:

flowchart TD

SP(<b>SCAN</b>)
B[<tt>BuilderStep</tt>]
T[<tt>RendererStep</tt>]

SP --> B --> T

End states:

  • Rendered header available in context
  • No determination yet whether changes are needed

BuilderStep derives built-in header metadata fields such as file_relpath, file_abspath, relpath, and abspath from the selected processing target. If a file was reached through a symlink, these generated fields describe the resolved target TopMark reads and writes rather than the symlink spelling. Header metadata path fields are serialized with POSIX / separators on all platforms.
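
POSIX-style serialization is available in the standard library through `PurePath.as_posix()`. The sketch below shows the effect on both POSIX and Windows path flavours; `header_path_value` and its `"relative"` / `"absolute"` keys are hypothetical and do not claim to match how BuilderStep computes each named field.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def header_path_value(target: PurePath, root: PurePath) -> dict[str, str]:
    """Serialize a relative and an absolute spelling of `target` with "/" separators."""
    return {
        "relative": target.relative_to(root).as_posix(),
        "absolute": target.as_posix(),
    }


posix = header_path_value(PurePosixPath("/repo/src/pkg/mod.py"), PurePosixPath("/repo"))
windows = header_path_value(
    PureWindowsPath(r"C:\repo\src\pkg\mod.py"), PureWindowsPath(r"C:\repo")
)
print(posix)
print(windows)
```

Both calls yield `src/pkg/mod.py` as the relative value, so a header generated on Windows does not differ from one generated on Linux for the same repository layout.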

Useful for debugging header generation.


CHECK (Summary)

Purpose: Determine whether a file would change

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CR(<b>CHECK_RENDER</b>)
C[<tt>ComparerStep</tt>]

CR --> C

End states:

  • UNCHANGED - rendered header matches existing header
  • CHANGED - header would be updated or inserted
  • SKIPPED / UNSUPPORTED - policy or file constraints

This is the default pipeline behind topmark check.


CHECK_PATCH

Purpose: Produce a unified diff without writing

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

CP --> P --> H

End states:

  • Patch generated
  • No patch if unchanged or skipped

PatcherStep generates unified diffs for human review. Diff file labels use the same human-facing display-path policy as TEXT and Markdown reports, including the logical --stdin-filename for STDIN-backed processing when available. They are not machine-readable path serialization fields.

Used when --patch is requested without --apply.
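
A unified diff with human-facing labels can be produced with the standard library's `difflib`. The sketch below is illustrative: `make_patch`, the `(current)` / `(updated)` label suffixes, and the example header line are assumptions, not PatcherStep's actual output format.

```python
import difflib


def make_patch(original: str, updated: str, display_path: str) -> str:
    """Return a unified diff labelled with a display path, or "" if nothing changed."""
    if original == updated:
        return ""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"{display_path} (current)",  # hypothetical label format
        tofile=f"{display_path} (updated)",
    )
    return "".join(lines)


before = "print('x')\n"
after = "# example header\nprint('x')\n"
print(make_patch(before, after, "src/app.py"), end="")
```

The labels are passed in as plain strings, which is why a logical name such as a `--stdin-filename` value can stand in for a real path.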


CHECK_APPLY

Purpose: Update or insert headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> W

End states:

  • File written
  • Write skipped if unchanged or blocked
  • Failure if filesystem or policy prevents writing

Requires --apply.


CHECK_APPLY_PATCH

Purpose: Apply changes and emit a patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> H --> W

Primarily useful for CI or audit workflows.


STRIP (Summary)

Purpose: Remove an existing TopMark header

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

SP(<b>SCAN</b>)
X[<tt>StripperStep</tt>]

SP --> X

End states:

  • Header removed in rendered output
  • No-op if header absent
  • Skipped if unsupported or blocked

STRIP_PATCH

Purpose: Show diff for header removal

Mutation: ❌ none

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

XP --> C --> P --> H

STRIP_APPLY

Purpose: Remove headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

XP --> P --> W

STRIP_APPLY_PATCH

Purpose: Remove headers and emit patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

XP --> C --> P --> H --> W

Step Responsibilities

Each step implements the Step protocol and:

  • Declares which status axes it may write
  • Declares which pipeline view slots it may consume via consumes_views
  • May halt execution via ctx.flow.halt
  • Emits structured hints for diagnostics

| Step | Responsibility |
| --- | --- |
| ProberStep | Run resolution probe and expose scored candidates, selection, and processor binding |
| ResolverStep | Determine file type and header processor (see Resolution) |
| SnifferStep | Fast policy and newline checks |
| ReaderStep | Read file content safely |
| ScannerStep | Locate existing header bounds |
| BuilderStep | Build expected header field values and POSIX-serialized metadata paths for the selected processing target |
| RendererStep | Render header text |
| ComparerStep | Compare existing vs rendered header |
| StripperStep | Remove header content |
| PlannerStep | Decide insert / replace / remove plan |
| PatcherStep | Generate unified diff with human-facing display labels |
| WriterStep | Persist changes |

View consumer declarations

Pipeline view consumer declarations are part of the step contract. They describe which large, phase-scoped view payloads a step may read during run() or hint().

These declarations are intentionally separate from axes_written: axes describe status ownership, while consumes_views describes data dependencies. The runner aggregates the declarations of remaining steps and releases views that no later step can consume. This keeps pruning tied to typed view slots instead of brittle step-name string checks.

Current consumer declarations are:

| Step | Consumed view slots |
| --- | --- |
| ProberStep | none |
| ResolverStep | none |
| SnifferStep | none |
| ReaderStep | none |
| ScannerStep | image |
| BuilderStep | none |
| RendererStep | image, header, build |
| ComparerStep | image, header, build, render, updated |
| StripperStep | image, header |
| PlannerStep | image, header, render, updated |
| PatcherStep | image, updated |
| WriterStep | updated |

ReaderStep and BuilderStep produce views but do not consume existing view slots. RendererStep consumes the original image because it may preserve insertion indentation from the source file.
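
To make the pruning rule concrete, the sketch below computes, for a given step order, which view slots can be released after each step because no later step consumes them. The slot names and declarations are taken from the table above; `release_schedule`, the `retain` parameter, and the use of step names as dictionary keys are simplifications for illustration (the text notes that the real runner works with typed view slots, not step-name strings).

```python
from typing import Mapping

# Declarations copied from the table above (steps that consume nothing are omitted).
CONSUMES: Mapping[str, frozenset[str]] = {
    "ScannerStep": frozenset({"image"}),
    "RendererStep": frozenset({"image", "header", "build"}),
    "ComparerStep": frozenset({"image", "header", "build", "render", "updated"}),
    "StripperStep": frozenset({"image", "header"}),
    "PlannerStep": frozenset({"image", "header", "render", "updated"}),
    "PatcherStep": frozenset({"image", "updated"}),
    "WriterStep": frozenset({"updated"}),
}


def release_schedule(
    pipeline: tuple[str, ...],
    consumes: Mapping[str, frozenset[str]],
    retain: frozenset[str] = frozenset(),
) -> dict[str, list[str]]:
    """Map each step to the slots that are releasable once that step has run."""
    last_consumer: dict[str, str] = {}
    for name in pipeline:
        for slot in consumes.get(name, frozenset()):
            last_consumer[slot] = name  # later consumers overwrite earlier ones
    schedule: dict[str, list[str]] = {name: [] for name in pipeline}
    for slot, name in last_consumer.items():
        if slot not in retain:  # retained slots are never released
            schedule[name].append(slot)
    return {name: sorted(slots) for name, slots in schedule.items()}


CHECK_PATCH = (
    "ResolverStep", "SnifferStep", "ReaderStep", "ScannerStep",
    "BuilderStep", "RendererStep", "ComparerStep", "PlannerStep", "PatcherStep",
)
for step, slots in release_schedule(CHECK_PATCH, CONSUMES).items():
    if slots:
        print(step, "->", slots)
```

For this step order the `build` view becomes releasable after ComparerStep, `header` and `render` after PlannerStep, and `image` and `updated` after PatcherStep.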


Conditional and Policy-Driven End States

Configuration validation happens before pipeline steps run. Under effective strict config checking, configuration warnings are treated as validation failures and may prevent pipeline execution from starting.

Some pipelines may terminate early due to policy or safety constraints:

  • Binary files
  • Mixed line endings
  • BOM before shebang
  • Missing read/write permissions
  • Hard-linked processing targets
  • Unsupported file types

In these cases:

  • The pipeline halts cleanly
  • No mutation occurs
  • A terminal hint explains why the file was skipped or blocked

This guarantees:

  • Safe dry-runs
  • No partial writes
  • Idempotent behavior across repeated runs
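
Several of these conditions can be detected from raw bytes before any header logic runs. The sketch below uses common heuristics (a NUL byte for binary content, a UTF-8 BOM followed by `#!`, and counting newline styles); these heuristics and the function name `sniff` are assumptions, not a description of SnifferStep's actual checks. The returned labels reuse names from the FS axis.

```python
import codecs


def sniff(data: bytes) -> str:
    """Return an FS-axis style label for raw file content."""
    if not data:
        return "EMPTY"
    if b"\x00" in data:  # heuristic: NUL bytes rarely occur in text files
        return "BINARY"
    if data.startswith(codecs.BOM_UTF8 + b"#!"):
        return "BOM_BEFORE_SHEBANG"
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF only
    cr = data.count(b"\r") - crlf  # bare CR only
    if sum(1 for n in (crlf, lf, cr) if n > 0) > 1:
        return "MIXED_LINE_ENDINGS"
    return "OK"


samples = {
    "plain": b"a\nb\n",
    "mixed": b"a\r\nb\n",
    "bom-shebang": codecs.BOM_UTF8 + b"#!/bin/sh\n",
    "binary": b"\x00\x01\x02",
}
for label, payload in samples.items():
    print(label, "->", sniff(payload))
```

Because the check only reads bytes and returns a label, a caller can halt the pipeline on any result other than `OK` without ever having opened the file for writing.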

Key Design Guarantees

  • Immutability: Pipelines are Final[tuple[Step, ...]]
  • Determinism: Same input → same outcome
  • Processing-path identity: pipeline steps operate on selected processing paths, not raw invocation spellings
  • Filesystem-identity safety: hard-linked selected processing paths are blocked before ordinary per-file step execution without choosing a preferred path
  • Dry-run safety: No writes without --apply
  • Separation of concerns: Steps mutate context, views classify outcomes
  • Runtime/configuration separation: pipeline execution consumes resolved runtime configuration and runtime options rather than re-running TOML discovery during step execution
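
The immutability guarantee can be illustrated by composing variants as tuples. In this sketch the steps are represented by their names as strings rather than real `Step` instances, and the variable layout is an assumption; the step order follows the diagrams on this page.

```python
from typing import Final

SCAN: Final[tuple[str, ...]] = (
    "ResolverStep", "SnifferStep", "ReaderStep", "ScannerStep",
)
CHECK_RENDER: Final[tuple[str, ...]] = SCAN + ("BuilderStep", "RendererStep")
CHECK: Final[tuple[str, ...]] = CHECK_RENDER + ("ComparerStep",)
CHECK_PATCH: Final[tuple[str, ...]] = CHECK + ("PlannerStep", "PatcherStep")
CHECK_APPLY: Final[tuple[str, ...]] = CHECK + ("PlannerStep", "WriterStep")

# Each variant extends its base, so the shared prefix is identical by construction.
print(CHECK_PATCH[: len(SCAN)] == SCAN)

# Tuples reject in-place modification at runtime.
try:
    CHECK[0] = "SomethingElse"  # type: ignore[index]
except TypeError as exc:
    print("immutable:", exc)
```

`Final` is enforced by type checkers rather than at runtime; the runtime protection in this sketch comes from the tuple type itself.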

This pipeline model is the backbone of TopMark's reliability and extensibility. New behavior is introduced by adding steps or composing new pipelines, not by special-casing control flow.


Per-axis lifecycle

TopMark tracks progress using a set of status axes. Each axis starts in PENDING and transitions as steps complete or halt early.

These diagrams are intentionally coarse: they show possible terminal states, not every code path.
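
One way to picture an axis is as an enum plus a holder that starts in PENDING. The sketch below uses the Write axis values shown later in this section. `AxisState` and its single-transition rule are assumptions made for illustration; the diagrams are coarse and the real implementation may allow other transitions.

```python
from enum import Enum


class WriteStatus(Enum):
    PENDING = "pending"
    WRITTEN = "written"
    SKIPPED = "skipped"
    FAILED = "failed"


class AxisState:
    """Holds one axis value; allows a single PENDING -> terminal transition."""

    def __init__(self) -> None:
        self.value = WriteStatus.PENDING

    def settle(self, new: WriteStatus) -> None:
        if new is WriteStatus.PENDING:
            raise ValueError("cannot transition back to PENDING")
        if self.value is not WriteStatus.PENDING:
            raise ValueError(f"axis already settled as {self.value.name}")
        self.value = new


axis = AxisState()
print(axis.value.name)
axis.settle(WriteStatus.WRITTEN)
print(axis.value.name)
```

An axis that is still PENDING at the end of a run simply means the owning step never executed, for example because an earlier step halted.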

Resolve axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RESOLVED
  PENDING --> TYPE_RESOLVED_HEADERS_UNSUPPORTED
  PENDING --> TYPE_RESOLVED_NO_PROCESSOR_REGISTERED
  PENDING --> UNSUPPORTED

FS axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> EMPTY
  PENDING --> NOT_FOUND
  PENDING --> NO_READ_PERMISSION
  PENDING --> UNREADABLE
  PENDING --> HARD_LINK_DUPLICATE
  PENDING --> NO_WRITE_PERMISSION
  PENDING --> BINARY
  PENDING --> BOM_BEFORE_SHEBANG
  PENDING --> UNICODE_DECODE_ERROR
  PENDING --> MIXED_LINE_ENDINGS

Content axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> UNSUPPORTED
  PENDING --> SKIPPED_MIXED_LINE_ENDINGS
  PENDING --> SKIPPED_POLICY_BOM_BEFORE_SHEBANG
  PENDING --> UNREADABLE

Header axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> MISSING
  PENDING --> DETECTED
  PENDING --> MALFORMED
  PENDING --> MALFORMED_ALL_FIELDS
  PENDING --> MALFORMED_SOME_FIELDS
  PENDING --> EMPTY

Generation axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> NO_FIELDS

Render axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RENDERED

Comparison axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> CHANGED
  PENDING --> UNCHANGED
  PENDING --> SKIPPED

Strip axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> NOT_NEEDED
  PENDING --> READY
  PENDING --> FAILED

Plan axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> PREVIEWED
  PENDING --> REPLACED
  PENDING --> INSERTED
  PENDING --> REMOVED
  PENDING --> SKIPPED
  PENDING --> FAILED

Patch axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> SKIPPED
  PENDING --> FAILED

Write axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> WRITTEN
  PENDING --> SKIPPED
  PENDING --> FAILED

CLI-focused flowcharts

These diagrams describe the user-visible execution paths behind topmark check and topmark strip, including the --patch and --apply switches.

topmark check

flowchart TD
  A[User runs: topmark check]
  B[SCAN: resolve + sniff + read + scan]
  C[CHECK_RENDER: build + render]
  D[COMPARE]
  E[Report: unchanged]
  F[Plan insert/replace]
  G[Report: would change]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|unchanged| E
  D -->|would change| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

topmark strip

flowchart TD
  A[User runs: topmark strip]
  B[SCAN: resolve + sniff + read + scan]
  C[STRIP: compute removal]
  D[COMPARE]
  E[Report: no-op]
  F[Plan removal]
  G[Report: would remove]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|nothing to remove| E
  D -->|would remove| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

For topmark probe, filtered or missing explicit inputs are not produced by ProberStep itself. They are represented by synthetic contexts that probe orchestration creates before final presentation, API, and machine-readable output packaging.