Pipelines (Concepts)¶

TopMark processes files through explicit, immutable pipelines composed of small, single-responsibility steps. Each pipeline represents a supported execution intent (scan, check, strip, apply, patch) and defines exactly which steps run and in which order.

A dedicated probe pipeline exists for resolution diagnostics (topmark probe). Probe orchestration also reports explicit inputs filtered before file-type probing via synthetic probe contexts.

Pipelines do not make high-level decisions themselves. Instead:

Each step mutates a strictly defined set of status axes
Steps may halt execution when required by policy or safety rules
Final outcomes (changed, unchanged, skipped, unsupported, error, ...) are derived centrally by the CLI and views from accumulated statuses and hints

This design guarantees predictability, debuggability, and idempotence.

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

Pipeline execution consumes an immutable FrozenConfig plus runtime options assembled from the TOML → FrozenConfig → runtime flow documented in Architecture and Configuration discovery.

Pipeline execution also consumes a selected processing path. File-list resolution performs filesystem-identity evaluation before ordinary pipeline execution begins.

Filesystem-identity normalization collapses equivalent path spellings, such as symlinks, into a selected processing path. Filesystem-identity eligibility checks determine whether selected processing paths are safe to process. Pipeline steps therefore operate on processing paths rather than preserving original CLI, configuration, glob, or symlink spellings.

Hard-linked selected processing paths are handled by an invocation-wide engine guard before ordinary per-file pipeline execution. If multiple selected paths refer to the same filesystem object through hard links, every affected path is blocked as an unsupported processing target; no source, target, winner, or loser path is selected.

Source-local TOML options such as [config].root and strict are resolved before pipeline execution. They influence configuration discovery and staged config-loading validation behavior, but do not become layered configuration fields.

Note

[config].strict is a TOML-source-local strictness preference controlling staged configuration-loading validation for the current TOML source.

Effective strictness is evaluated across:

TOML-source diagnostics;
merged-config diagnostics;
runtime applicability diagnostics.

When strict validation fails, TopMark exits with CONFIG_ERROR. The diagnostics that triggered the failure remain visible in human-readable and machine-readable output formats.

strict is resolved during TOML loading and does not become a layered configuration field.

Concepts vs Reference¶

This page explains how the pipelines work and how the CLI composes them. For the canonical, API-backed definitions of pipelines, steps, and enums, see:

Pipelines reference hub: Pipelines (Reference)
Internals (generated): api/internals/topmark/pipeline/pipelines.md
Architecture overview: Architecture

Step names and enum names on this page are written as MkDocStrings/AutoRefs links, for example topmark.pipeline.steps.resolver.ResolverStep. MkDocs resolves these references through the generated API documentation.

Pipeline Overview¶

All pipelines are built from the same core phases:

Input selection - discover files, evaluate filesystem identity, normalize equivalent path spellings, enforce processing-target eligibility, and select processing paths
Discovery - identify file type and viability for each processing path
Inspection - read content and detect existing headers
Evaluation - generate and compare expected headers
Mutation (optional) - plan, patch, and/or write changes

The probe pipeline is an exception: it only executes the resolution phase and stops immediately after producing probe results.

Input selection happens before ordinary pipeline execution. Filesystem-identity normalization handles symlink behavior: file symlink spellings and their targets are collapsed to the resolved processing target before pipeline steps run. Filesystem-identity eligibility checks handle safety policy such as hard-link detection: hard-linked selected processing paths are blocked before ordinary step execution, while unrelated selected paths continue through the requested pipeline. Synthetic probe contexts for filtered or missing explicit inputs preserve diagnostic input information only for those paths that never became normal processing paths.

Unified Pipeline Flow¶

flowchart TD

  subgraph Probing
    O[<tt>ProberStep</tt>]
  end

  subgraph Discovery
    R[<tt>ResolverStep</tt>]
    S[<tt>SnifferStep</tt>]
    D[<tt>ReaderStep</tt>]
    N[<tt>ScannerStep</tt>]

    R --> S --> D --> N
  end

  subgraph Check
    B[<tt>BuilderStep</tt>]
    T[<tt>RendererStep</tt>]

    N --> B --> T
  end

  subgraph Strip
    X[<tt>StripperStep</tt>]

    N --> X
  end

  subgraph Comparison
    C[<tt>ComparerStep</tt>]
  end

  subgraph Mutation
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    W[<tt>WriterStep</tt>]

    C --> P
    P -->|patch| H
    P -->|apply| W
  end

  T --> C
  X ---> C

Not all pipelines traverse all phases. Each variant selects a strict subset of steps.

Pipeline guarantees¶

TopMark pipelines are:

deterministic
step-ordered
side-effect constrained
idempotent
processing-path based
presentation-independent

Pipeline steps mutate processing context state. CLI views, API DTOs, and machine-readable output classify final outcomes from accumulated statuses and hints.

Some intermediate data is stored in phase-scoped pipeline views, such as the original file image, detected header data, generated fields, rendered headers, updated content, and unified diffs. Steps that read these views declare their dependencies via consumes_views. When runtime view pruning is enabled, the runner uses those declarations to release consumed view payloads after the last remaining consumer has run, while preserving requested output such as retained diffs.

For filesystem inputs, the processing context path is the selected processing path. It may differ from the path spelling supplied on the command line or in configuration when symlinks or equivalent relative spellings are involved.

For hard-linked filesystem inputs, selected processing paths remain separate results but are blocked before ordinary per-file pipeline execution. The engine does not collapse the hard-link group into a preferred source, target, winner, or loser path.

Available Pipelines¶

Pipelines are defined in src/topmark/pipeline/pipelines.py and exposed via topmark.pipeline.pipelines.Pipeline.

The CLI selects among these immutable pipeline variants based on command intent and flags such as --patch and --apply.

PROBE¶

Purpose: Explain file type and processor resolution

Mutation: ❌ none

Steps:

flowchart TD

O[<tt>ProberStep</tt>]

End states:

Resolution status (resolved, unsupported, no_processor, filtered)
Selected file type and processor (if any)
Full candidate set with match signals
Explicit inputs filtered before file-type probing are represented by synthetic probe results with status="filtered" and reasons such as excluded_by_path_filter, excluded_by_file_type_filter, or excluded_by_discovery_filter.

This pipeline powers topmark probe and topmark.api.probe() and is intentionally resolution-only.

It halts immediately after probing and does not perform inspection, comparison, or mutation. Discovery-level filtering is reported by orchestration via synthetic probe results for explicitly requested paths that did not reach probing.

Probe results that do reach runtime probing report processing paths. They should not be interpreted as a lossless echo of the original invocation spelling.

Hard-linked selected processing paths also remain visible in probe output. Each affected path is reported independently as unsupported with the stable reason string hard_link_duplicate.

SCAN¶

Purpose: Detect file type and existing TopMark headers

Mutation: ❌ none

Steps:

flowchart TD

R[<tt>ResolverStep</tt>]
S[<tt>SnifferStep</tt>]
D[<tt>ReaderStep</tt>]
N[<tt>ScannerStep</tt>]

R --> S --> D --> N

End states:

Header detected / missing / malformed
File unsupported, unreadable, binary, or blocked by policy
Hard-linked processing target blocked before ordinary scan steps run

This pipeline is used as the foundation for all others.

CHECK_RENDER¶

Purpose: Generate the expected header without comparison

Mutation: ❌ none

Steps:

flowchart TD

SP(<b>SCAN</b>)
B[<tt>BuilderStep</tt>]
T[<tt>RendererStep</tt>]

SP --> B --> T

End states:

Rendered header available in context
No determination yet whether changes are needed

BuilderStep derives built-in header metadata fields such as file_relpath, file_abspath, relpath, and abspath from the selected processing target. If a file was reached through a symlink, these generated fields describe the resolved target TopMark reads and writes rather than the symlink spelling. Header metadata path fields are serialized with POSIX / separators on all platforms.

Useful for debugging header generation.

CHECK (Summary)¶

Purpose: Determine whether a file would change

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CR(<b>CHECK_RENDER</b>)
C[<tt>ComparerStep</tt>]

CR --> C

End states:

UNCHANGED - rendered header matches existing header
CHANGED - header would be updated or inserted
SKIPPED / UNSUPPORTED - policy or file constraints

This is the default pipeline behind topmark check.

CHECK_PATCH¶

Purpose: Produce a unified diff without writing

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

CP --> P --> H

End states:

Patch generated
No patch if unchanged or skipped

PatcherStep generates unified diffs for human review. Diff file labels use the same human-facing display-path policy as TEXT and Markdown reports, including the logical --stdin-filename for STDIN-backed processing when available. They are not machine-readable path serialization fields.

Used when --patch is requested without --apply.

CHECK_APPLY¶

Purpose: Update or insert headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> W

End states:

File written
Write skipped if unchanged or blocked
Failure if filesystem or policy prevents writing

Requires --apply.

CHECK_APPLY_PATCH¶

Purpose: Apply changes and emit a patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> H --> W

Primarily useful for CI or audit workflows.

STRIP (Summary)¶

Purpose: Remove an existing TopMark header

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

SP(<b>SCAN</b>)
X[<tt>StripperStep</tt>]

SP --> X

End states:

Header removed in rendered output
No-op if header absent
Skipped if unsupported or blocked

STRIP_PATCH¶

Purpose: Show diff for header removal

Mutation: ❌ none

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

XP --> C --> P --> H

STRIP_APPLY¶

Purpose: Remove headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

XP --> P --> W

STRIP_APPLY_PATCH¶

Purpose: Remove headers and emit patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

XP --> C --> P --> H --> W

Step Responsibilities¶

Each step implements the Step protocol and:

Declares which status axes it may write
Declares which pipeline view slots it may consume via consumes_views
May halt execution via ctx.flow.halt
Emits structured hints for diagnostics

Step	Responsibility
`ProberStep`	Run resolution probe and expose scored candidates, selection, and processor binding
`ResolverStep`	Determine file type and header processor (see `Resolution`)
`SnifferStep`	Fast policy and newline checks
`ReaderStep`	Read file content safely
`ScannerStep`	Locate existing header bounds
`BuilderStep`	Build expected header field values and POSIX-serialized metadata paths for the selected processing target
`RendererStep`	Render header text
`ComparerStep`	Compare existing vs rendered header
`StripperStep`	Remove header content
`PlannerStep`	Decide insert / replace / remove plan
`PatcherStep`	Generate unified diff with human-facing display labels
`WriterStep`	Persist changes

View consumer declarations¶

Pipeline view consumer declarations are part of the step contract. They describe which large, phase-scoped view payloads a step may read during run() or hint().

These declarations are intentionally separate from axes_written: axes describe status ownership, while consumes_views describes data dependencies. The runner aggregates the declarations of remaining steps and releases views that no later step can consume. This keeps pruning tied to typed view slots instead of brittle step-name string checks.

Current consumer declarations are:

Step	Consumed view slots
`ProberStep`	none
`ResolverStep`	none
`SnifferStep`	none
`ReaderStep`	none
`ScannerStep`	`image`
`BuilderStep`	none
`RendererStep`	`image`, `header`, `build`
`ComparerStep`	`image`, `header`, `build`, `render`, `updated`
`StripperStep`	`image`, `header`
`PlannerStep`	`image`, `header`, `render`, `updated`
`PatcherStep`	`image`, `updated`
`WriterStep`	`updated`

ReaderStep and BuilderStep produce views but do not consume existing view slots. RendererStep consumes the original image because it may preserve insertion indentation from the source file.

Conditional and Policy-Driven End States¶

Some pipelines may terminate early due to policy or safety constraints:

Configuration validation happens before these pipeline steps run. Under effective strict config checking, configuration warnings are treated as validation failures and may prevent pipeline execution from starting.

Binary files
Mixed line endings
BOM before shebang
Missing read/write permissions
Hard-linked processing targets
Unsupported file types

In these cases:

The pipeline halts cleanly
No mutation occurs
A terminal hint explains why the file was skipped or blocked

This guarantees:

Safe dry-runs
No partial writes
Idempotent behavior across repeated runs

Key Design Guarantees¶

Immutability: Pipelines are Final[tuple[Step, ...]]
Determinism: Same input → same outcome
Processing-path identity: pipeline steps operate on selected processing paths, not raw invocation spellings
Filesystem-identity safety: hard-linked selected processing paths are blocked before ordinary per-file step execution without choosing a preferred path
Dry-run safety: No writes without --apply
Separation of concerns: Steps mutate context, views classify outcomes
Runtime/configuration separation: pipeline execution consumes resolved runtime configuration and runtime options rather than re-running TOML discovery during step execution

Per-axis lifecycle¶

TopMark tracks progress using a set of status axes. Each axis starts in PENDING and transitions as steps complete or halt early.

These diagrams are intentionally coarse: they show possible terminal states, not every code path.

Resolve axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RESOLVED
  PENDING --> TYPE_RESOLVED_HEADERS_UNSUPPORTED
  PENDING --> TYPE_RESOLVED_NO_PROCESSOR_REGISTERED
  PENDING --> UNSUPPORTED

FS axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> EMPTY
  PENDING --> NOT_FOUND
  PENDING --> NO_READ_PERMISSION
  PENDING --> UNREADABLE
  PENDING --> HARD_LINK_DUPLICATE
  PENDING --> NO_WRITE_PERMISSION
  PENDING --> BINARY
  PENDING --> BOM_BEFORE_SHEBANG
  PENDING --> UNICODE_DECODE_ERROR
  PENDING --> MIXED_LINE_ENDINGS

Content axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> UNSUPPORTED
  PENDING --> SKIPPED_MIXED_LINE_ENDINGS
  PENDING --> SKIPPED_POLICY_BOM_BEFORE_SHEBANG
  PENDING --> UNREADABLE

Header axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> MISSING
  PENDING --> DETECTED
  PENDING --> MALFORMED
  PENDING --> MALFORMED_ALL_FIELDS
  PENDING --> MALFORMED_SOME_FIELDS
  PENDING --> EMPTY

Generation axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> NO_FIELDS

Render axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RENDERED

Comparison axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> CHANGED
  PENDING --> UNCHANGED
  PENDING --> SKIPPED

Strip axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> NOT_NEEDED
  PENDING --> READY
  PENDING --> FAILED

Plan axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> PREVIEWED
  PENDING --> REPLACED
  PENDING --> INSERTED
  PENDING --> REMOVED
  PENDING --> SKIPPED
  PENDING --> FAILED

Patch axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> SKIPPED
  PENDING --> FAILED

Write axis¶

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> WRITTEN
  PENDING --> SKIPPED
  PENDING --> FAILED

CLI-focused flowcharts¶

These diagrams describe the user-visible execution paths behind topmark check and topmark strip, including the --patch and --apply switches.

`topmark check`¶

flowchart TD
  A[User runs: topmark check]
  B[SCAN: resolve + sniff + read + scan]
  C[CHECK_RENDER: build + render]
  D[COMPARE]
  E[Report: unchanged]
  F[Plan insert/replace]
  G[Report: would change]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|unchanged| E
  D -->|would change| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

`topmark strip`¶

flowchart TD
  A[User runs: topmark strip]
  B[SCAN: resolve + sniff + read + scan]
  C[STRIP: compute removal]
  D[COMPARE]
  E[Report: no-op]
  F[Plan removal]
  G[Report: would remove]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|nothing to remove| E
  D -->|would remove| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

Filtered or missing explicit inputs are not produced by ProberStep itself. They are represented by synthetic contexts created by probe orchestration before final presentation, API, and machine-readable output packaging.

Pipelines (Concepts)¶

Concepts vs Reference¶

Pipeline Overview¶

Unified Pipeline Flow¶

Pipeline guarantees¶

Available Pipelines¶

PROBE¶

SCAN¶

CHECK_RENDER¶

CHECK (Summary)¶

CHECK_PATCH¶

CHECK_APPLY¶

CHECK_APPLY_PATCH¶

STRIP (Summary)¶

STRIP_PATCH¶

STRIP_APPLY¶

STRIP_APPLY_PATCH¶

Step Responsibilities¶

View consumer declarations¶

Conditional and Policy-Driven End States¶

Key Design Guarantees¶

See also¶

Per-axis lifecycle¶

Resolve axis¶

FS axis¶

Content axis¶

Header axis¶

Generation axis¶

Render axis¶

Comparison axis¶

Strip axis¶

Plan axis¶

Patch axis¶

Write axis¶

CLI-focused flowcharts¶

topmark check¶

topmark strip¶

`topmark check`¶

`topmark strip`¶