Pipelines (Concepts)
TopMark processes files through explicit, immutable pipelines composed of small, single-responsibility steps. Each pipeline represents a supported execution intent (scan, check, strip, apply, patch) and defines exactly which steps run and in which order.
A dedicated probe pipeline (topmark probe) exists for resolution diagnostics. Probe orchestration
also uses synthetic probe contexts to report explicit inputs that were filtered out before
file-type probing.
Pipelines do not make high-level decisions themselves. Instead:
- Each step mutates a strictly defined set of status axes
- Steps may halt execution when required by policy or safety rules
- Final outcomes (changed, unchanged, skipped, unsupported, error, ...) are derived centrally by the CLI and views from accumulated statuses and hints
This design guarantees predictability, debuggability, and idempotence.
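To make the halting contract concrete, here is a minimal Python sketch of an ordered step runner. It is not TopMark's implementation: the Context class, the plain-function steps, and the halted flag are hypothetical simplifications of the real processing context and its ctx.flow.halt mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Final


@dataclass
class Context:
    """Hypothetical stand-in for TopMark's processing context."""

    statuses: dict[str, str] = field(default_factory=dict)
    hints: list[str] = field(default_factory=list)
    halted: bool = False


StepFn = Callable[[Context], None]


def run_pipeline(steps: tuple[StepFn, ...], ctx: Context) -> Context:
    """Run steps strictly in order; stop as soon as a step halts the context."""
    for step in steps:
        if ctx.halted:
            break
        step(ctx)
    return ctx


def resolve(ctx: Context) -> None:
    ctx.statuses["resolve"] = "RESOLVED"


def sniff(ctx: Context) -> None:
    # Simulated policy block: record a status and a hint, then halt.
    ctx.statuses["fs"] = "BINARY"
    ctx.hints.append("binary file: skipped")
    ctx.halted = True


def read(ctx: Context) -> None:
    ctx.statuses["content"] = "OK"  # never reached in this example


SCAN_LIKE: Final[tuple[StepFn, ...]] = (resolve, sniff, read)
result = run_pipeline(SCAN_LIKE, Context())
```

Because sniff halts, read never runs and the content axis stays unset. The runner itself never decides that the file was "skipped"; that label is left to the view layer.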
Note
The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.
Pipeline execution consumes an immutable FrozenConfig plus
runtime options assembled from the TOML → FrozenConfig → runtime flow documented in
Architecture and Configuration discovery.
Source-local TOML options such as [config].root and strict are resolved before pipeline
execution. They influence configuration discovery and staged config-loading validation behavior, but
do not become layered configuration fields.
Note
[config].strict is a TOML-source-local strictness preference controlling staged
configuration-loading validation for the current TOML source.
Effective strictness is evaluated across:
- TOML-source diagnostics;
- merged-config diagnostics;
- runtime applicability diagnostics.
strict is resolved during TOML loading and does not become a layered configuration field.
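The effect of effective strictness can be sketched as one predicate over the diagnostics collected at all three stages. This is a hypothetical illustration: the Diagnostic type, the stage labels, and the config_is_valid helper are not TopMark APIs. It only encodes the rule stated on this page that, under strict checking, warnings count as validation failures.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Diagnostic:
    stage: str  # hypothetical labels: "toml_source", "merged_config", "runtime"
    level: Level
    message: str


def config_is_valid(diagnostics: list[Diagnostic], *, strict: bool) -> bool:
    """Errors always fail validation; warnings fail only when strict is in effect."""
    failing = {Level.ERROR, Level.WARNING} if strict else {Level.ERROR}
    return not any(d.level in failing for d in diagnostics)


# Diagnostics gathered across the three stages (messages are made up).
diagnostics = [
    Diagnostic("toml_source", Level.INFO, "loaded source"),
    Diagnostic("merged_config", Level.WARNING, "unknown key"),
    Diagnostic("runtime", Level.INFO, "option applies"),
]
```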
Concepts vs Reference
This page explains how the pipelines work and how the CLI composes them. For the canonical, API-backed definitions of pipelines, steps, and enums, see:
- Pipelines reference hub: Pipelines (Reference)
- Internals (generated): api/internals/topmark/pipeline/pipelines.md
- Architecture overview: Architecture
Step names and enum names on this page are written as MkDocStrings/AutoRefs links, for example
topmark.pipeline.steps.resolver.ResolverStep.
MkDocs resolves these references through the generated API documentation.
Pipeline Overview
All pipelines are built from the same core phases:
- Discovery - identify file type and viability
- Inspection - read content and detect existing headers
- Evaluation - generate and compare expected headers
- Mutation (optional) - plan, patch, and/or write changes
The probe pipeline is an exception: it only executes the resolution phase and stops immediately
after producing probe results.
Unified Pipeline Flow
```mermaid
flowchart TD
    subgraph Probing
        O[<tt>ProberStep</tt>]
    end
    subgraph Discovery
        R[<tt>ResolverStep</tt>]
        S[<tt>SnifferStep</tt>]
        D[<tt>ReaderStep</tt>]
        N[<tt>ScannerStep</tt>]
        R --> S --> D --> N
    end
    subgraph Check
        B[<tt>BuilderStep</tt>]
        T[<tt>RendererStep</tt>]
        N --> B --> T
    end
    subgraph Strip
        X[<tt>StripperStep</tt>]
        N --> X
    end
    subgraph Comparison
        C[<tt>ComparerStep</tt>]
    end
    subgraph Mutation
        P[<tt>PlannerStep</tt>]
        H[<tt>PatcherStep</tt>]
        W[<tt>WriterStep</tt>]
        C --> P
        P -->|patch| H
        P -->|apply| W
    end
    T --> C
    X ---> C
```
Not all pipelines traverse all phases. Each variant selects a strict subset of steps.
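Because each variant extends a shorter one, the composition can be sketched with plain tuples. In this hypothetical Python sketch, step names stand in for the real Step instances and the sequences simply mirror the diagrams on this page; the authoritative definitions live in src/topmark/pipeline/pipelines.py.

```python
from typing import Final

# Step names stand in for real Step instances (hypothetical simplification).
SCAN: Final = ("resolver", "sniffer", "reader", "scanner")
CHECK_RENDER: Final = SCAN + ("builder", "renderer")
CHECK: Final = CHECK_RENDER + ("comparer",)
CHECK_PATCH: Final = CHECK + ("planner", "patcher")
CHECK_APPLY: Final = CHECK + ("planner", "writer")
CHECK_APPLY_PATCH: Final = CHECK + ("planner", "patcher", "writer")
STRIP: Final = SCAN + ("stripper",)
STRIP_PATCH: Final = STRIP + ("comparer", "planner", "patcher")
STRIP_APPLY: Final = STRIP + ("planner", "writer")
STRIP_APPLY_PATCH: Final = STRIP + ("comparer", "planner", "patcher", "writer")

ALL_VARIANTS: Final = (
    CHECK_RENDER, CHECK, CHECK_PATCH, CHECK_APPLY, CHECK_APPLY_PATCH,
    STRIP, STRIP_PATCH, STRIP_APPLY, STRIP_APPLY_PATCH,
)


def extends_scan(pipeline: tuple[str, ...]) -> bool:
    """True if the pipeline starts with the full SCAN prefix."""
    return pipeline[: len(SCAN)] == SCAN
```

Since tuples are immutable, changing behavior means building a new variant rather than editing an existing one in place.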
Pipeline guarantees
TopMark pipelines are:
- deterministic
- step-ordered
- side-effect constrained
- idempotent
- presentation-independent
Pipeline steps mutate processing context state. CLI views, API DTOs, and machine-readable output classify final outcomes from accumulated statuses and hints.
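The split between step-level state and view-level classification can be illustrated with a small function that derives an outcome label from final axis values. This is a hypothetical sketch: the axis keys, the precedence order, and the classify name are assumptions chosen for illustration; TopMark's views apply their own rules.

```python
from typing import Mapping


def classify(statuses: Mapping[str, str]) -> str:
    """Derive a presentation-level outcome from accumulated axis statuses."""
    if statuses.get("resolve") != "RESOLVED":
        return "unsupported"
    if "FAILED" in statuses.values():
        return "error"
    if statuses.get("fs", "OK") not in ("OK", "EMPTY"):
        return "skipped"
    if statuses.get("content", "OK") != "OK":
        return "skipped"
    if statuses.get("write") == "WRITTEN":
        return "written"
    if statuses.get("comparison") == "CHANGED":
        return "changed"
    # Unchanged comparisons and scan-only runs report no pending change.
    return "unchanged"
```

Keeping this logic out of the steps means one processed context can be presented as human-readable output, an API DTO, or JSON without re-running any step.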
Available Pipelines
Pipelines are defined in src/topmark/pipeline/pipelines.py and exposed via
topmark.pipeline.pipelines.Pipeline.
The CLI selects among these immutable pipeline variants based on command intent and flags such as
--patch and --apply.
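The flag-to-variant mapping described in the sections below can be summarized as a small lookup. The helper here is hypothetical and returns variant names as strings; it is not the CLI's actual selection code.

```python
def select_pipeline(command: str, *, patch: bool, apply: bool) -> str:
    """Map a CLI command and its --patch/--apply flags to a variant name."""
    if command not in ("check", "strip"):
        raise ValueError(f"no check/strip variants for command {command!r}")
    base = command.upper()
    if apply and patch:
        return f"{base}_APPLY_PATCH"
    if apply:
        return f"{base}_APPLY"
    if patch:
        return f"{base}_PATCH"
    return base  # dry-run summary variant
```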
PROBE
Purpose: Explain file type and processor resolution
Mutation: ❌ none
Steps:
```mermaid
flowchart TD
    O[<tt>ProberStep</tt>]
```
End states:
- Resolution status (resolved, unsupported, no_processor, filtered)
- Selected file type and processor (if any)
- Full candidate set with match signals
- Explicit inputs filtered before file-type probing are represented by synthetic probe results with status="filtered" and reasons such as excluded_by_path_filter, excluded_by_file_type_filter, or excluded_by_discovery_filter.
This pipeline powers topmark probe and
topmark.api.probe() and is intentionally
resolution-only.
It halts immediately after probing and does not perform inspection, comparison, or mutation. Discovery-level filtering is reported by orchestration via synthetic probe results for explicitly requested paths that did not reach probing.
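The orchestration pattern, where explicitly requested paths that never reach ProberStep still receive a result, can be sketched as follows. ProbeResult, the filter callables, and probe_inputs are hypothetical; only the status value "filtered" and the excluded_by_path_filter reason come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class ProbeResult:
    path: Path
    status: str
    reason: Optional[str] = None


# A filter returns an exclusion reason, or None to let the path through.
PathFilter = Callable[[Path], Optional[str]]


def first_exclusion(path: Path, filters: list[PathFilter]) -> Optional[str]:
    for path_filter in filters:
        reason = path_filter(path)
        if reason is not None:
            return reason
    return None


def probe_inputs(
    paths: list[Path],
    filters: list[PathFilter],
    probe: Callable[[Path], ProbeResult],
) -> list[ProbeResult]:
    results: list[ProbeResult] = []
    for path in paths:
        reason = first_exclusion(path, filters)
        if reason is not None:
            # Synthetic result: the probe step never ran for this path.
            results.append(ProbeResult(path, "filtered", reason))
        else:
            results.append(probe(path))
    return results


def skip_build_dirs(path: Path) -> Optional[str]:
    return "excluded_by_path_filter" if "build" in path.parts else None


def fake_probe(path: Path) -> ProbeResult:
    # Toy resolution: only .py files resolve in this example.
    return ProbeResult(path, "resolved" if path.suffix == ".py" else "unsupported")


results = probe_inputs(
    [Path("src/app.py"), Path("build/gen.py"), Path("notes.xyz")],
    [skip_build_dirs],
    fake_probe,
)
```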
SCAN
Purpose: Detect file type and existing TopMark headers
Mutation: ❌ none
Steps:
```mermaid
flowchart TD
    R[<tt>ResolverStep</tt>]
    S[<tt>SnifferStep</tt>]
    D[<tt>ReaderStep</tt>]
    N[<tt>ScannerStep</tt>]
    R --> S --> D --> N
```
End states:
- Header detected / missing / malformed
- File unsupported, unreadable, binary, or blocked by policy
This pipeline is the foundation for all other pipelines except PROBE, which runs only ProberStep.
CHECK_RENDER
Purpose: Generate the expected header without comparison
Mutation: ❌ none
Steps:
```mermaid
flowchart TD
    SP(<b>SCAN</b>)
    B[<tt>BuilderStep</tt>]
    T[<tt>RendererStep</tt>]
    SP --> B --> T
```
End states:
- Rendered header available in context
- No determination yet whether changes are needed
Useful for debugging header generation.
CHECK (Summary)
Purpose: Determine whether a file would change
Mutation: ❌ none (dry-run safe)
Steps:
```mermaid
flowchart TD
    CR(<b>CHECK_RENDER</b>)
    C[<tt>ComparerStep</tt>]
    CR --> C
```
End states:
- UNCHANGED - rendered header matches existing header
- CHANGED - header would be updated or inserted
- SKIPPED / UNSUPPORTED - policy or file constraints
This is the default pipeline behind topmark check.
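At its core, the comparison reduces to checking the existing header against the rendered one. The compare function below is a hypothetical sketch that returns comparison-axis values; it assumes headers are already extracted as strings and ignores any normalization the real ComparerStep may apply.

```python
from typing import Optional


def compare(existing: Optional[str], rendered: Optional[str]) -> str:
    """Return a comparison-axis value for an existing and a rendered header."""
    if rendered is None:
        return "SKIPPED"  # nothing to compare against (e.g. an earlier step halted)
    if existing == rendered:
        return "UNCHANGED"
    return "CHANGED"  # header missing (None) or different: insert or update
```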
CHECK_PATCH
Purpose: Produce a unified diff without writing
Mutation: ❌ none (dry-run safe)
Steps:
```mermaid
flowchart TD
    CP(<b>CHECK</b>)
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    CP --> P --> H
```
End states:
- Patch generated
- No patch if unchanged or skipped
Used when --patch is requested without --apply.
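A unified diff like the one PatcherStep produces can be generated with Python's standard difflib module. This sketch only illustrates the output format; the make_patch helper and the a/ and b/ prefixes are assumptions, and the header line is a placeholder rather than TopMark's real header syntax.

```python
import difflib


def make_patch(path: str, original: str, updated: str) -> str:
    """Return a unified diff between two texts, or "" when they are identical."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = "print('hi')\n"
after = "# header placeholder\n\nprint('hi')\n"
patch = make_patch("demo.py", before, after)
```

An empty string for identical inputs matches the documented "no patch if unchanged" end state.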
CHECK_APPLY
Purpose: Update or insert headers in place
Mutation: ✅ writes enabled
Steps:
```mermaid
flowchart TD
    CP(<b>CHECK</b>)
    P[<tt>PlannerStep</tt>]
    W[<tt>WriterStep</tt>]
    CP --> P --> W
```
End states:
- File written
- Write skipped if unchanged or blocked
- Failure if filesystem or policy prevents writing
Requires --apply.
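PlannerStep decides between insert, replace, and remove before anything is written. A hypothetical reduction of that decision to three inputs might look like this; the parameter names and the "none" result are assumptions, not the real plan-axis values.

```python
def plan_action(*, stripping: bool, header_present: bool, changed: bool) -> str:
    """Choose a mutation for one file (hypothetical simplification)."""
    if not changed:
        return "none"  # comparison found nothing to do: the write is skipped
    if stripping:
        return "remove" if header_present else "none"
    return "replace" if header_present else "insert"
```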
CHECK_APPLY_PATCH
Purpose: Apply changes and emit a patch
Mutation: ✅ writes enabled
Steps:
```mermaid
flowchart TD
    CP(<b>CHECK</b>)
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    W[<tt>WriterStep</tt>]
    CP --> P --> H --> W
```
Primarily useful for CI or audit workflows.
STRIP (Summary)
Purpose: Remove an existing TopMark header
Mutation: ❌ none (dry-run safe)
Steps:
```mermaid
flowchart TD
    SP(<b>SCAN</b>)
    X[<tt>StripperStep</tt>]
    SP --> X
```
End states:
- Header removed in rendered output
- No-op if header absent
- Skipped if unsupported or blocked
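Given header bounds located by ScannerStep, removal is a slice operation on the file's lines. In this hypothetical sketch, bounds are a half-open (start, end) line range, None means no header was found, and dropping one trailing blank separator line is an assumed policy, not documented TopMark behavior.

```python
from __future__ import annotations

from typing import Optional


def strip_header(lines: list[str], bounds: Optional[tuple[int, int]]) -> list[str]:
    """Return a new list without the header; a no-op copy when no header exists."""
    if bounds is None:
        return list(lines)
    start, end = bounds  # half-open range: lines[start:end] is the header
    rest = lines[end:]
    # Assumed policy: also drop a single blank line that separated the header.
    if rest and rest[0].strip() == "":
        rest = rest[1:]
    return lines[:start] + rest


source = ["#!/usr/bin/env python\n", "# header placeholder\n", "\n", "print('hi')\n"]
```

The input list is never modified, which keeps the summary variant dry-run safe.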
STRIP_PATCH
Purpose: Show diff for header removal
Mutation: ❌ none
Steps:
```mermaid
flowchart TD
    XP(<b>STRIP</b>)
    C[<tt>ComparerStep</tt>]
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    XP --> C --> P --> H
```
STRIP_APPLY
Purpose: Remove headers in place
Mutation: ✅ writes enabled
Steps:
```mermaid
flowchart TD
    XP(<b>STRIP</b>)
    P[<tt>PlannerStep</tt>]
    W[<tt>WriterStep</tt>]
    XP --> P --> W
```
STRIP_APPLY_PATCH
Purpose: Remove headers and emit patch
Mutation: ✅ writes enabled
Steps:
```mermaid
flowchart TD
    XP(<b>STRIP</b>)
    C[<tt>ComparerStep</tt>]
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    W[<tt>WriterStep</tt>]
    XP --> C --> P --> H --> W
```
Step Responsibilities
Each step implements the Step protocol and:
- Declares which status axes it may write
- May halt execution via ctx.flow.halt
- Emits structured hints for diagnostics

| Step | Responsibility |
|---|---|
| ProberStep | Run the resolution probe and expose scored candidates, selection, and processor binding |
| ResolverStep | Determine file type and header processor (see Resolution) |
| SnifferStep | Fast policy and newline checks |
| ReaderStep | Read file content safely |
| ScannerStep | Locate existing header bounds |
| BuilderStep | Build expected header field values |
| RendererStep | Render header text |
| ComparerStep | Compare existing vs rendered header |
| StripperStep | Remove header content |
| PlannerStep | Decide insert / replace / remove plan |
| PatcherStep | Generate unified diff |
| WriterStep | Persist changes |
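The "declares which status axes it may write" contract can be enforced mechanically. The sketch below is hypothetical: Ctx, Flow, the axes class attribute, and run_step are illustrative names, and ctx.flow.halt is modeled as a boolean flag, which may not match the real attribute.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import ClassVar, Protocol


@dataclass
class Flow:
    halt: bool = False  # modeled as a flag; the real ctx.flow.halt may differ


@dataclass
class Ctx:
    status: dict[str, str] = field(default_factory=dict)
    hints: list[str] = field(default_factory=list)
    flow: Flow = field(default_factory=Flow)


class Step(Protocol):
    name: ClassVar[str]
    axes: ClassVar[frozenset[str]]  # status axes this step may write

    def run(self, ctx: Ctx) -> None: ...


def run_step(step: Step, ctx: Ctx) -> None:
    """Run one step and reject writes to axes it did not declare."""
    before = dict(ctx.status)
    step.run(ctx)
    touched = {axis for axis, value in ctx.status.items() if before.get(axis) != value}
    undeclared = touched - step.axes
    if undeclared:
        raise RuntimeError(f"{step.name} wrote undeclared axes: {sorted(undeclared)}")


class MixedEndingsSniffer:
    name = "sniffer"
    axes = frozenset({"fs"})

    def run(self, ctx: Ctx) -> None:
        ctx.status["fs"] = "MIXED_LINE_ENDINGS"
        ctx.hints.append("mixed line endings: skipped")
        ctx.flow.halt = True


class RogueStep:
    name = "rogue"
    axes = frozenset({"fs"})

    def run(self, ctx: Ctx) -> None:
        ctx.status["write"] = "WRITTEN"  # not declared, so run_step rejects it
```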
Conditional and Policy-Driven End States
Configuration validation happens before any pipeline step runs. Under effective strict config checking, configuration warnings are treated as validation failures and may prevent pipeline execution from starting.
Once running, a pipeline may terminate early for a given file due to policy or safety constraints:
- Binary files
- Mixed line endings
- BOM before shebang
- Missing read/write permissions
- Unsupported file types
In these cases:
- The pipeline halts cleanly
- No mutation occurs
- A terminal hint explains why the file was skipped or blocked
This guarantees:
- Safe dry-runs
- No partial writes
- Idempotent behavior across repeated runs
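Several of these early-termination conditions can be detected from raw bytes before decoding. The sketch below rests on stated assumptions: treating any NUL byte as binary is a common heuristic, not necessarily TopMark's rule, and sniff_bytes is a hypothetical helper that returns the FS-axis names used on this page.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def sniff_bytes(data: bytes) -> Optional[str]:
    """Return a blocking FS-axis status, or None if the file may proceed."""
    if b"\x00" in data:
        return "BINARY"  # heuristic: text files rarely contain NUL bytes
    if data.startswith(UTF8_BOM + b"#!"):
        return "BOM_BEFORE_SHEBANG"
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    cr_only = data.count(b"\r") - crlf
    styles_used = sum(1 for count in (crlf, lf_only, cr_only) if count)
    if styles_used > 1:
        return "MIXED_LINE_ENDINGS"
    return None
```

In this sketch the check runs before any content is decoded or modified, so a blocked file is never partially processed.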
Key Design Guarantees
- Immutability: Pipelines are Final[tuple[Step, ...]]
- Determinism: Same input → same outcome
- Dry-run safety: No writes without --apply
- Separation of concerns: Steps mutate context, views classify outcomes
- Runtime/configuration separation: pipeline execution consumes resolved runtime configuration and runtime options rather than re-running TOML discovery during step execution
See also
- Architecture - TOML → FrozenConfig → runtime overview
- Pipelines (Reference) - generated API-backed reference entry points
- Terminology and Canonical Vocabulary - canonical definitions for pipeline, status, hint, runtime, and machine-readable terminology
- Machine-readable output - how pipeline results are exposed in JSON and NDJSON outputs
- Configuration discovery - source-local TOML options and precedence
This pipeline model is the backbone of TopMark's reliability and extensibility. New behavior is introduced by adding steps or composing new pipelines, not by special-casing control flow.
Per-axis lifecycle
TopMark tracks progress using a set of status axes. Each axis starts in PENDING and
transitions as steps complete or halt early.
These diagrams are intentionally coarse: they show possible terminal states, not every code path.
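The coarse shape of these diagrams, a single move out of PENDING into a terminal state, can be captured with an enum and a guarded setter. This is a hypothetical sketch using the write axis's states; as noted above, the real code paths may be richer than one transition.

```python
from enum import Enum


class WriteStatus(Enum):
    PENDING = "pending"
    WRITTEN = "written"
    SKIPPED = "skipped"
    FAILED = "failed"


class WriteAxis:
    """Hypothetical axis holder allowing one transition out of PENDING."""

    def __init__(self) -> None:
        self.value = WriteStatus.PENDING

    def set(self, new: WriteStatus) -> None:
        if new is WriteStatus.PENDING:
            raise ValueError("an axis cannot move back to PENDING")
        if self.value is not WriteStatus.PENDING:
            raise ValueError(f"axis already terminal: {self.value.name}")
        self.value = new


axis = WriteAxis()
axis.set(WriteStatus.SKIPPED)
```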
Resolve axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> RESOLVED
    PENDING --> TYPE_RESOLVED_HEADERS_UNSUPPORTED
    PENDING --> TYPE_RESOLVED_NO_PROCESSOR_REGISTERED
    PENDING --> UNSUPPORTED
```
FS axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> OK
    PENDING --> EMPTY
    PENDING --> NOT_FOUND
    PENDING --> NO_READ_PERMISSION
    PENDING --> UNREADABLE
    PENDING --> NO_WRITE_PERMISSION
    PENDING --> BINARY
    PENDING --> BOM_BEFORE_SHEBANG
    PENDING --> UNICODE_DECODE_ERROR
    PENDING --> MIXED_LINE_ENDINGS
```
Content axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> OK
    PENDING --> UNSUPPORTED
    PENDING --> SKIPPED_MIXED_LINE_ENDINGS
    PENDING --> SKIPPED_POLICY_BOM_BEFORE_SHEBANG
    PENDING --> UNREADABLE
```
Header axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> MISSING
    PENDING --> DETECTED
    PENDING --> MALFORMED
    PENDING --> MALFORMED_ALL_FIELDS
    PENDING --> MALFORMED_SOME_FIELDS
    PENDING --> EMPTY
```
Generation axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> GENERATED
    PENDING --> NO_FIELDS
```
Render axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> RENDERED
```
Comparison axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> CHANGED
    PENDING --> UNCHANGED
    PENDING --> SKIPPED
```
Strip axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> NOT_NEEDED
    PENDING --> READY
    PENDING --> FAILED
```
Plan axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> PREVIEWED
    PENDING --> REPLACED
    PENDING --> INSERTED
    PENDING --> REMOVED
    PENDING --> SKIPPED
    PENDING --> FAILED
```
Patch axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> GENERATED
    PENDING --> SKIPPED
    PENDING --> FAILED
```
Write axis
```mermaid
stateDiagram-v2
    direction LR
    [*] --> PENDING
    PENDING --> WRITTEN
    PENDING --> SKIPPED
    PENDING --> FAILED
```
CLI-focused flowcharts
These diagrams describe the user-visible execution paths behind
topmark check and topmark strip,
including the --patch and --apply switches.
topmark check
```mermaid
flowchart TD
    A[User runs: topmark check]
    B[SCAN: resolve + sniff + read + scan]
    C[CHECK_RENDER: build + render]
    D[COMPARE]
    E[Report: unchanged]
    F[Plan insert/replace]
    G[Report: would change]
    H[Generate patch]
    I[Write file]
    J[Report: patch shown]
    K[Report: written]
    L[Blocked by policy/fs/content]
    M[Report: skipped/unsupported/error]
    A --> B
    B --> C
    C --> D
    D -->|unchanged| E
    D -->|would change| F
    F -->|no --patch, no --apply| G
    F -->|--patch| H
    F -->|--apply| I
    H --> J
    I --> K
    B --> L --> M
```
topmark strip
```mermaid
flowchart TD
    A[User runs: topmark strip]
    B[SCAN: resolve + sniff + read + scan]
    C[STRIP: compute removal]
    D[COMPARE]
    E[Report: no-op]
    F[Plan removal]
    G[Report: would remove]
    H[Generate patch]
    I[Write file]
    J[Report: patch shown]
    K[Report: written]
    L[Blocked by policy/fs/content]
    M[Report: skipped/unsupported/error]
    A --> B
    B --> C
    C --> D
    D -->|nothing to remove| E
    D -->|would remove| F
    F -->|no --patch, no --apply| G
    F -->|--patch| H
    F -->|--apply| I
    H --> J
    I --> K
    B --> L --> M
```
Filtered or missing explicit inputs are not produced by
ProberStep itself. They are represented by synthetic
contexts created by probe orchestration before final presentation, API, and machine-readable output
packaging.