Pipelines (Concepts)

TopMark processes files through explicit, immutable pipelines composed of small, single-responsibility steps. Each pipeline represents a supported execution intent (scan, check, strip, apply, patch) and defines exactly which steps run and in which order.

A dedicated probe pipeline exists for resolution diagnostics (topmark probe). Probe orchestration also reports explicit inputs that were filtered out before file-type probing; these are represented by synthetic probe contexts.

Pipelines do not make high-level decisions themselves. Instead:

  • Each step mutates a strictly defined set of status axes
  • Steps may halt execution when required by policy or safety rules
  • Final outcomes (changed, unchanged, skipped, unsupported, error, ...) are derived centrally by the CLI and views from accumulated statuses and hints

This design guarantees predictability, debuggability, and idempotence.
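As an illustration of the split between steps and outcome classification, here is a minimal sketch. The Context class, its field names, and the classify function are hypothetical and are not TopMark's API; only the status and outcome names are taken from this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResolveStatus(Enum):
    PENDING = "pending"
    RESOLVED = "resolved"
    UNSUPPORTED = "unsupported"


class ComparisonStatus(Enum):
    PENDING = "pending"
    CHANGED = "changed"
    UNCHANGED = "unchanged"
    SKIPPED = "skipped"


@dataclass
class Context:
    """Hypothetical per-file state; steps write axes, nothing else."""

    resolve: ResolveStatus = ResolveStatus.PENDING
    comparison: ComparisonStatus = ComparisonStatus.PENDING
    hints: list[str] = field(default_factory=list)


def classify(ctx: Context) -> str:
    """Derive the final outcome in one place, after all steps have run."""
    if ctx.resolve is ResolveStatus.UNSUPPORTED:
        return "unsupported"
    if ctx.comparison is ComparisonStatus.CHANGED:
        return "changed"
    if ctx.comparison is ComparisonStatus.UNCHANGED:
        return "unchanged"
    # Anything that never reached a comparison result is reported as skipped.
    return "skipped"
```

Because steps only record statuses, the same context can be classified identically by a CLI view, an API layer, or a machine-readable emitter.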

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

Pipeline execution consumes an immutable FrozenConfig plus runtime options assembled from the TOML → FrozenConfig → runtime flow documented in Architecture and Configuration discovery.

Source-local TOML options such as [config].root and strict are resolved before pipeline execution. They influence configuration discovery and staged config-loading validation behavior, but do not become layered configuration fields.

Note

[config].strict is a TOML-source-local strictness preference controlling staged configuration-loading validation for the current TOML source.

Effective strictness is evaluated across:

  • TOML-source diagnostics;
  • merged-config diagnostics;
  • runtime applicability diagnostics.

strict is resolved during TOML loading and does not become a layered configuration field.

Concepts vs Reference

This page explains how the pipelines work and how the CLI composes them. The canonical, API-backed definitions of pipelines, steps, and enums live in the generated API documentation.

Step names and enum names on this page are written as MkDocStrings/AutoRefs links, for example topmark.pipeline.steps.resolver.ResolverStep. MkDocs resolves these references through the generated API documentation.


Pipeline Overview

All pipelines are built from the same core phases:

  1. Discovery - identify file type and viability
  2. Inspection - read content and detect existing headers
  3. Evaluation - generate and compare expected headers
  4. Mutation (optional) - plan, patch, and/or write changes

The probe pipeline is an exception: it runs only ProberStep and stops immediately after producing probe results.

Unified Pipeline Flow

flowchart TD

  subgraph Probing
    O[<tt>ProberStep</tt>]
  end

  subgraph Discovery
    R[<tt>ResolverStep</tt>]
    S[<tt>SnifferStep</tt>]
    D[<tt>ReaderStep</tt>]
    N[<tt>ScannerStep</tt>]

    R --> S --> D --> N
  end

  subgraph Check
    B[<tt>BuilderStep</tt>]
    T[<tt>RendererStep</tt>]

    N --> B --> T
  end

  subgraph Strip
    X[<tt>StripperStep</tt>]

    N --> X
  end

  subgraph Comparison
    C[<tt>ComparerStep</tt>]
  end

  subgraph Mutation
    P[<tt>PlannerStep</tt>]
    H[<tt>PatcherStep</tt>]
    W[<tt>WriterStep</tt>]

    C --> P
    P -->|patch| H
    P -->|apply| W
  end

  T --> C
  X ---> C

Not all pipelines traverse all phases. Each variant selects a strict subset of steps.
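The way variants build on one another can be sketched with plain tuples. This is only a model of the composition described on this page: strings stand in for the real step objects, and the constant names mirror the pipeline names used here rather than the actual definitions in the source tree.

```python
from typing import Final

# Step names are taken from the diagrams; strings replace real step objects.
SCAN: Final[tuple[str, ...]] = (
    "ResolverStep",
    "SnifferStep",
    "ReaderStep",
    "ScannerStep",
)
CHECK_RENDER: Final[tuple[str, ...]] = SCAN + ("BuilderStep", "RendererStep")
CHECK: Final[tuple[str, ...]] = CHECK_RENDER + ("ComparerStep",)
CHECK_PATCH: Final[tuple[str, ...]] = CHECK + ("PlannerStep", "PatcherStep")
CHECK_APPLY: Final[tuple[str, ...]] = CHECK + ("PlannerStep", "WriterStep")
```

Tuples cannot be modified after creation, so extending a pipeline always produces a new variant and leaves the base pipeline untouched.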


Pipeline guarantees

TopMark pipelines are:

  • deterministic
  • step-ordered
  • side-effect constrained
  • idempotent
  • presentation-independent

Pipeline steps mutate processing context state. CLI views, API DTOs, and machine-readable output classify final outcomes from accumulated statuses and hints.


Available Pipelines

Pipelines are defined in src/topmark/pipeline/pipelines.py and exposed via topmark.pipeline.pipelines.Pipeline.

The CLI selects among these immutable pipeline variants based on command intent and flags such as --patch and --apply.
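The flag-to-variant mapping can be sketched as a small pure function. The function name and signature are hypothetical; the returned names follow the pipeline names documented below, and the real CLI may select variants differently.

```python
def select_pipeline(command: str, *, patch: bool = False, apply: bool = False) -> str:
    """Map a command and its flags to a pipeline name used on this page."""
    if command not in ("check", "strip"):
        raise ValueError(f"unsupported command: {command!r}")
    name = command.upper()
    if apply:
        name += "_APPLY"  # writes are only enabled with --apply
    if patch:
        name += "_PATCH"  # a diff is emitted with --patch
    return name
```

For example, `select_pipeline("check", patch=True)` yields the dry-run variant `CHECK_PATCH`, while adding `apply=True` yields `CHECK_APPLY_PATCH`.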

PROBE

Purpose: Explain file type and processor resolution

Mutation: ❌ none

Steps:

flowchart TD

O[<tt>ProberStep</tt>]

End states:

  • Resolution status (resolved, unsupported, no_processor, filtered)

  • Selected file type and processor (if any)

  • Full candidate set with match signals

  • Explicit inputs filtered before file-type probing are represented by synthetic probe results with status="filtered" and reasons such as excluded_by_path_filter, excluded_by_file_type_filter, or excluded_by_discovery_filter.

This pipeline powers topmark probe and topmark.api.probe() and is intentionally resolution-only.

It halts immediately after probing and does not perform inspection, comparison, or mutation. Discovery-level filtering is reported by orchestration via synthetic probe results for explicitly requested paths that did not reach probing.
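The relationship between real and synthetic probe results might look roughly like the sketch below. ProbeResult, its fields, and probe_inputs are illustrative names, not the actual API; only the status value "filtered" and the example reason strings come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical result record; field names are illustrative."""

    path: str
    status: str  # e.g. "resolved", "unsupported", "no_processor", "filtered"
    reason: Optional[str] = None


def probe_inputs(
    paths: list[str],
    exclusion_reason: Callable[[str], Optional[str]],
    probe: Callable[[str], ProbeResult],
) -> list[ProbeResult]:
    """Return one result per explicit input, probed or not."""
    results: list[ProbeResult] = []
    for path in paths:
        reason = exclusion_reason(path)
        if reason is not None:
            # The input never reaches probing; orchestration reports it anyway.
            results.append(ProbeResult(path, "filtered", reason))
        else:
            results.append(probe(path))
    return results
```

The point of the synthetic record is that every explicitly requested path appears in the output, so a filtered input is not silently dropped.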

SCAN

Purpose: Detect file type and existing TopMark headers

Mutation: ❌ none

Steps:

flowchart TD

R[<tt>ResolverStep</tt>]
S[<tt>SnifferStep</tt>]
D[<tt>ReaderStep</tt>]
N[<tt>ScannerStep</tt>]

R --> S --> D --> N

End states:

  • Header detected / missing / malformed
  • File unsupported, unreadable, binary, or blocked by policy

This pipeline is the foundation for all check and strip pipelines.


CHECK_RENDER

Purpose: Generate the expected header without comparison

Mutation: ❌ none

Steps:

flowchart TD

SP(<b>SCAN</b>)
B[<tt>BuilderStep</tt>]
T[<tt>RendererStep</tt>]

SP --> B --> T

End states:

  • Rendered header available in context
  • No determination yet whether changes are needed

Useful for debugging header generation.


CHECK (Summary)

Purpose: Determine whether a file would change

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CR(<b>CHECK_RENDER</b>)
C[<tt>ComparerStep</tt>]

CR --> C

End states:

  • UNCHANGED - rendered header matches existing header
  • CHANGED - header would be updated or inserted
  • SKIPPED / UNSUPPORTED - policy or file constraints

This is the default pipeline behind topmark check.


CHECK_PATCH

Purpose: Produce a unified diff without writing

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

CP --> P --> H

End states:

  • Patch generated
  • No patch if unchanged or skipped

Used when --patch is requested without --apply.
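Generating a unified diff without touching the file can be sketched with Python's standard difflib module. This shows the general technique only; the make_patch helper and the a/ and b/ path prefixes are assumptions, and PatcherStep may produce its output differently.

```python
import difflib


def make_patch(path: str, original: str, updated: str) -> str:
    """Return a unified diff, or an empty string when nothing changed."""
    if original == updated:
        return ""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

The function only reads its arguments and returns text, which is what makes a patch-only run safe as a dry run.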


CHECK_APPLY

Purpose: Update or insert headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> W

End states:

  • File written
  • Write skipped if unchanged or blocked
  • Failure if filesystem or policy prevents writing

Requires --apply.


CHECK_APPLY_PATCH

Purpose: Apply changes and emit a patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

CP(<b>CHECK</b>)
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

CP --> P --> H --> W

Primarily useful for CI or audit workflows.


STRIP (Summary)

Purpose: Remove an existing TopMark header

Mutation: ❌ none (dry-run safe)

Steps:

flowchart TD

SP(<b>SCAN</b>)
X[<tt>StripperStep</tt>]

SP --> X

End states:

  • Header removed in rendered output
  • No-op if header absent
  • Skipped if unsupported or blocked

STRIP_PATCH

Purpose: Show diff for header removal

Mutation: ❌ none

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]

XP --> C --> P --> H

STRIP_APPLY

Purpose: Remove headers in place

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
P[<tt>PlannerStep</tt>]
W[<tt>WriterStep</tt>]

XP --> P --> W

STRIP_APPLY_PATCH

Purpose: Remove headers and emit patch

Mutation: ✅ writes enabled

Steps:

flowchart TD

XP(<b>STRIP</b>)
C[<tt>ComparerStep</tt>]
P[<tt>PlannerStep</tt>]
H[<tt>PatcherStep</tt>]
W[<tt>WriterStep</tt>]

XP --> C --> P --> H --> W

Step Responsibilities

Each step implements the Step protocol and:

  • Declares which status axes it may write
  • May halt execution via ctx.flow.halt
  • Emits structured hints for diagnostics

Step          Responsibility
------------  --------------
ProberStep    Run resolution probe and expose scored candidates, selection, and processor binding
ResolverStep  Determine file type and header processor (see Resolution)
SnifferStep   Fast policy and newline checks
ReaderStep    Read file content safely
ScannerStep   Locate existing header bounds
BuilderStep   Build expected header field values
RendererStep  Render header text
ComparerStep  Compare existing vs rendered header
StripperStep  Remove header content
PlannerStep   Decide insert / replace / remove plan
PatcherStep   Generate unified diff
WriterStep    Persist changes
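A step contract along these lines could be modeled as follows. This is a sketch under assumptions: the Flow and Context classes, the writes attribute, and both example steps are hypothetical, and ctx.flow.halt is modeled here as a method even though the real attribute may have a different shape.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Flow:
    halted: bool = False
    reason: str = ""

    def halt(self, reason: str) -> None:
        self.halted = True
        self.reason = reason


@dataclass
class Context:
    path: str
    flow: Flow = field(default_factory=Flow)
    statuses: dict[str, str] = field(default_factory=dict)
    hints: list[str] = field(default_factory=list)


class Step(Protocol):
    name: str
    writes: frozenset[str]  # status axes this step is allowed to write

    def __call__(self, ctx: Context) -> None: ...


class BinaryGuard:
    """Toy step: blocks files with a .bin suffix."""

    name = "BinaryGuard"
    writes = frozenset({"fs"})

    def __call__(self, ctx: Context) -> None:
        if ctx.path.endswith(".bin"):
            ctx.statuses["fs"] = "BINARY"
            ctx.hints.append("binary file, skipped")
            ctx.flow.halt("binary")
        else:
            ctx.statuses["fs"] = "OK"


class MarkHeaderMissing:
    """Toy step: always records a missing header."""

    name = "MarkHeaderMissing"
    writes = frozenset({"header"})

    def __call__(self, ctx: Context) -> None:
        ctx.statuses["header"] = "MISSING"


def run(pipeline: tuple[Step, ...], ctx: Context) -> Context:
    """Run steps in order and stop as soon as one of them halts."""
    for step in pipeline:
        if ctx.flow.halted:
            break
        step(ctx)
    return ctx
```

Running `run((BinaryGuard(), MarkHeaderMissing()), Context("a.bin"))` stops after the first step, so the header axis is never written.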

Conditional and Policy-Driven End States

Configuration validation happens before any pipeline step runs. Under effective strict config checking, configuration warnings are treated as validation failures and may prevent pipeline execution from starting.

Once execution has started, a pipeline may still terminate early due to policy or safety constraints:

  • Binary files
  • Mixed line endings
  • BOM before shebang
  • Missing read/write permissions
  • Unsupported file types

In these cases:

  • The pipeline halts cleanly
  • No mutation occurs
  • A terminal hint explains why the file was skipped or blocked

This guarantees:

  • Safe dry-runs
  • No partial writes
  • Idempotent behavior across repeated runs
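Three of the content-level checks listed above can be sketched over raw bytes. The heuristics here (a NUL byte signals binary content, only the UTF-8 BOM is considered) and the returned reason strings are assumptions made for illustration; SnifferStep's actual rules may differ.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def sniff(data: bytes) -> Optional[str]:
    """Return a blocking reason for the given bytes, or None if none applies."""
    if b"\x00" in data:
        return "binary"  # assumption: a NUL byte marks binary content
    if data.startswith(UTF8_BOM + b"#!"):
        return "bom_before_shebang"
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF line endings
    cr = data.count(b"\r") - crlf  # bare CR line endings
    if sum(1 for n in (crlf, lf, cr) if n > 0) > 1:
        return "mixed_line_endings"
    return None
```

A check of this kind only inspects bytes and returns a verdict, so a blocked file is never modified.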

Key Design Guarantees

  • Immutability: Pipelines are Final[tuple[Step, ...]]
  • Determinism: Same input → same outcome
  • Dry-run safety: No writes without --apply
  • Separation of concerns: Steps mutate context, views classify outcomes
  • Runtime/configuration separation: pipeline execution consumes resolved runtime configuration and runtime options rather than re-running TOML discovery during step execution

See also

This pipeline model is the backbone of TopMark's reliability and extensibility. New behavior is introduced by adding steps or composing new pipelines, not by special-casing control flow.


Per-axis lifecycle

TopMark tracks progress using a set of status axes. Each axis starts in PENDING and transitions as steps complete or halt early.

These diagrams are intentionally coarse: they show possible terminal states, not every code path.
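One way to read these diagrams is that an axis leaves PENDING at most once. The sketch below encodes that reading for the Write axis; the settle-once rule and the advance helper are assumptions drawn from the coarse diagrams, not a documented invariant.

```python
from enum import Enum


class WriteStatus(Enum):
    PENDING = "pending"
    WRITTEN = "written"
    SKIPPED = "skipped"
    FAILED = "failed"


def advance(current: WriteStatus, new: WriteStatus) -> WriteStatus:
    """Allow only PENDING -> terminal transitions (assumed rule)."""
    if current is not WriteStatus.PENDING:
        raise ValueError(f"axis already settled as {current.name}")
    if new is WriteStatus.PENDING:
        raise ValueError("cannot transition to PENDING")
    return new
```

Under this reading, a later step cannot overwrite a status that an earlier step has already settled.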

Resolve axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RESOLVED
  PENDING --> TYPE_RESOLVED_HEADERS_UNSUPPORTED
  PENDING --> TYPE_RESOLVED_NO_PROCESSOR_REGISTERED
  PENDING --> UNSUPPORTED

FS axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> EMPTY
  PENDING --> NOT_FOUND
  PENDING --> NO_READ_PERMISSION
  PENDING --> UNREADABLE
  PENDING --> NO_WRITE_PERMISSION
  PENDING --> BINARY
  PENDING --> BOM_BEFORE_SHEBANG
  PENDING --> UNICODE_DECODE_ERROR
  PENDING --> MIXED_LINE_ENDINGS

Content axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> OK
  PENDING --> UNSUPPORTED
  PENDING --> SKIPPED_MIXED_LINE_ENDINGS
  PENDING --> SKIPPED_POLICY_BOM_BEFORE_SHEBANG
  PENDING --> UNREADABLE

Header axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> MISSING
  PENDING --> DETECTED
  PENDING --> MALFORMED
  PENDING --> MALFORMED_ALL_FIELDS
  PENDING --> MALFORMED_SOME_FIELDS
  PENDING --> EMPTY

Generation axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> NO_FIELDS

Render axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> RENDERED

Comparison axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> CHANGED
  PENDING --> UNCHANGED
  PENDING --> SKIPPED

Strip axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> NOT_NEEDED
  PENDING --> READY
  PENDING --> FAILED

Plan axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> PREVIEWED
  PENDING --> REPLACED
  PENDING --> INSERTED
  PENDING --> REMOVED
  PENDING --> SKIPPED
  PENDING --> FAILED

Patch axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> GENERATED
  PENDING --> SKIPPED
  PENDING --> FAILED

Write axis

stateDiagram-v2
  direction LR
  [*] --> PENDING
  PENDING --> WRITTEN
  PENDING --> SKIPPED
  PENDING --> FAILED

CLI-focused flowcharts

These diagrams describe the user-visible execution paths behind topmark check and topmark strip, including the --patch and --apply switches.

topmark check

flowchart TD
  A[User runs: topmark check]
  B[SCAN: resolve + sniff + read + scan]
  C[CHECK_RENDER: build + render]
  D[COMPARE]
  E[Report: unchanged]
  F[Plan insert/replace]
  G[Report: would change]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|unchanged| E
  D -->|would change| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

topmark strip

flowchart TD
  A[User runs: topmark strip]
  B[SCAN: resolve + sniff + read + scan]
  C[STRIP: compute removal]
  D[COMPARE]
  E[Report: no-op]
  F[Plan removal]
  G[Report: would remove]
  H[Generate patch]
  I[Write file]
  J[Report: patch shown]
  K[Report: written]
  L[Blocked by policy/fs/content]
  M[Report: skipped/unsupported/error]

  A --> B
  B --> C
  C --> D
  D -->|nothing to remove| E
  D -->|would remove| F
  F -->|no --patch, no --apply| G
  F -->|--patch| H
  F -->|--apply| I
  H --> J
  I --> K
  B --> L --> M

Filtered or missing explicit inputs are not produced by ProberStep itself. They are represented by synthetic contexts created by probe orchestration before final presentation, API, and machine-readable output packaging.