Common filtering recipes

Filtering controls determine stable runtime behavior such as:

  • which paths participate in discovery
  • which file types are eligible for processing
  • how explicit inputs participate in semantic runtime outcomes
  • how probe diagnostics are reported

TopMark determines which files to process using a combination of path-based filters and file type filters.

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

Filtering overview

Filtering and discovery semantics are shared consistently across the file-processing commands (check, strip) and topmark probe.

TopMark applies filtering in a deterministic order:

  1. Path-based discovery and filtering
  2. File-type filtering
  3. Runtime applicability evaluation
  4. Runtime processor resolution

Exclude rules take precedence over include rules.

For canonical file-type identifier semantics, see File-type filtering. For layered configuration behavior, see Configuration.

Note

For topmark probe, paths excluded during step 1 or 2 may still be reported as filtered semantic outcomes when they were explicitly requested inputs.


Runtime filtering boundaries

TopMark intentionally separates:

  1. path discovery
  2. path filtering
  3. file-type filtering
  4. runtime applicability evaluation
  5. runtime probing and processor resolution

Each stage consumes the finalized results of the previous stage.

This layered filtering model keeps runtime behavior deterministic while preserving stable probe diagnostics and machine-readable filtering semantics.
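
The staged model above can be sketched as a chain of filters in which each stage only sees the survivors of the previous one and every dropped path remembers which stage removed it. This is a minimal illustration, not TopMark's implementation: `run_stages`, the stage names, and the two predicates are hypothetical, and `fnmatchcase` only approximates real glob semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def run_stages(paths, stages):
    """Run paths through ordered (name, predicate) stages.

    Returns (survivors, dropped), where dropped maps each removed path
    to the name of the stage that removed it.
    """
    survivors = list(paths)
    dropped = {}
    for name, keep in stages:
        next_survivors = []
        for path in survivors:  # a stage only sees the previous stage's output
            if keep(path):
                next_survivors.append(path)
            else:
                dropped[path] = name
        survivors = next_survivors
    return survivors, dropped


# Hypothetical stage predicates, for illustration only.
stages = [
    ("path_filter", lambda p: not fnmatchcase(p, "build/**")),
    ("file_type_filter", lambda p: PurePosixPath(p).suffix in {".py", ".md"}),
]

survivors, dropped = run_stages(
    ["src/a.py", "build/b.py", "docs/c.md", "data/d.bin"], stages
)
print(survivors)
print(dropped)
```

Here `build/b.py` is dropped by the path stage and never reaches the file-type stage, while `data/d.bin` is dropped later. Recording the responsible stage is what makes a per-input filtering reason possible.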


Missing vs unmatched inputs

TopMark distinguishes between explicit literal paths and glob patterns:

  • Explicit missing literal paths (e.g., fubar.py) are treated as hard input errors and result in FILE_NOT_FOUND (66).
  • Unmatched glob patterns (e.g., missing/**/*.py) are treated as soft runtime-discovery diagnostics and do not cause processing commands (check, strip) to fail; they exit with SUCCESS (0).

This distinction ensures that typos in explicit inputs are surfaced, while flexible patterns that match nothing do not cause runtime processing-command failures.
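
As a rough model of this distinction for processing commands, the sketch below classifies each input as a literal path or a glob pattern and derives an exit code. The helper names and the metacharacter heuristic in `is_glob` are assumptions made for illustration; only the exit-code values come from the text above.

```python
import glob
import os
import tempfile

SUCCESS = 0
FILE_NOT_FOUND = 66  # values as documented above

GLOB_CHARS = set("*?[")


def is_glob(arg: str) -> bool:
    """Assumed heuristic: an argument with glob metacharacters is a pattern."""
    return any(ch in GLOB_CHARS for ch in arg)


def exit_code_for_inputs(args, root):
    """Model the exit code of a processing command for the given inputs."""
    code = SUCCESS
    for arg in args:
        if is_glob(arg):
            # Escape the root so only the user's pattern is interpreted as a glob.
            pattern = os.path.join(glob.escape(root), arg)
            if not glob.glob(pattern, recursive=True):
                print(f"note: pattern matched nothing: {arg}")  # soft diagnostic
        elif not os.path.exists(os.path.join(root, arg)):
            print(f"error: no such file: {arg}")  # hard input error
            code = FILE_NOT_FOUND
    return code
```

With a directory that contains only `ok.py`, the inputs `ok.py missing/**/*.py` yield `0`, while adding `fubar.py` yields `66`.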


Path-based filtering

TopMark supports the following path-based filtering controls:

  • --include, --exclude Include or exclude glob patterns.
  • --include-from, --exclude-from Load patterns from files (one per line).
  • --files-from Provide an explicit list of files to process.

Stable path-filtering semantics:

  • Positional arguments are resolved relative to the current working directory (CWD), Black-style.
  • Patterns in --include, --exclude, and files referenced by --include-from / --exclude-from are also resolved relative to CWD.
  • Absolute patterns are not supported.
  • Exclude rules take precedence over include rules.
  • Path-based filtering occurs before file-type filtering.
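
The precedence rules above can be approximated in a few lines. This sketch assumes that an empty include list means "everything is included" and uses `fnmatchcase` as a stand-in for TopMark's actual pattern matcher; `check_pattern` and `select` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def check_pattern(pattern: str) -> str:
    """Reject absolute patterns, which the rules above list as unsupported."""
    if PurePosixPath(pattern).is_absolute():
        raise ValueError(f"absolute pattern not supported: {pattern}")
    return pattern


def select(paths, include=(), exclude=()):
    """Keep CWD-relative paths that are included and not excluded."""
    include = [check_pattern(p) for p in include]
    exclude = [check_pattern(p) for p in exclude]
    selected = []
    for path in paths:
        if any(fnmatchcase(path, pat) for pat in exclude):
            continue  # exclude wins, even if an include pattern also matches
        if not include or any(fnmatchcase(path, pat) for pat in include):
            selected.append(path)
    return selected
```

For example, `select(["src/a.py", "src/gen/b.py", "README.md"], include=["src/**"], exclude=["src/gen/**"])` keeps only `src/a.py`: `src/gen/b.py` matches an include pattern but is still removed by the exclude rule.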

STDIN support

File-processing commands support two STDIN modes when supplying file lists or content:

  • List mode: provide newline-delimited paths or patterns via --files-from -, --include-from -, or --exclude-from -.
  • Content mode: process a single virtual runtime file from STDIN content by passing - as the sole PATH together with --stdin-filename NAME

See shared input modes for the full STDIN contract, including why TopMark does not provide a --stdin option flag.


Interaction with topmark probe

The topmark probe command uses the same runtime filtering pipeline and discovery semantics described above.

This includes:

  • path filtering
  • file-type filtering
  • canonical file-type identifier normalization and resolution
  • ambiguity handling

However, unlike processing commands (check, strip), probe also reports explicit inputs that were filtered out before runtime file-type probing.

Additionally, probe treats unmatched glob patterns as filtered semantic outcomes rather than silent runtime no-ops. As a result:

  • Unmatched glob patterns are reported as filtered probe results (e.g., filtered: excluded_by_discovery_filter).
  • The command exits with UNSUPPORTED_FILE_TYPE (69), reflecting incomplete runtime semantic resolution.

This differs from processing commands, which treat unmatched patterns as non-fatal diagnostics.

probe is read-only and diagnostic-only. It shares discovery and filtering behavior with check and strip, but rejects mutation, diff, reporting, and header-generation options that do not apply.

For example, when a path is excluded via --exclude or exclude_patterns, topmark probe will still show it in the output as:

<path>: <filtered> - filtered: excluded_by_path_filter

In machine-readable JSON and NDJSON output, these are represented as structured probe results with:

{
  "status": "filtered",
  "reason": "excluded_by_path_filter",
  "selected_file_type": null,
  "selected_processor": null,
  "candidates": []
}

Filtered probe results may use one of the following reasons:

  • excluded_by_path_filter - excluded by path-based include/exclude rules
  • excluded_by_file_type_filter - excluded by file-type include/exclude rules
  • excluded_by_discovery_filter - excluded before runtime probing, but exact category not identified
  • no_candidates - no file-type candidates were found (e.g., unsupported extension)

Only explicitly requested runtime inputs (CLI paths or --files-from) are reported this way. Files excluded implicitly during recursive discovery are not enumerated.
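
A script consuming NDJSON probe output might summarize filtered results by reason. The sketch below assumes each line is a flat JSON object shaped like the example above; real output may wrap or extend these records (for example with the path), so treat the sample data and `count_filtered_reasons` as illustrative only.

```python
import json
from collections import Counter

# Sample NDJSON lines shaped like the filtered result shown above.
# Assumption: one flat JSON object per line; real records may carry more fields.
SAMPLE = """\
{"status": "filtered", "reason": "excluded_by_path_filter", "selected_file_type": null, "selected_processor": null, "candidates": []}
{"status": "filtered", "reason": "no_candidates", "selected_file_type": null, "selected_processor": null, "candidates": []}
{"status": "filtered", "reason": "excluded_by_path_filter", "selected_file_type": null, "selected_processor": null, "candidates": []}
"""


def count_filtered_reasons(ndjson_text: str) -> Counter:
    """Count filtered probe results by reason, one JSON object per line."""
    reasons = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("status") == "filtered":
            reasons[record.get("reason")] += 1
    return reasons
```

Calling `count_filtered_reasons(SAMPLE)` counts two `excluded_by_path_filter` results and one `no_candidates` result.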


Filtering recipes

Recipe: Process only Python and Markdown

CLI:

topmark check --include-file-types python,markdown .

Equivalent canonical form:

topmark check --include-file-types topmark:python,topmark:markdown .

TOML:

[files]
include_file_types = ["python", "markdown"]

Recipe: Exclude generated/virtualenv folders

TOML:

[files]
exclude_patterns = [
  ".venv/**",
  "**/__pycache__/**",
  "**/.mypy_cache/**",
  "**/.pytest_cache/**",
  "dist/**",
  "build/**",
]

Recipe: Include only src/ and tests/

TOML:

[files]
include_patterns = ["src/**", "tests/**"]

Recipe: Use include/exclude pattern files (portable across repos)

[files]
include_from = ["include.txt"]
exclude_from = ["exclude.txt"]

These files may also be provided via STDIN by using - as the file path.

Example include.txt:

src/**
tests/**

Example exclude.txt:

.venv/**
**/__pycache__/**

Recipe: Exclude a specific file type after path filtering

[files]
include_patterns = ["**/*.toml", "**/*.yaml", "**/*.yml"]
exclude_file_types = ["yaml"]

Equivalent canonical form:

[files]
exclude_file_types = ["topmark:yaml"]

Recipe: Process only an explicit file list (from Git)

Generate a file list:

git ls-files > files.txt

Then:

topmark check --files-from files.txt

You can also stream the file list via STDIN:

git ls-files | topmark check --files-from -

Recipe: Show only actionable files (would change)

topmark check --report actionable .

Recipe: Include unsupported files in reporting

topmark check --report noncompliant .
topmark strip --report noncompliant .

File-type filtering

TopMark supports file-type include/exclude filtering via:

  • --include-file-types / -t
  • --exclude-file-types / -T
  • include_file_types
  • exclude_file_types

File-type filters are evaluated after path-based filtering.

TopMark accepts file type identifiers in local form, such as python, or qualified form, such as topmark:python.

Internally, TopMark normalizes identifiers to canonical qualified file type identities before filtering, runtime resolution, policy evaluation, diagnostics, and registry lookup.

Plugins and integrations may declare file types in their own namespace, such as acme:python. This allows independent ecosystems to define custom file types and register independent runtime header processors without colliding with built-in TopMark identifiers.

Local identifiers are accepted only when they are unambiguous. If more than one registered file type has the same local identifier, the local form is considered ambiguous and TopMark requires the qualified form.
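
The local-versus-qualified rule can be modeled as a lookup against a set of qualified identifiers. This is a sketch under stated assumptions: `resolve_file_type` and the registry contents are hypothetical (`acme:python` is the example namespace used above) and do not reflect TopMark's registry API.

```python
def resolve_file_type(identifier: str, registry: set) -> str:
    """Resolve a local or qualified identifier to its qualified form."""
    if ":" in identifier:
        # Qualified form: must match a registered identity exactly.
        if identifier not in registry:
            raise LookupError(f"unknown file type: {identifier}")
        return identifier
    # Local form: collect every registered identity with this local name.
    matches = sorted(q for q in registry if q.split(":", 1)[1] == identifier)
    if not matches:
        raise LookupError(f"unknown file type: {identifier}")
    if len(matches) > 1:
        raise LookupError(
            f"ambiguous file type {identifier!r}; use one of: {', '.join(matches)}"
        )
    return matches[0]


# Hypothetical registry with a plugin-provided type that shadows a local name.
REGISTRY = {"topmark:python", "topmark:markdown", "acme:python"}
```

With this registry, `markdown` resolves to `topmark:markdown`, while `python` is rejected as ambiguous and must be written as `topmark:python` or `acme:python`.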


Exit-code interaction

Filtering decisions can influence exit codes indirectly:

  • Missing explicit inputs → FILE_NOT_FOUND (66)
  • Unmatched glob patterns → SUCCESS (0) for check / strip, UNSUPPORTED_FILE_TYPE (69) for probe

Missing explicit inputs take precedence over semantic runtime probe outcomes.

When multiple conditions occur, TopMark applies a deterministic exit-code priority model (see Exit Codes documentation), where hard input and filesystem errors take precedence.

Invalid CLI usage (for example, unsupported options or inappropriate STDIN modes) is reported as a usage error and takes precedence over filtering outcomes.
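
One way to picture the priority model is an ordered list of exit codes from which the first observed code wins. The sketch below is an assumption-laden illustration: the numeric value for usage errors (64, taken from the BSD sysexits convention) and the exact placement of usage errors ahead of FILE_NOT_FOUND are not stated in this section; the Exit Codes documentation is authoritative.

```python
SUCCESS = 0
USAGE_ERROR = 64  # assumption: BSD sysexits EX_USAGE; not stated in this section
FILE_NOT_FOUND = 66
UNSUPPORTED_FILE_TYPE = 69

# Highest priority first. Placing USAGE_ERROR ahead of FILE_NOT_FOUND is assumed.
PRIORITY = [USAGE_ERROR, FILE_NOT_FOUND, UNSUPPORTED_FILE_TYPE]


def final_exit_code(observed) -> int:
    """Pick the highest-priority exit code among the observed conditions."""
    for code in PRIORITY:
        if code in observed:
            return code
    return SUCCESS
```

For a probe run with both a missing literal path and an unmatched glob, `final_exit_code({UNSUPPORTED_FILE_TYPE, FILE_NOT_FOUND})` returns `66`, matching the rule that missing explicit inputs take precedence over probe outcomes.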


Notes on configuration strictness

Filtering determines which runtime files participate in processing, while staged config-loading validation determines whether a run is allowed to proceed.

Note

[config].strict is a TOML-source-local strictness preference controlling staged configuration-loading validation for the current TOML source.

Effective strictness is evaluated across:

  • TOML-source diagnostics;
  • merged-config diagnostics;
  • runtime applicability diagnostics.

strict is resolved during TOML loading and does not become a layered configuration field.

Effective strictness is resolved from the following sources, in order of precedence:

  1. CLI override (--strict / --no-strict)
  2. TOML setting (strict)
  3. default non-strict behavior

When strict config checking is enabled, configuration-loading validation warnings are treated as errors and may cause the command to fail before processing files.
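
The resolution order for strictness can be expressed as a small fallback chain. This is a minimal sketch, assuming `None` stands for "not specified" at a given level; `effective_strict` and `should_abort` are hypothetical helpers, not TopMark functions.

```python
from typing import Optional


def effective_strict(cli: Optional[bool], toml: Optional[bool]) -> bool:
    """Resolve strictness: CLI override, then TOML setting, then non-strict."""
    if cli is not None:  # --strict / --no-strict was passed
        return cli
    if toml is not None:  # strict was set in the TOML source
        return toml
    return False  # default non-strict behavior


def should_abort(warning_count: int, strict: bool) -> bool:
    """Under strict checking, any validation warning is treated as an error."""
    return strict and warning_count > 0
```

For instance, `effective_strict(cli=False, toml=True)` is `False` because `--no-strict` overrides the TOML setting, so `should_abort(3, effective_strict(False, True))` is also `False`.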


See also