topmark probe

Purpose: Explain file-type and processor resolution.

The probe command explains how TopMark resolves a file to a file type and header processor. It is diagnostic-only: it does not read full file content for header detection, does not compare or mutate headers, and does not write files.

Instead, it exposes the resolution decision process, including:

  • the selected file type and processor
  • canonical resolved file type identities and qualified keys
  • the runtime-resolution status and reason
  • all scored candidate file types
  • match signals (extension, filename, pattern, content probing)
  • explicit inputs filtered during discovery before runtime probing

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

Note

Path representation

TopMark serializes machine-readable filesystem path fields with POSIX / separators on all platforms.

Path serialization is a presentation contract and is distinct from filesystem identity.

TopMark first determines the selected processing path for the filesystem target being processed and then serializes that processing path according to the machine-output contract.

This contract applies to:

  • header metadata path fields;
  • processing machine-output payloads;
  • probe machine-output payloads;
  • configuration machine-output payloads; and
  • TOML/config provenance payloads.

For example, the path spellings

real/file.py
./real/file.py
link-to-file.py

may refer to the same filesystem identity and therefore produce the same serialized processing path.

TopMark's machine-readable path fields remain path-based and are derived from the selected processing path for each processing target.

Filesystem identity policy is a separate concern from path serialization. TopMark may apply additional filesystem-identity rules when determining whether a processing target is eligible for processing. For example, selected hard-linked files are detected using device/inode identity and are reported as unsupported processing targets. Such checks do not alter the serialized path values emitted in machine-readable output.

Human-facing output follows display-path policy instead:

  • CLI and Markdown reports may use the host platform's native path representation;
  • STDIN-backed processing displays the logical --stdin-filename when available; and
  • unified diff file labels are human-facing display labels, not machine-readable path fields.

Synthetic configuration-source identifiers (for example built-in defaults) are serialized as stable labels rather than filesystem paths.
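
The separator part of this contract can be illustrated with Python's standard pathlib module. This is a minimal sketch rather than TopMark's implementation: `serialize_path` is a hypothetical helper name, and the sketch only covers separator normalization, not how the processing path itself is selected.

```python
from pathlib import PurePosixPath, PureWindowsPath


def serialize_path(path):
    """Render a pure path with POSIX '/' separators (hypothetical helper)."""
    return path.as_posix()


# A Windows-style spelling and a POSIX-style spelling of the same relative
# path serialize to the same machine-readable string.
print(serialize_path(PureWindowsPath(r"real\file.py")))  # real/file.py
print(serialize_path(PurePosixPath("real/file.py")))     # real/file.py
```

Pure paths are used here so the example behaves the same on every host platform.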


Quick start

# Explain how a file is classified
topmark probe README.md

# Increase detail (selected fields, then candidates)
topmark probe -v README.md
topmark probe -vv README.md

# Multiple files
topmark probe src/

# Machine-readable output
topmark probe --output-format json README.md
topmark probe --output-format ndjson README.md

# Markdown report (document-style)
topmark probe --output-format markdown README.md

Input applicability

probe is read-only and diagnostic-only. It shares input discovery, filtering, configuration, and file-type resolution controls with check and strip, but it rejects options that belong to file mutation, patch planning, reporting summaries, diffs, or generated-header rendering.

Use check or strip for header comparison, patch previews, reports, or mutation.

STDIN modes

probe supports both list STDIN mode (--files-from -, --include-from -, or --exclude-from -) and content STDIN mode (- plus --stdin-filename NAME). These modes are mutually exclusive.

See shared input modes for the full STDIN contract, including why TopMark does not provide a --stdin option flag.
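
The mutual-exclusion rule between the two STDIN modes can be sketched as a small validation function. This is an illustrative sketch only: `stdin_mode`, `LIST_STDIN_OPTIONS`, and the option-name keys are hypothetical and do not describe TopMark's internal argument handling.

```python
# Hypothetical option names for the three list-style options.
LIST_STDIN_OPTIONS = ("files_from", "include_from", "exclude_from")


def stdin_mode(paths, options):
    """Classify an invocation as 'list', 'content', or None (no STDIN use).

    `paths` are the positional PATH arguments; `options` maps hypothetical
    option names to their values. Raises ValueError on invalid combinations.
    """
    list_mode = any(options.get(name) == "-" for name in LIST_STDIN_OPTIONS)
    content_mode = "-" in paths
    if list_mode and content_mode:
        raise ValueError("list and content STDIN modes are mutually exclusive")
    if content_mode and not options.get("stdin_filename"):
        raise ValueError("content STDIN mode requires --stdin-filename NAME")
    if list_mode:
        return "list"
    if content_mode:
        return "content"
    return None


print(stdin_mode(["-"], {"stdin_filename": "example.py"}))  # content
print(stdin_mode([], {"files_from": "-"}))                  # list
```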


Configuration and validation

probe supports --strict / --no-strict to override the effective strict value for the run.

Before any file processing begins, TopMark performs whole-source TOML schema validation during configuration loading. TOML-source diagnostics (including missing-section INFO diagnostics) are evaluated together with merged-config and runtime applicability diagnostics during staged configuration-loading validation for the run.

Note

[config].strict is a TOML-source-local strictness preference controlling staged configuration-loading validation for the current TOML source.

Effective strictness is evaluated across:

  • TOML-source diagnostics;
  • merged-config diagnostics;
  • runtime applicability diagnostics.

When strict validation fails, TopMark exits with CONFIG_ERROR. The diagnostics that triggered the failure remain visible in human-readable and machine-readable output formats.

strict is resolved during TOML loading and does not become a layered configuration field.

TopMark resolves configuration from defaults, user config, the project chain discovered from the resolved discovery anchor, explicit --config files, and CLI overrides before staged validation produces the effective runtime configuration. For path-processing commands such as probe, the discovery anchor is derived from the first selected input path when one is available, or from the current working directory otherwise.

Configuration discovery is evaluated before runtime filesystem-identity evaluation selects processing paths for probing. Symlinked discovery anchors therefore affect which project configuration files are found before selected probe paths or machine-readable probe-path fields are produced. See Configuration discovery, precedence, and policy for the full configuration-loading and validation contract.


Filtering and file discovery

TopMark determines which files to process using a combination of path-based filters and file-type filters.

Path arguments, include/exclude patterns, --files-from, and file-type filters follow the shared TopMark filtering pipeline. Positional paths and relative patterns are resolved from the current working directory; path-based filters run before file-type filters, and exclude rules take precedence. See Filtering for the full path discovery contract.
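
The ordering described above (path-based filters first, file-type filters second, exclude rules winning) can be sketched as follows. This is a simplified illustration under stated assumptions: `filter_reason` is a hypothetical function, and Python's `fnmatchcase` stands in for TopMark's real glob semantics, which may differ.

```python
from fnmatch import fnmatchcase


def filter_reason(path, file_type, *, include=(), exclude=(),
                  include_types=(), exclude_types=()):
    """Return None if the path is selected, else a filtered reason string."""
    # Path-based filters run first; exclude rules take precedence.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return "excluded_by_path_filter"
    if include and not any(fnmatchcase(path, pattern) for pattern in include):
        return "excluded_by_path_filter"
    # File-type filters run afterwards, on qualified file type identities.
    if file_type in exclude_types:
        return "excluded_by_file_type_filter"
    if include_types and file_type not in include_types:
        return "excluded_by_file_type_filter"
    return None


print(filter_reason("tests/test_a.py", "topmark:python",
                    include=["*.py"], exclude=["tests/*"]))
# excluded_by_path_filter
```

Because the path-based check runs first, a path that would fail both kinds of filter is reported with the path-based reason in this sketch.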

During discovery, TopMark performs filesystem-identity evaluation and selects processing paths before runtime probing begins. If multiple path spellings resolve to the same filesystem target (for example a symlink and its target), probe operates on the selected processing path rather than preserving the original spelling. Hard-link policy is evaluated as a processing-target eligibility check.

This runtime discovery stage is separate from configuration discovery. Project-chain configuration files have already been selected from the resolved discovery anchor before probe evaluates file filters, selected processing paths, and runtime probing results.

Filtered and missing explicit inputs remain diagnostic records because they never became normal processing paths.

Unlike processing commands, probe may report explicitly requested files as filtered diagnostic results instead of silently omitting them.

Explicit directories that successfully expand to selected files are treated as discovery inputs and are not reported as separate filtered probe results. Explicit missing paths are reported as missing input errors rather than filtered probe results.

File type filters

  • --include-file-types / -t: Restrict processing to the given file type identifiers. May be repeated and/or provided as a comma-separated list.
  • --exclude-file-types / -T: Exclude the given file type identifiers. May be repeated and/or provided as a comma-separated list.

TopMark accepts file type identifiers in local form, such as python, or qualified form, such as topmark:python. Local identifiers are accepted only when unambiguous.

All identifiers are normalized to canonical qualified file type identities before filtering, runtime resolution, policy evaluation, diagnostics, and registry lookup.

See file-type filtering for the full identifier contract.

Examples:

topmark probe --include-file-types python README.md
topmark probe --include-file-types topmark:markdown README.md
topmark probe --exclude-file-types topmark:python src/
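
The normalization of local identifiers to qualified identities can be sketched like this. The registry contents are invented for illustration: `acme:markdown` is a hypothetical second namespace added only to show the ambiguous case, and `normalize_file_type` is not a TopMark API.

```python
# Hypothetical registry of qualified identities ("namespace:name").
REGISTRY = {"topmark:python", "topmark:markdown", "acme:markdown"}


def normalize_file_type(identifier, registry):
    """Map a local or qualified identifier to a qualified identity."""
    if ":" in identifier:
        # Already qualified: it only needs to exist in the registry.
        if identifier not in registry:
            raise KeyError(f"unknown file type: {identifier}")
        return identifier
    # Local form: collect every qualified identity with that local name.
    matches = sorted(q for q in registry if q.split(":", 1)[1] == identifier)
    if not matches:
        raise KeyError(f"unknown file type: {identifier}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous local identifier {identifier!r}: {matches}")
    return matches[0]


print(normalize_file_type("python", REGISTRY))            # topmark:python
print(normalize_file_type("topmark:markdown", REGISTRY))  # topmark:markdown
```

In this sketch the local identifier markdown would raise ValueError, which is why qualified identifiers are the safer choice in scripts.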

Path-based filters

  • --include, --exclude: Include or exclude glob patterns.
  • --include-from, --exclude-from: Load patterns from files (one per line).
  • --files-from: Provide an explicit list of files to process.

See Filtering for CWD-resolution rules, missing vs unmatched input behavior, include/exclude precedence, and STDIN interactions.

Notes:

  • Existing filesystem inputs are normalized to selected processing paths before runtime probing.
  • Symlink spellings are not preserved for runtime identity or machine-readable probe-path fields.
  • Missing and filtered explicit inputs may still report the original diagnostic input path because no processing path was selected.

Example

# Probe only Python-like files selected through include/exclude filters
printf "*.py\n" > inc.txt
printf "tests/*\n# ignored\n" > exc.txt

topmark probe --include-from inc.txt --exclude-from exc.txt -vv
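
The pattern files in this example hold one pattern per line. A minimal sketch of reading such a file is shown below, assuming that blank lines and lines starting with # are skipped, as the "# ignored" line in exc.txt suggests; `load_patterns` is a hypothetical helper, not TopMark's loader.

```python
def load_patterns(text):
    """Return the patterns in `text`, one per line (hypothetical helper).

    Assumption: blank lines and '#' comment lines are ignored.
    """
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        patterns.append(stripped)
    return patterns


print(load_patterns("tests/*\n# ignored\n"))  # ['tests/*']
```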

Behavior details

  • Read-only: does not modify files.
  • Resolution-only: does not perform header scanning, comparison, mutation planning, or writes.
  • Shared discovery: uses the same discovery and filtering pipeline as check and strip, while preserving filtered explicit inputs as diagnostic probe results.
  • Shared runtime resolution: uses the same normalization, scoring, and runtime resolution logic as check and strip.
  • Processing-path identity: runtime probing operates on selected processing paths. Symlink spellings are resolved to the target path before ordinary probe execution.
  • Hard-link policy: if multiple selected processing paths are hard links to the same filesystem object, probe reports each affected path as unsupported rather than selecting a source, target, winner, or loser path.
  • Candidate visibility: exposes selected file type, processor, candidate scores, match signals, runtime-resolution status, and runtime-resolution reason.
  • Idempotency: repeated runs produce identical output for unchanged inputs.
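
The hard-link policy above relies on device/inode identity. The sketch below shows one way such a check could look using `os.stat`; it is an assumption-laden illustration, not TopMark's code, and `hard_link_duplicates` is a hypothetical name. It assumes the input paths are already distinct, symlink-resolved processing paths.

```python
import os
import tempfile
from collections import defaultdict


def hard_link_duplicates(paths):
    """Return paths that share a (device, inode) identity with another path."""
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    # Every member of a multi-path group is flagged; no winner is chosen.
    return sorted(p for group in groups.values() if len(group) > 1 for p in group)


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.py")
    b = os.path.join(tmp, "b.py")  # hard link to a.py
    c = os.path.join(tmp, "c.py")  # independent file
    for name in (a, c):
        with open(name, "w") as fh:
            fh.write("print('hi')\n")
    os.link(a, b)
    flagged = hard_link_duplicates([a, b, c])
    print([os.path.basename(p) for p in flagged])  # ['a.py', 'b.py']
```

Both a.py and b.py are flagged, mirroring the rule that each affected path is reported rather than one being preferred.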

Output behavior

TEXT rendering

TEXT rendering provides a concise summary by default, with increasing detail via verbosity:

  • default: one-line summary per file
  • -v: include selected file type and processor
  • -vv: include candidate lists, match signals, and resolution details

Markdown output

Use --output-format markdown to render a document-oriented report.

Notes:

  • Markdown output is document-oriented and ignores TEXT-oriented verbosity and quiet controls.
  • It always includes selected details and candidate tables.
  • It is suitable for documentation or review artifacts.

Machine-readable output (JSON, NDJSON)

Machine-readable formats are intended for automation and tooling integration.

  • JSON: a single machine-readable JSON document containing meta, config, config_diagnostics, and probes
  • NDJSON: one machine-readable NDJSON record per line; includes kind="probe" records for each probe result

For the canonical schema, see:

Probe machine-readable output emits processing paths with POSIX / separators and resolved file type identities using canonical qualified identity strings when available. For probe results that reach runtime probing, the emitted path describes the selected processing target rather than the original symlink or invocation spelling.

Hard-linked processing targets remain separate probe results. Each affected selected path produces its own machine-readable probe payload and is reported as an unsupported processing target with reason hard_link_duplicate.

Shared output controls

Output format, TEXT verbosity, quiet mode, color output, and shared exit-code behavior are documented in shared options and exit codes.

TEXT verbosity is separate from internal logging:

  • -v, --verbose increases TEXT output detail for probe diagnostics.
  • -q, --quiet suppresses TEXT rendering while preserving the command's exit status.
  • Markdown output is document-oriented and ignores TEXT-oriented verbosity and quiet controls.
  • Machine-readable JSON and NDJSON output are unaffected by TEXT-oriented verbosity and quiet controls.

Notes:

  • Primary/headline hint selection, where rendered in human-readable output, is presentation-level guidance and is not part of the stable CLI contract; rely on exit codes and machine-readable output for automation.
  • probe is diagnostic-only and never renders diffs or patch previews.

Machine-readable output

JSON

{
  "meta": { /* MetaPayload */ },
  "config": { /* RuntimeConfigPayload */ },
  "config_diagnostics": { /* ConfigDiagnosticsPayload */ },
  "probes": [
    {
      "path": "README.md", // POSIX path serialization
      "status": "resolved",
      "reason": "selected_highest_score",
      "selected_file_type": { ... },
      "selected_processor": { ... },
      "candidates": [ ... ]
    }
  ]
}

Only explicit inputs that actually fail selection are represented as filtered probe payloads.

Probe payloads that reach runtime probing report the selected processing path. Filtered and missing explicit inputs may instead report the original diagnostic input path because no processing path was selected.

Directories that successfully expand to selected files are not emitted as additional filtered probe results. Explicit missing paths are represented as missing-input probe results rather than filtered probe results.

If multiple selected processing paths are hard links to the same filesystem object, probe emits one result per selected path. Each affected result reports an unsupported outcome with reason hard_link_duplicate. TopMark does not select a preferred path from the hard-link group.

A filtered probe result has the following shape:

{
  "path": "__pycache__/example.cpython-312.pyc", // POSIX path serialization
  "status": "filtered",
  "reason": "excluded_by_path_filter",
  "selected_file_type": null,
  "selected_processor": null,
  "candidates": []
}

Filtered probe results may use one of the following reasons:

  • excluded_by_path_filter - excluded by path-based include/exclude rules
  • excluded_by_file_type_filter - excluded by file-type include/exclude rules after identifier normalization to canonical qualified file type identities
  • excluded_by_discovery_filter - excluded before probing, but the exact category was not identified
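
A consumer of the JSON document might summarize probe results by status and collect the reasons for filtered inputs. The sketch below uses a hand-written sample that follows the field names shown above; the empty objects are placeholders for payloads whose contents are not reproduced here, and `summarize` is a hypothetical helper.

```python
import json
from collections import Counter

# Hand-written sample; nested payloads are left as empty placeholders.
SAMPLE = """
{
  "meta": {},
  "config": {},
  "config_diagnostics": {},
  "probes": [
    {"path": "README.md", "status": "resolved",
     "reason": "selected_highest_score",
     "selected_file_type": {}, "selected_processor": {}, "candidates": []},
    {"path": "__pycache__/example.cpython-312.pyc", "status": "filtered",
     "reason": "excluded_by_path_filter",
     "selected_file_type": null, "selected_processor": null, "candidates": []}
  ]
}
"""


def summarize(document):
    """Return (counts by status, {path: reason} for filtered probes)."""
    probes = json.loads(document)["probes"]
    by_status = Counter(probe["status"] for probe in probes)
    filtered = {p["path"]: p["reason"] for p in probes if p["status"] == "filtered"}
    return dict(by_status), filtered


counts, filtered = summarize(SAMPLE)
print(counts)    # {'resolved': 1, 'filtered': 1}
print(filtered)  # {'__pycache__/example.cpython-312.pyc': 'excluded_by_path_filter'}
```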

NDJSON

{"kind":"config",...}
{"kind":"config_diagnostics",...}
{"kind":"diagnostic",...}
{"kind":"probe","meta":{...},"probe":{...}}  <!-- one per probe result -->

Canonical file type identities in machine-readable output use normalized qualified-key identities such as topmark:python.

Probe payload path values represent selected processing paths serialized with POSIX / separators on all platforms. Human TEXT output remains display-oriented and may use the host platform's native path representation.

Filtered and missing explicit-input probe results are an exception: they may report the original diagnostic input path because runtime processing-path selection never occurred.
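
NDJSON output can be consumed one line at a time, keeping only kind="probe" records. The following sketch assumes that the "probe" object carries the same path and status fields as entries in the JSON probes array; the sample lines are hand-written and `iter_probes` is a hypothetical helper.

```python
import json

# Hand-written sample records; real records carry more fields.
SAMPLE_NDJSON = "\n".join([
    '{"kind": "config"}',
    '{"kind": "config_diagnostics"}',
    '{"kind": "probe", "meta": {}, "probe": {"path": "README.md", "status": "resolved"}}',
    '{"kind": "probe", "meta": {}, "probe": {"path": "a.pyc", "status": "filtered"}}',
])


def iter_probes(lines):
    """Yield the 'probe' payload of every kind="probe" record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("kind") == "probe":
            yield record["probe"]


for probe in iter_probes(SAMPLE_NDJSON.splitlines()):
    print(probe["path"], probe["status"])
```

Because `iter_probes` accepts any iterable of lines, the same function could be fed from `sys.stdin` when the command output is piped into a script.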


Command-specific options

Option                                            Description
-q, --quiet                                       Suppress TEXT rendering while preserving exit status.
--files-from                                      Read newline-delimited paths from file (use '-' for STDIN).
- (PATH)                                          Read one virtual file from STDIN content (requires --stdin-filename).
--include                                         Add paths by glob.
--include-from                                    File of patterns to include.
--exclude                                         Exclude paths by glob.
--exclude-from                                    File of patterns to exclude.
--include-file-types / -t                         Restrict to local or qualified file type identifiers.
--exclude-file-types / -T                         Exclude local or qualified file type identifiers.
--stdin-filename                                  Assumed filename when PATH is '-' (content from STDIN).
--allow-content-probe / --no-allow-content-probe  Shared runtime policy override for file-type detection.

Run topmark probe -h for the full list of options.


Exit codes

topmark probe exits with SUCCESS (0) when all inputs are fully resolved.

Common probe exit codes:

Scenario                                       Exit code
All inputs resolved                            SUCCESS (0)
Any input unresolved / unsupported / filtered  UNSUPPORTED_FILE_TYPE (69)
Missing explicit input path                    FILE_NOT_FOUND (66)
Permission failure                             PERMISSION_DENIED (77)
Configuration error                            CONFIG_ERROR (78)
Invalid CLI usage                              USAGE_ERROR (64)

Notes:

  • Click parser-level usage errors (for example, unknown commands, unknown options, or invalid option values) may exit with code 2 before command logic runs.
  • UNSUPPORTED_FILE_TYPE (69) indicates runtime-resolution failure (e.g., unsupported file type or filtered input), not a crash.
  • Explicit missing literal paths are treated as hard input errors and produce FILE_NOT_FOUND (66).
  • Missing explicit inputs take precedence over runtime-resolution outcomes (UNSUPPORTED_FILE_TYPE (69)).
  • Unmatched glob patterns are reported as filtered probe results (e.g., filtered: excluded_by_discovery_filter) and result in UNSUPPORTED_FILE_TYPE (69).
  • Ambiguous local file type identifiers may also contribute to runtime-resolution outcomes unless callers use canonical qualified identifiers such as topmark:python.
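
For automation, the exit codes in the table above can be mapped to names in a wrapper script. The sketch below mirrors the documented numeric values; `ProbeExit`, `CLICK_USAGE_ERROR`, and `describe_exit` are hypothetical names introduced for illustration and are not part of TopMark.

```python
from enum import IntEnum


class ProbeExit(IntEnum):
    """Documented probe exit codes (hypothetical enum name)."""
    SUCCESS = 0
    USAGE_ERROR = 64
    FILE_NOT_FOUND = 66
    UNSUPPORTED_FILE_TYPE = 69
    PERMISSION_DENIED = 77
    CONFIG_ERROR = 78


# Parser-level usage errors may exit with 2 before command logic runs.
CLICK_USAGE_ERROR = 2


def describe_exit(code):
    """Return a readable label for a probe exit code."""
    if code == CLICK_USAGE_ERROR:
        return "click usage error"
    try:
        return ProbeExit(code).name
    except ValueError:
        return f"unknown ({code})"


print(describe_exit(69))  # UNSUPPORTED_FILE_TYPE
print(describe_exit(2))   # click usage error
```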

See Exit codes for the complete CLI-wide exit-code contract.


Typical workflows

1) Inspect file classification

topmark probe README.md

2) Investigate ambiguous matches

topmark probe -vv README.md

3) Integrate with tooling

topmark probe --output-format json README.md
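
A tooling integration might wrap this invocation from Python. The sketch below is hedged: it assumes the JSON document is the only content written to standard output, and `run_probe` is a hypothetical wrapper. The `run` parameter exists so the function can be exercised without TopMark installed.

```python
import json
import subprocess


def run_probe(paths, run=subprocess.run):
    """Run `topmark probe` and return (exit_code, parsed JSON or None)."""
    cmd = ["topmark", "probe", "--output-format", "json", *paths]
    # No check=True: a non-zero exit such as 69 is a reported outcome,
    # not necessarily a failed invocation.
    proc = run(cmd, capture_output=True, text=True)
    # Assumption: stdout holds exactly one JSON document when not empty.
    payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, payload
```

A caller would typically branch on the returned exit code first and only then inspect the probes array of the payload.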



Troubleshooting

  • Unsupported file: ensure file type patterns, bindings, or extensions are configured correctly.
  • Unexpected resolution result: use -vv to inspect candidate scores and match signals.
  • Symlink path not shown in output: probe reports selected processing paths for inputs that reach runtime probing. If a symlink and its target resolve to the same file, the emitted probe path describes the resolved processing target rather than the symlink spelling.
  • Hard-linked files are reported as unsupported: TopMark blocks processing when multiple selected paths refer to the same filesystem object through hard links. Each affected path is reported independently with reason hard_link_duplicate; no preferred path is selected from the hard-link group.
  • File type filter does not match: prefer qualified identifiers such as topmark:python when local identifiers may be ambiguous.
  • No processor: check that a processor binding exists for the selected file type.
  • Filtered input: the path was excluded during discovery or filtering evaluation (e.g., --exclude). The probe output will show one of:
      • filtered: excluded_by_path_filter
      • filtered: excluded_by_file_type_filter
      • filtered: excluded_by_discovery_filter
  • --stdin is rejected: use - as the PATH sentinel together with --stdin-filename NAME when reading one virtual file from STDIN content.
  • Missing file error: a literal path such as fubar.py is treated as an explicit input and fails with FILE_NOT_FOUND (66) when it does not exist. Missing explicit inputs are reported as missing-input probe results rather than filtered probe results.