topmark probe¶
Purpose: Explain file-type and processor resolution.
The probe command explains how TopMark resolves a file to a file type and header processor. It is
diagnostic-only: it does not read full file content for header detection, does not compare or mutate
headers, and does not write files.
Instead, it exposes the resolution decision process, including:
- the selected file type and processor
- canonical resolved file type identities and qualified keys
- the runtime-resolution status and reason
- all scored candidate file types
- match signals (extension, filename, pattern, content probing)
- explicit inputs filtered during discovery before runtime probing
Note
The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.
Note
Path representation
TopMark serializes machine-readable filesystem path fields with POSIX / separators on all
platforms.
Path serialization is a presentation contract and is distinct from filesystem identity.
TopMark first determines the selected processing path for the filesystem target being processed and then serializes that processing path according to the machine-output contract.
This contract applies to:
- header metadata path fields;
- processing machine-output payloads;
- probe machine-output payloads;
- configuration machine-output payloads; and
- TOML/config provenance payloads.
For example, several path spellings (such as a symlink and its target) may refer to the same filesystem identity and therefore produce the same serialized processing path.
TopMark's machine-readable path fields remain path-based and are derived from the selected processing path for each processing target.
Filesystem identity policy is a separate concern from path serialization. TopMark may apply additional filesystem-identity rules when determining whether a processing target is eligible for processing. For example, selected hard-linked files are detected using device/inode identity and are reported as unsupported processing targets. Such checks do not alter the serialized path values emitted in machine-readable output.
Human-facing output follows display-path policy instead:
- CLI and Markdown reports may use the host platform's native path representation;
- STDIN-backed processing displays the logical --stdin-filename when available; and
- unified diff file labels are human-facing display labels, not machine-readable path fields.
Synthetic configuration-source identifiers (for example built-in defaults) are serialized as stable labels rather than filesystem paths.
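The separator contract can be illustrated with a minimal sketch. It uses only Python's standard pathlib; the helper name serialize_path is an assumption for illustration and is not part of TopMark.

```python
from pathlib import PurePosixPath, PureWindowsPath


def serialize_path(path: str, *, windows: bool = False) -> str:
    """Return `path` with POSIX "/" separators (hypothetical helper)."""
    # Pure paths parse text without touching the filesystem, so this
    # sketch behaves the same on every host platform.
    flavor = PureWindowsPath if windows else PurePosixPath
    return flavor(path).as_posix()


print(serialize_path(r"src\pkg\module.py", windows=True))  # src/pkg/module.py
print(serialize_path("src/pkg/module.py"))                 # src/pkg/module.py
```

The sketch covers only separator serialization; selecting the processing path itself is a separate step, as described above.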
Quick start¶
# Explain how a file is classified
topmark probe README.md
# Increase detail (selected fields, then candidates)
topmark probe -v README.md
topmark probe -vv README.md
# Multiple files
topmark probe src/
# Machine-readable output
topmark probe --output-format json README.md
topmark probe --output-format ndjson README.md
# Markdown report (document-style)
topmark probe --output-format markdown README.md
Input applicability¶
probe is read-only and diagnostic-only. It shares input discovery, filtering, configuration, and
file-type resolution controls with check and strip, but it rejects
options that belong to file mutation, patch planning, reporting summaries, diffs, or
generated-header rendering.
Use check or strip for header comparison, patch previews, reports, or
mutation.
STDIN modes¶
probe supports both list STDIN mode (--files-from -, --include-from -, or --exclude-from -)
and content STDIN mode (- plus --stdin-filename NAME). These modes are mutually exclusive.
See shared input modes for the full STDIN contract,
including why TopMark does not provide a --stdin option flag.
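The mutual-exclusion rule can be sketched as a small argument classifier. This is an illustration of the rule only, not TopMark's parser: the function name stdin_mode is hypothetical, only the space-separated option form is handled, and other value-taking options are ignored.

```python
from typing import Optional

LIST_OPTIONS = {"--files-from", "--include-from", "--exclude-from"}
VALUE_OPTIONS = LIST_OPTIONS | {"--stdin-filename"}


def stdin_mode(argv: list[str]) -> Optional[str]:
    """Classify argv as "list", "content", or None when STDIN is unused."""
    list_mode = content_mode = False
    i = 0
    while i < len(argv):
        token = argv[i]
        if token in VALUE_OPTIONS:
            value = argv[i + 1] if i + 1 < len(argv) else None
            if token in LIST_OPTIONS and value == "-":
                list_mode = True
            i += 2  # skip the option and its value
            continue
        if token == "-":
            content_mode = True  # bare "-" used as the PATH sentinel
        i += 1
    if list_mode and content_mode:
        raise ValueError("list and content STDIN modes are mutually exclusive")
    if content_mode and "--stdin-filename" not in argv:
        raise ValueError("content STDIN mode requires --stdin-filename NAME")
    return "list" if list_mode else "content" if content_mode else None


print(stdin_mode(["--files-from", "-"]))                # list
print(stdin_mode(["-", "--stdin-filename", "app.py"]))  # content
```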
Configuration and validation¶
probe supports --strict / --no-strict to override the effective strict value for the run.
Before any file processing begins, TopMark performs whole-source TOML schema validation during configuration loading. TOML-source diagnostics (including missing-section INFO diagnostics) are evaluated together with merged-config and runtime applicability diagnostics during staged configuration-loading validation for the run.
Note
[config].strict is a TOML-source-local strictness preference controlling staged
configuration-loading validation for the current TOML source.
Effective strictness is evaluated across:
- TOML-source diagnostics;
- merged-config diagnostics;
- runtime applicability diagnostics.
When strict validation fails, TopMark exits with CONFIG_ERROR. The diagnostics that triggered
the failure remain visible in human-readable and machine-readable output formats.
strict is resolved during TOML loading and does not become a layered configuration field.
TopMark resolves configuration from defaults, user config, the project chain discovered from the
resolved discovery anchor, explicit --config files, and CLI overrides before staged validation
produces the effective runtime configuration. For path-processing commands such as probe, the
discovery anchor is derived from the first selected input path when one is available, or from the
current working directory otherwise.
Configuration discovery is evaluated before runtime filesystem-identity evaluation selects processing paths for probing. Symlinked discovery anchors therefore affect which project configuration files are found before selected probe paths or machine-readable probe-path fields are produced. See Configuration discovery, precedence, and policy for the full configuration-loading and validation contract.
Filtering and file discovery¶
TopMark determines which files to process using a combination of path-based filters and file-type filters.
Path arguments, include/exclude patterns, --files-from, and file-type filters follow the shared
TopMark filtering pipeline. Positional paths and relative patterns are resolved from the current
working directory; path-based filters run before file-type filters, and exclude rules take
precedence. See Filtering for the full path discovery
contract.
During discovery, TopMark performs filesystem-identity evaluation and selects processing paths
before runtime probing begins. If multiple path spellings resolve to the same filesystem target (for
example a symlink and its target), probe operates on the selected processing path rather than
preserving the original spelling. Hard-link policy is evaluated as a processing-target eligibility
check.
This runtime discovery stage is separate from configuration discovery. Project-chain configuration
files have already been selected from the resolved discovery anchor before probe evaluates file
filters, selected processing paths, and runtime probing results.
Filtered and missing explicit inputs remain diagnostic records because they never became normal processing paths.
Unlike processing commands, probe may report explicitly requested files as filtered diagnostic
results instead of silently omitting them.
Explicit directories that successfully expand to selected files are treated as discovery inputs and are not reported as separate filtered probe results. Explicit missing paths are reported as missing input errors rather than filtered probe results.
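As a rough illustration of how a symlink spelling and its target collapse to one processing path, the sketch below resolves each input with pathlib and keeps the first occurrence of every target. The function name select_processing_paths is hypothetical, and TopMark's actual selection rules may differ. The demo creates a symlink, which may need extra privileges on Windows.

```python
import os
import tempfile
from pathlib import Path


def select_processing_paths(inputs: list[Path]) -> list[Path]:
    """Collapse spellings that resolve to the same target, keeping order."""
    selected: list[Path] = []
    seen: set[Path] = set()
    for spelling in inputs:
        target = spelling.resolve()  # follows symlinks
        if target not in seen:
            seen.add(target)
            selected.append(target)
    return selected


with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp, "README.md")
    real.write_text("# demo\n")
    link = Path(tmp, "README-link.md")
    os.symlink(real, link)  # second spelling of the same file
    paths = select_processing_paths([link, real])
    print(len(paths), paths[0].name)  # 1 README.md
```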
File type filters¶
- --include-file-types / -t: Restrict processing to the given file type identifiers. May be repeated and/or provided as a comma-separated list.
- --exclude-file-types / -T: Exclude the given file type identifiers. May be repeated and/or provided as a comma-separated list.
File type identifiers are normalized to canonical qualified file type identities before filtering, runtime resolution, policy evaluation, diagnostics, and registry lookup.
TopMark accepts file type identifiers in local form, such as python, or qualified form, such as
topmark:python. Local identifiers are accepted only when unambiguous.
See file-type filtering for the full identifier contract.
Examples:
topmark probe --include-file-types python README.md
topmark probe --include-file-types topmark:markdown README.md
topmark probe --exclude-file-types topmark:python src/
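The normalization rule can be sketched as follows. The registry contents are invented for illustration (the acme namespace is hypothetical and exists only to show an ambiguous local identifier), and normalize_file_type is not a TopMark function.

```python
# Hypothetical registry of qualified keys; "acme" is an invented namespace.
KNOWN_FILE_TYPES = {"topmark:python", "topmark:markdown", "acme:markdown"}


def normalize_file_type(identifier: str) -> str:
    """Return the canonical qualified identity for a local or qualified id."""
    if ":" in identifier:
        if identifier not in KNOWN_FILE_TYPES:
            raise KeyError(f"unknown file type: {identifier}")
        return identifier
    matches = sorted(
        key for key in KNOWN_FILE_TYPES if key.split(":", 1)[1] == identifier
    )
    if not matches:
        raise KeyError(f"unknown file type: {identifier}")
    if len(matches) > 1:
        # Local identifiers are accepted only when unambiguous.
        raise ValueError(f"ambiguous local identifier {identifier!r}: {matches}")
    return matches[0]


print(normalize_file_type("python"))            # topmark:python
print(normalize_file_type("topmark:markdown"))  # topmark:markdown
```

In this sketch the local identifier markdown would raise ValueError because two namespaces define it, which is the situation where a qualified identifier is the safer choice.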
Path-based filters¶
- --include, --exclude: Include or exclude glob patterns.
- --include-from, --exclude-from: Load patterns from files (one per line).
- --files-from: Provide an explicit list of files to process.
See Filtering for CWD-resolution rules, missing vs unmatched input behavior, include/exclude precedence, and STDIN interactions.
Notes:
- Existing filesystem inputs are normalized to selected processing paths before runtime probing.
- Symlink spellings are not preserved for runtime identity or machine-readable probe-path fields.
- Missing and filtered explicit inputs may still report the original diagnostic input path because no processing path was selected.
Example¶
# Probe only Python-like files selected through include/exclude filters
printf "*.py\n" > inc.txt
printf "tests/*\n# ignored\n" > exc.txt
topmark probe --include-from inc.txt --exclude-from exc.txt -vv
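The ordering described in this section (path-based filters first, exclude rules taking precedence, then file-type filters) can be sketched with Python's fnmatch. This is a simplified model under stated assumptions: TopMark's real glob semantics may differ from fnmatch, the comment handling in load_patterns is inferred from the "# ignored" line in the example above, and both function names are hypothetical.

```python
from fnmatch import fnmatch


def load_patterns(text: str) -> list[str]:
    """Parse a pattern file: one pattern per line; blanks and # lines skipped."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def classify(path, file_type, includes, excludes, include_types=()):
    """Return "selected" or the reason the path was filtered."""
    # Path-based filters run first, and exclude rules take precedence.
    if any(fnmatch(path, pattern) for pattern in excludes):
        return "excluded_by_path_filter"
    if includes and not any(fnmatch(path, pattern) for pattern in includes):
        return "excluded_by_path_filter"
    # File-type filters only see paths that survived the path filters.
    if include_types and file_type not in include_types:
        return "excluded_by_file_type_filter"
    return "selected"


includes = load_patterns("*.py\n")
excludes = load_patterns("tests/*\n# ignored\n")
print(classify("src/app.py", "topmark:python", includes, excludes))
# selected
print(classify("tests/test_app.py", "topmark:python", includes, excludes))
# excluded_by_path_filter
```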
Behavior details¶
- Read-only: does not modify files.
- Resolution-only: does not perform header scanning, comparison, mutation planning, or writes.
- Shared discovery: uses the same discovery and filtering pipeline as check and strip, while preserving filtered explicit inputs as diagnostic probe results.
- Shared runtime resolution: uses the same normalization, scoring, and runtime resolution logic as check and strip.
- Processing-path identity: runtime probing operates on selected processing paths. Symlink spellings are resolved to the target path before ordinary probe execution.
- Hard-link policy: if multiple selected processing paths are hard links to the same filesystem object, probe reports each affected path as unsupported rather than selecting a source, target, winner, or loser path.
- Candidate visibility: exposes selected file type, processor, candidate scores, match signals, runtime-resolution status, and runtime-resolution reason.
- Idempotency: repeated runs produce identical output for unchanged inputs.
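A minimal sketch of score-based selection follows. The candidate field names (file_type, score), the score scale, the topmark:text identity, and the "unsupported" status string are assumptions for illustration; only the "resolved" status and the selected_highest_score reason appear in this page. Tie-breaking is not modeled.

```python
def resolve(candidates: list[dict]) -> dict:
    """Pick the highest-scoring candidate and report status and reason."""
    if not candidates:
        # Assumed status label for inputs with no matching file type.
        return {"status": "unsupported", "reason": None,
                "selected_file_type": None}
    best = max(candidates, key=lambda candidate: candidate["score"])
    return {
        "status": "resolved",
        "reason": "selected_highest_score",
        "selected_file_type": best["file_type"],
    }


# Hypothetical candidates for README.md; scores are invented.
candidates = [
    {"file_type": "topmark:text", "score": 10},
    {"file_type": "topmark:markdown", "score": 90},
]
result = resolve(candidates)
print(result["selected_file_type"])  # topmark:markdown
```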
Output behavior¶
TEXT rendering¶
TEXT rendering provides a concise summary by default, with increasing detail via verbosity:
- default: one-line summary per file
- -v: include selected file type and processor
- -vv: include candidate lists, match signals, and resolution details
Markdown output¶
Use --output-format markdown to render a document-oriented report.
Notes:
- Markdown output is document-oriented and ignores TEXT-oriented verbosity and quiet controls.
- Always includes selected details and candidate tables
- Suitable for documentation or review artifacts
Machine-readable output (JSON, NDJSON)¶
Machine-readable formats are intended for automation and tooling integration.
- JSON: a single machine-readable JSON document containing meta, config, config_diagnostics, and probes
- NDJSON: one machine-readable NDJSON record per line; includes kind="probe" records for each probe result
For the canonical schema, see:
Probe machine-readable output emits processing paths with POSIX / separators and resolved file
type identities using canonical qualified identity strings when available. For probe results that
reach runtime probing, the emitted path describes the selected processing target rather than the
original symlink or invocation spelling.
Hard-linked processing targets remain separate probe results. Each affected selected path produces
its own machine-readable probe payload and is reported as an unsupported processing target with
reason hard_link_duplicate.
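Device/inode identity checks of this kind can be sketched with os.stat. The function below is an illustration, not TopMark's implementation, and its name is hypothetical. It flags every member of a hard-link group and picks no preferred path, mirroring the behavior described above.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def hard_link_duplicates(paths: list[Path]) -> set[Path]:
    """Return every path sharing a (device, inode) identity with another."""
    by_identity: dict[tuple[int, int], list[Path]] = defaultdict(list)
    for path in paths:
        info = os.stat(path)
        by_identity[(info.st_dev, info.st_ino)].append(path)
    # All members of a group are flagged; no winner is selected.
    return {p for group in by_identity.values() if len(group) > 1 for p in group}


with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (Path(tmp, name) for name in ("a.py", "b.py", "c.py"))
    a.write_text("print('a')\n")
    os.link(a, b)  # b.py is a hard link to a.py
    c.write_text("print('c')\n")
    flagged = hard_link_duplicates([a, b, c])
    print(sorted(p.name for p in flagged))  # ['a.py', 'b.py']
```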
Shared output controls¶
Output format, TEXT verbosity, quiet mode, color output, and shared exit-code behavior are documented in shared options and exit codes.
TEXT verbosity is separate from internal logging:
- -v, --verbose increases TEXT output detail for probe diagnostics.
- -q, --quiet suppresses TEXT rendering while preserving the command's exit status.
- Markdown output is document-oriented and ignores TEXT-oriented verbosity and quiet controls.
- Machine-readable JSON and NDJSON output are unaffected by TEXT-oriented verbosity and quiet controls.
Notes:
- Primary/headline hint selection, where rendered in human-readable output, is presentation-level guidance and is not part of the stable CLI contract; rely on exit codes and machine-readable output for automation.
- probe is diagnostic-only and never renders diffs or patch previews.
Machine-readable output¶
JSON¶
{
"meta": { /* MetaPayload */ },
"config": { /* RuntimeConfigPayload */ },
"config_diagnostics": { /* ConfigDiagnosticsPayload */ },
"probes": [
{
"path": "README.md", // POSIX path serialization
"status": "resolved",
"reason": "selected_highest_score",
"selected_file_type": { ... },
"selected_processor": { ... },
"candidates": [ ... ]
}
]
}
Only explicit inputs that actually fail selection are represented as filtered probe payloads.
Probe payloads that reach runtime probing report the selected processing path. Filtered and missing explicit inputs may instead report the original diagnostic input path because no processing path was selected.
Directories that successfully expand to selected files are not emitted as additional filtered probe results. Explicit missing paths are represented as missing-input probe results rather than filtered probe results.
If multiple selected processing paths are hard links to the same filesystem object, probe emits one
result per selected path. Each affected result reports an unsupported outcome with reason
hard_link_duplicate. TopMark does not select a preferred path from the hard-link group.
{
"path": "__pycache__/example.cpython-312.pyc", // POSIX path serialization
"status": "filtered",
"reason": "excluded_by_path_filter",
"selected_file_type": null,
"selected_processor": null,
"candidates": []
}
Filtered probe results may use one of the following reasons:
- excluded_by_path_filter: excluded by path-based include/exclude rules
- excluded_by_file_type_filter: excluded by file-type include/exclude rules after identifier normalization to canonical qualified file type identities
- excluded_by_discovery_filter: excluded before probing, but the exact category was not identified
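A consumer of the JSON document might summarize probe results as in the sketch below. The embedded document is sample data shaped like the payloads shown above, with the nested payloads left empty; it is not captured TopMark output.

```python
import json
from collections import Counter

# Sample data modeled on the documented envelope and probe fields.
document = json.loads("""
{
  "meta": {}, "config": {}, "config_diagnostics": {},
  "probes": [
    {"path": "README.md", "status": "resolved",
     "reason": "selected_highest_score"},
    {"path": "__pycache__/example.cpython-312.pyc", "status": "filtered",
     "reason": "excluded_by_path_filter"}
  ]
}
""")

# Count results per status and collect the reason for each filtered path.
by_status = Counter(probe["status"] for probe in document["probes"])
filtered = {
    probe["path"]: probe["reason"]
    for probe in document["probes"]
    if probe["status"] == "filtered"
}
print(dict(by_status))
print(filtered)
```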
NDJSON¶
{"kind":"config",...}
{"kind":"config_diagnostics",...}
{"kind":"diagnostic",...}
{"kind":"probe","meta":{...},"probe":{...}} <!-- one per probe result -->
Canonical file type identities in machine-readable output use normalized qualified-key identities
such as topmark:python.
Probe payload path values represent selected processing paths serialized with POSIX / separators
on all platforms. Human TEXT output remains display-oriented and may use the host platform's native
path representation.
Filtered and missing explicit-input probe results are an exception: they may report the original diagnostic input path because runtime processing-path selection never occurred.
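NDJSON output can be consumed line by line. The sketch below keeps only kind="probe" records and yields their probe payloads. The sample lines are invented and reduced to the fields shown on this page, and iter_probes is a hypothetical helper.

```python
import json

# Invented sample stream; real records carry fuller payloads.
ndjson = "\n".join([
    '{"kind": "config"}',
    '{"kind": "config_diagnostics"}',
    '{"kind": "probe", "meta": {}, "probe": {"path": "README.md", "status": "resolved"}}',
    '{"kind": "probe", "meta": {}, "probe": {"path": "a.pyc", "status": "filtered"}}',
])


def iter_probes(stream: str):
    """Yield the probe payload of every kind="probe" record."""
    for line in stream.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("kind") == "probe":
            yield record["probe"]


probe_paths = [probe["path"] for probe in iter_probes(ndjson)]
print(probe_paths)  # ['README.md', 'a.pyc']
```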
Command-specific options¶
| Option | Description |
|---|---|
| -q, --quiet | Suppress TEXT rendering while preserving exit status. |
| --files-from | Read newline-delimited paths from file (use '-' for STDIN). |
| - (PATH) | Read one virtual file from STDIN content (requires --stdin-filename). |
| --include | Add paths by glob. |
| --include-from | File of patterns to include. |
| --exclude | Exclude paths by glob. |
| --exclude-from | File of patterns to exclude. |
| --include-file-types / -t | Restrict to local or qualified file type identifiers. |
| --exclude-file-types / -T | Exclude local or qualified file type identifiers. |
| --stdin-filename | Assumed filename when PATH is '-' (content from STDIN). |
| --allow-content-probe / --no-allow-content-probe | Shared runtime policy override for file-type detection. |
Run topmark probe -h for the full list of options.
Exit codes¶
topmark probe exits with SUCCESS (0) when all inputs are fully resolved.
Common probe exit codes:
| Scenario | Exit code |
|---|---|
| All inputs resolved | SUCCESS (0) |
| Any input unresolved / unsupported / filtered | UNSUPPORTED_FILE_TYPE (69) |
| Missing explicit input path | FILE_NOT_FOUND (66) |
| Permission failure | PERMISSION_DENIED (77) |
| Configuration error | CONFIG_ERROR (78) |
| Invalid CLI usage | USAGE_ERROR (64) |
Notes:
- Click parser-level usage errors (for example, unknown commands, unknown options, or invalid option values) may exit with code 2 before command logic runs.
- UNSUPPORTED_FILE_TYPE (69) indicates runtime-resolution failure (e.g., unsupported file type or filtered input), not a crash.
- Explicit missing literal paths are treated as hard input errors and produce FILE_NOT_FOUND (66).
- Missing explicit inputs take precedence over runtime-resolution outcomes (UNSUPPORTED_FILE_TYPE (69)).
- Unmatched glob patterns are reported as filtered probe results (e.g., filtered: excluded_by_discovery_filter) and result in UNSUPPORTED_FILE_TYPE (69).
- Ambiguous local file type identifiers may also contribute to runtime-resolution outcomes unless callers use canonical qualified identifiers such as topmark:python.
See Exit codes for the complete CLI-wide exit-code contract.
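The precedence between missing inputs and runtime-resolution outcomes can be sketched as below. The numeric codes come from the table above; the "missing" status label and the function probe_exit_code are assumptions, and permission, configuration, and usage errors are not modeled.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    USAGE_ERROR = 64
    FILE_NOT_FOUND = 66
    UNSUPPORTED_FILE_TYPE = 69
    PERMISSION_DENIED = 77
    CONFIG_ERROR = 78


def probe_exit_code(statuses: list[str]) -> ExitCode:
    """Derive an exit code from per-result statuses (assumed labels)."""
    # Missing explicit inputs take precedence over resolution outcomes.
    if "missing" in statuses:
        return ExitCode.FILE_NOT_FOUND
    # Any unresolved, unsupported, or filtered result yields 69.
    if any(status != "resolved" for status in statuses):
        return ExitCode.UNSUPPORTED_FILE_TYPE
    return ExitCode.SUCCESS


print(probe_exit_code(["resolved", "resolved"]).name)  # SUCCESS
print(probe_exit_code(["resolved", "filtered"]).name)  # UNSUPPORTED_FILE_TYPE
print(probe_exit_code(["filtered", "missing"]).name)   # FILE_NOT_FOUND
```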
Typical workflows¶
1) Inspect file classification¶
2) Investigate ambiguous matches¶
3) Integrate with tooling¶
Related commands¶
- topmark check: verify and update headers.
- topmark strip: remove detected TopMark headers.
- topmark config check: validate the effective runtime configuration and report diagnostics.
- topmark config dump: inspect the effective runtime configuration, including normalized file type identifiers.
Related docs¶
- Command overview
- Configuration
- Filtering
- Policies
- Shared options
- Exit codes
- Registry model
- Resolution model
- Machine-readable output
- Machine-readable format conventions
- Terminology and Canonical Vocabulary
Troubleshooting¶
- Unsupported file: ensure file type patterns, bindings, or extensions are configured correctly.
- Unexpected resolution result: use -vv to inspect candidate scores and match signals.
- Symlink path not shown in output: probe reports selected processing paths for inputs that reach runtime probing. If a symlink and its target resolve to the same file, the emitted probe path describes the resolved processing target rather than the symlink spelling.
- Hard-linked files are reported as unsupported: TopMark blocks processing when multiple selected paths refer to the same filesystem object through hard links. Each affected path is reported independently with reason hard_link_duplicate; no preferred path is selected from the hard-link group.
- File type filter does not match: prefer qualified identifiers such as topmark:python when local identifiers may be ambiguous.
- No processor: check that a processor binding exists for the selected file type.
- Filtered input: the path was excluded during discovery or filtering evaluation (e.g., --exclude). The probe output will show one of:
  - filtered: excluded_by_path_filter
  - filtered: excluded_by_file_type_filter
  - filtered: excluded_by_discovery_filter
- --stdin is rejected: use - as the PATH sentinel together with --stdin-filename NAME when reading one virtual file from STDIN content.
- Missing file error: a literal path such as fubar.py is treated as an explicit input and fails with FILE_NOT_FOUND (66) when it does not exist. Missing explicit inputs are reported as missing-input probe results rather than filtered probe results.