topmark.resolution.filetypes

Path-based file type and processor resolution helpers.

This module contains the shared scoring-based resolution engine used to map a concrete filesystem path onto the most specific matching FileType, and optionally onto the bound HeaderProcessor registered for that resolved file type.

Unlike identifier-based lookup in topmark.registry.filetypes, these helpers operate on real paths and evaluate extension, filename, pattern, and optional content-based signals.

Resolution may produce multiple matching FileType candidates. This is not a registry error. Instead, the resolver applies a deterministic precedence model and selects at most one effective winner. Candidate overlap is therefore allowed, but the final selection must remain stable for the same path, content, and effective registry state.

The module also constructs probe results for topmark probe. Probe result value objects live in topmark.resolution.probe, while the probe implementation remains here so it can share the exact same scoring and tie-break helpers used by effective resolution.

FileTypeCandidate dataclass

FileTypeCandidate(
    *, score, namespace, local_key, file_type
)

Describe a scored file type resolution candidate.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| score | int | Candidate precedence score; higher is better. |
| namespace | str | Candidate file type namespace. |
| local_key | str | Candidate file type local key. |
| file_type | FileType | Candidate FileType instance. |

FileTypeCandidateOrderKey dataclass

FileTypeCandidateOrderKey(
    *, score_rank, namespace, local_key
)

Deterministic ordering key for scored file type candidates.

The fields are ordered in comparison priority order so instances can be used directly as min() or sorted() keys.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| score_rank | int | Negated candidate score. Lower values sort first, so higher scores win when this key is used with min() or ascending sort. |
| namespace | str | Candidate file type namespace used as a deterministic tie-breaker. |
| local_key | str | Candidate file type local key used as a deterministic tie-breaker. |
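
As a minimal sketch of how such a key behaves, the following uses a hypothetical stand-in dataclass. OrderKey, make_key, and the namespace and key values are illustrative assumptions, not TopMark's actual definitions. It also assumes the real class compares field by field in declaration order, as the description above implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OrderKey:
    """Hypothetical stand-in for FileTypeCandidateOrderKey."""

    score_rank: int  # negated score, so higher scores sort first
    namespace: str   # first tie-breaker (ascending)
    local_key: str   # second tie-breaker (ascending)


def make_key(score: int, namespace: str, local_key: str) -> OrderKey:
    """Build a key from a raw score, negating it for ascending sorts."""
    return OrderKey(score_rank=-score, namespace=namespace, local_key=local_key)


# Illustrative values only; these namespaces and keys are made up.
keys = [
    make_key(80, "topmark", "yaml"),
    make_key(100, "topmark", "python"),
    make_key(100, "plugin", "python"),
]

best = min(keys)       # highest score wins; ties fall back to namespace
ranked = sorted(keys)  # best-first ordering
```

Here the two candidates with score 100 tie on score_rank, so the one whose namespace sorts first is selected, and the lower-scoring candidate is ranked last.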

MatchSignals dataclass

MatchSignals(*, extension, filename, pattern)

Name-based match signals for a FileType (extension, filename/tail, pattern).

any property

any

Whether any name-based signal matched (extension, filename, or pattern).
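
A minimal sketch of how such a property could be expressed, assuming the three fields are plain booleans. The field types are not stated on this page, and MatchSignalsSketch is a hypothetical stand-in rather than the real class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSignalsSketch:
    """Hypothetical stand-in for MatchSignals, assuming boolean fields."""

    extension: bool
    filename: bool
    pattern: bool

    @property
    def any(self) -> bool:
        # True as soon as at least one name-based signal matched.
        return self.extension or self.filename or self.pattern


only_filename = MatchSignalsSketch(extension=False, filename=True, pattern=False)
no_match = MatchSignalsSketch(extension=False, filename=False, pattern=False)
```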

candidate_order_key

candidate_order_key(candidate)

Return the deterministic ordering key for a file type candidate.

Candidates are ordered by:

1) score (descending)
2) namespace (ascending)
3) local key (ascending)

The returned ordered value object is intended to be used with min() or sorted() so that the highest-scoring candidate wins and exact score ties are resolved deterministically.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| candidate | FileTypeCandidate | Candidate being ranked. | required |

Returns:

| Type | Description |
| --- | --- |
| FileTypeCandidateOrderKey | Ordered candidate ranking key. |

Source code in src/topmark/resolution/filetypes.py
def candidate_order_key(
    candidate: FileTypeCandidate,
) -> FileTypeCandidateOrderKey:
    """Return the deterministic ordering key for a file type candidate.

    Candidates are ordered by:
      1) score (descending)
      2) namespace (ascending)
      3) local key (ascending)

    The returned ordered value object is intended to be used with `min()` or
    `sorted()` so that the highest-scoring candidate wins and exact score ties
    are resolved deterministically.

    Args:
        candidate: Candidate being ranked.

    Returns:
        Ordered candidate ranking key.
    """
    return FileTypeCandidateOrderKey(
        score_rank=-candidate.score,
        namespace=candidate.namespace,
        local_key=candidate.local_key,
    )

get_file_type_candidates_for_path

get_file_type_candidates_for_path(
    path,
    *,
    include_file_types=None,
    exclude_file_types=None,
)

Return candidate file types using name-based and optional content-based matching.

This helper centralizes the resolution logic used by ResolverStep. For each registered FileType, it computes name-based match signals, determines whether content probing is allowed via the file type's ContentGate, optionally calls the file type's content_matcher, and evaluates inclusion rules and scoring.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Filesystem path of the file being resolved. | required |
| include_file_types | Collection[str] \| None | Optional set of file type identifiers to include. Frozen config passes canonical qualified keys. Direct callers may pass public local identifiers when unambiguous. Empty collection means no whitelist filter. | None |
| exclude_file_types | Collection[str] \| None | Optional set of file type identifiers to exclude. Frozen config passes canonical qualified keys. Direct callers may pass public local identifiers when unambiguous. Empty collection means no blacklist filter. | None |

Returns:

| Type | Description |
| --- | --- |
| list[FileTypeCandidate] | Unsorted scored candidates. The caller is responsible for selecting the best candidate. |

Source code in src/topmark/resolution/filetypes.py
def get_file_type_candidates_for_path(
    path: Path,
    *,
    include_file_types: Collection[str] | None = None,
    exclude_file_types: Collection[str] | None = None,
) -> list[FileTypeCandidate]:
    """Return candidate file types using name-based and optional content-based matching.

    This helper centralizes the resolution logic used by `ResolverStep`.
    For each registered `FileType`, it computes name-based match signals,
    determines whether content probing is allowed via the file type's
    `ContentGate`, optionally calls the file type's `content_matcher`, and
    evaluates inclusion rules and scoring.

    Args:
        path: Filesystem path of the file being resolved.
        include_file_types: Optional set of file type identifiers to include.
            Frozen config passes canonical qualified keys. Direct callers may
            pass public local identifiers when unambiguous. Empty collection
            means no whitelist filter.
        exclude_file_types: Optional set of file type identifiers to exclude.
            Frozen config passes canonical qualified keys. Direct callers may
            pass public local identifiers when unambiguous. Empty collection
            means no blacklist filter.

    Returns:
        Unsorted scored candidates. The caller is responsible for selecting the
        best candidate.
    """
    drafts: list[_ProbeCandidateDraft] = _get_probe_candidate_drafts_for_path(
        path,
        include_file_types=include_file_types,
        exclude_file_types=exclude_file_types,
    )
    return [draft.candidate for draft in drafts]
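
The filter semantics described for include_file_types and exclude_file_types can be illustrated with a small sketch. passes_filters is a hypothetical helper, not part of TopMark. It ignores how public local identifiers are resolved to qualified keys, and it assumes that exclusion is checked after inclusion; neither detail is confirmed by this page.

```python
from collections.abc import Collection
from typing import Optional


def passes_filters(
    key: str,
    include: Optional[Collection[str]] = None,
    exclude: Optional[Collection[str]] = None,
) -> bool:
    """Return True when a file type identifier survives both filters.

    None and an empty collection both mean "no filter", matching the
    documented behavior for empty collections.
    """
    if include and key not in include:
        return False  # an include filter is active and the key is not listed
    if exclude and key in exclude:
        return False  # an exclude filter is active and the key is listed
    return True


# "python" and "markdown" are illustrative identifiers only.
unfiltered = passes_filters("python")
empty_include = passes_filters("python", include=[])
not_included = passes_filters("markdown", include={"python"})
excluded = passes_filters("python", exclude={"python"})
```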

probe_resolution_for_path

probe_resolution_for_path(
    path,
    *,
    include_file_types=None,
    exclude_file_types=None,
)

Resolve a path and return probe-visible explanation details.

This helper uses the shared scoring and deterministic tie-break model and returns all diagnostic details needed by the topmark probe command and probe-backed pipeline resolution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Filesystem path of the file being resolved. | required |
| include_file_types | Collection[str] \| None | Optional set of file type identifiers to include. Frozen config passes canonical qualified keys. Direct callers may pass public local identifiers when unambiguous. | None |
| exclude_file_types | Collection[str] \| None | Optional set of file type identifiers to exclude. Frozen config passes canonical qualified keys. Direct callers may pass public local identifiers when unambiguous. | None |

Returns:

| Type | Description |
| --- | --- |
| ResolutionProbeResult | Probe result containing candidates, selected file type, selected processor, status, and reason. |

Source code in src/topmark/resolution/filetypes.py
def probe_resolution_for_path(
    path: Path,
    *,
    include_file_types: Collection[str] | None = None,
    exclude_file_types: Collection[str] | None = None,
) -> ResolutionProbeResult:
    """Resolve a path and return probe-visible explanation details.

    This helper uses the shared scoring and deterministic tie-break model and
    returns all diagnostic details needed by the `topmark probe` command and
    probe-backed pipeline resolution.

    Args:
        path: Filesystem path of the file being resolved.
        include_file_types: Optional set of file type identifiers to include.
            Frozen config passes canonical qualified keys. Direct callers may
            pass public local identifiers when unambiguous.
        exclude_file_types: Optional set of file type identifiers to exclude.
            Frozen config passes canonical qualified keys. Direct callers may
            pass public local identifiers when unambiguous.

    Returns:
        Probe result containing candidates, selected file type, selected processor,
        status, and reason.
    """
    drafts: list[_ProbeCandidateDraft] = _get_probe_candidate_drafts_for_path(
        path,
        include_file_types=include_file_types,
        exclude_file_types=exclude_file_types,
    )
    if not drafts:
        return ResolutionProbeResult(
            path=path,
            status=ResolutionProbeStatus.UNSUPPORTED,
            reason=ResolutionProbeReason.NO_CANDIDATES,
            candidates=(),
            selected_file_type=None,
            selected_processor=None,
        )

    # Rank best-first: highest score, then namespace, then local key.
    ranked_drafts: list[_ProbeCandidateDraft] = sorted(
        drafts,
        key=lambda draft: candidate_order_key(draft.candidate),
    )
    best_draft: _ProbeCandidateDraft = ranked_drafts[0]
    best_candidate: FileTypeCandidate = best_draft.candidate

    # If several candidates share the top score, the winner was decided by
    # the namespace/local-key tie-break rather than by score alone.
    top_score: int = best_candidate.score
    top_candidates: list[FileTypeCandidate] = [
        draft.candidate for draft in ranked_drafts if draft.candidate.score == top_score
    ]
    reason: ResolutionProbeReason = (
        ResolutionProbeReason.SELECTED_BY_TIE_BREAK
        if len(top_candidates) > 1
        else ResolutionProbeReason.SELECTED_HIGHEST_SCORE
    )

    selected_file_type = ResolutionProbeSelection(
        qualified_key=best_candidate.file_type.qualified_key,
        namespace=best_candidate.namespace,
        local_key=best_candidate.local_key,
        score=best_candidate.score,
    )

    from topmark.registry.registry import Registry

    processor: HeaderProcessor | None = Registry.resolve_processor(
        best_candidate.file_type.qualified_key
    )
    selected_processor: ResolutionProbeSelection | None = None
    status = ResolutionProbeStatus.RESOLVED
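    if processor is None:
        # A file type was selected, but no processor is bound to it; this
        # overrides the selection reason computed above.
        status = ResolutionProbeStatus.NO_PROCESSOR
        reason = ResolutionProbeReason.SELECTED_FILE_TYPE_HAS_NO_BOUND_PROCESSOR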
    else:
        selected_processor = ResolutionProbeSelection(
            qualified_key=processor.qualified_key,
            namespace=processor.namespace,
            local_key=processor.local_key,
        )

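    # Report every candidate; tie_break_rank is its 1-based position in the
    # deterministic best-first ranking.
    probe_candidates: list[ResolutionProbeCandidate] = []
    for rank, draft in enumerate(ranked_drafts, start=1):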
        candidate = draft.candidate
        probe_candidates.append(
            ResolutionProbeCandidate(
                qualified_key=candidate.file_type.qualified_key,
                namespace=candidate.namespace,
                local_key=candidate.local_key,
                score=candidate.score,
                selected=candidate.file_type.qualified_key
                == best_candidate.file_type.qualified_key,
                tie_break_rank=rank,
                match=draft.match,
            )
        )

    return ResolutionProbeResult(
        path=path,
        status=status,
        reason=reason,
        candidates=tuple(probe_candidates),
        selected_file_type=selected_file_type,
        selected_processor=selected_processor,
    )
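
The selection and reason logic in the listing above can be condensed into a self-contained sketch. Candidates are modeled here as plain (score, namespace, local_key) tuples, and the Reason enum with its string values is a hypothetical stand-in for ResolutionProbeReason; only the member names are taken from the source. Processor lookup is left out.

```python
from enum import Enum
from typing import Optional

# (score, namespace, local_key); a simplified stand-in for FileTypeCandidate.
Candidate = tuple[int, str, str]


class Reason(Enum):
    """Hypothetical stand-in; the string values are illustrative."""

    NO_CANDIDATES = "no_candidates"
    SELECTED_HIGHEST_SCORE = "selected_highest_score"
    SELECTED_BY_TIE_BREAK = "selected_by_tie_break"


def select(candidates: list[Candidate]) -> tuple[Reason, Optional[Candidate]]:
    """Pick the winner and explain why, mirroring the ranking shown above."""
    if not candidates:
        return Reason.NO_CANDIDATES, None
    # Best-first: score descending, then namespace and local key ascending.
    ranked = sorted(candidates, key=lambda c: (-c[0], c[1], c[2]))
    best = ranked[0]
    tied = [c for c in ranked if c[0] == best[0]]
    reason = (
        Reason.SELECTED_BY_TIE_BREAK
        if len(tied) > 1
        else Reason.SELECTED_HIGHEST_SCORE
    )
    return reason, best


# Illustrative inputs; the namespaces and keys are made up.
tie_result = select([(100, "b", "x"), (100, "a", "x"), (50, "a", "y")])
clear_result = select([(100, "b", "x"), (50, "a", "y")])
empty_result = select([])
```

Because the ranking key is total and depends only on score, namespace, and local key, the same inputs always produce the same winner regardless of the order in which candidates were discovered.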