
Registry Model

TopMark uses a layered registry architecture to manage:

  • file type identities
  • header processor identities
  • bindings between file types and processors
  • runtime overlays and extensions
  • resolver and probe integration

The registry model is explicit, deterministic, overlay-based, and composition-oriented. Identity registration and processor binding are separate operations.

This page describes the registry model in detail. The broader system architecture is documented in Architecture overview.

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.


Runtime model overview

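The runtime registry model is primarily composed of:

  • FileTypeRegistry, which manages file type identities;
  • HeaderProcessorRegistry, which manages header processor identities;
  • BindingRegistry, which manages file-type-to-processor bindings;
  • Registry, the read-only facade over their effective composed views.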

These runtime registry objects participate in stable runtime behavior such as:

  • file type resolution
  • processor dispatch
  • policy lookup
  • pipeline execution
  • CLI introspection
  • machine-readable output rendering

Note

User-facing documentation intentionally focuses on stable runtime behavior and public CLI contracts rather than internal implementation objects.

Advanced registry behavior and overlay mutation semantics are documented here for maintainers, plugin authors, and advanced integrators.


Design goals

The registry model exists to make TopMark extensible without making runtime behavior implicit or order-dependent.

Earlier process-global mutable registries made tests order-dependent and blurred the distinction between introspection and mutation.

The current model keeps base registry data immutable and confines mutation to explicit overlay state.

The main goals are:

  1. deterministic behavior across CLI, API, tests, and documentation generation;
  2. safe extensibility for plugins and tests;
  3. clear separation between introspection and mutation;
  4. efficient composition of effective runtime registry views;
  5. test isolation for registry overlays;
  6. a single effective registry view for resolver, pipeline, API, and CLI behavior.

Base registries and overlays

TopMark composes effective runtime registries from immutable base registry data plus mutable overlay state.

Base registries contain:

  • built-in file types;
  • discovered file type plugins;
  • built-in processor definitions;
  • built-in file-type-to-processor bindings.

Overlay state contains process-local additions and removals requested by tests, plugins, runtime extensions, or advanced integrations.

The effective composed runtime registry view is:

base registry + overlay additions - overlay removals

Base registries are not mutated by overlay operations.

This allows TopMark to keep built-in registry state immutable while still supporting runtime extension and isolated tests.

flowchart TB
    subgraph BASE[Base registries]
        BFT["Base FileTypes<br/>(built-ins + plugins)"]
        BPR["Base Processors<br/>(built-ins)"]
        BBD["Base Bindings<br/>(built-ins)"]
    end

    subgraph OVER[Overlay state]
        OFT["FileType overlays"]
        OPR["Processor overlays"]
        OBD["Binding overlays"]
    end

    BFT --> EFT["Effective FileType view"]
    OFT --> EFT

    BPR --> EPR["Effective Processor view"]
    OPR --> EPR

    BBD --> EBD["Effective Binding view"]
    OBD --> EBD

    EFT --> RES["File type resolution"]
    EBD --> BIND["Processor binding lookup"]
    EPR --> BIND
    RES --> BIND
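The composition rule above can be illustrated with a small, self-contained sketch. It does not call TopMark: `compose_view`, `BASE`, and the plain-string values are hypothetical stand-ins for registry entries. It shows the base mapping being copied rather than mutated, removals applied after additions, and the result exposed read-only through `types.MappingProxyType`.

```python
from collections.abc import Mapping
from types import MappingProxyType


def compose_view(
    base: Mapping[str, str],
    additions: Mapping[str, str],
    removals: frozenset[str],
) -> Mapping[str, str]:
    """Return a read-only effective view: base + additions - removals."""
    merged = dict(base)  # work on a copy; the base mapping is never mutated
    merged.update(additions)  # overlay additions are layered on top
    for key in removals:  # overlay removals hide entries and are applied last
        merged.pop(key, None)
    return MappingProxyType(merged)


# Hypothetical entries keyed by qualified file type identifier.
BASE = MappingProxyType({"topmark:python": "python", "topmark:markdown": "markdown"})

effective = compose_view(
    BASE,
    additions={"acme:django_html": "django_html"},
    removals=frozenset({"topmark:markdown"}),
)
print(sorted(effective))  # ['acme:django_html', 'topmark:python']
print(sorted(BASE))  # ['topmark:markdown', 'topmark:python']
```

Returning a proxy rather than the dict itself means callers of the effective view cannot write back into it.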

Registry layers

TopMark separates identity registries from relationship registries.

This separation is part of the stable 1.x registry architecture contract.

FileTypeRegistry

FileTypeRegistry manages file type identities.

Each file type has:

  • namespace
  • local key
  • qualified key
  • extensions
  • resolver and matching metadata

Examples of local identifiers:

python
markdown

Examples of canonical qualified identifiers:

topmark:python
topmark:markdown

TopMark normalizes file type identifiers to canonical qualified keys.

Local identifiers are accepted only when unambiguous.
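The identity fields listed above can be modeled as an immutable record whose qualified key is derived from its namespace and local key. This is a hypothetical sketch: `FileTypeIdentity` is not TopMark's `FileType` class, and resolver and matching metadata are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileTypeIdentity:
    """Hypothetical identity record; not TopMark's FileType class."""

    namespace: str  # e.g. "topmark" or a plugin namespace such as "acme"
    local_key: str  # e.g. "python"
    extensions: tuple[str, ...] = ()  # other resolver metadata is omitted

    @property
    def qualified_key(self) -> str:
        # Canonical qualified key: "<namespace>:<local key>".
        return f"{self.namespace}:{self.local_key}"


builtin = FileTypeIdentity("topmark", "python", extensions=(".py",))
plugin = FileTypeIdentity("acme", "python")
print(builtin.qualified_key)  # topmark:python
print(plugin.qualified_key)  # acme:python
print(builtin.local_key == plugin.local_key)  # True: same local key, distinct identities
```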

HeaderProcessorRegistry

HeaderProcessorRegistry manages header processor identities.

Processors remain independent from file types. This allows:

  • multiple file types to share a processor;
  • processor bindings to change without redefining file types;
  • runtime overlays and plugin integration.

BindingRegistry

BindingRegistry manages relationships between file types and processors.

Bindings define:

  • which processor is selected for a file type;
  • whether a recognized file type is supported;
  • which processor participates in header operations.

This separation prevents implicit side effects between identity registration and processor binding.
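The separation between identity and relationship registries can be sketched with three plain tables. `IdentityAndBindingTables` and the processor key `acme:hash_comments` are hypothetical; rejecting bindings to unknown identities is a choice made for this sketch, not documented TopMark behavior.

```python
class IdentityAndBindingTables:
    """Hypothetical model: two identity tables plus one relationship table."""

    def __init__(self) -> None:
        self.filetypes: set[str] = set()  # file type identities (qualified keys)
        self.processors: set[str] = set()  # processor identities (qualified keys)
        self.bindings: dict[str, str] = {}  # file type key -> processor key

    def register_filetype(self, key: str) -> None:
        self.filetypes.add(key)  # no binding is created as a side effect

    def register_processor(self, key: str) -> None:
        self.processors.add(key)  # likewise, no file type is touched

    def bind(self, filetype_key: str, processor_key: str) -> None:
        # The relationship is explicit; this sketch rejects unknown identities.
        if filetype_key not in self.filetypes:
            raise KeyError(filetype_key)
        if processor_key not in self.processors:
            raise KeyError(processor_key)
        self.bindings[filetype_key] = processor_key


tables = IdentityAndBindingTables()
tables.register_filetype("topmark:python")
tables.register_filetype("acme:python")
tables.register_processor("acme:hash_comments")  # hypothetical processor key
print(tables.bindings)  # {}

# Two file types share one processor without redefining either identity.
tables.bind("topmark:python", "acme:hash_comments")
tables.bind("acme:python", "acme:hash_comments")
print(len(tables.bindings))  # 2
```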


Registry facade

Registry provides the stable, read-only facade over the effective composed runtime registries and exposes immutable views of them.

The stable public-facing runtime facade is topmark.registry.registry.Registry.

Most integrations should prefer the facade rather than interacting directly with advanced registry mutation APIs.

Examples:

from topmark.registry.registry import Registry

for ft in Registry.filetypes().values():
    print(ft.qualified_key)

from topmark.registry.registry import Registry

for binding in Registry.bindings():
    print(binding.file_type_key, binding.processor_key)

Public facade vs advanced registries

The stable public-facing runtime registry entry point is the Registry facade (topmark.registry.registry.Registry).

It exposes read-only effective views and is suitable for introspection.

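The advanced registries are:

  • FileTypeRegistry (topmark.registry.filetypes)
  • HeaderProcessorRegistry (topmark.registry.processors)
  • BindingRegistry (topmark.registry.bindings)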

These registries provide overlay mutation helpers such as registration, unregistration, binding, and unbinding. They are intended for:

  • tests;
  • plugins;
  • advanced integrations.

Overlay mutation helpers affect overlay state only. They do not mutate immutable built-in or plugin-discovered base registry entries.


Qualified vs local identifiers

TopMark accepts file type identifiers in either:

  • local form (python);
  • qualified form (topmark:python).

Identifiers normalize to canonical qualified keys.

Local identifiers are accepted only when unambiguous.

If multiple registered file types share the same local identifier, callers must use the qualified form.

Examples:

topmark:python
acme:python

In this situation:

python

is ambiguous.

Use:

topmark:python

instead.

Advanced registry-facing APIs normalize and resolve identifiers through FileTypeRegistry.resolve_filetype_id(...), which returns the matching FileType instance from the effective composed runtime registry.

flowchart LR
    INPUT["Public identifier<br/>python or topmark:python"]
    RESOLVE["FileTypeRegistry.resolve_filetype_id(...)"]
    FT["FileType<br/>qualified_key = topmark:python"]
    RUNTIME["Resolver, filters,<br/>policy lookup, bindings"]

    INPUT --> RESOLVE --> FT --> RUNTIME
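The normalization rules above (exact match for qualified identifiers, local identifiers only when unambiguous, no fuzzy fallback) can be sketched as a pure function over a set of qualified keys. `normalize_identifier` and `AmbiguousIdentifierError` are hypothetical names; the real `FileTypeRegistry.resolve_filetype_id(...)` returns a `FileType` instance, and its error types are not shown here.

```python
from collections.abc import Iterable


class AmbiguousIdentifierError(LookupError):
    """Raised when a local identifier exists in more than one namespace."""


def normalize_identifier(identifier: str, qualified_keys: Iterable[str]) -> str:
    """Hypothetical helper: map a local or qualified identifier to a qualified key."""
    keys = set(qualified_keys)  # every key is assumed to have the form "ns:local"
    if ":" in identifier:
        # Qualified form: exact match only, with no fuzzy or namespace fallback.
        if identifier in keys:
            return identifier
        raise LookupError(f"unknown file type: {identifier}")
    # Local form: accepted only when exactly one namespace defines it.
    matches = sorted(key for key in keys if key.partition(":")[2] == identifier)
    if not matches:
        raise LookupError(f"unknown file type: {identifier}")
    if len(matches) > 1:
        raise AmbiguousIdentifierError(f"{identifier!r} is ambiguous: {', '.join(matches)}")
    return matches[0]


registered = ["topmark:python", "topmark:markdown", "acme:python"]
print(normalize_identifier("markdown", registered))  # topmark:markdown
print(normalize_identifier("topmark:python", registered))  # topmark:python
try:
    normalize_identifier("python", registered)
except AmbiguousIdentifierError as exc:
    print(exc)  # 'python' is ambiguous: acme:python, topmark:python
```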

Recognized vs supported file types

A file type is recognized if its file type identifier exists in FileTypeRegistry.

A file type is supported if it is recognized and has an effective binding through BindingRegistry to a registered processor definition in HeaderProcessorRegistry.

A file type may be recognized but still unbound. Such a file type is recognized but not supported:

  • it participates in discovery and filtering;
  • it may appear in results depending on the selected report scope;
  • no header insertion or removal is attempted.
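The two definitions can be expressed as predicates over the three registries. This sketch uses plain sets and a dict in place of the real registries; `is_recognized`, `is_supported`, and the keys `acme:notes` and `acme:hash_comments` are hypothetical.

```python
from collections.abc import Container, Mapping


def is_recognized(ft_key: str, filetypes: Container[str]) -> bool:
    # Recognized: the identity exists in the file type registry.
    return ft_key in filetypes


def is_supported(
    ft_key: str,
    filetypes: Container[str],
    processors: Container[str],
    bindings: Mapping[str, str],
) -> bool:
    # Supported: recognized, bound, and bound to a registered processor.
    processor_key = bindings.get(ft_key)
    return (
        is_recognized(ft_key, filetypes)
        and processor_key is not None
        and processor_key in processors
    )


filetypes = {"topmark:python", "acme:notes"}  # hypothetical keys
processors = {"acme:hash_comments"}
bindings = {"topmark:python": "acme:hash_comments"}

for key in sorted(filetypes):
    print(key, is_recognized(key, filetypes), is_supported(key, filetypes, processors, bindings))
# acme:notes True False
# topmark:python True True
```

Here `acme:notes` is recognized but unbound, so it would still take part in discovery and filtering while no header operations are attempted for it.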

Resolver integration

The resolver and probe system operate on canonical qualified file type identities.

This affects:

  • include/exclude file-type filters;
  • policy lookup;
  • runtime bindings;
  • probe diagnostics;
  • CLI filtering;
  • API overlays.

Resolver and probe APIs:


Plugin integration

File type plugins are discovered through the topmark.filetypes entry point group.

Plugin-defined file types participate in the same composed runtime registry and identifier semantics as built-in file types.

Plugin authors should:

  • use a stable namespace such as acme or my_plugin;
  • choose clear local keys such as django_html or my_lang;
  • document and use qualified identifiers such as acme:django_html in shared examples;
  • avoid relying on local identifiers remaining unambiguous as ecosystems grow.

Header processor plugins currently use advanced runtime-overlay integration semantics. They should bind processor definitions to canonical qualified file type identifiers.

For a plugin-focused guide, see Plugins and extensibility.


Registry composition

The effective runtime registry is always derived from immutable base registry data plus overlay state.

Overlay mutations never mutate built-in or plugin-discovered base entries directly. Instead, TopMark recomposes effective runtime registry views from:

base registry + overlay additions - overlay removals

This composition-oriented architecture keeps runtime behavior deterministic while still supporting tests, plugins, runtime extensions, and advanced integrations.


Runtime overlays

Advanced integrations may apply runtime overlay mutations. Each mutation invalidates the composed effective-view caches, as described in Caching and invalidation.

Examples include:

  • plugins;
  • tests;
  • temporary runtime bindings;
  • integration-specific file types.

Overlay mutations affect only overlay state layered on top of immutable base registry data.

Overlay operations may:

  • register or unregister file types;
  • register or unregister processors;
  • bind or unbind processors to file types.

Overlay mutations are:

  • process-local;
  • overlay-only;
  • thread-safe;
  • cache-invalidating.

They do not mutate built-in or plugin-discovered base registry entries.

Overlay state exists specifically to support:

  • isolated tests;
  • temporary runtime extensions;
  • advanced integration scenarios;
  • plugin composition without mutating built-ins.

After overlay mutation, the next effective registry read recomposes the effective runtime view from:

base registry + overlay additions - overlay removals

Most integrations should prefer the stable Registry facade and avoid direct overlay mutation unless runtime extension behavior is explicitly required.
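The overlay properties listed above (process-local, overlay-only, thread-safe) can be illustrated with a lock-protected overlay over a read-only base. `OverlayState` is a hypothetical class, not TopMark's implementation; using a single `threading.Lock` is an assumption of this sketch, and cache invalidation is left out to keep the focus on mutation.

```python
import threading
from collections.abc import Mapping
from types import MappingProxyType


class OverlayState:
    """Hypothetical process-local overlay guarded by a lock."""

    def __init__(self, base: Mapping[str, str]) -> None:
        self._base = MappingProxyType(dict(base))  # read-only; never mutated
        self._added: dict[str, str] = {}
        self._removed: set[str] = set()
        self._lock = threading.Lock()

    @property
    def base(self) -> Mapping[str, str]:
        return self._base

    def register(self, key: str, value: str) -> None:
        with self._lock:  # serialize concurrent overlay mutations
            self._added[key] = value
            self._removed.discard(key)

    def unregister(self, key: str) -> None:
        with self._lock:
            self._added.pop(key, None)
            self._removed.add(key)  # hides a base entry instead of deleting it

    def effective(self) -> dict[str, str]:
        with self._lock:  # compose from a consistent overlay snapshot
            merged = {**self._base, **self._added}
            for key in self._removed:
                merged.pop(key, None)
            return merged


state = OverlayState({"topmark:python": "python"})
workers = [
    threading.Thread(target=state.register, args=(f"acme:lang{i}", f"lang{i}"))
    for i in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
state.unregister("topmark:python")
print(len(state.effective()))  # 8
print(dict(state.base))  # {'topmark:python': 'python'}
```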


Caching and invalidation

Base registries are cached because construction and plugin discovery should happen once per process.

Composed effective views are also cached for fast repeated access.

Any overlay mutation invalidates the composed effective-view cache. The next call to an effective view, such as as_mapping() or the Registry facade, recomposes the view from base registry data and overlay state.

Practical consequences:

  • overlay mutations remain lightweight;
  • registry reads remain fast;
  • tests that mutate overlays must clean them up;
  • callers do not need to manage composed cache invalidation manually.
sequenceDiagram
    autonumber

    participant Caller
    participant FTR as FileTypeRegistry
    participant HPR as HeaderProcessorRegistry
    participant BR as BindingRegistry

    Caller->>FTR: register()/unregister()
    FTR->>FTR: update overlays
    FTR->>FTR: invalidate composed cache

    Caller->>HPR: register()/unregister()
    HPR->>HPR: update overlays
    HPR->>HPR: invalidate composed cache

    Caller->>BR: bind()/unbind()
    BR->>BR: update overlays
    BR->>BR: invalidate composed cache

    Note over Caller,BR: Later...

    Caller->>FTR: as_mapping()
    FTR->>FTR: compose effective view
    FTR-->>Caller: cached mapping

    Caller->>HPR: as_mapping()
    HPR->>HPR: compose effective view
    HPR-->>Caller: cached mapping

    Caller->>BR: as_mapping()
    BR->>BR: compose effective view
    BR-->>Caller: cached mapping
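The caching behavior described above (base registry built once per process, composed view cached, mutation only invalidates, next read recomposes) can be sketched as follows. `base_registry`, `CachedView`, and the `CALLS` counters are hypothetical instrumentation for the demonstration; removals and locking are omitted for brevity.

```python
import functools
from collections.abc import Mapping
from types import MappingProxyType

CALLS = {"base": 0, "compose": 0}  # instrumentation for this demo only


@functools.cache
def base_registry() -> Mapping[str, str]:
    """Built once per process; stands in for built-ins plus plugin discovery."""
    CALLS["base"] += 1
    return MappingProxyType({"topmark:python": "python"})


class CachedView:
    """Hypothetical composed-view cache, cleared by every overlay mutation."""

    def __init__(self) -> None:
        self._added: dict[str, str] = {}
        self._cache: Mapping[str, str] | None = None

    def register(self, key: str, value: str) -> None:
        self._added[key] = value
        self._cache = None  # invalidate only; nothing is recomposed here

    def as_mapping(self) -> Mapping[str, str]:
        if self._cache is None:  # recompose lazily on the next read
            CALLS["compose"] += 1
            self._cache = MappingProxyType({**base_registry(), **self._added})
        return self._cache


view = CachedView()
first = view.as_mapping()
print(view.as_mapping() is first)  # True: repeated reads reuse the cached view
view.register("acme:lang", "lang")
second = view.as_mapping()  # first read after the mutation recomposes
print(sorted(second))  # ['acme:lang', 'topmark:python']
print(CALLS)  # {'base': 1, 'compose': 2}
```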

Runtime extension example

from topmark.registry.bindings import BindingRegistry
from topmark.registry.filetypes import FileTypeRegistry
from topmark.registry.processors import HeaderProcessorRegistry

# Assumes `ft` is a FileType instance and `MyProcessor` is a header
# processor class, both defined elsewhere.

# Register the file type identity (overlay state only).
FileTypeRegistry.register(ft)

# Register the processor identity.
proc_def = HeaderProcessorRegistry.register(
    processor_class=MyProcessor,
)

# Bind the file type to the processor using canonical qualified keys.
BindingRegistry.bind(
    file_type_key=ft.qualified_key,
    processor_key=proc_def.qualified_key,
)

Cleanup should reverse the same steps explicitly:

BindingRegistry.unbind(ft.qualified_key)
HeaderProcessorRegistry.unregister(proc_def.qualified_key)
FileTypeRegistry.unregister(ft.qualified_key)

When registering processors against file type identities, prefer qualified file type identifiers such as topmark:python or my_plugin:django_html once multiple namespaces are in play. Local identifiers remain supported when unambiguous, but may become ambiguous as extensions are added.

For long-term or redistributable extensions, prefer publishing a plugin using the topmark.filetypes entry point group.
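Because overlay state is process-global, tests should undo their registrations even when an assertion fails partway through. One way to guarantee the reverse-order cleanup shown above is `contextlib.ExitStack`. In this sketch, `InMemoryRegistry` and `temporary_extension` are hypothetical stand-ins; with the real registries, the callbacks would instead call the `unbind` and `unregister` methods shown above.

```python
from collections.abc import Iterator
from contextlib import ExitStack, contextmanager


class InMemoryRegistry:
    """Hypothetical stand-in for one overlay-backed registry."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def register(self, key: str, value: str) -> None:
        self.entries[key] = value

    def unregister(self, key: str) -> None:
        self.entries.pop(key, None)


filetypes, processors, bindings = InMemoryRegistry(), InMemoryRegistry(), InMemoryRegistry()


@contextmanager
def temporary_extension(ft_key: str, proc_key: str) -> Iterator[None]:
    """Register a file type, a processor, and a binding; always undo all three."""
    with ExitStack() as stack:
        filetypes.register(ft_key, "file type")
        stack.callback(filetypes.unregister, ft_key)
        processors.register(proc_key, "processor")
        stack.callback(processors.unregister, proc_key)
        bindings.register(ft_key, proc_key)
        stack.callback(bindings.unregister, ft_key)
        # Callbacks run last-in, first-out: unbind, then the processor, then the file type.
        yield


with temporary_extension("acme:notes", "acme:notes_processor"):
    print(bindings.entries)  # {'acme:notes': 'acme:notes_processor'}
print(filetypes.entries, processors.entries, bindings.entries)  # {} {} {}
```

In a pytest suite, the same context manager could back a fixture that yields inside the `with` block.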


Registry CLI commands

TopMark provides registry inspection commands.

Examples:

topmark registry filetypes
topmark registry processors
topmark registry bindings

These commands expose the effective composed runtime registry view.

Use:

topmark registry --help

for available subcommands and output options.


Why not per-run registries?

Registries are intentionally process-global rather than threaded through every runtime layer as per-run registry objects.

Reasons include:

  • registry contents affect discovery, resolution, bindings, and pipeline execution;
  • threading registry objects through every API would significantly complicate the runtime model;
  • most users do not need per-run registry customization;
  • overlay mutation already provides explicit runtime-extension behavior when required.

Configuration controls which file types participate in a run.

Registries control which file types, processors, and bindings exist in the effective runtime environment.


Non-goals

The registry model is not designed to provide:

  • transactional registry mutation in production code;
  • fuzzy matching for file type identifiers;
  • implicit namespace fallback or fuzzy namespace resolution;
  • silent mutation of built-in or plugin-provided base entries;
  • per-run registry objects passed through every runtime layer.

Configuration controls which file types are selected for a run. Registries control what file types, processors, and bindings exist in the effective runtime environment.


Stability model

The stable public API surface is defined by:

  • topmark.api
  • the CLI contract
  • documented DTOs and result views

Registry internals are documented for maintainers and advanced integrators, but registry overlay mutation behavior intentionally remains more flexible than the stable topmark.api execution API.

Most integrations should prefer topmark.api and the read-only Registry facade rather than mutating advanced registries directly.