
Plugins and extensibility

TopMark supports extensibility via plugins that provide:

  1. File types (definitions of how TopMark recognizes files), and optionally
  2. Header processors (implementations that can detect/insert/update/strip headers for those file types).

This page documents the supported plugin extension points for TopMark 1.x.

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

For the lower-level registry architecture, composed registry views, bindings, overlays, and identity semantics, see Registry model.


Conceptual model

Plugins extend TopMark by contributing file type definitions and, for advanced integrations, runtime processor overlay registrations.

The detailed registry architecture is documented in Registry model. In short:

  • file type plugins are loaded through the topmark.filetypes entry point group;
  • built-in processors are defined by TopMark's internal binding inventory;
  • advanced processor integrations use runtime overlays;
  • CLI and API execution use the effective composed runtime registry view.

Plugin authors should treat qualified file type identifiers, such as my_plugin:my_lang, as the stable reference once custom namespaces are involved.


Extension points

File types are discovered through Python entry points. TopMark loads:

  • built-in file types from a small set of internal modules, and
  • plugin file types from the topmark.filetypes entry point group.

A plugin contributes one or more FileType objects through that entry point.

When loaded: lazily, when TopMark first performs file-type resolution.
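
To check which providers are visible to the topmark.filetypes group in an environment, the standard library's importlib.metadata can list them. This is a minimal sketch of the generic entry point mechanism, not TopMark's own loader; the helper name list_filetype_entry_points is hypothetical.

```python
from importlib.metadata import entry_points

GROUP = "topmark.filetypes"  # entry point group named on this page


def list_filetype_entry_points(group: str = GROUP) -> list:
    """Return sorted (name, target) pairs for a group, without importing the targets."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry point collection
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict keyed by group name
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in list_filetype_entry_points():
        print(f"{name} -> {target}")
```

Listing entry points does not import the provider modules, which mirrors the lazy-loading behaviour described above: nothing is imported until the provider is actually needed.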


Header processors (advanced / internal-facing)

Built-in header processors are declared explicitly in TopMark's internal processor binding inventory and instantiated when the base processor registry is constructed.

Advanced integrations and tests may still register additional processor classes at runtime through topmark.registry.registry.Registry or topmark.registry.processors.HeaderProcessorRegistry. These registrations are applied as overlay-only mutations layered on top of the immutable internal base registry.


Registration order and runtime overlays

TopMark uses explicit base registries plus overlay registries:

  • base file types are loaded from built-ins and file-type entry points;
  • base processors are constructed from explicit built-in bindings;
  • runtime additions and removals are applied as overlays via topmark.registry.*.

This means plugin-defined file types must still be available before a processor class is registered against them, but processor registration no longer depends on module import order or decorator side effects.

Path-based file type selection is performed by the shared scoring resolver in topmark.resolution.filetypes. The formal selection and ambiguity policy is documented in resolution.md.
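
The base-plus-overlay idea can be illustrated with a small stand-alone model. This is only a sketch of the concept, assuming a simple key-to-value mapping; the class ComposedRegistry and its methods are hypothetical and are not TopMark's implementation.

```python
from __future__ import annotations

from types import MappingProxyType
from typing import Mapping


class ComposedRegistry:
    """Toy composed view: an immutable base plus overlay additions and removals."""

    def __init__(self, base: Mapping[str, str]) -> None:
        self._base = MappingProxyType(dict(base))  # read-only snapshot of the base
        self._added: dict[str, str] = {}           # overlay registrations
        self._removed: set[str] = set()            # overlay removals

    @property
    def base(self) -> Mapping[str, str]:
        return self._base

    def register(self, key: str, value: str) -> None:
        # Adding a key cancels any earlier overlay removal of that key.
        self._removed.discard(key)
        self._added[key] = value

    def unregister(self, key: str) -> None:
        # Hide the key from the effective view; the base itself is untouched.
        self._added.pop(key, None)
        self._removed.add(key)

    def effective(self) -> Mapping[str, str]:
        merged = {k: v for k, v in self._base.items() if k not in self._removed}
        merged.update(self._added)
        return MappingProxyType(merged)
```

In this model, removing a built-in entry only hides it in the effective view, so the base data can always be recovered by discarding the overlay.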


Writing a FileType plugin

File type identity: name and namespace

Every FileType has two identity components:

  • namespace: identifies the producer, such as topmark, acme, or my_plugin
  • name: the local file type key within that namespace

TopMark reserves the namespace topmark (the internal constant TOPMARK_NAMESPACE) for built-in file types.

Plugin guidance:

  • Set namespace to your package or organization identifier, for example "acme" or "my_company".
  • Choose a clear local name, for example "django_html" or "my_lang".
  • Use qualified file type identities, such as "acme:django_html", in shared configuration, processor bindings, and documentation.

Note: namespace is mandatory for both file types and processors. The built-in namespace topmark is reserved for TopMark-provided types.

TopMark normalizes file type identifiers to canonical qualified keys of the form <namespace>:<name>.

TopMark accepts both:

  • local identifiers such as "python", when unambiguous;
  • qualified identifiers such as "topmark:python" or "acme:django_html".

A local identifier is unambiguous only if exactly one file type in the effective composed registry uses it. If multiple file types share the same local identifier, callers must use the qualified form.

Registry-facing APIs resolve identifiers through FileTypeRegistry.resolve_filetype_id(...).
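
The resolution rule for local and qualified identifiers can be sketched as follows. This is an illustration of the rule described above, not the body of FileTypeRegistry.resolve_filetype_id(...); the function resolve_id_sketch and both exception classes are hypothetical.

```python
from __future__ import annotations


class UnknownIdentifierError(LookupError):
    """Raised when no registered file type matches the identifier."""


class AmbiguousIdentifierError(LookupError):
    """Raised when a local identifier matches more than one namespace."""


def resolve_id_sketch(identifier: str, known: set[str]) -> str:
    """Map a local or qualified identifier to a canonical '<namespace>:<name>' key."""
    if ":" in identifier:
        # Qualified form: must match a registered key exactly.
        if identifier not in known:
            raise UnknownIdentifierError(identifier)
        return identifier
    # Local form: collect every qualified key whose name part matches.
    matches = sorted(q for q in known if q.split(":", 1)[1] == identifier)
    if not matches:
        raise UnknownIdentifierError(identifier)
    if len(matches) > 1:
        raise AmbiguousIdentifierError(f"{identifier!r} matches {matches}")
    return matches[0]
```

With known keys "topmark:python", "topmark:html", and "acme:html", the local identifier "python" resolves to "topmark:python", while "html" is rejected as ambiguous and must be written as "topmark:html" or "acme:html".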

For the complete identity contract, see Registry model.

1) Create a provider function

Create a module with a provider function that returns an iterable of FileType objects.

Example:

# my_topmark_plugin/filetypes.py
from __future__ import annotations

from topmark.filetypes.model import FileType

def provide_filetypes() -> list[FileType]:
    return [
        FileType(
            name="my_lang",
            namespace="my_plugin",
            extensions=[".mylang"],
            filenames=[],
            patterns=[],
            description="MyLang source files",
            skip_processing=False,
        )
    ]

TopMark provides a small helper factory that simplifies constructing multiple file types that share the same namespace.

from topmark.filetypes.factory import make_filetype_factory

make_my_ft = make_filetype_factory(namespace="my_plugin")

MY_FILETYPE = make_my_ft(
    name="my_lang",
    description="MyLang source files",
    extensions=[".mylang"],
)

This avoids repeating the namespace argument and ensures that all FileType instances created by the plugin share the correct identity.

The factory only constructs FileType objects.

Registration still happens when TopMark loads file types through the topmark.filetypes entry point group.
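
The idea behind a namespace-bound factory can be sketched with functools.partial. This is an illustration of the pattern only, using a stand-in dataclass instead of TopMark's FileType; FileTypeStandIn and make_factory_sketch are hypothetical names, and the real make_filetype_factory may be implemented differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class FileTypeStandIn:
    """Stand-in for a file type definition (not TopMark's FileType)."""

    name: str
    namespace: str
    description: str = ""
    extensions: tuple[str, ...] = ()


def make_factory_sketch(namespace: str):
    """Return a constructor with the namespace argument already bound."""
    return partial(FileTypeStandIn, namespace=namespace)


make_my_ft = make_factory_sketch("my_plugin")

# Both definitions share the namespace without repeating it.
MY_LANG = make_my_ft(name="my_lang", extensions=(".mylang",))
MY_CONFIG = make_my_ft(name="my_config", extensions=(".mycfg",))
```

Binding the namespace once keeps every definition in the plugin under the same qualified prefix, here my_plugin:my_lang and my_plugin:my_config.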

2) Register the entry point

In your plugin's pyproject.toml:

[project.entry-points."topmark.filetypes"]
my_topmark_plugin = "my_topmark_plugin.filetypes:provide_filetypes"

This registers your file type provider so TopMark can discover it.

File types are loaded lazily when TopMark first resolves file types during configuration or pipeline execution.


Writing a HeaderProcessor plugin (advanced)

Header processor plugins use advanced runtime-overlay registration semantics rather than entry-point discovery.

A processor class must define a stable processor identity:

  • namespace: identifies the producer
  • key: local processor identifier within that namespace

The qualified processor identity is <namespace>:<key>.

To register a processor class for a file type at runtime, use the composed registry layer:

# my_topmark_plugin/processors.py
from __future__ import annotations

from topmark.processors.base import HeaderProcessor
from topmark.registry.registry import Registry


class MyLangHeaderProcessor(HeaderProcessor):
    """Example processor for MyLang."""
    namespace = "my_plugin"
    key = "my_lang"

    # Implement required HeaderProcessor methods here
    ...


Registry.register_processor("my_plugin:my_lang", MyLangHeaderProcessor)

At registration time, TopMark resolves the file type identifier through the composed runtime file type registry and then binds the processor to that resolved FileType object. Qualified identifiers are recommended because a local file type identifier may become ambiguous once multiple namespaces define similarly named file types.

Important:

  • file type registration must happen before processor registration;
  • runtime processor registrations are overlay-only and do not mutate immutable built-in base registry data;
  • processor bindings should use canonical qualified file type identifiers for deterministic behavior.

Runtime processor registration flow

Unlike file types, processor classes are not discovered from entry points. They are registered explicitly through the runtime registry API when needed.

A typical advanced integration flow is:

  1. expose file types through the topmark.filetypes entry point group;
  2. let TopMark discover those file types lazily;
  3. register processor classes explicitly through HeaderProcessorRegistry.register(...) or Registry.bind(...) during controlled initialization.

This keeps built-in registry construction deterministic and avoids module-import side effects.


For most integrations, providing FileType plugins only is sufficient.

Header processor plugins are more advanced because they rely on runtime overlay registration and explicit processor bindings.

Unless you need custom header parsing or formatting logic, prefer defining custom file types that reuse existing processors.


Troubleshooting

"Unknown file type" during processor registration

Cause: the processor registration target does not resolve through the composed file type registry.

Fix:

  • Ensure the plugin file type (including its namespace and unique name) is registered via the topmark.filetypes entry point.
  • Ensure file type discovery occurs before calling Registry.bind(...).
  • Prefer qualified file type identifiers such as "my_plugin:my_lang" when registering processors.

"Ambiguous file type identifier" during processor registration

Cause: an unqualified file type identifier such as "python" or "html" matched more than one file type in the composed registry.

Fix:

  • Retry with a qualified identifier such as "topmark:html" or "my_plugin:django_html".
  • Use qualified identifiers consistently in shared configuration, processor bindings, and plugin documentation.

Duplicate processor registration

TopMark rejects duplicate overlay registrations targeting the same effective file type binding.

If you see an error indicating that a processor is already registered for a file type, decide on an explicit overlay strategy first, for example:

  • unregister the existing overlay and then register the replacement;
  • leave the existing processor in place;
  • fail fast and require the caller to choose a policy explicitly.
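
The three strategies can be made explicit in calling code. The sketch below models the overlay as a plain dict to stay self-contained; DuplicatePolicy and register_with_policy are hypothetical names, and a real integration would call TopMark's registry API instead of mutating a dict.

```python
from __future__ import annotations

from enum import Enum


class DuplicatePolicy(Enum):
    REPLACE = "replace"  # unregister the existing overlay, then register
    KEEP = "keep"        # leave the existing processor in place
    FAIL = "fail"        # raise and let the caller decide


def register_with_policy(
    overlay: dict[str, str],
    filetype_id: str,
    processor: str,
    policy: DuplicatePolicy = DuplicatePolicy.FAIL,
) -> bool:
    """Register a processor; return True if the overlay now maps to it."""
    if filetype_id not in overlay:
        overlay[filetype_id] = processor
        return True
    if policy is DuplicatePolicy.REPLACE:
        del overlay[filetype_id]           # explicit unregister step
        overlay[filetype_id] = processor   # then register the replacement
        return True
    if policy is DuplicatePolicy.KEEP:
        return False
    raise ValueError(f"processor already registered for {filetype_id!r}")
```

Defaulting to FAIL keeps accidental double registration visible, which matches the registry's own behaviour of rejecting duplicates.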

Relevant internal modules

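The modules referenced on this page are the most useful starting points for advanced TopMark integrations and registry extensions: topmark.registry.registry (Registry), topmark.registry.processors (HeaderProcessorRegistry), and topmark.resolution.filetypes.

The composed registries provide effective runtime views that combine base registrations with runtime overlays and removals.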

