Documentation pipeline and reference hygiene

This page documents how TopMark's stable documentation is generated, validated, and kept consistent using the tooling under tools/docs/.

Note

The canonical vocabulary used throughout the documentation is defined in Terminology and Canonical Vocabulary.

This page is intended for contributors and maintainers working on:

  • API documentation
  • Internal architecture docs
  • Docstring quality and reference hygiene
  • MkDocs and mkdocs-gen-files integration

This page focuses on the documentation-generation and validation pipeline itself rather than general documentation authoring conventions. Detailed writing conventions, workflow-page structure, heading policy, and snippet usage rules are documented in Documentation Conventions.


Scope of this document

This page documents:

  • generated documentation architecture;
  • MkDocs integration and generation hooks;
  • API and docstring scanning behavior;
  • documentation hygiene validation;
  • snippet and draft handling;
  • strict-mode and validation behavior;
  • reference-hygiene enforcement.

It intentionally does not redefine authoring conventions already covered by Documentation Conventions.


Overview

This pipeline supports both:

  • the stable public API documentation (topmark.api)
  • internal module documentation (topmark.*)

and is aligned with TopMark's layered runtime and configuration architecture:

  • TOML → FrozenConfig → runtime → pipeline

See Architecture for the conceptual overview.

TopMark's documentation build consists of three coordinated layers:

  1. Handwritten Markdown
     • Located under docs/
     • Includes DEV documentation, guides, and architecture notes
  2. Generated Markdown
     • Produced at build time by mkdocs-gen-files
     • Includes API internals, public API reference pages, and CLI reference output
  3. Build-time validation and hygiene
     • Enforced via MkDocs hooks, custom tooling, and shared helpers
     • Ensures symbol references, snippet includes, and generated pages remain consistent, deterministic, and maintainable

All tooling lives under:

tools/docs/

and is executed only during documentation builds.

Documentation validation is also integrated into local contributor workflows, CI verification, and stable-release validation through make verify, nox, and GitHub Actions.


Relationship to CI and validation tooling

Documentation validation is intentionally layered and deterministic:

  • MkDocs performs rendering-time validation;
  • tools/docs/ performs deterministic repository hygiene and prose-hygiene checks;
  • make verify and nox integrate documentation validation into contributor workflows;
  • GitHub Actions enforce documentation validation in CI.

See also:


Generated documentation

Internals pages

Generated by gen_api_pages.py under:

api/internals/

Characteristics:

  • One page per importable module under src/topmark/
  • Breadcrumb navigation reflecting the package hierarchy
  • Per-package index.md pages listing immediate children
  • A grouped index under api/internals/topmark/index.md

Exact public API surfaces (defined in PUBLIC_API_PREFIXES) are not generated as internals pages, because generating both a public reference page and an internals page for the same module would create duplicate mkdocs-autorefs anchors.
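
As a rough illustration of the page layout described above, the following sketch maps a dotted module name to a page path and a breadcrumb trail. The function names and the exact file layout (packages as index.md, plain modules as <name>.md) are assumptions made for this example, not the actual behavior of gen_api_pages.py.

```python
from pathlib import PurePosixPath


def internals_page_path(module: str, is_package: bool) -> PurePosixPath:
    """Map a dotted module name to a page path under api/internals/ (assumed layout)."""
    base = PurePosixPath("api/internals", *module.split("."))
    # Assumption: packages get an index.md, plain modules get <name>.md.
    return base / "index.md" if is_package else base.with_suffix(".md")


def breadcrumbs(module: str) -> list[str]:
    """Return every ancestor package of a module, outermost first, ending with the module."""
    parts = module.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]
```

Deriving both values purely from the module name keeps the output independent of import order, in line with the determinism goal stated in the design principles.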

Public reference pages

These pages correspond to the stable public API surface defined by topmark.api.__all__ and are covered by the API snapshot stability contract.

Generated under:

api/reference/

For modules listed in:

PUBLIC_API_PREFIXES = (
    "topmark.api",
    "topmark.registry",
)

These pages represent stable supported public API surfaces.
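
The split between reference and internals pages can be pictured with a small routing helper. This is a sketch only: page_kind is a hypothetical name, and it assumes exact-match semantics, as the phrase "exact public API surfaces" suggests.

```python
PUBLIC_API_PREFIXES = (
    "topmark.api",
    "topmark.registry",
)


def page_kind(module: str) -> str:
    """Decide which generated tree a module belongs to (hypothetical helper)."""
    # Assumption: only exact matches are public surfaces, so a submodule such as
    # "topmark.registry.registry" still gets an internals page.
    return "reference" if module in PUBLIC_API_PREFIXES else "internals"
```

Each module lands in exactly one tree, so a public reference page and an internals page never compete for the same anchor.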

Public API stability expectations and snapshot validation are documented in:

CLI reference pages

Generated from live TopMark output:

usage/generated-filetypes.md
usage/generated-processors.md

via:

python -m topmark ... --output-format markdown

Generated CLI reference pages are therefore treated as derived release artifacts rather than handwritten documentation.


Relationship to documentation conventions

The documentation pipeline enforces generated-page consistency and validation behavior, while stable writing and structure conventions are documented separately.

Authoring conventions include:

  • heading structure;
  • snippet conventions;
  • workflow-page templates;
  • related-pages conventions;
  • heading-style policy;
  • Markdown organization rules.

See Documentation Conventions.


Docstring scanning and reference hygiene

Both handwritten Markdown and Python module docstrings are scanned for unlinked backticked symbol references, such as:

`topmark.registry.registry.Registry`

Docstring scanning is performed on raw Python source files before mkdocstrings renders them into Markdown, which ensures reported line numbers always refer to the original src/... files.
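
A minimal sketch of source-level docstring scanning follows. It parses the raw file with the standard ast module, so that reported line numbers point at the original source. The regular expression, the function name, and the assumption that a linked reference looks like [`symbol`][symbol] are illustrative and not taken from docs_utils.py.

```python
import ast
import re

# Backticked topmark.* paths that are not wrapped in [...] link brackets (assumed link form).
UNLINKED_RE = re.compile(r"(?<!\[)`(topmark(?:\.\w+)+)`(?!\])")


def scan_docstrings(source: str) -> list[tuple[int, str]]:
    """Return (line number, symbol) pairs for unlinked symbols found in docstrings."""
    hits: list[tuple[int, str]] = []
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if not (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            continue  # The first statement is not a docstring.
        # first.lineno is the line holding the opening quotes. This mapping assumes
        # the docstring contains no escaped newlines or line continuations.
        for offset, line in enumerate(first.value.value.splitlines()):
            for match in UNLINKED_RE.finditer(line):
                hits.append((first.lineno + offset, match.group(1)))
    return sorted(hits)
```

Because the scan never imports the module, it has no side effects and does not depend on the rendered Markdown.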

The shared enforcement logic lives in:

tools/docs/docs_utils.py

and is used identically by:

  • hooks.py (Markdown scanning)
  • gen_api_pages.py (docstring scanning)

Why this matters

Enforcing linked references keeps the documentation navigable and keeps symbol references valid as internal modules evolve:

  • mkdocs-autorefs can only resolve symbols that are properly linked;
  • backticked-but-unlinked symbols silently break cross-references;
  • docstrings are rendered into the generated documentation and must follow the same hygiene rules as handwritten Markdown.

What is considered a symbol

A candidate is enforced when it:

  • looks like a dotted Python path;
  • starts with topmark.;
  • is not a filename (.toml, .yaml, ...);
  • is not explicitly whitelisted.

This logic lives in:

tools/docs/docs_utils.should_enforce_link()
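
The four rules above can be approximated as follows. This is a sketch under stated assumptions, not the real should_enforce_link(): the function name, the regular expression, and the suffix tuple are hypothetical, and the suffix tuple contains only the two extensions named in the text.

```python
import re

# A dotted Python path: two or more identifiers joined by dots.
DOTTED_PATH_RE = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")

# Only the suffixes named in the text; the full list is not shown there.
FILENAME_SUFFIXES = (".toml", ".yaml")


def should_enforce_link_sketch(symbol: str, whitelist: frozenset[str] = frozenset()) -> bool:
    """Return True when a backticked candidate should be a linked reference."""
    if symbol in whitelist:  # Explicitly allowed, exact match only.
        return False
    if not symbol.startswith("topmark."):
        return False
    if symbol.endswith(FILENAME_SUFFIXES):  # Looks like a filename, e.g. topmark.toml.
        return False
    return DOTTED_PATH_RE.fullmatch(symbol) is not None
```

For example, `topmark.registry.registry.Registry` is enforced, while `topmark.toml` and `os.path` are not.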

Whitelisting non-linkable symbols

Some backticked identifiers are intentional and should not be linked.

These are allowed through an explicit exact-match whitelist:

export TOPMARK_DOCS_NONLINKED_SYMBOLS="topmark.toml,topmark.internal_thing"

Rules:

  • Exact matches only (no prefixes)
  • Applies to Markdown and docstrings
  • Logged in debug mode for transparency
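
Reading the whitelist could look like the sketch below. The helper name is hypothetical, and tolerating whitespace and empty entries is an assumption rather than documented behavior.

```python
import os
from collections.abc import Mapping
from typing import Optional


def load_nonlinked_symbols(environ: Optional[Mapping[str, str]] = None) -> frozenset[str]:
    """Parse the comma-separated whitelist into a set for exact-match lookups."""
    env = os.environ if environ is None else environ
    raw = env.get("TOPMARK_DOCS_NONLINKED_SYMBOLS", "")
    # Assumption: surrounding whitespace and empty entries are ignored.
    return frozenset(item.strip() for item in raw.split(",") if item.strip())
```

Membership tests against the resulting set are exact by construction, so `topmark.internal_thing` does not whitelist `topmark.internal_thing.helper`.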

Logging, debug, and strict modes

Two environment variables control documentation-validation behavior:

TOPMARK_DOCS_DEBUG

When enabled:

  • Emits detailed DEBUG/INFO logs
  • Shows:
      • Rendered-on context
      • Edit URLs (for Markdown)
      • Alternate inline-link suggestions (Alt:)
      • Full symbol lists (no truncation)

TOPMARK_DOCS_STRICT_REFS

When enabled:

  • the build fails if any unlinked symbols are found;
  • failures are aggregated and reported after processing all pages;
  • mkdocs.exceptions.Abort is used for clean termination.

Severity behavior remains intentionally consistent between:

  • hooks.py (Markdown scanning)
  • gen_api_pages.py (docstring scanning)
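
The aggregate-then-fail behavior of strict mode can be sketched as below. The RefReport class, the set of accepted truthy values, and the message format are assumptions for illustration. Only mkdocs.exceptions.Abort comes from the text above, and a stand-in is defined so the sketch also runs without MkDocs installed.

```python
import os
from collections.abc import Mapping
from typing import Optional

try:
    from mkdocs.exceptions import Abort
except ImportError:  # Stand-in so the sketch runs without MkDocs.
    class Abort(Exception):
        """Fallback used only when MkDocs is not installed."""


def strict_refs_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: any of these values switches strict mode on."""
    env = os.environ if environ is None else environ
    return env.get("TOPMARK_DOCS_STRICT_REFS", "").strip().lower() in {"1", "true", "yes", "on"}


class RefReport:
    """Collects unlinked symbols across all pages and fails only at the end."""

    def __init__(self) -> None:
        self.unlinked: list[tuple[str, str]] = []

    def add(self, page: str, symbol: str) -> None:
        self.unlinked.append((page, symbol))

    def finish(self, strict: bool) -> None:
        # Report everything at once instead of stopping at the first finding.
        if strict and self.unlinked:
            details = "\n".join(f"  {page}: {symbol}" for page, symbol in self.unlinked)
            raise Abort(f"{len(self.unlinked)} unlinked symbol(s) found:\n{details}")
```

A caller would add findings while processing each page and call finish() once after the last page, which matches the fail-late, report-all principle.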

Contextual logging

All diagnostics aim to be actionable.

Depending on origin, logs include:

  • Local repo paths (docs/... or src/...)
  • Line numbers
  • Rendered-on pages
  • Edit URLs (when available)

Context lines are built centrally via:

tools/docs/docs_utils.context_lines()

Drafts and snippets

Draft files

Files under:

docs/_drafts/
docs/**/_drafts/

are:

  • Ignored by MkDocs navigation
  • Ignored by version control
  • Safe for work-in-progress documentation
  • Optionally visible when serving documentation locally (marked as draft)
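
The two path patterns above amount to "any file below docs/ that has a _drafts directory among its parent directories". A minimal sketch of that test, using a hypothetical is_draft helper and repository-relative POSIX paths:

```python
from pathlib import PurePosixPath


def is_draft(path: str) -> bool:
    """Return True for files under docs/_drafts/ or docs/**/_drafts/."""
    parts = PurePosixPath(path).parts
    # parts[1:-1] are the directories between docs/ and the file name itself.
    return len(parts) > 1 and parts[0] == "docs" and "_drafts" in parts[1:-1]
```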

Markdown snippets

Files under:

docs/_snippets/

are:

  • intended for inclusion via plugins such as include-markdown;
  • not standalone pages;
  • explicitly excluded via exclude_docs in mkdocs.yml;
  • intended only for stable reusable documentation fragments.

Markdown documentation hygiene is validated through:

make docs-hygiene

which runs:

python tools/docs/check_docs_hygiene.py --docs-hygiene --stats

Python code-prose hygiene is validated separately through:

python tools/docs/check_code_hygiene.py

The Markdown hygiene validation performs repository-hygiene checks for:

  • broken include paths;
  • malformed docs-root-relative include paths;
  • include targets resolving outside docs/;
  • nested snippet includes;
  • accidental macOS ._* files under documentation sources;
  • Markdown files under docs/ missing from mkdocs.yml navigation;
  • emoji in Markdown headings;
  • missing section separators between level-2 headings.
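
One of these checks, detecting include targets that resolve outside docs/, can be sketched with pathlib. The function name is hypothetical, and resolving the target relative to the including file is an assumption. The real checker also handles docs-root-relative paths, which this sketch does not model.

```python
from pathlib import Path


def include_escapes_docs(docs_root: Path, including_file: Path, target: str) -> bool:
    """Return True when an include target resolves outside the docs root."""
    # Assumption: the target is resolved relative to the including file.
    resolved = (including_file.parent / target).resolve()
    return not resolved.is_relative_to(docs_root.resolve())
```

Resolving both paths before comparing them means that `..` segments cannot be used to leave the documentation tree unnoticed.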

The checker also reports maintainability warnings for:

  • orphaned snippets;
  • headings inside snippets;
  • smart punctuation in Markdown prose;
  • relative links inside reusable snippets unless include-markdown link rewriting is intentional;
  • snippet include paths that do not use the formatter-stable \_snippets/ prefix.

Shared navigation snippets such as related-pages*.md are intentionally allowed to contain relative links because they centralize reusable documentation navigation behavior.

check_code_hygiene.py complements the Markdown-focused checks by scanning Python comments, docstrings, and prose-oriented string literals under src/topmark/, tests/, and tools/. It currently enforces ASCII-oriented punctuation hygiene for terminal-safe, deterministic, and copy/paste-friendly generated documentation and CLI output.
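
A simplified picture of such a scan uses the standard tokenize module to look only at comments and string literals. The character table and the function name are assumptions. The actual rule set of check_code_hygiene.py is not reproduced here, and f-strings are ignored for brevity.

```python
import io
import tokenize

# Illustrative subset of "smart" punctuation with ASCII replacements (assumed).
SMART_PUNCTUATION = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
    "\u2013": "-",
    "\u2014": "-",
    "\u2026": "...",
}


def find_smart_punctuation(source: str) -> list[tuple[int, str]]:
    """Return (line number, character) pairs found in comments and string literals."""
    hits: list[tuple[int, str]] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in (tokenize.COMMENT, tokenize.STRING):
            continue  # Identifiers, operators, and numbers are not prose.
        # Multi-line strings span several lines; track the offset within the token.
        for offset, line in enumerate(tok.string.splitlines()):
            for char in line:
                if char in SMART_PUNCTUATION:
                    hits.append((tok.start[0] + offset, char))
    return hits
```

Working on tokens rather than raw text restricts the check to prose-carrying parts of the source.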

These checks intentionally remain lightweight and repository-focused. They reinforce repository-wide documentation consistency without turning every style preference into a hard release blocker.


Design principles

The documentation tooling follows a few strict principles:

  • Deterministic - no hidden state and no reliance on import order.
  • Fail-late, report-all - especially in strict mode.
  • Shared logic, single source of truth - no duplicated include semantics, prose-hygiene rules, or validation heuristics.
  • Documentation is code - docstrings, Markdown, and generated reference material are held to the same standard.

Summary

  • documentation is generated, validated, and enforced as part of the build;
  • tools/docs/ is the authoritative location for documentation tooling;
  • reference hygiene, Markdown hygiene, and Python prose hygiene are enforced consistently across documentation sources, comments, and docstrings;
  • debug and strict modes provide both flexibility and CI-grade guarantees.

If you change how TopMark is structured, update the documentation pipeline accordingly - it is a stable and intentionally maintained part of the project architecture.