topmark.pipeline.views

View abstractions for large, phase-scoped pipeline data.

This module defines lightweight, typed "views" that expose file and header data without committing to a concrete backing representation. Implementations can be list-backed today and evolve to memory-mapped or generator-based forms later, while keeping step contracts stable and memory usage low.

The views are intentionally minimal: callers iterate or count lines instead of materializing whole images, and rich blocks/mappings are grouped in small dataclasses per phase.

Releasable

Bases: Protocol

Protocol for views that can release large in-memory buffers.

This optional lifecycle hook allows memory-heavy views to discard their materialized state (e.g., lists of lines) after downstream steps no longer need them. The pipeline runner invokes release() when pruning is enabled to keep peak memory usage low.

Implementers should make release() idempotent: calling it multiple times must be safe and should not raise. Views that do not hold large buffers can implement a no-op release() to satisfy the protocol if needed.

Examples:

  • ListFileImageView clears its backing list[str] reference.
  • A memory-mapped view could close or unmap its file handle.
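
As a rough illustration of the second bullet, the sketch below shows one way an idempotent release() might look for a memory-mapped view. The class name MmapBackedView and the local Releasable stand-in are assumptions made for this example; they are not taken from topmark.

```python
from __future__ import annotations

import mmap
import os
import tempfile
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class Releasable(Protocol):
    """Local stand-in for the protocol documented above."""

    def release(self) -> None: ...


class MmapBackedView:
    """Hypothetical memory-mapped view (not part of topmark)."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        # Length 0 maps the whole file; the file must not be empty.
        self._map: Optional[mmap.mmap] = mmap.mmap(
            self._file.fileno(), 0, access=mmap.ACCESS_READ
        )

    def release(self) -> None:
        # Idempotent: later calls find nothing left to free and return.
        if self._map is None:
            return
        self._map.close()
        self._file.close()
        self._map = None


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\n")

view = MmapBackedView(tmp.name)
view.release()
view.release()  # repeated calls are safe and do not raise
os.unlink(tmp.name)
```

Guarding on a None sentinel keeps the second call cheap and avoids relying on how the underlying handle reacts to being closed twice.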

release

release()

Release any materialized buffers to reduce memory usage.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release any materialized buffers to reduce memory usage."""
    ...

FileImageView

Bases: Releasable, Protocol

Protocol for read-only access to a file's logical lines.

FileImageView extends Releasable so implementations must provide a release() method that frees materialized buffers when called by the pipeline runner during pruning.
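
Because the protocol is structural, any object with these three methods can serve as a file image. The sketch below is a hypothetical lazy implementation that re-reads a file on each pass instead of holding a list; PathFileImageView and total_chars are illustrative names, and the FileImageView class here is a local stand-in rather than an import from topmark.

```python
from __future__ import annotations

import os
import tempfile
from typing import Iterable, Iterator, Optional, Protocol


class FileImageView(Protocol):
    """Local stand-in for the protocol documented on this page."""

    def line_count(self) -> int: ...
    def iter_lines(self) -> Iterable[str]: ...
    def release(self) -> None: ...


class PathFileImageView:
    """Hypothetical lazy view that reads from disk on every iteration."""

    def __init__(self, path: str) -> None:
        self._path: Optional[str] = path

    def iter_lines(self) -> Iterator[str]:
        if self._path is None:
            return  # released: yield nothing
        # newline="" returns line endings untranslated.
        with open(self._path, "r", encoding="utf-8", newline="") as handle:
            yield from handle

    def line_count(self) -> int:
        return sum(1 for _ in self.iter_lines())

    def release(self) -> None:
        self._path = None


def total_chars(view: FileImageView) -> int:
    """Consumer that depends only on the protocol, not on the backing store."""
    return sum(len(line) for line in view.iter_lines())


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\r\ntwo\n")

view = PathFileImageView(tmp.name)
lines = list(view.iter_lines())
count = view.line_count()
chars = total_chars(view)
view.release()
after_release = list(view.iter_lines())
os.unlink(tmp.name)
```

This version trades repeated I/O for a small memory footprint; whether that is a good trade depends on how often a step iterates the image.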

line_count

line_count()

Return the number of logical lines in the file image.

Returns:

int: Total number of lines available via iter_lines().

Source code in src/topmark/pipeline/views.py
def line_count(self) -> int:
    """Return the number of logical lines in the file image.

    Returns:
        int: Total number of lines available via ``iter_lines()``.
    """
    ...

iter_lines

iter_lines()

Iterate the file's logical lines, preserving original line endings.

Returns:

Iterable[str]: An iterator over the file's lines. The iterator must yield strings exactly as they appear in the source (e.g., with \n/\r\n kept as read).

Source code in src/topmark/pipeline/views.py
def iter_lines(self) -> Iterable[str]:
    r"""Iterate the file's logical lines, preserving original line endings.

    Returns:
        Iterable[str]: An iterator over the file's lines. The iterator
        must yield strings exactly as they appear in the source (e.g.,
        with ``\n``/``\r\n`` kept as read).
    """
    ...

ListFileImageView dataclass

ListFileImageView(lines)

List-backed FileImageView implementation (and Releasable).

This view wraps an in-memory list[str] where each element represents a logical line including its original newline sequence (keepends semantics). Calling release() discards the backing list to free memory; afterwards line_count() returns 0 and iter_lines() yields nothing.
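
The keepends and release behavior can be sketched as follows. The class below is a stand-in that mirrors the source excerpts on this page so the example runs without topmark installed; building the list with str.splitlines(keepends=True) is an assumption about how a caller might prepare the lines.

```python
from __future__ import annotations

from typing import Iterable, Optional


class ListFileImageView:
    """Stand-in mirroring the source excerpts shown on this page."""

    def __init__(self, lines: list[str]) -> None:
        self._lines: Optional[list[str]] = lines

    def line_count(self) -> int:
        return 0 if self._lines is None else len(self._lines)

    def iter_lines(self) -> Iterable[str]:
        return iter(self._lines or ())

    def release(self) -> None:
        self._lines = None


text = "# header\r\nprint('hi')\nlast line without newline"
view = ListFileImageView(text.splitlines(keepends=True))

count_before = view.line_count()
# Each line keeps its own ending, so joining reproduces the input exactly.
round_trip = "".join(view.iter_lines())

view.release()
view.release()  # idempotent
count_after = view.line_count()
lines_after = list(view.iter_lines())
```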

Parameters:

lines (list[str], required): Source lines to expose. The list is not copied; the caller retains ownership and must not mutate it while the view is used.

Source code in src/topmark/pipeline/views.py
def __init__(self, lines: list[str]) -> None:
    self._lines: list[str] | None = lines

line_count

line_count()

Return the number of lines in the underlying list.

Returns:

int: Total line count.

Source code in src/topmark/pipeline/views.py
def line_count(self) -> int:
    """Return the number of lines in the underlying list.

    Returns:
        int: Total line count.
    """
    return 0 if self._lines is None else len(self._lines)

iter_lines

iter_lines()

Iterate lines from the underlying list without copying.

Returns:

Iterable[str]: An iterator over the stored lines.

Source code in src/topmark/pipeline/views.py
def iter_lines(self) -> Iterable[str]:
    """Iterate lines from the underlying list without copying.

    Returns:
        Iterable[str]: An iterator over the stored lines.
    """
    return iter(self._lines or ())

release

release()

Release materialized lines.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release materialized lines."""
    self._lines = None

HeaderView dataclass

HeaderView(
    *,
    range,
    lines,
    block,
    mapping,
    success_count=0,
    error_count=0,
)

Bases: Releasable

Structured view of the existing header detected by the scanner.

Attributes:

range (tuple[int, int] | None): Inclusive (start, end) line indices of the detected header block within the file, or None when absent.

lines (Sequence[str] | None): Header lines exactly as found (keepends), or None when not captured.

block (str | None): Concatenated header text ("".join(lines)), or None when not captured.

mapping (Mapping[str, str] | None): Parsed field mapping extracted from the header, or None when parsing was not performed.

success_count (int): The number of header lines that were successfully parsed and added to the mapping. Defaults to 0.

error_count (int): The number of header lines that were malformed (e.g., missing a colon, or having an empty field name). Defaults to 0.
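
To make the relationships between these attributes concrete, here is a minimal sketch under stated assumptions. The HeaderView below is a simplified stand-in, parse_fields is a hypothetical parser that only applies the two malformed-line rules mentioned above, the indices are assumed to be 0-based, and the sample lines omit any comment prefix a real header would carry.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class HeaderView:
    """Simplified stand-in for the dataclass documented above."""

    range: Optional[tuple[int, int]]
    lines: Optional[Sequence[str]]
    block: Optional[str]
    mapping: Optional[Mapping[str, str]]
    success_count: int = 0
    error_count: int = 0

    def release(self) -> None:
        self.lines = None
        self.block = None
        self.mapping = None


def parse_fields(lines: Sequence[str]) -> tuple[dict[str, str], int, int]:
    """Hypothetical parser: count lines with no colon or an empty name as errors."""
    mapping: dict[str, str] = {}
    ok = bad = 0
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            bad += 1
            continue
        mapping[name.strip()] = value.strip()
        ok += 1
    return mapping, ok, bad


file_lines = ["import os\n", "file: views.py\n", "license MIT\n", ": orphan\n", "x = 1\n"]
start, end = 1, 3  # inclusive bounds (0-based here by assumption)
header_lines = file_lines[start : end + 1]  # end + 1 because `end` is inclusive
mapping, ok, bad = parse_fields(header_lines)

header = HeaderView(
    range=(start, end),
    lines=header_lines,
    block="".join(header_lines),
    mapping=mapping,
    success_count=ok,
    error_count=bad,
)
header.release()  # drops lines, block, mapping; range and counts remain
```

In this sketch, release() leaves the small scalar fields (range and the two counters) in place, which matches the release() source shown on this page.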

release

release()

Release header buffers (lines, block, mapping).

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release header buffers (lines, block, mapping)."""
    self.lines = None
    self.block = None
    self.mapping = None

BuilderView dataclass

BuilderView(*, builtins, selected)

Bases: Releasable

Structured view of field dictionaries produced by the builder step.

Attributes:

builtins (Mapping[str, str] | None): Derived built-in fields (e.g., file, relpath).

selected (Mapping[str, str] | None): The subset (and overrides) selected for rendering, aligned with the configuration's header_fields order.

Notes

The contained mappings are exposed read-only through abstract mapping types. Calling release() clears the references to allow pruning.

release

release()

Release the builder payload (builtins, selected) to reduce memory usage.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release the builder payload (builtins, selected) to reduce memory usage."""
    self.builtins = None
    self.selected = None

RenderView dataclass

RenderView(*, lines, block)

Bases: Releasable

Structured view of the expected header produced by the renderer.

Attributes:

lines (Sequence[str] | None): Rendered header lines (keepends), or None when not rendered.

block (str | None): Concatenated rendered header text, or None.

Notes

Large buffers may be pruned by calling release(), which clears lines and block.

release

release()

Release the renderer payload to reduce memory usage.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release the renderer payload to reduce memory usage."""
    self.lines = None
    self.block = None

UpdatedView dataclass

UpdatedView(*, lines)

Bases: Releasable

View of the pipeline's updated file image.

lines may be a sequence (e.g., list[str]) or a lazy iterable (e.g., a generator composing a three-segment view) to avoid materializing large buffers up-front.

Attributes:

lines (Sequence[str] | Iterable[str] | None): Updated file image as a sequence or iterable of lines, or None when no update was produced.

Notes

Pruning is handled by calling release(), which clears the updated file image reference. If lines is an iterator, callers must treat this view as single-pass.
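
The single-pass caveat can be demonstrated with a small sketch. UpdatedView here is a simplified stand-in, and three_segments and snapshot are hypothetical helpers; the source does not say how the three-segment view is actually composed, so itertools.chain is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import chain
from typing import Iterable, Iterator, Optional, Sequence, Union


@dataclass
class UpdatedView:
    """Simplified stand-in for the dataclass documented above."""

    lines: Optional[Union[Sequence[str], Iterable[str]]]

    def release(self) -> None:
        self.lines = None


def three_segments(
    prefix: Sequence[str], header: Sequence[str], suffix: Sequence[str]
) -> Iterator[str]:
    """Hypothetical lazy composition: nothing is copied until iteration."""
    return chain(prefix, header, suffix)


def snapshot(view: UpdatedView) -> list[str]:
    """Materialize whatever the view can still produce."""
    return [] if view.lines is None else list(view.lines)


lazy = UpdatedView(
    lines=three_segments(["#!/bin/sh\n"], ["# file: run.sh\n"], ["echo hi\n"])
)
first_pass = snapshot(lazy)
second_pass = snapshot(lazy)  # the iterator is already exhausted

eager = UpdatedView(lines=["#!/bin/sh\n", "echo hi\n"])
eager_first = snapshot(eager)
eager_second = snapshot(eager)  # a list can be read again
```

A caller that needs the updated image more than once would have to materialize it on the first pass, which gives up the memory saving the lazy form was meant to provide.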

release

release()

Release the updated file image payload to reduce memory usage.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release the updated file image payload to reduce memory usage."""
    self.lines = None

DiffView dataclass

DiffView(*, text)

Bases: Releasable

Unified diff view for CLI/CI consumption.

Attributes:

text (str | None): Unified diff as a single string, or None when no diff was generated.

Notes

Pruning is done by calling release(), which nulls text to free memory.

release

release()

Release the diff payload to reduce memory usage.

Source code in src/topmark/pipeline/views.py
def release(self) -> None:
    """Release the diff payload to reduce memory usage."""
    self.text = None

Views dataclass

Views(
    *,
    image=None,
    header=None,
    build=None,
    render=None,
    updated=None,
    diff=None,
)

Bundle of phase-scoped, releasable views for a single file.

Notes

The bundle itself provides release_all() to prune memory after a run. Individual views remain responsible for their own release() behavior.

release_all

release_all(*, keep_diff_view=False)

Release all non-None views safely (idempotent).

Parameters:

keep_diff_view (bool, default False): Whether to preserve the diff view.

Source code in src/topmark/pipeline/views.py
def release_all(
    self,
    *,
    keep_diff_view: bool = False,
) -> None:
    """Release all non-None views safely (idempotent).

    Args:
        keep_diff_view: Whether to preserve the diff view.
    """
    logger.debug("keep_diff_view: %r", keep_diff_view)
    if self.image:
        self.image.release()
    if self.header:
        self.header.release()
    if self.build:
        self.build.release()
    if self.render:
        self.render.release()
    if self.updated:
        self.updated.release()

    if self.diff and not keep_diff_view:
        self.diff.release()
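
The effect of keep_diff_view can be sketched with a reduced bundle. The classes below are cut-down stand-ins that keep only two of the six views and omit the logging call; the sample diff text is invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RenderView:
    lines: Optional[Sequence[str]]
    block: Optional[str]

    def release(self) -> None:
        self.lines = None
        self.block = None


@dataclass
class DiffView:
    text: Optional[str]

    def release(self) -> None:
        self.text = None


@dataclass
class Views:
    """Reduced stand-in: only the render and diff views are modelled."""

    render: Optional[RenderView] = None
    diff: Optional[DiffView] = None

    def release_all(self, *, keep_diff_view: bool = False) -> None:
        if self.render:
            self.render.release()
        if self.diff and not keep_diff_view:
            self.diff.release()


views = Views(
    render=RenderView(lines=["# file: a.py\n"], block="# file: a.py\n"),
    diff=DiffView(text="--- a.py\n+++ a.py\n"),
)

views.release_all(keep_diff_view=True)  # prune everything except the diff
kept_text = views.diff.text if views.diff else None

views.release_all()  # second call is safe and now drops the diff too
dropped_text = views.diff.text if views.diff else None
```

Note that the view objects themselves stay attached to the bundle; only their payload attributes are set to None.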

as_dict

as_dict()

Short machine-friendly summary; avoid heavy text blobs.

Source code in src/topmark/pipeline/views.py
def as_dict(self) -> dict[str, object]:
    """Short machine-friendly summary; avoid heavy text blobs."""
    return {
        "image_lines": self.image.line_count() if self.image else 0,
        "header_range": getattr(self.header, "range", None),
        "header_fields": dict(self.header.mapping or {}) if self.header else None,
        "build_selected": dict(self.build.selected or {}) if self.build else None,
        "render_line_count": (
            len(self.render.lines) if (self.render and self.render.lines is not None) else 0
        ),
        "updated_has_lines": self.updated is not None and self.updated.lines is not None,
        "diff_present": bool(self.diff and self.diff.text),
    }