topmark.pipeline.policy_whitespace
Policy-aware whitespace utilities for the pipeline.
This module provides shared helpers used by processors and steps to reason about
blank lines and effectively empty bodies in a file-type aware way. The behavior is
controlled by topmark.filetypes.policy.FileTypeHeaderPolicy, in particular
its blank_collapse_mode and blank_collapse_extra fields.
Helpers
- is_pure_spacer(line, policy): classify a single line as a pure spacer per policy (STRICT/UNICODE/NONE, with optional extra chars).
- is_effectively_empty_body(lines, policy): determine whether a sequence of lines should be treated as effectively empty (only spaces/tabs/EOLs and BOMs), without consuming control characters such as form-feed unless the policy opts in.
is_pure_spacer
Return True if line should be treated as a pure spacer per policy.
- STRICT: spaces/tabs/EOL only; preserve control chars (e.g., \x0c).
- UNICODE: all Unicode whitespace (like str.strip()).
- NONE: never collapse non-empty lines.
blank_collapse_extra may list extra chars to treat as blank in addition.
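The modes described here can be illustrated with a minimal, self-contained sketch. It is not the TopMark implementation: `CollapseMode` and `is_pure_spacer_sketch` are hypothetical stand-ins, the mode and extra characters are passed directly rather than through a `FileTypeHeaderPolicy`, and it assumes that in NONE mode only a line containing nothing but its terminator counts as a spacer and that extra characters are not consulted.

```python
from enum import Enum


class CollapseMode(Enum):
    """Hypothetical stand-in for the policy's blank_collapse_mode values."""

    STRICT = "strict"
    UNICODE = "unicode"
    NONE = "none"


def is_pure_spacer_sketch(
    line: str,
    mode: CollapseMode = CollapseMode.STRICT,
    extra: str = "",
) -> bool:
    """Illustrative re-implementation; not the library function."""
    # Line terminators never count as content.
    body = line.rstrip("\r\n")
    if mode is CollapseMode.NONE:
        # Assumption: any remaining character makes the line non-empty.
        return body == ""
    for ch in body:
        if ch in extra:
            continue  # project-specific blank character
        if mode is CollapseMode.STRICT and ch in " \t":
            continue  # STRICT: only spaces and tabs are blank
        if mode is CollapseMode.UNICODE and ch.isspace():
            continue  # UNICODE: any Unicode whitespace is blank
        return False
    return True


# A form-feed is preserved under STRICT unless it is listed as an extra char.
strict_ff = is_pure_spacer_sketch("\x0c\n")                  # False
extra_ff = is_pure_spacer_sketch("\x0c\n", extra="\x0c")     # True
```

Under this sketch, the same form-feed line becomes a spacer when the mode is switched to UNICODE, because `"\x0c".isspace()` is true in Python.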
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| line | str | The line to check. | required |
| policy | FileTypeHeaderPolicy \| None | The policy to use, or None for defaults. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the line is blank per policy, else False. |
Source code in src/topmark/pipeline/policy_whitespace.py
is_effectively_empty_body
Return True if the given lines are effectively empty per policy.
The body is considered empty when, after removing BOMs and line terminators, all remaining characters are ignorable under the policy:
- STRICT (default): only spaces and tabs are ignorable; control characters such as form-feed (\x0c) are preserved.
- UNICODE: any Unicode whitespace is ignorable (akin to str.strip()).
- NONE: never treat non-empty content as empty.
blank_collapse_extra extends the ignorable set with project-specific characters.
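As a rough illustration of the body-level rule, the sketch below strips BOMs and line terminators from each line and then checks every remaining character against the ignorable set. It is an assumption-laden approximation rather than the library code: `is_effectively_empty_body_sketch` is a hypothetical name, the mode is passed as a plain string (`"strict"`, `"unicode"`, `"none"`) instead of a `FileTypeHeaderPolicy`, and extra characters are assumed not to apply in NONE mode.

```python
from typing import Iterable

BOM = "\ufeff"


def is_effectively_empty_body_sketch(
    lines: Iterable[str],
    mode: str = "strict",
    extra: str = "",
) -> bool:
    """Illustrative re-implementation; not the library function."""
    for line in lines:
        # Remove BOMs and trailing line terminators before inspecting content.
        body = line.replace(BOM, "").rstrip("\r\n")
        if not body:
            continue
        if mode == "none":
            # Assumption: any leftover character counts as real content.
            return False
        for ch in body:
            if ch in extra:
                continue  # project-specific ignorable character
            if mode == "strict" and ch in " \t":
                continue  # STRICT: only spaces and tabs are ignorable
            if mode == "unicode" and ch.isspace():
                continue  # UNICODE: any Unicode whitespace is ignorable
            return False
    return True


# A BOM-only line followed by an indented blank line is empty under STRICT.
bom_and_blank = is_effectively_empty_body_sketch(["\ufeff\n", "  \t\r\n"])  # True
# A form-feed keeps the body non-empty under STRICT.
form_feed = is_effectively_empty_body_sketch(["\x0c\n"])                    # False
```

The BOM is removed explicitly because `"\ufeff".isspace()` is false in Python, so even UNICODE mode would otherwise treat it as content.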