Architecture

This page describes the leafpress rendering pipeline for contributors and developers who want to understand how MkDocs projects are converted into branded documents.

Pipeline Overview

flowchart TD
    CLI["CLI (cli.py)"] --> |"source arg or auto-detect"| SR["Source Resolution (source.py)"]
    SR --> |"ResolvedSource"| PL["Pipeline Orchestrator (pipeline.py)"]
    PL --> CFG["Config Loading (config.py)"]
    PL --> MKP["MkDocs Parsing (mkdocs_parser.py)"]
    PL --> GIT["Git Info (git_info.py)"]
    CFG --> |"BrandingConfig"| PL
    MKP --> |"MkDocsConfig + NavItems"| PL
    GIT --> |"GitVersion"| PL
    PL --> MR["Markdown Rendering (markdown_renderer.py)"]
    MR --> MM["Mermaid Diagrams (mermaid.py)"]
    MR --> AN["Annotations (annotations.py)"]
    MM --> |"HTML with images"| MR
    AN --> |"HTML with footnotes"| MR
    MR --> |"list of NavItem, HTML"| PL
    PL --> PDF["PDF Renderer"]
    PL --> HTML["HTML Renderer"]
    PL --> DOCX["DOCX Renderer"]
    PL --> ODT["ODT Renderer"]
    PL --> EPUB["EPUB Renderer"]
    PL --> MDE["Markdown Export Renderer"]
    PDF --> OUT["Output Files"]
    HTML --> OUT
    DOCX --> OUT
    ODT --> OUT
    EPUB --> OUT
    MDE --> OUT

Pipeline Stages

1. CLI Entry Point

Module: src/leafpress/cli.py

The Typer-based CLI parses arguments and invokes the pipeline. The convert command accepts a source path (auto-detected when omitted), an output format, a branding config path, and rendering options such as cover page, TOC, watermark, and local time zone.

The info command uses the same source resolution to display project metadata without rendering.

2. Source Resolution

Module: src/leafpress/source.py

resolve_source(source, branch) returns a ResolvedSource context manager. It detects whether the source is a git URL (via regex) or a local path:

- Git URLs are cloned to a temporary directory with optional branch checkout. The temp directory is cleaned up automatically when the context exits.
- Local paths are validated and used directly without cleanup.
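
The two branches above can be sketched as a small context manager. This is a simplified illustration, not the code in source.py: it yields a plain Path instead of a ResolvedSource object, and the URL pattern, the is_git_url helper, and the shallow-clone flags are assumptions.

```python
import re
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

# Assumed pattern: common git URL shapes (https://, ssh://, git://, git@host:path).
GIT_URL_RE = re.compile(r"^(?:https?://|ssh://|git://|git@[\w.-]+:)")


def is_git_url(source: str) -> bool:
    """Return True when the source string looks like a git URL (hypothetical helper)."""
    return bool(GIT_URL_RE.match(source))


@contextmanager
def resolve_source(source: str, branch: Optional[str] = None) -> Iterator[Path]:
    """Yield a local directory for the source, removing temporary clones on exit."""
    if is_git_url(source):
        tmp = Path(tempfile.mkdtemp(prefix="leafpress-"))
        try:
            cmd = ["git", "clone", "--depth", "1"]
            if branch:
                cmd += ["--branch", branch]
            subprocess.run(cmd + [source, str(tmp)], check=True)
            yield tmp
        finally:
            # Remove the temporary clone whether or not rendering succeeded.
            shutil.rmtree(tmp, ignore_errors=True)
    else:
        path = Path(source).expanduser().resolve()
        if not path.is_dir():
            raise FileNotFoundError(f"Source directory not found: {path}")
        # Local paths are used in place and never deleted.
        yield path
```

Wrapping both cases in one context manager lets callers use a single `with` block without knowing whether cleanup is needed.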

3. Configuration

Module: src/leafpress/config.py

BrandingConfig is a Pydantic model that defines all branding fields (company name, logo, colors, footer options, watermark, etc.). Configuration is loaded from leafpress.yml via load_config(), with every field overridable via LEAFPRESS_* environment variables through _apply_env_overrides().

config_from_env() can build a complete config purely from environment variables when no YAML file is available.
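
The override mechanism can be illustrated with a minimal sketch. It uses a standard-library dataclass in place of the Pydantic model so that it runs without dependencies; the Branding class, its three fields, and the mapping of field names to LEAFPRESS_<FIELD> variables are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, fields, replace
from typing import Mapping, Optional

ENV_PREFIX = "LEAFPRESS_"


@dataclass(frozen=True)
class Branding:
    """Hypothetical stand-in for BrandingConfig with three illustrative fields."""
    company_name: str = ""
    primary_color: str = "#000000"
    watermark: Optional[str] = None


def apply_env_overrides(config: Branding, environ: Mapping[str, str] = os.environ) -> Branding:
    """Return a copy of config with any LEAFPRESS_<FIELD> values applied."""
    overrides = {}
    for f in fields(config):
        key = ENV_PREFIX + f.name.upper()
        if key in environ:
            overrides[f.name] = environ[key]
    return replace(config, **overrides)


def config_from_env(environ: Mapping[str, str] = os.environ) -> Branding:
    """Build a config from defaults plus environment variables only."""
    return apply_env_overrides(Branding(), environ)
```

Passing the environment as a mapping keeps the functions easy to test: a plain dict can stand in for os.environ.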

4. MkDocs Parsing

Module: src/leafpress/mkdocs_parser.py

parse_mkdocs_config(config_path) reads mkdocs.yml and returns a MkDocsConfig dataclass containing the site name, docs directory, nav structure, markdown extensions, and theme info.

The nav is parsed recursively into NavItem trees, then flatten_nav() produces a depth-first ordered list where section headers have path=None and pages have their markdown file path.

If no nav key is defined, _auto_discover_nav() walks the docs directory to build one automatically.
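
A depth-first flatten of a nav tree might look like the sketch below. The NavItem fields shown here (title, path, level, children) are an assumed shape; the real dataclass may carry more or differently named fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NavItem:
    """Assumed shape: sections have path=None and may hold children."""
    title: str
    path: Optional[str] = None
    level: int = 0
    children: List["NavItem"] = field(default_factory=list)


def flatten_nav(items: List[NavItem], level: int = 0) -> List[NavItem]:
    """Return nav items in depth-first order with their nesting level set."""
    flat: List[NavItem] = []
    for item in items:
        # Emit the item itself first, then everything nested beneath it.
        flat.append(NavItem(item.title, item.path, level))
        flat.extend(flatten_nav(item.children, level + 1))
    return flat
```

Because a section is emitted before its children, renderers can walk the flat list once and still produce correctly nested headings.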

5. Markdown Rendering

Module: src/leafpress/markdown_renderer.py

MarkdownRenderer converts each page's markdown to HTML using Python-Markdown with a full set of extensions (tables, fenced code, admonitions, footnotes, pymdownx highlight/superfences/tabbed/tasklist/emoji, and more).

After initial conversion, the renderer applies post-processing:

- Asset resolution: rewrites relative src= and href= attributes to absolute file:// URIs
- Emoji mapping: resolves :material-*: shortcodes to unicode or SVG
- Annotation processing: transforms Material for MkDocs annotation markers into footnotes
- Mermaid rendering: converts fenced mermaid blocks into inline images
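
As an illustration of the asset resolution step, the following sketch rewrites relative attribute values with a regular expression. The module map suggests the real renderer post-processes HTML with Beautiful Soup, so the regex approach, the resolve_assets name, and the page_dir parameter are all assumptions chosen to keep the example dependency-free.

```python
import re
from pathlib import Path

# Matches src="..." or href="..." and captures the attribute name and value.
ATTR_RE = re.compile(r'\b(src|href)="([^"]+)"')
# Values with a URL scheme, protocol-relative URLs, and in-page anchors are left untouched.
SKIP_RE = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//|#)")


def resolve_assets(html: str, page_dir: Path) -> str:
    """Rewrite relative src/href values to absolute file:// URIs."""

    def rewrite(match: re.Match) -> str:
        attr, value = match.group(1), match.group(2)
        if SKIP_RE.match(value):
            return match.group(0)
        # Resolve against the directory of the page that contains the reference.
        uri = (page_dir / value).resolve().as_uri()
        return f'{attr}="{uri}"'

    return ATTR_RE.sub(rewrite, html)
```

Absolute URIs matter because the HTML is later handed to renderers that no longer know which directory each page came from.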

6. Post-Processing Modules

Mermaid Diagrams

Module: src/leafpress/mermaid.py

render_mermaid_blocks(html, output_dir) finds fenced mermaid code blocks in the HTML, encodes each diagram as base64, sends it to mermaid.ink for rendering, and replaces the code block with an <img> tag pointing to the generated PNG. File names use SHA-256 digests, so identical diagrams map to the same file.
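
The encoding and naming steps can be sketched without any network access. The URL layout (mermaid.ink serving PNGs at /img/ followed by the base64-encoded diagram source), the 16-character digest prefix, and the helper names are assumptions, not details taken from mermaid.py.

```python
import base64
import hashlib
from pathlib import Path

# Assumption: mermaid.ink serves PNGs at /img/<base64 of the diagram source>.
MERMAID_INK_IMG = "https://mermaid.ink/img/"


def mermaid_url(diagram: str) -> str:
    """Build the rendering URL for one diagram (hypothetical helper)."""
    encoded = base64.urlsafe_b64encode(diagram.encode("utf-8")).decode("ascii")
    return MERMAID_INK_IMG + encoded


def mermaid_filename(diagram: str) -> str:
    """Derive a stable file name so identical diagrams share one image."""
    digest = hashlib.sha256(diagram.encode("utf-8")).hexdigest()
    return f"mermaid-{digest[:16]}.png"


def img_tag(diagram: str, output_dir: Path) -> str:
    """Return the <img> tag that would replace the fenced block."""
    src = (output_dir / mermaid_filename(diagram)).as_posix()
    return f'<img src="{src}" alt="Mermaid diagram">'
```

Naming files by content digest means a diagram that appears on several pages only needs to be fetched once.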

Annotations

Module: src/leafpress/annotations.py

render_annotations(html) finds elements with the annotate class paired with sibling <ol> lists (the Material for MkDocs annotation pattern). It replaces (N) text markers with superscript references and converts the ordered list into a styled annotation block.
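
A reduced version of the marker replacement is shown below. It works on a plain text paragraph and a list of note strings rather than on parsed HTML, and the render_annotated name and both CSS class names are hypothetical.

```python
import re
from html import escape
from typing import List

# Matches annotation markers such as (1) or (12).
MARKER_RE = re.compile(r"\((\d+)\)")


def render_annotated(text: str, notes: List[str]) -> str:
    """Turn (N) markers into superscripts and append the notes as a list."""
    marked = MARKER_RE.sub(
        lambda m: f'<sup class="annotation-ref">{m.group(1)}</sup>', text
    )
    items = "".join(f"<li>{escape(note)}</li>" for note in notes)
    return f'<p>{marked}</p><ol class="annotation-list">{items}</ol>'
```

Static formats such as PDF cannot show the interactive tooltips that Material for MkDocs uses, which is why the markers become visible references to a list.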

Monorepo Pipeline

When projects is defined in leafpress.yml, the pipeline switches to monorepo mode. Instead of parsing a single mkdocs.yml, it processes each sub-project independently and combines the results:

1. Detection: if branding.projects is non-empty, monorepo mode activates
2. Per-project processing: each entry in projects goes through these steps:
    - Resolve the source (local path or git clone for URL entries)
    - Parse the project's own mkdocs.yml
    - Detect the project's package version (without walking up to parent directories)
    - Build a chapter cover page with per-project metadata (author, subtitle, etc.), falling back to top-level branding values
    - Create a chapter NavItem at level 0
    - Flatten the project's nav and bump all levels by +1 via bump_nav_levels(), so project pages nest under the chapter heading
    - Render each page's Markdown to HTML using a project-specific MarkdownRenderer (with the project's own extensions and docs directory)
3. Combination: all chapter covers and rendered pages are concatenated into a single html_pages list
4. Output: the combined list is passed to format renderers, producing a single document with chapters

Each sub-project gets its own MarkdownRenderer instance, so extension configurations and docs directories are isolated between projects. Git URL projects are cloned to temporary directories and cleaned up automatically after all pages are collected.
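
The level bump and the combination step can be sketched as follows. The NavItem shape, the `by` parameter, and the combine_projects helper are assumptions; only the bump_nav_levels() name and the level-0 chapter convention come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NavItem:
    """Assumed minimal shape of a flattened nav entry."""
    title: str
    path: Optional[str] = None
    level: int = 0


def bump_nav_levels(items: List[NavItem], by: int = 1) -> List[NavItem]:
    """Return copies of the items with every level increased by `by`."""
    return [replace(item, level=item.level + by) for item in items]


def combine_projects(projects: List[Tuple[str, List[NavItem]]]) -> List[NavItem]:
    """Place each project's pages under a level-0 chapter entry (hypothetical helper)."""
    combined: List[NavItem] = []
    for name, pages in projects:
        combined.append(NavItem(title=name, path=None, level=0))
        combined.extend(bump_nav_levels(pages))
    return combined
```

Returning copies rather than mutating the input keeps each project's own nav list unchanged.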

7. Format Rendering

Each renderer receives list[tuple[NavItem, str]] (the nav structure paired with rendered HTML per page) plus branding config, git info, and rendering options.

BaseRenderer Protocol & Shared Helpers

Module: src/leafpress/base_renderer.py

All renderers conform to the BaseRenderer protocol, which defines the common constructor and render() signatures. This module also provides shared helper functions used across multiple renderers:

Helper	Purpose	Used by
`replace_checkboxes(html)`	Replaces `<input type="checkbox">` elements with unicode symbols (☑/☐) for print-friendly output	PDF, HTML, EPUB
`make_anchor_id(title)`	Converts a title string to a URL-safe anchor ID	HTML, EPUB
`resolve_logo_uri(branding)`	Returns the logo as a `file://` URI or HTTP URL, or empty string	PDF, HTML
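
Two of these helpers are simple enough to sketch. The regex-based matching and the exact slug rules (lowercase ASCII, hyphen-separated) are assumptions; the real helpers in base_renderer.py may behave differently at the edges.

```python
import re
import unicodedata

# Matches any <input ...> tag whose type attribute is "checkbox".
CHECKBOX_RE = re.compile(r'<input\b[^>]*\btype="checkbox"[^>]*>', re.IGNORECASE)


def replace_checkboxes(html: str) -> str:
    """Swap checkbox inputs for unicode ballot boxes."""

    def symbol(match: re.Match) -> str:
        checked = re.search(r"\bchecked\b", match.group(0), re.IGNORECASE)
        return "\u2611" if checked else "\u2610"

    return CHECKBOX_RE.sub(symbol, html)


def make_anchor_id(title: str) -> str:
    """Convert a title to a lowercase, hyphen-separated ASCII anchor ID."""
    # Decompose accented characters, then drop anything outside ASCII.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```

Keeping such helpers in one module means every renderer that needs them produces the same anchors and the same checkbox symbols.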

Format-Specific Renderers

Format	Module	Library	Approach
PDF	`src/leafpress/pdf/renderer.py`	WeasyPrint	Jinja2 HTML templates + CSS, rendered to PDF
HTML	`src/leafpress/html/renderer.py`	Jinja2	Single-file HTML with inline CSS and embedded assets
DOCX	`src/leafpress/docx/renderer.py`	python-docx	HTML parsed via custom `html_converter.py` into docx elements
ODT	`src/leafpress/odt/renderer.py`	odfpy	Programmatic ODF document construction
EPUB	`src/leafpress/epub/renderer.py`	ebooklib	HTML chapters wrapped in EPUB structure
Markdown	`src/leafpress/markdown_export/renderer.py`	—	Reads source `.md` files, concatenates with front matter and TOC

All renderers support cover pages, tables of contents, branding, and watermarks. PDF and HTML use Jinja2 templates in their respective templates/ directories; DOCX, ODT, and EPUB build documents programmatically. The Markdown export renderer reads source .md files directly rather than converting from HTML, preserving the original formatting.

Import Pipeline

The leafpress import command converts Word (.docx), PowerPoint (.pptx), Excel (.xlsx), and LaTeX (.tex) files to Markdown. This is a separate pipeline from the convert flow above.

flowchart TD
    CLI["CLI (cli.py)"] --> |"file path + options"| DET["Format Detection"]
    DET --> |".docx"| DOCXI["DOCX Converter (importer/converter.py)"]
    DET --> |".pptx"| PPTXI["PPTX Converter (importer/converter_pptx.py)"]
    DET --> |".xlsx"| XLSXI["XLSX Converter (importer/converter_xlsx.py)"]
    DET --> |".tex"| TEXI["TeX Converter (importer/converter_tex.py)"]
    DOCXI --> MAM["mammoth + markdownify (DOCX → HTML → Markdown)"]
    PPTXI --> PPT["python-pptx (slides → Markdown)"]
    XLSXI --> OPX["openpyxl (sheets → Markdown tables)"]
    TEXI --> PLE["pylatexenc (AST → Markdown)"]
    MAM --> IMG["Image Handler (importer/image_handler.py)"]
    PPT --> IMG
    PLE --> IMG
    IMG --> |"assets/"| OUT["Output .md + images"]
    MAM --> OUT
    PPT --> OUT
    OPX --> OUT
    PLE --> OUT
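
The format detection step in the flowchart amounts to a lookup on the file extension. A minimal sketch, assuming a CONVERTERS table and a detect_converter function that are hypothetical names (the module paths are the ones listed above):

```python
from pathlib import Path
from typing import Union

# Extension-to-module table mirroring the flowchart above.
CONVERTERS = {
    ".docx": "importer/converter.py",
    ".pptx": "importer/converter_pptx.py",
    ".xlsx": "importer/converter_xlsx.py",
    ".tex": "importer/converter_tex.py",
}


def detect_converter(path: Union[str, Path]) -> str:
    """Return the converter module for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported import format: {suffix or '(none)'}") from None
```

Lowercasing the suffix lets files such as REPORT.DOCX resolve to the same converter as report.docx.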

DOCX Import

Module: src/leafpress/importer/converter.py

Uses the mammoth library to convert Word documents to HTML, then markdownify to transform the HTML into Markdown. Supports image extraction (via ImageHandler), configurable code block detection by Word style name, and heading level mapping.

PPTX Import

Module: src/leafpress/importer/converter_pptx.py

Uses python-pptx to iterate over slides and extract content:

- Slide titles become ## H2 headings (untitled slides get ## Slide N)
- Text frames are converted to Markdown with bold/italic/hyperlink preservation
- Tables are rendered as pipe-style Markdown tables
- Images are extracted to an assets/ directory via ImageHandler.save_image()
- Speaker notes are included as blockquotes (toggleable via --notes/--no-notes)
- Group shapes are recursed into for nested content

XLSX Import

Module: src/leafpress/importer/converter_xlsx.py

Uses openpyxl to read Excel workbooks in data-only mode (computed values, not formulas). Each worksheet becomes a ## Sheet Name section with a pipe-style Markdown table. The first row is treated as the header. Empty sheets are skipped. No image extraction is needed.
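
Turning worksheet rows into a pipe-style table can be sketched as below. The function borrows the rows_to_pipe_table() name from the shared importer base, but its signature, the pipe escaping, and the handling of short rows are assumptions.

```python
from typing import Any, Sequence


def _cell(value: Any) -> str:
    """Render one cell, escaping pipes and flattening newlines."""
    if value is None:
        return ""
    return str(value).replace("|", "\\|").replace("\n", " ")


def rows_to_pipe_table(rows: Sequence[Sequence[Any]]) -> str:
    """Render rows as a pipe-style table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    lines = ["| " + " | ".join(_cell(c) for c in padded[0]) + " |"]
    lines.append("| " + " | ".join("---" for _ in range(width)) + " |")
    for row in padded[1:]:
        lines.append("| " + " | ".join(_cell(c) for c in row) + " |")
    return "\n".join(lines)
```

Escaping the pipe character matters for spreadsheets, where cell values are free text and could otherwise split a column.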

LaTeX Import

Module: src/leafpress/importer/converter_tex.py

Uses pylatexenc to parse LaTeX source into an AST, then walks the tree to produce Markdown:

- Sections (\section, \subsection, etc.) become ATX headings
- Formatting commands (\textbf, \textit, \texttt) become their Markdown equivalents
- Math (inline $...$ and display environments like equation, align) passes through verbatim for MathJax/KaTeX
- Lists (itemize, enumerate, description) become bullet/numbered/definition lists with nesting support
- Tables (tabular) are rendered as pipe-style Markdown tables with column alignment
- Code blocks (verbatim, lstlisting, minted) become fenced code blocks with language detection
- Images (\includegraphics) are resolved relative to the .tex file and copied via ImageHandler
- Figures use \caption text as image alt text
- Links (\href, \url) become Markdown links
- Footnotes (\footnote) become Markdown footnote syntax
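
A few of these mappings can be illustrated with regular expressions. This is deliberately not the AST walk that the importer performs with pylatexenc: it handles only non-nested arguments, and the heading levels assigned to each sectioning command are an assumption.

```python
import re

# Assumed heading levels for the sectioning commands.
HEADINGS = {"section": "#", "subsection": "##", "subsubsection": "###"}
INLINE = {"textbf": "**{}**", "textit": "*{}*", "texttt": "`{}`"}


def tex_to_markdown(tex: str) -> str:
    """Convert a handful of simple LaTeX commands to Markdown."""

    def heading(m: re.Match) -> str:
        return f"{HEADINGS[m.group(1)]} {m.group(2)}"

    def inline(m: re.Match) -> str:
        return INLINE[m.group(1)].format(m.group(2))

    out = re.sub(r"\\(section|subsection|subsubsection)\*?\{([^{}]*)\}", heading, tex)
    out = re.sub(r"\\(textbf|textit|texttt)\{([^{}]*)\}", inline, out)
    # \href{url}{text} becomes [text](url); math is left untouched.
    out = re.sub(r"\\href\{([^{}]*)\}\{([^{}]*)\}", r"[\2](\1)", out)
    return out
```

Nested commands such as \textbf{\textit{x}} are where a regex approach breaks down, which is one reason to parse into a tree first.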

Shared Importer Base

Module: src/leafpress/importer/base.py

Contains utilities shared across all four converters: ImportResult dataclass, resolve_output_path(), postprocess_markdown(), and rows_to_pipe_table().

Image Handler

Module: src/leafpress/importer/image_handler.py

Shared by the DOCX, PPTX, and LaTeX importers. ImageHandler manages an output directory for extracted images. save_image(image_bytes, content_type) writes image data to assets/ with content-type-based extensions, returning a relative Markdown image path. The DOCX importer uses handle_image() as a mammoth callback; the PPTX and LaTeX importers call save_image() directly.
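
A minimal sketch of the save_image() behavior follows. The extension table, the sequential file naming, the .bin fallback, and the constructor parameters are assumptions; only the method name, its two arguments, and the assets/ directory come from the description above.

```python
from pathlib import Path

# Assumed mapping from content type to file extension.
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/svg+xml": ".svg",
}


class ImageHandler:
    """Write extracted images under <output_dir>/assets and hand back relative paths."""

    def __init__(self, output_dir: Path, assets_dirname: str = "assets") -> None:
        self.assets_dirname = assets_dirname
        self.assets_dir = Path(output_dir) / assets_dirname
        self._count = 0

    def save_image(self, image_bytes: bytes, content_type: str) -> str:
        """Save one image and return its path relative to the Markdown file."""
        ext = EXTENSIONS.get(content_type, ".bin")
        # Create the assets directory lazily, only when an image is saved.
        self.assets_dir.mkdir(parents=True, exist_ok=True)
        self._count += 1
        name = f"image-{self._count:03d}{ext}"
        (self.assets_dir / name).write_bytes(image_bytes)
        return f"{self.assets_dirname}/{name}"
```

Returning a relative path keeps the generated Markdown portable: the .md file and its assets/ directory can be moved together.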

Module Map

Layer	Files	Purpose
CLI	`cli.py`	Command definitions, argument parsing, progress display
Orchestration	`pipeline.py`	Coordinates all stages of conversion
Input	`source.py`, `project.py`	Source resolution, project auto-detection
Import	`importer/base.py`, `importer/converter.py`, `importer/converter_pptx.py`, `importer/converter_xlsx.py`, `importer/converter_tex.py`, `importer/image_handler.py`	DOCX/PPTX/XLSX/LaTeX to Markdown conversion
Config	`config.py`, `exceptions.py`	Branding schema, validation, env overrides
Parsing	`mkdocs_parser.py`	MkDocs config and nav parsing
Rendering	`markdown_renderer.py`	Markdown-to-HTML conversion
Post-processing	`mermaid.py`, `annotations.py`	Diagram and annotation transforms
Renderer base	`base_renderer.py`	Renderer protocol and shared helpers (checkboxes, anchors, logo URIs)
Output	`pdf/`, `html/`, `docx/`, `odt/`, `epub/`, `markdown_export/`	Format-specific renderers and templates
Metadata	`git_info.py`	Git version extraction
Diagnostics	`doctor.py`	Environment health checks

Adding a New Output Format

To add a new output format (e.g., LaTeX):

1. Create a renderer module at src/leafpress/{format}/renderer.py with a class that satisfies the BaseRenderer protocol defined in src/leafpress/base_renderer.py. The class must accept (branding, git_info, mkdocs_cfg) and implement a render() method that produces the output file. Use shared helpers from base_renderer (e.g., replace_checkboxes, make_anchor_id, resolve_logo_uri) rather than reimplementing common logic.
2. Register in pipeline.py: add a branch in the format dispatch logic that instantiates your renderer and calls render().
3. Add the CLI format option: extend the format choices in cli.py so users can pass -f {format}.
4. Add tests: create tests/test_{format}_renderer.py with cover page, TOC, branding, and watermark tests following the patterns in existing test files.
5. Document: add a page in docs/docs/ and update the nav in docs/mkdocs.yml.
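
The renderer and registration steps can be sketched as below. Everything except the (branding, git_info, mkdocs_cfg) constructor arguments is an assumption: the render() signature, the minimal NavItem, the PlainTextRenderer class, the "txt" format, and the RENDERERS table are hypothetical and only illustrate the shape of a protocol-conforming renderer.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Any, List, Protocol, Tuple, runtime_checkable

TAG_RE = re.compile(r"<[^>]+>")


@dataclass
class NavItem:
    """Assumed minimal nav entry; the real class has more fields."""
    title: str


@runtime_checkable
class BaseRenderer(Protocol):
    """Assumed shape of the protocol; the real signatures may differ."""

    def render(self, html_pages: List[Tuple[NavItem, str]], output_path: Path) -> Path: ...


class PlainTextRenderer:
    """Hypothetical renderer for an illustrative 'txt' format."""

    def __init__(self, branding: Any, git_info: Any, mkdocs_cfg: Any) -> None:
        self.branding = branding
        self.git_info = git_info
        self.mkdocs_cfg = mkdocs_cfg

    def render(self, html_pages: List[Tuple[NavItem, str]], output_path: Path) -> Path:
        parts = []
        for nav_item, html in html_pages:
            # Strip tags and underline the page title.
            body = TAG_RE.sub("", html).strip()
            parts.append(f"{nav_item.title}\n{'=' * len(nav_item.title)}\n{body}\n")
        output_path.write_text("\n".join(parts), encoding="utf-8")
        return output_path


# Stand-in for the format dispatch logic in pipeline.py.
RENDERERS = {"txt": PlainTextRenderer}


def get_renderer(fmt: str, branding: Any, git_info: Any, mkdocs_cfg: Any) -> BaseRenderer:
    """Instantiate the renderer registered for a format name."""
    try:
        cls = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"Unknown output format: {fmt}") from None
    return cls(branding, git_info, mkdocs_cfg)
```

Because the protocol is structural, the new class does not need to inherit from anything; it only has to provide the expected methods.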

Key Dependencies

leafpress builds on the open-source libraries listed below, grouped by the pipeline layer they support.

CLI & Configuration

Library	Role	Links
Typer	CLI framework with automatic help and shell completion	GitHub · Docs
Rich	Terminal formatting, progress bars, and status spinners	GitHub · Docs
Pydantic	Configuration schema validation and environment variable parsing	GitHub · Docs
PyYAML	YAML config file parsing	GitHub · PyPI
python-dotenv	`.env` file loading for environment-based config	GitHub

Markdown Processing

Library	Role	Links
Python-Markdown	Core Markdown-to-HTML conversion engine	GitHub · Docs
PyMdown Extensions	Tabbed content, task lists, code highlighting, emoji, and superfences	GitHub · Docs
Pygments	Syntax highlighting for code blocks	GitHub · Docs
Beautiful Soup	HTML post-processing (asset resolution, annotations, mermaid)	Docs
lxml	Fast HTML/XML parser backend for Beautiful Soup	GitHub · Docs
Jinja2	HTML and PDF template rendering	GitHub · Docs

Output Renderers

Library	Role	Links
WeasyPrint	PDF generation from HTML+CSS (optional)	GitHub · Docs
python-docx	DOCX document generation	GitHub · Docs
odfpy	ODT (OpenDocument) generation	GitHub · PyPI
EbookLib	EPUB generation	GitHub · Docs

Document Import

Library	Role	Links
mammoth	Word (.docx) to HTML conversion with semantic style mapping	GitHub · PyPI
markdownify	HTML-to-Markdown conversion for the DOCX import pipeline	GitHub · PyPI
python-pptx	PowerPoint (.pptx) slide parsing and content extraction	GitHub · Docs
openpyxl	Excel (.xlsx) workbook reading and cell extraction	GitHub · Docs

Other

Library	Role	Links
GitPython	Git tag, branch, and commit extraction for document footers	GitHub · Docs
Requests	HTTP client for diagram fetching and mermaid rendering	GitHub · Docs