Document Import

LeafPress can import Word (.docx), PowerPoint (.pptx), Excel (.xlsx), and LaTeX (.tex) files and convert them to Markdown. This is useful for migrating existing documents into an MkDocs project.

leafpress import report.docx
leafpress import deck.pptx
leafpress import data.xlsx
leafpress import paper.tex

# Import multiple files at once — mix and match formats
leafpress import *.docx *.pptx *.xlsx *.tex

# Send all output to a directory
leafpress import *.docx *.pptx *.xlsx *.tex -o docs/

See the CLI Reference for all flags and examples.

Word Import (DOCX)

Word documents are converted using the mammoth library, which maps Word styles to semantic HTML, then to Markdown.

What's supported

Feature	How it's handled
Headings	Mapped to `#`–`######` based on Word heading level
Bold / italic	Preserved as `**bold**` and `*italic*`
Hyperlinks	Converted to Markdown link syntax
Ordered / unordered lists	Converted to Markdown lists
Tables	Converted to pipe-style Markdown tables
Images	Extracted to an `assets/` directory and referenced via Markdown image syntax
Code blocks	Detected by Word style name — use `--code-styles` to specify which styles are code

Code block detection

By default, no Word styles are treated as code. Use --code-styles to specify which style names should become fenced code blocks:

leafpress import report.docx --code-styles "Code Block,Source Code"
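
Mammoth decides which paragraphs become code through a style map. Here is a minimal sketch, assuming LeafPress turns each name passed to --code-styles into one mammoth style-map rule. The helper `code_style_map` is hypothetical and is not a LeafPress API.

```python
def code_style_map(code_styles: str) -> str:
    """Build mammoth style-map rules from a comma-separated list of Word style names.

    Hypothetical helper: the rule syntax is mammoth's, but whether LeafPress
    builds its style map this way is an assumption.
    """
    rules = []
    for name in code_styles.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate stray commas, e.g. "Code Block,"
        # Map the paragraph style to <pre>. Mammoth merges consecutive paragraphs
        # mapped to the same element, joining them with the separator. The '\n'
        # is written literally into the style map for mammoth to interpret.
        rules.append(f"p[style-name='{name}'] => pre:separator('\\n')")
    return "\n".join(rules)
```

The resulting string could be passed to mammoth as `mammoth.convert_to_html(docx_file, style_map=...)`. Because of the separator, each source line stays on its own line inside the `<pre>` block, which then becomes a fenced code block.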

Limitations

The following Word features are not currently supported and may be lost or simplified during import:

Feature	Reason
Track changes / comments	Mammoth accepts the final document state only — tracked changes and comments are not included in the output.
Headers / footers	Document headers and footers are not part of the body content and are skipped.
Page breaks / columns	Page layout is a visual property with no Markdown equivalent.
Text boxes	Floating text boxes are not part of the main document flow and are skipped by mammoth.
SmartArt / charts	Rendered as embedded images by Word internally. When image extraction is enabled (the default), they may appear as images, but their labels and data are not extractable as text.
Custom fonts / colors	Mammoth maps semantic styles (bold, italic, headings) but ignores visual-only formatting like font family, size, and color.
Table of contents	Word TOC fields are not resolved — they appear as static text or are omitted.
Footnotes / endnotes	Converted to inline text rather than Markdown footnote syntax.

Tip

For best results, use Word's built-in heading styles (Heading 1, Heading 2, etc.) rather than manually formatted bold text. Mammoth relies on styles, not visual formatting.

PowerPoint Import (PPTX)

PowerPoint presentations are converted using the python-pptx library. Each slide becomes a section in the output Markdown.

What's supported

Feature	How it's handled
Slide titles	Each slide title becomes a level-2 (`##`) heading
Untitled slides	Get a fallback heading `## Slide N`
Body text	Extracted with paragraph structure preserved
Bold / italic	Preserved as `**bold**` and `*italic*`
Hyperlinks	Converted to Markdown link syntax
Indented text	Rendered as nested bullet lists based on indent level
Tables	Converted to pipe-style Markdown tables
Images	Extracted to `assets/` and referenced via Markdown image syntax
Speaker notes	Included as blockquotes (disable with `--no-notes`)
Group shapes	Recursed into — nested shapes are extracted individually
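
python-pptx reports each paragraph's indent as `paragraph.level` (0 to 8). The sketch below shows one way to turn those levels into nested bullets. It assumes every paragraph becomes a bullet and that each level adds four spaces of indentation. The function name `bullets_from_levels` is hypothetical.

```python
def bullets_from_levels(paragraphs: list[tuple[int, str]]) -> str:
    """Render (indent level, text) pairs as a nested Markdown bullet list.

    Assumption: every non-empty paragraph is a bullet, nested by its level.
    """
    lines = []
    for level, text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        # Four spaces per level: Python-Markdown (used by MkDocs) does not
        # treat a list indented by only two spaces as nested.
        lines.append("    " * level + "- " + text)
    return "\n".join(lines)
```

With python-pptx, the input pairs would come from something like `[(p.level, p.text) for p in shape.text_frame.paragraphs]`.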

Speaker notes

By default, speaker notes are included as Markdown blockquotes beneath each slide:

## Quarterly Review

Revenue increased by 15% over the prior quarter.

> Remember to highlight the APAC growth numbers.

To omit speaker notes:

leafpress import deck.pptx --no-notes
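
The blockquote form shown above can be produced by prefixing every line of the notes with `> `. The sketch below assumes blank lines inside the notes should stay within one blockquote. The helper name `notes_to_blockquote` is hypothetical. With python-pptx, the notes text is available as `slide.notes_slide.notes_text_frame.text` when `slide.has_notes_slide` is true.

```python
def notes_to_blockquote(notes: str) -> str:
    """Render speaker-notes text as a Markdown blockquote (hypothetical helper)."""
    lines = notes.strip().splitlines()
    # A bare ">" on blank lines keeps multi-paragraph notes in one blockquote.
    return "\n".join(f"> {line}" if line.strip() else ">" for line in lines)
```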

Limitations

The following PowerPoint features are not currently supported and will be silently skipped during import:

Feature	Reason
SmartArt	SmartArt diagrams are stored as complex XML structures that python-pptx cannot access. They appear as opaque shapes with no extractable text.
Charts	Embedded charts (bar, pie, line, etc.) are stored as chart objects rather than text shapes, and LeafPress does not currently extract their data or labels.
Animations / transitions	Markdown has no equivalent — these are presentation-only features.
Audio / video	Embedded media cannot be meaningfully represented in Markdown.
Slide master / layout formatting	Only content is extracted, not visual styling from the theme.

Tip

If a slide contains SmartArt or charts, consider replacing them with static images in PowerPoint before importing — images are fully supported and will be extracted to assets/.

Excel Import (XLSX)

Excel spreadsheets are converted using the openpyxl library. Each worksheet becomes a section with a Markdown table.

What's supported

Feature	How it's handled
Multiple sheets	Each sheet becomes a `## Sheet Name` section
Header row	First row treated as the table header
Text values	Rendered as-is
Numbers	Integers and floats rendered as strings (whole-number floats drop the `.0`)
Dates / times	Formatted as `YYYY-MM-DD` or `HH:MM:SS`
Empty cells	Rendered as blank table cells
Pipe characters	Escaped as `\|` so they don't break table syntax
Empty sheets	Skipped silently
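
The value rules in this table can be sketched as a single formatting function. It assumes openpyxl-style cell values (`None`, numbers, `datetime` objects, strings). As an assumption not stated above, a datetime is shown by its date part only. The name `format_cell` is hypothetical.

```python
import datetime


def format_cell(value: object) -> str:
    """Turn one openpyxl cell value into Markdown table-cell text (sketch)."""
    if value is None:
        return ""  # empty cell -> blank table cell
    if isinstance(value, float) and value.is_integer():
        text = str(int(value))  # whole-number float: 4.0 -> "4"
    elif isinstance(value, datetime.date):
        # Also matches datetime.datetime (a subclass); assumption: the time part is dropped.
        text = value.strftime("%Y-%m-%d")
    elif isinstance(value, datetime.time):
        text = value.strftime("%H:%M:%S")
    else:
        text = str(value)  # text, ints, and other floats such as 2.5
    # A bare "|" would end the cell early, so escape it.
    return text.replace("|", "\\|")
```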

Example output

A sheet named "Servers" with three columns produces:

## Servers

| Host     | Role     | CPU |
| -------- | -------- | --- |
| web-01   | frontend | 4   |
| db-01    | database | 8   |
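
Once the cells are formatted, rendering them means emitting the first row as the header and following it with a `---` separator row. The sketch below uses a hypothetical `rows_to_table` helper. Unlike the hand-aligned example above, it does not pad columns, since Markdown renderers do not require padding. Padding short rows with blank cells is an assumption.

```python
def rows_to_table(rows: list[list[str]]) -> str:
    """Render formatted cell text as a pipe table; row 0 is the header (sketch)."""
    if not rows:
        return ""  # nothing to render for an empty sheet
    width = max(len(row) for row in rows)
    # Assumption: rows shorter than the widest row get trailing blank cells.
    padded = [row + [""] * (width - len(row)) for row in rows]
    lines = ["| " + " | ".join(padded[0]) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in padded[1:]]
    return "\n".join(lines)
```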

Limitations

Feature	Reason
Merged cells	Merged regions are not unmerged — only the top-left cell value is read.
Formulas	Cell values are read in data-only mode — you see computed results, not formula text. Cells that have never been calculated in Excel may appear blank.
Charts / images	Embedded charts and images are not extracted.
Conditional formatting / colors	Visual-only formatting has no Markdown equivalent.
Multiple header rows	Only the first row is treated as the header.

Tip

For best results, open and save the workbook in Excel before importing. This ensures all formula results are cached, because LeafPress reads cached values, not formulas.

LaTeX Import (TEX)

LaTeX documents are converted using pylatexenc, a pure-Python LaTeX parser. The converter handles the constructs most common in academic papers and technical documentation.

What's supported

Feature	How it's handled
Headings	`\section` → `##`, `\subsection` → `###`, `\subsubsection` → `####`, etc.
Bold / italic / code	`\textbf` → `**bold**`, `\textit` / `\emph` → `*italic*`, `\texttt` → `` `code` ``
Lists	`itemize` → bullets, `enumerate` → numbered, with nesting support
Math	Inline `$...$` and display `$$...$$`, `\[...\]`, `equation`, and `align` are passed through for MathJax/KaTeX
Images	`\includegraphics` resolved relative to `.tex` file, copied to `assets/`
Tables	`tabular` → pipe-style Markdown tables with column alignment
Links	`\href{url}{text}` → Markdown links, `\url{url}` → angle-bracket URLs
Code blocks	`verbatim`, `lstlisting`, `minted` → fenced code blocks (with language detection)
Figures	`\caption` text used as image alt text
Title / author	`\title` and `\author` rendered at top of document
Blockquotes	`abstract`, `quote`, `quotation` → blockquotes
Footnotes	`\footnote` → Markdown footnote syntax
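
For tables, column alignment comes from the tabular column specification. The sketch below maps `l`, `c`, and `r` to Markdown's `:---`, `:---:`, and `---:` markers. It assumes paragraph columns (`p{...}`, `m{...}`, `b{...}`) are left-aligned. The helper `alignment_row` is hypothetical and ignores repetition syntax such as `*{3}{c}`.

```python
def alignment_row(colspec: str) -> str:
    """Map a tabular column spec such as "|l|c|r|" to a Markdown alignment row."""
    markers = {"l": ":---", "c": ":---:", "r": "---:"}
    cells = []
    depth = 0  # brace depth: skips widths like {3cm} and other brace arguments
    for ch in colspec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            if ch in markers:
                cells.append(markers[ch])
            elif ch in "pmb":
                cells.append(":---")  # assumption: paragraph columns are left-aligned
            # "|", spaces, "@", ">" and similar characters add no column
    return "| " + " | ".join(cells) + " |"
```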

Limitations

Feature	Reason
`\input` / `\include`	Multi-file LaTeX projects are not supported — only the specified `.tex` file is converted.
Custom macros	`\newcommand` / `\def` definitions are skipped with a warning. Usages of custom macros appear as raw text.
TikZ / PGF diagrams	`tikzpicture` and `pgfpicture` environments are skipped with a warning.
Beamer	Beamer-specific environments (`frame`) and overlay commands (`\pause`, `\only<>`) are not converted.
Cross-references	`\ref` and `\cite` produce placeholder text like `[ref:label]` and `[key]` — not resolved to numbers or bibliography entries.
Bibliography	`.bib` files are not parsed. `\cite`, `\citet`, `\citep` commands produce bracketed keys.
EPS/PDF images	Only image formats that browsers can display (PNG, JPG, SVG, etc.) are copied. EPS and PDF images produce a warning.
Theorem-like environments	`theorem`, `lemma`, `proof`, `corollary`, `definition`, and other unrecognized environments render their body as plain text with a warning.
`\paragraph` headings	Registered as a heading level but may not render with `#####` prefix due to parser argument handling.
`\multicolumn` in tables	Column spanning is not represented — cells render but span information is lost.
`\subfigure` / `\subcaption`	Sub-figure environments are not supported — arguments render as plain text.
Accented characters	LaTeX accent commands (`\"o`, `\'{e}`, `\~{n}`) are not converted to Unicode — they appear as raw LaTeX.
`siunitx` package	`\SI`, `\si`, `\num` commands are not converted — they appear as raw text.
`\label` inside math	Labels within math environments are passed through verbatim in the `$$` block.
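
The placeholder formats in the cross-reference and bibliography rows can be reproduced with two regular expressions. This only illustrates the output format. It assumes the forms `\ref{label}`, `\cite{key}`, `\citet{key}`, and `\citep[note]{key}`. A parser-based converter presumably works on the parse tree rather than on raw text. The name `placeholder_refs` is hypothetical.

```python
import re

# \ref{label} -> [ref:label]
REF_RE = re.compile(r"\\ref\{([^}]*)\}")
# \cite, \citet, \citep (optionally starred, with optional [..] notes) -> [key]
CITE_RE = re.compile(r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def placeholder_refs(text: str) -> str:
    """Replace unresolved references and citations with bracketed placeholders."""
    text = REF_RE.sub(lambda m: f"[ref:{m.group(1)}]", text)
    return CITE_RE.sub(lambda m: f"[{m.group(1)}]", text)
```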

Tip

For best results with math, ensure your Markdown renderer supports MathJax or KaTeX. Math expressions are passed through verbatim in LaTeX syntax.

Common options

Image extraction

The Word and PowerPoint importers extract embedded images, and the LaTeX importer copies images referenced by \includegraphics. In both cases the images go to an assets/ directory next to the output file. To skip image extraction:

leafpress import report.docx --no-extract-images
leafpress import deck.pptx --no-extract-images
leafpress import paper.tex --no-extract-images

Output path

By default, the output file uses the same stem as the input (e.g., deck.pptx → deck.md). Use -o (--output) to set a specific file path or a directory:

# Explicit file path (single file only)
leafpress import deck.pptx -o docs/presentation.md

# Directory (creates deck.md inside it)
leafpress import deck.pptx -o docs/

When importing multiple files, --output must be a directory (or omitted):

leafpress import *.docx *.pptx *.xlsx -o docs/
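
The rules above can be sketched as one path-resolution function. This sketch makes two assumptions that the rules above do not confirm. First, an -o value that ends in a slash or names an existing directory is treated as a directory. Second, without -o, the Markdown file is written next to the input. The name `resolve_output` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_output(source: str, output: Optional[str] = None, many_sources: bool = False) -> Path:
    """Choose where the Markdown for one source is written (hypothetical helper)."""
    md_name = Path(source).with_suffix(".md").name  # deck.pptx -> deck.md
    if output is None:
        # Assumption: without -o, write next to the input file.
        return Path(source).with_suffix(".md")
    # Assumption: a trailing slash or an existing directory means "directory".
    if output.endswith(("/", "\\")) or Path(output).is_dir():
        return Path(output) / md_name
    if many_sources:
        raise ValueError("--output must be a directory when importing multiple files")
    return Path(output)  # explicit file path, single source only
```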

URL import

You can import documents directly from URLs — the file is downloaded to a temp directory, converted, and the temp file is cleaned up automatically:

# Import a LaTeX paper from a URL
leafpress import https://example.com/paper.tex

# Import a Word document from a URL
leafpress import https://example.com/report.docx -o docs/

# Mix local files and URLs
leafpress import report.docx https://example.com/slides.pptx -o docs/

The file type is inferred from the URL path extension (.docx, .pptx, .xlsx, .tex). If the URL has no recognized extension, the Content-Type header is used as a fallback. URLs that cannot be mapped to a supported format produce an error.
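
The sketch below shows this two-step detection using only the standard library. It rests on three assumptions. The MIME-type table lists the standard Office Open XML types plus two common TeX types. Extension matching is case-insensitive. Finally, `infer_format` is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

SUPPORTED = {".docx", ".pptx", ".xlsx", ".tex"}

# Assumed fallback table; LeafPress's actual list may differ.
MIME_TO_EXT = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "application/x-tex": ".tex",
    "text/x-tex": ".tex",
}


def infer_format(url: str, content_type: Optional[str] = None) -> str:
    """Return the format suffix for a URL, or raise ValueError if unsupported."""
    # 1. Prefer the extension on the URL path (the query string is ignored).
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in SUPPORTED:
        return suffix
    # 2. Fall back to the Content-Type header, minus parameters like "; charset=...".
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in MIME_TO_EXT:
            return MIME_TO_EXT[mime]
    raise ValueError(f"cannot determine a supported format for {url}")
```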

Batch import

You can pass multiple files and URLs in a single command. Formats can be mixed freely — each source is detected and routed to the appropriate converter:

# Import everything in one shot
leafpress import report.docx proposal.docx slides.pptx data.xlsx paper.tex

# Use shell globs to grab all supported files
leafpress import *.docx *.pptx *.xlsx *.tex

# Mix local and remote sources
leafpress import *.docx https://example.com/paper.tex -o docs/

# Combine with other options
leafpress import *.docx --code-styles "Code Block" --no-extract-images
leafpress import *.pptx --no-notes -o imported/

If one source fails (e.g., missing file, download error, or corrupt document), the remaining sources are still processed. A summary of failures is shown at the end.
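
The continue-on-error behavior can be sketched as a loop that records each failure instead of raising. The `converters` mapping and the summary wording are assumptions made for illustration, not LeafPress internals.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str], str]  # takes a source, returns Markdown


def import_all(sources: List[str], converters: Dict[str, Converter]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Convert each source independently so one failure does not stop the batch."""
    done: List[str] = []
    failures: List[Tuple[str, str]] = []
    for source in sources:
        suffix = Path(source).suffix.lower()
        convert = converters.get(suffix)  # route by detected format
        if convert is None:
            failures.append((source, f"unsupported format {suffix or '(none)'}"))
            continue
        try:
            convert(source)  # writing the .md file is left out of this sketch
        except Exception as exc:  # missing file, download error, corrupt document, ...
            failures.append((source, str(exc)))
        else:
            done.append(source)
    return done, failures


def summary(done: List[str], failures: List[Tuple[str, str]]) -> str:
    """Format the end-of-run report (the wording is an assumption)."""
    lines = [f"Imported {len(done)} of {len(done) + len(failures)} sources"]
    lines += [f"  failed: {source}: {reason}" for source, reason in failures]
    return "\n".join(lines)
```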