| Field | Value |
|---|---|
| Name | marktripy |
| Version | 1.0.3 |
| Summary | A Python package for converting Markdown to AST and back to Markdown |
| Upload time | 2025-07-29 11:38:11 |
| Requires Python | >=3.12 |
| License | MIT |
| Keywords | ast, commonmark, converter, markdown, parser |
| Requirements | No requirements were recorded. |
# marktripy
**TL;DR**: A Python package for parsing Markdown to AST, manipulating the tree structure, and serializing back to Markdown while preserving formatting. Built on `markdown-it-py` and `mistletoe` for maximum flexibility.
```python
from marktripy import parse_markdown, render_markdown

# Parse Markdown to AST
ast = parse_markdown("# Hello\n\nThis is **bold** text.")

# Manipulate AST (e.g., downgrade headings)
for node in ast.walk():
    if node.type == "heading":
        node.level += 1

# Render back to Markdown
markdown = render_markdown(ast)
# Output: "## Hello\n\nThis is **bold** text."
```
## Installation
```bash
# Using pip
pip install marktripy
# Using uv (recommended)
uv add marktripy
# Development installation
git clone https://github.com/twardoch/marktripy
cd marktripy
uv sync --dev
```
## Quick Usage
### Basic Markdown to HTML
```python
from marktripy import markdown_to_html
html = markdown_to_html("# Hello World\n\nThis is **bold** and *italic*.")
# <h1>Hello World</h1><p>This is <strong>bold</strong> and <em>italic</em>.</p>
```
### AST Manipulation
```python
from marktripy import parse_markdown, render_markdown

# Parse Markdown to AST
ast = parse_markdown("""
# Main Title
## Section 1
Some content here.
## Section 2
More content.
""")

# Add IDs to all headings
for node in ast.walk():
    if node.type == "heading":
        # Generate ID from heading text
        text = node.get_text().lower().replace(" ", "-")
        node.attrs["id"] = text

# Downgrade all headings by one level
for node in ast.walk():
    if node.type == "heading" and node.level < 6:
        node.level += 1

# Render back to Markdown
result = render_markdown(ast)
```
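The one-liner above keeps punctuation in IDs and gives two identically titled headings the same `id`. A standalone sketch of a sturdier scheme, similar in spirit to GitHub's heading anchors: strip punctuation, then de-duplicate with numeric suffixes. This is plain Python, not part of the marktripy API; `slugify` and `unique_slugs` are hypothetical names.

```python
import re


def slugify(text: str) -> str:
    # Lowercase, strip characters that are not word characters, whitespace,
    # or hyphens, then collapse whitespace runs into single hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"\s+", "-", cleaned.strip())


def unique_slugs(titles: list[str]) -> list[str]:
    # Repeated slugs get -1, -2, ... suffixes so every heading id is unique.
    seen: dict[str, int] = {}
    result: list[str] = []
    for title in titles:
        base = slugify(title)
        count = seen.get(base, 0)
        result.append(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    return result


print(unique_slugs(["Main Title", "Section 1", "FAQ?", "Section 1"]))
# ['main-title', 'section-1', 'faq', 'section-1-1']
```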
### Custom Syntax Extensions
```python
from marktripy import create_extension, Parser

# Create a custom extension for ++text++ → <kbd>text</kbd>
kbd_extension = create_extension(
    # Non-greedy group, so the key combination itself may contain "+"
    pattern=r'\+\+(.+?)\+\+',
    node_type='kbd',
    html_tag='kbd',
)

# Use parser with extension
parser = Parser(extensions=[kbd_extension])
ast = parser.parse("Press ++Ctrl+C++ to copy")
html = parser.render_html(ast)
# Output: Press <kbd>Ctrl+C</kbd> to copy
```
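How `create_extension` applies its pattern internally is not documented here. As a rough, stdlib-only illustration of what such an inline rule does, the sketch below runs the substitution directly on a string; `render_kbd` is a hypothetical helper, and a real extension would operate on parser tokens rather than raw text. A non-greedy group lets the captured key combination contain `+`.

```python
import html
import re

# Non-greedy group: stops at the first closing "++", so the captured key
# combination may itself contain "+" (as in "Ctrl+C").
KBD_PATTERN = re.compile(r"\+\+(.+?)\+\+")


def render_kbd(text: str) -> str:
    # Escape only the captured text; a real renderer would escape the
    # surrounding text as well.
    return KBD_PATTERN.sub(lambda m: f"<kbd>{html.escape(m.group(1))}</kbd>", text)


print(render_kbd("Press ++Ctrl+C++ to copy"))
# Press <kbd>Ctrl+C</kbd> to copy
```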
### CLI Usage
```bash
# Convert Markdown to HTML
marktripy convert input.md -o output.html
# Parse and manipulate Markdown
marktripy transform input.md --downgrade-headings --add-ids -o output.md
# Validate Markdown structure
marktripy validate document.md --check-links --check-headings
```
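The README lists these commands but not their full option sets. One way such a command layout could be declared with the standard library's `argparse` is sketched below; it mirrors only the flags shown above, and `build_cli` is a hypothetical name, not marktripy's actual CLI code.

```python
import argparse


def build_cli() -> argparse.ArgumentParser:
    # Hypothetical layout mirroring the documented commands; the real
    # marktripy CLI may define its options differently.
    parser = argparse.ArgumentParser(prog="marktripy")
    sub = parser.add_subparsers(dest="command", required=True)

    convert = sub.add_parser("convert", help="Markdown to HTML")
    convert.add_argument("input")
    convert.add_argument("-o", "--output")

    transform = sub.add_parser("transform", help="Rewrite Markdown")
    transform.add_argument("input")
    transform.add_argument("--downgrade-headings", action="store_true")
    transform.add_argument("--add-ids", action="store_true")
    transform.add_argument("-o", "--output")

    validate = sub.add_parser("validate", help="Check document structure")
    validate.add_argument("input")
    validate.add_argument("--check-links", action="store_true")
    validate.add_argument("--check-headings", action="store_true")
    return parser


args = build_cli().parse_args(
    ["transform", "input.md", "--downgrade-headings", "--add-ids", "-o", "output.md"]
)
print(args.command, args.downgrade_headings, args.add_ids, args.output)
# transform True True output.md
```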
## The Backstory
### Why Another Markdown Parser?
The Python ecosystem has numerous Markdown parsers, each with different strengths:
- **`markdown`**: The original; extensible, but with a complex API
- **`markdown2`**: A faster alternative, but less extensible
- **`mistune`**: Fast and supports an AST, but with limited round-trip capability
- **`marko`**: Good AST support, but newer and with a smaller ecosystem
- **`markdown-it-py`**: Port of markdown-it with excellent plugin system
After extensive research (see the `/ref` directory), I found that no single library fully addressed the need for:
1. **Clean AST manipulation** - Easy traversal and modification of document structure
2. **Round-trip conversion** - Parse Markdown → AST → Markdown without losing formatting
3. **Extensibility** - Simple API for adding custom syntax
4. **Performance** - Fast enough for real-world documents
5. **Standards compliance** - CommonMark compliant with GFM extensions
### The Research Journey
The `/ref` directory contains comprehensive research comparing 8+ Python Markdown libraries across multiple dimensions:
- **ref1.md**: Practical guide to advanced Markdown processing in Python
- **ref2.md**: Detailed comparison of parser architectures and extension mechanisms
- **ref3.md**: Performance benchmarks and feature matrix
Key findings:
- `markdown-it-py` offers the best plugin architecture
- `mistletoe` has the cleanest AST representation
- `marko` provides good round-trip capabilities
- Performance varies by 10-100x between libraries
### Design Philosophy
`marktripy` combines the best ideas from existing libraries:
1. **Dual-parser architecture**: Use `markdown-it-py` for extensibility and `mistletoe` for AST manipulation
2. **Unified AST format**: Convert between parser representations transparently
3. **Preserving formatting**: Track source positions and whitespace for faithful round-trips
4. **Plugin-first design**: Everything beyond core CommonMark is a plugin
5. **Type safety**: Full type hints with `mypy --strict` compatibility
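Faithful round-trips depend on source positions. A minimal sketch of why, assuming each node records the line range it came from (markdown-it-py stores a `[line_begin, line_end]` pair on block tokens as `Token.map`): re-render only the nodes that changed and splice them into the original text, so untouched lines keep their exact spacing. `splice_lines` is a hypothetical helper, and ranges are assumed zero-based and end-exclusive.

```python
def splice_lines(source: str, edits: dict[tuple[int, int], str]) -> str:
    """Replace only the line ranges that changed, keeping every other byte.

    `edits` maps a (start, end) line range, end exclusive, to replacement text.
    """
    lines = source.splitlines(keepends=True)
    # Apply from the bottom up so earlier ranges keep their line numbers.
    for (start, end), text in sorted(edits.items(), reverse=True):
        lines[start:end] = [text]
    return "".join(lines)


source = "# Title\n\nSome   *odd*\tspacing  \n\n* list item\n"
print(repr(splice_lines(source, {(0, 1): "## Title\n"})))
# '## Title\n\nSome   *odd*\tspacing  \n\n* list item\n'
```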
## Technical Architecture
### Core Components
```text
marktripy/
├── ast.py # Unified AST node definitions
├── parser.py # Parser abstraction layer
├── renderer.py # Markdown/HTML renderers
├── extensions/ # Built-in extensions
│ ├── gfm.py # GitHub Flavored Markdown
│ ├── toc.py # Table of contents generator
│ └── ...
├── transformers/ # AST transformation utilities
│ ├── headings.py # Heading manipulation
│ ├── links.py # Link processing
│ └── ...
└── cli.py # Command-line interface
```
### AST Structure
The AST uses a unified node structure compatible with both parsers:
```python
from __future__ import annotations

from typing import Any


class ASTNode:
    type: str                # Node type (heading, paragraph, etc.)
    children: list[ASTNode]  # Child nodes, in document order
    attrs: dict[str, Any]    # Attributes (id, class, etc.)
    content: str             # Text content for leaf nodes
    meta: dict[str, Any]     # Source mapping, parser-specific data
```
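The listing above declares fields only, while the usage examples also call `walk()` and `get_text()`. A self-contained stand-in that supplies both, assuming a pre-order traversal and plain text concatenation; the real class may differ, and since no `level` field is listed, this sketch keeps heading levels in `attrs`.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ASTNode:
    type: str
    children: list[ASTNode] = field(default_factory=list)
    attrs: dict[str, Any] = field(default_factory=dict)
    content: str = ""
    meta: dict[str, Any] = field(default_factory=dict)

    def walk(self) -> Iterator[ASTNode]:
        # Pre-order, depth-first: yield this node, then each subtree in order.
        yield self
        for child in self.children:
            yield from child.walk()

    def get_text(self) -> str:
        # Concatenate the content of this node and all of its descendants.
        return self.content + "".join(c.get_text() for c in self.children)


doc = ASTNode("document", [
    ASTNode("heading", [ASTNode("text", content="Hello")], attrs={"level": 1}),
    ASTNode("paragraph", [ASTNode("text", content="Body")]),
])
print([n.type for n in doc.walk()])
# ['document', 'heading', 'text', 'paragraph', 'text']
```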
### Parser Architecture
```python
from __future__ import annotations

# Abstraction layer over multiple parsers (excerpt: ASTNode and the
# _create_backend / _normalize_ast helpers are defined elsewhere)
class Parser:
    def __init__(self, backend: str = "markdown-it-py", extensions: list | None = None):
        self.backend = self._create_backend(backend)
        self.extensions = extensions or []

    def parse(self, markdown: str) -> ASTNode:
        # Parse with the selected backend
        backend_ast = self.backend.parse(markdown)
        # Convert the backend-specific tree to the unified AST
        return self._normalize_ast(backend_ast)
```
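The normalization step is not shown. markdown-it-py, the default backend, emits a flat token list in which `nesting` is 1 for an opening token, -1 for a closing one, and 0 for a self-contained one; folding that stream into a tree with a stack is one plausible core for such a step. The `Token` class below is a stand-in carrying only those fields (real inline tokens also carry `children`), and `normalize` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Token:
    # Stand-in for a markdown-it style token (only the fields used here).
    type: str        # e.g. "heading_open", "inline", "heading_close"
    nesting: int     # 1 opens a node, -1 closes it, 0 is self-contained
    content: str = ""


def normalize(tokens: list[Token]) -> dict[str, Any]:
    """Fold a flat open/close token stream into a nested tree of dicts."""
    root: dict[str, Any] = {"type": "document", "children": []}
    stack = [root]
    for tok in tokens:
        if tok.nesting == 1:
            # "heading_open" -> "heading"; push so later tokens nest inside it.
            node = {"type": tok.type.removesuffix("_open"), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif tok.nesting == -1:
            # Closing token: return to the enclosing node.
            stack.pop()
        else:
            stack[-1]["children"].append({"type": tok.type, "content": tok.content})
    return root


tree = normalize([
    Token("heading_open", 1), Token("inline", 0, "Hello"), Token("heading_close", -1),
    Token("paragraph_open", 1), Token("inline", 0, "Body"), Token("paragraph_close", -1),
])
print(tree["children"][0]["type"])  # heading
```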
### Extension System
Extensions can hook into multiple stages:
```python
class Extension:
    def extend_parser(self, parser): ...      # Modify parser rules
    def transform_ast(self, ast): ...         # Post-process AST
    def extend_renderer(self, renderer): ...  # Custom rendering
```
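The three hooks suggest a staged pipeline. A sketch of the AST stage only, assuming `transform_ast` returns the (possibly new) tree and that extensions run in list order. The no-op defaults, `run_transforms`, and the toy extensions are hypothetical, this `Extension` is a stand-in rather than the library's class, and a list of strings stands in for a real AST.

```python
from typing import Any


class Extension:
    # No-op defaults, so a subclass overrides only the stages it needs.
    def extend_parser(self, parser: Any) -> None:
        pass

    def transform_ast(self, ast: Any) -> Any:
        return ast

    def extend_renderer(self, renderer: Any) -> None:
        pass


def run_transforms(ast: Any, extensions: list[Extension]) -> Any:
    # Each extension receives the previous extension's output, in list order.
    for ext in extensions:
        ast = ext.transform_ast(ast)
    return ast


class Bracket(Extension):
    def transform_ast(self, ast: list[str]) -> list[str]:
        return [f"[{s}]" for s in ast]


class Exclaim(Extension):
    def transform_ast(self, ast: list[str]) -> list[str]:
        return [s + "!" for s in ast]


print(run_transforms(["hi"], [Bracket(), Exclaim()]))  # ['[hi]!']
print(run_transforms(["hi"], [Exclaim(), Bracket()]))  # ['[hi!]']
```

Order matters: the same two extensions give different results when swapped, so a real pipeline would need a documented ordering rule.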
### Rendering Pipeline
1. **AST → Markdown**: Preserves formatting, handles custom nodes
2. **AST → HTML**: Configurable sanitization, custom handlers
3. **AST → JSON**: Serialization for processing pipelines
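For the JSON stage, one practical concern is that `meta` may hold parser objects the `json` module cannot encode. A sketch that assumes dict-shaped nodes and drops `meta` before serializing; `to_serializable` is a hypothetical name.

```python
import json
from typing import Any


def to_serializable(node: dict[str, Any]) -> dict[str, Any]:
    # Copy everything except "meta" (parser-specific data that may not be
    # JSON-serializable), then recurse into the children.
    out = {k: v for k, v in node.items() if k not in ("meta", "children")}
    out["children"] = [to_serializable(c) for c in node.get("children", [])]
    return out


tree = {
    "type": "heading",
    "attrs": {"level": 2},
    "meta": {"token": object()},  # not JSON-serializable
    "children": [{"type": "text", "content": "Café", "children": []}],
}
# sort_keys gives stable output that diffs cleanly between runs.
encoded = json.dumps(to_serializable(tree), sort_keys=True, ensure_ascii=False)
print(encoded)
```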
### Performance Optimizations
- Lazy parsing for large documents
- Streaming renderers for memory efficiency
- Optional C extensions via `umarkdown` backend
- Caching for repeated transformations
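Caching is safest around pure functions with hashable inputs. A sketch using `functools.lru_cache`, with a regex standing in for a real AST pass; `downgrade_headings` is a hypothetical name, not part of marktripy.

```python
import re
from functools import lru_cache

# 1 to 5 leading "#" followed by a space or tab, at the start of any line.
_ATX_HEADING = re.compile(r"^(#{1,5})(?=[ \t])", re.MULTILINE)


@lru_cache(maxsize=256)
def downgrade_headings(source: str) -> str:
    # str in, str out: safe to memoize on the input text. Caching a function
    # that returns a mutable AST would hand every caller the same object.
    # Caveat: this regex does not skip "#" lines inside fenced code blocks.
    return _ATX_HEADING.sub(r"\1#", source)


downgrade_headings("# Hello\n")
downgrade_headings("# Hello\n")  # second call is served from the cache
print(downgrade_headings.cache_info().hits)  # 1
```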
## Advanced Usage
### Custom Transformers
```python
from marktripy import Transformer


class HeaderAnchorTransformer(Transformer):
    """Add GitHub-style anchor links to headers"""

    def transform(self, ast):
        for node in ast.walk():
            if node.type == "heading":
                anchor = self.create_anchor(node)
                node.children.insert(0, anchor)
        return ast
```
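`create_anchor` is not shown, and inserting children while a walk is still iterating can be fragile if the walker reads child lists lazily. A dict-based sketch that collects headings first and mutates afterwards; `create_anchor`, `add_anchors`, and the `attrs["id"]` convention are assumptions.

```python
from typing import Any

Node = dict[str, Any]


def create_anchor(heading: Node) -> Node:
    # Hypothetical helper: an empty link pointing at the heading's own id,
    # which an earlier pass is assumed to have stored in attrs["id"].
    return {"type": "link", "attrs": {"href": "#" + heading["attrs"]["id"]}, "children": []}


def add_anchors(root: Node) -> Node:
    # Collect headings first, then mutate, so inserting nodes cannot
    # interfere with a traversal that is still in progress.
    headings: list[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node["type"] == "heading":
            headings.append(node)
        stack.extend(node.get("children", []))
    for heading in headings:
        heading["children"].insert(0, create_anchor(heading))
    return root


doc = {"type": "document", "children": [
    {"type": "heading", "attrs": {"id": "intro"}, "children": [{"type": "text", "content": "Intro"}]},
]}
add_anchors(doc)
print(doc["children"][0]["children"][0]["attrs"]["href"])  # #intro
```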
### Parser Backends
```python
# Use different backends for different needs
from marktripy import Parser
# Maximum compatibility
parser = Parser(backend="markdown")
# Best performance
parser = Parser(backend="mistletoe")
# Most extensions
parser = Parser(backend="markdown-it-py")
```
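Which backends are usable depends on what is installed. A sketch that probes for them with `importlib.util.find_spec`; note that markdown-it-py is imported as `markdown_it`. The backend labels mirror the examples above, while `available_backends` and `pick_backend` are hypothetical helpers.

```python
from importlib.util import find_spec

# Backend label -> top-level import name (distribution and import names differ
# for markdown-it-py).
BACKEND_MODULES = {
    "markdown-it-py": "markdown_it",
    "mistletoe": "mistletoe",
    "markdown": "markdown",
}


def available_backends() -> list[str]:
    # find_spec returns None when a top-level module is not installed.
    return [name for name, module in BACKEND_MODULES.items() if find_spec(module) is not None]


def pick_backend(preferred: list[str]) -> str:
    # Return the first preferred backend that is actually importable.
    installed = set(available_backends())
    for name in preferred:
        if name in installed:
            return name
    raise RuntimeError(f"none of {preferred} is installed")


print(available_backends())
```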
### Integration Examples
```python
# Pelican static site generator
from marktripy import PelicanReader
# MkDocs documentation
from marktripy import MkDocsPlugin
# Jupyter notebook processing
from marktripy import MarkdownCell
```
## Contributing
We welcome contributions! Key areas:
- Additional extensions (math, diagrams, etc.)
- Performance improvements
- Better round-trip fidelity
- More transformer utilities
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
MIT License - see [LICENSE](LICENSE) for details.
## Acknowledgments
Built on the shoulders of giants:
- `markdown-it-py` developers for the excellent plugin system
- `mistletoe` for the clean AST design
- The CommonMark specification authors
- Everyone who has researched and documented the Python Markdown ecosystem
Raw data
{
"_id": null,
"home_page": null,
"name": "marktripy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "ast, commonmark, converter, markdown, parser",
"author": null,
"author_email": "Adam <adam@example.com>",
"download_url": "https://files.pythonhosted.org/packages/bf/96/779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065/marktripy-1.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python package for converting Markdown to AST and back to Markdown",
"version": "1.0.3",
"project_urls": {
"Homepage": "https://github.com/twardoch/marktripy",
"Issues": "https://github.com/twardoch/marktripy/issues",
"Repository": "https://github.com/twardoch/marktripy"
},
"split_keywords": [
"ast",
" commonmark",
" converter",
" markdown",
" parser"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9a07a241d76967401f8044f56347839b0c3770b3b18c06dd168167fd46800f49",
"md5": "1dabd6f4b7609e6d81d5de2a58e77a76",
"sha256": "ffadcec11db94031d9673761d43c151fe65004692f823811640ec87fe54cb965"
},
"downloads": -1,
"filename": "marktripy-1.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1dabd6f4b7609e6d81d5de2a58e77a76",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 47568,
"upload_time": "2025-07-29T11:38:09",
"upload_time_iso_8601": "2025-07-29T11:38:09.283393Z",
"url": "https://files.pythonhosted.org/packages/9a/07/a241d76967401f8044f56347839b0c3770b3b18c06dd168167fd46800f49/marktripy-1.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "bf96779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065",
"md5": "91c36d339ca1af77177b14f07717bd1e",
"sha256": "7d760b44aac7a528d6e045f22592670c013e3a29f19b93a902c5c72991477c3a"
},
"downloads": -1,
"filename": "marktripy-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "91c36d339ca1af77177b14f07717bd1e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 46082,
"upload_time": "2025-07-29T11:38:11",
"upload_time_iso_8601": "2025-07-29T11:38:11.428368Z",
"url": "https://files.pythonhosted.org/packages/bf/96/779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065/marktripy-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-29 11:38:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "twardoch",
"github_project": "marktripy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "marktripy"
}