# marktripy

| Field | Value |
| --- | --- |
| Name | marktripy |
| Version | 1.0.3 |
| Summary | A Python package for converting Markdown to AST and back to Markdown |
| Upload time | 2025-07-29 11:38:11 |
| Requires Python | >=3.12 |
| License | MIT |
| Keywords | ast, commonmark, converter, markdown, parser |

**TL;DR**: A Python package for parsing Markdown to AST, manipulating the tree structure, and serializing back to Markdown while preserving formatting. Built on `markdown-it-py` and `mistletoe` for maximum flexibility.

```python
from marktripy import parse_markdown, render_markdown

# Parse Markdown to AST
ast = parse_markdown("# Hello\n\nThis is **bold** text.")

# Manipulate AST (e.g., downgrade headings)
for node in ast.walk():
    if node.type == "heading" and node.level < 6:  # Markdown stops at level 6
        node.level += 1

# Render back to Markdown
markdown = render_markdown(ast)
# Output: "## Hello\n\nThis is **bold** text."
```

## Installation

```bash
# Using pip
pip install marktripy

# Using uv (recommended)
uv add marktripy

# Development installation
git clone https://github.com/twardoch/marktripy
cd marktripy
uv sync --dev
```

## Quick Usage

### Basic Markdown to HTML

```python
from marktripy import markdown_to_html

html = markdown_to_html("# Hello World\n\nThis is **bold** and *italic*.")
# <h1>Hello World</h1><p>This is <strong>bold</strong> and <em>italic</em>.</p>
```

### AST Manipulation

```python
from marktripy import parse_markdown, render_markdown

# Parse Markdown to AST
ast = parse_markdown("""
# Main Title
## Section 1
Some content here.
## Section 2
More content.
""")

# Add IDs to all headings
for node in ast.walk():
    if node.type == "heading":
        # Generate ID from heading text
        text = node.get_text().lower().replace(" ", "-")
        node.attrs["id"] = text
        
# Downgrade all headings by one level
for node in ast.walk():
    if node.type == "heading" and node.level < 6:
        node.level += 1

# Render back to Markdown
result = render_markdown(ast)
```
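
The `lower().replace(" ", "-")` step keeps punctuation and can assign the same ID to two headings with identical text. A minimal standard-library sketch of a more defensive approach follows; `slugify` and `unique_slugs` are hypothetical helper names for illustration, not part of marktripy's API.

```python
import re
from collections import Counter


def slugify(text: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    text = text.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)   # remove punctuation
    return re.sub(r"[\s_]+", "-", text)    # collapse whitespace/underscores


def unique_slugs(titles: list[str]) -> list[str]:
    """Append -1, -2, ... to repeated slugs so every ID is distinct."""
    seen: Counter = Counter()
    result = []
    for title in titles:
        slug = slugify(title)
        count = seen[slug]
        seen[slug] += 1
        result.append(slug if count == 0 else f"{slug}-{count}")
    return result


print(unique_slugs(["Main Title", "Section 1", "Section 1", "What's new?"]))
# ['main-title', 'section-1', 'section-1-1', 'whats-new']
```

The resulting strings could then be assigned to `node.attrs["id"]` in the loop shown earlier.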

### Custom Syntax Extensions

```python
from marktripy import create_extension, Parser

# Create a custom extension for ++text++ → <kbd>text</kbd>
kbd_extension = create_extension(
    pattern=r'\+\+(.+?)\+\+',  # lazy match, so keys containing "+" (Ctrl+C) work
    node_type='kbd',
    html_tag='kbd'
)

# Use parser with extension
parser = Parser(extensions=[kbd_extension])
ast = parser.parse("Press ++Ctrl+C++ to copy")
html = parser.render_html(ast)
# Output: Press <kbd>Ctrl+C</kbd> to copy
```
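
Key combinations such as `Ctrl+C` contain the delimiter character itself, so the choice of regular expression matters for a `++text++` syntax. The following check uses only Python's `re` module and is independent of marktripy; the variable names are illustrative.

```python
import re

TEXT = "Press ++Ctrl+C++ to copy"

# Character-class form: the captured text may not contain "+" at all.
no_plus = re.compile(r"\+\+([^+]+)\+\+")
# Lazy form: capture as little as possible up to the next "++".
lazy = re.compile(r"\+\+(.+?)\+\+")

print(no_plus.search(TEXT))          # None, because "Ctrl+C" contains a "+"
print(lazy.search(TEXT).group(1))    # Ctrl+C
print(lazy.sub(r"<kbd>\1</kbd>", TEXT))
# Press <kbd>Ctrl+C</kbd> to copy
```

The lazy form is the safer default whenever the wrapped text can contain the delimiter character.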

### CLI Usage

```bash
# Convert Markdown to HTML
marktripy convert input.md -o output.html

# Parse and manipulate Markdown
marktripy transform input.md --downgrade-headings --add-ids -o output.md

# Validate Markdown structure
marktripy validate document.md --check-links --check-headings
```

## The Backstory

### Why Another Markdown Parser?

The Python ecosystem has numerous Markdown parsers, each with different strengths:

- **`markdown`**: The original, extensible but with a complex API
- **`markdown2`**: Faster alternative but less extensible
- **`mistune`**: Fast and supports AST, but limited round-trip capability
- **`marko`**: Good AST support, but newer and with a smaller ecosystem
- **`markdown-it-py`**: Port of markdown-it with an excellent plugin system

After extensive research (see `/ref` directory), I found that no single library perfectly addressed the need for:

1. **Clean AST manipulation** - Easy traversal and modification of document structure
2. **Round-trip conversion** - Parse Markdown → AST → Markdown without losing formatting
3. **Extensibility** - Simple API for adding custom syntax
4. **Performance** - Fast enough for real-world documents
5. **Standards compliance** - CommonMark compliant with GFM extensions

### The Research Journey

The `/ref` directory contains comprehensive research comparing 8+ Python Markdown libraries across multiple dimensions:

- **ref1.md**: Practical guide to advanced Markdown processing in Python
- **ref2.md**: Detailed comparison of parser architectures and extension mechanisms
- **ref3.md**: Performance benchmarks and feature matrix

Key findings:

- `markdown-it-py` offers the best plugin architecture
- `mistletoe` has the cleanest AST representation
- `marko` provides good round-trip capabilities
- Performance varies by 10-100x between libraries

### Design Philosophy

`marktripy` combines the best ideas from existing libraries:

1. **Dual-parser architecture**: Use `markdown-it-py` for extensibility and `mistletoe` for AST manipulation
2. **Unified AST format**: Convert between parser representations transparently
3. **Preserving formatting**: Track source positions and whitespace for faithful round-trips
4. **Plugin-first design**: Everything beyond core CommonMark is a plugin
5. **Type safety**: Full type hints with `mypy --strict` compatibility
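
Round-trip fidelity is easy to check mechanically: render the parsed tree and diff it against the source. The sketch below assumes nothing about marktripy itself; `round_trip_diff`, `toy_parse`, and `toy_render` are hypothetical names, and the toy pair deliberately rewrites `*` bullets as `-` bullets to show what a lossy round-trip looks like.

```python
import difflib
from typing import Callable


def round_trip_diff(source: str,
                    parse: Callable[[str], object],
                    render: Callable[[object], str]) -> list[str]:
    """Return unified-diff lines between source and render(parse(source))."""
    result = render(parse(source))
    return list(difflib.unified_diff(
        source.splitlines(), result.splitlines(),
        fromfile="original", tofile="round-trip", lineterm=""))


# Toy parser/renderer pair that normalises "*" bullets to "-" bullets.
def toy_parse(text: str) -> list[str]:
    return text.splitlines()


def toy_render(lines: list[str]) -> str:
    return "\n".join("- " + line[2:] if line.startswith("* ") else line
                     for line in lines)


source = "# Title\n\n* one\n* two"
for line in round_trip_diff(source, toy_parse, toy_render):
    print(line)
```

An empty diff means the round-trip was faithful; any `-`/`+` pair points at formatting that was not preserved.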

## Technical Architecture

### Core Components

```text
marktripy/
├── ast.py          # Unified AST node definitions
├── parser.py       # Parser abstraction layer
├── renderer.py     # Markdown/HTML renderers
├── extensions/     # Built-in extensions
│   ├── gfm.py     # GitHub Flavored Markdown
│   ├── toc.py     # Table of contents generator
│   └── ...
├── transformers/   # AST transformation utilities
│   ├── headings.py # Heading manipulation
│   ├── links.py    # Link processing
│   └── ...
└── cli.py         # Command-line interface
```

### AST Structure

The AST uses a unified node structure compatible with both parsers:

```python
from typing import Any


class ASTNode:
    type: str                    # Node type (heading, paragraph, etc.)
    children: list["ASTNode"]    # Child nodes, in document order
    attrs: dict[str, Any]        # Attributes (id, class, etc.)
    content: str                 # Text content for leaf nodes
    meta: dict[str, Any]         # Source mapping, parser-specific data
```
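
A minimal sketch of how a `walk()`-style traversal could work over such a node structure. `Node` is a stand-in dataclass that reuses the field names above; it is an assumption for illustration, not marktripy's actual class.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Node:
    """Stand-in for the unified node; field names follow the structure above."""
    type: str
    children: list["Node"] = field(default_factory=list)
    attrs: dict[str, Any] = field(default_factory=dict)
    content: str = ""
    meta: dict[str, Any] = field(default_factory=dict)

    def walk(self) -> Iterator["Node"]:
        """Yield this node, then every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


doc = Node("document", children=[
    Node("heading", children=[Node("text", content="Hello")]),
    Node("paragraph", children=[
        Node("text", content="This is "),
        Node("strong", children=[Node("text", content="bold")]),
    ]),
])

print([n.type for n in doc.walk()])
# ['document', 'heading', 'text', 'paragraph', 'text', 'strong', 'text']
```

Depth-first pre-order visits nodes in document order, which is what loops such as "for every heading" usually expect.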

### Parser Architecture

```python
# Abstraction layer over multiple parsers
class Parser:
    def __init__(self, backend="markdown-it-py", extensions=None):
        self.backend = self._create_backend(backend)
        self.extensions = extensions or []

    def parse(self, markdown: str) -> ASTNode:
        # Parse with backend
        backend_ast = self.backend.parse(markdown)
        # Convert to unified AST
        return self._normalize_ast(backend_ast)
```
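
To make the abstraction layer concrete, here is a self-contained toy version of the same idea: a backend registry plus a normalisation step. Everything in it (`Backend`, `LineBackend`, `BACKENDS`, `MiniParser`, the `"toy"` name) is hypothetical and only mirrors the shape described above; a real backend would wrap an actual Markdown parser.

```python
from typing import Any, Callable, Protocol


class Backend(Protocol):
    def parse(self, markdown: str) -> Any: ...


class LineBackend:
    """Toy backend: classifies each non-empty line as heading or paragraph."""

    def parse(self, markdown: str) -> list[tuple[str, str]]:
        tokens = []
        for line in markdown.splitlines():
            if not line.strip():
                continue
            kind = "heading" if line.startswith("#") else "paragraph"
            tokens.append((kind, line.lstrip("# ").strip()))
        return tokens


# Registry mapping a backend name to a factory (names are placeholders).
BACKENDS: dict[str, Callable[[], Backend]] = {"toy": LineBackend}


class MiniParser:
    def __init__(self, backend: str = "toy") -> None:
        try:
            self.backend = BACKENDS[backend]()
        except KeyError:
            raise ValueError(f"unknown backend: {backend!r}") from None

    def parse(self, markdown: str) -> dict[str, Any]:
        raw = self.backend.parse(markdown)
        # Normalisation step: backend-specific tokens -> one shared shape.
        children = [{"type": kind, "content": text} for kind, text in raw]
        return {"type": "document", "children": children}


tree = MiniParser().parse("# Hello\n\nThis is text.")
print(tree)
```

Because callers only ever see the normalised tree, swapping the backend does not change downstream code.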

### Extension System

Extensions can hook into multiple stages:

```python
class Extension:
    def extend_parser(self, parser): ...      # Modify parser rules
    def transform_ast(self, ast): ...         # Post-process AST
    def extend_renderer(self, renderer): ...  # Custom rendering
```
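
The three hooks suggest a fixed call order around parsing and rendering. The sketch below records that order with a dummy extension; the sequence (parser hooks, then AST transforms, then renderer hooks) and the `run_pipeline` helper are assumptions for illustration, not confirmed marktripy behavior.

```python
class RecordingExtension:
    """Extension that only records which hook ran, to show the call order."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def extend_parser(self, parser) -> None:
        self.log.append("extend_parser")

    def transform_ast(self, ast):
        self.log.append("transform_ast")
        return ast

    def extend_renderer(self, renderer) -> None:
        self.log.append("extend_renderer")


def run_pipeline(text: str, extensions: list) -> str:
    parser, renderer = object(), object()         # placeholders for real objects
    for ext in extensions:
        ext.extend_parser(parser)                 # 1. before parsing
    ast = {"type": "document", "content": text}   # stand-in for parsing
    for ext in extensions:
        ast = ext.transform_ast(ast)              # 2. after parsing
    for ext in extensions:
        ext.extend_renderer(renderer)             # 3. before rendering
    return ast["content"]                         # stand-in for rendering


log: list[str] = []
run_pipeline("hello", [RecordingExtension(log)])
print(log)  # ['extend_parser', 'transform_ast', 'extend_renderer']
```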

### Rendering Pipeline

1. **AST → Markdown**: Preserves formatting, handles custom nodes
2. **AST → HTML**: Configurable sanitization, custom handlers
3. **AST → JSON**: Serialization for processing pipelines
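
As a rough illustration of two of these targets, the sketch below renders a tiny dict-based tree to Markdown and serialises it to JSON. The dict layout and the `to_markdown`/`to_json` helpers are assumptions made for this example; they cover only headings and paragraphs and do not reflect marktripy's renderer classes.

```python
import json


def to_json(node: dict) -> str:
    """Serialise a dict-based tree; sort_keys keeps the output stable."""
    return json.dumps(node, sort_keys=True)


def to_markdown(node: dict) -> str:
    """Render the tiny subset used here: document, heading, paragraph."""
    kind = node["type"]
    if kind == "document":
        return "\n\n".join(to_markdown(child) for child in node["children"])
    if kind == "heading":
        return "#" * node["level"] + " " + node["content"]
    if kind == "paragraph":
        return node["content"]
    raise ValueError(f"no renderer for node type {kind!r}")


tree = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 2, "content": "Hello"},
        {"type": "paragraph", "content": "This is **bold** text."},
    ],
}

print(to_markdown(tree))
restored = json.loads(to_json(tree))
print(restored == tree)  # True
```

Raising on unknown node types, rather than silently skipping them, makes missing handlers for custom nodes visible early.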

### Performance Optimizations

- Lazy parsing for large documents
- Streaming renderers for memory efficiency
- Optional C extensions via `umarkdown` backend
- Caching for repeated transformations
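
Caching of repeated transformations can be sketched with `functools.lru_cache`, keyed on the source text. The `heading_outline` function is a hypothetical stand-in for an expensive parse-and-transform step; it naively treats every line starting with `#` as a heading.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def heading_outline(markdown: str) -> tuple[str, ...]:
    """Return heading lines; the result is memoised per input string."""
    return tuple(line for line in markdown.splitlines() if line.startswith("#"))


doc = "# Title\n\ntext\n\n## Section"
heading_outline(doc)          # computed
heading_outline(doc)          # served from the cache

info = heading_outline.cache_info()
print(info.hits, info.misses)  # 1 1
```

Returning an immutable tuple matters here: a cached mutable tree could be modified by one caller and then handed, already modified, to the next.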

## Advanced Usage

### Custom Transformers

```python
from marktripy import Transformer

class HeaderAnchorTransformer(Transformer):
    """Add GitHub-style anchor links to headings."""

    def transform(self, ast):
        # Iterate over a snapshot so the inserted anchors are not revisited
        for node in list(ast.walk()):
            if node.type == "heading":
                anchor = self.create_anchor(node)
                node.children.insert(0, anchor)
        return ast
```
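
The same transformation can be tried without the library on plain dict nodes. In this sketch the `walk` and `add_heading_anchors` helpers, the dict layout, and the shape of the anchor node are all assumptions for illustration.

```python
def walk(node: dict):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def add_heading_anchors(tree: dict) -> dict:
    # Snapshot the traversal first: inserting children while the generator
    # is still running would also visit the freshly added anchor nodes.
    for node in list(walk(tree)):
        if node["type"] == "heading":
            slug = node["content"].lower().replace(" ", "-")
            anchor = {"type": "link", "attrs": {"href": f"#{slug}"}, "content": ""}
            node.setdefault("children", []).insert(0, anchor)
    return tree


tree = {"type": "document", "children": [
    {"type": "heading", "content": "Section 1"},
    {"type": "paragraph", "content": "Some content here."},
]}

add_heading_anchors(tree)
print(tree["children"][0]["children"][0]["attrs"]["href"])  # #section-1
```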

### Parser Backends

```python
# Use different backends for different needs
from marktripy import Parser

# Maximum compatibility
parser = Parser(backend="markdown")

# Best performance
parser = Parser(backend="mistletoe")

# Most extensions
parser = Parser(backend="markdown-it-py")
```

### Integration Examples

```python
# Pelican static site generator
from marktripy import PelicanReader

# MkDocs documentation
from marktripy import MkDocsPlugin

# Jupyter notebook processing
from marktripy import MarkdownCell
```

## Contributing

We welcome contributions! Key areas:

- Additional extensions (math, diagrams, etc.)
- Performance improvements
- Better round-trip fidelity
- More transformer utilities

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Acknowledgments

Built on the shoulders of giants:

- `markdown-it-py` developers for the excellent plugin system
- `mistletoe` for the clean AST design
- The CommonMark specification authors
- Everyone who has researched and documented the Python Markdown ecosystem

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "marktripy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "ast, commonmark, converter, markdown, parser",
    "author": null,
    "author_email": "Adam <adam@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/bf/96/779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065/marktripy-1.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python package for converting Markdown to AST and back to Markdown",
    "version": "1.0.3",
    "project_urls": {
        "Homepage": "https://github.com/twardoch/marktripy",
        "Issues": "https://github.com/twardoch/marktripy/issues",
        "Repository": "https://github.com/twardoch/marktripy"
    },
    "split_keywords": [
        "ast",
        " commonmark",
        " converter",
        " markdown",
        " parser"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9a07a241d76967401f8044f56347839b0c3770b3b18c06dd168167fd46800f49",
                "md5": "1dabd6f4b7609e6d81d5de2a58e77a76",
                "sha256": "ffadcec11db94031d9673761d43c151fe65004692f823811640ec87fe54cb965"
            },
            "downloads": -1,
            "filename": "marktripy-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1dabd6f4b7609e6d81d5de2a58e77a76",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 47568,
            "upload_time": "2025-07-29T11:38:09",
            "upload_time_iso_8601": "2025-07-29T11:38:09.283393Z",
            "url": "https://files.pythonhosted.org/packages/9a/07/a241d76967401f8044f56347839b0c3770b3b18c06dd168167fd46800f49/marktripy-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bf96779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065",
                "md5": "91c36d339ca1af77177b14f07717bd1e",
                "sha256": "7d760b44aac7a528d6e045f22592670c013e3a29f19b93a902c5c72991477c3a"
            },
            "downloads": -1,
            "filename": "marktripy-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "91c36d339ca1af77177b14f07717bd1e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 46082,
            "upload_time": "2025-07-29T11:38:11",
            "upload_time_iso_8601": "2025-07-29T11:38:11.428368Z",
            "url": "https://files.pythonhosted.org/packages/bf/96/779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065/marktripy-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-29 11:38:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "twardoch",
    "github_project": "marktripy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "marktripy"
}
        