dn

Name	dn
Version	0.0.8
home_page	None
Summary	Tools for markdown parsing and generation
upload_time	2025-10-08 00:20:01
maintainer	None
docs_url	None
author	Thor Whalen
requires_python	None
license	mit
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# dn

Markdown parsing and generation

To install: `pip install dn`


## Optional Dependencies

This package supports converting various file formats to Markdown, with each format requiring specific dependencies:


    Format      Required Package(s)
    ----------- -----------------
    PDF         pypdf
    Word        mammoth
    Excel       pandas, openpyxl, tabulate
    PowerPoint  python-pptx
    HTML        html2text
    Notebooks   nbconvert, nbformat

**Installation Options**

You can install these dependencies after the fact, if and when the package complains that it needs a specific resource.
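To see ahead of time which optional packages are missing, you can check whether their modules are importable. The sketch below is not part of `dn`: the helper name `find_missing` and the format-to-module mapping are assumptions derived from the table above (note that `python-pptx` is imported as `pptx`).

```python
from importlib.util import find_spec

# Import names of the optional packages listed in the table above.
# This mapping is an illustration, not something dn exposes.
MODULES_BY_FORMAT = {
    "PDF": ["pypdf"],
    "Word": ["mammoth"],
    "Excel": ["pandas", "openpyxl", "tabulate"],
    "PowerPoint": ["pptx"],  # python-pptx installs the "pptx" module
    "HTML": ["html2text"],
    "Notebooks": ["nbconvert", "nbformat"],
}


def find_missing(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    for format_name, modules in MODULES_BY_FORMAT.items():
        missing = find_missing(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{format_name:<11} {status}")
```

Running it prints one line per format, so you can install only what the formats you care about need.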

You can also install these when installing `dn`, like so:

```bash
# Install with minimal dependencies
pip install dn

# Install with support for specific formats
# (the quotes keep shells such as zsh from globbing the square brackets)
pip install "dn[pdf]"               # PDF conversion support
pip install "dn[word]"              # Word document support
pip install "dn[excel]"             # Excel support
pip install "dn[powerpoint]"        # PowerPoint support
pip install "dn[html]"              # HTML conversion
pip install "dn[notebook]"          # Jupyter Notebook support

# Install multiple format support
pip install "dn[pdf,word,excel]"    # Multiple formats

# Install all optional dependencies
pip install "dn[all]"
```

# Examples

## To and from Jupyter notebooks


    from dn import markdown_to_notebook

    sample_markdown = """# Sample Notebook

    This is a markdown cell with some explanation.

    ```python
    # This is a code cell
    print("Hello, World!")
    x = 42
    print(f"The answer is {x}")
    ```

    ## Another Section

    More markdown content here.

    ```python
    # Another code cell
    def greet(name):
        return f"Hello, {name}!"

    print(greet("Jupyter"))
    ```

    Final markdown cell."""
        


Test basic functionality

```python
notebook = markdown_to_notebook(sample_markdown)
print(f"Created notebook with {len(notebook['cells'])} cells")
```

Test with file output

```python
output_path = markdown_to_notebook(
    sample_markdown,
    egress="./sample_notebook.ipynb"
)
print(f"Saved notebook to: {output_path}")
```

    Created notebook with 5 cells
    Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb
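To make the cell count above concrete, here is a minimal standard-library sketch of one way markdown could be split into cells: each fenced `python` block becomes a code cell and the prose between fences becomes a markdown cell. This is an assumption about the general idea, not `dn`'s actual implementation, and `split_markdown_into_cells` is a hypothetical name. Applied to `sample_markdown`, this rule yields five cells, matching the count printed above.

```python
import re

# Build the fence marker programmatically so this example can itself
# live inside a fenced code block.
FENCE_MARK = "`" * 3
PYTHON_BLOCK = re.compile(FENCE_MARK + r"python\n(.*?)" + FENCE_MARK, re.DOTALL)


def split_markdown_into_cells(markdown_text):
    """Split markdown into a list of notebook-like cell dictionaries."""
    cells = []
    position = 0
    for match in PYTHON_BLOCK.finditer(markdown_text):
        # Prose between the previous fence and this one is a markdown cell.
        prose = markdown_text[position:match.start()].strip()
        if prose:
            cells.append({"cell_type": "markdown", "source": prose})
        # The fenced body is a code cell.
        cells.append({"cell_type": "code", "source": match.group(1).strip()})
        position = match.end()
    # Anything after the last fence is a final markdown cell.
    tail = markdown_text[position:].strip()
    if tail:
        cells.append({"cell_type": "markdown", "source": tail})
    return cells
```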



```python
from dn import notebook_to_markdown

md_string = notebook_to_markdown(notebook)
print(md_string)
```

    # Sample Notebook
    
    This is a markdown cell with some explanation.
    
    
    
    ```python
    # This is a code cell
    print("Hello, World!")
    x = 42
    print(f"The answer 
    ...
    nt(greet("Jupyter"))
    
    ```
    
    Final markdown cell.
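The reverse direction can be sketched the same way: markdown cells are emitted as they are, and code cells are wrapped in a `python` fence. This is only an illustration of the idea under that assumption, not how `notebook_to_markdown` is implemented; `cells_to_markdown` is a hypothetical helper, and it takes a plain list of cell dictionaries rather than a notebook object.

```python
# Build the fence marker programmatically so this example can itself
# live inside a fenced code block.
FENCE_MARK = "`" * 3


def cells_to_markdown(cells):
    """Render notebook-like cell dictionaries as one markdown string."""
    parts = []
    for cell in cells:
        source = cell["source"]
        # Notebook JSON may store a cell's source as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell["cell_type"] == "code":
            parts.append(f"{FENCE_MARK}python\n{source}\n{FENCE_MARK}")
        else:
            parts.append(source)
    # Separate cells with a blank line.
    return "\n\n".join(parts)
```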
    
    


## ... and other formats


```python
from dn import pdf_to_markdown  # requires pypdf
```


```python
from dn import docx_to_markdown  # requires mammoth
```


```python
from dn import excel_to_markdown  # requires pandas, openpyxl, tabulate
```


```python
from dn import pptx_to_markdown  # requires python-pptx
```


```python
from dn import html_to_markdown  # requires html2text
```
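If you handle several formats, you may want to pick the converter from the file extension. The sketch below is an assumption about how one might do that: the function names come from the imports above, but the extension mapping and the helpers `converter_name_for` and `load_converter` are hypothetical. It returns the converter without calling it, because the converters' exact signatures are not documented here.

```python
from importlib import import_module
from pathlib import Path

# Hypothetical mapping from file extension to the dn function names above.
CONVERTER_NAME_BY_SUFFIX = {
    ".pdf": "pdf_to_markdown",
    ".docx": "docx_to_markdown",
    ".xlsx": "excel_to_markdown",
    ".pptx": "pptx_to_markdown",
    ".html": "html_to_markdown",
    ".ipynb": "notebook_to_markdown",
}


def converter_name_for(path):
    """Return the name of the dn converter matching the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return CONVERTER_NAME_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No converter known for {suffix!r} files") from None


def load_converter(path):
    """Import dn lazily and return the converter function for this path."""
    return getattr(import_module("dn"), converter_name_for(path))
```

Importing `dn` only inside `load_converter` keeps the lookup usable even when the package or one of its optional dependencies is not installed.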

# Markdown stores

User story: I have a directory with multiple files in different formats.

I want to batch convert all supported files to markdown and store them in memory.


```python
from dn import Files, bytes_store_to_markdown_store

from dn.tests.utils_for_testing_dn import test_data_dir

# Setup source files from test directory
src_files = Files(test_data_dir)

# Setup target store as an in-memory dictionary
target_store = {}

# Convert all files in directory to markdown
result = bytes_store_to_markdown_store(src_files, target_store, verbose=False)

# Check that the result is the target_store
assert result is target_store

# Verify that the supported file types were converted correctly
supported_files = [
    "test.docx",
    "test.pptx",
    "test.pdf",
    "test.html",
    "test.xlsx",
    "test.txt",
    "test.md",
    "test.ipynb",
]

print(f"\nSupported files (given what packages are installed here): {supported_files}\n")

for filename in supported_files:
    assert f"{filename}.md" in target_store, f"{filename} not found in target_store"
    assert len(target_store[f"{filename}.md"]) > 0, f"{filename} conversion failed"

```

    invalid pdf header: b'PK\x03\x04\n'
    EOF marker not found
    EOF marker not found
    invalid pdf header: b'PK\x03\x04\x14'
    EOF marker not found
    invalid pdf h
    ...
    df header: b'PK\x03\x04\x14'
    EOF marker not found


    
    Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']
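The store-to-store pattern used above can be illustrated with plain dictionaries. This is a simplified sketch under stated assumptions, not `dn`'s implementation: `convert_store`, `decode_text`, and the `CONVERTERS` table are hypothetical, and only plain-text inputs are handled. It mirrors the two behaviours the example asserts: converted entries are stored under `"<filename>.md"`, and the target store itself is returned.

```python
from pathlib import PurePosixPath


def decode_text(data: bytes) -> str:
    """Treat the bytes as UTF-8 text that is already valid markdown."""
    return data.decode("utf-8")


# Hypothetical converter table: extension -> function(bytes) -> markdown str.
CONVERTERS = {".txt": decode_text, ".md": decode_text}


def convert_store(source, target, converters=CONVERTERS):
    """Convert every supported item of `source` and write it into `target`."""
    for key, data in source.items():
        converter = converters.get(PurePosixPath(key).suffix.lower())
        if converter is None:
            continue  # skip unsupported file types
        target[f"{key}.md"] = converter(data)
    return target
```

Because both stores are only used through the mapping interface, the same function works whether the source is a dictionary of bytes or a directory-backed store.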
    


# Convert this notebook into Markdown for README.md


```python
from dn import notebook_to_markdown

notebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')
```

    HTML output truncated. (Data removed)




Raw data

{
    "_id": null,
    "home_page": null,
    "name": "dn",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Thor Whalen",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/7c/a5/bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475/dn-0.0.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "mit",
    "summary": "Tools for markdown parsing and generation",
    "version": "0.0.8",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b11f6489b18b334a60641132641f4f21e392594977cbd5848c9b549e4de94ff",
                "md5": "ad8b0b08b53c596767b7fcb088ff69d0",
                "sha256": "4ce02ebcb5601c2698ef16f6e21667a28a51e549c5e05bed15e9f3a6ecd7800f"
            },
            "downloads": -1,
            "filename": "dn-0.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad8b0b08b53c596767b7fcb088ff69d0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 15153,
            "upload_time": "2025-10-08T00:20:00",
            "upload_time_iso_8601": "2025-10-08T00:20:00.935730Z",
            "url": "https://files.pythonhosted.org/packages/0b/11/f6489b18b334a60641132641f4f21e392594977cbd5848c9b549e4de94ff/dn-0.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7ca5bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475",
                "md5": "e0718e89ef857a32fb83f7cda9052ed9",
                "sha256": "c946e58fe96c2385e44a97052a18c91551a908ef3d7abe06f4d2fcf5ac8801e0"
            },
            "downloads": -1,
            "filename": "dn-0.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "e0718e89ef857a32fb83f7cda9052ed9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15552,
            "upload_time": "2025-10-08T00:20:01",
            "upload_time_iso_8601": "2025-10-08T00:20:01.778842Z",
            "url": "https://files.pythonhosted.org/packages/7c/a5/bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475/dn-0.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-08 00:20:01",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "dn"
}
