| Name | dn |
| Version | 0.0.8 |
| download | |
| home_page | None |
| Summary | Tools for markdown parsing and generation |
| upload_time | 2025-10-08 00:20:01 |
| maintainer | None |
| docs_url | None |
| author | Thor Whalen |
| requires_python | None |
| license | mit |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# dn
Markdown parsing and generation
To install: `pip install dn`
## Optional Dependencies
This package supports converting various file formats to Markdown, with each format requiring specific dependencies:
    Format       Required Package(s)
    -----------  --------------------------
    PDF          pypdf
    Word         mammoth
    Excel        pandas, openpyxl, tabulate
    PowerPoint   python-pptx
    HTML         html2text
    Notebooks    nbconvert, nbformat
**Installation Options**
You can install these dependencies later, if and when the package complains that a specific one is missing.
You can also install these when installing `dn`, like so:
```bash
# Install with minimal dependencies
pip install dn

# Install with support for specific formats
# (the quotes keep shells such as zsh from treating the brackets as a glob)
pip install "dn[pdf]"         # PDF conversion support
pip install "dn[word]"        # Word document support
pip install "dn[excel]"       # Excel support
pip install "dn[powerpoint]"  # PowerPoint support
pip install "dn[html]"        # HTML conversion
pip install "dn[notebook]"    # Jupyter Notebook support

# Install multiple format support
pip install "dn[pdf,word,excel]"  # Multiple formats

# Install all optional dependencies
pip install "dn[all]"
```
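To see which optional packages are already present before running a conversion, a small check along these lines may help. This is a sketch and not part of `dn`: the `OPTIONAL_MODULES` mapping and the `missing_modules` helper are hypothetical names, and the import names are taken from the table above (the `python-pptx` distribution is imported as `pptx`).

```python
from importlib.util import find_spec

# Import names for each format's optional packages (hypothetical mapping,
# derived from the table above; python-pptx is imported as `pptx`).
OPTIONAL_MODULES = {
    "PDF": ["pypdf"],
    "Word": ["mammoth"],
    "Excel": ["pandas", "openpyxl", "tabulate"],
    "PowerPoint": ["pptx"],
    "HTML": ["html2text"],
    "Notebooks": ["nbconvert", "nbformat"],
}


def missing_modules(requirements=OPTIONAL_MODULES):
    """Map each format to the modules that cannot be found.

    Formats whose modules are all importable are left out of the result.
    """
    report = {}
    for fmt, modules in requirements.items():
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            report[fmt] = absent
    return report


if __name__ == "__main__":
    for fmt, absent in missing_modules().items():
        print(f"{fmt}: missing {', '.join(absent)}")
```

An empty result means every format in the table has its packages available in the current environment.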
# Examples
## To and from Jupyter notebooks
    from dn import markdown_to_notebook

    sample_markdown = """# Sample Notebook

    This is a markdown cell with some explanation.

    ```python
    # This is a code cell
    print("Hello, World!")
    x = 42
    print(f"The answer is {x}")
    ```

    ## Another Section

    More markdown content here.

    ```python
    # Another code cell
    def greet(name):
        return f"Hello, {name}!"

    print(greet("Jupyter"))
    ```

    Final markdown cell."""

Test basic functionality
```python
notebook = markdown_to_notebook(sample_markdown)
print(f"Created notebook with {len(notebook['cells'])} cells")
```
Test with file output
```python
output_path = markdown_to_notebook(
    sample_markdown,
    egress="./sample_notebook.ipynb",
)
print(f"Saved notebook to: {output_path}")
```
    Created notebook with 5 cells
    Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb

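The count of 5 cells is consistent with the sample alternating between prose and fenced code: markdown, code, markdown, code, markdown. The sketch below shows one way such a split could work. It is an illustration only, not `dn`'s implementation; `split_into_cells` and `TICKS` are hypothetical names, and it only recognises fences tagged `python`.

```python
import re

TICKS = "`" * 3  # built at runtime so this example can sit inside a fenced block

# Matches a python-tagged fenced block and captures its body.
FENCED_BLOCK = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)


def split_into_cells(markdown_text):
    """Split markdown into (cell_type, source) pairs.

    Fenced python blocks become "code" cells; the non-empty text between
    them becomes "markdown" cells.
    """
    cells = []
    position = 0
    for match in FENCED_BLOCK.finditer(markdown_text):
        prose = markdown_text[position:match.start()].strip()
        if prose:
            cells.append(("markdown", prose))
        cells.append(("code", match.group(1).strip()))
        position = match.end()
    tail = markdown_text[position:].strip()
    if tail:
        cells.append(("markdown", tail))
    return cells


sample = f"# Title\n\nIntro text.\n\n{TICKS}python\nx = 42\n{TICKS}\n\nClosing text."
cells = split_into_cells(sample)
print([cell_type for cell_type, _ in cells])
```
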
```python
from dn import notebook_to_markdown
md_string = notebook_to_markdown(notebook)
print(md_string)
```
    # Sample Notebook

    This is a markdown cell with some explanation.

    ```python
    # This is a code cell
    print("Hello, World!")
    x = 42
    print(f"The answer
    ...
    nt(greet("Jupyter"))

    ```

    Final markdown cell.

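The reverse direction can be pictured as joining cells back together, wrapping each code cell in a fence. The following is a minimal sketch under that assumption, not how `notebook_to_markdown` is actually written; `cells_to_markdown` is a hypothetical helper that takes plain `(cell_type, source)` pairs rather than a real notebook object.

```python
TICKS = "`" * 3  # built at runtime so this example can sit inside a fenced block


def cells_to_markdown(cells, language="python"):
    """Join (cell_type, source) pairs into one markdown string.

    Code cells are wrapped in a fence tagged with `language`; every other
    cell is emitted as-is. Cells are separated by a blank line.
    """
    parts = []
    for cell_type, source in cells:
        if cell_type == "code":
            parts.append(f"{TICKS}{language}\n{source}\n{TICKS}")
        else:
            parts.append(source)
    return "\n\n".join(parts)


markdown_text = cells_to_markdown([("markdown", "# Title"), ("code", "x = 42")])
print(markdown_text)
```
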
## ... and other formats
```python
from dn import pdf_to_markdown # requires pypdf
```
```python
from dn import docx_to_markdown # requires mammoth
```
```python
from dn import excel_to_markdown  # requires pandas, openpyxl, tabulate
```
```python
from dn import pptx_to_markdown # requires python-pptx
```
```python
from dn import html_to_markdown # requires html2text
```
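When the format is only known at run time, the converter can be picked by file extension. The sketch below maps extensions to the function names imported above; the mapping itself and the helper `converter_name_for` are assumptions made for illustration, and how each converter is called (path, bytes, or something else) is not shown here, so check each function's docstring before use.

```python
from pathlib import Path

# Hypothetical extension-to-converter mapping; the function names are the
# ones imported from dn above, the pairing with extensions is an assumption.
CONVERTER_NAMES = {
    ".pdf": "pdf_to_markdown",
    ".docx": "docx_to_markdown",
    ".xlsx": "excel_to_markdown",
    ".pptx": "pptx_to_markdown",
    ".html": "html_to_markdown",
    ".ipynb": "notebook_to_markdown",
}


def converter_name_for(path):
    """Return the name of the dn converter matching the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return CONVERTER_NAMES[suffix]
    except KeyError:
        raise ValueError(f"No converter known for {suffix!r}") from None


print(converter_name_for("slides/deck.pptx"))
```

With `dn` installed, `getattr(dn, converter_name_for(path))` would then look up the function object itself.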
# Markdown stores
User story: I have a directory with multiple files in different formats.
I want to batch convert all supported files to markdown and store them in memory.
```python
from dn import Files, bytes_store_to_markdown_store
from dn.tests.utils_for_testing_dn import test_data_dir
# Setup source files from test directory
src_files = Files(test_data_dir)
# Setup target store as an in-memory dictionary
target_store = {}
# Convert all files in directory to markdown
result = bytes_store_to_markdown_store(src_files, target_store, verbose=False)
# Check that the result is the target_store
assert result is target_store
# Verify that the supported file types were converted correctly
supported_files = [
    "test.docx",
    "test.pptx",
    "test.pdf",
    "test.html",
    "test.xlsx",
    "test.txt",
    "test.md",
    "test.ipynb",
]

print(f"\nSupported files (given what packages are installed here): {supported_files}\n")

for filename in supported_files:
    assert f"{filename}.md" in target_store, f"{filename} not found in target_store"
    assert len(target_store[f"{filename}.md"]) > 0, f"{filename} conversion failed"
```
    invalid pdf header: b'PK\x03\x04\n'
    EOF marker not found
    EOF marker not found
    invalid pdf header: b'PK\x03\x04\x14'
    EOF marker not found
    invalid pdf h
    ...
    df header: b'PK\x03\x04\x14'
    EOF marker not found

    Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']

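The example above relies on two observable behaviours: each converted entry is stored under the source key plus `.md`, and the target store is returned. A generic store-to-store conversion with those two properties can be sketched as follows. This is not the code of `bytes_store_to_markdown_store`; `convert_store` and its `convert` callback are hypothetical, and plain dicts stand in for both stores.

```python
def convert_store(source, target, convert, suffix=".md"):
    """Convert every value of `source` and write it into `target`.

    `source` is any mapping of key -> raw bytes, `target` any mutable
    mapping, and `convert` a function turning raw bytes into text.
    Each result is stored under the source key plus `suffix`.
    """
    for key in source:
        target[key + suffix] = convert(source[key])
    return target


raw_files = {"a.txt": b"hello", "b.txt": b"world"}
target = {}
result = convert_store(raw_files, target, lambda data: data.decode("utf-8"))
print(sorted(result))
```

Because the target only needs item assignment, the same function would work with a dict, or with any mapping-like object that persists its items to disk.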
# Convert this notebook into markdown for the README.md
```python
from dn import notebook_to_markdown
notebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')
```
Raw data
{
"_id": null,
"home_page": null,
"name": "dn",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Thor Whalen",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/7c/a5/bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475/dn-0.0.8.tar.gz",
"platform": null,
"description": "# dn\n\nMarkdown parsing and generation\n\nTo install: `pip install dn`\n\n\n## Optional Dependencies\n\nThis package supports converting various file formats to Markdown, with each format requiring specific dependencies:\n\n\n Format Required Package(s)\n ----------- -----------------\n PDF pypdf\n Word mammoth\n Excel pandas, openpyxl, tabulate\n PowerPoint python-pptx\n HTML html2text\n Notebooks nbconvert, nbformat\n\n**Installation Options**\n\nYou can install these dependencies after the fact, if and when package complains it needs some specific resource. \n\nYou can also install these when installing `dn`, like so:\n\n```bash\n # Install with minimal dependencies\n pip install dn\n\n # Install with support for specific formats\n pip install dn[pdf] # PDF conversion support\n pip install dn[word] # Word document support\n pip install dn[excel] # Excel support\n pip install dn[powerpoint] # PowerPoint support\n pip install dn[html] # HTML conversion\n pip install dn[notebook] # Jupyter Notebook support\n\n # Install multiple format support\n pip install dn[pdf,word,excel] # Multiple formats\n\n # Install all optional dependencies\n pip install dn[all]\n```\n\n# Examples\n\n## To and from jupyter notebooks\n\n\n from dn import markdown_to_notebook\n\n sample_markdown = \"\"\"# Sample Notebook\n\n This is a markdown cell with some explanation.\n\n ```python\n # This is a code cell\n print(\"Hello, World!\")\n x = 42\n print(f\"The answer is {x}\")\n ```\n\n ## Another Section\n\n More markdown content here.\n\n ```python\n # Another code cell\n def greet(name):\n return f\"Hello, {name}!\"\n\n print(greet(\"Jupyter\"))\n ```\n\n Final markdown cell.\"\"\"\n \n\n\nTest basic functionality\n\n```python\nnotebook = markdown_to_notebook(sample_markdown)\nprint(f\"Created notebook with {len(notebook['cells'])} cells\")\n```\n\nTest with file output\n\n```python\noutput_path = markdown_to_notebook(\n sample_markdown,\n 
egress=\"./sample_notebook.ipynb\"\n)\nprint(f\"Saved notebook to: {output_path}\")\n```\n\n Created notebook with 5 cells\n Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb\n\n\n\n```python\nfrom dn import notebook_to_markdown\n\nmd_string = notebook_to_markdown(notebook)\nprint(md_string)\n```\n\n # Sample Notebook\n \n This is a markdown cell with some explanation.\n \n \n \n ```python\n # This is a code cell\n print(\"Hello, World!\")\n x = 42\n print(f\"The answer \n ...\n nt(greet(\"Jupyter\"))\n \n ```\n \n Final markdown cell.\n \n \n\n\n## ... and other formats\n\n\n```python\nfrom dn import pdf_to_markdown # requires pypdf\n```\n\n\n```python\nfrom dn import docx_to_markdown # requires mammoth\n```\n\n\n```python\nfrom dn import excel_to_markdown # requires pandas\n```\n\n\n```python\nfrom dn import pptx_to_markdown # requires python-pptx\n```\n\n\n```python\nfrom dn import html_to_markdown # requires html2text\n```\n\n# Markdown stores\n\nUser story: I have a directory with multiple files in different formats.\n\nI want to batch convert all supported files to markdown and store them in memory.\n\n\n```python\nfrom dn import Files, bytes_store_to_markdown_store\n\nfrom dn.tests.utils_for_testing_dn import test_data_dir\n\n# Setup source files from test directory\nsrc_files = Files(test_data_dir)\n\n# Setup target store as an in-memory dictionary\ntarget_store = {}\n\n# Convert all files in directory to markdown\nresult = bytes_store_to_markdown_store(src_files, target_store, verbose=False)\n\n# Check that the result is the target_store\nassert result is target_store\n\n# Verify that the supported file types were converted correctly\nsupported_files = [\n \"test.docx\",\n \"test.pptx\",\n \"test.pdf\",\n \"test.html\",\n \"test.xlsx\",\n \"test.txt\",\n \"test.md\",\n \"test.ipynb\",\n]\n\nprint(f\"\\nSupported files (given what packages are installed here): {supported_files}\\n\")\n\nfor filename in supported_files:\n 
assert f\"{filename}.md\" in target_store, f\"{filename} not found in target_store\"\n assert len(target_store[f\"{filename}.md\"]) > 0, f\"{filename} conversion failed\"\n\n```\n\n invalid pdf header: b'PK\\x03\\x04\\n'\n EOF marker not found\n EOF marker not found\n invalid pdf header: b'PK\\x03\\x04\\x14'\n EOF marker not found\n invalid pdf h\n ...\n df header: b'PK\\x03\\x04\\x14'\n EOF marker not found\n\n\n \n Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']\n \n\n\n# Convert this notebook into a markdown for the README.md\n\n\n```python\nfrom dn import notebook_to_markdown\n\nnotebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')\n```\n\n HTML output truncated. (Data removed)\n\n\n\n```python\n\n```\n",
"bugtrack_url": null,
"license": "mit",
"summary": "Tools for markdown parsing and generation",
"version": "0.0.8",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0b11f6489b18b334a60641132641f4f21e392594977cbd5848c9b549e4de94ff",
"md5": "ad8b0b08b53c596767b7fcb088ff69d0",
"sha256": "4ce02ebcb5601c2698ef16f6e21667a28a51e549c5e05bed15e9f3a6ecd7800f"
},
"downloads": -1,
"filename": "dn-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ad8b0b08b53c596767b7fcb088ff69d0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15153,
"upload_time": "2025-10-08T00:20:00",
"upload_time_iso_8601": "2025-10-08T00:20:00.935730Z",
"url": "https://files.pythonhosted.org/packages/0b/11/f6489b18b334a60641132641f4f21e392594977cbd5848c9b549e4de94ff/dn-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7ca5bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475",
"md5": "e0718e89ef857a32fb83f7cda9052ed9",
"sha256": "c946e58fe96c2385e44a97052a18c91551a908ef3d7abe06f4d2fcf5ac8801e0"
},
"downloads": -1,
"filename": "dn-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "e0718e89ef857a32fb83f7cda9052ed9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15552,
"upload_time": "2025-10-08T00:20:01",
"upload_time_iso_8601": "2025-10-08T00:20:01.778842Z",
"url": "https://files.pythonhosted.org/packages/7c/a5/bb2f05a45b814c371363e493195b5468de9bfdca1ae55f7a320cd8e31475/dn-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-08 00:20:01",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "dn"
}