Name | obsidianmd-parser |
Version | 0.3.1 |
home_page | None |
Summary | A Python library for parsing Obsidian Markdown (.md) files and vaults. |
upload_time | 2025-09-07 16:19:07 |
maintainer | None |
docs_url | None |
author | paddyd |
requires_python | <4.0,>=3.12 |
license | MIT |
keywords | obsidian, markdown, parser, dataview |
requirements | No requirements were recorded. |
# obsidianmd-parser
A Python package for parsing Obsidian Markdown vaults and notes, with support for Obsidian's built-in markdown format and Dataview queries.
## Features
- **Complete Vault Parsing**: Load and parse entire Obsidian vaults
- **Note Object Model**: Work with notes as Python objects with attributes and methods
- **Obsidian Markdown Support**:
  - Wikilinks (`[[links]]` and `[[links|aliases]]`)
  - Tags (`#tag`, `#nested/tag`)
  - Task lists with status tracking
  - Obsidian callouts
- **Relationship Tracking**: Analyze backlinks and relationships between notes
- **Dataview Support**:
  - Parse Dataview queries from notes
  - Evaluate Dataview queries programmatically
- **Search Capabilities**:
  - Exact search for notes
  - Similarity search using various algorithms
- **Code Block Handling**: Correctly excludes parsing within code blocks
## Installation
```bash
pip install obsidianmd-parser
```
## Quick Start
```python
from obsidian_parser import Vault
# Load a vault
vault = Vault("path/to/your/obsidian/vault")
# Find notes by exact name
note = vault.get_note("My Note")
# Search notes by similarity
similar_notes = vault.find_notes("machine learning", case_sensitive=False)
# Access note properties
print(note.title)
print(note.tags)
print(note.wikilinks)
print(note.tasks)
# Work with relationships
backlinks = note.get_backlinks(vault=vault)
related = note.get_forward_links(vault=vault)
most_linked = note.get_most_linked()
```
## Core API
### Vault
The `Vault` class represents an entire Obsidian vault:
```python
# lazy_load = notes are parsed only when accessed (default: True)
vault = Vault("path/to/vault", lazy_load=True)
# Search and retrieval
note = vault.get_note("Note Title")
notes = vault.find_similar_notes("search query", threshold=0.5)
# Vault analysis
note_graph = vault.get_note_graph() # Produces a note graph tuple object
dataview_usage = vault.analyze_dataview_usage() # Get vault statistics for dataview queries
broken_links = vault.find_broken_links() # Finds all broken links in the vault
```
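The README does not say which similarity algorithms back the `threshold` parameter. As a rough illustration of threshold-based title matching, the sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in; the function `find_similar` is hypothetical and is not part of the package.

```python
from difflib import SequenceMatcher


def find_similar(query: str, titles: list[str], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (title, score) pairs scoring at or above threshold, best first."""
    q = query.lower()
    scored = [(t, SequenceMatcher(None, q, t.lower()).ratio()) for t in titles]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


titles = ["Machine Learning", "Machine Learning Basics", "Grocery List"]
results = find_similar("machine learning", titles, threshold=0.5)
for title, score in results:
    print(f"{score:.2f}  {title}")
```

Raising the threshold trades recall for precision: a value near 1.0 keeps only near-exact titles, while a lower value admits looser matches.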
### Note
The `Note` class represents an individual note:
```python
# Access note metadata
note.title # Note title
note.path # File path
note.content # Raw markdown content
note.frontmatter # Parsed YAML frontmatter
# Access parsed elements
note.tags # List of tags in the note
note.wikilinks # List of wikilinks (forward)
note.tasks # List of tasks
note.callouts # List of callouts
# Access raw frontmatter
raw = note.frontmatter # Dict-like object with raw values
# Get cleaned frontmatter (removes wikilinks, formats dates)
cleaned = note.frontmatter.clean()
# Custom date formatting
cleaned = note.frontmatter.clean(date_format='DD-MM-YYYY')
cleaned = note.frontmatter.clean(date_format='%B %d, %Y') # "March 24, 2025"
# Relationships
vault = Vault("path/to/vault")
note.get_backlinks(vault)  # Notes that link to this note
note.get_forward_links(vault)  # Notes this note links to
note.get_related_notes()  # Related notes by various metrics
note.get_link_context("Target")  # Get the context for a piece of text in your note
note.get_link_context(  # E.g. context for a wikilink.
    target=note.wikilinks[0].display_text,
    context_chars=40,
)
```
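The idea behind cleaning frontmatter (removing wikilink brackets and formatting dates) can be sketched with the standard library alone. This is an assumption-laden illustration, not the package's `clean()` method: the helper `clean_value`, its default date format, and the sample data are all hypothetical.

```python
import re
from datetime import date, datetime

WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def clean_value(value, date_format: str = "%Y-%m-%d"):
    """Strip wikilink syntax from strings and format dates; recurse into lists."""
    if isinstance(value, (date, datetime)):
        return value.strftime(date_format)
    if isinstance(value, str):
        # Replace each wikilink with its alias if present, else its target.
        return WIKILINK_RE.sub(lambda m: m.group(2) or m.group(1), value)
    if isinstance(value, list):
        return [clean_value(v, date_format) for v in value]
    return value


frontmatter = {
    "author": "[[Ada Lovelace|Ada]]",
    "related": ["[[Project Plan]]", "plain text"],
    "created": date(2025, 3, 24),
}
cleaned = {key: clean_value(val, "%B %d, %Y") for key, val in frontmatter.items()}
print(cleaned)
```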
## Sections
```python
for section in note.sections:
    print(f"Section: {section.heading}")
    print(f" Full path: {section.full_path}")
    print(f" Parent headings [(level, heading)]: {section.parent_headings}")
    print(f" Heading list: {section.breadcrumb}")
    print(f" Heading hierarchy: {section.full_path}")
    print(f" Has parent: {section.parent is not None}")
```
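One common way to derive parent headings as `(level, heading)` pairs is a stack of open headings. The sketch below shows that technique in isolation; it is not the package's code, the name `heading_paths` is hypothetical, and for brevity it does not skip headings inside code blocks.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_paths(markdown: str) -> list[tuple[str, list[tuple[int, str]]]]:
    """Return (heading, parent_headings) pairs; parents are (level, heading)."""
    stack: list[tuple[int, str]] = []
    result = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if not match:
            continue
        level, text = len(match.group(1)), match.group(2)
        # Pop headings at the same or a deeper level; what remains are ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        result.append((text, list(stack)))
        stack.append((level, text))
    return result


doc = "# Project\n## Goals\n### Q1\n## Risks\n"
paths = heading_paths(doc)
for heading, parents in paths:
    print(" > ".join([h for _, h in parents] + [heading]))
```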
## Dataview Support
Parse and evaluate Dataview queries:
```python
# Parse Dataview queries from a note
queries = note.dataview_queries
query = queries[0]
query.evaluate(vault, note)
# Evaluate a Dataview query in notes or sections
print(note.get_evaluated_view(vault))
note_section = note.sections[10]
print(note_section.get_evaluated_view(vault))
```
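Dataview queries live in fenced code blocks whose info string is `dataview`. As a sketch of the parsing step only (not evaluation, and not the package's parser), the hypothetical helper `extract_dataview_queries` below collects the body of each such block.

```python
FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown


def extract_dataview_queries(markdown: str) -> list[str]:
    """Collect the text of every fenced block opened with the dataview info string."""
    queries, current = [], None
    for line in markdown.splitlines():
        stripped = line.strip()
        if current is None:
            if stripped == FENCE + "dataview":
                current = []  # start collecting a query
        elif stripped == FENCE:
            queries.append("\n".join(current))
            current = None  # block closed
        else:
            current.append(line)
    return queries


note_text = "\n".join([
    "# Reading list",
    FENCE + "dataview",
    "TABLE author, rating",
    'FROM "Books"',
    "SORT rating DESC",
    FENCE,
    "Some prose.",
])
queries = extract_dataview_queries(note_text)
print(queries[0])
```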
## Advanced Usage
### Custom Search
```python
# Configure similarity search
results = vault.search(
    query="machine learning",
    limit=10,
    threshold=0.6,
)
```
### Vault Analysis
```python
# Build a note index dataframe of the vault
vault_index = vault.build_index()
# Build and analyze vault graph
graph = vault.get_note_graph()
# Find broken links
broken_links = vault.find_broken_links()
# Relationship analysis
relationship_stats = vault.analyze_relationships() # Builds a Relationship Analyzer object
stats_report = relationship_stats.build_statistics_report()
df = relationship_stats.export_to_dataframe() # Pandas dataframe object
relationship_stats.find_hub_notes(  # Find notes with lots of connections (default = 10)
    min_connections=50
)
orphaned_notes = relationship_stats.find_orphaned_notes() # Find orphaned notes (no backlinks)
```
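The concepts used in this analysis (backlinks, broken links, orphaned notes, hub notes) reduce to simple operations on a link map. The sketch below works on a plain dictionary of note name to outgoing link targets; the data and the functions `analyze` and `find_hubs` are hypothetical and only illustrate the definitions, with "orphaned" taken to mean "no backlinks" as in the comment above.

```python
from collections import defaultdict

# Hypothetical vault: note name -> targets of its wikilinks
links = {
    "Index": ["Project Plan", "Inbox", "Missing Note"],
    "Project Plan": ["Index"],
    "Inbox": ["Index"],
    "Scratch": [],
}


def analyze(links: dict[str, list[str]]):
    """Return (backlinks, broken_links, orphans) for a link map."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    broken = []
    for source, targets in links.items():
        for target in targets:
            if target in links:
                backlinks[target].add(source)
            else:
                broken.append((source, target))  # target note does not exist
    orphans = sorted(name for name in links if not backlinks.get(name))
    return dict(backlinks), broken, orphans


def find_hubs(links, backlinks, min_connections: int) -> list[str]:
    """Notes whose resolved outgoing links plus backlinks reach the minimum."""
    hubs = []
    for name, targets in links.items():
        outgoing = {t for t in targets if t in links}
        total = len(outgoing) + len(backlinks.get(name, ()))
        if total >= min_connections:
            hubs.append(name)
    return hubs


backlinks, broken, orphans = analyze(links)
hubs = find_hubs(links, backlinks, min_connections=3)
print(broken, orphans, hubs)
```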
### Working with Parsed Elements
```python
# Access specific elements
for link in note.wikilinks:
    print(f"Link to: {link.target}, alias: {link.alias}")
for task in note.tasks:
    if task.status == " ":
        print(f"TODO: {task.text}")
for tag in note.tags:
    print(f"Tag: #{tag.name}")
```
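The check `task.status == " "` reflects how Markdown task lists encode state: the single character between the square brackets. A minimal sketch of that parsing step follows; the regular expression and the helper `parse_tasks` are illustrative assumptions, not the package's parser.

```python
import re

# A list marker, then "[<one status character>]", then the task text.
TASK_RE = re.compile(r"^\s*[-*+]\s+\[(.)\]\s+(.*)$")


def parse_tasks(markdown: str) -> list[tuple[str, str]]:
    """Return (status_char, text) for every task-list line."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match:
            tasks.append((match.group(1), match.group(2)))
    return tasks


text = "- [ ] Write report\n- [x] Book flights\n- [/] Draft slides\n- Not a task\n"
tasks = parse_tasks(text)
todo = [body for status, body in tasks if status == " "]
print(todo)
```

Keeping the raw status character, rather than a boolean, leaves room for custom statuses such as `/` that some Obsidian themes and plugins use.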
## Requirements
- Python 3.12+ (earlier versions may be supported but not yet tested)
- Dependencies are automatically installed with pip
## Contributing
Contributions are welcome! The project is hosted on Codeberg:
https://codeberg.org/paddyd/obsidian-parser
Please feel free to submit issues and pull requests.
## License
MIT
## Changelog
### 0.3.1 (2025-09-07)
- Added fix to prevent '#'s in URLs being parsed as tags.
- Added further unit tests for tag parsing.
### 0.3.0 (2025-06-14)
- Added parent heading parsing for `Sections`.
- `Sections` now capture heading hierarchy for the whole note.
### 0.2.0 (2025-06-07)
- Added `Frontmatter.clean()` method for cleaning frontmatter values
- Frontmatter now returns a dict-like object instead of plain dict
- Improved wikilink parsing in frontmatter values
### 0.1.0 (Initial Release)
- Core vault and note parsing functionality
- Obsidian markdown format support
- Dataview query parsing and evaluation
- Search capabilities (exact and similarity)
- Relationship tracking and graph building