obsidianmd-parser


- **Name**: obsidianmd-parser
- **Version**: 0.3.1
- **Summary**: A Python library for parsing Obsidian Markdown (.md) files and vaults.
- **Upload time**: 2025-09-07 16:19:07
- **Author**: paddyd
- **Requires Python**: >=3.12, <4.0
- **License**: MIT
- **Keywords**: obsidian, markdown, parser, dataview

# obsidianmd-parser

A Python package for parsing Obsidian Markdown vaults and notes, with support for Obsidian's built-in markdown format and Dataview queries.

## Features

- **Complete Vault Parsing**: Load and parse entire Obsidian vaults
- **Note Object Model**: Work with notes as Python objects with attributes and methods
- **Obsidian Markdown Support**: 
  - Wikilinks (`[[links]]` and `[[links|aliases]]`)
  - Tags (`#tag`, `#nested/tag`)
  - Task lists with status tracking
  - Obsidian callouts
- **Relationship Tracking**: Analyze backlinks and relationships between notes
- **Dataview Support**: 
  - Parse Dataview queries from notes
  - Evaluate Dataview queries programmatically
- **Search Capabilities**:
  - Exact search for notes
  - Similarity search using various algorithms
- **Code Block Handling**: Content inside code blocks is excluded from parsing
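
To make the parsing features above concrete, here is a minimal standard-library sketch of wikilink and tag extraction that skips code blocks and URLs. It is an illustration only, not the library's implementation: the regular expressions and the helper names `strip_code`, `extract_wikilinks`, and `extract_tags` are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the library may use different rules.
FENCED_BLOCK = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
URL = re.compile(r"https?://\S+")
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")
TAG = re.compile(r"(?<![\w/#])#([A-Za-z_][\w/-]*)")


def strip_code(text: str) -> str:
    """Remove fenced code blocks first, then inline code spans."""
    return INLINE_CODE.sub("", FENCED_BLOCK.sub("", text))


def extract_wikilinks(text: str) -> list[tuple[str, Optional[str]]]:
    """Return (target, alias) pairs; alias is None when no '|alias' is given."""
    return [
        (m.group(1).strip(), m.group(2).strip() if m.group(2) else None)
        for m in WIKILINK.finditer(strip_code(text))
    ]


def extract_tags(text: str) -> list[str]:
    """Return tag names without the leading '#', ignoring code and URLs."""
    cleaned = URL.sub("", strip_code(text))
    return TAG.findall(cleaned)


sample = (
    "See [[Project Plan|the plan]] and [[Inbox]] #work #area/research\n"
    "Docs: https://example.com/page#section\n"
    "```python\n# not-a-tag [[not a link]]\n```\n"
    "Inline `#also-not-a-tag` stays ignored.\n"
)
links = extract_wikilinks(sample)
tags = extract_tags(sample)
print(links)
print(tags)
```

Stripping code and URLs before matching keeps the tag pattern simple, at the cost of losing character offsets into the original text.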

## Installation

```bash
pip install obsidianmd-parser
```

## Quick Start

```python
from obsidian_parser import Vault

# Load a vault
vault = Vault("path/to/your/obsidian/vault")

# Find notes by exact name
note = vault.get_note("My Note")

# Search notes by similarity
similar_notes = vault.find_notes("machine learning", case_sensitive=False)

# Access note properties
print(note.title)
print(note.tags)
print(note.wikilinks)
print(note.tasks)

# Work with relationships
backlinks = note.get_backlinks(vault=vault)
forward_links = note.get_forward_links(vault=vault)
most_linked = note.get_most_linked()
```

## Core API

### Vault

The `Vault` class represents an entire Obsidian vault:

```python
# lazy_load = notes are parsed only when accessed (default: True)
vault = Vault("path/to/vault", lazy_load=True)

# Search and retrieval
note = vault.get_note("Note Title")
notes = vault.find_similar_notes("search query", threshold=0.5)

# Vault analysis
note_graph = vault.get_note_graph()                 # Produces a note graph tuple object
dataview_usage = vault.analyze_dataview_usage()     # Get vault statistics for dataview queries
broken_links = vault.find_broken_links()            # Finds all broken links in the vault
```
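
The idea behind `lazy_load` (parse only when accessed) can be sketched with `functools.cached_property`. This is an assumption about the general technique, not a description of how `Vault` is implemented; the `LazyNote` class and its `parse_count` counter are hypothetical and exist only to make the deferred work observable.

```python
from functools import cached_property


class LazyNote:
    """Hypothetical stand-in for a note that defers parsing until first access."""

    parse_count = 0  # class-level counter, used only to observe when parsing runs

    def __init__(self, raw: str):
        self.raw = raw  # storing the raw text is cheap; nothing is parsed yet

    @cached_property
    def tags(self) -> list[str]:
        # Runs once on first access; the result is cached on the instance.
        LazyNote.parse_count += 1
        return [w[1:] for w in self.raw.split() if w.startswith("#") and len(w) > 1]


note = LazyNote("Plan the sprint #work #planning")
before = LazyNote.parse_count   # no parsing has happened yet
first = note.tags               # triggers parsing
second = note.tags              # served from the cache
print(before, first, LazyNote.parse_count)
```

With this pattern, opening a large vault stays cheap because only the notes that are actually inspected pay the parsing cost.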

### Note

The `Note` class represents an individual note:

```python
# Access note metadata
note.title          # Note title
note.path          # File path
note.content       # Raw markdown content
note.frontmatter   # Parsed YAML frontmatter

# Access parsed elements
note.tags          # List of tags in the note
note.wikilinks     # List of wikilinks (forward)
note.tasks         # List of tasks
note.callouts      # List of callouts

# Access raw frontmatter
raw = note.frontmatter  # Dict-like object with raw values

# Get cleaned frontmatter (removes wikilinks, formats dates)
cleaned = note.frontmatter.clean()

# Custom date formatting
cleaned = note.frontmatter.clean(date_format='DD-MM-YYYY')
cleaned = note.frontmatter.clean(date_format='%B %d, %Y')  # "March 24, 2025"

# Relationships
vault = Vault("path/to/vault")
note.get_backlinks(vault)        # Notes that link to this note
note.get_forward_links(vault)    # Notes this note links to
note.get_related_notes()         # Related notes by various metrics
note.get_link_context("Target")  # Text surrounding "Target" in this note
note.get_link_context(           # E.g. context for the first wikilink
    target=note.wikilinks[0].display_text,
    context_chars=40,
)
```
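
As a rough picture of what a link-context lookup involves, the sketch below returns the text surrounding each occurrence of a target string. It is a standalone illustration under assumed semantics (a fixed number of characters on each side); the function name `link_context` is hypothetical and the library's `get_link_context` may behave differently.

```python
def link_context(text: str, target: str, context_chars: int = 40) -> list[str]:
    """Return one snippet per occurrence of `target`, with up to
    `context_chars` characters of surrounding text on each side."""
    snippets = []
    start = text.find(target)
    while start != -1:
        end = start + len(target)
        left = max(0, start - context_chars)          # clamp at start of text
        right = min(len(text), end + context_chars)   # clamp at end of text
        snippets.append(text[left:right])
        start = text.find(target, end)                # continue after this match
    return snippets


body = "Yesterday I reviewed [[Project Plan]] with the team and updated the budget."
contexts = link_context(body, "Project Plan", context_chars=10)
print(contexts)
```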

## Sections

```python
for section in note.sections:
    print(f"Section: {section.heading}")
    print(f"  Full path (heading hierarchy): {section.full_path}")
    print(f"  Parent headings [(level, heading)]: {section.parent_headings}")
    print(f"  Heading list: {section.breadcrumb}")
    print(f"  Has parent: {section.parent is not None}")
```
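
Heading hierarchy of this kind is commonly derived with a stack of open ancestor headings. The sketch below shows that technique with the standard library only. It is not the library's code: the function `heading_paths`, the dictionary keys, and the `" > "` path separator are assumptions chosen to mirror the attribute names above.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_paths(markdown: str) -> list[dict]:
    """For each heading, record its level, parent headings, and full path."""
    stack: list[tuple[int, str]] = []  # open ancestors as (level, heading)
    sections = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence    # ignore '#' lines inside code fences
            continue
        match = None if in_fence else HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Headings at the same or a deeper level cannot be ancestors: pop them.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents = list(stack)
        sections.append({
            "heading": title,
            "level": level,
            "parent_headings": parents,
            "full_path": " > ".join([h for _, h in parents] + [title]),
        })
        stack.append((level, title))
    return sections


doc = "# Project\n## Goals\n### Q1\n## Risks\n```\n# comment\n```\n"
paths = [s["full_path"] for s in heading_paths(doc)]
print(paths)
```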

## Dataview Support

Parse and evaluate Dataview queries:

```python
# Parse Dataview queries from a note
queries = note.dataview_queries

query = queries[0]
query.evaluate(vault, note)

# Evaluate a Dataview query in notes or sections
print(note.get_evaluated_view(vault))

note_section = note.sections[10]

print(note_section.get_evaluated_view(vault))
```

## Advanced Usage

### Custom Search

```python
# Configure similarity search
results = vault.search(
    query="machine learning",
    limit=10,
    threshold=0.6,
)
```
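
To show how `limit` and `threshold` typically interact in a similarity search, here is a small sketch built on `difflib.SequenceMatcher` from the standard library. The choice of algorithm is an assumption made for illustration (the README does not say which algorithms the library uses), and `search_titles` is a hypothetical helper, not part of the package.

```python
from difflib import SequenceMatcher


def search_titles(titles, query, limit=10, threshold=0.6):
    """Rank titles by SequenceMatcher ratio against the query (case-insensitive).

    Titles scoring below `threshold` are dropped; at most `limit` are returned.
    """
    scored = []
    for title in titles:
        score = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if score >= threshold:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(title, round(score, 2)) for score, title in scored[:limit]]


titles = ["Machine Learning", "Machine Learning Basics", "Grocery List"]
results = search_titles(titles, "machine learning", limit=2, threshold=0.6)
print(results)
```

Raising `threshold` trades recall for precision, while `limit` only caps how many of the surviving matches are returned.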

### Vault Analysis

```python
# Build a note index DataFrame of the vault
vault_index = vault.build_index()

# Build and analyze vault graph
graph = vault.get_note_graph()

# Find broken links
broken_links = vault.find_broken_links()

# Relationship analysis
relationship_stats = vault.analyze_relationships()          # Builds a Relationship Analyzer object
stats_report = relationship_stats.build_statistics_report()
df = relationship_stats.export_to_dataframe()               # Pandas dataframe object
hub_notes = relationship_stats.find_hub_notes(              # Find notes with many connections (default = 10)
    min_connections=50
)
orphaned_notes = relationship_stats.find_orphaned_notes()   # Find orphaned notes (no backlinks)
```
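
The analyses above (backlinks, broken links, orphaned notes, hub notes) all reduce to simple operations on a link graph. The following sketch works on a hypothetical in-memory mapping of note titles to outgoing links. It does not use the library, and its definition of a "connection" (a distinct linked neighbour in either direction) is an assumption.

```python
from collections import defaultdict

# Hypothetical vault: note title -> titles it links to
forward_links = {
    "Index": ["Projects", "Reading List", "Missing Note"],
    "Projects": ["Index"],
    "Reading List": ["Index", "Projects"],
    "Scratchpad": [],
}

backlinks = defaultdict(set)
broken_links = []
for source, targets in forward_links.items():
    for target in targets:
        if target in forward_links:
            backlinks[target].add(source)          # invert the edge
        else:
            broken_links.append((source, target))  # target note does not exist

# Orphaned notes: no other note links to them
orphaned = sorted(n for n in forward_links if not backlinks[n])


def connection_count(note: str) -> int:
    """Number of distinct existing notes linked to or from `note`."""
    resolved = {t for t in forward_links[note] if t in forward_links}
    return len(resolved | backlinks[note])


hubs = sorted(n for n in forward_links if connection_count(n) >= 2)
print(broken_links, orphaned, hubs)
```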

### Working with Parsed Elements

```python
# Access specific elements
for link in note.wikilinks:
    print(f"Link to: {link.target}, alias: {link.alias}")

for task in note.tasks:
    if task.status == " ":
        print(f"TODO: {task.text}")

for tag in note.tags:
    print(f"Tag: #{tag.name}")
```
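
For readers unfamiliar with how a task's status character relates to the Markdown source, this sketch parses checkbox list items with a regular expression. It is an independent illustration, not the library's parser: `parse_tasks` is a hypothetical helper, and treating any single character between the brackets as a status (such as `/`) is an assumption.

```python
import re

# A list marker, then "[<status>]", then the task text
TASK = re.compile(r"^\s*[-*+]\s+\[(.)\]\s+(.*)$")


def parse_tasks(markdown: str) -> list[dict]:
    """Return a dict with 'status' and 'text' for every checkbox list item."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK.match(line)
        if match:
            tasks.append({"status": match.group(1), "text": match.group(2).strip()})
    return tasks


sample = "- [ ] Write report\n- [x] Send invoice\n  - [/] Review draft\nNot a task\n"
tasks = parse_tasks(sample)
open_tasks = [t["text"] for t in tasks if t["status"] == " "]
print(open_tasks)
```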

## Requirements

- Python 3.12+ (the package metadata requires `>=3.12,<4.0`, so pip will not install it on earlier versions; those versions are untested)
- Dependencies are automatically installed with pip

## Contributing

Contributions are welcome! The project is hosted on Codeberg:

https://codeberg.org/paddyd/obsidian-parser

Please feel free to submit issues and pull requests.

## License

MIT

## Changelog

### 0.3.1 (2025-09-07)
- Fixed `#` characters in URLs being parsed as tags.
- Added further unit tests for tag parsing.

### 0.3.0 (2025-06-14)
- Added parent heading parsing for `Sections`.
- `Sections` now capture heading hierarchy for the whole note.

### 0.2.0 (2025-06-07)
- Added `Frontmatter.clean()` method for cleaning frontmatter values
- Frontmatter now returns a dict-like object instead of plain dict
- Improved wikilink parsing in frontmatter values

### 0.1.0 (Initial Release)
- Core vault and note parsing functionality
- Obsidian markdown format support
- Dataview query parsing and evaluation
- Search capabilities (exact and similarity)
- Relationship tracking and graph building


            
