Name | llama-index-node-parser-docling |
Version | 0.2.0 |
home_page | None |
Summary | llama-index node_parser docling integration |
upload_time | 2024-10-26 02:03:11 |
maintainer | None |
docs_url | None |
author | Panos Vagenas |
requires_python | <4.0,>=3.10 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Docling Node Parser
## Overview
Docling Node Parser parses [Docling](https://github.com/DS4SD/docling) JSON output into LlamaIndex nodes with rich metadata, for use in downstream pipelines such as RAG and QA.
## Installation
```console
pip install llama-index-node-parser-docling
```
## Usage
Docling Node Parser parses LlamaIndex documents that contain JSON-serialized Docling output, such as those created by a Docling Reader in JSON mode.
Basic usage looks like this:
```python
from llama_index.node_parser.docling import DoclingNodeParser

# docs = ...  # LlamaIndex documents, e.g. created using Docling Reader in JSON mode

# Parse the JSON-serialized Docling documents into LlamaIndex nodes.
node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)

# Preview the first 70 characters of one node's text.
print(f"{nodes[12].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

# Inspect the metadata attached to the same node.
print(nodes[12].metadata)
# > {'doc_items': [{
# >      'self_ref': '#/main-text/21',
# >      'prov': [{
# >          'page_no': 2,
# >          'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},
# >          ...
# >      }],
# >  }],
# >  'headings': ['2 Getting Started'],
# >  ...
# > }
```
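Downstream code often needs only a few fields from this metadata, such as the page numbers and headings a node came from. The following is a minimal sketch of that, not part of the package: the helper `summarize_metadata` is hypothetical, and `sample_metadata` is hand-written to mirror the example output above, assuming `doc_items` and `prov` are lists of dictionaries. Real parser output may contain more keys or differ between versions.

```python
from typing import Any


def summarize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Collect page numbers and headings from a node's metadata dict.

    Hypothetical helper: assumes 'doc_items' is a list of dicts, each with
    an optional 'prov' list of dicts that may carry a 'page_no' key.
    """
    pages: set[int] = set()
    for item in metadata.get("doc_items", []):
        for prov in item.get("prov", []):
            page_no = prov.get("page_no")
            if page_no is not None:
                pages.add(page_no)
    return {
        "pages": sorted(pages),
        "headings": list(metadata.get("headings", [])),
    }


# Hand-written sample mirroring the example output; not real parser output.
sample_metadata = {
    "doc_items": [
        {
            "self_ref": "#/main-text/21",
            "prov": [
                {
                    "page_no": 2,
                    "bbox": {"l": 107.3, "t": 499.5, "r": 504.0, "b": 456.7},
                }
            ],
        }
    ],
    "headings": ["2 Getting Started"],
}

summary = summarize_metadata(sample_metadata)
print(summary)
# > {'pages': [2], 'headings': ['2 Getting Started']}
```

With real nodes, the same helper could be called as `summarize_metadata(nodes[12].metadata)`, provided the metadata follows the structure assumed here.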
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-node-parser-docling",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Panos Vagenas",
"author_email": "pva@zurich.ibm.com",
"download_url": "https://files.pythonhosted.org/packages/b3/60/efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba/llama_index_node_parser_docling-0.2.0.tar.gz",
"platform": null,
"description": "# Docling Node Parser\n\n## Overview\n\nDocling Node Parser parses [Docling](https://github.com/DS4SD/docling) JSON output into LlamaIndex nodes with rich metadata for usage in downstream pipelines for RAG / QA etc.\n\n## Installation\n\n```console\npip install llama-index-node-parser-docling\n```\n\n## Usage\n\nDocling Node Parser parses LlamaIndex documents containing JSON-serialized Docling format, as created by a Docling Reader.\n\nBasic usage looks like this:\n\n```python\n# docs = ... # e.g. created using Docling Reader in JSON mode\n\nfrom llama_index.node_parser.docling import DoclingNodeParser\n\nnode_parser = DoclingNodeParser()\nnodes = node_parser.get_nodes_from_documents(documents=docs)\nprint(f\"{nodes[12].text[:70]}...\")\n# > Docling provides an easy code interface to convert PDF documents from ...\n\nprint(nodes[12].metadata)\n# > {'doc_items': [\n# > 'self_ref': '#/main-text/21',\n# > 'prov': [\n# > 'page_no': 2,\n# > 'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},\n# > ...\n# > ],\n# > 'headings': ['2 Getting Started'],\n# > ...\n# > }\n```\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index node_parser docling integration",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cce09793e4417cccab6c97b394b54986e7c7f99ab0b8fe3fb6421247c3174c6c",
"md5": "a91a77435e6988a48ffdbba5092744e7",
"sha256": "0d88636d10b32a61323402bb8012470860eb0e69344373f692fec6d0e39761b2"
},
"downloads": -1,
"filename": "llama_index_node_parser_docling-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a91a77435e6988a48ffdbba5092744e7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 3386,
"upload_time": "2024-10-26T02:03:09",
"upload_time_iso_8601": "2024-10-26T02:03:09.930331Z",
"url": "https://files.pythonhosted.org/packages/cc/e0/9793e4417cccab6c97b394b54986e7c7f99ab0b8fe3fb6421247c3174c6c/llama_index_node_parser_docling-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b360efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba",
"md5": "a58a7fe182be1ac40192da4c235395a9",
"sha256": "9b2fec1be5f92a84abc34cd427233db7ec18f5b387a0041887a577c2352d7cfd"
},
"downloads": -1,
"filename": "llama_index_node_parser_docling-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "a58a7fe182be1ac40192da4c235395a9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 3022,
"upload_time": "2024-10-26T02:03:11",
"upload_time_iso_8601": "2024-10-26T02:03:11.189706Z",
"url": "https://files.pythonhosted.org/packages/b3/60/efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba/llama_index_node_parser_docling-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-26 02:03:11",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-node-parser-docling"
}