llama-index-node-parser-docling


- Name: llama-index-node-parser-docling
- Version: 0.2.0
- Summary: llama-index node_parser docling integration
- Upload time: 2024-10-26 02:03:11
- Author: Panos Vagenas
- Requires Python: <4.0,>=3.10
- License: MIT

# Docling Node Parser

## Overview

Docling Node Parser parses [Docling](https://github.com/DS4SD/docling) JSON output into LlamaIndex nodes with rich metadata, for use in downstream pipelines such as RAG and QA.

## Installation

```console
pip install llama-index-node-parser-docling
```
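To confirm that the install succeeded, a small check along the following lines may help. It is a minimal sketch using only the standard library; `check_install` is a hypothetical helper and not part of the package, and the Python range is the `requires_python` constraint from the package metadata.

```python
import sys
from importlib import metadata

PACKAGE = "llama-index-node-parser-docling"


def check_install(package=PACKAGE):
    """Return (python_ok, installed_version); the version is None if the package is missing."""
    # The package metadata declares requires_python ">=3.10,<4.0".
    python_ok = (3, 10) <= sys.version_info[:2] < (4, 0)
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        installed_version = None
    return python_ok, installed_version


python_ok, installed_version = check_install()
print(f"Python supported: {python_ok}; installed version: {installed_version}")
```

If the reported version is `None`, the package is not visible to the interpreter that ran the check, which usually points to a different virtual environment than the one `pip` installed into.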

## Usage

Docling Node Parser parses LlamaIndex documents that contain JSON-serialized Docling output, as created by a Docling Reader in JSON mode.

Basic usage looks like this:

```python
# docs = ...  # e.g. created using Docling Reader in JSON mode

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[12].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

print(nodes[12].metadata)
# > {'doc_items': [{
# >    'self_ref': '#/main-text/21',
# >    'prov': [{
# >      'page_no': 2,
# >      'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},
# >      ...
# >    }],
# >    ...
# >  }],
# >  'headings': ['2 Getting Started'],
# >  ...
# > }
```
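The node metadata can then be used downstream, for example to show where a retrieved chunk came from. The following is a minimal sketch, assuming the metadata has the shape suggested by the sample output above: `doc_items` is a list of dicts, each with a `prov` list whose entries carry a `page_no`. `summarize_provenance` is a hypothetical helper, not part of the package, and `sample_metadata` is a hand-written stand-in for `nodes[12].metadata`.

```python
def summarize_provenance(metadata):
    """Collect the headings and the sorted, de-duplicated page numbers of a node."""
    pages = set()
    for item in metadata.get("doc_items", []):
        for prov in item.get("prov", []):
            if "page_no" in prov:
                pages.add(prov["page_no"])
    return {
        "headings": list(metadata.get("headings") or []),
        "pages": sorted(pages),
    }


# Hand-written stand-in mirroring the sample output above (assumed shape).
sample_metadata = {
    "doc_items": [
        {
            "self_ref": "#/main-text/21",
            "prov": [
                {"page_no": 2, "bbox": {"l": 107.3, "t": 499.5, "r": 504.0, "b": 456.7}},
            ],
        }
    ],
    "headings": ["2 Getting Started"],
}

summary = summarize_provenance(sample_metadata)
print(summary)
# > {'headings': ['2 Getting Started'], 'pages': [2]}
```

Using `.get` with defaults keeps the helper tolerant of nodes that lack headings or provenance entries.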

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-node-parser-docling",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Panos Vagenas",
    "author_email": "pva@zurich.ibm.com",
    "download_url": "https://files.pythonhosted.org/packages/b3/60/efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba/llama_index_node_parser_docling-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index node_parser docling integration",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cce09793e4417cccab6c97b394b54986e7c7f99ab0b8fe3fb6421247c3174c6c",
                "md5": "a91a77435e6988a48ffdbba5092744e7",
                "sha256": "0d88636d10b32a61323402bb8012470860eb0e69344373f692fec6d0e39761b2"
            },
            "downloads": -1,
            "filename": "llama_index_node_parser_docling-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a91a77435e6988a48ffdbba5092744e7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 3386,
            "upload_time": "2024-10-26T02:03:09",
            "upload_time_iso_8601": "2024-10-26T02:03:09.930331Z",
            "url": "https://files.pythonhosted.org/packages/cc/e0/9793e4417cccab6c97b394b54986e7c7f99ab0b8fe3fb6421247c3174c6c/llama_index_node_parser_docling-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b360efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba",
                "md5": "a58a7fe182be1ac40192da4c235395a9",
                "sha256": "9b2fec1be5f92a84abc34cd427233db7ec18f5b387a0041887a577c2352d7cfd"
            },
            "downloads": -1,
            "filename": "llama_index_node_parser_docling-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a58a7fe182be1ac40192da4c235395a9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 3022,
            "upload_time": "2024-10-26T02:03:11",
            "upload_time_iso_8601": "2024-10-26T02:03:11.189706Z",
            "url": "https://files.pythonhosted.org/packages/b3/60/efed7ea4497bb5e47aad9875d135974a4b8ecd5e965cf3cbfd1d07a630ba/llama_index_node_parser_docling-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-26 02:03:11",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-node-parser-docling"
}
        