llama-index-readers-nougat-ocr


Name: llama-index-readers-nougat-ocr
Version: 0.2.0
Summary: llama-index readers nougat_ocr integration
Upload time: 2024-08-22 06:42:23
Maintainer: mdarshad1000
Author: Your Name
Requires-Python: <4.0,>=3.8.1
License: MIT
Keywords: academic papers, ocr, pdf

# Nougat OCR loader

```bash
pip install llama-index-readers-nougat-ocr
```

This loader reads the equations, symbols, and tables contained in a PDF.

Pass the path of the academic PDF document (`file`) that you want to parse. Nougat OCR recognizes LaTeX math and tables.

## Usage

Here's an example of using `PDFNougatOCR`:

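```python
from pathlib import Path

from llama_index.readers.nougat_ocr import PDFNougatOCR

reader = PDFNougatOCR()

# Path to the academic PDF you want to parse
pdf_path = Path("/path/to/pdf")

# Run Nougat OCR on the PDF and load the result as llama-index Documents
documents = reader.load_data(pdf_path)
```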
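The loaded documents are expected to contain Nougat's Markdown (`.mmd`) output, with equations written in LaTeX. As a minimal sketch of post-processing that text, the hypothetical helper `extract_display_math` below pulls out display equations. It assumes they are delimited by `\[ ... \]`; that delimiter style is an assumption about Nougat's output, not something this README documents, so adjust the pattern if your output differs.

```python
import re
from typing import List

# Assumption: display math appears as \[ ... \] in the Markdown text.
# re.DOTALL lets a single equation span several lines.
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def extract_display_math(markdown_text: str) -> List[str]:
    """Hypothetical helper: return the body of every \\[ ... \\] block, trimmed."""
    return [m.strip() for m in DISPLAY_MATH.findall(markdown_text)]


# In practice markdown_text could be documents[0].text from the usage example.
sample = "Energy is given by\n\\[ E = mc^{2} \\]\nand momentum by \\[p = mv\\]."
print(extract_display_math(sample))  # ['E = mc^{2}', 'p = mv']
```

The non-greedy `(.+?)` keeps two equations on the same line from being merged into one match.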

## Miscellaneous

The loader creates an `output` folder containing a file with the same name as the PDF and a `.mmd` extension.
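To find that file for a given PDF, a minimal sketch follows. The helper `expected_mmd_path` is hypothetical, and it assumes the `output` folder is resolved relative to the current working directory and that the file reuses the PDF's base name (its stem).

```python
from pathlib import Path
from typing import Union


def expected_mmd_path(pdf_path: Union[str, Path], output_dir: str = "output") -> Path:
    """Hypothetical helper: where the .mmd file for pdf_path is assumed to land."""
    # Path.stem drops the directory and the .pdf suffix: /path/to/paper.pdf -> paper
    return Path(output_dir) / f"{Path(pdf_path).stem}.mmd"


mmd = expected_mmd_path("/path/to/paper.pdf")
print(mmd)  # output/paper.mmd on POSIX systems
if mmd.exists():
    # The .mmd file is plain Markdown text, so it can be read directly.
    print(mmd.read_text(encoding="utf-8")[:200])
```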

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-readers-nougat-ocr",
    "maintainer": "mdarshad1000",
    "docs_url": null,
    "requires_python": "<4.0,>=3.8.1",
    "maintainer_email": null,
    "keywords": "academic papers, ocr, pdf",
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/07/7c/c574f8baa60be6f1858ee66cb35a51bdad80486eb79b9266937326d0a3c0/llama_index_readers_nougat_ocr-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index readers nougat_ocr integration",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [
        "academic papers",
        " ocr",
        " pdf"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5394876ada55237874b5dd022f2f66382b145082136c9903a5297d7e631e7228",
                "md5": "690a21806edd10a67e471484bd02f447",
                "sha256": "d709f6bd465819dff0138e92f0d75f6069fa1e401cab1c2b0f0799da32be6e26"
            },
            "downloads": -1,
            "filename": "llama_index_readers_nougat_ocr-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "690a21806edd10a67e471484bd02f447",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8.1",
            "size": 2690,
            "upload_time": "2024-08-22T06:42:21",
            "upload_time_iso_8601": "2024-08-22T06:42:21.226270Z",
            "url": "https://files.pythonhosted.org/packages/53/94/876ada55237874b5dd022f2f66382b145082136c9903a5297d7e631e7228/llama_index_readers_nougat_ocr-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "077cc574f8baa60be6f1858ee66cb35a51bdad80486eb79b9266937326d0a3c0",
                "md5": "a36698b408e05059151234293fb37d59",
                "sha256": "979a097999a5e03c80deeb5afd088af6bf6eeb2e3eed246853c006b832a10c6e"
            },
            "downloads": -1,
            "filename": "llama_index_readers_nougat_ocr-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a36698b408e05059151234293fb37d59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8.1",
            "size": 2462,
            "upload_time": "2024-08-22T06:42:23",
            "upload_time_iso_8601": "2024-08-22T06:42:23.297386Z",
            "url": "https://files.pythonhosted.org/packages/07/7c/c574f8baa60be6f1858ee66cb35a51bdad80486eb79b9266937326d0a3c0/llama_index_readers_nougat_ocr-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-22 06:42:23",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-readers-nougat-ocr"
}