| Field | Value |
| --- | --- |
| Name | llama-parse |
| Version | 0.6.78 |
| Summary | Parse files into RAG-Optimized formats. |
| Upload time | 2025-11-04 02:16:22 |
| Requires Python | <4.0,>=3.9 |
| Home page | None |
| Maintainer | None |
| Author | None |
| Docs URL | None |
| License | None |
| Keywords | None |
| Requirements | No requirements were recorded. |
            # LlamaParse
[PyPI](https://pypi.org/project/llama-parse/)
[Contributors](https://github.com/run-llama/llama_parse/graphs/contributors)
[Discord](https://discord.gg/dGcwcsnxhU)
LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
It is really good at the following:
- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and return image chunks using the latest multimodal models.
- ✅ **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
The free plan allows up to 1,000 pages per day. The paid plan includes 7,000 free pages per week, plus 0.3 cents per additional page by default. A sandbox is available to test the API at [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
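As a quick sanity check on the paid-plan arithmetic, here is a minimal cost calculator assuming the stated defaults (7,000 free pages per week, 0.3 cents per additional page); the function name and signature are illustrative, not part of any API:

```python
def weekly_cost(pages: int, free_pages: int = 7_000, per_page_usd: float = 0.003) -> float:
    """Estimate a weekly bill, assuming the default paid plan:
    the first `free_pages` pages are free, and each additional
    page costs `per_page_usd` (0.3 cents)."""
    billable = max(0, pages - free_pages)
    return round(billable * per_page_usd, 2)

print(weekly_cost(5_000))   # within the free allotment -> 0.0
print(weekly_cost(10_000))  # 3,000 extra pages at $0.003 each -> 9.0
```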
Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
## Getting Started
First, log in and get an API key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
Then, make sure you have the latest LlamaIndex version installed.
**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.
```bash
pip uninstall llama-index  # run this if upgrading from v0.9.x or older
pip install -U llama-index --no-cache-dir --force-reinstall
```
Lastly, install the package:
`pip install llama-parse`
Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
```bash
export LLAMA_CLOUD_API_KEY='llx-...'
# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt
# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md
# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json
```
You can also create simple scripts:
```python
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=4,  # if multiple files are passed, split across `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)
# sync
documents = parser.load_data("./my_file.pdf")
# sync batch
documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
# async
documents = await parser.aload_data("./my_file.pdf")
# async batch
documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
```
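The `num_workers` parameter above controls how many API calls run at once when a batch of files is passed. As an illustration of that pattern (not the library's actual implementation), a bounded fan-out over a batch can be sketched with a semaphore and a stub coroutine standing in for the real API request:

```python
import asyncio


async def parse_one(path: str) -> str:
    """Stub standing in for a real per-file API call."""
    await asyncio.sleep(0)  # pretend to do network I/O
    return f"parsed:{path}"


async def parse_batch(paths: list[str], num_workers: int = 4) -> list[str]:
    """Parse files concurrently, with at most `num_workers` in flight at once."""
    semaphore = asyncio.Semaphore(num_workers)

    async def bounded(path: str) -> str:
        async with semaphore:
            return await parse_one(path)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(parse_batch(["a.pdf", "b.pdf", "c.pdf"]))
print(results)  # ['parsed:a.pdf', 'parsed:b.pdf', 'parsed:c.pdf']
```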
## Using with a file object
You can parse a file object directly:
```python
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=4,  # if multiple files are passed, split across `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)
file_name = "my_file1.pdf"
extra_info = {"file_name": file_name}
with open(f"./{file_name}", "rb") as f:
    # extra_info with a file_name key is required when passing a file object
    documents = parser.load_data(f, extra_info=extra_info)
# you can also pass file bytes directly
with open(f"./{file_name}", "rb") as f:
    file_bytes = f.read()
    # extra_info with a file_name key is required when passing file bytes
    documents = parser.load_data(file_bytes, extra_info=extra_info)
```
## Using with `SimpleDirectoryReader`
You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
```python
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    verbose=True,
)
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
```
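The `file_extractor` argument above maps file extensions to parsers. As a plain-Python illustration of that dispatch idea (hypothetical helper name, not LlamaIndex's actual internals):

```python
from pathlib import Path


def pick_extractor(path: str, file_extractor: dict, default: str = "builtin") -> str:
    """Return the extractor registered for the file's extension,
    falling back to a default for unregistered extensions."""
    return file_extractor.get(Path(path).suffix.lower(), default)


extractors = {".pdf": "llama-parse"}
print(pick_extractor("./data/report.PDF", extractors))  # 'llama-parse'
print(pick_extractor("./data/notes.txt", extractors))   # 'builtin'
```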
Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
## Examples
Several end-to-end indexing examples can be found in the examples folder:
- [Getting Started](/docs/examples-py/parse/demo_basic.ipynb)
- [Advanced RAG Example](/docs/examples-py/parse/demo_advanced.ipynb)
- [Raw API Usage](/docs/examples-py/parse/demo_api.ipynb)
## Documentation
[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
## Terms of Service
See the [Terms of Service Here](./TOS.pdf).
## Get in Touch (LlamaCloud)
LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.
LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
            
         