<p align="center">
  <img src="https://www.novalad.ai/logo.svg" alt="Novalad Logo" width="500"/>
</p>
**Novalad** is an AI-powered platform that transforms chaotic, unstructured files, such as PDFs and PowerPoints, into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.

[View Novalad Extraction Result](https://www.novalad.ai/comparision.html)

---
- [Google Colab demo](https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb)
- [PyPI package](https://pypi.org/project/novalad/)
- [GitHub repository](https://github.com/novaladai/novalad)
- [Website](https://www.novalad.ai/)
- [Documentation](https://docs.novalad.ai)
- [API docs](https://novalad.apidog.io/)
- [Video on YouTube](https://www.youtube.com/watch?v=aoiZqHQ4Um4)
- [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

---
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
  - [Importing and Initializing the Client](#importing-and-initializing-the-client)
  - [Uploading a File from Your Local System](#uploading-a-file-from-your-local-system)
  - [Processing a Document Directly from a URL](#processing-a-document-directly-from-a-url)
  - [Checking Job Status](#checking-job-status)
  - [Retrieving and Rendering Outputs](#retrieving-and-rendering-outputs)
    - [JSON Output](#json-output)
    - [Markdown Output](#markdown-output)
    - [LangChain Document Format Output](#langchain-document-format-output)
    - [Knowledge Graph Output](#knowledge-graph-output)
    - [Rendering the Outputs (Notebooks Only)](#rendering-the-outputs-notebooks-only)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Support](#support)
---
## Installation 🚀
Install the Novalad package using pip:
```bash
pip install novalad
```
---
## Usage 📚
1. **Generate API Key**:  
   Log in to [Novalad](https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.
2. **Importing and Initializing the Client**  
   Import `NovaladClient` from the package and initialize it with your API key. You can either pass the key to the client directly or set it in the `NOVALAD_API_KEY` environment variable:
   ```python
   from novalad import NovaladClient
   # Initialize client with your API key
   client = NovaladClient(api_key="YOUR_API_KEY") # or set env NOVALAD_API_KEY 
   ```
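If you would rather not hard-code the key, one option is to resolve it yourself before constructing the client. The following is a minimal sketch using only the standard library; `resolve_api_key` is a hypothetical helper written for illustration and is not part of the Novalad package.

```python
import os


def resolve_api_key(explicit_key=None, env_var="NOVALAD_API_KEY"):
    """Return the explicit key if given, otherwise the value of the environment variable.

    Hypothetical helper for illustration; not part of the novalad package.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key found: pass one explicitly or set {env_var}")
    return key


# Usage (with the client from the snippet above):
# client = NovaladClient(api_key=resolve_api_key())
```

Failing early with a clear message can be easier to diagnose than an authentication error returned later by the service.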
### Uploading a File from Your Local System
If you have a file stored locally (e.g., a PDF document), specify its file path and use the `upload` method to send the file for processing.  
*Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.*
```python
# Define the path to your document
path = r"C:\path\to\your\document.pdf"
# Upload the file
client.upload(file_path=path)
```
After uploading your file, trigger the processing job using the `run` method:
```python
# Start processing the uploaded file
client.run()
```
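Before uploading, it can help to confirm that the path actually points to a readable file. The sketch below uses only `pathlib`; `check_local_document` is a hypothetical helper, not a Novalad function.

```python
from pathlib import Path


def check_local_document(file_path):
    """Return the resolved path if it points to an existing, non-empty file.

    Hypothetical pre-upload check for illustration; not part of the novalad package.
    """
    path = Path(file_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    if path.stat().st_size == 0:
        raise ValueError(f"File is empty: {path}")
    return path.resolve()


# Usage (with the client and path from the snippets above):
# client.upload(file_path=str(check_local_document(path)))
```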
<p align="center">OR</p>
### Processing a Document Directly from a URL
If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the `run` method. This approach avoids the local upload step.
```python
# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)
```
Supported URL Types:
- HTTPS URLs
- AWS S3 pre-signed URLs
- GCP Storage Signed URLs
- Azure Blob HTTPS public URLs
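A quick client-side sanity check can catch malformed URLs before a job is submitted. This sketch assumes that every URL type listed above uses the `https` scheme; `looks_like_https_url` is a hypothetical helper and does not verify that the document is actually reachable.

```python
from urllib.parse import urlparse


def looks_like_https_url(url):
    """Return True if the string parses as an https URL with a host.

    Hypothetical check for illustration; it does not contact the server.
    """
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


# Usage (with the client from the snippets above):
# if looks_like_https_url(document_url):
#     client.run(url=document_url)
```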
### Checking Job Status
Monitor the status of your processing job by calling the `status` method. The job continues until the status is either `"success"` or `"failed"`:
```python
import time
while True:
    status = client.status()
    if status["status"] in ["success", "failed"]:
        break
    time.sleep(60)  # Check every 60 seconds
    print(".", end="")
print("\n", status)
```
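The loop above runs until the job finishes, however long that takes. If you prefer an upper bound, the polling can be wrapped in a helper with a timeout. This is a sketch, not part of the Novalad package: `wait_for_job` and its `interval` and `timeout` parameters are hypothetical, and it relies only on `client.status()` returning a dictionary with a `"status"` key, as in the snippet above.

```python
import time


def wait_for_job(get_status, interval=60, timeout=1800, sleep=time.sleep):
    """Poll get_status() until the job reports "success" or "failed".

    Raises TimeoutError if no final status is seen within `timeout` seconds.
    Hypothetical helper for illustration; not part of the novalad package.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["status"] in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still running after {timeout} seconds")
        sleep(interval)


# Usage (with the client from the snippets above):
# final_status = wait_for_job(client.status, interval=60, timeout=1800)
# print(final_status)
```

Passing the status function as an argument keeps the helper independent of the client, which also makes it easy to try out with a stub.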
### Retrieving and Rendering Outputs
After the job is complete, you can retrieve and render the results in various formats:
| Format                 | Description                                                                              |
|------------------------|------------------------------------------------------------------------------------------|
| **JSON** 🧾            | Raw layout and structured element data (ideal for developers)                            |
| **Markdown** 📘        | Clean, human-readable content for documentation and wikis                                |
| **Knowledge Graph** 🕸️ | Visual representation of semantic relations and entities                                 |
| **LangChain Docs** 🔗  | Plug-and-play format optimized for LLM pipelines                                         |
#### JSON Output
Retrieve the raw JSON response containing structured data, metadata, and extracted text:
```python
json_response = client.output(format="json")
print(json_response)
```
#### Markdown Output
Get a Markdown version of the output. In a notebook, you can render it with the `render_markdown` helper described under "Rendering the Outputs":
```python
markdown_output = client.output(format="markdown")
print(markdown_output)
```
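To keep the results, you may want to write them to disk. The sketch below assumes that `json_response` is JSON-serializable (values that are not are converted with `str`) and that `markdown_output` is a string; neither assumption is confirmed by the package documentation here. `save_outputs` and the file names are hypothetical.

```python
import json
from pathlib import Path


def save_outputs(out_dir, json_response=None, markdown_output=None):
    """Write the given outputs to out_dir and return the list of paths written.

    Hypothetical helper for illustration; not part of the novalad package.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    if json_response is not None:
        json_path = out / "output.json"
        # default=str is a fallback for values json cannot serialize directly
        text = json.dumps(json_response, indent=2, ensure_ascii=False, default=str)
        json_path.write_text(text, encoding="utf-8")
        written.append(json_path)
    if markdown_output is not None:
        md_path = out / "output.md"
        md_path.write_text(str(markdown_output), encoding="utf-8")
        written.append(md_path)
    return written


# Usage (with the variables from the snippets above):
# save_outputs("novalad_results", json_response, markdown_output)
```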
#### LangChain Document Format Output
Retrieve the output as a structured document object for further processing:
```python
documents = client.output(format="document")
print(documents)
```
#### Knowledge Graph Output
Retrieve the relationships and entities within the document as a knowledge graph:
```python
kg_output = client.output(format="graph")
print(kg_output)
```
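If you need several formats at once, the four calls above can be collected into one dictionary. This is a small convenience sketch: `fetch_all_outputs` is a hypothetical helper, and it uses only the `client.output(format=...)` call and the format names shown in this section.

```python
# Format names as used in the snippets above
OUTPUT_FORMATS = ("json", "markdown", "document", "graph")


def fetch_all_outputs(client, formats=OUTPUT_FORMATS):
    """Return a dict mapping each format name to client.output(format=name).

    Hypothetical helper for illustration; not part of the novalad package.
    """
    return {fmt: client.output(format=fmt) for fmt in formats}


# Usage (with the client from the snippets above, after the job has succeeded):
# outputs = fetch_all_outputs(client)
# print(outputs["markdown"])
```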
#### Rendering the Outputs (Notebooks Only)
If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render the output formats directly in your notebook cells.

**Render JSON Output**:  
This code renders images displaying the PDF document page-wise with elements and layouts highlighted.  
*Note: You can also save the rendered images to a local directory by passing `save_dir=r"C:\path\to\save\visualization"` to the `render_elements` function.*
```python
from novalad import render_elements
render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")
```

**Render Markdown Output**:
```python
from novalad import render_markdown
render_markdown(markdown_output)
```

**Render Knowledge Graph**:
```python
from novalad import render_knowledge_graph
render_knowledge_graph(kg_output)
```

---
## Troubleshooting 🛠️
- **Job Failure**: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
- **File Path Issues**: Ensure the file path is correctly formatted (use raw strings for Windows paths).
- **URL Issues**: Confirm that the document URL is correct and publicly accessible.
- **API Key Problems**: Verify that your API key is active and valid. If authentication issues persist, please contact support.

For any other issue, please email us at info@novalad.ai.

---
## License 📄
This project is licensed under the [Apache License 2.0](LICENSE).

---
## Support 🙋‍♂️🙋‍♀️
For additional help or to report issues, please refer to the official documentation or contact support at [info@novalad.ai](mailto:info@novalad.ai).

---
<p align="center">Thank you for choosing Novalad! 🚀</p>
            
         