novalad

- **Name**: novalad
- **Version**: 0.1.14
- **Home page**: https://www.novalad.ai
- **Summary**: Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints into machine-readable, structured data.
- **Upload time**: 2025-07-14 05:43:09
- **Author**: Novalad
- **Requires Python**: >=3.8
- **License**: Apache-2.0
- **Keywords**: ai, pdf, powerpoint, data extraction, document parsing, unstructured data, layout parser
- **Requirements**: none recorded

<p align="center">
  <img src="https://www.novalad.ai/logo.svg" alt="Novalad Logo" width="500"/>
</p>

**Novalad** is an AI-powered platform that transforms chaotic, unstructured files, such as PDFs and PowerPoints, into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.

[View Novalad Extraction Result](https://www.novalad.ai/comparision.html)


---
[![Google Colab](https://img.shields.io/badge/Colab-Notebook-F9AB00?logo=google-colab&logoColor=white)](https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb)
[![PyPI version](https://img.shields.io/pypi/v/novalad)](https://pypi.org/project/novalad/)
[![Python Version](https://img.shields.io/pypi/pyversions/novalad)](https://pypi.org/project/novalad/)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white)](https://github.com/novaladai/novalad)
[![Website](https://img.shields.io/badge/Website-live-blue)](https://www.novalad.ai/)
[![Docs](https://img.shields.io/badge/Documentation-Online-brightgreen)](https://docs.novalad.ai)
[![API Docs](https://img.shields.io/badge/API-Reference-informational)](https://novalad.apidog.io/)
[![YouTube](https://img.shields.io/badge/Watch-Video-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=aoiZqHQ4Um4)
[![License Apache](https://img.shields.io/badge/License-Apache%202.0-blue)](https://www.apache.org/licenses/LICENSE-2.0)

---

## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
  - [Importing and Initializing the Client](#importing-and-initializing-the-client)
  - [Uploading a File from Your Local System](#uploading-a-file-from-your-local-system)
  - [Processing a Document Directly from a URL](#processing-a-document-directly-from-a-url)
  - [Checking Job Status](#checking-job-status)
  - [Retrieving and Rendering Outputs](#retrieving-and-rendering-outputs)
    - [JSON Output](#json-output)
    - [Markdown Output](#markdown-output)
    - [LangChain Document Format Output](#langchain-document-format-output)
    - [Knowledge Graph Output](#knowledge-graph-output)
    - [Rendering the Outputs (Notebooks Only)](#rendering-the-outputs-notebooks-only)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Support](#support)

---

## Installation 🚀

Install the Novalad package using pip:

```bash
pip install novalad
```

---

## Usage 📚

### Generating an API Key

Log in to [Novalad](https://app.novalad.ai) and generate an API key. Copy the key and keep it handy.

### Importing and Initializing the Client

Import `NovaladClient` from the package and initialize it with your API key. You can either pass the key to the client directly or set it in the `NOVALAD_API_KEY` environment variable:

```python
from novalad import NovaladClient

# Initialize the client with your API key...
client = NovaladClient(api_key="YOUR_API_KEY")
# ...or set the NOVALAD_API_KEY environment variable instead of passing api_key
```
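If you rely on the `NOVALAD_API_KEY` environment variable, it can help to fail early with a clear message when the key is missing rather than at the first API call. The sketch below uses a hypothetical helper, `resolve_api_key`, which is not part of the `novalad` package; it only assumes, as described above, that the key is either passed explicitly or stored in `NOVALAD_API_KEY`.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else NOVALAD_API_KEY from the environment.

    Hypothetical helper for illustration; not part of the novalad package.
    """
    key = explicit_key or os.environ.get("NOVALAD_API_KEY")
    if not key:
        # Raise immediately with an actionable message instead of a later auth error
        raise RuntimeError(
            "No API key found: pass one explicitly or set NOVALAD_API_KEY."
        )
    return key
```

You could then write `client = NovaladClient(api_key=resolve_api_key())`.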

### Uploading a File from Your Local System

If you have a file stored locally (e.g., a PDF document), specify its file path and use the `upload` method to send the file for processing.  
*Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.*

```python
# Define the path to your document
path = r"C:\path\to\your\document.pdf"

# Upload the file
client.upload(file_path=path)
```
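Because a wrong or inaccessible file path is a common cause of job failure, you may want to check the file before calling `upload`. `check_local_file` below is a hypothetical helper, not part of `novalad`; it only verifies that the path points to an existing, non-empty regular file.

```python
from pathlib import Path


def check_local_file(path: str) -> Path:
    """Fail early if `path` does not exist, is not a regular file, or is empty.

    Hypothetical pre-upload check; not part of the novalad package.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.stat().st_size == 0:
        raise ValueError(f"File is empty: {p}")
    return p
```

Usage: `client.upload(file_path=str(check_local_file(path)))`.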

After uploading your file, trigger the processing job using the `run` method:

```python
# Start processing the uploaded file
client.run()
```

<p align="center">OR</p>

### Processing a Document Directly from a URL

If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the `run` method. This approach avoids the local upload step.

```python
# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)
```

Supported URL types:
- HTTPS URLs
- AWS S3 pre-signed URLs
- GCP Storage signed URLs
- Azure Blob Storage public HTTPS URLs
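Assuming all of the URL types listed above are served over HTTPS, a cheap local sanity check can catch obvious mistakes (an `s3://` URI, a plain `http://` link, or a local Windows path) before you call `run`. This is a hypothetical helper using only the standard library; it cannot tell whether a signed URL has expired or whether the file is actually reachable.

```python
from urllib.parse import urlparse


def looks_like_supported_url(url: str) -> bool:
    """Rough pre-check: accept only https:// URLs that include a host.

    Hypothetical helper; it checks the URL's form only, not its reachability.
    """
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```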

### Checking Job Status

Monitor the processing job by calling the `status` method, and keep polling until the status is either `"success"` or `"failed"`:

```python
import time

# Poll the job status every 30 seconds until it reaches a terminal state
while True:
    status = client.status()
    if status["status"] in ("success", "failed"):
        break
    print(".", end="", flush=True)  # progress indicator
    time.sleep(30)

print("\n", status)
```
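The loop above polls forever if the job never reaches a terminal state. Below is a minimal sketch of a bounded version. `wait_for_job` is a hypothetical helper, and it assumes, like the loop above, that the status call returns a dict whose `"status"` key eventually becomes `"success"` or `"failed"`. Passing `sleep` in as a parameter makes the helper easy to test without actually waiting.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict],
    interval: float = 30.0,
    timeout: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll get_status() until it reports "success" or "failed", or time runs out.

    Hypothetical helper. `waited` counts requested sleep time only, so time
    spent inside get_status() itself is not included in the timeout.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status["status"] in ("success", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"Job still {status['status']!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Usage: `status = wait_for_job(client.status, interval=30, timeout=30 * 60)`.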

### Retrieving and Rendering Outputs

After the job is complete, you can retrieve and render the results in various formats:

| Format                  | `format` value | Description                                                   |
|-------------------------|----------------|---------------------------------------------------------------|
| **JSON** 🧾             | `"json"`       | Raw layout and structured element data (ideal for developers) |
| **Markdown** 📘         | `"markdown"`   | Clean, human-readable content for documentation and wikis     |
| **Knowledge Graph** 🕸️  | `"graph"`      | Entities and the semantic relations between them              |
| **LangChain Docs** 🔗   | `"document"`   | Plug-and-play format optimized for LLM pipelines              |

#### JSON Output

Retrieve the raw JSON response containing structured data, metadata, and extracted text:

```python
json_response = client.output(format="json")
print(json_response)
```

#### Markdown Output

Retrieve a Markdown version of the output (in a notebook, you can render it with the `render_markdown` helper described below):

```python
markdown_output = client.output(format="markdown")
print(markdown_output)
```
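To keep the results outside the notebook or script, you can write them to disk. This is a minimal sketch: `save_outputs` and the file names are hypothetical, and it assumes `json_response` is JSON-serializable (for example a dict or list) and `markdown_output` is a string. Check the actual return types in the [API reference](https://novalad.apidog.io/).

```python
import json
from pathlib import Path


def save_outputs(json_response, markdown_output: str, out_dir: str) -> None:
    """Write the JSON and Markdown outputs to out_dir as output.json and output.md.

    Hypothetical helper; assumes json_response is JSON-serializable.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    (out / "output.json").write_text(
        json.dumps(json_response, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    (out / "output.md").write_text(markdown_output, encoding="utf-8")
```

Usage: `save_outputs(json_response, markdown_output, "novalad_output")`.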

#### LangChain Document Format Output

Retrieve the output as LangChain-format documents for further processing in LLM pipelines:

```python
documents = client.output(format="document")
print(documents)
```
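LangChain `Document` objects expose their text as `page_content` and a `metadata` dict. Assuming `client.output(format="document")` returns a list of such objects (this README does not spell out the exact type), a quick preview could look like the sketch below. `Doc` is a stand-in dataclass used only so the example runs without LangChain installed, and `preview_documents` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def preview_documents(documents: Iterable[Any], width: int = 60) -> List[str]:
    """Return one line per document: its metadata and the start of its text."""
    lines = []
    for i, doc in enumerate(documents):
        text = " ".join(doc.page_content.split())  # collapse runs of whitespace
        lines.append(f"[{i}] {doc.metadata} {text[:width]}")
    return lines
```

Usage: `print("\n".join(preview_documents(documents)))`.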

#### Knowledge Graph Output

Retrieve the relationships and entities within the document as a knowledge graph:

```python
kg_output = client.output(format="graph")
print(kg_output)
```

#### Rendering the Outputs (Notebooks Only)

If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render or view the output formats directly in your notebook cells.


**Render JSON Output**:  
This code renders the PDF document page by page as images, with the detected elements and layouts highlighted. Here `path` is the local file path defined in the upload step.  
*Note: You can also save the rendered images to a local directory by passing `save_dir=r"C:\path\to\save\visualization"` to `render_elements`.*

```python
from novalad import render_elements

render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")
```
![Extracted elements highlighted on a page](static/images/extraction_hightlight.png)


**Render Markdown Output**:

```python
from novalad import render_markdown

render_markdown(markdown_output)
```
![Rendered Markdown output](static/images/extraction_markdown.png)

**Render Knowledge Graph**:

```python
from novalad import render_knowledge_graph

render_knowledge_graph(kg_output)
```
![Knowledge Graph](static/images/extraction_kg.png)


---



## Troubleshooting 🛠️

- **Job Failure**: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
- **File Path Issues**: Ensure the file path is correctly formatted (use raw strings for Windows paths).
- **URL Issues**: Confirm that the document URL is correct and publicly accessible.
- **API Key Problems**: Verify that your API key is active and valid. If authentication issues persist, please contact support.
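The raw-string advice matters because Python silently treats some backslash sequences as escapes. The demonstration below uses a made-up path, `C:\new\table.pdf`, chosen because `\n` (newline) and `\t` (tab) are valid escapes, so the unprefixed string is accepted without any warning but no longer contains the backslashes you typed. It also shows forward slashes with the standard library's `pathlib.PureWindowsPath` as an alternative.

```python
from pathlib import PureWindowsPath

# In a normal string literal, "\n" and "\t" are escape sequences (newline, tab),
# so this value does NOT contain the backslashes that were typed.
broken = "C:\new\table.pdf"

# A raw string keeps every backslash as a literal character.
fixed = r"C:\new\table.pdf"

# Forward slashes also work; PureWindowsPath renders them as backslashes.
alt = str(PureWindowsPath("C:/new/table.pdf"))

print(len(broken), len(fixed))  # 14 16
print("\t" in broken, "\t" in fixed)  # True False
print(alt == fixed)  # True
```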

For any other issue, please email us at info@novalad.ai.

---

## License ๐Ÿ“„

This project is licensed under the [Apache License 2.0](LICENSE).

---


## Support 🙋‍♂️🙋‍♀️

For additional help or to report issues, please refer to the [official documentation](https://docs.novalad.ai) or contact support at [info@novalad.ai](mailto:info@novalad.ai).

---

<p align="center">Thank you for choosing Novalad! 🚀</p>


            

Raw data

            {
    "_id": null,
    "home_page": "https://www.novalad.ai",
    "name": "novalad",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "AI, PDF, PowerPoint, data extraction, document parsing, unstructured data, layout parser",
    "author": "Novalad",
    "author_email": "info@novalad.ai",
    "download_url": "https://files.pythonhosted.org/packages/36/3d/3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b/novalad-0.1.14.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\n  <img src=\"https://www.novalad.ai/logo.svg\" alt=\"Novalad Logo\" width=\"500\"/>\n</p>\n\n**Novalad** is an AI-powered platform that transforms chaotic, unstructured files\u2014such as PDFs and PowerPoints\u2014into beautifully organized, machine-readable data \ud83d\udca1. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach \ud83e\udde9.\n\n[View Novalad Extraction Result](https://www.novalad.ai/comparision.html)\n\n\n---\n[![Google Colab](https://img.shields.io/badge/Colab-Notebook-F9AB00?logo=google-colab&logoColor=white)](https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb)\n[![PyPI version](https://img.shields.io/pypi/v/novalad)](https://pypi.org/project/novalad/)\n[![Python Version](https://img.shields.io/pypi/pyversions/novalad)](https://pypi.org/project/novalad/)\n[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white)](https://github.com/novaladai/novalad)\n[![Website](https://img.shields.io/badge/Website-live-blue)](https://www.novalad.ai/)\n[![Docs](https://img.shields.io/badge/Documentation-Online-brightgreen)](https://docs.novalad.ai)\n[![API Docs](https://img.shields.io/badge/API-Reference-informational)](https://novalad.apidog.io/)\n[![YouTube](https://img.shields.io/badge/Watch-Video-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=aoiZqHQ4Um4)\n[![License Apache](https://img.shields.io/badge/License-Apache%202.0-blue)](https://www.apache.org/licenses/LICENSE-2.0)\n---\n\n## Table of Contents\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Importing and Initializing the Client](#importing-and-initializing-the-client)\n  - [Uploading a File from Your Local System](#uploading-a-file-from-your-local-system)\n  - [Processing a Document Directly from a 
URL](#processing-a-document-directly-from-a-url)\n  - [Checking Job Status](#checking-job-status)\n  - [Retrieving and Rendering Outputs](#retrieving-and-rendering-outputs)\n    - [JSON Output](#json-output)\n    - [Markdown Output](#markdown-output)\n    - [LangChain Document Format Output](#langchain-document-format-output)\n    - [Knowledge Graph Output](#knowledge-graph-output)\n    - [Rendering the Outputs (Notebooks Only)](#rendering-the-outputs-notebooks-only)\n- [Troubleshooting](#troubleshooting)\n- [License](#license)\n- [Support](#support)\n\n---\n\n## Installation \ud83d\ude80\n\nInstall the Novalad package using pip:\n\n```bash\npip install novalad\n```\n\n---\n\n## Usage \ud83d\udcda\n\n1. **Generate API Key**:  \n   Log in to [Novalad](https://app.novalad.ai) (https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.\n\n2. **Importing and Initializing the Client**  \n   Begin by importing `NovaladClient` from the package and initializing it with your API key:\n   You can set `NOVALAD_API_KEY` in env variable or pass it to Client\n\n   ```python\n   from novalad import NovaladClient\n\n   # Initialize client with your API key\n   client = NovaladClient(api_key=\"YOUR_API_KEY\") # or set env NOVALAD_API_KEY \n   ```\n\n### Uploading a File from Your Local System\n\nIf you have a file stored locally (e.g., a PDF document), specify its file path and use the `upload` method to send the file for processing.  \n*Note: Only run this code if you are processing a local file. 
If your file is hosted online (via URL or cloud storage), skip this step.*\n\n```python\n# Define the path to your document\npath = r\"C:\\path\\to\\your\\document.pdf\"\n\n# Upload the file\nclient.upload(file_path=path)\n```\n\nAfter uploading your file, trigger the processing job using the `run` method:\n\n```python\n# Start processing the uploaded file\nclient.run()\n```\n\n<p align=\"center\">OR</p>\n\n### Processing a Document Directly from a URL\n\nIf your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the `run` method. This approach avoids the local upload step.\n\n```python\n# Process document directly by passing the file URL\nclient.run(\n    url=\"https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf\"\n)\n```\n\nSupported URL Types:\n- HTTPS URLs\n- AWS S3 pre-signed URLs\n- GCP Storage Signed URLs\n- Azure Blob HTTPS public URLs\n\n### Checking Job Status\n\nMonitor the status of your processing job by calling the `status` method. 
The job continues until the status is either `\"success\"` or `\"failed\"`:\n\n```python\nimport time\n\nwhile True:\n    status = client.status()\n    if status[\"status\"] in [\"success\", \"failed\"]:\n        break\n    time.sleep(60)  # Check every 30 seconds\n    print(\".\", end=\"\")\nprint(\"\\n\", status)\n```\n\n### Retrieving and Rendering Outputs\n\nAfter the job is complete, you can retrieve and render the results in various formats:\n\n| Format                 | Description                                                                              |\n|------------------------|------------------------------------------------------------------------------------------|\n| **JSON** \ud83e\uddfe            | Raw layout and structured element data (ideal for developers)                            |\n| **Markdown** \ud83d\udcd8        | Clean, human-readable content for documentation and wikis                                |\n| **Knowledge Graph** \ud83d\udd78\ufe0f | Visual representation of semantic relations and entities                                 |\n| **LangChain Docs** \ud83d\udd17  | Plug-and-play format optimized for LLM pipelines                                           |\n\n#### JSON Output\n\nRetrieve the raw JSON response containing structured data, metadata, and extracted text:\n\n```python\njson_response = client.output(format=\"json\")\nprint(json_response)\n```\n\n#### Markdown Output\n\nGet a Markdown version of the output and render it using the `render_markdown` helper:\n\n```python\nmarkdown_output = client.output(format=\"markdown\")\nprint(markdown_output)\n```\n\n#### LangChain Document Format Output\n\nRetrieve the output as a structured document object for further processing:\n\n```python\ndocuments = client.output(format=\"document\")\nprint(documents)\n```\n\n#### Knowledge Graph Output\n\nRetrieve the relationships and entities within the document as a knowledge graph:\n\n```python\nkg_output = 
client.output(format=\"graph\")\nprint(kg_output)\n```\n\n#### Rendering the Outputs (NOTEBOOK ONLY!!!)\n\nIF YOU ARE USING JUPYTER NOTEBOOK/COLLAB/KAGGLE, YOU CAN RENDER OR VIEW THE OUTPUT FORMATS DIRECTLY IN YOUR NOTEBOOK CELLS\n\n\n**Render JSON Output**:  \nThis code renders images displaying the PDF document page-wise with elements and layouts highlighted.  \n*Note: You can also save the rendered images to a local directory by passing `save_dir=r\"C:\\path\\to\\save\\visualization\"` to the `render_elements` function.*\n\n```python\nfrom novalad import render_elements\n\nrender_elements(path, json_response)\n# To save images locally:\n# render_elements(path, json_response, save_dir=r\"C:\\path\\to\\save\\visualization\")\n```\n![Knowledge Graph](static/images/extraction_hightlight.png)\n\n\n**Render Markdown Output**:\n\n```python\nfrom novalad import render_markdown\n\nrender_markdown(markdown_output)\n```\n![Knowledge Graph](static/images/extraction_markdown.png)\n\n**Render Knowledge Graph**:\n\n```python\nfrom novalad import render_knowledge_graph\n\nrender_knowledge_graph(kg_output)\n```\n![Knowledge Graph](static/images/extraction_kg.png)\n\n\n---\n\n\n\n## Troubleshooting \ud83d\udee0\ufe0f\n\n- **Job Failure**: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.\n- **File Path Issues**: Ensure the file path is correctly formatted (use raw strings for Windows paths).\n- **URL Issues**: Confirm that the document URL is correct and publicly accessible.\n- **API Key Problems**: Verify that your API key is active and valid. 
If authentication issues persist, please contact support.\n\nfor any issue please mail us at info@novalad.ai\n\n---\n\n## License \ud83d\udcc4\n\nThis project is licensed under the [Apache License](LICENSE).\n\n---\n\n\n## Support \ud83d\ude4b\u200d\u2642\ufe0f\ud83d\ude4b\u200d\u2640\ufe0f\n\nFor additional help or to report issues, please refer to the official documentation or contact support at [info@novalad.ai](mailto:info@novalad.ai)\n\n---\n\n<p align=\"center\">Thank you for choosing Novalad! \ud83d\ude80</p>\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints into machine-readable, structured data.",
    "version": "0.1.14",
    "project_urls": {
        "API Docs": "https://novalad.apidog.io/",
        "Documentation": "https://www.novalad.ai",
        "Google Colab": "https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb",
        "Homepage": "https://www.novalad.ai",
        "Repository": "https://github.com/novaladai"
    },
    "split_keywords": [
        "ai",
        " pdf",
        " powerpoint",
        " data extraction",
        " document parsing",
        " unstructured data",
        " layout parser"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0db897216d5140fcc35e539c31ad4c54f5e04994f3380cc81dbdb278775903d0",
                "md5": "c272c392fc2d581ee9b484c14a49f400",
                "sha256": "ec073f27904e5bbd23fdeaf4faf866d07f70a51c7d0936d6e0275e52f17cfec5"
            },
            "downloads": -1,
            "filename": "novalad-0.1.14-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c272c392fc2d581ee9b484c14a49f400",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 19927,
            "upload_time": "2025-07-14T05:43:08",
            "upload_time_iso_8601": "2025-07-14T05:43:08.294540Z",
            "url": "https://files.pythonhosted.org/packages/0d/b8/97216d5140fcc35e539c31ad4c54f5e04994f3380cc81dbdb278775903d0/novalad-0.1.14-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "363d3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b",
                "md5": "93215fa2f677fc56fd70bde2386d62a3",
                "sha256": "57a63f078b301b89a64f9cc4facda2c88d8eae3b0afc39beaa58e91c93d2e30a"
            },
            "downloads": -1,
            "filename": "novalad-0.1.14.tar.gz",
            "has_sig": false,
            "md5_digest": "93215fa2f677fc56fd70bde2386d62a3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18897,
            "upload_time": "2025-07-14T05:43:09",
            "upload_time_iso_8601": "2025-07-14T05:43:09.418306Z",
            "url": "https://files.pythonhosted.org/packages/36/3d/3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b/novalad-0.1.14.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 05:43:09",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "novalad"
}