<p align="center">
<img src="https://www.novalad.ai/logo.svg" alt="Novalad Logo" width="500"/>
</p>
**Novalad** is an AI-powered platform that transforms chaotic, unstructured files, such as PDFs and PowerPoints, into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.
[View Novalad Extraction Result](https://www.novalad.ai/comparision.html)
---
[Open in Colab](https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb) |
[PyPI](https://pypi.org/project/novalad/) |
[GitHub](https://github.com/novaladai/novalad) |
[Website](https://www.novalad.ai/) |
[Documentation](https://docs.novalad.ai) |
[API Reference](https://novalad.apidog.io/) |
[Video](https://www.youtube.com/watch?v=aoiZqHQ4Um4) |
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
---
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [Importing and Initializing the Client](#importing-and-initializing-the-client)
- [Uploading a File from Your Local System](#uploading-a-file-from-your-local-system)
- [Processing a Document Directly from a URL](#processing-a-document-directly-from-a-url)
- [Checking Job Status](#checking-job-status)
- [Retrieving and Rendering Outputs](#retrieving-and-rendering-outputs)
- [JSON Output](#json-output)
- [Markdown Output](#markdown-output)
- [LangChain Document Format Output](#langchain-document-format-output)
- [Knowledge Graph Output](#knowledge-graph-output)
- [Rendering the Outputs (Notebooks Only)](#rendering-the-outputs-notebooks-only)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Support](#support)
---
## Installation 🚀
Install the Novalad package using pip:
```bash
pip install novalad
```
---
## Usage 📚
1. **Generate API Key**:
Log in to [Novalad](https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.
2. **Importing and Initializing the Client**
Begin by importing `NovaladClient` from the package and initializing it with your API key. You can either pass the key to the client directly or set it in the `NOVALAD_API_KEY` environment variable:
```python
from novalad import NovaladClient
# Initialize client with your API key
client = NovaladClient(api_key="YOUR_API_KEY") # or set env NOVALAD_API_KEY
```
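To keep the key out of source code, a small helper can read it from the environment and fail early with a clear message. This is a minimal sketch: `resolve_api_key` is a hypothetical helper and not part of the `novalad` package; only the `NOVALAD_API_KEY` variable name comes from the text above.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the explicit key if given, else the NOVALAD_API_KEY value.

    Hypothetical helper for illustration; not provided by novalad.
    """
    env = os.environ if env is None else env
    # Prefer the explicitly passed key, then fall back to the environment.
    key = (explicit_key or env.get("NOVALAD_API_KEY", "")).strip()
    if not key:
        raise ValueError(
            "No API key found: pass one explicitly or set NOVALAD_API_KEY."
        )
    return key
```

The result can then be passed along as `NovaladClient(api_key=resolve_api_key())`.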
### Uploading a File from Your Local System
If you have a file stored locally (e.g., a PDF document), specify its file path and use the `upload` method to send the file for processing.
*Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.*
```python
# Define the path to your document
path = r"C:\path\to\your\document.pdf"
# Upload the file
client.upload(file_path=path)
```
After uploading your file, trigger the processing job using the `run` method:
```python
# Start processing the uploaded file
client.run()
```
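Since an inaccessible file path is a common cause of failed jobs, it can help to validate the path before calling `upload`. The following is a sketch under stated assumptions: `checked_document_path` is a hypothetical helper, not part of `novalad`, and it only checks that the file exists and is non-empty.

```python
from pathlib import Path


def checked_document_path(raw_path: str) -> str:
    """Return an absolute path string, or raise if the file is unusable.

    Hypothetical pre-upload check; not provided by novalad.
    """
    path = Path(raw_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Document not found: {path}")
    if path.stat().st_size == 0:
        raise ValueError(f"Document is empty: {path}")
    return str(path.resolve())
```

For example, `client.upload(file_path=checked_document_path(path))` would surface a missing file locally before any upload is attempted.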
<p align="center">OR</p>
### Processing a Document Directly from a URL
If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the `run` method. This approach avoids the local upload step.
```python
# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)
```
Supported URL Types:
- HTTPS URLs
- AWS S3 pre-signed URLs
- GCP Storage Signed URLs
- Azure Blob HTTPS public URLs
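A quick local check can catch obviously unsupported URLs before a job is submitted. This sketch assumes that every supported type listed above is served over HTTPS; `is_https_url` is a hypothetical helper, not part of `novalad`, and it does not verify that the URL is reachable or that a signature is still valid.

```python
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """True when the URL uses the https scheme and names a host.

    Hypothetical pre-check; it does not contact the server.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme == "https" and bool(parsed.netloc)
```

A caller might use it as a guard, for example raising an error when `is_https_url(url)` is false instead of calling `client.run(url=url)`.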
### Checking Job Status
Monitor the status of your processing job by calling the `status` method. The job continues until the status is either `"success"` or `"failed"`:
```python
import time

while True:
    status = client.status()
    # Stop polling once the job reaches a terminal state
    if status["status"] in ["success", "failed"]:
        break
    time.sleep(60)  # Check every 60 seconds
    print(".", end="")
print("\n", status)
```
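The loop above runs until the job reaches a terminal state, so a stuck job would block forever. One way to bound the wait is to wrap the polling in a function with a timeout. This is a minimal sketch: `wait_for_job` and its `interval` and `timeout` parameters are hypothetical; the only things taken from the loop above are the `"status"` key and the `"success"` and `"failed"` values.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(get_status: Callable[[], Dict[str, Any]],
                 interval: float = 60.0,
                 timeout: float = 3600.0) -> Dict[str, Any]:
    """Poll get_status() until the job succeeds, fails, or the timeout passes.

    Hypothetical helper; get_status is any zero-argument callable that
    returns a dict with a "status" key, such as client.status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in ("success", "failed"):
            return status
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Job still running after {timeout} seconds")
        time.sleep(interval)
```

With a client in hand, this could be called as `status = wait_for_job(client.status)`. Passing the callable rather than the client keeps the helper easy to exercise with a fake status function.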
### Retrieving and Rendering Outputs
After the job is complete, you can retrieve and render the results in various formats:
| Format                 | Description                                                   |
|------------------------|---------------------------------------------------------------|
| **JSON** 🧾            | Raw layout and structured element data (ideal for developers) |
| **Markdown** 📘        | Clean, human-readable content for documentation and wikis     |
| **Knowledge Graph** 🕸️ | Visual representation of semantic relations and entities      |
| **LangChain Docs** 🔗  | Plug-and-play format optimized for LLM pipelines              |
#### JSON Output
Retrieve the raw JSON response containing structured data, metadata, and extracted text:
```python
json_response = client.output(format="json")
print(json_response)
```
#### Markdown Output
Retrieve a Markdown version of the output (to display it in a notebook, use the `render_markdown` helper covered in the rendering section below):
```python
markdown_output = client.output(format="markdown")
print(markdown_output)
```
#### LangChain Document Format Output
Retrieve the output as a structured document object for further processing:
```python
documents = client.output(format="document")
print(documents)
```
#### Knowledge Graph Output
Retrieve the relationships and entities within the document as a knowledge graph:
```python
kg_output = client.output(format="graph")
print(kg_output)
```
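Outside a notebook, you may want to keep the retrieved outputs on disk rather than only printing them. The sketch below assumes that `json_response` is JSON-serializable (for example a dict or list) and that `markdown_output` is a string; neither type is confirmed by this README. `save_outputs` and the file names `output.json` and `output.md` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def save_outputs(out_dir: str, json_data: Any, markdown_text: str) -> None:
    """Write the JSON and Markdown outputs to out_dir as UTF-8 files.

    Hypothetical helper; assumes json_data is JSON-serializable and
    markdown_text is a plain string.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "output.json").write_text(
        json.dumps(json_data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    (target / "output.md").write_text(markdown_text, encoding="utf-8")
```

A call such as `save_outputs("results", json_response, markdown_output)` would then create the `results` directory if it does not already exist.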
#### Rendering the Outputs (Notebooks Only)
If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render the output formats directly in your notebook cells.
**Render JSON Output**:
This code renders images displaying the PDF document page-wise with elements and layouts highlighted.
*Note: You can also save the rendered images to a local directory by passing `save_dir=r"C:\path\to\save\visualization"` to the `render_elements` function.*
```python
from novalad import render_elements
render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")
```

**Render Markdown Output**:
```python
from novalad import render_markdown
render_markdown(markdown_output)
```

**Render Knowledge Graph**:
```python
from novalad import render_knowledge_graph
render_knowledge_graph(kg_output)
```

---
## Troubleshooting 🛠️
- **Job Failure**: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
- **File Path Issues**: Ensure the file path is correctly formatted (use raw strings for Windows paths).
- **URL Issues**: Confirm that the document URL is correct and publicly accessible.
- **API Key Problems**: Verify that your API key is active and valid. If authentication issues persist, please contact support.
For any other issue, please email us at info@novalad.ai.
---
## License 📄
This project is licensed under the [Apache License 2.0](LICENSE).
---
## Support 🙋‍♂️🙋‍♀️
For additional help or to report issues, please refer to the official documentation or contact support at [info@novalad.ai](mailto:info@novalad.ai).
---
<p align="center">Thank you for choosing Novalad! 🚀</p>
Raw data
{
"_id": null,
"home_page": "https://www.novalad.ai",
"name": "novalad",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "AI, PDF, PowerPoint, data extraction, document parsing, unstructured data, layout parser",
"author": "Novalad",
"author_email": "info@novalad.ai",
"download_url": "https://files.pythonhosted.org/packages/36/3d/3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b/novalad-0.1.14.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints into machine-readable, structured data.",
"version": "0.1.14",
"project_urls": {
"API Docs": "https://novalad.apidog.io/",
"Documentation": "https://www.novalad.ai",
"Google Colab": "https://colab.research.google.com/gist/connectaman/92cbb44bcb17a474e32ce9c194effb97/novalad-demo.ipynb",
"Homepage": "https://www.novalad.ai",
"Repository": "https://github.com/novaladai"
},
"split_keywords": [
"ai",
" pdf",
" powerpoint",
" data extraction",
" document parsing",
" unstructured data",
" layout parser"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0db897216d5140fcc35e539c31ad4c54f5e04994f3380cc81dbdb278775903d0",
"md5": "c272c392fc2d581ee9b484c14a49f400",
"sha256": "ec073f27904e5bbd23fdeaf4faf866d07f70a51c7d0936d6e0275e52f17cfec5"
},
"downloads": -1,
"filename": "novalad-0.1.14-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c272c392fc2d581ee9b484c14a49f400",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 19927,
"upload_time": "2025-07-14T05:43:08",
"upload_time_iso_8601": "2025-07-14T05:43:08.294540Z",
"url": "https://files.pythonhosted.org/packages/0d/b8/97216d5140fcc35e539c31ad4c54f5e04994f3380cc81dbdb278775903d0/novalad-0.1.14-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "363d3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b",
"md5": "93215fa2f677fc56fd70bde2386d62a3",
"sha256": "57a63f078b301b89a64f9cc4facda2c88d8eae3b0afc39beaa58e91c93d2e30a"
},
"downloads": -1,
"filename": "novalad-0.1.14.tar.gz",
"has_sig": false,
"md5_digest": "93215fa2f677fc56fd70bde2386d62a3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 18897,
"upload_time": "2025-07-14T05:43:09",
"upload_time_iso_8601": "2025-07-14T05:43:09.418306Z",
"url": "https://files.pythonhosted.org/packages/36/3d/3641876e2208b411f3e9a96ab1aa89a5221c33c6f922e805702e1223f39b/novalad-0.1.14.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-14 05:43:09",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "novalad"
}