Name | chunkr-ai |
Version | 0.0.5 |
home_page | None |
Summary | Python client for Chunkr: open source document intelligence |
upload_time | 2025-01-10 01:52:19 |
maintainer | None |
docs_url | None |
author | None |
requires_python | None |
license | None |
keywords | None |
VCS | None |
bugtrack_url | None |
requirements | No requirements were recorded. |
# Chunkr Python Client
This package provides a simple Python interface to the Chunkr API.
## Getting Started
You can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).
For more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).
## Installation
```bash
pip install chunkr-ai
```
## Usage
We provide two clients: `Chunkr` for synchronous operations and `ChunkrAsync` for asynchronous operations.
### Synchronous Usage
```python
from chunkr_ai import Chunkr
# Initialize client
chunkr = Chunkr()
# Upload a file and wait for processing
task = chunkr.upload("document.pdf")
# Print the response
print(task)
# Get output from task
output = task.output
# If you want to upload without waiting for processing
task = chunkr.start_upload("document.pdf")
# ... do other things ...
task.poll() # Check status when needed
```
### Asynchronous Usage
```python
from chunkr_ai import ChunkrAsync
async def process_document():
    # Initialize client
    chunkr = ChunkrAsync()
    # Upload a file and wait for processing
    task = await chunkr.upload("document.pdf")
    # Print the response
    print(task)
    # Get output from task
    output = task.output
    # If you want to upload without waiting for processing
    task = await chunkr.start_upload("document.pdf")
    # ... do other things ...
    await task.poll_async()  # Check status when needed
```
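When several documents need processing, the asynchronous client can be driven concurrently. The following is a minimal sketch rather than part of the library: `upload_all` and its `max_concurrent` parameter are hypothetical names, and the helper accepts any coroutine function that takes a path, so it makes no assumptions about the Chunkr API beyond the `upload` call shown above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def upload_all(
    upload: Callable[[str], Awaitable[T]],
    paths: Iterable[str],
    max_concurrent: int = 4,
) -> List[T]:
    """Run `upload` for every path, with at most `max_concurrent` in flight.

    `upload` is any coroutine function taking a path, for example the
    `upload` method of a `ChunkrAsync` instance (hypothetical helper).
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def _upload_one(path: str) -> T:
        # The semaphore caps how many uploads run at the same time.
        async with semaphore:
            return await upload(path)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(_upload_one(p) for p in paths))
```

Inside a coroutine this could be called as `tasks = await upload_all(chunkr.upload, ["a.pdf", "b.pdf"])`, and the coroutine itself started with `asyncio.run(...)`. The concurrency cap is an illustrative default, not a documented limit of the service.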
### Additional Features
Both clients accept several input types: a file path, an open binary file, a URL, a base64 data URI, or a PIL image:
```python
# Upload from file path
chunkr.upload("document.pdf")
# Upload from opened file
with open("document.pdf", "rb") as f:
    chunkr.upload(f)
# Upload from URL
chunkr.upload("https://example.com/document.pdf")
# Upload from base64 string
chunkr.upload("data:application/pdf;base64,JVBERi0xLjcKCjEgMCBvYmo...")
# Upload an image
from PIL import Image
img = Image.open("photo.jpg")
chunkr.upload(img)
```
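The base64 input above has the shape of a data URI (`data:<mime type>;base64,<payload>`). As a rough sketch of how such a string could be built from a local file using only the standard library, the hypothetical helper below guesses the MIME type from the file extension; `to_data_uri` and its `default_mime` fallback are illustrative names, not part of `chunkr_ai`.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str, default_mime: str = "application/octet-stream") -> str:
    """Encode a local file as a `data:<mime>;base64,<payload>` string."""
    # Guess the MIME type from the extension, e.g. ".pdf" -> "application/pdf".
    mime, _ = mimetypes.guess_type(path)
    # Read the raw bytes and base64-encode them as ASCII text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or default_mime};base64,{encoded}"
```

The result could then be passed to the client in the same way as the literal string in the example above, for instance `chunkr.upload(to_data_uri("document.pdf"))`.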
### Configuration
You can customize the processing behavior by passing a `Configuration` object:
```python
from chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy, GenerationStrategy
# Basic configuration
config = Configuration(
    ocr_strategy=OcrStrategy.AUTO,
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    high_resolution=True,
    expires_in=3600,  # seconds
)
# Upload with configuration
task = chunkr.upload("document.pdf", config)
```
#### Available Configuration Examples
- **Chunk Processing**
```python
from chunkr_ai.models import ChunkProcessing
config = Configuration(
    chunk_processing=ChunkProcessing(target_length=1024)
)
```
- **Expires In**
```python
config = Configuration(expires_in=3600)
```
- **High Resolution**
```python
config = Configuration(high_resolution=True)
```
- **JSON Schema**
```python
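# Assumption: JsonSchema and Property are importable from chunkr_ai.models,
# like the other configuration models used in this README.
from chunkr_ai.models import JsonSchema, Property
config = Configuration(json_schema=JsonSchema(
    title="Sales Data",
    properties=[
        Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
        Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
    ]
))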
```
- **OCR Strategy**
```python
config = Configuration(ocr_strategy=OcrStrategy.AUTO)
```
- **Segment Processing**
```python
from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
config = Configuration(
    segment_processing=SegmentProcessing(
        page=GenerationConfig(
            html=GenerationStrategy.LLM,
            markdown=GenerationStrategy.LLM
        )
    )
)
```
- **Segmentation Strategy**
```python
config = Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
)
```
## Environment Setup
You can provide your API key and URL in several ways:
1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
2. `.env` file
3. Direct initialization:
```python
chunkr = Chunkr(
    api_key="your-api-key",
    url="https://api.chunkr.ai"
)
```
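To make the options above concrete, here is a hedged sketch of how an application might resolve the two settings before constructing a client. It is not the library's own logic: `resolve_credentials` is a hypothetical helper, the precedence (explicit arguments first, then environment variables) is an assumption, the default URL is simply the one from the example above, and `.env` parsing is left out.

```python
import os
from typing import Mapping, Optional, Tuple

# Taken from the direct-initialization example; treating it as a fallback
# default is an assumption of this sketch.
DEFAULT_URL = "https://api.chunkr.ai"


def resolve_credentials(
    api_key: Optional[str] = None,
    url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (api_key, url), preferring explicit arguments over `env`."""
    key = api_key or env.get("CHUNKR_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set CHUNKR_API_KEY")
    base_url = url or env.get("CHUNKR_URL") or DEFAULT_URL
    return key, base_url
```

The resolved pair could then be passed on explicitly, for example `Chunkr(api_key=key, url=base_url)`, which keeps the configuration source visible in application code.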
Raw data
{
"_id": null,
"home_page": null,
"name": "chunkr-ai",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Ishaan Kapoor <ishaan@lumina.sh>",
"download_url": "https://files.pythonhosted.org/packages/b0/34/50ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d/chunkr_ai-0.0.5.tar.gz",
"platform": null,
"description": "# Chunkr Python Client\n\nThis provides a simple interface to interact with the Chunkr API.\n\n## Getting Started\n\nYou can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).\n\nFor more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).\n\n## Installation\n\n```bash\npip install chunkr-ai\n```\n\n## Usage\n\nWe provide two clients: `Chunkr` for synchronous operations and `ChunkrAsync` for asynchronous operations.\n\n### Synchronous Usage\n\n```python\nfrom chunkr_ai import Chunkr\n\n# Initialize client\nchunkr = Chunkr()\n\n# Upload a file and wait for processing\ntask = chunkr.upload(\"document.pdf\")\n\n# Print the response\nprint(task)\n\n# Get output from task\noutput = task.output\n\n# If you want to upload without waiting for processing\ntask = chunkr.start_upload(\"document.pdf\")\n# ... do other things ...\ntask.poll() # Check status when needed\n```\n\n### Asynchronous Usage\n\n```python\nfrom chunkr_ai import ChunkrAsync\n\nasync def process_document():\n # Initialize client\n chunkr = ChunkrAsync()\n\n # Upload a file and wait for processing\n task = await chunkr.upload(\"document.pdf\")\n\n # Print the response\n print(task)\n\n # Get output from task\n output = task.output\n\n # If you want to upload without waiting for processing\n task = await chunkr.start_upload(\"document.pdf\")\n # ... 
do other things ...\n await task.poll_async() # Check status when needed\n```\n\n### Additional Features\n\nBoth clients support various input types:\n\n```python\n# Upload from file path\nchunkr.upload(\"document.pdf\")\n\n# Upload from opened file\nwith open(\"document.pdf\", \"rb\") as f:\n chunkr.upload(f)\n\n# Upload from URL\nchunkr.upload(\"https://example.com/document.pdf\")\n\n# Upload from base64 string\nchunkr.upload(\"data:application/pdf;base64,JVBERi0xLjcKCjEgMCBvYmo...\")\n\n# Upload an image\nfrom PIL import Image\nimg = Image.open(\"photo.jpg\")\nchunkr.upload(img)\n```\n\n### Configuration\n\nYou can customize the processing behavior by passing a `Configuration` object:\n\n```python\nfrom chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy, GenerationStrategy\n\n# Basic configuration\nconfig = Configuration(\n ocr_strategy=OcrStrategy.AUTO,\n segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,\n high_resolution=True,\n expires_in=3600, # seconds\n)\n\n# Upload with configuration\ntask = chunkr.upload(\"document.pdf\", config)\n```\n\n#### Available Configuration Examples\n\n- **Chunk Processing**\n ```python\n from chunkr_ai.models import ChunkProcessing\n config = Configuration(\n chunk_processing=ChunkProcessing(target_length=1024)\n )\n ```\n- **Expires In**\n ```python\n config = Configuration(expires_in=3600)\n ```\n\n- **High Resolution**\n ```python\n config = Configuration(high_resolution=True)\n ```\n\n- **JSON Schema**\n ```python\n config = Configuration(json_schema=JsonSchema(\n title=\"Sales Data\",\n properties=[\n Property(name=\"Person with highest sales\", prop_type=\"string\", description=\"The person with the highest sales\"),\n Property(name=\"Person with lowest sales\", prop_type=\"string\", description=\"The person with the lowest sales\"),\n ]\n ))\n ```\n\n- **OCR Strategy**\n ```python\n config = Configuration(ocr_strategy=OcrStrategy.AUTO)\n ```\n\n- **Segment Processing**\n ```python\n from 
chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy\n config = Configuration(\n segment_processing=SegmentProcessing(\n page=GenerationConfig(\n html=GenerationStrategy.LLM,\n markdown=GenerationStrategy.LLM\n )\n )\n )\n ```\n\n- **Segmentation Strategy**\n ```python\n config = Configuration(\n segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS # or SegmentationStrategy.PAGE\n )\n ```\n\n## Environment setup\n\nYou can provide your API key and URL in several ways:\n1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`\n2. `.env` file\n3. Direct initialization:\n```python\nchunkr = Chunkr(\n api_key=\"your-api-key\",\n url=\"https://api.chunkr.ai\"\n)\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "Python client for Chunkr: open source document intelligence",
"version": "0.0.5",
"project_urls": {
"Homepage": "https://chunkr.ai"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b36c42c0ba37c1951dd6d7aebbae36eb41499b9be61bb02eda75e092a6a906f2",
"md5": "46ccc90f5a91e09fe19609ec8eb22e1a",
"sha256": "a67e2884aa31ef1f8c1425152f6eecaf385889b522695e3449affcf8f0a10efb"
},
"downloads": -1,
"filename": "chunkr_ai-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "46ccc90f5a91e09fe19609ec8eb22e1a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 11879,
"upload_time": "2025-01-10T01:52:17",
"upload_time_iso_8601": "2025-01-10T01:52:17.483865Z",
"url": "https://files.pythonhosted.org/packages/b3/6c/42c0ba37c1951dd6d7aebbae36eb41499b9be61bb02eda75e092a6a906f2/chunkr_ai-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b03450ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d",
"md5": "8cd27bbdec201a26fe450604a8422813",
"sha256": "7a0882c8e9f247c404d274042e9dd8ea50f383de020d6c693cd214691c1dafe1"
},
"downloads": -1,
"filename": "chunkr_ai-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "8cd27bbdec201a26fe450604a8422813",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10697,
"upload_time": "2025-01-10T01:52:19",
"upload_time_iso_8601": "2025-01-10T01:52:19.726155Z",
"url": "https://files.pythonhosted.org/packages/b0/34/50ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d/chunkr_ai-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-10 01:52:19",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "chunkr-ai"
}