chunkr-ai


| Field | Value |
| --- | --- |
| Name | chunkr-ai |
| Version | 0.0.5 |
| Summary | Python client for Chunkr: open source document intelligence |
| Upload time | 2025-01-10 01:52:19 |

# Chunkr Python Client

This package provides a simple Python interface to the Chunkr API.

## Getting Started

You can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).

For more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).

## Installation

```bash
pip install chunkr-ai
```

## Usage

We provide two clients: `Chunkr` for synchronous operations and `ChunkrAsync` for asynchronous operations.

### Synchronous Usage

```python
from chunkr_ai import Chunkr

# Initialize client
chunkr = Chunkr()

# Upload a file and wait for processing
task = chunkr.upload("document.pdf")

# Print the response
print(task)

# Get output from task
output = task.output

# If you want to upload without waiting for processing
task = chunkr.start_upload("document.pdf")
# ... do other things ...
task.poll()  # Check status when needed
```

### Asynchronous Usage

```python
import asyncio

from chunkr_ai import ChunkrAsync

async def process_document():
    # Initialize client
    chunkr = ChunkrAsync()

    # Upload a file and wait for processing
    task = await chunkr.upload("document.pdf")

    # Print the response
    print(task)

    # Get output from task
    output = task.output

    # If you want to upload without waiting for processing
    task = await chunkr.start_upload("document.pdf")
    # ... do other things ...
    await task.poll_async()  # Check status when needed

# Run the coroutine from synchronous code
asyncio.run(process_document())
```
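
Because `ChunkrAsync.upload` is awaited, several documents can in principle be processed concurrently with `asyncio.gather`. The sketch below shows only the pattern. `process_all` and `fake_upload` are hypothetical names, and `fake_upload` is a stand-in coroutine so the example runs without an API key. Whether a single client instance can safely be shared across concurrent uploads is not stated here, so treat that as an assumption to verify.

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_all(
    upload: Callable[[str], Awaitable[object]], paths: List[str]
) -> List[object]:
    """Start one upload per path and wait for all of them to finish."""
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(upload(path) for path in paths))


async def fake_upload(path: str) -> str:
    # Hypothetical stand-in for `chunkr.upload`; it only simulates latency.
    await asyncio.sleep(0.01)
    return f"processed:{path}"


results = asyncio.run(process_all(fake_upload, ["a.pdf", "b.pdf"]))
print(results)
```

To try this against the real service, pass `chunkr.upload` from a `ChunkrAsync` instance in place of `fake_upload`.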

### Additional Features

Both clients accept several input types: a file path, an open binary file, a URL, a base64-encoded string, or a PIL image.

```python
# Upload from file path
chunkr.upload("document.pdf")

# Upload from opened file
with open("document.pdf", "rb") as f:
    chunkr.upload(f)

# Upload from URL
chunkr.upload("https://example.com/document.pdf")

# Upload from base64 string
chunkr.upload("data:application/pdf;base64,JVBERi0xLjcKCjEgMCBvYmo...")

# Upload an image
from PIL import Image
img = Image.open("photo.jpg")
chunkr.upload(img)
```
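
The base64 example above passes a `data:` URI. As a minimal sketch, assuming you already hold the document bytes in memory, such a string can be built with the standard library. The helper name `to_data_uri` is hypothetical and not part of the client.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a data URI of the form shown above."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


uri = to_data_uri(b"%PDF-1.7\n")
print(uri)  # data:application/pdf;base64,JVBERi0xLjcK
```

The result can then be passed to `chunkr.upload(uri)` in the same way as the literal string in the example above.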

### Configuration

You can customize the processing behavior by passing a `Configuration` object:

```python
from chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy

# Basic configuration
config = Configuration(
    ocr_strategy=OcrStrategy.AUTO,
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    high_resolution=True,
    expires_in=3600,  # seconds
)

# Upload with configuration
task = chunkr.upload("document.pdf", config)
```

#### Available Configuration Examples

- **Chunk Processing**
  ```python
  from chunkr_ai.models import ChunkProcessing
  config = Configuration(
      chunk_processing=ChunkProcessing(target_length=1024)
  )
  ```

- **Expires In**
  ```python
  config = Configuration(expires_in=3600)
  ```

- **High Resolution**
  ```python
  config = Configuration(high_resolution=True)
  ```

- **JSON Schema**
  ```python
  from chunkr_ai.models import JsonSchema, Property
  config = Configuration(json_schema=JsonSchema(
      title="Sales Data",
      properties=[
          Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
          Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
      ]
  ))
  ```

- **OCR Strategy**
  ```python
  config = Configuration(ocr_strategy=OcrStrategy.AUTO)
  ```

- **Segment Processing**
  ```python
  from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
  config = Configuration(
      segment_processing=SegmentProcessing(
          page=GenerationConfig(
              html=GenerationStrategy.LLM,
              markdown=GenerationStrategy.LLM
          )
      )
  )
  ```

- **Segmentation Strategy**
  ```python
  config = Configuration(
      segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
  )
  ```

## Environment Setup

You can provide your API key and URL in several ways:
1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
2. `.env` file
3. Direct initialization:
```python
chunkr = Chunkr(
    api_key="your-api-key",
    url="https://api.chunkr.ai"
)
```
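
The order in which the client consults these sources is not stated here. The sketch below illustrates one plausible precedence (explicit arguments, then environment variables, then a default URL) and is an assumption rather than the client's actual behavior. `resolve_settings` and `DEFAULT_URL` are hypothetical names, and `.env` loading is left out to keep the example to the standard library.

```python
import os
from typing import Mapping, Optional, Tuple

# URL taken from the direct-initialization example above.
DEFAULT_URL = "https://api.chunkr.ai"


def resolve_settings(
    api_key: Optional[str] = None,
    url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (api_key, url), preferring explicit arguments over the environment."""
    key = api_key or env.get("CHUNKR_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set CHUNKR_API_KEY")
    # Assumed fallback chain: explicit url, then CHUNKR_URL, then the default.
    return key, url or env.get("CHUNKR_URL") or DEFAULT_URL


# Example with an explicit mapping instead of the real environment.
print(resolve_settings(env={"CHUNKR_API_KEY": "your-api-key"}))
```

Passing `env` as a parameter keeps the function easy to test without modifying the real process environment.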

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "chunkr-ai",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Ishaan Kapoor <ishaan@lumina.sh>",
    "download_url": "https://files.pythonhosted.org/packages/b0/34/50ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d/chunkr_ai-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python client for Chunkr: open source document intelligence",
    "version": "0.0.5",
    "project_urls": {
        "Homepage": "https://chunkr.ai"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b36c42c0ba37c1951dd6d7aebbae36eb41499b9be61bb02eda75e092a6a906f2",
                "md5": "46ccc90f5a91e09fe19609ec8eb22e1a",
                "sha256": "a67e2884aa31ef1f8c1425152f6eecaf385889b522695e3449affcf8f0a10efb"
            },
            "downloads": -1,
            "filename": "chunkr_ai-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "46ccc90f5a91e09fe19609ec8eb22e1a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 11879,
            "upload_time": "2025-01-10T01:52:17",
            "upload_time_iso_8601": "2025-01-10T01:52:17.483865Z",
            "url": "https://files.pythonhosted.org/packages/b3/6c/42c0ba37c1951dd6d7aebbae36eb41499b9be61bb02eda75e092a6a906f2/chunkr_ai-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b03450ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d",
                "md5": "8cd27bbdec201a26fe450604a8422813",
                "sha256": "7a0882c8e9f247c404d274042e9dd8ea50f383de020d6c693cd214691c1dafe1"
            },
            "downloads": -1,
            "filename": "chunkr_ai-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "8cd27bbdec201a26fe450604a8422813",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10697,
            "upload_time": "2025-01-10T01:52:19",
            "upload_time_iso_8601": "2025-01-10T01:52:19.726155Z",
            "url": "https://files.pythonhosted.org/packages/b0/34/50ef339e53d199bd6f4e8215d1b3728cde13f4cd1e8f212648a5233ba51d/chunkr_ai-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-10 01:52:19",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "chunkr-ai"
}
        