# Kallia

| Field           | Value                                                                                        |
| --------------- | -------------------------------------------------------------------------------------------- |
| Name            | kallia                                                                                       |
| Version         | 0.1.2                                                                                        |
| Summary         | Semantic Document Processing Library                                                         |
| Upload time     | 2025-07-13 08:55:52                                                                          |
| Requires Python | `>=3.11`                                                                                     |
| Keywords        | document-processing, semantic-chunking, document-analysis, text-processing, machine-learning |

**Semantic Document Processing Library**

Kallia is a FastAPI-based document processing service, also usable as a Python library, that converts documents into semantic chunks. It extracts meaningful content segments from documents while preserving context and the semantic relationships between them.

## 🚀 Features

- **Document-to-Markdown Conversion**: Standardized processing pipeline for consistent output
- **Semantic Chunking**: Intelligent content segmentation that respects document structure and meaning
- **PDF Support**: Currently supports PDF documents with extensible architecture for additional formats
- **RESTful API**: Clean, well-documented API interface with comprehensive error handling
- **Configurable Processing**: Adjustable parameters for temperature, token limits, and page selection
- **Docker Ready**: Containerized deployment with Docker and docker-compose support
- **Vision-Language Model Integration**: Leverages advanced AI models for content understanding
- **Interactive Playground**: Chainlit-based demo application for testing and exploration

## 📋 Prerequisites

- Python 3.11 or higher (3.11, 3.12, 3.13 supported)
- Docker (optional, for containerized deployment)
- Access to a compatible language model API (OpenRouter, Ollama, etc.)

## 🛠️ Installation

### PyPI Installation (Recommended)

Install Kallia directly from PyPI:

```bash
pip install kallia
```

### Local Development Setup

1. **Clone the repository**

   ```bash
   git clone https://github.com/kallia-project/kallia.git
   cd kallia
   ```

2. **Install dependencies**

   ```bash
   cd kallia  # the package directory inside the repository (kallia/kallia)
   pip install -r requirements.txt
   ```

3. **Configure environment variables**

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your configuration:

   ```env
   KALLIA_PROVIDER_API_KEY=your_api_key_here
   KALLIA_PROVIDER_BASE_URL=https://openrouter.ai/api/v1
   KALLIA_PROVIDER_MODEL=qwen/qwen2.5-vl-32b-instruct
   ```

4. **Run the application**

   ```bash
   fastapi run kallia/main.py --port 8000
   ```

### Docker Deployment

1. **Using Docker Compose (Recommended)**

   ```bash
   cd kallia
   docker-compose up -d
   ```

2. **Manual Docker Build**

   ```bash
   # Build the Docker image
   docker build -t kallia:0.1.2 .

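   # Run the container
   # Note: inside a container, localhost refers to the container itself. To reach
   # Ollama running on the host, point KALLIA_PROVIDER_BASE_URL at the host instead
   # (for example http://host.docker.internal:11434/v1 on Docker Desktop).
   docker run -p 8000:80 \
     -e KALLIA_PROVIDER_API_KEY=ollama \
     -e KALLIA_PROVIDER_BASE_URL=http://localhost:11434/v1 \
     -e KALLIA_PROVIDER_MODEL=qwen2.5vl:32b \
     kallia:0.1.2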
   ```

## ⚙️ Configuration

### Environment Variables

| Variable                   | Description                              | Example                     |
| -------------------------- | ---------------------------------------- | --------------------------- |
| `KALLIA_PROVIDER_API_KEY`  | API key for your language model provider | `ollama`                    |
| `KALLIA_PROVIDER_BASE_URL` | Base URL for the API endpoint            | `http://localhost:11434/v1` |
| `KALLIA_PROVIDER_MODEL`    | Model identifier to use                  | `qwen2.5vl:32b`             |
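
As a minimal sketch, a wrapper script might read and sanity-check these three variables before starting the service. Only the variable names come from the table above; `ProviderSettings` and `load_provider_settings` are hypothetical names used for illustration and are not part of Kallia's API.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

# Variable names taken from the table above.
REQUIRED_VARIABLES = (
    "KALLIA_PROVIDER_API_KEY",
    "KALLIA_PROVIDER_BASE_URL",
    "KALLIA_PROVIDER_MODEL",
)


@dataclass(frozen=True)
class ProviderSettings:
    """Hypothetical container for the three KALLIA_PROVIDER_* values."""

    api_key: str
    base_url: str
    model: str


def load_provider_settings(env=None) -> ProviderSettings:
    """Read the provider variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Report every missing or empty variable at once.
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    # The base URL should at least look like an HTTP(S) endpoint.
    base_url = env["KALLIA_PROVIDER_BASE_URL"]
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"KALLIA_PROVIDER_BASE_URL is not an HTTP(S) URL: {base_url!r}")

    return ProviderSettings(
        api_key=env["KALLIA_PROVIDER_API_KEY"],
        base_url=base_url.rstrip("/"),
        model=env["KALLIA_PROVIDER_MODEL"],
    )
```

Calling `load_provider_settings()` with no argument reads from the process environment; passing a plain dict makes the check easy to exercise in tests.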

### Supported Providers

- **OpenRouter**: Use OpenRouter API for access to various models
- **Ollama**: Local model deployment with Ollama
- **Custom Endpoints**: Any OpenAI-compatible API endpoint

## 📖 Usage

### API Endpoint

**POST** `/documents`

Converts a document into semantic chunks with concise summaries.

#### Request Body

```json
{
  "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
  "page_number": 1,
  "temperature": 0.0,
  "max_tokens": 8192
}
```

#### Parameters

- `url` (string, required): URL to the document to process
- `page_number` (integer, optional): Specific page to process (default: 1)
- `temperature` (float, optional): Model temperature for processing (default: 0.0)
- `max_tokens` (integer, optional): Maximum tokens for processing (default: 8192)
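
The parameters and defaults listed above can be mirrored on the client side. The following is a sketch only: `DocumentRequest` is a hypothetical helper, not a class exported by Kallia, and its range checks (for example, that pages are numbered from 1) are assumptions inferred from the defaults rather than documented server behavior.

```python
from dataclasses import asdict, dataclass


@dataclass
class DocumentRequest:
    """Hypothetical client-side mirror of the POST /documents request body."""

    url: str
    page_number: int = 1      # default from the parameter list
    temperature: float = 0.0  # default from the parameter list
    max_tokens: int = 8192    # default from the parameter list

    def __post_init__(self) -> None:
        # These checks are assumptions; the server may validate differently.
        if not self.url:
            raise ValueError("url is required")
        if self.page_number < 1:
            raise ValueError("page_number must be 1 or greater")
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be positive")

    def to_payload(self) -> dict:
        """Return a dict suitable for `requests.post(..., json=payload)`."""
        return asdict(self)
```

Only `url` has to be supplied; the remaining fields fall back to the documented defaults.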

#### Response

```json
{
  "documents": [
    {
      "page_number": 1,
      "chunks": [
        {
          "original_text": "Original document content...",
          "concise_summary": "Concise summary of the content..."
        }
      ]
    }
  ]
}
```
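
To work with the response in typed form rather than nested dicts, a client could decode it along these lines. The field names follow the JSON above; the `Chunk` and `PageResult` classes and both helper functions are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    original_text: str
    concise_summary: str


@dataclass
class PageResult:
    page_number: int
    chunks: list[Chunk]


def parse_documents(payload: dict) -> list[PageResult]:
    """Convert the decoded JSON response into typed objects."""
    pages = []
    for document in payload["documents"]:
        chunks = [
            Chunk(item["original_text"], item["concise_summary"])
            for item in document["chunks"]
        ]
        pages.append(PageResult(document["page_number"], chunks))
    return pages


def summaries_by_page(pages: list[PageResult]) -> dict[int, list[str]]:
    """Collect only the concise summaries, keyed by page number."""
    return {
        page.page_number: [chunk.concise_summary for chunk in page.chunks]
        for page in pages
    }
```

`summaries_by_page(parse_documents(response.json()))` then gives a compact per-page overview, which can be handy when only the summaries are indexed.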

### Example Usage

#### cURL

```bash
curl -X POST "http://localhost:8000/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
    "page_number": 1,
    "temperature": 0.0,
    "max_tokens": 4096
  }'
```

#### Python API Client

```python
import requests

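response = requests.post(
    "http://localhost:8000/documents",
    json={
        "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
        "page_number": 1,
        "temperature": 0.0,
        "max_tokens": 4096
    },
    timeout=300,  # model-backed processing can be slow; adjust as needed
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

result = response.json()
documents = result["documents"]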

for document in documents:
    print(f"Page {document['page_number']}:")
    for chunk in document["chunks"]:
        print(f"  Original: {chunk['original_text']}")
        print(f"  Summary: {chunk['concise_summary']}")
        print("  ---")
```

### Programmatic Usage

You can also use Kallia directly as a Python library without running the API server:

#### Convert PDF to Markdown

```python
from kallia.documents import Documents

# Convert a PDF document to markdown
url = "./assets/pdf/01.pdf"
page_number = 1
temperature = 0.0
max_tokens = 8192

markdown_content = Documents.to_markdown(
    source=url,
    page_number=page_number,
    temperature=temperature,
    max_tokens=max_tokens,
)

print(markdown_content)
```

#### Create Semantic Chunks

```python
from kallia.documents import Documents
from kallia.chunker import Chunker

# First convert document to markdown
url = "./assets/pdf/01.pdf"
page_number = 1
temperature = 0.0
max_tokens = 8192

markdown_content = Documents.to_markdown(
    source=url,
    page_number=page_number,
    temperature=temperature,
    max_tokens=max_tokens,
)

# Then create semantic chunks from the markdown
semantic_chunks = Chunker.create(
    text=markdown_content,
    temperature=temperature,
    max_tokens=max_tokens,
)

# Process the chunks
for chunk in semantic_chunks:
    print(f"Original: {chunk.original_text}")
    print(f"Summary: {chunk.concise_summary}")
    print("---")
```
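
Both examples above handle a single page. A sketch of looping over several pages follows, with the converter and chunker passed in as callables so the loop can be exercised without a model behind it. `process_pages` is a hypothetical helper, not part of Kallia; the keyword names (`source`, `page_number`, `text`, `temperature`, `max_tokens`) are taken from the examples above.

```python
from typing import Any, Callable, Iterable


def process_pages(
    source: str,
    page_numbers: Iterable[int],
    to_markdown: Callable[..., str],
    create_chunks: Callable[..., list],
    temperature: float = 0.0,
    max_tokens: int = 8192,
) -> dict[int, list[Any]]:
    """Run convert-then-chunk for each requested page and collect the results."""
    results: dict[int, list[Any]] = {}
    for page_number in page_numbers:
        # Step 1: page -> markdown
        markdown = to_markdown(
            source=source,
            page_number=page_number,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        # Step 2: markdown -> semantic chunks
        results[page_number] = create_chunks(
            text=markdown,
            temperature=temperature,
            max_tokens=max_tokens,
        )
    return results
```

In real use the two callables would be `Documents.to_markdown` and `Chunker.create`, for example `process_pages(path, [1, 2], Documents.to_markdown, Chunker.create)`, assuming the document has at least two pages.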

## 🎮 Interactive Playground

Kallia includes an interactive playground built with Chainlit for easy testing and exploration:

### Running the Playground

1. **Navigate to the playground directory**

   ```bash
   cd kallia-playground
   ```

2. **Install playground dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure environment variables**

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. **Run the playground**

   ```bash
   chainlit run kallia-playground/main.py
   ```

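5. **Access the interface**

   Open your browser to `http://localhost:8000`. If the Kallia API is already running on port 8000, start the playground on another port with `chainlit run kallia-playground/main.py --port 8001` and open that port instead.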

### Playground Features

- **File Upload**: Upload PDF documents directly through the web interface
- **Real-time Processing**: Watch as documents are processed page by page
- **Interactive Q&A**: Ask questions about uploaded documents
- **Source References**: View the original chunks that inform each answer
- **Multi-page Support**: Automatically processes all pages of uploaded documents

## 🏗️ Project Structure

```
kallia/
├── kallia/                      # Main package directory
│   ├── kallia/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI application entry point
│   │   ├── models.py            # Pydantic models for API
│   │   ├── constants.py         # Application constants
│   │   ├── documents.py         # Document processing logic
│   │   ├── chunker.py           # Semantic chunking implementation
│   │   ├── utils.py             # Utility functions
│   │   ├── logger.py            # Logging configuration
│   │   ├── settings.py          # Application settings
│   │   ├── exceptions.py        # Custom exceptions
│   │   ├── messages.py          # Message handling
│   │   ├── prompts.py           # AI model prompts
│   │   ├── image_caption_serializer.py
│   │   └── unordered_list_serializer.py
│   ├── requirements.txt         # Python dependencies
│   ├── Dockerfile               # Docker container configuration
│   ├── docker-compose.yml       # Docker Compose setup
│   └── .env.example             # Environment variables template
├── kallia-playground/           # Interactive demo application
│   ├── kallia-playground/
│   │   ├── __init__.py
│   │   ├── main.py              # Chainlit application
│   │   ├── qa.py                # Q&A functionality
│   │   ├── settings.py          # Playground settings
│   │   ├── constants.py         # Playground constants
│   │   └── chainlit.md          # Chainlit welcome screen content
│   ├── requirements.txt         # Playground dependencies
│   ├── Dockerfile               # Playground Docker config
│   ├── docker-compose.yml       # Playground Docker Compose
│   └── .env.example             # Playground environment template
├── tests/                       # Test suite
│   ├── __init__.py
│   ├── test_pdf_to_markdown.py
│   └── test_markdown_to_chunks.py
├── assets/                      # Test assets
│   └── pdf/
│       └── 01.pdf               # Sample PDF for testing
├── LICENSE                      # Apache 2.0 License
├── pyproject.toml               # Project configuration
└── README.md                    # This file
```

## 🔧 Development

### Code Style

The project follows Python best practices and uses:

- FastAPI as the web framework
- Pydantic for data validation
- Structured logging
- Comprehensive error handling

### Testing

The project includes tests for its core functionality:

```bash
# Run tests
pytest tests/

# Run specific tests
pytest tests/test_pdf_to_markdown.py
pytest tests/test_markdown_to_chunks.py
```

Test coverage includes:

- PDF to markdown conversion
- Markdown to semantic chunks processing
- End-to-end document processing pipeline

## 📦 Dependencies

### Core Dependencies

- **FastAPI**: Modern, fast web framework for building APIs
- **Docling**: Document processing and conversion library

### Full Dependency List

See `kallia/requirements.txt` for complete dependency specifications:

- `fastapi[standard]==0.116.1`
- `docling==2.41.0`

### Playground Dependencies

The interactive playground has additional dependencies listed in `kallia-playground/requirements.txt`:

- `chainlit`: For the interactive web interface
- `langchain`: For document processing and Q&A functionality
- `pdfminer`: For PDF metadata extraction

## 🚨 Error Handling

The API reports errors with the following HTTP status codes:

- **400 Bad Request**: Invalid parameters or unsupported file format
- **500 Internal Server Error**: Processing errors
- **503 Service Unavailable**: External service connectivity issues
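
On the client side, these codes suggest different reactions. The sketch below maps each documented code to a coarse action and retries only on 503 with exponential backoff. The retry policy is an assumption (the README does not say whether requests are safe to repeat), and `classify_status` and `call_with_retry` are hypothetical helpers.

```python
import time
from typing import Callable


def classify_status(status_code: int) -> str:
    """Map the documented status codes to a coarse client-side action."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 400:
        return "fix_request"  # invalid parameters or unsupported file format
    if status_code == 503:
        return "retry"        # external service connectivity issue
    return "fail"             # 500 and any undocumented code


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, int]:
    """Call `send` until it stops returning 503 or attempts run out.

    `send` performs one request and returns its HTTP status code.
    Returns the final action and the number of attempts made.
    """
    action = "fail"
    for attempt in range(1, max_attempts + 1):
        action = classify_status(send())
        if action != "retry":
            return action, attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return action, max_attempts
```

With `requests`, `send` could be a small lambda such as `lambda: requests.post(endpoint, json=payload).status_code`, where `endpoint` and `payload` are whatever the caller already uses.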

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## 👨‍💻 Author

**CK**

- Email: ck@kallia.net
- GitHub: [@kallia-project](https://github.com/kallia-project/kallia)

## 🔗 Links

- [GitHub Repository](https://github.com/kallia-project/kallia)
- [Issues](https://github.com/kallia-project/kallia/issues)
- [PyPI Package](https://pypi.org/project/kallia/)
- [Docker Hub](https://hub.docker.com/r/overheatsystem/kallia)

## 📈 Version

Current version: **0.1.2**

---

Built with ❤️ for intelligent document processing

            
