# Kallia
**Semantic Document Processing Library**
Kallia is a FastAPI-based document processing service that converts documents into intelligent semantic chunks. The library specializes in extracting meaningful content segments from documents while preserving context and semantic relationships.
## 🚀 Features
- **Document-to-Markdown Conversion**: Standardized processing pipeline for consistent output
- **Semantic Chunking**: Intelligent content segmentation that respects document structure and meaning
- **PDF Support**: Currently supports PDF documents with extensible architecture for additional formats
- **RESTful API**: Clean, well-documented API interface with comprehensive error handling
- **Configurable Processing**: Adjustable parameters for temperature, token limits, and page selection
- **Docker Ready**: Containerized deployment with Docker and docker-compose support
- **Vision-Language Model Integration**: Leverages advanced AI models for content understanding
- **Interactive Playground**: Chainlit-based demo application for testing and exploration
## 📋 Prerequisites
- Python 3.11 or higher (3.11, 3.12, 3.13 supported)
- Docker (optional, for containerized deployment)
- Access to a compatible language model API (OpenRouter, Ollama, etc.)
## 🛠️ Installation
### PyPI Installation (Recommended)
Install Kallia directly from PyPI:
```bash
pip install kallia
```
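To verify the install, a quick sanity check is to import the classes used later in this README (an illustrative one-liner, not an official CLI):
```bash
python -c "from kallia.documents import Documents; from kallia.chunker import Chunker; print('kallia is importable')"
```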
### Local Development Setup
1. **Clone the repository**

   ```bash
   git clone https://github.com/kallia-project/kallia.git
   cd kallia
   ```

2. **Install dependencies**

   ```bash
   cd kallia
   pip install -r requirements.txt
   ```

3. **Configure environment variables**

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your configuration:

   ```env
   KALLIA_PROVIDER_API_KEY=your_api_key_here
   KALLIA_PROVIDER_BASE_URL=https://openrouter.ai/api/v1
   KALLIA_PROVIDER_MODEL=qwen/qwen2.5-vl-32b-instruct
   ```

4. **Run the application**

   ```bash
   fastapi run kallia/main.py --port 8000
   ```
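FastAPI exposes interactive OpenAPI documentation by default, so once the server is running you can inspect the `/documents` endpoint in a browser (assuming the default docs routes have not been disabled):
```bash
# Swagger UI in the browser: http://localhost:8000/docs
# Raw OpenAPI schema:
curl http://localhost:8000/openapi.json
```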
### Docker Deployment
1. **Using Docker Compose (Recommended)**

   ```bash
   cd kallia
   docker-compose up -d
   ```

2. **Manual Docker Build**

   ```bash
   # Build the Docker image
   docker build -t kallia:0.1.2 .

   # Run the container
   docker run -p 8000:80 \
     -e KALLIA_PROVIDER_API_KEY=ollama \
     -e KALLIA_PROVIDER_BASE_URL=http://localhost:11434/v1 \
     -e KALLIA_PROVIDER_MODEL=qwen2.5vl:32b \
     kallia:0.1.2
   ```
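If you already created a `.env` file as in the local setup, you can hand it to the container with Docker's standard `--env-file` flag instead of listing each variable (the image tag matches the build command above):
```bash
docker run -p 8000:80 --env-file .env kallia:0.1.2
```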
## ⚙️ Configuration
### Environment Variables
| Variable | Description | Example |
| -------------------------- | ---------------------------------------- | --------------------------- |
| `KALLIA_PROVIDER_API_KEY` | API key for your language model provider | `ollama` |
| `KALLIA_PROVIDER_BASE_URL` | Base URL for the API endpoint | `http://localhost:11434/v1` |
| `KALLIA_PROVIDER_MODEL` | Model identifier to use | `qwen2.5vl:32b` |
### Supported Providers
- **OpenRouter**: Use OpenRouter API for access to various models
- **Ollama**: Local model deployment with Ollama
- **Custom Endpoints**: Any OpenAI-compatible API endpoint
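For a custom OpenAI-compatible endpoint, the same three variables apply. The values below are purely illustrative placeholders for your own server and model:
```env
KALLIA_PROVIDER_API_KEY=your_api_key_or_placeholder
KALLIA_PROVIDER_BASE_URL=http://localhost:8080/v1
KALLIA_PROVIDER_MODEL=your-model-identifier
```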
## 📖 Usage
### API Endpoint
**POST** `/documents`
Converts a document into semantic chunks with concise summaries.
#### Request Body
```json
{
"url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
"page_number": 1,
"temperature": 0.0,
"max_tokens": 8192
}
```
#### Parameters
- `url` (string, required): URL to the document to process
- `page_number` (integer, optional): Specific page to process (default: 1)
- `temperature` (float, optional): Model temperature for processing (default: 0.0)
- `max_tokens` (integer, optional): Maximum tokens for processing (default: 8192)
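Because `url` is the only required field, a minimal request can rely on the defaults listed above:
```json
{
  "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf"
}
```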
#### Response
```json
{
"documents": [
{
"page_number": 1,
"chunks": [
{
"original_text": "Original document content...",
"concise_summary": "Concise summary of the content..."
}
]
}
]
}
```
### Example Usage
#### cURL
```bash
curl -X POST "http://localhost:8000/documents" \
-H "Content-Type: application/json" \
-d '{
"url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
"page_number": 1,
"temperature": 0.0,
"max_tokens": 4096
}'
```
#### Python API Client
```python
import requests

response = requests.post(
    "http://localhost:8000/documents",
    json={
        "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.2/assets/pdf/01.pdf",
        "page_number": 1,
        "temperature": 0.0,
        "max_tokens": 4096
    }
)

result = response.json()
documents = result["documents"]

for document in documents:
    print(f"Page {document['page_number']}:")
    for chunk in document["chunks"]:
        print(f"  Original: {chunk['original_text']}")
        print(f"  Summary: {chunk['concise_summary']}")
        print("  ---")
```
### Programmatic Usage
You can also use Kallia directly as a Python library without running the API server:
#### Convert PDF to Markdown
```python
from kallia.documents import Documents
# Convert a PDF document to markdown
url = "./assets/pdf/01.pdf"
page_number = 1
temperature = 0.0
max_tokens = 8192

markdown_content = Documents.to_markdown(
    source=url,
    page_number=page_number,
    temperature=temperature,
    max_tokens=max_tokens,
)

print(markdown_content)
```
#### Create Semantic Chunks
```python
from kallia.documents import Documents
from kallia.chunker import Chunker
# First convert document to markdown
url = "./assets/pdf/01.pdf"
page_number = 1
temperature = 0.0
max_tokens = 8192

markdown_content = Documents.to_markdown(
    source=url,
    page_number=page_number,
    temperature=temperature,
    max_tokens=max_tokens,
)

# Then create semantic chunks from the markdown
semantic_chunks = Chunker.create(
    text=markdown_content,
    temperature=temperature,
    max_tokens=max_tokens,
)

# Process the chunks
for chunk in semantic_chunks:
    print(f"Original: {chunk.original_text}")
    print(f"Summary: {chunk.concise_summary}")
    print("---")
```
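To chunk a multi-page document programmatically, you can repeat the same two calls per page. The sketch below is illustrative only: `page_count` is an assumed value that must come from your own PDF tooling, since it is not returned by these calls:
```python
from kallia.documents import Documents
from kallia.chunker import Chunker

source = "./assets/pdf/01.pdf"
page_count = 3  # assumed to be known in advance (e.g. from your PDF reader)
all_chunks = []

for page_number in range(1, page_count + 1):
    # Convert one page at a time, then chunk the resulting markdown
    markdown_content = Documents.to_markdown(
        source=source,
        page_number=page_number,
        temperature=0.0,
        max_tokens=8192,
    )
    all_chunks.extend(
        Chunker.create(
            text=markdown_content,
            temperature=0.0,
            max_tokens=8192,
        )
    )

print(f"Collected {len(all_chunks)} chunks from {page_count} pages")
```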
## 🎮 Interactive Playground
Kallia includes an interactive playground built with Chainlit for easy testing and exploration:
### Running the Playground
1. **Navigate to the playground directory**

   ```bash
   cd kallia-playground
   ```

2. **Install playground dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure environment variables**

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. **Run the playground**

   ```bash
   chainlit run kallia-playground/main.py
   ```

5. **Access the interface**

   Open your browser to `http://localhost:8000`
### Playground Features
- **File Upload**: Upload PDF documents directly through the web interface
- **Real-time Processing**: Watch as documents are processed page by page
- **Interactive Q&A**: Ask questions about uploaded documents
- **Source References**: View the original chunks that inform each answer
- **Multi-page Support**: Automatically processes all pages of uploaded documents
## 🏗️ Project Structure
```
kallia/
├── kallia/                           # Main package directory
│   ├── kallia/
│   │   ├── __init__.py
│   │   ├── main.py                   # FastAPI application entry point
│   │   ├── models.py                 # Pydantic models for API
│   │   ├── constants.py              # Application constants
│   │   ├── documents.py              # Document processing logic
│   │   ├── chunker.py                # Semantic chunking implementation
│   │   ├── utils.py                  # Utility functions
│   │   ├── logger.py                 # Logging configuration
│   │   ├── settings.py               # Application settings
│   │   ├── exceptions.py             # Custom exceptions
│   │   ├── messages.py               # Message handling
│   │   ├── prompts.py                # AI model prompts
│   │   ├── image_caption_serializer.py
│   │   └── unordered_list_serializer.py
│   ├── requirements.txt              # Python dependencies
│   ├── Dockerfile                    # Docker container configuration
│   ├── docker-compose.yml            # Docker Compose setup
│   └── .env.example                  # Environment variables template
├── kallia-playground/                # Interactive demo application
│   ├── kallia-playground/
│   │   ├── __init__.py
│   │   ├── main.py                   # Chainlit application
│   │   ├── qa.py                     # Q&A functionality
│   │   ├── settings.py               # Playground settings
│   │   ├── constants.py              # Playground constants
│   │   └── chainlit.md               # Chainlit configuration
│   ├── requirements.txt              # Playground dependencies
│   ├── Dockerfile                    # Playground Docker config
│   ├── docker-compose.yml            # Playground Docker Compose
│   └── .env.example                  # Playground environment template
├── tests/                            # Test suite
│   ├── __init__.py
│   ├── test_pdf_to_markdown.py
│   └── test_markdown_to_chunks.py
├── assets/                           # Test assets
│   └── pdf/
│       └── 01.pdf                    # Sample PDF for testing
├── LICENSE                           # Apache 2.0 License
├── pyproject.toml                    # Project configuration
└── README.md                         # This file
```
## 🔧 Development
### Code Style
The project follows Python best practices and uses:
- FastAPI as the web framework
- Pydantic for data validation
- Structured logging
- Comprehensive error handling
### Testing
The project includes comprehensive tests for core functionality:
```bash
# Run tests
pytest tests/
# Run specific tests
pytest tests/test_pdf_to_markdown.py
pytest tests/test_markdown_to_chunks.py
```
Test coverage includes:
- PDF to markdown conversion
- Markdown to semantic chunks processing
- End-to-end document processing pipeline
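As a rough illustration (not the repository's actual test code), a test in this spirit calls the conversion API directly against the bundled sample PDF; note that it needs a configured model provider via the environment variables above:
```python
from kallia.documents import Documents


def test_pdf_to_markdown_returns_text():
    # Convert page 1 of the bundled sample document to markdown
    markdown_content = Documents.to_markdown(
        source="./assets/pdf/01.pdf",
        page_number=1,
        temperature=0.0,
        max_tokens=8192,
    )
    # The result should be non-empty markdown text
    assert isinstance(markdown_content, str)
    assert markdown_content.strip() != ""
```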
## 📦 Dependencies
### Core Dependencies
- **FastAPI**: Modern, fast web framework for building APIs
- **Docling**: Document processing and conversion library
### Full Dependency List
See `kallia/requirements.txt` for complete dependency specifications:
- `fastapi[standard]==0.116.1`
- `docling==2.41.0`
### Playground Dependencies
The interactive playground has additional dependencies listed in `kallia-playground/requirements.txt`:
- `chainlit`: For the interactive web interface
- `langchain`: For document processing and Q&A functionality
- `pdfminer`: For PDF metadata extraction
## 🚨 Error Handling
The API provides comprehensive error handling with appropriate HTTP status codes:
- **400 Bad Request**: Invalid parameters or unsupported file format
- **500 Internal Server Error**: Processing errors
- **503 Service Unavailable**: External service connectivity issues
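A client can branch on these status codes when calling the endpoint. The snippet below is a sketch: the exact error-response body is not documented here, so only the status codes are checked, and the document URL is illustrative:
```python
import requests

response = requests.post(
    "http://localhost:8000/documents",
    json={"url": "https://example.com/some-document.pdf"},
)

if response.status_code == 400:
    print("Bad request: invalid parameters or unsupported file format")
elif response.status_code == 500:
    print("Internal server error while processing the document")
elif response.status_code == 503:
    print("Service unavailable: check model provider connectivity")
else:
    response.raise_for_status()  # surface any other unexpected error
    print(response.json()["documents"])
```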
## 📄 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## 👨‍💻 Author
**CK**
- Email: ck@kallia.net
- GitHub: [@kallia-project](https://github.com/kallia-project/kallia)
## 🔗 Links
- [GitHub Repository](https://github.com/kallia-project/kallia)
- [Issues](https://github.com/kallia-project/kallia/issues)
- [PyPI Package](https://pypi.org/project/kallia/)
- [Docker Hub](https://hub.docker.com/r/overheatsystem/kallia)
## 📈 Version
Current version: **0.1.2**
---
Built with ❤️ for intelligent document processing