# Kallia
[GitHub](https://github.com/kallia-project/kallia)
[License](LICENSE)
[Python](https://python.org)
[Docker Hub](https://hub.docker.com/r/overheatsystem/kallia)
**Kallia** is a semantic document processing library that converts documents into intelligent semantic chunks. The library specializes in extracting meaningful content segments from documents while preserving context and semantic relationships.
## 🚀 Features
- **Document-to-Markdown Conversion**: Standardized processing pipeline for various document formats
- **Semantic Chunking**: Intelligent content segmentation that respects document structure and meaning
- **PDF Support**: Robust PDF processing with extensible architecture for additional formats
- **RESTful API**: FastAPI-based service with comprehensive error handling
- **Interactive Playground**: Chainlit-powered chat interface for document Q&A
- **Memory Management**: Long-term and short-term memory systems for conversational context
- **Configurable Processing**: Adjustable parameters (temperature, token limits, page selection)
- **Docker Support**: Containerized deployment for both core API and playground
## 📋 Requirements
- Python 3.11 or higher
- FastAPI 0.115.14
- Docling 2.41.0
## 🛠️ Installation
### Using pip
```bash
pip install kallia
```
### From Source
```bash
git clone https://github.com/kallia-project/kallia.git
cd kallia
pip install -e .
```
## 🏗️ Project Structure
```
kallia/
├── kallia/
│   ├── core/                     # Core API service
│   │   ├── kallia_core/          # Main library modules
│   │   │   ├── main.py           # FastAPI application
│   │   │   ├── documents.py      # Document processing
│   │   │   ├── chunker.py        # Semantic chunking
│   │   │   ├── memories.py       # Memory management
│   │   │   ├── models.py         # Data models
│   │   │   └── ...
│   │   ├── requirements.txt      # Core dependencies
│   │   ├── Dockerfile            # Core service container
│   │   └── docker-compose.yml    # Core service orchestration
│   └── playground/               # Interactive chat interface
│       ├── kallia_playground/    # Playground modules
│       │   ├── main.py           # Chainlit application
│       │   ├── qa.py             # Q&A functionality
│       │   └── ...
│       ├── requirements.txt      # Playground dependencies
│       ├── Dockerfile            # Playground container
│       └── docker-compose.yml    # Playground orchestration
├── tests/                        # Test suite
├── assets/                       # Sample documents
└── pyproject.toml                # Project configuration
```
## 🚀 Quick Start
### 1. Core API Service
Start the FastAPI service:
```bash
cd kallia/core
pip install -r requirements.txt
uvicorn kallia_core.main:app --reload
```
The API will be available at `http://localhost:8000`
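Because the service is built on FastAPI, interactive API documentation is normally also served at `http://localhost:8000/docs` (Swagger UI) and `http://localhost:8000/redoc`, unless the application disables it.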
#### API Endpoints
**Process Documents**
```bash
POST /documents
```
Request body:
```json
{
  "url": "path/to/document.pdf",
  "page_number": 1,
  "temperature": 0.7,
  "max_tokens": 4000
}
```
**Create Memories**
```bash
POST /memories
```
Request body:
```json
{
  "messages": [
    { "role": "user", "content": "Hello" },
    { "role": "assistant", "content": "Hi there!" }
  ],
  "temperature": 0.7,
  "max_tokens": 4000
}
```
### 2. Interactive Playground
Start the Chainlit chat interface:
```bash
cd kallia/playground
pip install -r requirements.txt
chainlit run kallia_playground/main.py
```
The playground will be available at `http://localhost:8000` (Chainlit's default port). If the core API is already running on port 8000, start the playground on another port, for example `chainlit run kallia_playground/main.py --port 8001`.
### 3. Docker Deployment
**Core Service**
```bash
cd kallia/core
docker-compose up -d
```
**Playground**
```bash
cd kallia/playground
docker-compose up -d
```
## 💡 Usage Examples
### Python API
```python
from kallia_core.documents import Documents
from kallia_core.chunker import Chunker
from kallia_core.memories import Memories

# Convert document to markdown
markdown_content = Documents.to_markdown(
    source="document.pdf",
    page_number=1,
    temperature=0.7,
    max_tokens=4000
)

# Create semantic chunks
chunks = Chunker.create(
    text=markdown_content,
    temperature=0.7,
    max_tokens=4000
)

# Generate memories from conversation
messages = [
    {"role": "user", "content": "What is this document about?"},
    {"role": "assistant", "content": "This document discusses..."}
]
memories = Memories.create(messages)
```
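Building on the same calls, the sketch below chunks several pages of one document in a loop. It reuses only the signatures shown above; the page range and the assumption that `Chunker.create` returns a list of chunks are illustrative, not guarantees of the library.

```python
from kallia_core.documents import Documents
from kallia_core.chunker import Chunker

# Illustrative sketch: chunk the first three pages of a document by
# reusing Documents.to_markdown and Chunker.create exactly as above.
# The page range and list-based result handling are assumptions.
all_chunks = []
for page_number in range(1, 4):
    markdown_content = Documents.to_markdown(
        source="document.pdf",
        page_number=page_number,
        temperature=0.7,
        max_tokens=4000,
    )
    all_chunks.extend(
        Chunker.create(
            text=markdown_content,
            temperature=0.7,
            max_tokens=4000,
        )
    )

print(f"Collected {len(all_chunks)} chunks from 3 pages")
```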
### REST API
```bash
# Process a document
curl -X POST "http://localhost:8000/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.3/assets/pdf/01.pdf",
    "page_number": 1,
    "temperature": 0.7,
    "max_tokens": 4000
  }'

# Create memories
curl -X POST "http://localhost:8000/memories" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello"},
      {"role": "assistant", "content": "Hi there!"}
    ],
    "temperature": 0.7,
    "max_tokens": 4000
  }'
```
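The same endpoints can also be called from Python. Below is a minimal sketch using the `requests` package (installed separately; it is not a Kallia dependency), assuming the core API is running locally as described in the Quick Start and that the endpoints return JSON bodies.

```python
import requests

BASE_URL = "http://localhost:8000"  # local core API from the Quick Start

# Process a single page of a document
document_payload = {
    "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.3/assets/pdf/01.pdf",
    "page_number": 1,
    "temperature": 0.7,
    "max_tokens": 4000,
}
response = requests.post(f"{BASE_URL}/documents", json=document_payload)
response.raise_for_status()
print(response.json())

# Create memories from a short conversation
memories_payload = {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ],
    "temperature": 0.7,
    "max_tokens": 4000,
}
response = requests.post(f"{BASE_URL}/memories", json=memories_payload)
response.raise_for_status()
print(response.json())
```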
## 🧪 Testing
Run the test suite:
```bash
python -m pytest tests/
```
Available tests:
- `test_pdf_to_markdown.py` - Document conversion tests
- `test_markdown_to_chunks.py` - Chunking functionality tests
- `test_histories_to_memories.py` - Memory creation tests
## 🔧 Configuration
### Environment Variables
Create a `.env` file based on the provided `.env.example` template in each directory:
**Core Service**:
```bash
cd kallia/core
cp .env.example .env
# Edit .env with your configuration
```
**Playground**:
```bash
cd kallia/playground
cp .env.example .env
# Edit .env with your configuration
```
### Supported File Formats
Currently supported:
- PDF documents
The architecture is designed to be extensible for additional formats.
## 📝 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## 🔗 Links
- **Homepage**: [https://github.com/kallia-project/kallia](https://github.com/kallia-project/kallia)
- **Docker Hub**: [https://hub.docker.com/r/overheatsystem/kallia](https://hub.docker.com/r/overheatsystem/kallia)
- **Issues**: [https://github.com/kallia-project/kallia/issues](https://github.com/kallia-project/kallia/issues)
- **Documentation**: Coming soon
## 👨‍💻 Author
**CK** - [ck@kallia.net](mailto:ck@kallia.net)
## 🏷️ Keywords
- document-processing
- semantic-chunking
- document-analysis
- text-processing
- machine-learning
- fastapi
- chainlit
- pdf-processing
- nlp
- ai
---
Built with ❤️ for intelligent document processing