kallia

| Field | Value |
| --- | --- |
| Name | kallia |
| Version | 0.1.3 |
| Summary | Semantic Document Processing Library |
| Upload time | 2025-07-20 07:41:41 |
| Requires Python | >=3.11 |
| Keywords | document-processing, semantic-chunking, document-analysis, text-processing, machine-learning |

# Kallia

[![Version](https://img.shields.io/badge/version-0.1.3-blue.svg)](https://github.com/kallia-project/kallia)
[![License](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://python.org)
[![Docker](https://img.shields.io/badge/docker-overheatsystem%2Fkallia-blue.svg)](https://hub.docker.com/r/overheatsystem/kallia)

**Kallia** is a semantic document processing library that converts documents into intelligent semantic chunks. The library specializes in extracting meaningful content segments from documents while preserving context and semantic relationships.

## 🚀 Features

- **Document-to-Markdown Conversion**: Standardized processing pipeline for various document formats
- **Semantic Chunking**: Intelligent content segmentation that respects document structure and meaning
- **PDF Support**: Robust PDF processing with extensible architecture for additional formats
- **RESTful API**: FastAPI-based service with comprehensive error handling
- **Interactive Playground**: Chainlit-powered chat interface for document Q&A
- **Memory Management**: Long-term and short-term memory systems for conversational context
- **Configurable Processing**: Adjustable parameters (temperature, token limits, page selection)
- **Docker Support**: Containerized deployment for both core API and playground

## 📋 Requirements

- Python 3.11 or higher
- FastAPI 0.115.14
- Docling 2.41.0

## 🛠️ Installation

### Using pip

```bash
pip install kallia
```

### From Source

```bash
git clone https://github.com/kallia-project/kallia.git
cd kallia
pip install -e .
```

## 🏗️ Project Structure

```
kallia/
├── kallia/
│   ├── core/                    # Core API service
│   │   ├── kallia_core/         # Main library modules
│   │   │   ├── main.py          # FastAPI application
│   │   │   ├── documents.py     # Document processing
│   │   │   ├── chunker.py       # Semantic chunking
│   │   │   ├── memories.py      # Memory management
│   │   │   ├── models.py        # Data models
│   │   │   └── ...
│   │   ├── requirements.txt     # Core dependencies
│   │   ├── Dockerfile           # Core service container
│   │   └── docker-compose.yml   # Core service orchestration
│   └── playground/              # Interactive chat interface
│       ├── kallia_playground/   # Playground modules
│       │   ├── main.py          # Chainlit application
│       │   ├── qa.py            # Q&A functionality
│       │   └── ...
│       ├── requirements.txt     # Playground dependencies
│       ├── Dockerfile           # Playground container
│       └── docker-compose.yml   # Playground orchestration
├── tests/                       # Test suite
├── assets/                      # Sample documents
└── pyproject.toml               # Project configuration
```

## 🚀 Quick Start

### 1. Core API Service

Start the FastAPI service:

```bash
cd kallia/core
pip install -r requirements.txt
uvicorn kallia_core.main:app --reload
```

The API will be available at `http://localhost:8000`

#### API Endpoints

**Process Documents**

```
POST /documents
```

Request body:

```json
{
  "url": "path/to/document.pdf",
  "page_number": 1,
  "temperature": 0.7,
  "max_tokens": 4000
}
```

**Create Memories**

```
POST /memories
```

Request body:

```json
{
  "messages": [
    { "role": "user", "content": "Hello" },
    { "role": "assistant", "content": "Hi there!" }
  ],
  "temperature": 0.7,
  "max_tokens": 4000
}
```

### 2. Interactive Playground

Start the Chainlit chat interface:

```bash
cd kallia/playground
pip install -r requirements.txt
chainlit run kallia_playground/main.py
```

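The playground will be available at `http://localhost:8000` by default, which is the same port as the core API. To run both at the same time, start one of them on a different port, for example `chainlit run kallia_playground/main.py --port 8001`.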

### 3. Docker Deployment

**Core Service**

```bash
cd kallia/core
docker-compose up -d
```

**Playground**

```bash
cd kallia/playground
docker-compose up -d
```

## 💡 Usage Examples

### Python API

```python
from kallia_core.documents import Documents
from kallia_core.chunker import Chunker
from kallia_core.memories import Memories

# Convert document to markdown
markdown_content = Documents.to_markdown(
    source="document.pdf",
    page_number=1,
    temperature=0.7,
    max_tokens=4000
)

# Create semantic chunks
chunks = Chunker.create(
    text=markdown_content,
    temperature=0.7,
    max_tokens=4000
)

# Generate memories from conversation
messages = [
    {"role": "user", "content": "What is this document about?"},
    {"role": "assistant", "content": "This document discusses..."}
]
memories = Memories.create(messages)
```
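
Because `page_number` selects a single page, processing several pages presumably means repeating the two calls above once per page. The sketch below shows one way to do that. `chunk_pages` is a hypothetical helper, not part of Kallia; it receives the conversion and chunking functions as arguments, so it makes no assumption about what they return, and the keyword names are the ones used in the example above.

```python
from typing import Any, Callable, Iterable


def chunk_pages(
    to_markdown: Callable[..., Any],
    create_chunks: Callable[..., Any],
    source: str,
    page_numbers: Iterable[int],
    temperature: float = 0.7,
    max_tokens: int = 4000,
) -> dict[int, Any]:
    """Convert each page to markdown, chunk it, and collect results by page."""
    results: dict[int, Any] = {}
    for page_number in page_numbers:
        # Step 1: page -> markdown (same keywords as Documents.to_markdown above).
        markdown = to_markdown(
            source=source,
            page_number=page_number,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        # Step 2: markdown -> chunks (same keywords as Chunker.create above).
        results[page_number] = create_chunks(
            text=markdown,
            temperature=temperature,
            max_tokens=max_tokens,
        )
    return results


# With Kallia installed, the call would presumably look like this
# (assumption: pages 1 to 3 exist in the document):
# chunks_by_page = chunk_pages(
#     Documents.to_markdown, Chunker.create, "document.pdf", [1, 2, 3]
# )
```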

### REST API

```bash
# Process a document
curl -X POST "http://localhost:8000/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://raw.githubusercontent.com/kallia-project/kallia/refs/tags/v0.1.3/assets/pdf/01.pdf",
    "page_number": 1,
    "temperature": 0.7,
    "max_tokens": 4000
  }'

# Create memories
curl -X POST "http://localhost:8000/memories" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello"},
      {"role": "assistant", "content": "Hi there!"}
    ],
    "temperature": 0.7,
    "max_tokens": 4000
  }'
```
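
The same calls can be made from Python without extra dependencies. This is a minimal sketch using `urllib.request` from the standard library. The helper names `build_request` and `post_json` are hypothetical, and decoding the response as JSON is an assumption, since the response schema is not documented here.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl calls above."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 60.0):
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage, with the core service running locally:
# result = post_json(
#     "http://localhost:8000",
#     "/memories",
#     {
#         "messages": [
#             {"role": "user", "content": "Hello"},
#             {"role": "assistant", "content": "Hi there!"},
#         ],
#         "temperature": 0.7,
#         "max_tokens": 4000,
#     },
# )
```

Splitting request construction from sending keeps the first half easy to check without a running server.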

## 🧪 Testing

Run the test suite:

```bash
python -m pytest tests/
```

Available tests:

- `test_pdf_to_markdown.py` - Document conversion tests
- `test_markdown_to_chunks.py` - Chunking functionality tests
- `test_histories_to_memories.py` - Memory creation tests

## 🔧 Configuration

### Environment Variables

Create a `.env` file based on the provided `.env.example` template in each directory:

**Core Service**:

```bash
cd kallia/core
cp .env.example .env
# Edit .env with your configuration
```

**Playground**:

```bash
cd kallia/playground
cp .env.example .env
# Edit .env with your configuration
```
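
After copying the template, it can help to confirm that the `.env` file still defines every variable listed in `.env.example`. The sketch below does this with the standard library only. The helper names are hypothetical, the parser handles only simple `KEY=VALUE` lines, and the variable names Kallia actually expects are not listed in this README, so none are assumed here.

```python
from pathlib import Path


def read_env_keys(path: Path) -> set[str]:
    """Return the variable names defined in a KEY=VALUE style env file."""
    keys: set[str] = set()
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate an optional shell-style "export " prefix.
        keys.add(name.removeprefix("export ").strip())
    return keys


def missing_env_keys(example: Path, actual: Path) -> set[str]:
    """Names present in the template but absent from the real env file."""
    return read_env_keys(example) - read_env_keys(actual)


# Example usage from kallia/core or kallia/playground:
# print(missing_env_keys(Path(".env.example"), Path(".env")))
```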

### Supported File Formats

Currently supported:

- PDF documents

The architecture is designed to be extensible for additional formats.

## 📝 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **Homepage**: [https://github.com/kallia-project/kallia](https://github.com/kallia-project/kallia)
- **Docker Hub**: [https://hub.docker.com/r/overheatsystem/kallia](https://hub.docker.com/r/overheatsystem/kallia)
- **Issues**: [https://github.com/kallia-project/kallia/issues](https://github.com/kallia-project/kallia/issues)
- **Documentation**: Coming soon

## 👨‍💻 Author

**CK** - [ck@kallia.net](mailto:ck@kallia.net)

## 🏷️ Keywords

- document-processing
- semantic-chunking
- document-analysis
- text-processing
- machine-learning
- fastapi
- chainlit
- pdf-processing
- nlp
- ai

---

Built with ❤️ for intelligent document processing

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kallia",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "document-processing, semantic-chunking, document-analysis, text-processing, machine-learning",
    "author": null,
    "author_email": "CK <ck@kallia.net>",
    "download_url": "https://files.pythonhosted.org/packages/65/d7/f23f9a667ac210c6dc42758b41a1a3ef37dbf148309c2c3ff63cc0051979/kallia-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Semantic Document Processing Library",
    "version": "0.1.3",
    "project_urls": {
        "Homepage": "https://github.com/kallia-project/kallia",
        "Issues": "https://github.com/kallia-project/kallia/issues"
    },
    "split_keywords": [
        "document-processing",
        " semantic-chunking",
        " document-analysis",
        " text-processing",
        " machine-learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1b63a6e943000a1d2c368ac41bf0f96840ffba03989d64c93f053cfa4986b0fd",
                "md5": "53233d35129e0411891333578222444a",
                "sha256": "6ea11d63aa82a6df01122cb94ab13eaf89865d5c47c2773521b2cba6d5f9877b"
            },
            "downloads": -1,
            "filename": "kallia-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "53233d35129e0411891333578222444a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 17701,
            "upload_time": "2025-07-20T07:41:40",
            "upload_time_iso_8601": "2025-07-20T07:41:40.270479Z",
            "url": "https://files.pythonhosted.org/packages/1b/63/a6e943000a1d2c368ac41bf0f96840ffba03989d64c93f053cfa4986b0fd/kallia-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "65d7f23f9a667ac210c6dc42758b41a1a3ef37dbf148309c2c3ff63cc0051979",
                "md5": "f3601920a99ddbaf06c0f1b866b94160",
                "sha256": "7e2193680ca072395644a8d4ce5f3d5f3019030f5be897d9d9746a353268e47e"
            },
            "downloads": -1,
            "filename": "kallia-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "f3601920a99ddbaf06c0f1b866b94160",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 17124,
            "upload_time": "2025-07-20T07:41:41",
            "upload_time_iso_8601": "2025-07-20T07:41:41.531546Z",
            "url": "https://files.pythonhosted.org/packages/65/d7/f23f9a667ac210c6dc42758b41a1a3ef37dbf148309c2c3ff63cc0051979/kallia-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-20 07:41:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kallia-project",
    "github_project": "kallia",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "kallia"
}