# 🚀 RocketRAG
**Fast, efficient, minimal, extensible, and elegant RAG system**
RocketRAG is a high-performance Retrieval-Augmented Generation (RAG) system designed with a focus on speed, simplicity, and extensibility. Built on top of state-of-the-art libraries, it provides both CLI and web server capabilities for seamless integration into any workflow.
## 🎯 Mission
RocketRAG aims to be the **fastest and most efficient RAG library** while maintaining:
- **Minimal footprint** - Clean, lightweight codebase
- **Maximum extensibility** - Pluggable architecture for all components
- **Peak performance** - Leveraging the best-in-class libraries
- **Ease of use** - Simple CLI and API interfaces
## ⚡ Performance-First Architecture
RocketRAG is built on top of cutting-edge, performance-optimized libraries:
- **[Chonkie](https://github.com/bhavnicksm/chonkie)** - Ultra-fast semantic chunking with model2vec
- **[Kreuzberg](https://github.com/mixedbread-ai/kreuzberg)** - Lightning-fast document loading and processing
- **[llama-cpp-python](https://github.com/abetlen/llama-cpp-python)** - Optimized LLM inference with GGUF support
- **[Milvus Lite](https://github.com/milvus-io/milvus-lite)** - High-performance vector database
- **[Sentence Transformers](https://github.com/UKPLab/sentence-transformers)** - State-of-the-art embeddings
## 🚀 Quick Start
### Installation
#### Using pip
```bash
pip install rocketrag
```
#### Using uvx (recommended for CLI usage)
```bash
# Run directly without installation
uvx rocketrag --help
# Or install globally
uv tool install rocketrag
```
### Basic Usage
```python
from rocketrag import RocketRAG
rag = RocketRAG("./data")  # Path to your data (supports PDF, TXT, MD, etc.)
rag.prepare() # Construct vector database
# Ask questions
answer, sources = rag.ask("What is the main topic of the documents?")
print(answer)
```
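`ask` returns the retrieved source chunks alongside the answer. The exact shape of each source entry is not documented here, so the sketch below simply prints whatever comes back; inspect the output to see the fields available in your version.

```python
# `sources` comes from the rag.ask(...) call above.
# The per-entry structure is an assumption: print to inspect it.
for source in sources:
    print(source)
```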
### CLI Usage
```bash
# Prepare documents from a directory
rocketrag prepare --data-dir ./documents
# Ask questions via CLI
rocketrag ask "What are the key findings?"
# Start web server
rocketrag server --port 8000
```
#### Using uvx (no installation required)
```bash
# Same commands work with uvx
uvx rocketrag prepare --data-dir ./documents
uvx rocketrag ask "What are the key findings?"
uvx rocketrag server --port 8000
# Run as module
uvx --from rocketrag python -m rocketrag --help
```
## 🏗️ Architecture
RocketRAG follows a modular, plugin-based architecture:
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Document      │    │    Chunking     │    │  Vectorization  │
│   Loaders       │───▶│   (Chonkie)     │───▶│ (SentenceTransf)│
│  (Kreuzberg)    │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐            │
│      LLM        │    │   Vector DB     │◀───────────┘
│ (llama-cpp-py)  │◀───│  (Milvus Lite)  │
│                 │    │                 │
└─────────────────┘    └─────────────────┘
```
### Core Components
- **BaseLoader**: Pluggable document loading (PDF, TXT, MD, etc.)
- **BaseChunker**: Configurable chunking strategies (semantic, recursive, etc.)
- **BaseVectorizer**: Flexible embedding models
- **BaseLLM**: Swappable language models
- **MilvusLiteDB**: High-performance vector storage and retrieval
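Each base class can be subclassed to plug in your own implementation. The sketch below illustrates the general idea with a hypothetical plain-text loader; the actual `BaseLoader` method names and return types are assumptions here, so check the `rocketrag.loaders` source for the real interface before copying this.

```python
from pathlib import Path

from rocketrag.loaders import BaseLoader  # import path is an assumption


class PlainTextLoader(BaseLoader):
    """Hypothetical loader that reads .txt files verbatim."""

    def load(self, path: str) -> list[tuple[str, dict]]:
        # Assumed contract: return (text, metadata) pairs, one per document.
        docs = []
        for file in Path(path).glob("*.txt"):
            docs.append((file.read_text(encoding="utf-8"), {"source": file.name}))
        return docs
```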
## 🔧 Configuration
### Custom Components
```python
from rocketrag import RocketRAG
from rocketrag.vectors import SentenceTransformersVectorizer
from rocketrag.chonk import ChonkieChunker
from rocketrag.llm import LLamaLLM
from rocketrag.loaders import KreuzbergLoader
# Configure high-performance components
vectorizer = SentenceTransformersVectorizer(
model_name="minishlab/potion-multilingual-128M" # Fast multilingual model
)
chunker = ChonkieChunker(
method="semantic", # Semantic chunking for better context
embedding_model="minishlab/potion-multilingual-128M",
chunk_size=512
)
llm = LLamaLLM(
repo_id="unsloth/gemma-3n-E2B-it-GGUF",
filename="*Q8_0.gguf" # Quantized for speed
)
loader = KreuzbergLoader() # Ultra-fast document processing
rag = RocketRAG(
vectorizer=vectorizer,
chunker=chunker,
llm=llm,
loader=loader
)
```
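With the components wired in, usage matches the Quick Start (assuming a data directory is also passed to `RocketRAG`, as shown there):

```python
rag.prepare()  # chunk, embed, and index using the components above
answer, sources = rag.ask("Summarize the documents.")
print(answer)
```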
### CLI Configuration
```bash
# Custom chunking strategy
rocketrag prepare \
--chonker chonkie \
--chonker-args '{"method": "semantic", "chunk_size": 512}' \
--vectorizer-args '{"model_name": "all-MiniLM-L6-v2"}'
# Custom LLM for inference
rocketrag ask "Your question" \
--repo-id "microsoft/DialoGPT-medium" \
--filename "*.gguf"
```
## 🌐 Web Server
RocketRAG includes a FastAPI-based web server with a REST API and an interactive web UI:
```bash
# Start server
rocketrag server --port 8000 --host 0.0.0.0
```
### API Endpoints
- `GET /` - Interactive web interface
- `POST /ask` - Question answering
- `POST /ask/stream` - Streaming responses
- `GET /chat` - Chat interface
- `GET /browse` - Document browser
- `GET /visualize` - Vector visualization
- `GET /health` - Health check
### Example API Usage
```python
import requests
response = requests.post(
"http://localhost:8000/ask",
json={"question": "What are the main findings?"}
)
result = response.json()
print(result["answer"])
print(result["sources"])
```
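The streaming endpoint can be consumed with `requests`' streaming mode. The payload shape and chunk encoding below are assumptions modeled on the `/ask` example above; adjust them to match what your server actually emits.

```python
import requests

# Assumes /ask/stream accepts the same JSON body as /ask and
# streams the answer back as plain text chunks.
with requests.post(
    "http://localhost:8000/ask/stream",
    json={"question": "What are the main findings?"},
    stream=True,
) as response:
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```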
## 🎨 Features
### Core Features
- ⚡ **Ultra-fast document processing** with Kreuzberg
- 🧠 **Semantic chunking** with Chonkie and model2vec
- 🔍 **High-performance vector search** with Milvus Lite
- 🤖 **Optimized LLM inference** with llama-cpp-python
- 📊 **Rich CLI interface** with progress bars and formatting
- 🌐 **Web server** with interactive UI
- 🔌 **Pluggable architecture** for easy customization
### Advanced Features
- 📈 **Vector visualization** for debugging and analysis
- 📚 **Document browsing** interface
- 💬 **Streaming responses** for real-time interaction
- 🔄 **Batch processing** for large document sets
- 📝 **Metadata preservation** throughout the pipeline
- 🎯 **Context-aware chunking** for better retrieval
## 🛠️ Development
### Installation for Development
```bash
git clone https://github.com/TheLion-ai/RocketRAG.git
cd RocketRAG
pip install -e ".[dev]"
```
### Running Tests
```bash
pytest tests/
```
### Code Quality
```bash
ruff check .
ruff format .
```
## 📊 Performance
RocketRAG is designed for speed:
- **Document Loading**: 10x faster with Kreuzberg's optimized parsers
- **Chunking**: Semantic chunking with model2vec for superior context preservation
- **Vectorization**: Optimized batch processing with sentence-transformers
- **Retrieval**: Sub-millisecond vector search with Milvus Lite
- **Generation**: GGUF quantization for 4x faster inference
## 🤝 Contributing
We welcome contributions! RocketRAG's modular architecture makes it easy to:
- Add new document loaders
- Implement custom chunking strategies
- Integrate different embedding models
- Support additional LLM backends
- Enhance the web interface
## 🙏 Acknowledgments
RocketRAG builds upon the excellent work of:
- [Chonkie](https://github.com/bhavnicksm/chonkie) for semantic chunking
- [Kreuzberg](https://github.com/mixedbread-ai/kreuzberg) for document processing
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) for LLM inference
- [Milvus Lite](https://github.com/milvus-io/milvus-lite) for vector storage
- [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) for embeddings