| Field | Value |
| --- | --- |
| Name | ceylon-rag |
| Version | 0.3.1 |
| home_page | None |
| Summary | None |
| upload_time | 2025-01-14 16:11:41 |
| maintainer | None |
| docs_url | None |
| author | dewma |
| requires_python | <4.0,>=3.12 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Ceylon AI RAG Framework
A powerful, modular, and extensible Retrieval-Augmented Generation (RAG) framework built with Python, supporting multiple LLM providers, embedders, and document types.
## 🌟 Features
- **Multiple Document Types**: Support for various document formats including:
  - Text files (with extensive format support)
  - PDF documents
  - Images (with OCR capabilities)
  - Source code files
- **Flexible Architecture**:
  - Modular component design
  - Pluggable LLM providers (OpenAI, Ollama)
  - Extensible embedding providers
  - Vector store integration (LanceDB)
- **Advanced RAG Capabilities**:
  - Intelligent document chunking
  - Context-aware searching
  - Query expansion and reranking
  - Metadata enrichment
  - Source attribution
- **Specialized RAG Implementations**:
  - `FolderRAG`: Process and analyze entire directory structures
  - `CodeAnalysisRAG`: Specialized for source code understanding
  - `SimpleRAG`: Basic RAG implementation for text data
  - Support for custom RAG implementations
## 🚀 Getting Started
### Installation
```bash
# Install via pip
pip install ceylon-rag
# Or install from source
git clone https://github.com/ceylonai/ceylon-rag.git
cd ceylon-rag
pip install -e .
```
### Basic Usage
Here's a simple example using the framework:
```python
import asyncio
import os

from dotenv import load_dotenv
from ceylon_rag import SimpleRAG


async def main():
    # Load environment variables from a local .env file, if present
    load_dotenv()

    # Configure the RAG system
    config = {
        "llm": {
            "type": "openai",
            "model_name": "gpt-4",
            "api_key": os.getenv("OPENAI_API_KEY")
        },
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small",
            "api_key": os.getenv("OPENAI_API_KEY")
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": "./data/lancedb",
            "table_name": "documents"
        }
    }

    # Initialize RAG
    rag = SimpleRAG(config)
    await rag.initialize()

    try:
        # Process your documents
        documents = await rag.process_documents("path/to/documents")

        # Query the system
        result = await rag.query("What are the main topics in these documents?")
        print(result.response)
    finally:
        # Always release resources, even if processing or querying fails
        await rag.close()


if __name__ == "__main__":
    asyncio.run(main())
```
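The example above passes `os.getenv("OPENAI_API_KEY")` straight into the config, so a missing key only surfaces later as `None`. One way to fail earlier is a small helper that builds the same dictionary and checks the key first. This is a minimal sketch: `build_config` is a hypothetical helper written for this README, not part of ceylon-rag, and it assumes only the config keys shown above.

```python
import os


def build_config(db_path: str = "./data/lancedb", table_name: str = "documents") -> dict:
    """Build a config dict shaped like the Basic Usage example.

    Hypothetical helper for illustration; not provided by ceylon-rag.
    """
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of passing None downstream.
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    return {
        "llm": {"type": "openai", "model_name": "gpt-4", "api_key": api_key},
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small",
            "api_key": api_key,
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": db_path,
            "table_name": table_name,
        },
    }
```

Call `load_dotenv()` before `build_config()` if the key lives in a `.env` file, then pass the result to `SimpleRAG` as in the example above.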
## 🏗️ Architecture
### Core Components
1. **Document Loaders**
   - `TextLoader`: Handles text-based files
   - `PDFLoader`: Processes PDF documents
   - `ImageLoader`: Handles images with OCR
   - Extensible base class for custom loaders
2. **Embedders**
   - OpenAI embeddings support
   - Ollama embeddings support
   - Modular design for adding new providers
3. **LLM Providers**
   - OpenAI integration
   - Ollama integration
   - Async interface for all providers
4. **Vector Store**
   - LanceDB integration
   - Efficient vector similarity search
   - Metadata storage and retrieval
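
The vector store entry above mentions similarity search alongside metadata. As a rough illustration of the idea only, and not of how LanceDB implements it, the sketch below ranks a few hand-made rows by cosine similarity in plain Python. The row layout (`vector`, `text`, `source`) and the function names are assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[dict], k: int = 2) -> list[dict]:
    """Return the k rows whose 'vector' is most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query, row["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Toy rows: each stores an embedding plus metadata used for source attribution.
rows = [
    {"vector": [1.0, 0.0], "text": "intro", "source": "a.md"},
    {"vector": [0.0, 1.0], "text": "api", "source": "b.md"},
    {"vector": [0.9, 0.1], "text": "overview", "source": "c.md"},
]
hits = top_k([1.0, 0.0], rows, k=2)
print([hit["source"] for hit in hits])  # ['a.md', 'c.md']
```

A real vector store avoids this brute-force scan by using an index, but the ranking criterion and the role of the stored metadata are the same.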
### Document Processing
The example below uses `CodeAnalysisRAG` to load a code repository, index it, and ask a question about its architecture:
```python
# Example: Processing a code repository
import asyncio

# Assumption: CodeAnalysisRAG is exported from the package root, like SimpleRAG
from ceylon_rag import CodeAnalysisRAG


async def analyze_codebase():
    config = {
        "llm": {
            "type": "openai",
            "model_name": "gpt-4"
        },
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small"
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": "./data/lancedb",
            "table_name": "code_documents"
        },
        "chunk_size": 1000,
        "chunk_overlap": 200
    }

    rag = CodeAnalysisRAG(config)
    await rag.initialize()

    # Load the source files, then index them in the vector store
    documents = await rag.process_codebase("./src")
    await rag.index_code(documents)

    # Ask a question about the indexed code
    result = await rag.analyze_code(
        "Explain the main architecture of this codebase"
    )
    print(result.response)


if __name__ == "__main__":
    asyncio.run(analyze_codebase())
```
## 🔧 Advanced Configuration
### File Exclusions
Configure file exclusions using patterns:
```python
config = {
    # ... other config options ...
    "excluded_dirs": [
        "venv",
        "node_modules",
        ".git",
        "__pycache__"
    ],
    "excluded_files": [
        ".env",
        "package-lock.json"
    ],
    "excluded_extensions": [
        ".pyc",
        ".pyo",
        ".pyd"
    ],
    "ignore_file": ".ragignore"  # Similar to .gitignore
}
```
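To make the effect of these three lists concrete, here is a minimal sketch of how such exclusions could be applied while walking a directory tree. It is an illustration under assumptions, not the framework's own traversal code: `iter_included_files` is a hypothetical name, entries are matched by exact name or suffix, and the `ignore_file` option is not handled.

```python
import os
from pathlib import Path


def iter_included_files(root, excluded_dirs, excluded_files, excluded_extensions):
    """Yield paths under root that survive the three exclusion lists."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in sorted(filenames):
            if name in excluded_files:
                continue
            if Path(name).suffix in excluded_extensions:
                continue
            yield Path(dirpath) / name
```

For example, `list(iter_included_files("./src", {"venv", ".git"}, {".env"}, {".pyc"}))` would return every file under `./src` except those inside `venv` or `.git`, files named `.env`, and compiled `.pyc` files.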
### Chunking Configuration
Customize document chunking:
```python
config = {
    # ... other config options ...
    "chunk_size": 1000,    # Characters per chunk
    "chunk_overlap": 200,  # Overlap between chunks
}
```
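To show how `chunk_size` and `chunk_overlap` interact, the sketch below splits a string into fixed-size character windows that share `chunk_overlap` characters with their predecessor. It is a plain sliding-window splitter written for illustration. The framework's own chunker may differ (for example by respecting sentence or code boundaries), and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters with the given overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the defaults shown in the config, each window advances by 800 characters, so a 2,500-character document yields three chunks of 1,000, 1,000, and 900 characters.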
## 🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests. For major changes, please open an issue first to discuss what you would like to change.
## 📄 License
[MIT License](LICENSE)
## 🙏 Acknowledgments
- OpenAI for GPT and embedding models
- Ollama for local LLM support
- LanceDB team for vector storage
- All contributors and users of the framework
---
## 📚 API Documentation
For detailed API documentation, please visit our [API Documentation](docs/api.md) page.
## 🔗 Links
- [GitHub Repository](https://github.com/ceylonai/ceylon-rag)
- [Issue Tracker](https://github.com/ceylonai/ceylon-rag/issues)
- [Documentation](https://ceylon-rag.readthedocs.io/)
Raw data
{
"_id": null,
"home_page": null,
"name": "ceylon-rag",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.12",
"maintainer_email": null,
"keywords": null,
"author": "dewma",
"author_email": "dewmal@syigen.com",
"download_url": "https://files.pythonhosted.org/packages/8b/67/98474bb3706533745aa052a87579ba867560034a862c0d33898f049d2d30/ceylon_rag-0.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": null,
"version": "0.3.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "66212eb499487ebaa8b46346af8728ebab13fe92c6c7638e97b28aef51198449",
"md5": "b2ac4aa847a2f58f617defcb657abe02",
"sha256": "f51c81568b9c57be1b9a9d62d908c1e4a44aadf26ea116d02165923e25cfd7cc"
},
"downloads": -1,
"filename": "ceylon_rag-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b2ac4aa847a2f58f617defcb657abe02",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.12",
"size": 20503,
"upload_time": "2025-01-14T16:11:39",
"upload_time_iso_8601": "2025-01-14T16:11:39.222094Z",
"url": "https://files.pythonhosted.org/packages/66/21/2eb499487ebaa8b46346af8728ebab13fe92c6c7638e97b28aef51198449/ceylon_rag-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8b6798474bb3706533745aa052a87579ba867560034a862c0d33898f049d2d30",
"md5": "5321a3ba1c60c19ca5f69cdc9e32e1f5",
"sha256": "f2512ed18080f31a278e6ab831996f2438c69f0ae002819fa4cd0927c05487ab"
},
"downloads": -1,
"filename": "ceylon_rag-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "5321a3ba1c60c19ca5f69cdc9e32e1f5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.12",
"size": 40603,
"upload_time": "2025-01-14T16:11:41",
"upload_time_iso_8601": "2025-01-14T16:11:41.635205Z",
"url": "https://files.pythonhosted.org/packages/8b/67/98474bb3706533745aa052a87579ba867560034a862c0d33898f049d2d30/ceylon_rag-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-14 16:11:41",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "ceylon-rag"
}