# ragpackai 📦
**Portable Retrieval-Augmented Generation Library**
ragpackai is a Python library for creating, saving, loading, and querying portable RAG (Retrieval-Augmented Generation) packs. It allows you to bundle documents, embeddings, vectorstores, and configuration into a single `.rag` file that can be easily shared and deployed across different environments.
## ✨ Features
- 🚀 **Portable RAG Packs**: Bundle everything into a single `.rag` file
- 🔄 **Provider Flexibility**: Support for OpenAI, Google, Groq, Cerebras, and HuggingFace
- 🔒 **Encryption Support**: Optional AES-GCM encryption for sensitive data
- 🎯 **Runtime Overrides**: Change embedding/LLM providers without rebuilding
- 📚 **Multiple Formats**: Support for PDF, TXT, MD, and more
- 🛠️ **CLI Tools**: Command-line interface for easy pack management
- 🔧 **Lazy Loading**: Efficient dependency management with lazy imports
## 🚀 Quick Start
### Installation
```bash
# Core installation
pip install ragpackai
# With optional providers
pip install ragpackai[google] # Google Vertex AI
pip install ragpackai[groq] # Groq
pip install ragpackai[cerebras] # Cerebras
pip install ragpackai[all] # All providers
```
### Basic Usage
```python
from ragpackai import ragpackai
# Create a pack from documents
pack = ragpackai.from_files([
    "docs/manual.pdf",
    "notes.txt",
    "knowledge_base/"
])
# Save the pack
pack.save("my_knowledge.rag")
# Load and query
pack = ragpackai.load("my_knowledge.rag")
# Simple retrieval (no LLM)
results = pack.query("How do I install this?", top_k=3)
print(results)
# Question answering with LLM
answer = pack.ask("What are the main features?")
print(answer)
```
### Provider Overrides
```python
# Load with different providers
pack = ragpackai.load(
    "my_knowledge.rag",
    embedding_config={
        "provider": "google",
        "model_name": "textembedding-gecko"
    },
    llm_config={
        "provider": "groq",
        "model_name": "mixtral-8x7b-32768"
    }
)
answer = pack.ask("Explain the architecture")
```
## 🛠️ Command Line Interface
### Create a RAG Pack
```bash
# From files and directories
ragpackai create docs/ notes.txt --output knowledge.rag
# With custom settings
ragpackai create docs/ \
    --embedding-provider openai \
    --embedding-model text-embedding-3-large \
    --chunk-size 1024 \
    --encrypt-key mypassword
```
### Query and Ask
```bash
# Simple retrieval
ragpackai query knowledge.rag "How to install?"
# Question answering
ragpackai ask knowledge.rag "What are the requirements?" \
--llm-provider openai \
--llm-model gpt-4o
# With provider overrides
ragpackai ask knowledge.rag "Explain the API" \
--embedding-provider google \
--embedding-model textembedding-gecko \
--llm-provider groq \
--llm-model mixtral-8x7b-32768
```
### Pack Information
```bash
ragpackai info knowledge.rag
```
## 🏗️ Architecture
### .rag File Structure
A `.rag` file is a structured zip archive:
```
mypack.rag
├── metadata.json          # Pack metadata
├── config.json            # Default configurations
├── documents/             # Original documents
│   ├── doc1.txt
│   └── doc2.pdf
└── vectorstore/           # Chroma vectorstore
    ├── chroma.sqlite3
    └── ...
```
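Since an unencrypted `.rag` pack is an ordinary zip archive, its contents can be inspected with Python's standard `zipfile` module. A minimal sketch, assuming an unencrypted pack named `mypack.rag`:
```python
import json
import zipfile

# List every entry inside the pack (metadata.json, documents/, vectorstore/, ...)
with zipfile.ZipFile("mypack.rag") as archive:
    for name in archive.namelist():
        print(name)

    # Read the pack metadata described in the layout above
    metadata = json.loads(archive.read("metadata.json"))
    print(metadata)
```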
### Supported Providers
**Embedding Providers:**
- `openai`: text-embedding-3-small, text-embedding-3-large
- `huggingface`: all-MiniLM-L6-v2, all-mpnet-base-v2 (offline)
- `google`: textembedding-gecko
**LLM Providers:**
- `openai`: gpt-4o, gpt-4o-mini, gpt-3.5-turbo
- `google`: gemini-pro, gemini-1.5-flash
- `groq`: mixtral-8x7b-32768, llama2-70b-4096
- `cerebras`: llama3.1-8b, llama3.1-70b
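The provider identifiers above are the same strings used in `embed_model` (as `"provider:model"`) and in the `embedding_config`/`llm_config` dictionaries. A short sketch tying the two forms together (model choices here are illustrative):
```python
from ragpackai import ragpackai

# "provider:model" string form at creation time
pack = ragpackai.from_files(
    ["docs/"],
    embed_model="huggingface:all-MiniLM-L6-v2",  # offline-capable embeddings
)
pack.save("knowledge.rag")

# Dictionary form as a load-time override
pack = ragpackai.load(
    "knowledge.rag",
    llm_config={"provider": "cerebras", "model_name": "llama3.1-8b"},
)
```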
## 📖 API Reference
### ragpackai Class
#### `ragpackai.from_files(files, embed_model="openai:text-embedding-3-small", **kwargs)`
Create a RAG pack from files.
**Parameters:**
- `files`: List of file paths or directories
- `embed_model`: Embedding model in format "provider:model"
- `chunk_size`: Text chunk size (default: 512)
- `chunk_overlap`: Chunk overlap (default: 50)
- `name`: Pack name
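A minimal creation sketch using these parameters (paths and values are illustrative):
```python
from ragpackai import ragpackai

# Larger chunks with more overlap than the defaults (512 / 50)
pack = ragpackai.from_files(
    ["docs/", "notes.txt"],
    embed_model="openai:text-embedding-3-small",
    chunk_size=1024,
    chunk_overlap=100,
    name="product-docs",
)
pack.save("product-docs.rag")
```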
#### `ragpackai.load(path, embedding_config=None, llm_config=None, **kwargs)`
Load a RAG pack from file.
**Parameters:**
- `path`: Path to .rag file
- `embedding_config`: Override embedding configuration
- `llm_config`: Override LLM configuration
- `reindex_on_mismatch`: Rebuild vectorstore if dimensions mismatch
- `decrypt_key`: Decryption password
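A minimal loading sketch combining an embedding override with the safety options (file name and password are illustrative):
```python
from ragpackai import ragpackai

# Re-embed the stored documents if the new model's vector size differs
pack = ragpackai.load(
    "my_knowledge.rag",
    embedding_config={"provider": "openai", "model_name": "text-embedding-3-large"},
    reindex_on_mismatch=True,
    decrypt_key="strong-password",  # only needed for encrypted packs
)
```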
#### `pack.save(path, encrypt_key=None)`
Save pack to .rag file.
#### `pack.query(question, top_k=3)`
Retrieve relevant chunks (no LLM).
#### `pack.ask(question, top_k=4, temperature=0.0)`
Ask a question and generate an answer with the LLM.
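The difference between the two is whether an LLM is invoked on top of retrieval; a short sketch, assuming `pack` is already loaded:
```python
# Retrieval only: returns the top-matching chunks without calling an LLM
chunks = pack.query("How do I enable encryption?", top_k=5)

# Retrieval plus generation: retrieved chunks are passed to the configured LLM
answer = pack.ask("How do I enable encryption?", top_k=4, temperature=0.0)
print(answer)
```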
### Provider Wrappers
```python
# Direct provider access
from ragpackai.embeddings import OpenAI, HuggingFace, Google
from ragpackai.llms import OpenAIChat, GoogleChat, GroqChat
# Create embedding provider
embeddings = OpenAI(model_name="text-embedding-3-large")
vectors = embeddings.embed_documents(["Hello world"])
# Create LLM provider
llm = OpenAIChat(model_name="gpt-4o", temperature=0.7)
response = llm.invoke("What is AI?")
```
## 🔧 Configuration
### Environment Variables
```bash
# API Keys
export OPENAI_API_KEY="your-key"
export GOOGLE_CLOUD_PROJECT="your-project"
export GROQ_API_KEY="your-key"
export CEREBRAS_API_KEY="your-key"
# Optional
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
```
### Configuration Files
```python
# Custom embedding config
embedding_config = {
    "provider": "huggingface",
    "model_name": "all-mpnet-base-v2",
    "device": "cuda"  # Use GPU
}
# Custom LLM config
llm_config = {
    "provider": "openai",
    "model_name": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 2000
}
```
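These dictionaries plug straight into the load-time overrides shown earlier; for example:
```python
from ragpackai import ragpackai

# Apply the configs defined above as runtime overrides
pack = ragpackai.load(
    "my_knowledge.rag",
    embedding_config=embedding_config,
    llm_config=llm_config,
)
print(pack.ask("Summarize the configuration options"))
```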
## 🔒 Security
### Encryption
ragpackai supports AES-GCM encryption for sensitive data:
```python
# Save with encryption
pack.save("sensitive.rag", encrypt_key="strong-password")
# Load encrypted pack
pack = ragpackai.load("sensitive.rag", decrypt_key="strong-password")
```
### Best Practices
- Use strong passwords for encryption
- Store API keys securely in environment variables
- Validate .rag files before loading in production (see the sketch below)
- Consider network security when sharing packs
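A minimal validation sketch, assuming an unencrypted pack; it only checks that the file is a well-formed zip archive with the expected top-level entries and is not part of the ragpackai API:
```python
import zipfile

def looks_like_rag_pack(path: str) -> bool:
    """Cheap sanity check before passing a file to ragpackai.load()."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        # Expect the layout described in the Architecture section
        return "metadata.json" in names and "config.json" in names

if not looks_like_rag_pack("untrusted.rag"):
    raise ValueError("untrusted.rag does not look like a valid RAG pack")
```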
## 🧪 Examples
See the `examples/` directory for complete examples:
- `basic_usage.py` - Simple pack creation and querying
- `provider_overrides.py` - Using different providers
- `encryption_example.py` - Working with encrypted packs
- `cli_examples.sh` - Command-line usage examples
## 🤝 Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
- 📖 [Documentation](https://aimldev726.github.io/ragpackai/)
- 🐛 [Issue Tracker](https://github.com/AIMLDev726/ragpackai/issues)
- 💬 [Discussions](https://github.com/AIMLDev726/ragpackai/discussions)
## 🙏 Acknowledgments
Built with:
- [LangChain](https://langchain.com/) - LLM framework
- [ChromaDB](https://www.trychroma.com/) - Vector database
- [Sentence Transformers](https://www.sbert.net/) - Embedding models