tinyrag


Nametinyrag JSON
Version 0.3.5 PyPI version JSON
download
home_pagehttps://github.com/Kenosis01/TinyRag
SummaryA minimal Python library for Retrieval-Augmented Generation with codebase indexing and multiple vector store backends
upload_time2025-08-23 16:55:20
maintainerNone
docs_urlNone
authorTinyRag Team
requires_python>=3.7
licenseMIT
keywords rag retrieval augmented generation vector database embeddings similarity search nlp ai machine-learning codebase code-indexing function-search code-analysis
VCS
bugtrack_url
requirements sentence-transformers requests numpy faiss-cpu scikit-learn chromadb PyPDF2 python-docx pdfminer.six
Travis-CI No Travis.
coveralls test coverage No coveralls.
            <p align="center">
  <img src="logo.jpg" alt="Tinyrag Logo" width="200"/>
</p>


# TinyRag ๐Ÿš€

[![PyPI version](https://badge.fury.io/py/tinyrag.svg)](https://badge.fury.io/py/tinyrag)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-available-brightgreen.svg)](https://tinyrag-docs.netlify.app/docs)
[![PyPI Downloads](https://static.pepy.tech/badge/tinyrag)](https://pepy.tech/projects/tinyrag)



A **lightweight, powerful Python library** for **Retrieval-Augmented Generation (RAG)** that works locally without API keys. Features advanced codebase indexing, multiple document formats, and flexible vector storage backends.

> **๐ŸŽฏ Perfect for developers who need RAG capabilities without complexity or mandatory cloud dependencies.**

## ๐ŸŒŸ Key Features

### ๐Ÿš€ **Works Locally - No API Keys Required**
- **๐Ÿง  Local Embeddings**: Uses all-MiniLM-L6-v2 by default
- **๐Ÿ” Direct Search**: Query documents without LLM costs
- **โšก Zero Setup**: Works immediately after installation

### ๐Ÿ“š **Advanced Document Processing** 
- **๐Ÿ“„ Multi-Format**: PDF, DOCX, CSV, TXT, and raw text
- **๐Ÿ’ป Code Intelligence**: Function-level indexing for 7+ programming languages
- **๐Ÿงต Multithreading**: Parallel processing for faster indexing
- **๐Ÿ“Š Chunking Strategies**: Smart text segmentation

### ๐Ÿ—„๏ธ **Flexible Storage Options**
- **๐Ÿ”Œ Multiple Backends**: Memory, Pickle, Faiss, ChromaDB
- **๐Ÿ’พ Persistence**: Automatic or manual data saving
- **โšก Performance**: Choose speed vs. memory trade-offs
- **๐Ÿ”ง Configuration**: Customizable for any use case

### ๐Ÿ’ฌ **Optional AI Integration**
- **๐Ÿค– Custom System Prompts**: Tailor AI behavior for your domain
- **๐Ÿ”— Provider Support**: OpenAI, Azure, Anthropic, local models
- **๐Ÿ’ฐ Cost Control**: Use only when needed
- **๐ŸŽฏ RAG-Powered Chat**: Contextual AI responses

## ๐Ÿš€ Quick Start

> **๐Ÿ’ก New to TinyRag?** Check out our comprehensive [๐Ÿ“– Documentation](https://tinyrag-docs.netlify.app/docs) with step-by-step guides!

### Installation

```bash
# Basic installation
pip install tinyrag

# With all optional dependencies
pip install tinyrag[all]

# Specific vector stores
pip install tinyrag[faiss]    # High performance
pip install tinyrag[chroma]   # Persistent storage
pip install tinyrag[docs]     # Document processing
```

### Usage Examples

### ๐Ÿƒโ€โ™‚๏ธ 30-Second Example (No API Key Required)

```python
from tinyrag import TinyRag

# 1. Create TinyRag instance
rag = TinyRag()

# 2. Add your content  
rag.add_documents([
    "TinyRag makes RAG simple and powerful.",
    "docs/user_guide.pdf",
    "research_papers/"
])

# 3. Search your content
results = rag.query("How does TinyRag work?", k=3)
for text, score in results:
    print(f"Score: {score:.2f} - {text[:100]}...")
```

**Output:**
```
Score: 0.89 - TinyRag makes RAG simple and powerful.
Score: 0.76 - TinyRag is a lightweight Python library for...
Score: 0.72 - The system processes documents using semantic...
```

### ๐Ÿค– AI-Powered Chat (Optional)

```python
from tinyrag import Provider, TinyRag

# Set up AI provider
provider = Provider(
    api_key="sk-your-openai-key",
    model="gpt-4"
)

# Create smart assistant
rag = TinyRag(
    provider=provider,
    system_prompt="You are a helpful technical assistant."
)

# Add knowledge base
rag.add_documents(["technical_docs/", "api_guides/"])
rag.add_codebase("src/")  # Index your codebase

# Get intelligent answers
response = rag.chat("How do I implement user authentication?")
print(response)
# AI response based on your specific docs and code!
```

## ๐Ÿ“– Complete Documentation

**๐Ÿ“š [Full Documentation](docs/README.md)** - Comprehensive guides from beginner to expert

### ๐Ÿš€ **Getting Started**
- [**Quick Start**](docs/01-quick-start.md) - 5-minute introduction
- [**Installation**](docs/02-installation.md) - Complete setup guide  
- [**Basic Usage**](docs/03-basic-usage.md) - Core features without AI

### ๐Ÿ”ง **Core Features**
- [**Document Processing**](docs/04-document-processing.md) - PDF, DOCX, CSV, TXT
- [**Codebase Indexing**](docs/05-codebase-indexing.md) - Function-level code search
- [**Vector Stores**](docs/06-vector-stores.md) - Choose the right storage
- [**Search & Query**](docs/07-search-query.md) - Similarity search techniques

### ๐Ÿค– **AI Integration**
- [**System Prompts**](docs/08-system-prompts.md) - Customize AI behavior
- [**Chat Functionality**](docs/09-chat-functionality.md) - Build conversations
- [**Provider Configuration**](docs/10-provider-config.md) - AI model setup

---

## ๐Ÿ”ง Core API Reference

### Provider Class

```python
from tinyrag import Provider

# ๐Ÿ†“ No API key needed - works locally
provider = Provider(embedding_model="default")

# ๐Ÿค– With AI capabilities
provider = Provider(
    api_key="sk-your-key",
    model="gpt-4",                           # GPT-4, GPT-3.5, local models
    embedding_model="text-embedding-ada-002", # or "default" for local
    base_url="https://api.openai.com/v1"     # OpenAI, Azure, custom
)
```

### TinyRag Class

```python
from tinyrag import TinyRag

# ๐ŸŽ›๏ธ Choose your vector store
rag = TinyRag(
    provider=provider,               # Optional: for AI chat
    vector_store="faiss",           # memory, pickle, faiss, chromadb
    chunk_size=500,                 # Text chunk size
    max_workers=4,                  # Parallel processing
    system_prompt="Custom prompt"   # AI behavior
)
```

### ๐Ÿ—„๏ธ Vector Store Comparison

| Store | Performance | Persistence | Memory | Dependencies | Best For |
|-------|-------------|-------------|---------|--------------|----------|
| **Memory** | โšก Fast | โŒ None | ๐Ÿ“ˆ High | โœ… None | Development, testing |
| **Pickle** | ๐ŸŒ Fair | ๐Ÿ’พ Manual | ๐Ÿ“Š Medium | โœ… Minimal | Simple projects |
| **Faiss** | ๐Ÿš€ Excellent | ๐Ÿ’พ Manual | ๐Ÿ“‰ Low | ๐Ÿ“ฆ faiss-cpu | Large datasets, speed |
| **ChromaDB** | โšก Good | ๐Ÿ”„ Auto | ๐Ÿ“Š Medium | ๐Ÿ“ฆ chromadb | Production, features |

> **๐Ÿ’ก Recommendation:** Start with `memory` for development, use `faiss` for production performance.

## ๐Ÿ”ง Essential Methods

```python
# ๐Ÿ“„ Document Management
rag.add_documents(["file.pdf", "text"])   # Add any documents
rag.add_codebase("src/")                   # Index code functions
rag.clear_documents()                      # Reset everything

# ๐Ÿ” Search & Query (No AI needed)
results = rag.query("search term", k=5)   # Find similar content
code = rag.query("auth function")          # Search code too

# ๐Ÿค– AI Chat (Optional)
response = rag.chat("Explain this code")   # Get AI answers
rag.set_system_prompt("Be helpful")        # Customize AI

# ๐Ÿ’พ Persistence
rag.save_vector_store("my_data.pkl")       # Save your work
rag.load_vector_store("my_data.pkl")       # Load it back
```

> **๐Ÿ“– [Complete API Reference](docs/18-api-reference.md)** - Full method documentation

## ๐Ÿ’ป Code Intelligence

TinyRag indexes your codebase at the **function level** for intelligent code search:

### ๐ŸŒ Supported Languages

| Language | Extensions | Detection |
|----------|------------|----------|
| **Python** | `.py` | `def function_name` |
| **JavaScript** | `.js`, `.ts` | `function name()`, `const name =` |
| **Java** | `.java` | `public/private type name()` |
| **C/C++** | `.c`, `.cpp`, `.h` | `return_type function_name()` |
| **Go** | `.go` | `func functionName()` |
| **Rust** | `.rs` | `fn function_name()` |
| **PHP** | `.php` | `function functionName()` |

### ๐Ÿ” Code Search Examples

```python
# Index your entire project
rag.add_codebase("my_app/")

# Find authentication code
auth_code = rag.query("user authentication login")

# Database functions
db_code = rag.query("database query SELECT")

# API endpoints
api_code = rag.query("REST API endpoint")

# Get AI explanations (with API key)
response = rag.chat("How does user authentication work?")
# AI analyzes your actual code and explains it!
```

> **๐Ÿ’ก [Learn More](docs/05-codebase-indexing.md)** - Advanced code search techniques


## โš™๏ธ Configuration Examples

### ๐Ÿš€ Performance Optimized
```python
# Large datasets, maximum speed
rag = TinyRag(
    vector_store="faiss",
    chunk_size=800,
    max_workers=8  # Parallel processing
)
```

### ๐Ÿ’พ Production Setup
```python
# Persistent, multi-user ready
rag = TinyRag(
    provider=provider,
    vector_store="chromadb",
    vector_store_config={
        "collection_name": "company_docs",
        "persist_directory": "/data/vectors/"
    }
)
```

### ๐Ÿค– Custom AI Assistant
```python
# Domain-specific AI behavior
rag = TinyRag(
    provider=provider,
    system_prompt="""You are a senior software engineer.
    Provide detailed technical explanations with code examples."""
)
```

> **๐Ÿ”ง [Full Configuration Guide](docs/12-configuration.md)** - All options explained

## ๐Ÿ“ฆ Installation

### ๐ŸŽฏ Choose Your Setup

```bash
# ๐Ÿš€ Quick start (works immediately)
pip install tinyrag

# โšก High performance (recommended)
pip install tinyrag[faiss]

# ๐Ÿ“„ Document processing (PDF, DOCX)
pip install tinyrag[docs]

# ๐Ÿ—„๏ธ Production database
pip install tinyrag[chroma]

# ๐ŸŽ Everything included
pip install tinyrag[all]
```

### ๐Ÿ”ง What Each Option Includes

| Option | Includes | Use Case |
|--------|----------|----------|
| **Base** | Memory store, local embeddings | Development, testing |
| **[faiss]** | + High-performance search | Large datasets |
| **[docs]** | + PDF/DOCX processing | Document analysis |
| **[chroma]** | + Persistent database | Production apps |
| **[all]** | + Everything | Full features |

> **๐Ÿ’ก [Installation Guide](docs/02-installation.md)** - Detailed setup instructions

## ๐ŸŽฏ Real-World Use Cases

### ๐Ÿข **Business Applications**
- **๐Ÿ“‹ Customer Support**: Query company docs and policies
- **๐Ÿ“š Knowledge Management**: Searchable internal documentation
- **๐Ÿ” Research Tools**: Semantic search through research papers
- **๐Ÿ“Š Report Analysis**: Find insights across business reports

### ๐Ÿ‘จโ€๐Ÿ’ป **Developer Tools**
- **๐Ÿ”ง Code Documentation**: Auto-generate code explanations
- **๐Ÿ” Legacy Code Explorer**: Understand large codebases
- **๐Ÿ“– API Assistant**: Query technical documentation
- **๐Ÿงช Testing Helper**: Find relevant test patterns

### ๐ŸŽ“ **Educational & Research**
- **๐Ÿ“š Study Assistant**: Query textbooks and notes
- **๐Ÿ“ Writing Helper**: Research paper analysis
- **๐Ÿง  Learning Companion**: Personalized explanations
- **๐Ÿ“Š Data Analysis**: Explore datasets semantically

> **๐Ÿ’ก [See Complete Examples](docs/15-examples.md)** - Production-ready applications

---

## ๐Ÿ› ๏ธ Contributing

We welcome contributions! Here's how to get started:

```bash
# 1. Fork and clone
git clone https://github.com/Kenosis01/TinyRag.git
cd TinyRag

# 2. Install development dependencies  
pip install -e ".[all,dev]"

# 3. Run tests
python -m pytest

# 4. Make your changes and submit a PR!
```

### ๐Ÿ“‹ **Development Setup**
- **Python 3.7+** required
- **Core dependencies**: sentence-transformers, requests, numpy
- **Optional**: faiss-cpu, chromadb, PyPDF2, python-docx

> **๐Ÿ”ง [Development Guide](CONTRIBUTING.md)** - Detailed contributor guidelines

## ๐Ÿค Community & Support

### ๐Ÿ“ž **Get Help**
- **๐Ÿ“– [Complete Documentation](docs/README.md)** - Comprehensive guides
- **๐Ÿ› [GitHub Issues](https://github.com/Kenosis01/TinyRag/issues)** - Bug reports & feature requests
- **๐Ÿ’ฌ [Discussions](https://github.com/Kenosis01/TinyRag/discussions)** - Community Q&A
- **๐Ÿ“‹ [FAQ](docs/19-faq.md)** - Common questions answered

### ๐ŸŽ‰ **Show Your Support**
- โญ **Star this repo** if TinyRag helps you!
- ๐Ÿฆ **Share on Twitter** - spread the word
- โ˜• **[Buy me a coffee](https://buymeacoffee.com/kenosis)** - support development
- ๐Ÿค **Contribute** - help make TinyRag better

---

## ๐Ÿ“„ License

MIT License - see [LICENSE](LICENSE) for details.

---

<div align="center">

**๐Ÿš€ TinyRag - Making RAG Simple, Powerful, and Accessible! ๐Ÿš€**

*Build intelligent search and Q&A systems in minutes, not hours*

[![GitHub stars](https://img.shields.io/github/stars/Kenosis01/TinyRag?style=social)](https://github.com/Kenosis01/TinyRag)
[![PyPI downloads](https://img.shields.io/pypi/dm/tinyrag)](https://pypi.org/project/tinyrag/)
[![GitHub last commit](https://img.shields.io/github/last-commit/Kenosis01/TinyRag)](https://github.com/Kenosis01/TinyRag)

</div>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Kenosis01/TinyRag",
    "name": "tinyrag",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "TinyRag Team <transformtrails@gmail.com>",
    "keywords": "rag, retrieval, augmented, generation, vector, database, embeddings, similarity, search, nlp, ai, machine-learning, codebase, code-indexing, function-search, code-analysis",
    "author": "TinyRag Team",
    "author_email": "TinyRag Team <transformtrails@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/38/5a/e514d4e3dd7737e5e09adbb3b2a8cfc04ead1768bf2d2a5becebd93e4444/tinyrag-0.3.5.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\r\n  <img src=\"logo.jpg\" alt=\"Tinyrag Logo\" width=\"200\"/>\r\n</p>\r\n\r\n\r\n# TinyRag \ud83d\ude80\r\n\r\n[![PyPI version](https://badge.fury.io/py/tinyrag.svg)](https://badge.fury.io/py/tinyrag)\r\n[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\r\n[![Documentation](https://img.shields.io/badge/docs-available-brightgreen.svg)](https://tinyrag-docs.netlify.app/docs)\r\n[![PyPI Downloads](https://static.pepy.tech/badge/tinyrag)](https://pepy.tech/projects/tinyrag)\r\n\r\n\r\n\r\nA **lightweight, powerful Python library** for **Retrieval-Augmented Generation (RAG)** that works locally without API keys. Features advanced codebase indexing, multiple document formats, and flexible vector storage backends.\r\n\r\n> **\ud83c\udfaf Perfect for developers who need RAG capabilities without complexity or mandatory cloud dependencies.**\r\n\r\n## \ud83c\udf1f Key Features\r\n\r\n### \ud83d\ude80 **Works Locally - No API Keys Required**\r\n- **\ud83e\udde0 Local Embeddings**: Uses all-MiniLM-L6-v2 by default\r\n- **\ud83d\udd0d Direct Search**: Query documents without LLM costs\r\n- **\u26a1 Zero Setup**: Works immediately after installation\r\n\r\n### \ud83d\udcda **Advanced Document Processing** \r\n- **\ud83d\udcc4 Multi-Format**: PDF, DOCX, CSV, TXT, and raw text\r\n- **\ud83d\udcbb Code Intelligence**: Function-level indexing for 7+ programming languages\r\n- **\ud83e\uddf5 Multithreading**: Parallel processing for faster indexing\r\n- **\ud83d\udcca Chunking Strategies**: Smart text segmentation\r\n\r\n### \ud83d\uddc4\ufe0f **Flexible Storage Options**\r\n- **\ud83d\udd0c Multiple Backends**: Memory, Pickle, Faiss, ChromaDB\r\n- **\ud83d\udcbe Persistence**: Automatic or manual data saving\r\n- **\u26a1 Performance**: Choose speed vs. memory trade-offs\r\n- **\ud83d\udd27 Configuration**: Customizable for any use case\r\n\r\n### \ud83d\udcac **Optional AI Integration**\r\n- **\ud83e\udd16 Custom System Prompts**: Tailor AI behavior for your domain\r\n- **\ud83d\udd17 Provider Support**: OpenAI, Azure, Anthropic, local models\r\n- **\ud83d\udcb0 Cost Control**: Use only when needed\r\n- **\ud83c\udfaf RAG-Powered Chat**: Contextual AI responses\r\n\r\n## \ud83d\ude80 Quick Start\r\n\r\n> **\ud83d\udca1 New to TinyRag?** Check out our comprehensive [\ud83d\udcd6 Documentation](https://tinyrag-docs.netlify.app/docs) with step-by-step guides!\r\n\r\n### Installation\r\n\r\n```bash\r\n# Basic installation\r\npip install tinyrag\r\n\r\n# With all optional dependencies\r\npip install tinyrag[all]\r\n\r\n# Specific vector stores\r\npip install tinyrag[faiss]    # High performance\r\npip install tinyrag[chroma]   # Persistent storage\r\npip install tinyrag[docs]     # Document processing\r\n```\r\n\r\n### Usage Examples\r\n\r\n### \ud83c\udfc3\u200d\u2642\ufe0f 30-Second Example (No API Key Required)\r\n\r\n```python\r\nfrom tinyrag import TinyRag\r\n\r\n# 1. Create TinyRag instance\r\nrag = TinyRag()\r\n\r\n# 2. Add your content  \r\nrag.add_documents([\r\n    \"TinyRag makes RAG simple and powerful.\",\r\n    \"docs/user_guide.pdf\",\r\n    \"research_papers/\"\r\n])\r\n\r\n# 3. Search your content\r\nresults = rag.query(\"How does TinyRag work?\", k=3)\r\nfor text, score in results:\r\n    print(f\"Score: {score:.2f} - {text[:100]}...\")\r\n```\r\n\r\n**Output:**\r\n```\r\nScore: 0.89 - TinyRag makes RAG simple and powerful.\r\nScore: 0.76 - TinyRag is a lightweight Python library for...\r\nScore: 0.72 - The system processes documents using semantic...\r\n```\r\n\r\n### \ud83e\udd16 AI-Powered Chat (Optional)\r\n\r\n```python\r\nfrom tinyrag import Provider, TinyRag\r\n\r\n# Set up AI provider\r\nprovider = Provider(\r\n    api_key=\"sk-your-openai-key\",\r\n    model=\"gpt-4\"\r\n)\r\n\r\n# Create smart assistant\r\nrag = TinyRag(\r\n    provider=provider,\r\n    system_prompt=\"You are a helpful technical assistant.\"\r\n)\r\n\r\n# Add knowledge base\r\nrag.add_documents([\"technical_docs/\", \"api_guides/\"])\r\nrag.add_codebase(\"src/\")  # Index your codebase\r\n\r\n# Get intelligent answers\r\nresponse = rag.chat(\"How do I implement user authentication?\")\r\nprint(response)\r\n# AI response based on your specific docs and code!\r\n```\r\n\r\n## \ud83d\udcd6 Complete Documentation\r\n\r\n**\ud83d\udcda [Full Documentation](docs/README.md)** - Comprehensive guides from beginner to expert\r\n\r\n### \ud83d\ude80 **Getting Started**\r\n- [**Quick Start**](docs/01-quick-start.md) - 5-minute introduction\r\n- [**Installation**](docs/02-installation.md) - Complete setup guide  \r\n- [**Basic Usage**](docs/03-basic-usage.md) - Core features without AI\r\n\r\n### \ud83d\udd27 **Core Features**\r\n- [**Document Processing**](docs/04-document-processing.md) - PDF, DOCX, CSV, TXT\r\n- [**Codebase Indexing**](docs/05-codebase-indexing.md) - Function-level code search\r\n- [**Vector Stores**](docs/06-vector-stores.md) - Choose the right storage\r\n- [**Search & Query**](docs/07-search-query.md) - Similarity search techniques\r\n\r\n### \ud83e\udd16 **AI Integration**\r\n- [**System Prompts**](docs/08-system-prompts.md) - Customize AI behavior\r\n- [**Chat Functionality**](docs/09-chat-functionality.md) - Build conversations\r\n- [**Provider Configuration**](docs/10-provider-config.md) - AI model setup\r\n\r\n---\r\n\r\n## \ud83d\udd27 Core API Reference\r\n\r\n### Provider Class\r\n\r\n```python\r\nfrom tinyrag import Provider\r\n\r\n# \ud83c\udd93 No API key needed - works locally\r\nprovider = Provider(embedding_model=\"default\")\r\n\r\n# \ud83e\udd16 With AI capabilities\r\nprovider = Provider(\r\n    api_key=\"sk-your-key\",\r\n    model=\"gpt-4\",                           # GPT-4, GPT-3.5, local models\r\n    embedding_model=\"text-embedding-ada-002\", # or \"default\" for local\r\n    base_url=\"https://api.openai.com/v1\"     # OpenAI, Azure, custom\r\n)\r\n```\r\n\r\n### TinyRag Class\r\n\r\n```python\r\nfrom tinyrag import TinyRag\r\n\r\n# \ud83c\udf9b\ufe0f Choose your vector store\r\nrag = TinyRag(\r\n    provider=provider,               # Optional: for AI chat\r\n    vector_store=\"faiss\",           # memory, pickle, faiss, chromadb\r\n    chunk_size=500,                 # Text chunk size\r\n    max_workers=4,                  # Parallel processing\r\n    system_prompt=\"Custom prompt\"   # AI behavior\r\n)\r\n```\r\n\r\n### \ud83d\uddc4\ufe0f Vector Store Comparison\r\n\r\n| Store | Performance | Persistence | Memory | Dependencies | Best For |\r\n|-------|-------------|-------------|---------|--------------|----------|\r\n| **Memory** | \u26a1 Fast | \u274c None | \ud83d\udcc8 High | \u2705 None | Development, testing |\r\n| **Pickle** | \ud83d\udc0c Fair | \ud83d\udcbe Manual | \ud83d\udcca Medium | \u2705 Minimal | Simple projects |\r\n| **Faiss** | \ud83d\ude80 Excellent | \ud83d\udcbe Manual | \ud83d\udcc9 Low | \ud83d\udce6 faiss-cpu | Large datasets, speed |\r\n| **ChromaDB** | \u26a1 Good | \ud83d\udd04 Auto | \ud83d\udcca Medium | \ud83d\udce6 chromadb | Production, features |\r\n\r\n> **\ud83d\udca1 Recommendation:** Start with `memory` for development, use `faiss` for production performance.\r\n\r\n## \ud83d\udd27 Essential Methods\r\n\r\n```python\r\n# \ud83d\udcc4 Document Management\r\nrag.add_documents([\"file.pdf\", \"text\"])   # Add any documents\r\nrag.add_codebase(\"src/\")                   # Index code functions\r\nrag.clear_documents()                      # Reset everything\r\n\r\n# \ud83d\udd0d Search & Query (No AI needed)\r\nresults = rag.query(\"search term\", k=5)   # Find similar content\r\ncode = rag.query(\"auth function\")          # Search code too\r\n\r\n# \ud83e\udd16 AI Chat (Optional)\r\nresponse = rag.chat(\"Explain this code\")   # Get AI answers\r\nrag.set_system_prompt(\"Be helpful\")        # Customize AI\r\n\r\n# \ud83d\udcbe Persistence\r\nrag.save_vector_store(\"my_data.pkl\")       # Save your work\r\nrag.load_vector_store(\"my_data.pkl\")       # Load it back\r\n```\r\n\r\n> **\ud83d\udcd6 [Complete API Reference](docs/18-api-reference.md)** - Full method documentation\r\n\r\n## \ud83d\udcbb Code Intelligence\r\n\r\nTinyRag indexes your codebase at the **function level** for intelligent code search:\r\n\r\n### \ud83c\udf10 Supported Languages\r\n\r\n| Language | Extensions | Detection |\r\n|----------|------------|----------|\r\n| **Python** | `.py` | `def function_name` |\r\n| **JavaScript** | `.js`, `.ts` | `function name()`, `const name =` |\r\n| **Java** | `.java` | `public/private type name()` |\r\n| **C/C++** | `.c`, `.cpp`, `.h` | `return_type function_name()` |\r\n| **Go** | `.go` | `func functionName()` |\r\n| **Rust** | `.rs` | `fn function_name()` |\r\n| **PHP** | `.php` | `function functionName()` |\r\n\r\n### \ud83d\udd0d Code Search Examples\r\n\r\n```python\r\n# Index your entire project\r\nrag.add_codebase(\"my_app/\")\r\n\r\n# Find authentication code\r\nauth_code = rag.query(\"user authentication login\")\r\n\r\n# Database functions\r\ndb_code = rag.query(\"database query SELECT\")\r\n\r\n# API endpoints\r\napi_code = rag.query(\"REST API endpoint\")\r\n\r\n# Get AI explanations (with API key)\r\nresponse = rag.chat(\"How does user authentication work?\")\r\n# AI analyzes your actual code and explains it!\r\n```\r\n\r\n> **\ud83d\udca1 [Learn More](docs/05-codebase-indexing.md)** - Advanced code search techniques\r\n\r\n\r\n## \u2699\ufe0f Configuration Examples\r\n\r\n### \ud83d\ude80 Performance Optimized\r\n```python\r\n# Large datasets, maximum speed\r\nrag = TinyRag(\r\n    vector_store=\"faiss\",\r\n    chunk_size=800,\r\n    max_workers=8  # Parallel processing\r\n)\r\n```\r\n\r\n### \ud83d\udcbe Production Setup\r\n```python\r\n# Persistent, multi-user ready\r\nrag = TinyRag(\r\n    provider=provider,\r\n    vector_store=\"chromadb\",\r\n    vector_store_config={\r\n        \"collection_name\": \"company_docs\",\r\n        \"persist_directory\": \"/data/vectors/\"\r\n    }\r\n)\r\n```\r\n\r\n### \ud83e\udd16 Custom AI Assistant\r\n```python\r\n# Domain-specific AI behavior\r\nrag = TinyRag(\r\n    provider=provider,\r\n    system_prompt=\"\"\"You are a senior software engineer.\r\n    Provide detailed technical explanations with code examples.\"\"\"\r\n)\r\n```\r\n\r\n> **\ud83d\udd27 [Full Configuration Guide](docs/12-configuration.md)** - All options explained\r\n\r\n## \ud83d\udce6 Installation\r\n\r\n### \ud83c\udfaf Choose Your Setup\r\n\r\n```bash\r\n# \ud83d\ude80 Quick start (works immediately)\r\npip install tinyrag\r\n\r\n# \u26a1 High performance (recommended)\r\npip install tinyrag[faiss]\r\n\r\n# \ud83d\udcc4 Document processing (PDF, DOCX)\r\npip install tinyrag[docs]\r\n\r\n# \ud83d\uddc4\ufe0f Production database\r\npip install tinyrag[chroma]\r\n\r\n# \ud83c\udf81 Everything included\r\npip install tinyrag[all]\r\n```\r\n\r\n### \ud83d\udd27 What Each Option Includes\r\n\r\n| Option | Includes | Use Case |\r\n|--------|----------|----------|\r\n| **Base** | Memory store, local embeddings | Development, testing |\r\n| **[faiss]** | + High-performance search | Large datasets |\r\n| **[docs]** | + PDF/DOCX processing | Document analysis |\r\n| **[chroma]** | + Persistent database | Production apps |\r\n| **[all]** | + Everything | Full features |\r\n\r\n> **\ud83d\udca1 [Installation Guide](docs/02-installation.md)** - Detailed setup instructions\r\n\r\n## \ud83c\udfaf Real-World Use Cases\r\n\r\n### \ud83c\udfe2 **Business Applications**\r\n- **\ud83d\udccb Customer Support**: Query company docs and policies\r\n- **\ud83d\udcda Knowledge Management**: Searchable internal documentation\r\n- **\ud83d\udd0d Research Tools**: Semantic search through research papers\r\n- **\ud83d\udcca Report Analysis**: Find insights across business reports\r\n\r\n### \ud83d\udc68\u200d\ud83d\udcbb **Developer Tools**\r\n- **\ud83d\udd27 Code Documentation**: Auto-generate code explanations\r\n- **\ud83d\udd0d Legacy Code Explorer**: Understand large codebases\r\n- **\ud83d\udcd6 API Assistant**: Query technical documentation\r\n- **\ud83e\uddea Testing Helper**: Find relevant test patterns\r\n\r\n### \ud83c\udf93 **Educational & Research**\r\n- **\ud83d\udcda Study Assistant**: Query textbooks and notes\r\n- **\ud83d\udcdd Writing Helper**: Research paper analysis\r\n- **\ud83e\udde0 Learning Companion**: Personalized explanations\r\n- **\ud83d\udcca Data Analysis**: Explore datasets semantically\r\n\r\n> **\ud83d\udca1 [See Complete Examples](docs/15-examples.md)** - Production-ready applications\r\n\r\n---\r\n\r\n## \ud83d\udee0\ufe0f Contributing\r\n\r\nWe welcome contributions! Here's how to get started:\r\n\r\n```bash\r\n# 1. Fork and clone\r\ngit clone https://github.com/Kenosis01/TinyRag.git\r\ncd TinyRag\r\n\r\n# 2. Install development dependencies  \r\npip install -e \".[all,dev]\"\r\n\r\n# 3. Run tests\r\npython -m pytest\r\n\r\n# 4. Make your changes and submit a PR!\r\n```\r\n\r\n### \ud83d\udccb **Development Setup**\r\n- **Python 3.7+** required\r\n- **Core dependencies**: sentence-transformers, requests, numpy\r\n- **Optional**: faiss-cpu, chromadb, PyPDF2, python-docx\r\n\r\n> **\ud83d\udd27 [Development Guide](CONTRIBUTING.md)** - Detailed contributor guidelines\r\n\r\n## \ud83e\udd1d Community & Support\r\n\r\n### \ud83d\udcde **Get Help**\r\n- **\ud83d\udcd6 [Complete Documentation](docs/README.md)** - Comprehensive guides\r\n- **\ud83d\udc1b [GitHub Issues](https://github.com/Kenosis01/TinyRag/issues)** - Bug reports & feature requests\r\n- **\ud83d\udcac [Discussions](https://github.com/Kenosis01/TinyRag/discussions)** - Community Q&A\r\n- **\ud83d\udccb [FAQ](docs/19-faq.md)** - Common questions answered\r\n\r\n### \ud83c\udf89 **Show Your Support**\r\n- \u2b50 **Star this repo** if TinyRag helps you!\r\n- \ud83d\udc26 **Share on Twitter** - spread the word\r\n- \u2615 **[Buy me a coffee](https://buymeacoffee.com/kenosis)** - support development\r\n- \ud83e\udd1d **Contribute** - help make TinyRag better\r\n\r\n---\r\n\r\n## \ud83d\udcc4 License\r\n\r\nMIT License - see [LICENSE](LICENSE) for details.\r\n\r\n---\r\n\r\n<div align=\"center\">\r\n\r\n**\ud83d\ude80 TinyRag - Making RAG Simple, Powerful, and Accessible! \ud83d\ude80**\r\n\r\n*Build intelligent search and Q&A systems in minutes, not hours*\r\n\r\n[![GitHub stars](https://img.shields.io/github/stars/Kenosis01/TinyRag?style=social)](https://github.com/Kenosis01/TinyRag)\r\n[![PyPI downloads](https://img.shields.io/pypi/dm/tinyrag)](https://pypi.org/project/tinyrag/)\r\n[![GitHub last commit](https://img.shields.io/github/last-commit/Kenosis01/TinyRag)](https://github.com/Kenosis01/TinyRag)\r\n\r\n</div>\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A minimal Python library for Retrieval-Augmented Generation with codebase indexing and multiple vector store backends",
    "version": "0.3.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/Kenosis01/TinyRag/issues",
        "Changelog": "https://github.com/Kenosis01/TinyRag/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/Kenosis01/TinyRag#readme",
        "Homepage": "https://github.com/Kenosis01/TinyRag",
        "Repository": "https://github.com/Kenosis01/TinyRag.git"
    },
    "split_keywords": [
        "rag",
        " retrieval",
        " augmented",
        " generation",
        " vector",
        " database",
        " embeddings",
        " similarity",
        " search",
        " nlp",
        " ai",
        " machine-learning",
        " codebase",
        " code-indexing",
        " function-search",
        " code-analysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e1364819a40f378b9d63708c339e04b452b1e690dc4812a03fa6edd64afa3de6",
                "md5": "dfb1d0b39b076867cf0b1a9fd3eaee94",
                "sha256": "d1db964d40b5d73fa400c0dfd51dee5b4fa3e8d770d3eb5e8fea01d808173297"
            },
            "downloads": -1,
            "filename": "tinyrag-0.3.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dfb1d0b39b076867cf0b1a9fd3eaee94",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 33289,
            "upload_time": "2025-08-23T16:55:18",
            "upload_time_iso_8601": "2025-08-23T16:55:18.594796Z",
            "url": "https://files.pythonhosted.org/packages/e1/36/4819a40f378b9d63708c339e04b452b1e690dc4812a03fa6edd64afa3de6/tinyrag-0.3.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "385ae514d4e3dd7737e5e09adbb3b2a8cfc04ead1768bf2d2a5becebd93e4444",
                "md5": "756aa7de32c1f4fcace3acbdf3d022b6",
                "sha256": "c110ad632982c834f10eaffaddae9e52e9ecc60f875429f9fe00f186be240326"
            },
            "downloads": -1,
            "filename": "tinyrag-0.3.5.tar.gz",
            "has_sig": false,
            "md5_digest": "756aa7de32c1f4fcace3acbdf3d022b6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 102210,
            "upload_time": "2025-08-23T16:55:20",
            "upload_time_iso_8601": "2025-08-23T16:55:20.115033Z",
            "url": "https://files.pythonhosted.org/packages/38/5a/e514d4e3dd7737e5e09adbb3b2a8cfc04ead1768bf2d2a5becebd93e4444/tinyrag-0.3.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-23 16:55:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Kenosis01",
    "github_project": "TinyRag",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "sentence-transformers",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "faiss-cpu",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "chromadb",
            "specs": []
        },
        {
            "name": "PyPDF2",
            "specs": []
        },
        {
            "name": "python-docx",
            "specs": []
        },
        {
            "name": "pdfminer.six",
            "specs": [
                [
                    ">=",
                    "20221105"
                ]
            ]
        }
    ],
    "lcname": "tinyrag"
}
        
Elapsed time: 1.11684s