# Ragmint
**Ragmint** (Retrieval-Augmented Generation Model Inspection & Tuning) is a modular, developer-friendly Python library for **evaluating, optimizing, and tuning RAG (Retrieval-Augmented Generation) pipelines**.
It provides a complete toolkit for **retriever selection**, **embedding model tuning**, and **automated RAG evaluation** with support for **Optuna-based Bayesian optimization**, **Auto-RAG tuning**, and **explainability** through Gemini or Claude.

---

## ✨ Features

- ✅ **Automated hyperparameter optimization** (Grid, Random, Bayesian via Optuna)
- 🤖 **Auto-RAG Tuner** — dynamically recommends retriever–embedding pairs based on corpus size
- 🧠 **Explainability Layer** — interprets RAG performance via Gemini or Claude APIs
- 🏆 **Leaderboard Tracking** — stores and ranks experiment runs via JSON or external DB
- 🔍 **Built-in RAG evaluation metrics** — faithfulness, recall, BLEU, ROUGE, latency
- ⚙️ **Retrievers** — FAISS, Chroma, ElasticSearch
- 🧩 **Embeddings** — OpenAI, HuggingFace
- 💾 **Caching, experiment tracking, and reproducibility** out of the box
- 🧰 **Clean modular structure** for easy integration in research and production setups

---
## 🚀 Quick Start

### 1️⃣ Installation

```bash
git clone https://github.com/andyolivers/ragmint.git
cd ragmint
pip install -e .
```

> The `-e` flag installs Ragmint in editable (development) mode.
> Requires **Python ≥ 3.9**.

---

### 2️⃣ Run a RAG Optimization Experiment

```bash
python ragmint/main.py --config configs/default.yaml --search bayesian
```

Example `configs/default.yaml`:

```yaml
retriever: faiss
embedding_model: text-embedding-3-small
reranker:
  mode: mmr
  lambda_param: 0.5
optimization:
  search_method: bayesian
  n_trials: 20
```
---
### 3️⃣ Manual Pipeline Usage

```python
from ragmint.core.pipeline import RAGPipeline

pipeline = RAGPipeline({
    "embedding_model": "text-embedding-3-small",
    "retriever": "faiss",
})

result = pipeline.run("What is retrieval-augmented generation?")
print(result)
```
---
## 🧪 Dataset Options

Ragmint can automatically load evaluation datasets for your RAG pipeline:

| Mode | Example | Description |
|------|---------|-------------|
| 🧱 **Default** | `validation_set=None` | Uses built-in `experiments/validation_qa.json` |
| 📁 **Custom File** | `validation_set="data/my_eval.json"` | Load your own QA dataset (JSON or CSV) |
| 🌐 **Hugging Face Dataset** | `validation_set="squad"` | Automatically downloads benchmark datasets (requires `pip install datasets`) |
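
The README does not spell out the schema a custom file must follow; below is a minimal sketch that writes one, assuming a flat JSON list of question/answer records. The `question`/`answer` field names are an assumption for illustration — check the bundled `experiments/validation_qa.json` for the real schema.

```python
import json
import os
import tempfile

# Hypothetical QA records -- the exact field names Ragmint expects are an
# assumption here, not confirmed by the library.
qa_pairs = [
    {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation."},
    {"question": "What does MMR stand for?", "answer": "Maximal Marginal Relevance."},
]

# Write the dataset to a JSON file that could be passed as validation_set=...
path = os.path.join(tempfile.gettempdir(), "my_eval.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(qa_pairs, f, indent=2)

# Round-trip check: the file reloads as a list of dicts
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(len(loaded), loaded[0]["question"])  # 2 What is RAG?
```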
### Example

```python
from ragmint.tuner import RAGMint

ragmint = RAGMint(
    docs_path="data/docs/",
    retrievers=["faiss", "chroma"],
    embeddings=["text-embedding-3-small"],
    rerankers=["mmr"],
)

# Use built-in default
ragmint.optimize(validation_set=None)

# Use Hugging Face benchmark
ragmint.optimize(validation_set="squad")

# Use your own dataset
ragmint.optimize(validation_set="data/custom_qa.json")
```
---
## 🧠 Auto-RAG Tuner

The **AutoRAGTuner** automatically recommends retriever–embedding combinations
based on corpus size and average document length.

```python
from ragmint.autotuner import AutoRAGTuner

corpus_stats = {"size": 5000, "avg_len": 250}
tuner = AutoRAGTuner(corpus_stats)
recommendation = tuner.recommend()
print(recommendation)
# Example output: {"retriever": "Chroma", "embedding_model": "SentenceTransformers"}
```
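
The real decision rules live inside `AutoRAGTuner`; purely to build intuition for the size-based recommendation above, here is a toy stand-in with invented thresholds — this is not the library's actual logic.

```python
def toy_recommend(size: int, avg_len: int) -> dict:
    """Illustrative stand-in for AutoRAGTuner.recommend().

    The thresholds below are made up for this sketch; the real tuner
    also weighs average document length.
    """
    if size < 1_000:    # tiny corpus: lightweight in-memory index
        return {"retriever": "FAISS", "embedding_model": "OpenAI"}
    if size < 50_000:   # medium corpus, as in the example above
        return {"retriever": "Chroma", "embedding_model": "SentenceTransformers"}
    # large corpus: a server-backed search engine
    return {"retriever": "ElasticSearch", "embedding_model": "SentenceTransformers"}

print(toy_recommend(5000, 250))
# {'retriever': 'Chroma', 'embedding_model': 'SentenceTransformers'}
```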
---
## 🏆 Leaderboard Tracking

Track and visualize your best experiments across runs.

```python
from ragmint.leaderboard import Leaderboard

lb = Leaderboard("experiments/leaderboard.json")
lb.add_entry({"trial": 1, "faithfulness": 0.87, "latency": 0.12})
lb.show_top(3)
```
---
## 🧠 Explainability with Gemini / Claude

Compare two RAG configurations and receive natural language insights
on **why** one performs better.

```python
from ragmint.explainer import explain_results

config_a = {"retriever": "FAISS", "embedding_model": "OpenAI"}
config_b = {"retriever": "Chroma", "embedding_model": "SentenceTransformers"}

explanation = explain_results(config_a, config_b, model="gemini")
print(explanation)
```

> Set your API keys in a `.env` file or via environment variables:
>
> ```bash
> export GOOGLE_API_KEY="your_gemini_key"
> export ANTHROPIC_API_KEY="your_claude_key"
> ```
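
However the keys are loaded, they end up as ordinary environment variables. A small stdlib-only helper (not part of Ragmint; the key names come from the export lines above) can check which backends are usable before calling `explain_results`:

```python
import os

def available_explainers() -> list[str]:
    """Return the explainer backends that have an API key set."""
    backends = []
    if os.getenv("GOOGLE_API_KEY"):     # Gemini
        backends.append("gemini")
    if os.getenv("ANTHROPIC_API_KEY"):  # Claude
        backends.append("claude")
    return backends

print(available_explainers())
```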
---
## 🧩 Folder Structure

```
ragmint/
├── core/
│   ├── pipeline.py
│   ├── retriever.py
│   ├── reranker.py
│   ├── embedding.py
│   └── evaluation.py
├── autotuner.py
├── explainer.py
├── leaderboard.py
├── tuner.py
├── utils/
├── configs/
├── experiments/
├── tests/
└── main.py
```
---
## 🧪 Running Tests

```bash
pytest -v
```

To include integration tests with Gemini or Claude APIs:

```bash
pytest -m integration
```
---
## ⚙️ Configuration via `pyproject.toml`

Your `pyproject.toml` includes all required dependencies:

```toml
[project]
name = "ragmint"
version = "0.1.0"
dependencies = [
    "numpy",
    "optuna",
    "scikit-learn",
    "faiss-cpu",
    "chromadb",
    "pytest",
    "openai",
    "tqdm",
    "google-generativeai",
    "google-genai",
]
```
---
## 📊 Example Experiment Workflow

1. Define your retriever, embedding, and reranker setup
2. Launch optimization (Grid, Random, Bayesian) or AutoTune
3. Compare performance with explainability
4. Persist results to the leaderboard for later inspection

---
## 🧬 Architecture Overview

```mermaid
flowchart TD
    A[Query] --> B[Embedder]
    B --> C[Retriever]
    C --> D[Reranker]
    D --> E[Generator]
    E --> F[Evaluation]
    F --> G[Optuna / AutoRAGTuner]
    G -->|Best Params| B
```
---
## 📘 Example Output

```
[INFO] Starting Bayesian optimization with Optuna
[INFO] Trial 7 finished: faithfulness=0.83, latency=0.42s
[INFO] Best parameters: {'lambda_param': 0.6, 'retriever': 'faiss'}
[INFO] AutoRAGTuner: Suggested retriever=Chroma for medium corpus
```
---
## 🧠 Why Ragmint?

- Built for **RAG researchers**, **AI engineers**, and **LLM ops**
- Works with **LangChain**, **LlamaIndex**, or standalone setups
- Designed for **extensibility** — plug in your own retrievers, models, or metrics
- Integrated **explainability and leaderboard** modules for research and production

---

## ⚖️ License

Licensed under the **Apache License 2.0** — free for personal, research, and commercial use.

---
## 👤 Author

**André Oliveira**
[andyolivers.com](https://andyolivers.com)
Data Scientist | AI Engineer