| Field | Value |
| --- | --- |
| Name | krag |
| Version | 0.0.30 |
| Summary | A Python package for RAG performance evaluation |
| author | Pandas-Studio |
| maintainer | None |
| home_page | None |
| docs_url | None |
| license | None |
| requires_python | <3.14,>=3.10 |
| keywords | rag, recall, mrr, map, ndcg |
| requirements | No requirements were recorded |
| upload_time | 2025-01-26 08:00:57 |
# Krag
Krag is a Python package designed to evaluate RAG (Retrieval-Augmented Generation) systems. It provides various evaluation metrics including Hit Rate, Recall, Precision, MRR (Mean Reciprocal Rank), MAP (Mean Average Precision), and NDCG (Normalized Discounted Cumulative Gain).
## Installation
```bash
pip install krag
```
## Key Features
### 1. **Evaluation Metrics**
- **Hit Rate**: Fraction of queries whose target documents were correctly retrieved
- **Recall**: Fraction of all relevant documents found in the top-k predictions
- **Precision**: Fraction of the top-k predictions that are relevant
- **F1 Score**: Harmonic mean of Precision and Recall
- **MRR** (Mean Reciprocal Rank): Average of the reciprocal rank of the first relevant document
- **MAP** (Mean Average Precision): Average of the precision values at the ranks where relevant documents appear
- **NDCG** (Normalized Discounted Cumulative Gain): Measures ranking quality with position-weighted gains
### 2. **Document Matching Methods**
- Text Preprocessing
  - TokenizerType: **KIWI** (for Korean), **NLTK**, **WHITESPACE**
  - TokenizerConfig for consistent normalization
  - Language-specific tokenization support (Korean/English)
- Matching Methods (a standalone sketch of normalized exact matching follows this list)
  - Exact text matching
  - ROUGE-based matching (rouge1, rouge2, rougeL)
  - Embedding-based similarity
    - Supports HuggingFace, OpenAI, Ollama
    - Configurable thresholds
    - Cached embeddings for efficiency
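
As a minimal standalone sketch of exact matching after normalization: the helper names below are illustrative only (they are not part of the krag API), and krag's own tokenizers and `TokenizerConfig` may normalize text differently.

```python
# Illustrative only: exact matching after lowercasing and whitespace
# normalization. krag's TokenizerType/TokenizerConfig may behave differently.
def normalize_whitespace(text: str) -> str:
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(text.lower().split())

def exact_match(actual_text: str, predicted_text: str) -> bool:
    # Documents match only if their normalized forms are identical.
    return normalize_whitespace(actual_text) == normalize_whitespace(predicted_text)

print(exact_match("This is  the SECOND document.", "this is the second document."))  # True
```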
### 3. **Configuration Options**
- Tokenizer selection and settings
- Rouge/similarity thresholds
- Embedding model configuration
- Averaging methods (micro/macro)
### 4. **Visualization**
- Comparative bar charts for metrics
## Metric Details
```python
def calculate_hit_rate(k):
    """
    Hit rate = # queries with correct documents / total queries
    - ALL: Found all target docs
    - PARTIAL: Found at least one target doc
    """

def calculate_recall(k):
    """
    Recall = # relevant docs retrieved / total relevant docs
    - micro: Document-level average
    - macro: Query-level average
    """

def calculate_precision(k):
    """
    Precision = # relevant retrieved / # retrieved
    - micro: Document-level average
    - macro: Query-level average
    """

def calculate_f1_score(k):
    """F1 = 2 * (precision * recall) / (precision + recall)"""

def calculate_mrr(k):
    """MRR = mean(1 / rank of first relevant doc)"""

def calculate_map(k):
    """MAP = mean(average precision per query)"""

def calculate_ndcg(k):
    """NDCG = DCG / IDCG with position discounting"""
```
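
The docstrings above summarize the formulas. As a concrete illustration (not krag's implementation, which also handles the ALL/PARTIAL criteria and micro/macro averaging across queries), single-query versions of recall@k, precision@k, and reciprocal rank can be written as:

```python
# Single-query illustrations of the formulas above (not krag's internal code).
def recall_at_k(relevant_ids, ranked_ids, k):
    # Fraction of all relevant documents found in the top-k predictions.
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def precision_at_k(relevant_ids, ranked_ids, k):
    # Fraction of the top-k predictions that are relevant.
    top_k = ranked_ids[:k]
    return sum(doc in set(relevant_ids) for doc in top_k) / len(top_k)

def reciprocal_rank(relevant_ids, ranked_ids, k):
    # 1 / rank of the first relevant document within the top-k, else 0.
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in set(relevant_ids):
            return 1.0 / rank
    return 0.0

# One relevant document ("d2") retrieved at rank 2 of 2 predictions:
print(recall_at_k(["d1", "d2"], ["d9", "d2"], k=2))      # 0.5
print(precision_at_k(["d1", "d2"], ["d9", "d2"], k=2))   # 0.5
print(reciprocal_rank(["d1", "d2"], ["d9", "d2"], k=2))  # 0.5
```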
## Usage Examples
### 1. Basic Evaluator
```python
from krag.document import KragDocument as Document
from krag.evaluators import OfflineRetrievalEvaluators, AveragingMethod, MatchingCriteria
actual_docs = [
    [Document(page_content="This is the first document."),
     Document(page_content="This is the second document.")],
]

predicted_docs = [
    [Document(page_content="This is the last document."),
     Document(page_content="This is the second document.")],
]

evaluator = OfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL
)
# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)
print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")
# Visualize the evaluation results
evaluator.visualize_results(k=2)
```
#### Metric Values
```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 0.5}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.5}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.6309297535714575}
```
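
These values follow directly from the example: only "This is the second document." matches, and it appears at rank 2 of the 2 predictions. A quick standalone arithmetic check (the NDCG line assumes the ideal DCG is taken over the single matched document, which is consistent with the value reported above; krag's exact normalization convention may differ):

```python
import math

rank_of_first_match = 2   # the matching prediction is ranked second
relevant_total = 2        # two actual documents for the query
retrieved_total = 2       # k = 2 predictions
matched = 1               # only one prediction matches an actual document

mrr = 1 / rank_of_first_match                          # 0.5
recall = matched / relevant_total                      # 0.5
precision = matched / retrieved_total                  # 0.5
f1 = 2 * precision * recall / (precision + recall)     # 0.5
ap = (matched / rank_of_first_match) / relevant_total  # 0.25
dcg = 1 / math.log2(rank_of_first_match + 1)           # gain of 1, discounted at rank 2
ndcg = dcg / 1.0                                       # ~0.6309 (ideal DCG of 1 for the single match)
print(mrr, recall, precision, f1, ap, round(ndcg, 4))
```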
#### Visualization Result

### 2. ROUGE Evaluator
```python
from krag.evaluators import RougeOfflineRetrievalEvaluators
# Evaluation using ROUGE matching
evaluator = RougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    match_method="rouge2",  # Choose from rouge1, rouge2, rougeL
    threshold=0.8           # ROUGE score threshold
)
# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)
print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")
# Visualize the evaluation results
evaluator.visualize_results(k=2)
```
#### Metric Values
```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 0.5}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.5}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.8651447273736845}
```
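
For intuition about what `match_method="rouge2"` with `threshold=0.8` accepts, here is a standalone check built on the `rouge_score` package. The use of the F-measure and the default tokenizer are assumptions; krag's internal ROUGE computation and tokenization may differ.

```python
# Standalone rouge2 threshold check (assumption: the F-measure is compared to the threshold).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=False)

def rouge2_match(actual_text, predicted_text, threshold=0.8):
    # A predicted document counts as a match if its ROUGE-2 score
    # against the actual document reaches the threshold.
    return scorer.score(actual_text, predicted_text)["rouge2"].fmeasure >= threshold

print(rouge2_match("This is the second document.", "This is the second document."))  # True (score 1.0)
print(rouge2_match("This is the first document.", "This is the last document."))     # False (score 0.5)
```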
#### Visualization Result

### 3. Embedding-based ROUGE Evaluator
Performs an initial filtering pass using text embeddings, followed by a detailed comparison using ROUGE scores; a standalone sketch of this two-stage matching appears after the metric values below.
```python
from krag.evaluators import EmbeddingRougeOfflineRetrievalEvaluators
# Using HuggingFace embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="huggingface",
    embedding_config={
        "model_name": "jhgan/ko-sroberta-multitask",
        "model_kwargs": {'device': 'cpu'},
        "encode_kwargs": {'normalize_embeddings': False}
    },
    similarity_threshold=0.7,  # Embedding similarity threshold
    rouge_threshold=0.8        # ROUGE score threshold
)

# Using OpenAI embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="openai",
    embedding_config={
        "model": "text-embedding-3-small",
        "dimensions": 1024  # Optional embedding dimensions
    },
    similarity_threshold=0.7,  # Embedding similarity threshold
    rouge_threshold=0.8        # ROUGE score threshold
)

# Using Ollama embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="ollama",
    embedding_config={"model": "bge-m3"},
    similarity_threshold=0.8,  # Embedding similarity threshold
    rouge_threshold=0.8        # ROUGE score threshold
)
# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)
print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")
# Visualize the evaluation results
evaluator.visualize_results(k=2)
```
#### Metric Values
```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 1.0}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.6666666666666666}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.6309297535714575}
```
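
As described above, this evaluator first filters candidate pairs by embedding similarity and only then applies the ROUGE comparison. A standalone sketch of that two-stage decision (not krag's code; `embed` is a placeholder for whichever configured backend, HuggingFace, OpenAI, or Ollama, produces the vectors):

```python
# Illustrative two-stage match: embedding-similarity filter, then ROUGE-2 check.
import numpy as np
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=False)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def two_stage_match(actual_text, predicted_text, embed,
                    similarity_threshold=0.7, rouge_threshold=0.8):
    # Stage 1: a cheap embedding filter rejects clearly unrelated documents.
    if cosine_similarity(embed(actual_text), embed(predicted_text)) < similarity_threshold:
        return False
    # Stage 2: ROUGE-2 makes the final match decision.
    return scorer.score(actual_text, predicted_text)["rouge2"].fmeasure >= rouge_threshold
```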
#### Visualization Result

## Important Notes
1. Required packages for embedding models:

   - HuggingFace: `pip install langchain-huggingface`
   - OpenAI: `pip install langchain-openai`
   - Ollama: `pip install langchain-ollama`

2. OpenAI embeddings require an API key:

   ```python
   import os
   os.environ["OPENAI_API_KEY"] = "your-api-key"
   ```
## License
MIT License: [MIT License](https://opensource.org/licenses/MIT)
## Contact
Questions: [ontofinances@gmail.com](mailto:ontofinances@gmail.com)