krag 0.0.30

Summary: A Python package for RAG performance evaluation
Author: Pandas-Studio
Requires-Python: >=3.10, <3.14
Keywords: RAG, Recall, MRR, MAP, NDCG
Uploaded: 2025-01-26 08:00:57
# Krag

Krag is a Python package designed to evaluate RAG (Retrieval-Augmented Generation) systems. It provides various evaluation metrics including Hit Rate, Recall, Precision, MRR (Mean Reciprocal Rank), MAP (Mean Average Precision), and NDCG (Normalized Discounted Cumulative Gain).

## Installation

```bash
pip install krag
```

## Key Features

### 1. **Evaluation Metrics**

- **Hit Rate**: Fraction of queries for which the target documents were retrieved
- **Recall**: Fraction of relevant documents found in the top-k predictions
- **Precision**: Fraction of top-k predictions that are relevant
- **F1 Score**: Harmonic mean of precision and recall
- **MRR** (Mean Reciprocal Rank): Mean of the reciprocal rank of the first relevant document
- **MAP** (Mean Average Precision): Mean of per-query average precision over ranks holding relevant documents
- **NDCG** (Normalized Discounted Cumulative Gain): Ranking quality with position-discounted gains

### 2. **Document Matching Methods**

- Text Preprocessing

  - TokenizerType: **KIWI** (for Korean), **NLTK**, **WHITESPACE**
  - TokenizerConfig for consistent normalization
  - Language-specific tokenization support (Korean/English)

- Matching Methods (a minimal standalone sketch follows this list)
  - Exact text matching
  - ROUGE-based matching (rouge1, rouge2, rougeL)
  - Embedding-based similarity
    - Supports HuggingFace, OpenAI, and Ollama backends
    - Configurable thresholds
    - Cached embeddings for efficiency
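
To make the matching idea concrete, the sketch below shows how a predicted document could be judged against a target document: first by exact comparison after simple whitespace normalization, then by a ROUGE-2 F-measure threshold. It uses the public `rouge-score` package and is only an illustration of the concept, not krag's internal code.

```python
# Illustration only (not krag internals): decide whether a predicted document
# "matches" a target document, either exactly or via a ROUGE-2 threshold.
from rouge_score import rouge_scorer  # pip install rouge-score


def normalize(text: str) -> str:
    # simple whitespace tokenization + lowercasing as a normalization step
    return " ".join(text.lower().split())


def exact_match(target: str, prediction: str) -> bool:
    return normalize(target) == normalize(prediction)


def rouge_match(target: str, prediction: str, threshold: float = 0.8) -> bool:
    scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
    score = scorer.score(target, prediction)["rouge2"].fmeasure
    return score >= threshold


print(exact_match("This is the second document.", "This is the second document."))  # True
print(rouge_match("This is the first document.", "This is the last document."))     # False at 0.8
```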

### 3. **Configuration Options**

- Tokenizer selection and settings
- ROUGE/similarity thresholds
- Embedding model configuration
- Averaging methods (micro/macro)

### 4. **Visualization**

- Comparative bar charts for metrics
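
krag's `visualize_results` draws these charts for you. Purely as an illustration of the idea (not the package's plotting code), the metric dictionaries returned by the `calculate_*` methods could be charted with matplotlib like this:

```python
# Independent illustration (not krag's visualize_results): bar chart of metrics.
import matplotlib.pyplot as plt

metrics = {"hit_rate": 1.0, "mrr": 0.5, "micro_recall": 0.5,
           "micro_precision": 0.5, "micro_f1": 0.5, "map": 0.25, "ndcg": 0.63}

plt.bar(list(metrics.keys()), list(metrics.values()))
plt.ylim(0, 1)
plt.ylabel("score @ k=2")
plt.title("Retrieval metrics")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
```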

## Metric Details

```python
def calculate_hit_rate(k):
    """
    Hit rate = # queries with correct documents / total queries
    - ALL: Found all target docs
    - PARTIAL: Found at least one target doc
    """

def calculate_recall(k):
    """
    Recall = # relevant docs retrieved / total relevant docs
    - micro: Document-level average
    - macro: Query-level average
    """

def calculate_precision(k):
    """
    Precision = # relevant retrieved / # retrieved
    - micro: Document-level average
    - macro: Query-level average
    """

def calculate_f1_score(k):
    """F1 = 2 * (precision * recall)/(precision + recall)"""

def calculate_mrr(k):
    """MRR = mean(1/rank of first relevant doc)"""

def calculate_map(k):
    """MAP = mean(average precision per query)"""

def calculate_ndcg(k):
    """NDCG = DCG/IDCG with position discounting"""
```
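
The docstrings above summarize the formulas. As an executable reference, here is a standalone sketch (independent of krag's internals, assuming binary relevance flags and micro averaging) that reproduces the same family of metrics:

```python
# Standalone reference sketch (not krag's implementation): one list of 0/1
# relevance flags per query, truncated at k, micro-averaged where applicable.
import math
from typing import List


def recall_at_k(rels: List[List[int]], totals: List[int], k: int) -> float:
    # micro recall: pooled relevant-retrieved count / pooled relevant count
    return sum(sum(r[:k]) for r in rels) / sum(totals)


def precision_at_k(rels: List[List[int]], k: int) -> float:
    # micro precision, assuming each query returns exactly k documents
    return sum(sum(r[:k]) for r in rels) / (k * len(rels))


def mrr_at_k(rels: List[List[int]], k: int) -> float:
    ranks = (next((i + 1 for i, x in enumerate(r[:k]) if x), None) for r in rels)
    return sum(1.0 / rank if rank else 0.0 for rank in ranks) / len(rels)


def ndcg_at_k(rels: List[List[int]], k: int) -> float:
    scores = []
    for r in rels:
        dcg = sum(x / math.log2(i + 2) for i, x in enumerate(r[:k]))
        ideal = sorted(r, reverse=True)[:k]  # ideal ordering of the same flags
        idcg = sum(x / math.log2(i + 2) for i, x in enumerate(ideal))
        scores.append(dcg / idcg if idcg else 0.0)
    return sum(scores) / len(scores)


# One query, two predictions, only the second one is a target document.
rels, totals = [[0, 1]], [2]
print(recall_at_k(rels, totals, k=2))   # 0.5
print(precision_at_k(rels, k=2))        # 0.5
print(mrr_at_k(rels, k=2))              # 0.5
print(ndcg_at_k(rels, k=2))             # 0.6309297535714575
```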

## Usage Examples

### 1. Basic Evaluator

```python
from krag.document import KragDocument as Document
from krag.evaluators import OfflineRetrievalEvaluators, AveragingMethod, MatchingCriteria

actual_docs = [
    [Document(page_content="This is the first document."),
     Document(page_content="This is the second document.")],
]

predicted_docs = [
    [Document(page_content="This is the last document."),
     Document(page_content="This is the second document.")],
]

evaluator = OfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL
)

# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)

print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")

# Visualize the evaluation results
evaluator.visualize_results(k=2)
```

#### Metric Values

```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 0.5}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.5}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.6309297535714575}
```
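
These values all follow from the single exact match ("This is the second document.") sitting at rank 2 of the 2 predictions, which fixes MRR, recall, precision, and F1 at 0.5. A quick hand check of the two less obvious numbers, using textbook formulas (the package's normalization details may differ):

```python
import math

# MAP: one relevant hit at rank 2 gives precision@2 = 0.5, averaged over the
# 2 target documents of the query.
ap = 0.5 / 2                          # 0.25

# NDCG@2: a gain of 1 discounted at rank 2; the ideal ranking places that
# single relevant document at rank 1, so IDCG = 1.
ndcg = (1 / math.log2(2 + 1)) / 1.0   # 0.6309297535714575
print(ap, ndcg)
```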

#### Visualization Result

![Evaluation Results Visualization](docs/sample_1.png)

### 2. ROUGE Evaluator

```python
from krag.evaluators import RougeOfflineRetrievalEvaluators

# Evaluation using ROUGE matching
evaluator = RougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    match_method="rouge2",  # Choose from rouge1, rouge2, rougeL
    threshold=0.8  # ROUGE score threshold
)

# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)

print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")

# Visualize the evaluation results
evaluator.visualize_results(k=2)
```

#### Metric Values

```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 0.5}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.5}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.8651447273736845}
```

#### Visualization Result

![Evaluation Results Visualization](docs/sample_2.png)

### 3. Embedding-based ROUGE Evaluator

Performs an initial filtering pass with text embeddings, followed by a finer-grained comparison using ROUGE scores.
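
Conceptually, the two stages look like the standalone sketch below, which uses sentence-transformers for the embedding step purely as an illustration; the threshold names mirror the constructor arguments, but the way the stages are combined here is an assumption, not krag's internal logic.

```python
# Conceptual sketch (illustration, not krag internals): a cheap embedding
# similarity filter followed by a ROUGE-2 check on the surviving pairs.
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers
from rouge_score import rouge_scorer

model = SentenceTransformer("jhgan/ko-sroberta-multitask")
scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)


def two_stage_match(target: str, prediction: str,
                    similarity_threshold: float = 0.7,
                    rouge_threshold: float = 0.8) -> bool:
    # Stage 1: embedding-similarity filter
    sim = util.cos_sim(model.encode(target), model.encode(prediction)).item()
    if sim < similarity_threshold:
        return False
    # Stage 2: finer-grained ROUGE comparison
    return scorer.score(target, prediction)["rouge2"].fmeasure >= rouge_threshold


print(two_stage_match("This is the second document.", "This is the second document."))  # True
```

In krag itself this is wired up through the LangChain embedding classes; the example below shows the evaluator configured with each of the three supported backends.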

```python
from krag.evaluators import EmbeddingRougeOfflineRetrievalEvaluators

# Using HuggingFace embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="huggingface",
    embedding_config={
        "model_name": "jhgan/ko-sroberta-multitask",
        "model_kwargs": {'device': 'cpu'},
        "encode_kwargs": {'normalize_embeddings': False}
    },
    similarity_threshold=0.7,  # Embedding similarity threshold
    rouge_threshold=0.8  # ROUGE score threshold
)

# Using OpenAI embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="openai",
    embedding_config={
        "model": "text-embedding-3-small",
        "dimensions": 1024  # Optional embedding dimensions
    },
    similarity_threshold=0.7,  # Embedding similarity threshold
    rouge_threshold=0.8  # ROUGE score threshold
)

# Using Ollama embeddings
evaluator = EmbeddingRougeOfflineRetrievalEvaluators(
    actual_docs,
    predicted_docs,
    averaging_method=AveragingMethod.MICRO,
    matching_criteria=MatchingCriteria.PARTIAL,
    embedding_type="ollama",
    embedding_config={"model": "bge-m3"},
    similarity_threshold=0.8,  # Embedding similarity threshold
    rouge_threshold=0.8  # ROUGE score threshold
)

# Calculate individual metrics
hit_rate = evaluator.calculate_hit_rate(k=2)
mrr = evaluator.calculate_mrr(k=2)
recall = evaluator.calculate_recall(k=2)
precision = evaluator.calculate_precision(k=2)
f1_score = evaluator.calculate_f1_score(k=2)
map_score = evaluator.calculate_map(k=2)
ndcg = evaluator.calculate_ndcg(k=2)

print(f"Hit Rate @2: {hit_rate}")
print(f"MRR @2: {mrr}")
print(f"Recall @2: {recall}")
print(f"Precision @2: {precision}")
print(f"F1 Score @2: {f1_score}")
print(f"MAP @2: {map_score}")
print(f"NDCG @2: {ndcg}")

# Visualize the evaluation results
evaluator.visualize_results(k=2)
```

#### Metric Values

```python
Hit Rate @2: {'hit_rate': 1.0}
MRR @2: {'mrr': 0.5}
Recall @2: {'micro_recall': 1.0}
Precision @2: {'micro_precision': 0.5}
F1 Score @2: {'micro_f1': 0.6666666666666666}
MAP @2: {'map': 0.25}
NDCG @2: {'ndcg': 0.6309297535714575}
```

#### Visualization Result

![Evaluation Results Visualization](docs/sample_3.png)

## Important Notes

1. Required packages for embedding models:

   - HuggingFace: `pip install langchain-huggingface`
   - OpenAI: `pip install langchain-openai`
   - Ollama: `pip install langchain-ollama`

2. OpenAI embeddings require an API key:
   ```python
   import os
   os.environ["OPENAI_API_KEY"] = "your-api-key"
   ```

## License

MIT License: [MIT License](https://opensource.org/licenses/MIT)

## Contact

Questions: [ontofinances@gmail.com](mailto:ontofinances@gmail.com)