<div style="display: flex; justify-content: center; align-items: center; height: 100px;">
<h1 style="font-size: 48px;">
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augmented Generation
</h1>
</div>
<img src="imgs/logo.png" width="100%" align="center" alt="XRAG">
[](https://badge.fury.io/py/examinationrag)
[](https://pypi.org/project/examinationrag/)
[](https://github.com/DocAILab/XRAG/blob/main/LICENSE)
[](https://pepy.tech/project/examinationrag)
[](https://github.com/DocAILab/XRAG/stargazers)
[](https://github.com/DocAILab/XRAG/issues)
[](https://arxiv.org/abs/2412.15529)
## 📑 Table of Contents
- [:mega: Updates](#mega-updates)
- [:book: Introduction](#-introduction)
- [:sparkles: Features](#-features)
- [:globe_with_meridians: WebUI Demo](#-webui-demo)
- [:hammer_and_wrench: Installation](#️-installation)
- [:rocket: Quick Start](#-quick-start)
- [:gear: Configuration](#️-configuration)
- [:warning: Troubleshooting](#-troubleshooting)
- [:clipboard: Changelog](#-changelog)
- [:speech_balloon: Feedback and Support](#-feedback-and-support)
- [:round_pushpin: Acknowledgement](#round_pushpin-acknowledgement)
- [:books: Citation](#-citation)
## :mega: Updates
- **2025-01-09: Add API support. Now you can use XRAG as a backend service.**
- **2025-01-06: Add Ollama LLM support.**
- **2025-01-05: Add the `generate` command. Now you can generate your own QA pairs from a folder containing your documents.**
- **2024-12-23: XRAG documentation is released** 🌈.
- **2024-12-20: XRAG is released** 🎉.
---
## 📖 Introduction
<img src="imgs/overall.png" width="100%" align="center" alt="XRAG">
XRAG is a benchmarking framework designed to evaluate the foundational components of advanced Retrieval-Augmented Generation (RAG) systems. By dissecting and analyzing each core module, XRAG provides insights into how different configurations and components impact the overall performance of RAG systems.
---
## ✨ Features
- **🔍 Comprehensive Evaluation Framework**:
- Multiple evaluation dimensions: LLM-based evaluation, Deep evaluation, and traditional metrics
- Support for evaluating retrieval quality, response faithfulness, and answer correctness
- Built-in evaluation models including LlamaIndex, DeepEval, and custom metrics
- **⚙️ Flexible Architecture**:
- Modular design with pluggable components for retrievers, embeddings, and LLMs
- Support for various retrieval methods: Vector, BM25, Hybrid, and Tree-based
- Easy integration with custom retrieval and evaluation strategies
- **🤖 Multiple LLM Support**:
- Seamless integration with OpenAI models
- Support for local models (Qwen, LLaMA, etc.)
- Configurable model parameters and API settings
- **📊 Rich Evaluation Metrics**:
- Traditional metrics: F1, EM, MRR, Hit@K, MAP, NDCG
- LLM-based metrics: Faithfulness, Relevancy, Correctness
- Deep evaluation metrics: Contextual Precision/Recall, Hallucination, Bias
- **🎯 Advanced Retrieval Methods**:
- BM25-based retrieval
- Vector-based semantic search
- Tree-structured retrieval
- Keyword-based retrieval
- Document summary retrieval
- Custom retrieval strategies
- **💻 User-Friendly Interface**:
- Command-line interface with rich options
- Web UI for interactive evaluation
- Detailed evaluation reports and visualizations
---
## 🌐 WebUI Demo
XRAG provides an intuitive web interface for interactive evaluation and visualization. Launch it with:
```bash
xrag-cli webui
```
The WebUI guides you through the following workflow:
### 1. Dataset Upload and Configuration
<img src="imgs/dataset.png" width="100%" align="center" alt="Dataset Selection" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
Upload and configure your datasets:
- Support for benchmark datasets (HotpotQA, DropQA, NaturalQA)
- Custom dataset integration
- Automatic format conversion and preprocessing
### 2. Index Building and Configuration
<img src="imgs/index.png" width="100%" align="center" alt="Index Building" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
Configure system parameters and build indices:
- API key configuration
- Parameter settings
- Vector database index construction
- Chunk size optimization
### 3. RAG Strategy Configuration
<img src="imgs/strategies.png" width="100%" align="center" alt="RAG Strategies" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
Define your RAG pipeline components:
- Pre-retrieval methods
- Retriever selection
- Post-processor configuration
- Custom prompt template creation
### 4. Interactive Testing
<img src="imgs/evalone.png" width="100%" align="center" alt="Testing Interface" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
Test your RAG system interactively:
- Real-time query testing
- Retrieval result inspection
- Response generation review
- Performance analysis
### 5. Comprehensive Evaluation
<img src="imgs/eval2.png" width="100%" align="center" alt="Evaluation Metrics" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
<img src="imgs/eval3.png" width="100%" align="center" alt="Evaluation Dashboard" style="border: 2px solid #666; border-radius: 8px; margin: 20px 0;">
## 🛠️ Installation
Before installing XRAG, ensure that you have Python 3.11 or later installed.
### Create a Virtual Environment via conda (Recommended)
```bash
# Create a new conda environment
conda create -n xrag python=3.11
# Activate the environment
conda activate xrag
```
### **Install via pip**
You can install XRAG directly using `pip`:
```bash
# Install XRAG
pip install examinationrag
# Install 'jury' without dependencies to avoid conflicts
pip install jury --no-deps
```
---
## 🚀 Quick Start
Here's how you can get started with XRAG:
### 1. Prepare Configuration
Modify the `config.toml` file to set up your desired configurations.
### 2. Using `xrag-cli`
After installing XRAG, the `xrag-cli` command becomes available in your environment. This command provides a convenient way to interact with XRAG without needing to call Python scripts directly.
### **Command Structure**
```bash
xrag-cli [command] [options]
```
### **Commands and Options**
- **run**: Runs the benchmarking process.
```bash
xrag-cli run [--override key=value ...]
```
- **webui**: Launches the web-based user interface.
```bash
xrag-cli webui
```
- **version**: Displays the current version of XRAG.
```bash
xrag-cli version
```
- **help**: Displays help information.
```bash
xrag-cli help
```
- **generate**: Generates QA pairs from a document folder.
```bash
xrag-cli generate -i <input_file> -o <output_file> -n <num_questions> -s <sentence_length>
```
- **api**: Launches the API server for XRAG services.
```bash
xrag-cli api [--host <host>] [--port <port>] [--json_path <json_path>] [--dataset_folder <dataset_folder>]
```
Options:
- `--host`: API server host address (default: 0.0.0.0)
- `--port`: API server port number (default: 8000)
- `--json_path`: Path to a JSON-format QA dataset file
- `--dataset_folder`: Path to the dataset folder
### **Using the API Service**
Once the API server is running, you can interact with it using HTTP requests. Here are the available endpoints:
#### 1. Query Endpoint
Send a POST request to `/query` to get answers based on your documents:
```bash
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{
"query": "your question here",
"top_k": 3
}'
```
Response format:
```json
{
"answer": "Generated answer to your question",
"sources": [
{
"content": "Source document content",
"id": "document_id",
"score": 0.85
}
]
}
```
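If you prefer calling the endpoint from Python, the request and response shapes above can be exercised with a small standard-library client. This is only a sketch: `query_xrag` and `best_source` are illustrative helpers, not part of the XRAG package.

```python
import json
import urllib.request

def query_xrag(question: str, top_k: int = 3,
               base_url: str = "http://localhost:8000") -> dict:
    """POST a question to the /query endpoint and return the parsed JSON body."""
    payload = json.dumps({"query": question, "top_k": top_k}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def best_source(response: dict) -> dict:
    """Return the highest-scoring entry from the 'sources' list."""
    return max(response["sources"], key=lambda s: s["score"])
```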
#### 2. Health Check
Check the API server status with a GET request to `/health`:
```bash
curl "http://localhost:8000/health"
```
Response format:
```json
{
"status": "healthy",
"engine_status": "initialized"
}
```
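When scripting against the service, it can help to wait until the server reports healthy before sending queries. A minimal sketch, based only on the response format shown above; `is_ready` and `wait_until_healthy` are illustrative helpers, not XRAG APIs.

```python
import json
import time
import urllib.error
import urllib.request

def is_ready(body: dict) -> bool:
    """Check a /health response body against the documented format."""
    return (body.get("status") == "healthy"
            and body.get("engine_status") == "initialized")

def wait_until_healthy(base_url: str = "http://localhost:8000",
                       timeout: float = 30.0) -> dict:
    """Poll /health once a second until the engine is ready or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health") as resp:
                body = json.load(resp)
            if is_ready(body):
                return body
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet
        time.sleep(1.0)
    raise TimeoutError("XRAG API server did not become healthy in time")
```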
The API service supports both custom JSON datasets and folder-based documents:
- Use `--json_path` for JSON format QA datasets
- Use `--dataset_folder` for document folders
- Do **not** set `--json_path` and `--dataset_folder` at the same time.
### **Overriding Configuration Parameters**
Use the `--override` flag followed by key-value pairs to override configuration settings:
```bash
xrag-cli run --override embeddings="new-embedding-model"
```
### **Generate QA pairs from a folder**
```bash
xrag-cli generate -i <input_file> -o <output_file> -n <num_questions> -s <sentence_length>
```
Automatically generate QA pairs from a folder.
---
## ⚙️ Configuration
XRAG uses a `config.toml` file for configuration management. Here's a detailed explanation of the configuration options:
```toml
[api_keys]
api_key = "sk-xxxx" # Your API key for LLM service
api_base = "https://xxx" # API base URL
api_name = "gpt-4o" # Model name
auth_token = "hf_xxx" # Hugging Face auth token
[settings]
llm = "openai" # openai, huggingface, ollama
ollama_model = "llama2:7b" # ollama model name
huggingface_model = "llama" # huggingface model name
embeddings = "BAAI/bge-large-en-v1.5"
split_type = "sentence"
chunk_size = 128
dataset = "hotpot_qa"
persist_dir = "storage"
# ... additional settings ...
```
---
## ❗ Troubleshooting
- **Dependency Conflicts**: If you encounter dependency issues, ensure that you have the correct versions specified in `requirements.txt` and consider using a virtual environment.
- **Invalid Configuration Keys**: Ensure that the keys you override match exactly with those in the `config.toml` file.
- **Data Type Mismatches**: When overriding configurations, make sure the values are of the correct data type (e.g., integers, booleans).
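To illustrate the type issue: a `key=value` override arrives as a string and must be coerced before it can match a numeric or boolean config field. The `parse_override` helper below is hypothetical, sketching one plausible coercion scheme; it is not XRAG's actual implementation.

```python
def parse_override(pair: str) -> tuple[str, object]:
    """Split one key=value override, coercing the value to bool, int, or
    float where it parses as one; otherwise keep it as a string."""
    key, sep, raw = pair.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {pair!r}")
    if raw.lower() in ("true", "false"):
        return key, raw.lower() == "true"
    for cast in (int, float):
        try:
            return key, cast(raw)
        except ValueError:
            pass
    return key, raw.strip('"')  # drop shell-style quoting on strings
```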
---
## 📝 Changelog
### Version 0.1.3
- Add API support. Now you can use XRAG as a backend service.
### Version 0.1.2
- Add ollama LLM support.
### Version 0.1.1
- Add the `generate` command. Now you can generate your own QA pairs from a folder containing your documents.
### Version 0.1.0
- Initial release with core benchmarking functionality.
- Support for HotpotQA dataset.
- Command-line configuration overrides.
- Introduction of the `xrag-cli` command-line tool.
---
## 💬 Feedback and Support
We value feedback from our users. If you have suggestions, feature requests, or encounter issues:
- **Open an Issue**: Submit an issue on our [GitHub repository](https://github.com/DocAILab/xrag/issues).
- **Email Us**: Reach out at [luoyangyifei@buaa.edu.cn](mailto:luoyangyifei@buaa.edu.cn).
- **Join the Discussion**: Participate in discussions and share your insights.
---
## :round_pushpin: Acknowledgement
- Organizers: [Qianren Mao](https://github.com/qianrenmao), [Yangyifei Luo (罗杨一飞)](https://github.com/lyyf2002), [Jinlong Zhang (张金龙)](https://github.com/therealoliver), [Hanwen Hao (郝瀚文)](https://github.com/TheSleepGod), [Zhenting Huang (黄振庭)](https://github.com/hztBUAA), [Zhilong Cao (曹之龙)](https://github.com/afdafczl)
- This project is inspired by [RAGLAB](https://github.com/fate-ubw/RAGLab), [FlashRAG](https://github.com/RUC-NLPIR/FlashRAG), [FastRAG](https://github.com/IntelLabs/fastRAG), [AutoRAG](https://github.com/Marker-Inc-Korea/AutoRAG), [LocalRAG](https://github.com/jasonyux/LocalRQA).
- We are deeply grateful for the following external libraries, which have been pivotal to the development and functionality of our project: [LlamaIndex](https://docs.llamaindex.ai/en/stable/), [Hugging Face Transformers](https://github.com/huggingface/transformers).
## 📚 Citation
If you find this work helpful, please cite our paper:
```bibtex
@misc{mao2024xragexaminingcore,
title={XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation},
author={Qianren Mao and Yangyifei Luo and Jinlong Zhang and Hanwen Hao and Zhilong Cao and Xiaolong Wang and Xiao Guan and Zhenting Huang and Weifeng Jiang and Shuyu Guo and Zhentao Han and Qili Zhang and Siyuan Tao and Yujie Liu and Junnan Liu and Zhixing Tan and Jie Sun and Bo Li and Xudong Liu and Richong Zhang and Jianxin Li},
year={2024},
eprint={2412.15529},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.15529},
}
```
## 🙏 Thank You
Thank you for using XRAG! We hope it proves valuable in your research and development efforts in the field of Retrieval-Augmented Generation.