rankify


| Field | Value |
|---|---|
| Name | rankify |
| Version | 0.1.0.post4 |
| Summary | A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation |
| Upload time | 2025-02-10 03:29:01 |
| Requires Python | >=3.10 |
| Keywords | retrieval, re-ranking, rag, nlp, search |
| Requirements | No requirements were recorded. |


### <div align="center">🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥</div>


<div align="center">
<a href="https://arxiv.org/abs/2502.02464" target="_blank"><img src=https://img.shields.io/badge/arXiv-b5212f.svg?logo=arxiv></a>
<a href="https://huggingface.co/datasets/abdoelsayed/reranking-datasets" target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace%20Datasets-27b3b4.svg></a>
<a href="https://huggingface.co/datasets/abdoelsayed/reranking-datasets-light" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace%20Datasets%20light-orange.svg"></a>
<a><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.10_3.11-blue"></a>
<a href="https://opensource.org/license/apache-2-0"><img src="https://img.shields.io/static/v1?label=License&message=Apache-2.0&color=red"></a>
 <a href="https://pepy.tech/projects/rankify"><img src="https://static.pepy.tech/badge/rankify" alt="PyPI Downloads"></a>
<a href="https://github.com/DataScienceUIBK/rankify/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/DataScienceUIBK/rankify.svg?label=Version&color=orange"></a>    
</div>


_A modular and efficient retrieval, re-ranking, and RAG framework designed to work with state-of-the-art models for retrieval, ranking, and RAG tasks._

_Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field._



## ✨ Features

- **Comprehensive Retrieval & Reranking Framework**: Rankify unifies retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular Python toolkit, enabling seamless experimentation and benchmarking.  

- **Extensive Dataset Support**: Includes **40 benchmark datasets** with **pre-retrieved documents**, covering diverse domains such as **question answering, dialogue, entity linking, and fact verification**.  

- **Diverse Retriever Integration**: Supports **7 retrieval techniques**, including **BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever**, providing flexibility for various retrieval strategies.  

- **Advanced Re-ranking Models**: Implements **24 primary re-ranking models** with **41 sub-methods**, covering **pointwise, pairwise, and listwise** re-ranking approaches for enhanced ranking performance.  

- **Prebuilt Retrieval Indices**: Provides **precomputed Wikipedia and MS MARCO corpora** for multiple retrieval models, eliminating indexing overhead and accelerating experiments.  

- **Seamless RAG Integration**: Bridges retrieval and generative models (e.g., **GPT, LLAMA, T5**), enabling retrieval-augmented generation with **zero-shot**, **Fusion-in-Decoder (FiD)**, and **in-context learning** strategies.  

- **Modular & Extensible Design**: Easily integrates custom datasets, retrievers, re-rankers, and generation models using Rankify's structured Python API.  

- **Comprehensive Evaluation Suite**: Offers **automated performance evaluation** with **retrieval, ranking, and RAG metrics**, ensuring reproducible benchmarking.  

- **User-Friendly Documentation**: Detailed **[📖 online documentation](http://rankify.readthedocs.io/)**, example notebooks, and tutorials for easy adoption.  

## 🔍 Roadmap  

**Rankify** is still under development, and this is our first release (**v0.1.0**). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.  

### 🚀 Planned Improvements  

- **Retrievers**  
  - [x] Support for **BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever**  
  - [ ] Add missing retrievers: **Spar, MSS, MSS-DPR**  
  - [ ] Enable **custom index loading** and support for user-defined retrieval corpora  

- **Re-Rankers**  
  - [x] 24 primary re-ranking models with 41 sub-methods  
  - [ ] Expand the list by adding **more advanced ranking models** 

- **Datasets**  
  - [x] 40 benchmark datasets for retrieval, ranking, and RAG  
  - [ ] Add **more datasets**  
  - [ ] Support for **custom dataset integration**  

- **Retrieval-Augmented Generation (RAG)**  
  - [x] Integration with **GPT, LLAMA, and T5**  
  - [ ] Extend support for **more generative models**   

- **Evaluation & Usability**  
  - [x] Standard retrieval and ranking evaluation metrics (Top-K, EM, Recall, ...)
  - [ ] Add **advanced evaluation metrics** (NDCG, MAP for retrievers)  

- **Pipeline Integration**  
  - [ ] **Add a pipeline module** for streamlined retrieval, re-ranking, and RAG workflows 

## 🔧 Installation  

#### Set up the virtual environment
First, create and activate a conda environment with Python 3.10:

```bash
conda create -n rankify python=3.10
conda activate rankify
```
#### Install PyTorch 2.5.1
We recommend using Rankify with PyTorch 2.5.1. Refer to the [PyTorch installation page](https://pytorch.org/get-started/previous-versions/) for platform-specific installation commands. 

If you have access to GPUs, we recommend installing a CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.

To install PyTorch 2.5.1 with CUDA 12.4, run:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
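
After installing, it can be useful to confirm which PyTorch build is actually active in the environment. The following is a minimal sketch, not part of Rankify: the helper name `describe_torch` is made up for this example, and it only reads standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
def describe_torch():
    """Summarize the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    cuda = torch.version.cuda or "none (CPU-only build)"
    return (
        f"torch {torch.__version__}, built for CUDA {cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(describe_torch())
```

If the reported version is not 2.5.1, or no GPU is detected on a GPU machine, reinstalling with the command above is a reasonable first step.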


#### Basic Installation

To install **Rankify**, simply use **pip** (requires Python 3.10+):  
```bash
pip install rankify
```

Or, to install from **GitHub** for the latest development version:  

```bash
git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .
# For full functionality we recommend installing Rankify with all dependencies:
pip install -e ".[all]"
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install -e ".[retriever]"
# Install dependencies for base re-ranking only (excluding vLLM)
pip install -e ".[base]"
# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install -e ".[reranking]"
# Install dependencies for retrieval-augmented generation (RAG)
pip install -e ".[rag]"
```
This will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).  


#### Recommended Installation  

For full functionality, we **recommend installing Rankify with all dependencies**:
```bash
pip install "rankify[all]"
```
This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.

#### Optional Dependencies

If you prefer to install only specific components, choose from the following:
```bash
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install "rankify[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM)
pip install "rankify[base]"

# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install "rankify[reranking]"

# Install dependencies for retrieval-augmented generation (RAG)
pip install "rankify[rag]"
```
#### Using ColBERT Retriever  

If you want to use **ColBERT Retriever**, follow these additional setup steps:
```bash
# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng
```
```bash
# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
```

---

## 🚀 Quick Start

### **1️⃣. Pre-retrieved Datasets**  

We provide **1,000 pre-retrieved documents per dataset**, which you can download from:  

🔗 **[Hugging Face Dataset Repository](https://huggingface.co/datasets/abdoelsayed/reranking-datasets-light)**  

#### **Dataset Format**  

The pre-retrieved documents are structured as follows:
```json
[
    {
        "question": "...",
        "answers": ["...", "...", ...],
        "ctxs": [
            {
                "id": "...",         // Passage ID from database TSV file
                "score": "...",      // Retriever score
                "has_answer": true|false  // Whether the passage contains the answer
            }
        ]
    }
]
```
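
As a rough illustration of how this structure can be consumed without Rankify, the sketch below parses a small in-memory sample of the same shape and reports, for each question, the rank of the first passage flagged `has_answer`. The sample records and the helper name `first_hit_rank` are made up for the example; the real files contain many more passages per question.

```python
import json

# Made-up sample in the shape described above (not real dataset content).
SAMPLE = """
[
  {
    "question": "who wrote hamlet?",
    "answers": ["Shakespeare"],
    "ctxs": [
      {"id": "101", "score": "12.3", "has_answer": false},
      {"id": "202", "score": "11.8", "has_answer": true}
    ]
  },
  {
    "question": "capital of france?",
    "answers": ["Paris"],
    "ctxs": [
      {"id": "303", "score": "9.1", "has_answer": false}
    ]
  }
]
"""


def first_hit_rank(entry):
    """Return the 1-based rank of the first passage with has_answer, or None."""
    for rank, ctx in enumerate(entry["ctxs"], start=1):
        if ctx["has_answer"]:
            return rank
    return None


records = json.loads(SAMPLE)
ranks = [first_hit_rank(entry) for entry in records]
for entry, rank in zip(records, ranks):
    print(entry["question"], "->", rank)
```

A rank of `None` means that none of the retrieved passages for that question contains an answer.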


#### **Access Datasets in Rankify**  

You can **easily download and use pre-retrieved datasets** through **Rankify**.  

#### **List Available Datasets**  

To see all available datasets:
```python
from rankify.dataset.dataset import Dataset 

# Display available datasets
Dataset.avaiable_dataset()
```

**BM25 Retriever**
```python
from rankify.dataset.dataset import Dataset
# Download BM25-retrieved documents for nq-dev
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bm25", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for archivialqa-test
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-test", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for chroniclingamericaqa-test
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-test", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for chroniclingamericaqa-dev
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for entityquestions-test
dataset = Dataset(retriever="bm25", dataset_name="entityquestions-test", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for ambig_qa-dev
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for ambig_qa-train
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for arc-test
dataset = Dataset(retriever="bm25", dataset_name="arc-test", n_docs=100)
documents = dataset.download(force_download=False)
# Download BM25-retrieved documents for arc-dev
dataset = Dataset(retriever="bm25", dataset_name="arc-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**BGE Retriever**
```python
from rankify.dataset.dataset import Dataset
# Download BGE-retrieved documents for nq-dev
dataset = Dataset(retriever="bge", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BGE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bge", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download BGE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bge", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**ColBERT Retriever**


```python
from rankify.dataset.dataset import Dataset
# Download ColBERT-retrieved documents for nq-dev
dataset = Dataset(retriever="colbert", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download ColBERT-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="colbert", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download ColBERT-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="colbert", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**MSS-DPR Retriever**


```python
from rankify.dataset.dataset import Dataset
# Download MSS-DPR-retrieved documents for nq-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-DPR-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss-dpr", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-DPR-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**MSS Retriever**

```python
from rankify.dataset.dataset import Dataset
# Download MSS-retrieved documents for nq-dev
dataset = Dataset(retriever="mss", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**Contriever Retriever**

```python
from rankify.dataset.dataset import Dataset
# Download Contriever-retrieved documents for nq-dev
dataset = Dataset(retriever="contriever", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download Contriever-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="contriever", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download Contriever-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="contriever", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```


**ANCE Retriever**

```python
from rankify.dataset.dataset import Dataset
# Download ANCE-retrieved documents for nq-dev
dataset = Dataset(retriever="ance", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download ANCE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="ance", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)
# Download ANCE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="ance", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

**Load Pre-retrieved Dataset from File**  

If you have already downloaded a dataset, you can load it directly:
```python
from rankify.dataset.dataset import Dataset

# Load pre-downloaded BM25 dataset for WebQuestions
documents = Dataset.load_dataset('./tests/out-datasets/bm25/web_questions/test.json', 100)
```
Now, you can integrate **retrieved documents** with **re-ranking** and **RAG** workflows! 🚀  

---

### 2️⃣. Running Retrieval
To perform retrieval using **Rankify**, you can choose from various retrieval methods such as **BM25, DPR, ANCE, Contriever, ColBERT, and BGE**.  

**Example: Running Retrieval on Sample Queries**  
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever

# Sample Documents
documents = [
    Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
            "Jai Courtney",
            "Sebastian Koch",
            "Radivoje Bukvić",
            "Yuliya Snigir",
            "Sergei Kolesnikov",
            "Mary Elizabeth Winstead",
            "Bruce Willis"
        ]), contexts=[]),
    Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]
```

```python
# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")


# DPR (multi-encoder) retrieval on Wikipedia
dpr_multi_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_multi_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")

# DPR (single-encoder) retrieval on Wikipedia
dpr_single_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_single_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")

# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")


# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")


# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")


# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")
```

**Running Retrieval**

After defining the retriever, you can retrieve documents using:
```python
retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
```
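
To give a feel for what the `bm25` method computes, here is a self-contained sketch of Okapi BM25 scoring over a tiny in-memory corpus. It is not Rankify's implementation, which works against prebuilt Wikipedia and MS MARCO indices. The tokenizer, the three-sentence corpus, the function names, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,?!") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bm25_rank(query, corpus, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


corpus = [
    "Hamlet is a tragedy written by William Shakespeare.",
    "A Good Day to Die Hard is an action film starring Bruce Willis.",
    "Paris is the capital of France.",
]
ranking = bm25_rank("Who wrote Hamlet?", corpus)
print(ranking[0][0])  # index of the best-matching passage
```

Only the first passage shares a term with the query ("hamlet"), so it is the only one with a non-zero score. Dense retrievers such as DPR or Contriever replace this lexical scoring with vector similarity.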

---
### 3️⃣. Running Reranking
Rankify provides support for multiple reranking models. Below are examples of how to use each model.  

**Example: Reranking a Document**  
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f"  - {context.text}")
```
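
Conceptually, a pointwise reranker scores each passage against the question on its own and then sorts the passages by that score. The sketch below illustrates the idea with a deliberately naive token-overlap scorer in place of a neural model such as MonoT5. The `ToyContext` class and the `overlap_score` and `rerank` helpers are hypothetical names for this example and are not part of Rankify.

```python
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Stand-in for a retrieved passage (hypothetical, not Rankify's Context)."""
    id: int
    text: str


def overlap_score(question, passage):
    """Count distinct question tokens that also appear in the passage."""
    q_tokens = set(question.lower().replace("?", "").split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens)


def rerank(question, contexts):
    """Pointwise reranking: score each passage alone, then sort by score."""
    return sorted(contexts, key=lambda c: overlap_score(question, c.text), reverse=True)


question = "When did Thomas Edison invent the light bulb?"
contexts = [
    ToyContext(1, "Lightning strike at Seoul National University"),
    ToyContext(2, "Thomas Edison tried to invent a device for cars but failed"),
    ToyContext(3, "Coffee is good for diet"),
    ToyContext(4, "Thomas Edison invented the light bulb in 1879"),
    ToyContext(5, "Thomas Edison worked with electricity"),
]
reordered = rerank(question, contexts)
print([c.id for c in reordered])
```

With this toy scorer, passage 4 moves to the top because it shares the most tokens with the question. Pairwise and listwise rerankers differ in that they compare passages with each other rather than scoring each one in isolation.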


**Examples of Using Different Reranking Models**  
```python
# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')
```
---

### 4️⃣. Using Generator Module
Rankify provides a **Generator Module** to facilitate **retrieval-augmented generation (RAG)** by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.  

```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]
```
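
In-context retrieval-augmented generation generally works by placing retrieved passages ahead of the question in the prompt sent to the language model. The sketch below shows one plausible way to assemble such a prompt. The template, the `build_prompt` helper, the dictionary layout, and the `top_k` parameter are assumptions for illustration and are not taken from Rankify's generator.

```python
def build_prompt(question, contexts, top_k=2):
    """Place the top_k highest-scoring passages ahead of the question."""
    best = sorted(contexts, key=lambda c: c["score"], reverse=True)[:top_k]
    passages = "\n".join(
        f"[{i}] {c['title']}: {c['text']}" for i, c in enumerate(best, start=1)
    )
    return f"{passages}\n\nQuestion: {question}\nAnswer:"


contexts = [
    {"title": "Germany", "text": "Berlin is the capital of Germany.", "score": 0.5},
    {"title": "France", "text": "The capital of France is Paris.", "score": 0.9},
]
prompt = build_prompt("What is the capital of France?", contexts, top_k=1)
print(prompt)
```

Limiting the prompt to the best-scoring passages keeps it short, which is one reason re-ranking before generation can matter.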

---
### 5️⃣. Evaluating with Metrics  

Rankify provides built-in **evaluation metrics** for **retrieval, re-ranking, and retrieval-augmented generation (RAG)**. These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.  

**Evaluating Generated Answers**  

You can evaluate the quality of **retrieval-augmented generation (RAG) results** by comparing generated answers with ground-truth answers.
```python
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator
from rankify.metrics.metrics import Metrics

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
```
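
For intuition about what an exact-match (EM) style generation metric measures, the following sketch compares normalized predictions against gold answers. The normalization steps (lowercasing, removing punctuation and English articles) follow a common question-answering convention and are an assumption here. Rankify's own metric code may differ, and the function names are made up for this example.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return True if the normalized prediction equals any normalized gold answer."""
    return normalize(prediction) in {normalize(ans) for ans in gold_answers}


predictions = ["Paris.", "William Shakespeare"]
gold = [["Paris"], ["Shakespeare"]]
em = sum(exact_match(p, g) for p, g in zip(predictions, gold)) / len(predictions)
print(em)
```

The second prediction is not counted as a match because exact match requires the whole normalized string to agree. Softer metrics, such as token-level F1, would give it partial credit.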

**Evaluating Retrieval Performance**  

```python
# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)

print(before_ranking_metrics)
```

**Evaluating Reranked Results**  
```python
# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
```
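
The before/after comparison can be understood through top-k accuracy: the fraction of questions for which at least one answer-bearing passage appears among the first k results. Below is a minimal sketch under that definition, using made-up `has_answer` flags. The function name `top_k_accuracy` is hypothetical, and this is not Rankify's implementation.

```python
def top_k_accuracy(has_answer_lists, k):
    """Fraction of questions with an answer-bearing passage in the first k."""
    hits = sum(any(flags[:k]) for flags in has_answer_lists)
    return hits / len(has_answer_lists)


# One list of has_answer flags per question, in ranked order (made-up data).
before = [[False, False, True], [False, True, False], [False, False, False]]
after = [[True, False, False], [True, False, False], [False, False, False]]

for k in (1, 3):
    print(k, top_k_accuracy(before, k), top_k_accuracy(after, k))
```

In this toy data, reranking raises top-1 accuracy but leaves top-3 accuracy unchanged: reordering can only move passages that were already retrieved, so it cannot help the third question.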

## 📜 Supported Models


### **1️⃣ Retrievers**  
- ✅ **BM25**
- ✅ **DPR** 
- ✅ **ColBERT**   
- ✅ **ANCE**
- ✅ **BGE** 
- ✅ **Contriever** 
- ✅ **BPR** 
---

### **2️⃣ Rerankers**  

- ✅ **Cross-Encoders** 
- ✅ **RankGPT**
- ✅ **RankGPT-API** 
- ✅ **MonoT5**
- ✅ **MonoBERT**
- ✅ **RankT5** 
- ✅ **ListT5** 
- ✅ **LiT5Score**
- ✅ **LiT5Distill**
- ✅ **Vicuna Reranker**
- ✅ **Zephyr Reranker**
- ✅ **Sentence Transformer-based** 
- ✅ **FlashRank Models**  
- ✅ **API-Based Rerankers**  
- ✅ **ColBERT Reranker**
- ✅ **LLM Layerwise Ranker** 
- ✅ **SPLADE Reranker**
- ✅ **UPR Reranker**
- ✅ **InRanker Reranker**
- ✅ **Transformer Reranker**
- ✅ **FIRST Reranker**
- ✅ **Blender Reranker**
- ✅ **LLM2Vec Reranker**
- ✅ **EchoRank Reranker**
- ✅ **InContext Reranker**
---

### **3️⃣ Generators**  
- ✅ **Fusion-in-Decoder (FiD) with T5**
- ✅ **In-Context Learning RALM** 
---

## 📖 Documentation

For full API documentation, visit the [Rankify Docs](http://rankify.readthedocs.io/).

---

## 💡 Contributing


Follow these steps to get involved:

1. **Fork this repository** to your GitHub account.

2. **Create a new branch** for your feature or fix:

   ```bash
   git checkout -b feature/YourFeatureName
   ```

3. **Make your changes** and **commit them**:

   ```bash
   git commit -m "Add YourFeatureName"
   ```

4. **Push the changes** to your branch:

   ```bash
   git push origin feature/YourFeatureName
   ```

5. **Submit a Pull Request** to propose your changes.

Thank you for helping make this project better!

---


## 🔖 License

Rankify is licensed under the Apache-2.0 License - see the [LICENSE](https://opensource.org/license/apache-2-0) file for details.

## 🌟 Citation

Please cite our paper if it helps your research:

```bibtex
@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "rankify",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Abdelrahman Abdallah <abdoelsayed2016@gmail.com>",
    "keywords": "retrieval, re-ranking, RAG, nlp, search",
    "author": null,
    "author_email": "Abdelrahman Abdallah <abdoelsayed2016@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/c9/9c/af381a1cccc7ed629ef58074583fc28d32a0ff20f0e8309ef3ac966e644a/rankify-0.1.0.post4.tar.gz",
    "platform": null,
    "description": "\n\n\n### <div align=\"center\">\ud83d\udd25 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation \ud83d\udd25<div>\n\n\n<div align=\"center\">\n<a href=\"https://arxiv.org/abs/2502.02464\" target=\"_blank\"><img src=https://img.shields.io/badge/arXiv-b5212f.svg?logo=arxiv></a>\n<a href=\"https://huggingface.co/datasets/abdoelsayed/reranking-datasets\" target=\"_blank\"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace%20Datasets-27b3b4.svg></a>\n<a href=\"https://huggingface.co/datasets/abdoelsayed/reranking-datasets-light\" target=\"_blank\"><img src=\"https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace%20Datasets%20light-orange.svg\"></a>\n<a><img alt=\"Static Badge\" src=\"https://img.shields.io/badge/Python-3.10_3.11-blue\"></a>\n<a href=\"https://opensource.org/license/apache-2-0\"><img src=\"https://img.shields.io/static/v1?label=License&message=Apache-2.0&color=red\"></a>\n <a href=\"https://pepy.tech/projects/rankify\"><img src=\"https://static.pepy.tech/badge/rankify\" alt=\"PyPI Downloads\"></a>\n<a href=\"https://github.com/DataScienceUIBK/rankify/releases\"><img alt=\"GitHub release\" src=\"https://img.shields.io/github/release/DataScienceUIBK/rankify.svg?label=Version&color=orange\"></a>    \n</div>\n\n\n_A modular and efficient retrieval, reranking  and RAG  framework designed to work with state-of-the-art models for retrieval, ranking and rag tasks._\n\n_Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. 
Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field._\n\n\n\n## \u2728 Features\n\n- **Comprehensive Retrieval & Reranking Framework**: Rankify unifies retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular Python toolkit, enabling seamless experimentation and benchmarking.  \n\n- **Extensive Dataset Support**: Includes **40 benchmark datasets** with **pre-retrieved documents**, covering diverse domains such as **question answering, dialogue, entity linking, and fact verification**.  \n\n- **Diverse Retriever Integration**: Supports **7 retrieval techniques**, including **BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever**, providing flexibility for various retrieval strategies.  \n\n- **Advanced Re-ranking Models**: Implements **24 primary re-ranking models** with **41 sub-methods**, covering **pointwise, pairwise, and listwise** re-ranking approaches for enhanced ranking performance.  \n\n- **Prebuilt Retrieval Indices**: Provides **precomputed Wikipedia and MS MARCO corpora** for multiple retrieval models, eliminating indexing overhead and accelerating experiments.  \n\n- **Seamless RAG Integration**: Bridges retrieval and generative models (e.g., **GPT, LLAMA, T5**), enabling retrieval-augmented generation with **zero-shot**, **Fusion-in-Decoder (FiD)**, and **in-context learning** strategies.  \n\n- **Modular & Extensible Design**: Easily integrates custom datasets, retrievers, re-rankers, and generation models using Rankify\u2019s structured Python API.  \n\n- **Comprehensive Evaluation Suite**: Offers **automated performance evaluation** with **retrieval, ranking, and RAG metrics**, ensuring reproducible benchmarking.  \n\n- **User-Friendly Documentation**: Detailed **[\ud83d\udcd6 online documentation](http://rankify.readthedocs.io/)**, example notebooks, and tutorials for easy adoption.  
\n\n## \ud83d\udd0d Roadmap  \n\n**Rankify** is still under development, and this is our first release (**v0.1.0**). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.  \n\n### \ud83d\ude80 Planned Improvements  \n\n- **Retrievers**  \n  - [x] Support for **BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever**  \n  - [ ] Add missing retrievers: **Spar, MSS, MSS-DPR**  \n  - [ ] Enable **custom index loading** and support for user-defined retrieval corpora  \n\n- **Re-Rankers**  \n  - [x] 24 primary re-ranking models with 41 sub-methods  \n  - [ ] Expand the list by adding **more advanced ranking models** \n\n- **Datasets**  \n  - [x] 40 benchmark datasets for retrieval, ranking, and RAG  \n  - [ ] Add **more datasets**  \n  - [ ] Support for **custom dataset integration**  \n\n- **Retrieval-Augmented Generation (RAG)**  \n  - [x] Integration with **GPT, LLAMA, and T5**  \n  - [ ] Extend support for **more generative models**   \n\n- **Evaluation & Usability**  \n  - [x] Standard retrieval and ranking evaluation metrics (Top-K, EM, Recall, ...)\n  - [ ] Add **advanced evaluation metrics** (NDCG, MAP for retrievers)  \n\n- **Pipeline Integration**  \n  - [ ] **Add a pipeline module** for streamlined retrieval, re-ranking, and RAG workflows \n\n## \ud83d\udd27 Installation  \n\n#### Set up the virtual environment\nFirst, create and activate a conda environment with Python 3.10:\n\n```bash\nconda create -n rankify python=3.10\nconda activate rankify\n```\n#### Install PyTorch 2.5.1\nWe recommend using Rankify with PyTorch 2.5.1. Refer to the [PyTorch installation page](https://pytorch.org/get-started/previous-versions/) for platform-specific installation commands. 
\n\nIf you have access to GPUs, we recommend installing a CUDA build of PyTorch (CUDA 12.4 or 12.6), as many of the evaluation metrics are optimized for GPU use.\n\nTo install PyTorch 2.5.1 built for CUDA 12.4, run:\n```bash\npip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124\n```\n\n\n#### Basic Installation\n\nTo install **Rankify**, simply use **pip** (requires Python 3.10+):  \n```bash\npip install rankify\n```\n\nOr, to install from **GitHub** for the latest development version:  \n\n```bash\ngit clone https://github.com/DataScienceUIBK/rankify.git\ncd rankify\npip install -e .\n# For full functionality we recommend installing Rankify with all dependencies:\npip install -e \".[all]\"\n# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)\npip install -e \".[retriever]\"\n# Install dependencies for base re-ranking only (excluding vLLM)\npip install -e \".[base]\"\n# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.\npip install -e \".[reranking]\"\n# Install dependencies for retrieval-augmented generation (RAG)\npip install -e \".[rag]\"\n```\nThis will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).  
\n\n\n#### Recommended Installation  \n\nFor full functionality, we **recommend installing Rankify with all dependencies**:\n```bash\npip install \"rankify[all]\"\n```\nThis ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.\n\n#### Optional Dependencies\n\nIf you prefer to install only specific components, choose from the following:\n```bash\n# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)\npip install \"rankify[retriever]\"\n\n# Install dependencies for base re-ranking only (excluding vLLM)\npip install \"rankify[base]\"\n\n# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.\npip install \"rankify[reranking]\"\n\n# Install dependencies for retrieval-augmented generation (RAG)\npip install \"rankify[rag]\"\n```\n#### Using ColBERT Retriever  \n\nTo use the **ColBERT retriever**, follow these additional setup steps:\n```bash\n# Install GCC and required libraries\nconda install -c conda-forge gcc=9.4.0 gxx=9.4.0\nconda install -c conda-forge libstdcxx-ng\n```\n```bash\n# Export necessary environment variables\nexport LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH\nexport CC=gcc\nexport CXX=g++\nexport PATH=$CONDA_PREFIX/bin:$PATH\n\n# Clear cached torch extensions\nrm -rf ~/.cache/torch_extensions/*\n```\n\n---\n\n## \ud83d\ude80 Quick Start\n\n### **1\ufe0f\u20e3. 
Pre-retrieved Datasets**  \n\nWe provide **1,000 pre-retrieved documents per dataset**, which you can download from:  \n\n\ud83d\udd17 **[Hugging Face Dataset Repository](https://huggingface.co/datasets/abdoelsayed/reranking-datasets-light)**  \n\n#### **Dataset Format**  \n\nThe pre-retrieved documents are structured as follows:\n```json\n[\n    {\n        \"question\": \"...\",\n        \"answers\": [\"...\", \"...\", ...],\n        \"ctxs\": [\n            {\n                \"id\": \"...\",         // Passage ID from database TSV file\n                \"score\": \"...\",      // Retriever score\n                \"has_answer\": true|false  // Whether the passage contains the answer\n            }\n        ]\n    }\n]\n```\n\n\n#### **Access Datasets in Rankify**  \n\nYou can **easily download and use pre-retrieved datasets** through **Rankify**.  \n\n#### **List Available Datasets**  \n\nTo see all available datasets:\n```python\nfrom rankify.dataset.dataset import Dataset \n\n# Display available datasets\nDataset.avaiable_dataset()\n```\n\n**BM25 Retriever**\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download BM25-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for archivialqa-test\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"archivialqa-test\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for chroniclingamericaqa-test\ndataset = 
Dataset(retriever=\"bm25\", dataset_name=\"chroniclingamericaqa-test\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for chroniclingamericaqa-dev\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"chroniclingamericaqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for entityquestions-test\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"entityquestions-test\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for ambig_qa-dev\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"ambig_qa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for ambig_qa-train\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"ambig_qa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for arc-test\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"arc-test\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BM25-retrieved documents for arc-dev\ndataset = Dataset(retriever=\"bm25\", dataset_name=\"arc-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**BGE Retriever**\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download BGE-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"bge\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BGE-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"bge\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download BGE-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"bge\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**ColBERT Retriever**\n\n\n```python\nfrom 
rankify.dataset.dataset import Dataset\n# Download ColBERT-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"colbert\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download ColBERT-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"colbert\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download ColBERT-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"colbert\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**MSS-DPR Retriever**\n\n\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download MSS-DPR-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"mss-dpr\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download MSS-DPR-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"mss-dpr\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download MSS-DPR-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"mss-dpr\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**MSS Retriever**\n\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download MSS-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"mss\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download MSS-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"mss\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download MSS-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"mss\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**Contriever 
Retriever**\n\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download Contriever-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"contriever\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download Contriever-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"contriever\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download Contriever-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"contriever\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n\n**ANCE Retriever**\n\n```python\nfrom rankify.dataset.dataset import Dataset\n# Download ANCE-retrieved documents for nq-dev\ndataset = Dataset(retriever=\"ance\", dataset_name=\"nq-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download ANCE-retrieved documents for 2wikimultihopqa-train\ndataset = Dataset(retriever=\"ance\", dataset_name=\"2wikimultihopqa-train\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n# Download ANCE-retrieved documents for archivialqa-dev\ndataset = Dataset(retriever=\"ance\", dataset_name=\"archivialqa-dev\", n_docs=100)\ndocuments = dataset.download(force_download=False)\n```\n\n**Load Pre-retrieved Dataset from File**  \n\nIf you have already downloaded a dataset, you can load it directly:\n```python\nfrom rankify.dataset.dataset import Dataset\n\n# Load pre-downloaded BM25 dataset for WebQuestions\ndocuments = Dataset.load_dataset('./tests/out-datasets/bm25/web_questions/test.json', 100)\n```\nNow, you can integrate **retrieved documents** with **re-ranking** and **RAG** workflows! \ud83d\ude80  \n\n---\n\n### 2\ufe0f\u20e3. Running Retrieval\nTo perform retrieval using **Rankify**, you can choose from various retrieval methods such as **BM25, DPR, ANCE, Contriever, ColBERT, and BGE**.  
\n\n**Example: Running Retrieval on Sample Queries**  \n```python\nfrom rankify.dataset.dataset import Document, Question, Answer, Context\nfrom rankify.retrievers.retriever import Retriever\n\n# Sample Documents\ndocuments = [\n    Document(question=Question(\"the cast of a good day to die hard?\"), answers=Answer([\n            \"Jai Courtney\",\n            \"Sebastian Koch\",\n            \"Radivoje Bukvi\u0107\",\n            \"Yuliya Snigir\",\n            \"Sergei Kolesnikov\",\n            \"Mary Elizabeth Winstead\",\n            \"Bruce Willis\"\n        ]), contexts=[]),\n    Document(question=Question(\"Who wrote Hamlet?\"), answers=Answer([\"Shakespeare\"]), contexts=[])\n]\n```\n\n```python\n# BM25 retrieval on Wikipedia\nbm25_retriever_wiki = Retriever(method=\"bm25\", n_docs=5, index_type=\"wiki\")\n\n# BM25 retrieval on MS MARCO\nbm25_retriever_msmarco = Retriever(method=\"bm25\", n_docs=5, index_type=\"msmarco\")\n\n\n# DPR (multi-encoder) retrieval on Wikipedia\ndpr_retriever_wiki = Retriever(method=\"dpr\", model=\"dpr-multi\", n_docs=5, index_type=\"wiki\")\n\n# DPR (multi-encoder) retrieval on MS MARCO\ndpr_retriever_msmarco = Retriever(method=\"dpr\", model=\"dpr-multi\", n_docs=5, index_type=\"msmarco\")\n\n# DPR (single-encoder) retrieval on Wikipedia\ndpr_retriever_wiki = Retriever(method=\"dpr\", model=\"dpr-single\", n_docs=5, index_type=\"wiki\")\n\n# DPR (single-encoder) retrieval on MS MARCO\ndpr_retriever_msmarco = Retriever(method=\"dpr\", model=\"dpr-single\", n_docs=5, index_type=\"msmarco\")\n\n# ANCE retrieval on Wikipedia\nance_retriever_wiki = Retriever(method=\"ance\", model=\"ance-multi\", n_docs=5, index_type=\"wiki\")\n\n# ANCE retrieval on MS MARCO\nance_retriever_msmarco = Retriever(method=\"ance\", model=\"ance-multi\", n_docs=5, index_type=\"msmarco\")\n\n\n# Contriever retrieval on Wikipedia\ncontriever_retriever_wiki = Retriever(method=\"contriever\", model=\"facebook/contriever-msmarco\", n_docs=5, 
index_type=\"wiki\")\n\n# Contriever retrieval on MS MARCO\ncontriever_retriever_msmarco = Retriever(method=\"contriever\", model=\"facebook/contriever-msmarco\", n_docs=5, index_type=\"msmarco\")\n\n\n# ColBERT retrieval on Wikipedia\ncolbert_retriever_wiki = Retriever(method=\"colbert\", model=\"colbert-ir/colbertv2.0\", n_docs=5, index_type=\"wiki\")\n\n# ColBERT retrieval on MS MARCO\ncolbert_retriever_msmarco = Retriever(method=\"colbert\", model=\"colbert-ir/colbertv2.0\", n_docs=5, index_type=\"msmarco\")\n\n\n# BGE retrieval on Wikipedia\nbge_retriever_wiki = Retriever(method=\"bge\", model=\"BAAI/bge-large-en-v1.5\", n_docs=5, index_type=\"wiki\")\n\n# BGE retrieval on MS MARCO\nbge_retriever_msmarco = Retriever(method=\"bge\", model=\"BAAI/bge-large-en-v1.5\", n_docs=5, index_type=\"msmarco\")\n```\n\n**Running Retrieval**\n\nAfter defining the retriever, you can retrieve documents using:\n```python\nretrieved_documents = bm25_retriever_wiki.retrieve(documents)\n\nfor i, doc in enumerate(retrieved_documents):\n    print(f\"\\nDocument {i+1}:\")\n    print(doc)\n```\n\n---\n## 3\ufe0f\u20e3. Running Reranking\nRankify provides support for multiple reranking models. Below are examples of how to use each model.  
\n\n**Example: Reranking a Document**  \n```python\nfrom rankify.dataset.dataset import Document, Question, Answer, Context\nfrom rankify.models.reranking import Reranking\n\n# Sample document setup\nquestion = Question(\"When did Thomas Edison invent the light bulb?\")\nanswers = Answer([\"1879\"])\ncontexts = [\n    Context(text=\"Lightning strike at Seoul National University\", id=1),\n    Context(text=\"Thomas Edison tried to invent a device for cars but failed\", id=2),\n    Context(text=\"Coffee is good for diet\", id=3),\n    Context(text=\"Thomas Edison invented the light bulb in 1879\", id=4),\n    Context(text=\"Thomas Edison worked with electricity\", id=5),\n]\ndocument = Document(question=question, answers=answers, contexts=contexts)\n\n# Initialize the reranker\nreranker = Reranking(method=\"monot5\", model_name=\"monot5-base-msmarco\")\n\n# Apply reranking\nreranker.rank([document])\n\n# Print reordered contexts\nfor context in document.reorder_contexts:\n    print(f\"  - {context.text}\")\n```\n\n\n**Examples of Using Different Reranking Models**  \n```python\n# UPR\nmodel = Reranking(method='upr', model_name='t5-base')\n\n# API-Based Rerankers\nmodel = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')\nmodel = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')\nmodel = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')\n\n# Blender Reranker\nmodel = Reranking(method='blender_reranker', model_name='PairRM')\n\n# ColBERT Reranker\nmodel = Reranking(method='colbert_ranker', model_name='Colbert')\n\n# EchoRank\nmodel = Reranking(method='echorank', model_name='flan-t5-large')\n\n# First Ranker\nmodel = Reranking(method='first_ranker', model_name='base')\n\n# FlashRank\nmodel = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')\n\n# InContext Reranker\nmodel = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')\n\n# InRanker\nmodel = 
Reranking(method='inranker', model_name='inranker-small')\n\n# ListT5\nmodel = Reranking(method='listt5', model_name='listt5-base')\n\n# LiT5 Distill\nmodel = Reranking(method='lit5distill', model_name='LiT5-Distill-base')\n\n# LiT5 Score\nmodel = Reranking(method='lit5score', model_name='LiT5-Distill-base')\n\n# LLM Layerwise Ranker\nmodel = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')\n\n# LLM2Vec\nmodel = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')\n\n# MonoBERT\nmodel = Reranking(method='monobert', model_name='monobert-large')\n\n# MonoT5\nmodel = Reranking(method='monot5', model_name='monot5-base-msmarco')\n\n# RankGPT\nmodel = Reranking(method='rankgpt', model_name='llamav3.1-8b')\n\n# RankGPT API\nmodel = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key=\"gpt-api-key\")\nmodel = Reranking(method='rankgpt-api', model_name='gpt-4', api_key=\"gpt-api-key\")\nmodel = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key=\"together-api-key\")\nmodel = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key=\"claude-api-key\")\n\n# RankT5\nmodel = Reranking(method='rankt5', model_name='rankt5-base')\n\n# Sentence Transformer Reranker\nmodel = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')\nmodel = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')\nmodel = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')\nmodel = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')\nmodel = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')\n\n# SPLADE\nmodel = Reranking(method='splade', model_name='splade-cocondenser')\n\n# Transformer Ranker\nmodel = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')\nmodel = Reranking(method='transformer_ranker', model_name='bge-reranker-base')\nmodel = 
Reranking(method='transformer_ranker', model_name='bce-reranker-base')\nmodel = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')\nmodel = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')\nmodel = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')\nmodel = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')\nmodel = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')\n\n# TwoLAR\nmodel = Reranking(method='twolar', model_name='twolar-xl')\n\n# Vicuna Reranker\nmodel = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')\n\n# Zephyr Reranker\nmodel = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')\n```\n---\n\n## 4\ufe0f\u20e3. Using Generator Module\nRankify provides a **Generator Module** to facilitate **retrieval-augmented generation (RAG)** by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.  
\n\n```python\nfrom rankify.dataset.dataset import Document, Question, Answer, Context\nfrom rankify.generator.generator import Generator\n\n# Define question and answer\nquestion = Question(\"What is the capital of France?\")\nanswers = Answer([\"Paris\"])\ncontexts = [\n    Context(id=1, title=\"France\", text=\"The capital of France is Paris.\", score=0.9),\n    Context(id=2, title=\"Germany\", text=\"Berlin is the capital of Germany.\", score=0.5)\n]\n\n# Construct document\ndoc = Document(question=question, answers=answers, contexts=contexts)\n\n# Initialize Generator (e.g., Meta Llama)\ngenerator = Generator(method=\"in-context-ralm\", model_name='meta-llama/Llama-3.1-8B')\n\n# Generate answer\ngenerated_answers = generator.generate([doc])\nprint(generated_answers)  # Output: [\"Paris\"]\n```\n\n---\n## 5\ufe0f\u20e3 Evaluating with Metrics  \n\nRankify provides built-in **evaluation metrics** for **retrieval, re-ranking, and retrieval-augmented generation (RAG)**. These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.  
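To make these metric families concrete, here is a minimal, library-independent sketch of top-k retrieval accuracy and exact match. It does not call Rankify and is not its implementation: the helper names `normalize`, `exact_match`, and `top_k_accuracy` are hypothetical, and the records only borrow the `ctxs` / `has_answer` layout from the dataset format shown earlier.

```python
import string

ARTICLES = {'a', 'an', 'the'}


def normalize(text):
    # Lowercase, strip punctuation, drop English articles, collapse whitespace.
    text = ''.join(ch for ch in text.lower() if ch not in string.punctuation)
    return ' '.join(tok for tok in text.split() if tok not in ARTICLES)


def exact_match(prediction, gold_answers):
    # 1.0 if the normalized prediction equals any normalized gold answer.
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def top_k_accuracy(records, k):
    # Fraction of questions with an answer-bearing passage among the top k.
    if not records:
        return 0.0
    hits = sum(
        any(ctx['has_answer'] for ctx in rec['ctxs'][:k]) for rec in records
    )
    return hits / len(records)


# Hypothetical toy records; real files carry ids, scores, and answers as well.
records = [
    {'question': 'q1', 'ctxs': [{'has_answer': False}, {'has_answer': True}]},
    {'question': 'q2', 'ctxs': [{'has_answer': False}, {'has_answer': False}]},
]
print(top_k_accuracy(records, k=1))          # 0.0
print(top_k_accuracy(records, k=2))          # 0.5
print(exact_match('The Paris.', ['Paris']))  # 1.0
```

Comparing top-k accuracy on the original order against the reranked order is the same before/after comparison that the `use_reordered` flag expresses in the examples that follow.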
\n\n**Evaluating Generated Answers**  \n\nYou can evaluate the quality of **retrieval-augmented generation (RAG) results** by comparing generated answers with ground-truth answers.\n```python\nfrom rankify.metrics.metrics import Metrics\nfrom rankify.dataset.dataset import Dataset\nfrom rankify.generator.generator import Generator\n\n# Load dataset\ndataset = Dataset('bm25', 'nq-test', 100)\ndocuments = dataset.download(force_download=False)\n\n# Initialize Generator\ngenerator = Generator(method=\"in-context-ralm\", model_name='meta-llama/Llama-3.1-8B')\n\n# Generate answers\ngenerated_answers = generator.generate(documents)\n\n# Evaluate generated answers\nmetrics = Metrics(documents)\nprint(metrics.calculate_generation_metrics(generated_answers))\n```\n\n**Evaluating Retrieval Performance**  \n\n```python\n# Calculate retrieval metrics before reranking\nmetrics = Metrics(documents)\nbefore_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)\n\nprint(before_ranking_metrics)\n```\n\n**Evaluating Reranked Results**  \n```python\n# Calculate retrieval metrics after reranking\nafter_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)\nprint(after_ranking_metrics)\n```\n\n## \ud83d\udcdc Supported Models\n\n\n### **1\ufe0f\u20e3 Retrievers**  \n- \u2705 **BM25**\n- \u2705 **DPR** \n- \u2705 **ColBERT**   \n- \u2705 **ANCE**\n- \u2705 **BGE** \n- \u2705 **Contriever** \n- \u2705 **BPR** \n---\n\n### **2\ufe0f\u20e3 Rerankers**  \n\n- \u2705 **Cross-Encoders** \n- \u2705 **RankGPT**\n- \u2705 **RankGPT-API** \n- \u2705 **MonoT5**\n- \u2705 **MonoBert**\n- \u2705 **RankT5** \n- \u2705 **ListT5** \n- \u2705 **LiT5Score**\n- \u2705 **LiT5Dist**\n- \u2705 **Vicuna Reranker**\n- \u2705 **Zephyr Reranker**\n- \u2705 **Sentence Transformer-based** \n- \u2705 **FlashRank Models**  \n- \u2705 **API-Based Rerankers**  \n- \u2705 **ColBERT Reranker**\n- \u2705 **LLM Layerwise Ranker** \n- \u2705 **Splade Reranker**\n- \u2705 
**UPR Reranker**\n- \u2705 **Inranker Reranker**\n- \u2705 **Transformer Reranker**\n- \u2705 **FIRST Reranker**\n- \u2705 **Blender Reranker**\n- \u2705 **LLM2VEC Reranker**\n- \u2705 **ECHO Reranker**\n- \u2705 **Incontext Reranker**\n---\n\n### **3\ufe0f\u20e3 Generators**  \n- \u2705 **Fusion-in-Decoder (FiD) with T5**\n- \u2705 **In-Context Learning RALM** \n---\n\n## \ud83d\udcd6 Documentation\n\nFor full API documentation, visit the [Rankify Docs](http://rankify.readthedocs.io/).\n\n---\n\n## \ud83d\udca1 Contributing\n\n\nFollow these steps to get involved:\n\n1. **Fork this repository** to your GitHub account.\n\n2. **Create a new branch** for your feature or fix:\n\n   ```bash\n   git checkout -b feature/YourFeatureName\n   ```\n\n3. **Make your changes** and **commit them**:\n\n   ```bash\n   git commit -m \"Add YourFeatureName\"\n   ```\n\n4. **Push the changes** to your branch:\n\n   ```bash\n   git push origin feature/YourFeatureName\n   ```\n\n5. **Submit a Pull Request** to propose your changes.\n\nThank you for helping make this project better!\n\n---\n\n\n## \ud83d\udd16 License\n\nRankify is licensed under the Apache-2.0 License - see the [LICENSE](https://opensource.org/license/apache-2-0) file for details.\n\n## \ud83c\udf1f Citation\n\nPlease cite our paper if it helps your research:\n\n```BibTex\n@article{abdallah2025rankify,\n  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},\n  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},\n  journal={arXiv preprint arXiv:2502.02464},\n  year={2025}\n}\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation",
    "version": "0.1.0.post4",
    "project_urls": {
        "Documentation": "http://rankify.readthedocs.io/",
        "Homepage": "https://github.com/DataScienceUIBK/rankify",
        "Hugging Face Dataset": "https://huggingface.co/datasets/abdoelsayed/reranking-datasets",
        "Issues": "https://github.com/DataScienceUIBK/rankify/issues",
        "PyPI": "https://pypi.org/project/rankify/",
        "Source Code": "https://github.com/DataScienceUIBK/rankify"
    },
    "split_keywords": [
        "retrieval",
        " re-ranking",
        " rag",
        " nlp",
        " search"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1d24fe246462ed80d843bd66bf3ab1c98cb3dd9f129c533f783f409bbb4b9f34",
                "md5": "85089c7ab08f5face67e7c8c354ccaad",
                "sha256": "1379cf21b23b1c701403f946ee2fa778f7036c64e2bfd8fa0823441b46be79c3"
            },
            "downloads": -1,
            "filename": "rankify-0.1.0.post4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "85089c7ab08f5face67e7c8c354ccaad",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 424123,
            "upload_time": "2025-02-10T03:28:58",
            "upload_time_iso_8601": "2025-02-10T03:28:58.052513Z",
            "url": "https://files.pythonhosted.org/packages/1d/24/fe246462ed80d843bd66bf3ab1c98cb3dd9f129c533f783f409bbb4b9f34/rankify-0.1.0.post4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c99caf381a1cccc7ed629ef58074583fc28d32a0ff20f0e8309ef3ac966e644a",
                "md5": "e7621fd7d2516f8e9f7c7cf446233424",
                "sha256": "d6bcec711a1244ff3fdf5c0c845ed3fe2ec9f585ff08511e88f13637a838c3cb"
            },
            "downloads": -1,
            "filename": "rankify-0.1.0.post4.tar.gz",
            "has_sig": false,
            "md5_digest": "e7621fd7d2516f8e9f7c7cf446233424",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 319770,
            "upload_time": "2025-02-10T03:29:01",
            "upload_time_iso_8601": "2025-02-10T03:29:01.520643Z",
            "url": "https://files.pythonhosted.org/packages/c9/9c/af381a1cccc7ed629ef58074583fc28d32a0ff20f0e8309ef3ac966e644a/rankify-0.1.0.post4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-10 03:29:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DataScienceUIBK",
    "github_project": "rankify",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rankify"
}
        