retri-evals

Name: retri-evals
Version: 0.0.1
Home page: https://github.com/DeployQL/retri-evals
Summary: Open-source tool for building and evaluating retrieval pipelines.
Upload time: 2024-01-04 23:09:31
Author: Matt Barta
Requires Python: >=3.8,<3.12
License: AGPL-3.0
Keywords: data-science, artificial-intelligence, developers-tools, mlops, rag, retrieval
Requirements: none recorded
# Retrieval Evaluation Pipelines <sup>alpha</sup>
### RAG evaluation framework for faster iteration


## About retri-eval
Evaluating all of the components of a RAG pipeline is challenging. We didn't find a
great existing solution that
1. was flexible enough to fit on top of our document and query processing.
2. gave us confidence in scaling the database up without increasing latency or costs.
3. encouraged reuse of components.

retri-eval aims to be unopinionated enough that you can reuse any existing pipelines you have.

## Built With
- MTEB
- BEIR
- Pydantic

## Getting Started
### Installation
```bash
pip install retri-eval
```
### Define your data type
We use Pydantic to make sure that the index receives the expected data.

To use MTEB and BEIR datasets, retri-eval expects your data to provide a `doc_id` field.
This field is set inside our retriever and is how BEIR matches your results to relevance judgments.

Below, we create a `QdrantDocument` that specifically indexes text alongside the embedding.
```python
from typing import List

class QdrantDocument(MTEBDocument):  # MTEBDocument is retri-eval's Pydantic base model
    id: str        # unique id for this chunk
    doc_id: str    # source document id; BEIR scores results against this field
    embedding: List[float]
    text: str
```
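Pydantic validates fields at construction time, so malformed documents fail loudly before they
reach the index. A minimal sketch, assuming `MTEBDocument` adds no further required fields
(the field values here are illustrative):

```python
import uuid

doc = QdrantDocument(
    id=uuid.uuid4().hex,
    doc_id="doc-123",
    text="an example chunk",
    embedding=[0.0] * 384,
)

# Passing a wrong type (e.g. embedding="oops") raises pydantic.ValidationError
# instead of silently corrupting the index.
```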

### Create a Document Processing Pipeline
A document processor encapsulates the logic to translate from raw data to our defined type.

```python
import uuid
from typing import Dict, List

class DocumentProcessor(ProcessingPipeline[Dict[str, str], QdrantDocument]):
    def __init__(self, model, name='', version=''):
        super().__init__(name, version)
        self.model = model

    def process(self, batch: List[Dict[str, str]], batch_size: int = 0, **kwargs) -> List[QdrantDocument]:
        # Identity chunker: index each document as a single chunk. Swap in a
        # real chunking strategy here (see the sketch below).
        chunker = lambda x: [x]

        results = []
        for x in batch:
            doc = MTEBDocument(**x)

            chunks = chunker(doc.text)
            embedding = self.model.encode(chunks)
            for i, chunk in enumerate(chunks):
                results.append(QdrantDocument(
                    id=uuid.uuid4().hex,
                    doc_id=doc.doc_id,
                    text=chunk,
                    embedding=embedding[i],
                ))
        return results
```
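The identity chunker above indexes each document whole. A sketch of a fixed-window chunker you
could swap in (the function name and window sizes are illustrative, not part of retri-eval):

```python
from typing import List

def fixed_window_chunker(text: str, size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```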

### Create a Query Processing Pipeline
Similar to document processing, we need a way to convert strings to something the index will understand.

For dense retrieval, we return embeddings from a model.

```python
class QueryProcessor(ProcessingPipeline[str, List[float]]):
    def __init__(self, model, name='', version=''):
        super().__init__(name, version)
        self.model = model

    def process(self, batch: List[str], batch_size: int = 0, **kwargs) -> List[List[float]]:
        # encode_queries prepends the model's retrieval instruction before
        # embedding, unlike encode, which embeds passages directly.
        return self.model.encode_queries(batch)
```
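A quick usage sketch, assuming the `FlagModel` constructed in the next section (the query text
is illustrative):

```python
vectors = QueryProcessor(model).process(["how do I deduplicate questions?"])
# One embedding per query; bge-small-en-v1.5 produces 384-dimensional vectors.
assert len(vectors) == 1 and len(vectors[0]) == 384
```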

### Define a Retriever
The Retriever class acts as our interface to processing. It defines our search behavior
over the index. retri-eval defines a DenseRetriever for MTEB.

```python
model_name ="BAAI/bge-small-en-v1.5"
model = FlagModel(model_name,
                  query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
                  use_fp16=True)

index = QdrantIndex("CQADupstackEnglish", vector_config=VectorParams(size=384, distance=Distance.COSINE))
doc_processor = DocumentProcessor(model, name=model_name)
query_processor = QueryProcessor(model, name=model_name)

retriever = DenseRetriever(
    index=index,
    query_processor=query_processor,
    doc_processor=doc_processor,
)
```

### Use our MTEB Tasks
MTEB makes it difficult to plug in custom search functionality, so we wrote our own MTEB task
and extended MTEB's retrieval tasks to use it.

This lets us bring our own indexes and define custom searching behavior. We're hoping to upstream this in the future.

````python
import json

from mteb import MTEB
# Hyphens aren't valid in Python module names; the import package is retri_eval.
from retri_eval.evaluation.mteb_tasks import CQADupstackEnglishRetrieval

evaluation = MTEB(tasks=[CQADupstackEnglishRetrieval()])
results = evaluation.run(retriever, verbosity=2, overwrite_results=True, output_folder=f"results/{id}")

print(json.dumps(results, indent=1))
````
results:
```json
{
 "CQADupstackEnglishRetrieval": {
  "mteb_version": "1.1.1",
  "dataset_revision": null,
  "mteb_dataset_name": "CQADupstackEnglishRetrieval",
  "test": {
   "ndcg_at_1": 0.37006,
   "ndcg_at_3": 0.39158,
   "ndcg_at_5": 0.4085,
   "ndcg_at_10": 0.42312,
   "ndcg_at_100": 0.46351,
   "ndcg_at_1000": 0.48629,
   "map_at_1": 0.29171,
   "map_at_3": 0.35044,
   "map_at_5": 0.36476,
   "map_at_10": 0.3735,
   "map_at_100": 0.38446,
   "map_at_1000": 0.38571,
   "recall_at_1": 0.29171,
   "recall_at_3": 0.40163,
   "recall_at_5": 0.44919,
   "recall_at_10": 0.49723,
   "recall_at_100": 0.67031,
   "recall_at_1000": 0.81938,
   "precision_at_1": 0.37006,
   "precision_at_3": 0.18535,
   "precision_at_5": 0.13121,
   "precision_at_10": 0.07694,
   "precision_at_100": 0.01252,
   "precision_at_1000": 0.00173,
   "mrr_at_1": 0.37006,
   "mrr_at_3": 0.41943,
   "mrr_at_5": 0.4314,
   "mrr_at_10": 0.43838,
   "mrr_at_100": 0.44447,
   "mrr_at_1000": 0.44497,
   "retrieval_latency_at_50": 0.07202814750780817,
   "retrieval_latency_at_95": 0.09553944145009152,
   "retrieval_latency_at_99": 0.20645513817435127,
   "evaluation_time": 538.25
  }
 }
}
```

## Roadmap
retri-eval is still in active development. We're planning to add the following functionality:

- [ ] Support reranking models
- [ ] Support hybrid retrieval baselines
- [ ] Support automatic dataset generation
- [ ] Support parallel execution
- [ ] Add latency and cost benchmarks

## What dataset to evaluate on
retri-eval is currently integrated into MTEB for retrieval tasks only, but we're working on more.

[MTEB's available tasks](https://github.com/embeddings-benchmark/mteb/tree/main?tab=readme-ov-file#available-tasks)

We also recommend building your own internal dataset, but this can be time-consuming and
error-prone. We'd love to chat if you're working on this.
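If you do build one, a convenient target is the BEIR-style corpus/queries/qrels layout that
MTEB's retrieval tasks consume. A minimal sketch (ids and text are illustrative):

```python
# Three dictionaries keyed by string ids.
corpus = {
    "d1": {"title": "BM25", "text": "BM25 is a lexical ranking function."},
    "d2": {"title": "Dense retrieval", "text": "Dense retrieval ranks by embedding similarity."},
}
queries = {"q1": "what is bm25?"}
qrels = {"q1": {"d1": 1}}  # graded relevance: query id -> {doc id: grade}
```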

## License
Distributed under the AGPL-3.0 license. If you need an alternate license, please reach out.


## Let's Chat!
Reach out! Our team has experience working on petabyte-scale search and analytics applications.
We'd love to hear what you're working on and see how we can help.

Matt - matt _[at]_ deployql.com - [Or Schedule some time to chat on my calendar](https://calendar.app.google/obJmewkwVSuUcSK1A)



## Acknowledgements
- [MTEB](https://github.com/embeddings-benchmark/mteb)
            
