<div align="center">
<h1>Cherche</h1>
<p>Neural search</p>
</div>
<p align="center"><img width=300 src="docs/img/logo.png"/></p>
<div align="center">
<!-- Documentation -->
<a href="https://raphaelsty.github.io/cherche/"><img src="https://img.shields.io/website?label=docs&style=flat-square&url=https%3A%2F%2Fraphaelsty.github.io/cherche/%2F" alt="documentation"></a>
<!-- Demo -->
<a href="https://raphaelsty.github.io/knowledge/?query=cherche%20neural%20search"><img src="https://img.shields.io/badge/demo-running-blueviolet?style=flat-square" alt="Demo"></a>
<!-- License -->
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="license"></a>
</div>
Cherche enables the development of neural search pipelines that use retrievers and pre-trained language models as retrievers and rankers. Its primary advantage is the ability to build end-to-end pipelines, and its compatibility with batch computation makes it well suited to offline semantic search.
Explore a [live demo of an NLP search engine powered by Cherche](https://raphaelsty.github.io/knowledge/?query=cherche%20neural%20search).
![Cherche pipeline overview](docs/img/explain.png)
## Installation 🤖
To install Cherche for use with a simple retriever on CPU, such as TfIdf, Flash, Lunr, or Fuzz, use the following command:
```sh
pip install cherche
```
To install Cherche for use with any semantic retriever or ranker on CPU, use the following command:
```sh
pip install "cherche[cpu]"
```
Finally, if you plan to use any semantic retriever or ranker on GPU, use the following command:
```sh
pip install "cherche[gpu]"
```
These options install Cherche with the dependencies appropriate to your use case.
### Documentation
Documentation is available [here](https://raphaelsty.github.io/cherche/). It provides details
about retrievers, rankers, pipelines and examples.
## QuickStart 📑
### Documents
Cherche helps you find the right document within a list of objects. Here is an example corpus.
```python
from cherche import data
documents = data.load_towns()
documents[:3]
[{'id': 0,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris is the capital and most populous city of France.'},
 {'id': 1,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': "Since the 17th century, Paris has been one of Europe's major centres of science, and arts."},
 {'id': 2,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France.'}]
```
### Retriever ranker
Here is an example of a neural search pipeline composed of a TF-IDF that quickly retrieves documents, followed by a ranking model. The ranking model sorts the documents produced by the retriever based on the semantic similarity between the query and the documents. We can call the pipeline using a list of queries and get relevant documents for each query.
```python
from cherche import data, retrieve, rank
from sentence_transformers import SentenceTransformer
# List of dicts
documents = data.load_towns()
# Retrieve on fields title and article
retriever = retrieve.BM25(
    key="id",
    on=["title", "article"],
    documents=documents,
    k=30,
)

# Rank on fields title and article
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
    k=3,
)
# Pipeline creation
search = retriever + ranker
search.add(documents=documents)
# Search documents for 3 queries.
search(["Bordeaux", "Paris", "Toulouse"])
[[{'id': 57, 'similarity': 0.69513524},
  {'id': 63, 'similarity': 0.6214994},
  {'id': 65, 'similarity': 0.61809087}],
 [{'id': 16, 'similarity': 0.59158516},
  {'id': 0, 'similarity': 0.58217555},
  {'id': 1, 'similarity': 0.57944715}],
 [{'id': 26, 'similarity': 0.6925601},
  {'id': 37, 'similarity': 0.63977146},
  {'id': 28, 'similarity': 0.62772334}]]
```
We can add the document store to the pipeline to map retrieved ids back to the documents' contents:
```python
search += documents
search(["Bordeaux", "Paris", "Toulouse"])
[[{'id': 57,
   'title': 'Bordeaux',
   'url': 'https://en.wikipedia.org/wiki/Bordeaux',
   'similarity': 0.69513524},
  {'id': 63,
   'title': 'Bordeaux',
   'similarity': 0.6214994},
  {'id': 65,
   'title': 'Bordeaux',
   'url': 'https://en.wikipedia.org/wiki/Bordeaux',
   'similarity': 0.61809087}],
 [{'id': 16,
   'title': 'Paris',
   'url': 'https://en.wikipedia.org/wiki/Paris',
   'article': 'Paris received 12.',
   'similarity': 0.59158516},
  {'id': 0,
   'title': 'Paris',
   'url': 'https://en.wikipedia.org/wiki/Paris',
   'similarity': 0.58217555},
  {'id': 1,
   'title': 'Paris',
   'url': 'https://en.wikipedia.org/wiki/Paris',
   'similarity': 0.57944715}],
 [{'id': 26,
   'title': 'Toulouse',
   'url': 'https://en.wikipedia.org/wiki/Toulouse',
   'similarity': 0.6925601},
  {'id': 37,
   'title': 'Toulouse',
   'url': 'https://en.wikipedia.org/wiki/Toulouse',
   'similarity': 0.63977146},
  {'id': 28,
   'title': 'Toulouse',
   'url': 'https://en.wikipedia.org/wiki/Toulouse',
   'similarity': 0.62772334}]]
```
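Conceptually, the `+` operator chains stages: each stage maps a batch of queries to candidate lists, and the next stage consumes those candidates. The sketch below illustrates the idea in plain Python; it is not Cherche's actual implementation, and the class and stage names are made up for illustration.

```python
class Pipeline:
    """Chain of search stages; each stage is a callable (queries, candidates) -> candidates."""

    def __init__(self, stages):
        self.stages = stages

    def __add__(self, other):
        # Composing two pipelines concatenates their stages.
        other_stages = other.stages if isinstance(other, Pipeline) else [other]
        return Pipeline(self.stages + other_stages)

    def __call__(self, queries):
        candidates = [None] * len(queries)
        for stage in self.stages:
            candidates = stage(queries, candidates)
        return candidates


def toy_retriever(queries, _):
    # Pretend retrieval: return candidate document ids for each query.
    return [[0, 1, 2] for _ in queries]


def toy_ranker(queries, candidates):
    # Pretend ranking: keep the top two candidates per query.
    return [ids[:2] for ids in candidates]


toy_search = Pipeline([toy_retriever]) + Pipeline([toy_ranker])
print(toy_search(["Bordeaux", "Paris"]))  # [[0, 1], [0, 1]]
```

The same pattern explains `search += documents`: appending a mapping stage that replaces ids with full documents.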
## Retrieve
Cherche provides [retrievers](https://raphaelsty.github.io/cherche/retrieve/retrieve/) that filter input documents based on a query.
- retrieve.TfIdf
- retrieve.BM25
- retrieve.Lunr
- retrieve.Flash
- retrieve.Encoder
- retrieve.DPR
- retrieve.Fuzz
- retrieve.Embedding
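To picture what a lexical retriever such as `retrieve.TfIdf` computes, here is a minimal TF-IDF scoring sketch in plain Python. It is illustrative only (not Cherche's implementation, which relies on optimized sparse matrices); the function and variable names are made up.

```python
import math
from collections import Counter

documents = [
    {"id": 0, "article": "Paris is the capital of France"},
    {"id": 1, "article": "Bordeaux is famous for wine"},
    {"id": 2, "article": "Toulouse is called the pink city"},
]

tokenized = [doc["article"].lower().split() for doc in documents]
n_docs = len(tokenized)

# Inverse document frequency for every term in the corpus.
idf = {
    term: math.log(n_docs / sum(term in toks for toks in tokenized))
    for toks in tokenized
    for term in toks
}

def tfidf_search(query, k=2):
    """Score documents by summed TF-IDF of the query terms, return top-k ids."""
    query_terms = query.lower().split()
    scores = []
    for doc, toks in zip(documents, tokenized):
        tf = Counter(toks)
        score = sum(tf[t] / len(toks) * idf.get(t, 0.0) for t in query_terms)
        scores.append((doc["id"], score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, score in scores[:k] if score > 0]

print(tfidf_search("wine Bordeaux"))  # [1]
```

BM25 refines this scheme with term-frequency saturation and document-length normalization, which is why it is usually a stronger default.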
## Rank
Cherche provides [rankers](https://raphaelsty.github.io/cherche/rank/rank/) that rerank the documents returned by retrievers.
Cherche rankers are compatible with [SentenceTransformers](https://www.sbert.net/docs/pretrained_models.html) models available on the [Hugging Face hub](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads).
- rank.Encoder
- rank.DPR
- rank.CrossEncoder
- rank.Embedding
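At its core, a semantic ranker such as `rank.Encoder` reorders the retriever's candidates by the similarity between the query embedding and each document embedding. The sketch below uses hand-made two-dimensional vectors for illustration; in practice the embeddings come from a SentenceTransformers model, and the function names here are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Pretend embeddings: in practice these come from an encoder model.
doc_embeddings = {57: [0.9, 0.1], 63: [0.4, 0.6], 65: [0.1, 0.9]}
query_embedding = [1.0, 0.0]

def rerank(candidates, k=2):
    """Sort candidate ids by similarity to the query, keep the top-k."""
    scored = [
        {"id": doc_id, "similarity": cosine(query_embedding, doc_embeddings[doc_id])}
        for doc_id in candidates
    ]
    scored.sort(key=lambda d: d["similarity"], reverse=True)
    return scored[:k]

print([d["id"] for d in rerank([57, 63, 65])])  # [57, 63]
```

A cross-encoder (`rank.CrossEncoder`) instead scores each query-document pair jointly with one forward pass, which is slower but typically more accurate.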
## Question answering
Cherche provides modules dedicated to question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines.
## Contributors 🤝
Cherche was created for and by Renault and is now available to all.
We welcome all contributions.
<p align="center"><img src="docs/img/renault.jpg"/></p>
## Acknowledgements 👏
The Lunr retriever is a wrapper around [Lunr.py](https://github.com/yeraydiazdiaz/lunr.py). The Flash retriever is a wrapper around [FlashText](https://github.com/vi3k6i5/flashtext). The DPR, Encoder and CrossEncoder rankers are wrappers dedicated to using pre-trained [SentenceTransformers](https://www.sbert.net/docs/pretrained_models.html) models in a neural search pipeline.
## Citations
If you use Cherche to produce results for your scientific publication, please refer to our SIGIR paper:
```bibtex
@inproceedings{Sourty2022sigir,
    author = {Raphael Sourty and Jose G. Moreno and Lynda Tamine and Francois-Paul Servant},
    title = {CHERCHE: A new tool to rapidly implement pipelines in information retrieval},
    booktitle = {Proceedings of SIGIR 2022},
    year = {2022}
}
```
## Dev Team 💾
The Cherche dev team is made up of [Raphaël Sourty](https://github.com/raphaelsty), [François-Paul Servant](https://github.com/fpservant), [Nicolas Bizzozzero](https://github.com/NicolasBizzozzero), and [Jose G Moreno](https://scholar.google.com/citations?user=4BZFUw8AAAAJ&hl=fr). 🥳