<div align="center">
<h1>Neural-Cherche</h1>
<p>Neural Search</p>
</div>
<p align="center"><img width=500 src="docs/img/logo.png"/></p>
<div align="center">
<!-- Documentation -->
<a href="https://raphaelsty.github.io/neural-cherche/"><img src="https://img.shields.io/website?label=Documentation&style=flat-square&url=https%3A%2F%2Fraphaelsty.github.io/neural-cherche/%2F" alt="documentation"></a>
<!-- License -->
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="license"></a>
</div>
Neural-Cherche is a library designed to fine-tune neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset. Neural-Cherche also provides classes to run efficient inference on a fine-tuned retriever or ranker. It aims to offer a straightforward and effective way to fine-tune and use neural search models in both offline and online settings, and it lets users save all computed embeddings to avoid redundant computations.
## Installation
We can install neural-cherche using pip:
```
pip install neural-cherche
```
If we plan to evaluate our model during training, we can install the evaluation extras:
```
pip install "neural-cherche[eval]"
```
## Documentation
The complete documentation is available [here](https://raphaelsty.github.io/neural-cherche/).
## Quick Start
Your training dataset must be made of triples `(anchor, positive, negative)`, where the anchor is a query, the positive is a document relevant to the anchor, and the negative is a document that is not relevant to the anchor.
```python
X = [
    ("anchor 1", "positive 1", "negative 1"),
    ("anchor 2", "positive 2", "negative 2"),
    ("anchor 3", "positive 3", "negative 3"),
]
```
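If your raw data is a set of queries paired with their relevant documents, one simple way to obtain such triples is to sample a negative for each pair from the rest of the corpus. The sketch below is illustrative only; `build_triples`, `pairs`, and `corpus` are hypothetical names, not part of neural-cherche:

```python
import random


def build_triples(pairs, corpus, seed=42):
    """Build (anchor, positive, negative) triples from (query, relevant_doc) pairs.

    Negatives are sampled uniformly from the corpus, re-drawing whenever the
    sample happens to be the positive document itself.
    """
    rng = random.Random(seed)
    triples = []
    for query, positive in pairs:
        negative = positive
        while negative == positive:
            negative = rng.choice(corpus)
        triples.append((query, positive, negative))
    return triples


pairs = [("what is a cat?", "a cat is a small feline")]
corpus = ["a cat is a small feline", "paris is in france", "the sky is blue"]
X = build_triples(pairs, corpus)
```

Random negatives are a reasonable starting point; harder negatives (e.g. from a BM25 retriever) usually give a stronger training signal.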
And here is how to fine-tune ColBERT using neural-cherche:
```python
import torch

from neural_cherche import models, utils, train

model = models.ColBERT(
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

X = [
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
]

for anchor, positive, negative in utils.iter(
    X,
    epochs=1,
    batch_size=32,
    shuffle=True,
):
    loss = train.train_colbert(
        model=model,
        optimizer=optimizer,
        anchor=anchor,
        positive=positive,
        negative=negative,
    )

model.save_pretrained("checkpoint")
```
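`utils.iter` yields shuffled batches of anchors, positives, and negatives for the requested number of epochs. Conceptually it behaves like the plain-Python generator below, which is an illustrative sketch under that assumption, not neural-cherche's actual implementation:

```python
import random


def iter_triples(X, epochs, batch_size, shuffle=True, seed=42):
    """Yield (anchors, positives, negatives) batches over the triples in X."""
    rng = random.Random(seed)
    for _ in range(epochs):
        triples = list(X)
        if shuffle:
            rng.shuffle(triples)
        for start in range(0, len(triples), batch_size):
            batch = triples[start:start + batch_size]
            # Transpose the batch of triples into three parallel lists.
            anchors, positives, negatives = zip(*batch)
            yield list(anchors), list(positives), list(negatives)


X = [(f"q{i}", f"pos{i}", f"neg{i}") for i in range(5)]
batches = list(iter_triples(X, epochs=2, batch_size=2))
# 2 epochs x 3 batches per epoch (batch sizes 2, 2, 1) = 6 batches in total
```

Each yielded batch already has anchors, positives, and negatives separated, which is why the training loop can unpack them directly.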
## References
- *[SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720)* authored by Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, SIGIR 2021.
- *[SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval](https://arxiv.org/abs/2109.10086)* authored by Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant, SIGIR 2022.
- *[SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval](https://research.google/pubs/pub52289/)* authored by Weize Kong, Jeffrey M. Dudek, Cheng Li, Mingyang Zhang, and Mike Bendersky, SIGIR 2023.
- *[ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832)* authored by Omar Khattab, Matei Zaharia, SIGIR 2020.