neural-cherche

- Name: neural-cherche
- Version: 1.0.0
- Summary: Sparse Embeddings for Neural Search.
- Home page: https://github.com/raphaelsty/neural-cherche
- Author: Raphael Sourty
- Requires Python: >=3.6
- Keywords: neural search, information retrieval, semantic search, SparseEmbed, Google Research, Splade, Stanford, ColBERT
- Upload time: 2023-11-16 22:57:51
- Requirements: none recorded

<div align="center">
  <h1>Neural-Cherche</h1>
  <p>Neural Search</p>
</div>

<p align="center"><img width=500 src="docs/img/logo.png"/></p>

<div align="center">
  <!-- Documentation -->
  <a href="https://raphaelsty.github.io/neural-cherche/"><img src="https://img.shields.io/website?label=Documentation&style=flat-square&url=https%3A%2F%2Fraphaelsty.github.io/neural-cherche/%2F" alt="documentation"></a>
  <!-- License -->
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="license"></a>
</div>

Neural-Cherche is a library designed to fine-tune neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset. Neural-Cherche also provides classes to run efficient inference on a fine-tuned retriever or ranker. It aims to offer a straightforward and effective way to fine-tune and use neural search models in both offline and online settings, and it lets users save all computed embeddings to avoid redundant computation.

## Installation

We can install neural-cherche using pip:

```
pip install neural-cherche
```

If we plan to evaluate our model during training, we should also install the evaluation extras:

```
pip install "neural-cherche[eval]"
```
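
As a rough illustration of what evaluation looks like, here is a sketch based on the `utils.evaluate` helper documented in later releases of neural-cherche. The function name, its parameters (`scores`, `qrels`, `queries_ids`, `metrics`), and the shape of the toy inputs below are assumptions for this version, so treat this as a sketch and check the documentation for the exact API.

```python
from neural_cherche import utils

# Hypothetical retriever output: one ranked candidate list per query,
# each candidate a dict with the document key and a similarity score.
scores = [
    [{"id": "doc-1", "similarity": 12.3}, {"id": "doc-2", "similarity": 3.4}],
]

# Relevance judgments: query id -> {relevant document id: grade}.
qrels = {"query-1": {"doc-1": 1}}

# Assumed helper (see above): compute IR metrics for the run.
evaluation = utils.evaluate(
    scores=scores,
    qrels=qrels,
    queries_ids=["query-1"],
    metrics=["ndcg@10", "hits@1"],
)
```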

## Documentation

The complete documentation is available [here](https://neural-cherche.readthedocs.io/en/latest/).

## Quick Start

Your training dataset must consist of triples `(anchor, positive, negative)`, where the anchor is a query, the positive is a document relevant to the anchor, and the negative is a document that is not relevant to the anchor.

```python
X = [
    ("anchor 1", "positive 1", "negative 1"),
    ("anchor 2", "positive 2", "negative 2"),
    ("anchor 3", "positive 3", "negative 3"),
]
```

And here is how to fine-tune ColBERT using neural-cherche:

```python
import torch

from neural_cherche import models, utils, train

# Initialize ColBERT from a base sentence-transformers checkpoint.
model = models.ColBERT(
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Training triples: (anchor query, relevant document, irrelevant document).
X = [
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
]

# utils.iter yields shuffled batches of triples for the requested number of epochs.
for anchor, positive, negative in utils.iter(
    X,
    epochs=1,
    batch_size=32,
    shuffle=True,
):
    # One optimization step on the current batch of triples.
    loss = train.train_colbert(
        model=model,
        optimizer=optimizer,
        anchor=anchor,
        positive=positive,
        negative=negative,
    )

# Save the fine-tuned weights to disk.
model.save_pretrained("checkpoint")
```
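
After training, the fine-tuned checkpoint can be reloaded by pointing `model_name_or_path` at the saved directory and used for retrieval. The sketch below is illustrative rather than definitive: the `retrieve.ColBERT` class and its `encode_documents` / `add` / `encode_queries` methods follow the API documented in later releases of neural-cherche and may differ in this version, so check the documentation for the exact signatures.

```python
import torch

from neural_cherche import models, retrieve

# Reload the fine-tuned weights saved by model.save_pretrained("checkpoint").
model = models.ColBERT(
    model_name_or_path="checkpoint",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

documents = [
    {"id": "doc-1", "document": "Paris is the capital of France."},
    {"id": "doc-2", "document": "Montreal is in Quebec."},
]

# Assumed API (see above): build a retriever over the "document" field,
# keyed by each document's "id".
retriever = retrieve.ColBERT(key="id", on=["document"], model=model)

# Document embeddings are computed once and can be stored to avoid
# recomputing them at query time.
documents_embeddings = retriever.encode_documents(documents=documents)
retriever = retriever.add(documents_embeddings=documents_embeddings)

queries_embeddings = retriever.encode_queries(queries=["capital of France"])
scores = retriever(queries_embeddings=queries_embeddings, k=2)
```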

## References

- *[SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720)* authored by Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, SIGIR 2021.

- *[SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval](https://arxiv.org/abs/2109.10086)* authored by Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant, SIGIR 2022.

- *[SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval](https://research.google/pubs/pub52289/)* authored by Weize Kong, Jeffrey M. Dudek, Cheng Li, Mingyang Zhang, and Mike Bendersky, SIGIR 2023.

- *[ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832)* authored by Omar Khattab, Matei Zaharia, SIGIR 2020.

            
