# Sentence Transformers: Embeddings, Retrieval, and Reranking
This framework provides an easy method for accessing, using, and training state-of-the-art embedding and reranker models. It can be used to compute embeddings using Sentence Transformer models ([quickstart](https://sbert.net/docs/quickstart.html#sentence-transformer)), to calculate similarity scores using Cross-Encoder (a.k.a. reranker) models ([quickstart](https://sbert.net/docs/quickstart.html#cross-encoder)), or to generate sparse embeddings using Sparse Encoder models ([quickstart](https://sbert.net/docs/quickstart.html#sparse-encoder)). This unlocks a wide range of applications, including [semantic search](https://sbert.net/examples/applications/semantic-search/README.html), [semantic textual similarity](https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html), and [paraphrase mining](https://sbert.net/examples/applications/paraphrase-mining/README.html).
A wide selection of over [15,000 pre-trained Sentence Transformers models](https://huggingface.co/models?library=sentence-transformers) is available for immediate use on 🤗 Hugging Face, including many of the state-of-the-art models from the [Massive Text Embeddings Benchmark (MTEB) leaderboard](https://huggingface.co/spaces/mteb/leaderboard). Additionally, it is easy to train or finetune your own [embedding models](https://sbert.net/docs/sentence_transformer/training_overview.html), [reranker models](https://sbert.net/docs/cross_encoder/training_overview.html), or [sparse encoder models](https://sbert.net/docs/sparse_encoder/training_overview.html) using Sentence Transformers, enabling you to create custom models for your specific use cases.
For the **full documentation**, see **[www.SBERT.net](https://www.sbert.net)**.
## Installation
We recommend **Python 3.9+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.34.0+](https://github.com/huggingface/transformers)**.
**Install with pip**
```
pip install -U sentence-transformers
```
**Install with conda**
```
conda install -c conda-forge sentence-transformers
```
**Install from source**
Alternatively, you can clone the latest version from the [repository](https://github.com/UKPLab/sentence-transformers) and install it directly from the source code:
```
pip install -e .
```
**PyTorch with CUDA**
If you want to use a GPU / CUDA, you must install PyTorch with a matching CUDA version. Follow
[PyTorch - Get Started](https://pytorch.org/get-started/locally/) for further details on how to install PyTorch.
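As a quick sanity check, the following minimal sketch (assuming PyTorch and sentence-transformers are already installed) verifies that CUDA is visible and loads a model onto the GPU via the `device` argument:

```python
import torch

from sentence_transformers import SentenceTransformer

# Check whether PyTorch was installed with working CUDA support
print(torch.cuda.is_available())

# Load the model onto the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

embeddings = model.encode(["The weather is lovely today."])
print(embeddings.shape)  # => (1, 384)
```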
## Getting Started
See [Quickstart](https://www.sbert.net/docs/quickstart.html) in our documentation.
### Embedding Models
First, download a pretrained embedding model, a.k.a. a Sentence Transformer model.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
```
Then provide some texts to the model.
```python
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# => (3, 384)
```
And that's already it. We now have numpy arrays with the embeddings, one for each text. We can use these to compute similarities.
```python
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
```
### Reranker Models
First, download a pretrained reranker model, a.k.a. a Cross Encoder model.
```python
from sentence_transformers import CrossEncoder
# 1. Load a pretrained CrossEncoder model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
```
Then provide some texts to the model.
```python
# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

# 2a. Predict scores for pairs of texts
scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [8.607139 5.506266 6.352977]
```
And we're good to go. You can also use [`model.rank`](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#sentence_transformers.cross_encoder.CrossEncoder.rank) to avoid having to perform the reranking manually:
```python
# 2b. Rank a list of passages for a query
ranks = model.rank(query, passages, return_documents=True)

print("Query:", query)
for rank in ranks:
    print(f"- #{rank['corpus_id']} ({rank['score']:.2f}): {rank['text']}")
"""
Query: How many people live in Berlin?
- #0 (8.61): Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.
- #2 (6.35): In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.
- #1 (5.51): Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.
"""
```
### Sparse Encoder Models
First, download a pretrained sparse embedding model, a.k.a. a Sparse Encoder model.
```python
from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary size dimensions

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity stats
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")
# Sparsity: 99.84%
```
## Pre-Trained Models
We provide a large list of pretrained models for more than 100 languages. Some models are general purpose models, while others produce embeddings for specific use cases.
* [Pretrained Sentence Transformer (Embedding) Models](https://sbert.net/docs/sentence_transformer/pretrained_models.html)
* [Pretrained Cross Encoder (Reranker) Models](https://sbert.net/docs/cross_encoder/pretrained_models.html)
* [Pretrained Sparse Encoder (Sparse Embeddings) Models](https://sbert.net/docs/sparse_encoder/pretrained_models.html)
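All of these pretrained models can be loaded directly by their Hugging Face model IDs. A minimal sketch (the two model IDs below are just examples from the lists above, not a fixed recommendation):

```python
from sentence_transformers import CrossEncoder, SentenceTransformer

# A multilingual embedding model and an MS MARCO reranker, loaded by model ID
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

# The multilingual model maps sentences from different languages into the same space
print(embedder.encode(["Guten Morgen!", "Good morning!"]).shape)  # => (2, 384)
```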
## Training
This framework allows you to fine-tune your own sentence embedding models, so that you get task-specific sentence embeddings. You have various options to choose from to obtain embeddings tailored to your specific task; a minimal fine-tuning sketch follows the lists below.
* Embedding Models
  * [Sentence Transformer > Training Overview](https://www.sbert.net/docs/sentence_transformer/training_overview.html)
  * [Sentence Transformer > Training Examples](https://www.sbert.net/docs/sentence_transformer/training/examples.html) or [training examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples/sentence_transformer/training).
* Reranker Models
  * [Cross Encoder > Training Overview](https://www.sbert.net/docs/cross_encoder/training_overview.html)
  * [Cross Encoder > Training Examples](https://www.sbert.net/docs/cross_encoder/training/examples.html) or [training examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples/cross_encoder/training).
* Sparse Embedding Models
  * [Sparse Encoder > Training Overview](https://www.sbert.net/docs/sparse_encoder/training_overview.html)
  * [Sparse Encoder > Training Examples](https://www.sbert.net/docs/sparse_encoder/training/examples.html) or [training examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples/sparse_encoder/training).
Some highlights across the different types of training are:
- Support for various transformer networks including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...
- Multi-lingual and multi-task learning
- Evaluation during training to find the optimal model
- [20+ loss functions](https://www.sbert.net/docs/package_reference/sentence_transformer/losses.html) for embedding models, [10+ loss functions](https://www.sbert.net/docs/package_reference/cross_encoder/losses.html) for reranker models and [10+ loss functions](https://www.sbert.net/docs/package_reference/sparse_encoder/losses.html) for sparse embedding models, allowing you to tune models specifically for semantic search, paraphrase mining, semantic similarity comparison, clustering, triplet loss, contrastive loss, etc.
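To illustrate how these pieces fit together, here is a minimal fine-tuning sketch for an embedding model. The tiny inline dataset and the output path are made up for illustration; in practice you would use the datasets, losses, and training arguments described in the training overviews above:

```python
from datasets import Dataset

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# 1. Load a model to fine-tune
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2. A toy (anchor, positive) dataset, purely for illustration
train_dataset = Dataset.from_dict({
    "anchor": ["It's nice weather outside today.", "He drove to work."],
    "positive": ["It's so sunny.", "He took the car to the office."],
})

# 3. A loss that pulls anchor/positive pairs together, using in-batch negatives
loss = losses.MultipleNegativesRankingLoss(model)

# 4. Train and save the fine-tuned model
trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("models/my-finetuned-model")  # hypothetical output path
```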
## Application Examples
You can use this framework for:
- **Computing Sentence Embeddings**
  - [Dense Embeddings](https://www.sbert.net/examples/sentence_transformer/applications/computing-embeddings/README.html)
  - [Sparse Embeddings](https://www.sbert.net/examples/sparse_encoder/applications/computing_embeddings/README.html)
- **Semantic Textual Similarity**
  - [Dense STS](https://www.sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html)
  - [Sparse STS](https://www.sbert.net/examples/sparse_encoder/applications/semantic_textual_similarity/README.html)
- **Semantic Search**
  - [Dense Search](https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html)
  - [Sparse Search](https://www.sbert.net/examples/sparse_encoder/applications/semantic_search/README.html)
- **Retrieve & Re-Rank**
  - [Dense only Retrieval](https://www.sbert.net/examples/sentence_transformer/applications/retrieve_rerank/README.html)
  - [Sparse/Dense/Hybrid Retrieval](https://www.sbert.net/examples/sparse_encoder/applications/retrieve_rerank/README.html)
- [Clustering](https://www.sbert.net/examples/sentence_transformer/applications/clustering/README.html)
- [Paraphrase Mining](https://www.sbert.net/examples/sentence_transformer/applications/paraphrase-mining/README.html)
- [Translated Sentence Mining](https://www.sbert.net/examples/sentence_transformer/applications/parallel-sentence-mining/README.html)
- [Multilingual Image Search, Clustering & Duplicate Detection](https://www.sbert.net/examples/sentence_transformer/applications/image-search/README.html)
and many more use cases; a small semantic search sketch is shown below.
For all examples, see [examples/sentence_transformer/applications](https://github.com/UKPLab/sentence-transformers/tree/master/examples/sentence_transformer/applications).
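To give a flavor of the semantic search use case, here is a minimal sketch using `util.semantic_search` over a tiny made-up corpus (the sentences and `top_k` value are illustrative only):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A tiny illustrative corpus and query
corpus = [
    "A man is eating food.",
    "A man is riding a horse.",
    "A monkey is playing drums.",
]
query = "Someone is having a meal."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# For each query, semantic_search returns a ranked list of {"corpus_id", "score"} dicts
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```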
## Development setup
After cloning the repo (or a fork) to your machine, in a virtual environment, run:
```
python -m pip install -e ".[dev]"
pre-commit install
```
To test your changes, run:
```
pytest
```
## Citing & Authors
If you find this repository helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
If you use one of the multilingual models, feel free to cite our publication [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813):
```bibtex
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
```
Please have a look at [Publications](https://www.sbert.net/docs/publications.html) for our different publications that are integrated into Sentence Transformers.
Maintainer: [Tom Aarsen](https://github.com/tomaarsen), 🤗 Hugging Face
https://www.ukp.tu-darmstadt.de/
Don't hesitate to open an issue if something is broken (and it shouldn't be) or if you have further questions.
> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.