## YASEM (Yet Another Splade|Sparse Embedder)
YASEM is a simple and efficient library for running SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) models and creating sparse vectors. Its interface is modeled on [SentenceTransformers](https://sbert.net/), so it is easy to integrate into your projects.
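For readers new to SPLADE, the sketch below shows how a sparse vector is typically derived from a masked-language-model head: apply `log(1 + ReLU(x))` to the per-token vocabulary logits, then max-pool over the tokens. It uses toy NumPy arrays rather than a real model; the function name `splade_pool` and the toy inputs are made up for illustration, and it is an assumption that YASEM's internals follow exactly this form.

```python
import numpy as np

def splade_pool(logits, attention_mask):
    """Max-pool log(1 + relu(logits)) over the token axis.

    logits: (seq_len, vocab_size) vocabulary scores for one text.
    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding.
    """
    # Negative logits become 0; positive ones are dampened by the log.
    activated = np.log1p(np.maximum(logits, 0.0))
    # Zero out padding positions so they cannot win the max.
    activated = activated * attention_mask[:, None]
    return activated.max(axis=0)

# Hypothetical toy input: 3 token positions, vocabulary of 4 terms.
toy_logits = np.array([
    [1.0, -2.0, 0.0, 3.0],
    [0.5, -1.0, -0.5, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
])
toy_mask = np.array([1, 1, 0])

vector = splade_pool(toy_logits, toy_mask)
print(vector)  # one non-negative weight per vocabulary term, mostly zeros
```

Because every weight is non-negative and most are exactly zero, the result can be stored and compared as a sparse vector.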
## Why YASEM?
- Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.
- Efficiency: Generate sparse embeddings quickly and easily.
- Flexibility: Works with both NumPy and PyTorch backends.
- Convenience: Includes helpful utilities like `get_token_values` for inspecting feature representations.
## Installation
You can install YASEM using pip:
```bash
pip install yasem
```
## Quick Start
Here's a simple example of how to use YASEM:
```python
from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)
# or sparse csr matrix
# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)

# Compute similarity
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372 18.86930016 22.87525314]
# [106.88184372 122.79656474 17.45339064 21.44758757]
# [ 18.86930016 17.45339064 61.00272733 40.92700849]
# [ 22.87525314 21.44758757 40.92700849 73.98511539]]

# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
# 'message': 2.38671875, 'greeting': 2.259765625,
# ...

token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
# 'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
```
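To make the token inspection step more concrete, here is a minimal sketch of the idea behind a helper like `get_token_values`: take the non-zero positions of a vocabulary-sized vector, map them back to tokens, and sort by weight. The vocabulary, the vector, and the function name `top_token_values` are hypothetical stand-ins; this is not YASEM's actual implementation.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model uses its tokenizer's vocabulary.
toy_vocab = ["hello", "dog", "cat", "cute", "ramen", "sushi"]

def top_token_values(vector, vocab, top_k=None):
    """Return {token: weight} for the non-zero entries, highest weight first."""
    vector = np.asarray(vector)
    nonzero = np.flatnonzero(vector)
    # Order the non-zero positions by descending weight.
    order = nonzero[np.argsort(-vector[nonzero], kind="stable")]
    if top_k is not None:
        order = order[:top_k]
    return {vocab[i]: float(vector[i]) for i in order}

toy_vector = np.array([2.5, 6.0, 0.0, 4.5, 0.0, 0.0])
print(top_token_values(toy_vector, toy_vocab))
# {'dog': 6.0, 'cute': 4.5, 'hello': 2.5}
```

In the real output above, expansion terms such as `'message'` and `'greeting'` appear even though they are not in the input sentence; those come from the model, not from the lookup step sketched here.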
## Features
- Easy-to-use API inspired by SentenceTransformers
- Support for both NumPy arrays and `scipy.sparse.csr_matrix` output
- Efficient dot product similarity computation
- Utility function to inspect token values in embeddings
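
As an illustration of the dot-product similarity mentioned in the list above, the following sketch computes pairwise scores between the rows of a `scipy.sparse.csr_matrix`. The helper name `dot_similarity` and the toy matrix are assumptions for demonstration; YASEM's own `similarity` method may be implemented differently.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy embeddings: 3 texts over a 4-term vocabulary.
dense = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
])
sparse = csr_matrix(dense)

def dot_similarity(a, b):
    """Pairwise dot products between the rows of a and the rows of b."""
    # Sparse-sparse product only touches the stored non-zero entries.
    return (a @ b.T).toarray()

scores = dot_similarity(sparse, sparse)
print(scores)
# [[ 5.  1.  0.]
#  [ 1. 10.  0.]
#  [ 0.  0. 16.]]
```

The scores are unnormalized, so each diagonal entry is a vector's squared norm rather than 1, which matches the shape of the similarity matrix printed in the Quick Start.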
## License
This project is licensed under the MIT License. See the LICENSE file for the full license text.

Copyright (c) 2024 Yuichi Tateno (@hotchpotch)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgements
This library is inspired by the SPLADE model and aims to provide a simple interface for its usage. Special thanks to the authors of the original SPLADE paper and the developers of the model.
Raw data
{
"_id": null,
"home_page": "https://github.com/hotchpotch/yasem",
"name": "yasem",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "nlp, embeddings, splade, sparse-vectors",
"author": "Yuichi Tateno",
"author_email": "hotchpotch@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/79/7f/3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51/yasem-0.3.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings",
"version": "0.3.2",
"project_urls": {
"Homepage": "https://github.com/hotchpotch/yasem",
"Repository": "https://github.com/hotchpotch/yasem"
},
"split_keywords": [
"nlp",
" embeddings",
" splade",
" sparse-vectors"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7d40f671cbf3b24effd63b73a0b34aa318e02e84aaa5d99c02102904d0391b89",
"md5": "27f46de51aa8356a36bb0e18a17d1bf4",
"sha256": "2be8ef108ebdbbfc0647c1e8d1546e3a533601823166e640d98e04cbdcdaf52c"
},
"downloads": -1,
"filename": "yasem-0.3.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "27f46de51aa8356a36bb0e18a17d1bf4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 6149,
"upload_time": "2024-10-26T01:09:03",
"upload_time_iso_8601": "2024-10-26T01:09:03.222708Z",
"url": "https://files.pythonhosted.org/packages/7d/40/f671cbf3b24effd63b73a0b34aa318e02e84aaa5d99c02102904d0391b89/yasem-0.3.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "797f3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51",
"md5": "8eee8687960857ff28e073b02678276c",
"sha256": "5c91b40651c09e0129bc116d87e95af8c7417d87ba2ce568532c1f3aa0d83278"
},
"downloads": -1,
"filename": "yasem-0.3.2.tar.gz",
"has_sig": false,
"md5_digest": "8eee8687960857ff28e073b02678276c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 5567,
"upload_time": "2024-10-26T01:09:04",
"upload_time_iso_8601": "2024-10-26T01:09:04.974303Z",
"url": "https://files.pythonhosted.org/packages/79/7f/3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51/yasem-0.3.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-26 01:09:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hotchpotch",
"github_project": "yasem",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "yasem"
}