| Field | Value |
| --- | --- |
| Name | yasem |
| Version | 0.4.1 |
| home_page | None |
| Summary | YASEM - Yet Another Splade\|Sparse Embedder - A simple and efficient library for SPLADE embeddings |
| upload_time | 2024-12-16 04:55:17 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | MIT |
| keywords | nlp, embeddings, splade, sparse-vectors |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

## YASEM (Yet Another Splade|Sparse Embedder)
YASEM is a simple and efficient library for executing SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) and creating sparse vectors. It provides a straightforward interface inspired by [SentenceTransformers](https://sbert.net/) for easy integration into your projects.
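To make "sparse vectors" concrete, here is a minimal sketch of the pooling step described in the SPLADE papers: each vocabulary logit is passed through `log(1 + ReLU(x))` and then max-pooled over the input tokens. This README does not say how YASEM computes it internally, so treat the function name `splade_pool` and the toy logits as assumptions for illustration only; a real model would also apply the attention mask, which is omitted here.

```python
import math

def splade_pool(token_logits):
    """Pool per-token vocabulary logits into one sparse weight vector.

    token_logits holds one list of vocabulary logits per input token.
    Hypothetical helper, not part of the yasem API.
    """
    vocab_size = len(token_logits[0])
    return [
        # log(1 + ReLU(x)) per token, then the maximum over all tokens
        max(math.log1p(max(row[j], 0.0)) for row in token_logits)
        for j in range(vocab_size)
    ]

# Toy logits: 2 input tokens, vocabulary of 3 entries (made-up numbers)
logits = [
    [-1.0, 2.0, 0.0],
    [-3.0, 0.5, 1.0],
]
weights = splade_pool(logits)
print(weights)
```

Vocabulary entries whose logits are never positive end up with weight exactly 0, which is what makes the resulting vector sparse.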
## Why YASEM?
- Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.
- Efficiency: Generate sparse embeddings quickly and easily.
- Flexibility: Works with both NumPy and PyTorch backends.
- Convenience: Includes helpful utilities like `get_token_values` for inspecting feature representations.
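As a rough illustration of what inspecting a feature representation means, the sketch below maps the non-zero dimensions of a vector back to vocabulary tokens, highest weight first. It is not YASEM's implementation: the function `token_values`, the five-entry vocabulary, and the weights are all hypothetical.

```python
def token_values(vector, vocab):
    """Return {token: weight} for non-zero dimensions, highest weight first.

    Hypothetical stand-in for illustration; not the yasem implementation.
    """
    pairs = [(vocab[i], w) for i, w in enumerate(vector) if w > 0]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return dict(pairs)  # dicts preserve insertion order in Python 3.7+

# Toy vocabulary and vector (made-up values)
vocab = ["[PAD]", "hello", "dog", "cat", "cute"]
vector = [0.0, 6.9, 6.5, 0.0, 4.6]
print(token_values(vector, vocab))
```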
## Installation
You can install YASEM using pip:
```bash
pip install yasem
```
## Quick Start
Here's a simple example of how to use YASEM:
```python
from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)
# Or return a sparse CSR matrix instead:
# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)

# Compute similarity
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372  18.86930016  22.87525314]
#  [106.88184372 122.79656474  17.45339064  21.44758757]
#  [ 18.86930016  17.45339064  61.00272733  40.92700849]
#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]

# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
# 'message': 2.38671875, 'greeting': 2.259765625,
# ...

# Inspect token values for the last sentence
token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
# 'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
```
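The similarity matrix above is symmetric, and each diagonal entry is a sentence's similarity with itself, which is what a plain dot product gives. A minimal sketch of that computation on sparse vectors follows, using `{token: weight}` dicts. The helpers `sparse_dot` and `similarity_matrix` and all weights are hypothetical and only illustrate the idea; they are not part of YASEM.

```python
def sparse_dot(a, b):
    """Dot product over the tokens that two sparse vectors share."""
    # Iterate over the shorter dict so the cost follows the smaller vector.
    if len(a) > len(b):
        a, b = b, a
    return sum((w * b[t] for t, w in a.items() if t in b), 0.0)

def similarity_matrix(vectors):
    """Pairwise dot products between all vectors."""
    return [[sparse_dot(u, v) for v in vectors] for u in vectors]

# Made-up sparse vectors as {token: weight}
dog = {"hello": 2.0, "dog": 3.0, "cute": 1.0}
cat = {"hello": 2.0, "cat": 3.0, "cute": 1.0}
sushi = {"hello": 1.0, "sushi": 4.0}

matrix = similarity_matrix([dog, cat, sushi])
for row in matrix:
    print(row)
```

Only tokens present in both vectors contribute, so the two pet sentences score higher with each other than either does with the sushi sentence.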
## rank API
```python
# Rank documents based on a query
# (reuses the `embedder` created in the Quick Start example)
query = "What programming language is best for machine learning?"
documents = [
    "Python is widely used in machine learning due to its extensive libraries like TensorFlow and PyTorch",
    "JavaScript is primarily used for web development and front-end applications",
    "SQL is essential for database management and data manipulation",
]

# Get ranked results with relevance scores
results = embedder.rank(query, documents)
print(results)
# [
#     {'corpus_id': 0, 'score': 12.453},  # Python/ML document ranks highest
#     {'corpus_id': 2, 'score': 5.234},
#     {'corpus_id': 1, 'score': 3.123}
# ]

# Get ranked results including document text
results = embedder.rank(query, documents, return_documents=True)
print(results)
# [
#     {
#         'corpus_id': 0,
#         'score': 12.453,
#         'text': 'Python is widely used in machine learning due to its extensive libraries like TensorFlow and PyTorch'
#     },
#     {
#         'corpus_id': 2,
#         'score': 5.234,
#         'text': 'SQL is essential for database management and data manipulation'
#     },
#     ...
# ]
```
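The result shape above (a list of `corpus_id` / `score` dicts, sorted by descending score) can be reproduced with a small sketch that scores each document by its dot product with the query. This is an assumption about the general approach, not YASEM's code: the real `rank` takes raw strings and encodes them, whereas the hypothetical `rank_vectors` below takes precomputed, made-up sparse vectors.

```python
def sparse_dot(a, b):
    """Dot product over the tokens that two sparse vectors share."""
    return sum((w * b[t] for t, w in a.items() if t in b), 0.0)

def rank_vectors(query_vec, doc_vecs, documents=None):
    """Score every document against the query and sort best first.

    Hypothetical helper for illustration; not the yasem API.
    """
    results = [
        {"corpus_id": i, "score": sparse_dot(query_vec, doc_vec)}
        for i, doc_vec in enumerate(doc_vecs)
    ]
    if documents is not None:
        # Attach the original text, like return_documents=True in the README
        for result in results:
            result["text"] = documents[result["corpus_id"]]
    results.sort(key=lambda result: result["score"], reverse=True)
    return results

# Made-up sparse vectors as {token: weight}
query_vec = {"machine": 2.0, "learning": 2.0, "language": 1.0}
doc_vecs = [
    {"python": 3.0, "machine": 2.0, "learning": 2.0},
    {"javascript": 3.0, "web": 2.0},
    {"sql": 3.0, "language": 1.0},
]
ranked = rank_vectors(query_vec, doc_vecs)
print(ranked)
```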
## Features
- Easy-to-use API inspired by SentenceTransformers
- Support for both NumPy and scipy.sparse.csr_matrix
- Efficient dot product similarity computation
- Utility function to inspect token values in embeddings
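For readers unfamiliar with the CSR (compressed sparse row) layout mentioned above, the following sketch builds its three arrays by hand in plain Python. `scipy.sparse.csr_matrix` exposes the same arrays as `.data`, `.indices`, and `.indptr`; the helper `to_csr` and the toy matrix are hypothetical and exist only to show the layout.

```python
def to_csr(rows):
    """Convert a dense list of rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)   # the stored non-zero value
                indices.append(col)  # its column index
        indptr.append(len(data))     # where the next row starts in data
    return data, indices, indptr

# Toy matrix: 3 embeddings over a 4-entry vocabulary (made-up values)
dense = [
    [0.0, 6.9, 0.0, 4.6],
    [0.0, 0.0, 0.0, 0.0],
    [3.5, 0.0, 0.0, 2.7],
]
data, indices, indptr = to_csr(dense)
print(data)
print(indices)
print(indptr)
```

Row `i` occupies `data[indptr[i]:indptr[i + 1]]`, so an all-zero row costs no storage beyond one repeated `indptr` entry. This is why CSR suits SPLADE vectors, where most vocabulary dimensions are zero.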
## License
This project is licensed under the MIT License. See the LICENSE file for the full license text. Copyright (c) 2024 Yuichi Tateno (@hotchpotch)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgements
This library is inspired by the SPLADE model and aims to provide a simple interface for its usage. Special thanks to the authors of the original SPLADE paper and the developers of the model.
Raw data
{
"_id": null,
"home_page": null,
"name": "yasem",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "nlp, embeddings, splade, sparse-vectors",
"author": null,
"author_email": "Yuichi Tateno <hotchpotch@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/5c/37/39186e0ee0f8a9acb50e7bec1058d7291e7fac827719f8ed9fdb7e27429d/yasem-0.4.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings",
"version": "0.4.1",
"project_urls": {
"homepage": "https://github.com/hotchpotch/yasem",
"repository": "https://github.com/hotchpotch/yasem"
},
"split_keywords": [
"nlp",
" embeddings",
" splade",
" sparse-vectors"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0d580c4c68b33a9a5773b62b272f6b3dc392cab01b25a2f18e92d2409be4c6c9",
"md5": "e2a659d6738e8fa6ae171495705cb41c",
"sha256": "86a59e6251ab82f029ff7f44cb4affe841f3f7d956b7e2fc24ffcd33e1b6df02"
},
"downloads": -1,
"filename": "yasem-0.4.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e2a659d6738e8fa6ae171495705cb41c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 7265,
"upload_time": "2024-12-16T04:55:14",
"upload_time_iso_8601": "2024-12-16T04:55:14.662860Z",
"url": "https://files.pythonhosted.org/packages/0d/58/0c4c68b33a9a5773b62b272f6b3dc392cab01b25a2f18e92d2409be4c6c9/yasem-0.4.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5c3739186e0ee0f8a9acb50e7bec1058d7291e7fac827719f8ed9fdb7e27429d",
"md5": "2166784a8a0cd7216ed87f2a9da633bc",
"sha256": "db1b57feb4d8f4ca013954c2ea2167f4de2f66cc652ee0de0ce59cd24172eb1b"
},
"downloads": -1,
"filename": "yasem-0.4.1.tar.gz",
"has_sig": false,
"md5_digest": "2166784a8a0cd7216ed87f2a9da633bc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 9140,
"upload_time": "2024-12-16T04:55:17",
"upload_time_iso_8601": "2024-12-16T04:55:17.365627Z",
"url": "https://files.pythonhosted.org/packages/5c/37/39186e0ee0f8a9acb50e7bec1058d7291e7fac827719f8ed9fdb7e27429d/yasem-0.4.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-16 04:55:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hotchpotch",
"github_project": "yasem",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "yasem"
}