yasem


Name: yasem
Version: 0.3.2
Home page: https://github.com/hotchpotch/yasem
Summary: YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings
Upload time: 2024-10-26 01:09:04
Maintainer: None
Docs URL: None
Author: Yuichi Tateno
Requires Python: >=3.9
License: MIT
Keywords: nlp embeddings splade sparse-vectors
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

## YASEM (Yet Another Splade|Sparse Embedder)

YASEM is a simple, efficient library for running SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) models and generating sparse vectors. Its interface is inspired by [SentenceTransformers](https://sbert.net/), so it is easy to integrate into existing projects.
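
For readers new to SPLADE, the sketch below shows how a SPLADE-style model typically turns masked-language-model logits into one sparse vector per text. Each vocabulary weight is the maximum, over the input tokens, of log(1 + ReLU(logit)), with padding positions masked out (the max-pooling formulation used by recent SPLADE models). Random NumPy logits stand in for a real model, and `splade_max_pool` is a hypothetical helper; YASEM's internals may differ in detail.

```python
import numpy as np


def splade_max_pool(logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pool MLM logits into SPLADE-style term weights (hypothetical helper).

    logits:         (batch, seq_len, vocab_size) raw MLM-head outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    returns:        (batch, vocab_size) non-negative term weights
    """
    # log(1 + ReLU(x)) keeps weights non-negative and dampens large logits
    activations = np.log1p(np.maximum(logits, 0.0))
    # Zero out padding positions so they cannot contribute to the max
    activations = activations * attention_mask[:, :, None]
    # Max over the sequence axis gives one weight per vocabulary entry
    return activations.max(axis=1)


rng = np.random.default_rng(0)
batch, seq_len, vocab_size = 2, 5, 30
logits = rng.normal(size=(batch, seq_len, vocab_size))
mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])  # second text is padded

vectors = splade_max_pool(logits, mask)
print(vectors.shape)              # (2, 30)
print((vectors > 0).sum(axis=1))  # number of non-zero terms per text
```

With random logits roughly half of the entries end up non-zero; a trained SPLADE model is regularized toward sparsity, so its vectors have far fewer active terms.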

## Why YASEM?

- Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.
- Efficiency: Generate sparse embeddings quickly and easily.
- Flexibility: Works with both NumPy and PyTorch backends.
- Convenience: Includes helpful utilities such as `get_token_values` for inspecting feature representations.
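
`get_token_values` turns a vector back into readable `{token: weight}` pairs. The sketch below shows one plausible way to build such a mapping from a dense vector: keep the non-zero dimensions, sort them by weight, and look each id up in a vocabulary. `top_token_values` and the six-token vocabulary are hypothetical stand-ins, not YASEM code; a real id-to-token mapping would come from the model's tokenizer.

```python
import numpy as np


def top_token_values(vector, id_to_token, top_k=None):
    """Map non-zero dimensions of a vector to {token: weight} (hypothetical helper).

    Mimics the shape of the output shown for get_token_values
    (tokens sorted by descending weight), not its implementation.
    """
    vector = np.asarray(vector)
    nonzero_ids = np.flatnonzero(vector)
    # Sort the non-zero ids by weight, largest first
    order = nonzero_ids[np.argsort(-vector[nonzero_ids], kind="stable")]
    if top_k is not None:
        order = order[:top_k]
    return {id_to_token[i]: float(vector[i]) for i in order}


# Toy 6-token vocabulary standing in for a real tokenizer's vocabulary
id_to_token = {0: "[PAD]", 1: "hello", 2: "dog", 3: "cute", 4: "cat", 5: "greeting"}
vec = np.array([0.0, 6.9, 6.5, 4.6, 0.0, 2.3])

print(top_token_values(vec, id_to_token))
# {'hello': 6.9, 'dog': 6.5, 'cute': 4.6, 'greeting': 2.3}
print(top_token_values(vec, id_to_token, top_k=2))
# {'hello': 6.9, 'dog': 6.5}
```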

## Installation

You can install YASEM using pip:

```bash
pip install yasem
```

## Quick Start

Here's a simple example of how to use YASEM:

```python
from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)
# Or, to get a scipy.sparse CSR matrix instead:
# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)

# Compute similarity
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372  18.86930016  22.87525314]
#  [106.88184372 122.79656474  17.45339064  21.44758757]
#  [ 18.86930016  17.45339064  61.00272733  40.92700849]
#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]


# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
#  'message': 2.38671875, 'greeting': 2.259765625,
#    ...

# Inspect token values for the fourth sentence
token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
#  'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
#    ...
```
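
The similarity values above are not bounded by 1. If they are plain dot products, as the Features list suggests, each diagonal entry is simply the squared norm of that sentence's vector. For retrieval, a common pattern is to score every document against a query with the same dot product and sort. The sketch below does this with NumPy on hand-made vectors; in real use they would come from `embedder.encode(...)`, and `rank_documents` is a hypothetical helper, not part of YASEM.

```python
import numpy as np


def rank_documents(query_vec, doc_matrix, top_k=3):
    """Rank documents by dot-product score against one query (hypothetical helper).

    query_vec:  (vocab_size,) query vector
    doc_matrix: (num_docs, vocab_size), one row per document
    returns:    list of (doc_index, score), best first
    """
    scores = doc_matrix @ query_vec  # one dot product per document
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy 5-term vocabulary: [hello, dog, cat, ramen, sushi]
docs = np.array([
    [1.0, 2.0, 0.0, 0.0, 0.0],  # doc 0: about a dog
    [1.0, 0.0, 2.0, 0.0, 0.0],  # doc 1: about a cat
    [1.0, 0.0, 0.0, 2.0, 0.5],  # doc 2: about ramen
])
query = np.array([0.0, 0.0, 0.0, 1.5, 1.0])  # a query about ramen or sushi

print(rank_documents(query, docs, top_k=2))
# [(2, 3.5), (0, 0.0)]
```

Sorting with `kind="stable"` keeps tied scores (the two zero scores here) in document order, so results are reproducible.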

## Features

- Easy-to-use API inspired by SentenceTransformers
- Support for both NumPy arrays and `scipy.sparse.csr_matrix`
- Efficient dot product similarity computation
- Utility function to inspect token values in embeddings
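
Because SPLADE vectors are mostly zeros, CSR storage keeps only the non-zero weights, and a dot-product similarity matrix can be computed directly on the sparse form. The sketch below uses small made-up vectors rather than real YASEM output. It checks that a dense and a CSR computation of `A @ A.T` agree, and it reports how many values CSR actually stores. It relies only on standard NumPy and SciPy calls, not on YASEM's API.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Four made-up "embeddings" over a 10-term vocabulary, mostly zeros
dense = np.zeros((4, 10))
dense[0, [0, 1, 2]] = [6.9, 6.5, 4.6]
dense[1, [0, 3, 2]] = [6.8, 6.2, 4.4]
dense[2, [0, 5, 6]] = [2.1, 5.0, 3.3]
dense[3, [0, 7, 8]] = [2.7, 3.6, 3.5]

sparse = csr_matrix(dense)

# Dot-product similarity matrix computed both ways
sim_dense = dense @ dense.T
sim_sparse = (sparse @ sparse.T).toarray()  # sparse result -> dense array

print(np.allclose(sim_dense, sim_sparse))              # True
print(sparse.nnz, "stored values out of", dense.size)  # 12 stored values out of 40
```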

## License

This project is licensed under the MIT License. See the LICENSE file for the full license text. Copyright (c) 2024 Yuichi Tateno (@hotchpotch).

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgements

This library is inspired by the SPLADE model and aims to provide a simple interface to it. Special thanks to the authors of the original SPLADE paper and to the developers of the model.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/hotchpotch/yasem",
    "name": "yasem",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "nlp, embeddings, splade, sparse-vectors",
    "author": "Yuichi Tateno",
    "author_email": "hotchpotch@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/79/7f/3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51/yasem-0.3.2.tar.gz",
    "platform": null,
    "description": "## YASEM (Yet Another Splade|Sparse Embedder)\n\nYASEM is a simple and efficient library for executing SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) and creating sparse vectors. It provides a straightforward interface inspired by [SentenceTransformers](https://sbert.net/) for easy integration into your projects.\n\n## Why YASEM?\n\n- Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.\n- Efficiency: Generate sparse embeddings quickly and easily.\n- Flexibility: Works with both NumPy and PyTorch backends.\n- Convenience: Includes helpful utilities like get_token_values for inspecting feature representations.\n\n## Installation\n\nYou can install YASEM using pip:\n\n```bash\npip install yasem\n```\n\n## Quick Start\n\nHere's a simple example of how to use YASEM:\n\n```python\nfrom yasem import SpladeEmbedder\n\n# Initialize the embedder\nembedder = SpladeEmbedder(\"naver/splade-v3\")\n\n# Prepare some sentences\nsentences = [\n    \"Hello, my dog is cute\",\n    \"Hello, my cat is cute\",\n    \"Hello, I like a ramen\",\n    \"Hello, I like a sushi\",\n]\n\n# Generate embeddings\nembeddings = embedder.encode(sentences)\n# or sparse csr matrix\n# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)\n\n# Compute similarity\nsimilarity = embedder.similarity(embeddings, embeddings)\nprint(similarity)\n# [[148.62903569 106.88184372  18.86930016  22.87525314]\n#  [106.88184372 122.79656474  17.45339064  21.44758757]\n#  [ 18.86930016  17.45339064  61.00272733  40.92700849]\n#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]\n\n\n# Inspect token values for the first sentence\ntoken_values = embedder.get_token_values(embeddings[0])\nprint(token_values)\n# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,\n#  'message': 2.38671875, 'greeting': 2.259765625,\n#    ...\n\ntoken_values = embedder.get_token_values(embeddings[3])\nprint(token_values)\n# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,\n#  'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,\n```\n\n## Features\n\n- Easy-to-use API inspired by SentenceTransformers\n- Support for both NumPy and scipy.sparse.csr_matrix\n- Efficient dot product similarity computation\n- Utility function to inspect token values in embeddings\n\n## License\n\nThis project is licensed under the MIT License. See the LICENSE file for the full license text. Copyright (c) 2024 Yuichi Tateno (@hotchpotch)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## Acknowledgements\n\nThis library is inspired by the SPLADE model and aims to provide a simple interface for its usage. Special thanks to the authors of the original SPLADE paper and the developers of the model.",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings",
    "version": "0.3.2",
    "project_urls": {
        "Homepage": "https://github.com/hotchpotch/yasem",
        "Repository": "https://github.com/hotchpotch/yasem"
    },
    "split_keywords": [
        "nlp",
        " embeddings",
        " splade",
        " sparse-vectors"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7d40f671cbf3b24effd63b73a0b34aa318e02e84aaa5d99c02102904d0391b89",
                "md5": "27f46de51aa8356a36bb0e18a17d1bf4",
                "sha256": "2be8ef108ebdbbfc0647c1e8d1546e3a533601823166e640d98e04cbdcdaf52c"
            },
            "downloads": -1,
            "filename": "yasem-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "27f46de51aa8356a36bb0e18a17d1bf4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 6149,
            "upload_time": "2024-10-26T01:09:03",
            "upload_time_iso_8601": "2024-10-26T01:09:03.222708Z",
            "url": "https://files.pythonhosted.org/packages/7d/40/f671cbf3b24effd63b73a0b34aa318e02e84aaa5d99c02102904d0391b89/yasem-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "797f3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51",
                "md5": "8eee8687960857ff28e073b02678276c",
                "sha256": "5c91b40651c09e0129bc116d87e95af8c7417d87ba2ce568532c1f3aa0d83278"
            },
            "downloads": -1,
            "filename": "yasem-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8eee8687960857ff28e073b02678276c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 5567,
            "upload_time": "2024-10-26T01:09:04",
            "upload_time_iso_8601": "2024-10-26T01:09:04.974303Z",
            "url": "https://files.pythonhosted.org/packages/79/7f/3476fa0e1d6a22e1ad16f6622eff4f31d8e438b508a0b725185c5e35cd51/yasem-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-26 01:09:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hotchpotch",
    "github_project": "yasem",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "yasem"
}
        