yasem


- Name: yasem
- Version: 0.1.2
- Home page: https://github.com/hotchpotch/yasem
- Summary: YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings
- Upload time: 2024-09-09 01:26:01
- Maintainer: None
- Docs URL: None
- Author: Yuichi Tateno
- Requires Python: >=3.9
- License: MIT
- Keywords: nlp, embeddings, splade, sparse-vectors
- Requirements: No requirements were recorded.

## YASEM (Yet Another Splade|Sparse Embedder)

YASEM is a simple and efficient library for running SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) models and generating sparse vectors. Its interface is modeled on [SentenceTransformers](https://sbert.net/), so it integrates easily into existing projects.
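
To make "creating sparse vectors" concrete, here is a minimal sketch of the max-pooling step described in the SPLADE papers: each vocabulary entry receives the maximum of `log(1 + ReLU(logit))` over the input positions. This is not YASEM's code. The function name `splade_pool` and the toy logits are assumptions made for illustration, and attention masking is left out for brevity.

```python
import math


def splade_pool(token_logits):
    """Max-pool log(1 + ReLU(logit)) over positions, per vocabulary entry.

    token_logits: list of rows, one row of vocabulary logits per input position.
    (Hypothetical helper for illustration; not part of YASEM.)
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for position_logits in token_logits:
        for j, logit in enumerate(position_logits):
            # ReLU zeroes negative logits; log1p dampens large ones.
            activation = math.log1p(max(logit, 0.0))
            if activation > weights[j]:
                weights[j] = activation
    return weights


# Toy logits: 2 input positions over a 4-entry vocabulary (made-up numbers).
toy_logits = [
    [2.0, -1.0, 0.0, 0.5],
    [-3.0, -0.5, 0.0, 1.5],
]
weights = splade_pool(toy_logits)

# Only entries with a positive logit somewhere in the sequence stay non-zero.
nonzero = {j: round(w, 4) for j, w in enumerate(weights) if w > 0}
print(nonzero)
# {0: 1.0986, 3: 0.9163}
```

Because most vocabulary entries never receive a positive logit, the resulting vector is mostly zeros, which is what makes it sparse.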

## Why YASEM?

- Simplicity: YASEM provides a clean, minimal implementation of SPLADE without unnecessary complexity.
- Efficiency: Generate sparse embeddings with a single `encode` call.
- Flexibility: Works with both NumPy and PyTorch backends.
- Convenience: Includes helpful utilities such as `get_token_values` for inspecting which tokens an embedding activates.
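
As a rough illustration of what a token-inspection utility such as `get_token_values` does conceptually, the sketch below maps the non-zero entries of a vocabulary-sized vector back to their tokens, highest weight first. It is an assumption-laden stand-in, not YASEM's implementation: `token_values_sketch`, `toy_vocab`, and `toy_embedding` are hypothetical names and values.

```python
def token_values_sketch(embedding, vocab, top_k=None):
    """Return {token: weight} for non-zero entries, sorted by descending weight.

    Hypothetical helper for illustration; not the library's own function.
    """
    pairs = [(vocab[i], w) for i, w in enumerate(embedding) if w > 0]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        pairs = pairs[:top_k]
    # dicts preserve insertion order, so the sorted order is kept.
    return dict(pairs)


# Made-up 5-token vocabulary and a matching toy embedding.
toy_vocab = ["hello", "dog", "cat", "cute", "ramen"]
toy_embedding = [2.5, 3.0, 0.0, 1.25, 0.0]

print(token_values_sketch(toy_embedding, toy_vocab))
# {'dog': 3.0, 'hello': 2.5, 'cute': 1.25}
```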

## Installation

You can install YASEM using pip:

```bash
pip install yasem
```

## Quick Start

Here's a simple example of how to use YASEM:

```python
from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)

# Compute similarity (dot product)
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372  18.86930016  22.87525314]
#  [106.88184372 122.79656474  17.45339064  21.44758757]
#  [ 18.86930016  17.45339064  61.00272733  40.92700849]
#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]

# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
#  'message': 2.38671875, 'greeting': 2.259765625,
#    ...

# Inspect token values for the fourth sentence
token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
#  'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
#    ...
```
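
The similarity scores above are plain dot products, so they are not normalized: the diagonal is not 1, and sentences that share heavily weighted tokens (the dog and cat sentences) score far higher than unrelated ones. The following sketch shows the same computation on toy sparse vectors stored as `{token: weight}` dictionaries. The helper names and the weights are assumptions for illustration, not YASEM's internals.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller dict so the cost tracks the sparser vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


def similarity_matrix(rows, cols):
    """Pairwise dot products between two lists of sparse vectors."""
    return [[sparse_dot(r, c) for c in cols] for r in rows]


# Made-up weights, loosely mirroring the sentences in the example above.
docs = [
    {"hello": 2.0, "dog": 3.0, "cute": 1.0},
    {"hello": 2.0, "cat": 3.0, "cute": 1.0},
    {"hello": 1.0, "ramen": 4.0},
]

for row in similarity_matrix(docs, docs):
    print(row)
# [14.0, 5.0, 2.0]
# [5.0, 14.0, 2.0]
# [2.0, 2.0, 17.0]
```

Only tokens present in both vectors contribute to a score, which is why the first two toy documents (sharing `hello` and `cute`) score higher with each other than with the third.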

## Features

- Easy-to-use API inspired by SentenceTransformers
- Support for both NumPy and PyTorch tensors
- Efficient dot product similarity computation
- Utility function to inspect token values in embeddings

## License

This project is licensed under the MIT License. See the LICENSE file for the full license text. Copyright (c) 2024 Yuichi Tateno (@hotchpotch)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgements

This library is inspired by the SPLADE model and aims to provide a simple interface for its usage. Special thanks to the authors of the original SPLADE paper and the developers of the model.
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hotchpotch/yasem",
    "name": "yasem",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "nlp, embeddings, splade, sparse-vectors",
    "author": "Yuichi Tateno",
    "author_email": "hotchpotch@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/8f/a8/fde36ef8266d21c91f19402ad1fbf92f339ae4cf2c18cc09f2614431cfb6/yasem-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/hotchpotch/yasem",
        "Repository": "https://github.com/hotchpotch/yasem"
    },
    "split_keywords": [
        "nlp",
        " embeddings",
        " splade",
        " sparse-vectors"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3221fe69fefac6b7e501c570b1870383a5e6f81a35c5fc9111a48f2d207e7b9d",
                "md5": "772c5f955f93c4247c20f116213f2cc8",
                "sha256": "1d35b9658654442fe54f6093db0974b6bfffd73f33384db5a83bbf708d4a36cc"
            },
            "downloads": -1,
            "filename": "yasem-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "772c5f955f93c4247c20f116213f2cc8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 5186,
            "upload_time": "2024-09-09T01:25:59",
            "upload_time_iso_8601": "2024-09-09T01:25:59.595254Z",
            "url": "https://files.pythonhosted.org/packages/32/21/fe69fefac6b7e501c570b1870383a5e6f81a35c5fc9111a48f2d207e7b9d/yasem-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8fa8fde36ef8266d21c91f19402ad1fbf92f339ae4cf2c18cc09f2614431cfb6",
                "md5": "c50f05ee18f46a006cc98284a38e8821",
                "sha256": "95288783a462f0ad02faf49563d0832e6ba494e9bd4cc1b8c0b9c9d99b6b14f9"
            },
            "downloads": -1,
            "filename": "yasem-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "c50f05ee18f46a006cc98284a38e8821",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 4653,
            "upload_time": "2024-09-09T01:26:01",
            "upload_time_iso_8601": "2024-09-09T01:26:01.207662Z",
            "url": "https://files.pythonhosted.org/packages/8f/a8/fde36ef8266d21c91f19402ad1fbf92f339ae4cf2c18cc09f2614431cfb6/yasem-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-09 01:26:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hotchpotch",
    "github_project": "yasem",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "yasem"
}
        