# Yet Another Sentence Embedding Library
The goal of this library is to make it easy to transform lists of sentences or sets of sentences into a matrix of embeddings (e.g. one embedding per sentence). This can be done at the sentence/document level, or by pooling sentence embeddings into one embedding per group of sentences.
Such an embedding matrix can easily be queried with kd-trees (see the notebook in examples) to find the document in the training data that is most similar to a queried sentence. It can also be used to cluster groups of documents based solely on the text in each campaign.
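As a rough illustration of the kd-tree lookup described above, here is a minimal sketch that uses `scipy.spatial.KDTree` on a small hand-made embedding matrix. The helper names `build_index` and `most_similar` and the toy vectors are hypothetical and are not part of YASE; the notebook in examples may do this differently. Rows are L2-normalized first because a kd-tree ranks by Euclidean distance, and on unit vectors that ranking matches cosine similarity.

```python
import numpy as np
from scipy.spatial import KDTree


def build_index(embeddings):
    """Build a kd-tree over the L2-normalized rows of an (n_docs, dim) matrix."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return KDTree(unit)


def most_similar(tree, query_vec, k=1):
    """Return indices and distances of the k documents closest to query_vec."""
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    dist, idx = tree.query(q, k=k)
    return np.atleast_1d(idx), np.atleast_1d(dist)


# Toy stand-in for a matrix of document embeddings (one row per document)
doc_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
tree = build_index(doc_embeddings)

# Embedding of the queried sentence (hand-made here; normally produced by the library)
query = np.array([0.9, 0.1, 0.0])
idx, dist = most_similar(tree, query, k=2)
print(idx)  # row indices of the two nearest documents, closest first
```

In practice the rows of `doc_embeddings` would be the sentence or document embeddings produced by the library, and the query vector would be embedded with the same model and weights.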
The results can be tested for quality on a handcrafted evaluation dataset by checking how well the sentence embeddings cluster around the natural clusters of the existing ad campaigns.
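One simple way to put a number on "how well embeddings cluster around known campaigns" is to check how often a sentence's nearest neighbour belongs to the same campaign. The sketch below is only an assumption about what such a check could look like; the function name `neighbor_label_agreement`, the toy embeddings, and the campaign labels are all hypothetical, and this is not necessarily the metric the author used.

```python
import numpy as np


def neighbor_label_agreement(embeddings, labels):
    """Fraction of rows whose nearest neighbour (by cosine similarity) shares their label."""
    labels = np.asarray(labels)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # A row must not count itself as its own nearest neighbour
    np.fill_diagonal(sims, -np.inf)
    nearest = sims.argmax(axis=1)
    return float(np.mean(labels[nearest] == labels))


# Toy evaluation set: four sentence embeddings from two hypothetical campaigns
emb = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
])
campaigns = ["a", "a", "b", "b"]
score = neighbor_label_agreement(emb, campaigns)
print(score)
```

A score near 1.0 means sentences from the same campaign sit close together in embedding space; a score near the share of the largest campaign suggests the embeddings carry little campaign signal.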
### (Gensim) Weighted Sentence Embeddings with a Gensim model
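```python
import gensim.downloader as model_api
import yase
# Assumption: `embedding` is a submodule of yase; the original snippet used it without importing it
from yase import embedding

# Example input: a list of raw sentences (placeholder data)
data = ["the quick brown fox", "jumps over the lazy dog"]
# Load a pretrained gensim model
model = model_api.load("glove-wiki-gigaword-300")
# Tokenize the list of sentences
tokens = yase.tokenize(data, lower=True, split=True)
# Get word weights for higher-quality embeddings
weights = yase.getWordWeights(data, "tf-idf")
# Create sentence embeddings from the tokens
my_embeddings = embedding.sentenceEmbedding(tokens, model, weights)
```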
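To make the idea of a weighted sentence embedding concrete, the sketch below computes a weighted average of word vectors with plain NumPy. It is an illustration of the general technique only and is not YASE's implementation; the function `weighted_sentence_embedding`, the toy vectors, and the toy weights are hypothetical. A plain dict stands in for the gensim model, which supports the same `token in model` and `model[token]` lookups.

```python
import numpy as np


def weighted_sentence_embedding(tokens, word_vectors, weights, dim):
    """Weighted average of the vectors of the tokens found in word_vectors."""
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 1.0)  # tokens without a weight default to 1.0
        total += w * np.asarray(word_vectors[tok], dtype=float)
        weight_sum += w
    # With no usable tokens, fall back to the zero vector
    return total / weight_sum if weight_sum > 0 else total


# Toy 2-d word vectors and tf-idf-style weights (made up for illustration)
toy_vectors = {"cheap": [1.0, 0.0], "flights": [0.0, 1.0], "the": [0.5, 0.5]}
toy_weights = {"cheap": 2.0, "flights": 2.0, "the": 0.0}

vec = weighted_sentence_embedding(
    ["the", "cheap", "flights"], toy_vectors, toy_weights, dim=2
)
print(vec)
```

Down-weighting very common words such as "the" keeps them from dominating the average, which is the usual motivation for tf-idf style weights.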
## Running unit tests
```
python -m unittest discover tests
```
## Raw data
{
"_id": null,
"home_page": "https://github.com/VHRanger/YASE/",
"name": "YASE",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Matt Ranger",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/81/9c/9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b/YASE-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/VHRanger/YASE/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3dd79766b7466a639a0a769156bf628c78644f3e226569d6890def8238856658",
"md5": "7ec7a230e64ff9c0a72acd0bbb5e12a1",
"sha256": "0c9de948d6b6aae6681d6de386797dc8bb98b96031bff4b42ae6a4887112d1ee"
},
"downloads": -1,
"filename": "YASE-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7ec7a230e64ff9c0a72acd0bbb5e12a1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9830,
"upload_time": "2023-07-06T01:21:51",
"upload_time_iso_8601": "2023-07-06T01:21:51.409852Z",
"url": "https://files.pythonhosted.org/packages/3d/d7/9766b7466a639a0a769156bf628c78644f3e226569d6890def8238856658/YASE-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "819c9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b",
"md5": "2fdc6ff3680c829bb0d776784cb7a847",
"sha256": "b5be337252240f3662018b9d44d5a66a3f626983335c7291f2a4dfa622c9549f"
},
"downloads": -1,
"filename": "YASE-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "2fdc6ff3680c829bb0d776784cb7a847",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13127,
"upload_time": "2023-07-06T01:21:52",
"upload_time_iso_8601": "2023-07-06T01:21:52.777183Z",
"url": "https://files.pythonhosted.org/packages/81/9c/9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b/YASE-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-06 01:21:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "VHRanger",
"github_project": "YASE",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "yase"
}