YASE


Name: YASE
Version: 1.0.0
Home page: https://github.com/VHRanger/YASE/
Upload time: 2023-07-06 01:21:52
Author: Matt Ranger
License: MIT
Requirements: No requirements were recorded.

# Yet Another Sentence Embedding Library

The goal of this library is to make it easy to transform lists of sentences, or sets of sentences, into a matrix of embeddings (e.g. one per sentence). This can be done either at the sentence/document level or by combining sentence embeddings into grouped embeddings.
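
As a rough illustration of the grouped case, the sketch below averages the sentence vectors that share a group key into one vector per group. The helper name `group_embeddings`, the toy 2-d vectors, and the choice of a plain mean are assumptions made for illustration; they are not taken from YASE's API.

```python
from collections import defaultdict

def group_embeddings(embeddings, groups):
    """Average sentence vectors that share a group key (hypothetical helper)."""
    buckets = defaultdict(list)
    for vector, key in zip(embeddings, groups):
        buckets[key].append(vector)
    # The column-wise mean of each bucket gives one vector per group.
    return {
        key: [sum(column) / len(vectors) for column in zip(*vectors)]
        for key, vectors in buckets.items()
    }

# Toy 2-d "sentence embeddings" and the group each sentence belongs to.
sentence_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
sentence_groups = ["a", "a", "b"]
grouped = group_embeddings(sentence_vectors, sentence_groups)
```

Here `grouped` maps each group key to a single averaged vector, so three sentence rows collapse into two group rows.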

Such matrices of document embeddings can easily be queried using kd-trees (see the notebook in examples) to find the document in the training data most similar to a queried sentence. They can also be used to cluster document groups together based solely on the text in each campaign.
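
To show what a kd-tree lookup involves, here is a dependency-free sketch of building a tree over embedding vectors and finding the nearest stored document. It is not the code from the examples notebook: the function names, the dict-based nodes, and the toy 2-d vectors are all assumptions. In practice an existing implementation such as `scipy.spatial.KDTree` or `sklearn.neighbors.KDTree` would be the usual choice.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, (vector, label)) of the stored point closest to query."""
    if node is None:
        return best
    vector, _ = node["point"]
    dist = math.dist(vector, query)
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only search the far side if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Toy "document embeddings" with labels, then a query vector.
docs = [([0.0, 0.0], "doc A"), ([1.0, 1.0], "doc B"), ([5.0, 5.0], "doc C")]
tree = build_kdtree(docs)
distance, (vector, label) = nearest(tree, [0.9, 1.2])
```

The query lands closest to the second stored vector, so `label` identifies that document and `distance` is the Euclidean distance to it.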

The results can be tested for quality on a handcrafted evaluation dataset by checking how well the sentence embeddings cluster around the natural clusters of the existing ad campaigns.
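
One simple way to put a number on this is sketched below: for each embedding, check whether its nearest other embedding carries the same campaign label. This metric, the function name `nearest_neighbor_agreement`, and the toy data are assumptions chosen for illustration; the library's own evaluation may work differently.

```python
import math

def nearest_neighbor_agreement(embeddings, labels):
    """Fraction of items whose closest other item has the same label."""
    hits = 0
    for i, vector in enumerate(embeddings):
        # Closest other embedding by Euclidean distance (leave-one-out).
        j = min(
            (k for k in range(len(embeddings)) if k != i),
            key=lambda k: math.dist(vector, embeddings[k]),
        )
        hits += labels[i] == labels[j]
    return hits / len(embeddings)

# Toy embeddings: the last item sits near the "shoes" cluster but is labeled "cars".
vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.2, 0.2]]
campaigns = ["shoes", "shoes", "cars", "cars", "cars"]
score = nearest_neighbor_agreement(vectors, campaigns)
```

A score near 1.0 suggests the embeddings follow the labeled clusters closely; the deliberately misplaced last item pulls the toy score below that.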


### (Gensim) Weighted Sentence Embeddings with a Gensim model
```python
import gensim.downloader as model_api
import yase
# Assumption: sentenceEmbedding lives in yase's embedding submodule
from yase import embedding

# Example input: a list of raw sentences
data = ["The cat sat on the mat.", "Dogs chase cats."]
# Load pretrained gensim model
model = model_api.load("glove-wiki-gigaword-300")
# Tokenize list of sentences
tokens = yase.tokenize(data, lower=True, split=True)
# Get word weights for higher quality embeddings
weights = yase.getWordWeights(data, "tf-idf")
# Create sentence embeddings from tokens
my_embeddings = embedding.sentenceEmbedding(tokens, model, weights)
```
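
To make the weighting step concrete, the following sketch shows one common way to combine word vectors into a sentence vector: a weighted average, with weights based on inverse document frequency. Whether YASE computes exactly this is an assumption. The toy vectors and the helpers `idf_weights` and `weighted_sentence_embedding` are hypothetical and make no YASE or Gensim calls.

```python
import math
from collections import Counter

def idf_weights(tokenized_sentences):
    """Inverse document frequency per word: log(N / document_frequency)."""
    n_docs = len(tokenized_sentences)
    doc_freq = Counter(word for sent in tokenized_sentences for word in set(sent))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def weighted_sentence_embedding(tokens, vectors, weights):
    """Weighted average of the word vectors for the known tokens."""
    known = [t for t in tokens if t in vectors]
    total = sum(weights.get(t, 1.0) for t in known)
    dim = len(next(iter(vectors.values())))
    if total == 0:
        # No known tokens, or every weight is zero: fall back to a zero vector.
        return [0.0] * dim
    return [
        sum(weights.get(t, 1.0) * vectors[t][d] for t in known) / total
        for d in range(dim)
    ]

# Toy 2-d word vectors standing in for a pretrained model.
toy_vectors = {"the": [1.0, 1.0], "cat": [0.0, 2.0], "dog": [2.0, 0.0]}
corpus = [["the", "cat"], ["the", "dog"]]
weights = idf_weights(corpus)
sentence_vectors = [
    weighted_sentence_embedding(sent, toy_vectors, weights) for sent in corpus
]
```

Because "the" appears in every sentence, its weight is zero and it drops out of the average, which is the intuition behind weighting: frequent, uninformative words contribute less to the sentence vector.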


## Running unit tests
```
python -m unittest discover tests
```

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/VHRanger/YASE/",
    "name": "YASE",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Matt Ranger",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/81/9c/9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b/YASE-1.0.0.tar.gz",
    "platform": null,
    "description": "# Yet Another Sentence Embedding Library\n\nThe goal of this library is to make it easy to transform lists of sentences or sets of sentences into a matrix of embeddings (eg. one per sentence). This can be done either at the sentence/document level or by grouping sentence embeddings into grouped embeddings.\n\nSuch matrices of documents can easily be queried using kd-trees (see notebook in examples) for the most similar document in training data to a queried sentence. It can also be used to cluster document groups together solely by the text in the campaign.\n\nThe results can be tested for quality on a handcrafted evaluation dataset by checking how well the sentence embeddings cluster around the natural clusters of the existing ad campaigns.\n\n\n### (Gensim) Weighed Sentence Embeddings with Gensim model\n```python\n    import gensim.downloader as model_api\n    import yase\n    # Load pretrained gensim model\n    model = model_api.load(\"glove-wiki-gigaword-300\")\n    # Tokenize list of sentences \n    tokens = yase.tokenize(data, lower=True, split=True)\n    # get word weights for higher quality embeddings\n    weights = yase.getWordWeights(data, \"tf-idf\")\n    # create sentence embeddings from tokens\n    my_embeddings = embedding.sentenceEmbedding(tokens, model, weights)\n```\n\n\n## Running unit tests\n```\npython -m unittest discover tests\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/VHRanger/YASE/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3dd79766b7466a639a0a769156bf628c78644f3e226569d6890def8238856658",
                "md5": "7ec7a230e64ff9c0a72acd0bbb5e12a1",
                "sha256": "0c9de948d6b6aae6681d6de386797dc8bb98b96031bff4b42ae6a4887112d1ee"
            },
            "downloads": -1,
            "filename": "YASE-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7ec7a230e64ff9c0a72acd0bbb5e12a1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9830,
            "upload_time": "2023-07-06T01:21:51",
            "upload_time_iso_8601": "2023-07-06T01:21:51.409852Z",
            "url": "https://files.pythonhosted.org/packages/3d/d7/9766b7466a639a0a769156bf628c78644f3e226569d6890def8238856658/YASE-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "819c9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b",
                "md5": "2fdc6ff3680c829bb0d776784cb7a847",
                "sha256": "b5be337252240f3662018b9d44d5a66a3f626983335c7291f2a4dfa622c9549f"
            },
            "downloads": -1,
            "filename": "YASE-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "2fdc6ff3680c829bb0d776784cb7a847",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 13127,
            "upload_time": "2023-07-06T01:21:52",
            "upload_time_iso_8601": "2023-07-06T01:21:52.777183Z",
            "url": "https://files.pythonhosted.org/packages/81/9c/9653e79e139db5694c0d002cdabdd87b1c7d4d43d1daa50d2c638259903b/YASE-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-06 01:21:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "VHRanger",
    "github_project": "YASE",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "yase"
}
        