Embeddings
==========

:Name: embeddings
:Version: 0.0.8
:Summary: Pretrained word embeddings in Python.
:Home page: https://github.com/vzhong/embeddings
:Author: Victor Zhong <victor@victorzhong.com>
:License: MIT
:Keywords: text, nlp, machine-learning
:Upload time: 2020-02-11 20:47:25

.. image:: https://readthedocs.org/projects/embeddings/badge/?version=latest
    :target: http://embeddings.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status
.. image:: https://travis-ci.org/vzhong/embeddings.svg?branch=master
    :target: https://travis-ci.org/vzhong/embeddings

Embeddings is a Python package that provides pretrained word embeddings for natural language processing and machine learning.

Instead of loading a large embedding file into memory before you can query it, ``embeddings`` stores the vectors in a database, so it is fast both to load and to query (timings measured with IPython's ``%timeit``):

.. code-block:: python

    >>> %timeit GloveEmbedding('common_crawl_840', d_emb=300)
    100 loops, best of 3: 12.7 ms per loop

    >>> %timeit GloveEmbedding('common_crawl_840', d_emb=300).emb('canada')
    100 loops, best of 3: 12.9 ms per loop

    >>> g = GloveEmbedding('common_crawl_840', d_emb=300)

    >>> %timeit -n1 g.emb('canada')
    1 loop, best of 3: 38.2 µs per loop

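The idea is that a keyed lookup in a database reads only the requested row, so neither opening the store nor querying a word requires loading the whole embedding file. A minimal sketch of that idea with the standard-library ``sqlite3`` module follows; the table name, schema and float32 serialization are assumptions for illustration, not necessarily what ``embeddings`` uses internally:

.. code-block:: python

    import sqlite3
    from array import array


    def create_store(conn):
        """Create a toy table mapping each word to its vector, stored as raw float32 bytes."""
        conn.execute("CREATE TABLE IF NOT EXISTS embeddings (word TEXT PRIMARY KEY, vec BLOB)")


    def insert(conn, word, vector):
        """Serialize a list of floats as float32 bytes and insert or replace the row."""
        blob = array("f", vector).tobytes()
        conn.execute("INSERT OR REPLACE INTO embeddings (word, vec) VALUES (?, ?)", (word, blob))


    def lookup(conn, word):
        """Return the vector for `word` as a list of floats, or None if the word is unknown."""
        row = conn.execute("SELECT vec FROM embeddings WHERE word = ?", (word,)).fetchone()
        if row is None:
            return None
        vec = array("f")
        vec.frombytes(row[0])  # decode the float32 bytes back into numbers
        return vec.tolist()


    conn = sqlite3.connect(":memory:")  # a real store would be a file on local disk
    create_store(conn)
    insert(conn, "canada", [0.5, -1.25, 2.0])
    print(lookup(conn, "canada"))    # [0.5, -1.25, 2.0]
    print(lookup(conn, "atlantis"))  # None

Because ``word`` is the primary key, SQLite answers each query through an index instead of scanning every row.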

Installation
------------

.. code-block:: sh

    pip install embeddings  # from pypi
    pip install git+https://github.com/vzhong/embeddings.git  # from github


Usage
-----

On first use, the embeddings are downloaded to disk as a SQLite database.
This may take a long time for large embeddings such as GloVe.
Subsequent lookups query the database directly.
Embedding databases are stored in the ``$EMBEDDINGS_ROOT`` directory (default: ``~/.embeddings``). This location is probably **undesirable** if your home directory is on NFS, as it would slow down database queries significantly.

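Since the storage location comes from an environment variable, you can point it at fast local storage. The sketch below mirrors the documented rule with a hypothetical helper, ``resolve_embeddings_root``, which is not part of the package; the path ``/scratch/embeddings`` is likewise only an example:

.. code-block:: python

    import os


    def resolve_embeddings_root():
        """Hypothetical helper mirroring the documented rule:
        use $EMBEDDINGS_ROOT if it is set, otherwise fall back to ~/.embeddings."""
        return os.environ.get("EMBEDDINGS_ROOT", os.path.expanduser("~/.embeddings"))


    # Redirect the store away from an NFS-mounted home directory (example path).
    os.environ["EMBEDDINGS_ROOT"] = "/scratch/embeddings"
    print(resolve_embeddings_root())  # /scratch/embeddings

Exporting the variable in the shell before starting Python (``export EMBEDDINGS_ROOT=/scratch/embeddings``) is the safest option, since it is then set before the package is imported.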

.. code-block:: python

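    from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding, ConcatEmbedding

    # 300-dimensional GloVe vectors trained on Common Crawl (840B tokens); show download progress
    g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)
    f = FastTextEmbedding()         # fastText word vectors
    k = KazumaCharEmbedding()       # Kazuma Hashimoto's character n-gram embeddings
    c = ConcatEmbedding([g, f, k])  # combines g, f and k into a single embedding
    for w in ['canada', 'vancouver', 'toronto']:
        print('embedding {}'.format(w))
        print(g.emb(w))
        print(f.emb(w))
        print(k.emb(w))
        print(c.emb(w))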

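Judging by its name, ``ConcatEmbedding`` joins the vectors of its component embeddings end to end, so the combined vector's length is the sum of theirs. A toy sketch of that behaviour follows; ``concat_emb`` and the in-memory lookups are hypothetical stand-ins for the real classes, and the actual class may handle unknown words differently:

.. code-block:: python

    from typing import Callable, List, Sequence

    Lookup = Callable[[str], List[float]]


    def concat_emb(word: str, lookups: Sequence[Lookup]) -> List[float]:
        """Concatenate the vectors returned by each lookup, in order (toy stand-in for ConcatEmbedding)."""
        out: List[float] = []
        for lookup in lookups:
            out.extend(lookup(word))
        return out


    # Toy lookups with made-up 3- and 2-dimensional vectors.
    word_vecs = {"canada": [0.1, 0.2, 0.3]}
    char_vecs = {"canada": [0.9, 0.8]}
    vec = concat_emb("canada", [lambda w: word_vecs[w], lambda w: char_vecs[w]])
    print(vec)       # [0.1, 0.2, 0.3, 0.9, 0.8]
    print(len(vec))  # 5

The order of the list passed in determines which slice of the combined vector comes from which embedding.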

Docker
------

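If you use Docker, an image prepopulated with the Common Crawl 840 GloVe embeddings and Kazuma Hashimoto's character n-gram embeddings is available at `vzhong/embeddings <https://hub.docker.com/r/vzhong/embeddings>`_.
Because ``--volumes-from`` takes a container rather than an image, first create a container from this image, then mount its volumes into your own container and set ``$EMBEDDINGS_ROOT`` there to ``/opt/embeddings``.

For example:

.. code-block:: bash

    # create (but do not start) a data container from the image; the name is arbitrary
    docker create --name embeddings-data vzhong/embeddings /bin/true
    docker run --volumes-from embeddings-data -e EMBEDDINGS_ROOT='/opt/embeddings' myimage python train.py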


Contribution
------------

Pull requests welcome!
