Embeddings
==========
.. image:: https://readthedocs.org/projects/embeddings/badge/?version=latest
   :target: http://embeddings.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://travis-ci.org/vzhong/embeddings.svg?branch=master
   :target: https://travis-ci.org/vzhong/embeddings
Embeddings is a Python package that provides pretrained word embeddings for natural language processing and machine learning.

Instead of loading a large text file into memory to look up vectors, ``embeddings`` is backed by a database, which makes it fast to load and fast to query:
.. code-block:: python

    >>> %timeit GloveEmbedding('common_crawl_840', d_emb=300)
    100 loops, best of 3: 12.7 ms per loop

    >>> %timeit GloveEmbedding('common_crawl_840', d_emb=300).emb('canada')
    100 loops, best of 3: 12.9 ms per loop

    >>> g = GloveEmbedding('common_crawl_840', d_emb=300)
    >>> %timeit -n1 g.emb('canada')
    1 loop, best of 3: 38.2 µs per loop
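The speedup comes from keeping vectors in an on-disk key-value store instead of parsing a multi-gigabyte text file on startup. A minimal, self-contained sketch of that technique (illustrative names; not the package's actual schema), using an in-memory SQLite database:

.. code-block:: python

    import sqlite3
    import array

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emb (word TEXT PRIMARY KEY, vec BLOB)")

    def put(word, vec):
        # Pack the float vector into a compact binary blob.
        conn.execute("INSERT OR REPLACE INTO emb VALUES (?, ?)",
                     (word, array.array("f", vec).tobytes()))

    def get(word):
        # A single indexed lookup; no bulk file load required.
        row = conn.execute("SELECT vec FROM emb WHERE word = ?",
                           (word,)).fetchone()
        return list(array.array("f", row[0])) if row else None

    put("canada", [0.1, 0.2, 0.3])
    print(get("canada"))

Because each query touches only one row, lookup cost stays constant regardless of vocabulary size.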
Installation
------------
.. code-block:: sh

    pip install embeddings  # from PyPI
    pip install git+https://github.com/vzhong/embeddings.git  # from GitHub
Usage
-----
Upon first use, the embeddings are downloaded to disk in the form of a SQLite database.
This may take a long time for large embeddings such as GloVe.
Subsequent lookups are queried directly against the database.
Embedding databases are stored in the ``$EMBEDDINGS_ROOT`` directory (which defaults to ``~/.embeddings``). Note that this location is probably **undesirable** if your home directory is on NFS, as NFS would slow down database queries significantly.
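For example, to keep the database on fast local storage instead of NFS, you might export the variable before running your program (the path below is illustrative):

.. code-block:: sh

    # Point the embeddings cache at local disk, then run as usual.
    export EMBEDDINGS_ROOT=/tmp/embeddings_cache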
.. code-block:: python

    from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding, ConcatEmbedding

    g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)
    f = FastTextEmbedding()
    k = KazumaCharEmbedding()
    c = ConcatEmbedding([g, f, k])
    for w in ['canada', 'vancouver', 'toronto']:
        print('embedding {}'.format(w))
        print(g.emb(w))
        print(f.emb(w))
        print(k.emb(w))
        print(c.emb(w))
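``ConcatEmbedding`` joins the vectors from its constituent embeddings, so the resulting dimensionality is the sum of the parts. A standalone sketch of that behaviour, using stand-in lookup functions with illustrative dimensions (300, 300, and 100):

.. code-block:: python

    def concat_emb(word, lookups):
        # Query each backing embedding and concatenate the vectors.
        out = []
        for lookup in lookups:
            out.extend(lookup(word))
        return out

    # Stand-ins for g.emb, f.emb, k.emb; real vectors would be learned.
    glove = lambda w: [0.1] * 300
    fasttext = lambda w: [0.2] * 300
    char = lambda w: [0.3] * 100

    vec = concat_emb('canada', [glove, fasttext, char])
    print(len(vec))  # 700

Concatenation lets downstream models see complementary signals (word-level and character-level) in a single input vector.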
Docker
------
If you use Docker, an image prepopulated with the Common Crawl 840 GloVe embeddings and Kazuma Hashimoto's character n-gram embeddings is available at `vzhong/embeddings <https://hub.docker.com/r/vzhong/embeddings>`_.
To mount volumes from this container, set ``$EMBEDDINGS_ROOT`` in your container to ``/opt/embeddings``.
For example:
.. code-block:: bash

    docker run --volumes-from vzhong/embeddings -e EMBEDDINGS_ROOT='/opt/embeddings' myimage python train.py
Contribution
------------
Pull requests welcome!