datawords

- Name: datawords
- Version: 0.7.4
- Summary: A library to work with text data
- Author email: Xavier Petit <nuxion@gmail.com>
- Upload time: 2023-11-08 09:58:49
- Requires Python: >=3.8
- Keywords: datascience, nlp, text, transformers
# datawords

[![PyPI - Version](https://img.shields.io/pypi/v/datawords.svg)](https://pypi.org/project/datawords)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/datawords.svg)](https://pypi.org/project/datawords)
[![readthedocs](https://readthedocs.org/projects/datawords/badge/?version=latest)](https://datawords.readthedocs.io/en/latest/)

-----

This is a library oriented to common and uncommon NLP tasks.


**Datawords** emerged after two years of working on projects that required NLP techniques
such as training and saving Word2Vec ([Gensim](https://radimrehurek.com/gensim/)) models, finding entities in text ([Spacy](https://spacy.io/)), ranking texts ([scikit-network](https://scikit-network.readthedocs.io/en/latest/)), indexing them ([Spotify Annoy](https://github.com/spotify/annoy)), and translating them ([Hugging Face](https://huggingface.co/docs/transformers/index)).

Using those libraries also required pre-processing, post-processing, and transformation steps. For these reasons, **datawords exists**.

Sometimes it is very opinionated (indexing happens over text, not over vectors, even though Annoy allows the latter), and sometimes it gives you freedom, providing helper classes and functions that you can use as you wish.

Another way to see this library is as an aggregator of all the excellent libraries mentioned above.

In a nutshell, **Datawords** lets you:

- Train Word2Vec models (Gensim)
- Build Indexes for texts (Annoy, SQLite)
- Translate texts (Transformers)
- Rank texts (PageRank)
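
For the ranking piece, datawords delegates to scikit-network; the sketch below is a plain, dependency-free PageRank that illustrates the underlying idea, not datawords' own implementation.

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank over an adjacency dict {node: [outgoing links]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Every node gets the teleport share; link mass is added below.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            if not outs:  # dangling node: spread its rank evenly
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v in outs:
                    new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
# "c" receives links from both "a" and "b", so it ranks highest
```

In a text-ranking setting, the graph edges would typically come from pairwise text similarity rather than hyperlinks.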


**Table of Contents**

- [Installation](#installation)
- [Quickstart](#quickstart)
- [License](#license)

## Installation

```console
pip install datawords
```

To use transformers from [HuggingFace](https://huggingface.co/), install the extra (quoted so it works in shells like zsh):

```console
pip install "datawords[transformers]"
```

## Quickstart

**deepnlp**:

```python
from datawords.deepnlp import translators

# Build the name of a Spanish -> English translation model.
mn = translators.build_model_name("es", "en")

# `fp` should point to the local path where the model is stored.
rsp = translators.transform_mp(
    "es",
    "en",
    model_path=fp,
    texts=[
        "hola mundo",
        "adios mundo",
        "noticias eran las de antes",
        "Messi es un dios para muchas personas",
    ],
)
```
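
`build_model_name` presumably composes a Helsinki-NLP OPUS-MT model identifier from the language pair, since those are the standard translation models on the Hugging Face Hub. A hypothetical stand-alone helper (not part of datawords) showing that naming pattern:

```python
def opus_mt_name(src: str, dest: str) -> str:
    """Compose a Helsinki-NLP OPUS-MT model name from ISO language codes."""
    return f"Helsinki-NLP/opus-mt-{src}-{dest}"

name = opus_mt_name("es", "en")
print(name)  # Helsinki-NLP/opus-mt-es-en
```

The resulting string can be passed to Hugging Face's `from_pretrained` loaders to fetch the translation model.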

## License

`datawords` is distributed under the terms of the [MPL-2.0](https://www.mozilla.org/en-US/MPL/2.0/) license.
