zensols.deepnlp

Name: zensols.deepnlp
Version: 1.14.0
Home page: https://github.com/plandes/deepnlp
Summary: Deep learning utility library for natural language processing that aids in feature engineering and embedding layers.
Upload time: 2024-04-14 19:37:18
Author: Paul Landes
Keywords: tooling
# DeepZensols Natural Language Processing

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

Deep learning utility library for natural language processing that aids in
feature engineering and embedding layers.

* See the [full documentation].
* See the [paper](https://aclanthology.org/2023.nlposs-1.16)

Features:
* Configurable layers with little to no need to write code.
* [Natural language specific layers]:
  * Easily configurable word embedding layers for [GloVe], [Word2Vec],
    [fastText].
  * Huggingface transformer ([BERT]) context-based word vector layer.
  * Full [Embedding+BiLSTM-CRF] implementation using easy-to-configure
    constituent layers (sketched after this list).
* [NLP specific vectorizers] that generate [zensols deeplearn] encoded and
  decoded [batched tensors] for [spaCy] parsed features, dependency tree
  features, overlapping text features and others.
* Embedding layers that are easily swappable at runtime as [batched tensors],
  along with other vectorized linguistic features.
* Support for token, document and embedding level vectorized features.
* Transformer word piece to linguistic token mapping.
* Fully documented reference models provided as both command line applications
  and [Jupyter notebooks](#usage-and-reference-models).
* Command line support for training, testing, debugging, and creating
  predictions.
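
To make the layer composition concrete, the following is a minimal sketch in
plain [PyTorch] of the embedding, BiLSTM, and tag-projection stack that the
framework assembles from configuration.  The class, names, and dimensions are
illustrative only, and the CRF layer is omitted; this is not the library's
API.

```python
# Illustrative only: a plain-PyTorch sketch of an embedding -> BiLSTM ->
# tag-projection stack; the library configures comparable layers declaratively.
import torch
from torch import nn

class EmbeddingBiLstmTagger(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 50,
                 hidden: int = 64, n_tags: int = 9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(hidden * 2, n_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.emb(token_ids)     # (batch, seq, emb_dim)
        x, _ = self.lstm(x)         # (batch, seq, 2 * hidden)
        return self.proj(x)         # per-token tag scores

scores = EmbeddingBiLstmTagger(vocab_size=10_000)(
    torch.randint(0, 10_000, (2, 12)))
print(scores.shape)  # torch.Size([2, 12, 9])
```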


## Documentation

* [Full documentation](https://plandes.github.io/deepnlp/index.html)
* [Layers](https://plandes.github.io/deepnlp/doc/layers.html): NLP specific
  layers such as embeddings and transformers
* [Vectorizers](https://plandes.github.io/deepnlp/doc/vectorizers.html):
  specific vectorizers that digitize natural language text into tensors ready
  for use as [PyTorch] input
* [API reference](https://plandes.github.io/deepnlp/api.html)
* [Reference Models](#usage-and-reference-models)


## Obtaining

The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.deepnlp
```

Binaries are also available on [pypi].
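
Once installed, the package version can be verified from Python using only the
standard library:

```python
# Check the installed version of the package (standard library only).
from importlib.metadata import version

print(version('zensols.deepnlp'))  # e.g. '1.14.0'
```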


## Usage

The API can be used as is by manually configuring each component.  However,
this (like any Zensols API) was designed to be instantiated with inversion of
control using [resource libraries].

### Component

Components and out-of-the-box models are available with little to no coding.
The [simple example](example/simple/harness.py), which uses the library's
components, is recommended for starters.  It is a command line application
that in-lines the simple configuration needed to create deep learning NLP
components.
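
Concretely, inversion-of-control instantiation looks roughly like the
following minimal sketch, which assumes the `zensols.config` API used across
Zensols projects; the file name `app.conf` and the section name `doc_parser`
are illustrative assumptions, not names taken from the example:

```python
# A hedged sketch: instantiate a configured component from an INI file.
from zensols.config import ImportIniConfig, ImportConfigFactory

config = ImportIniConfig('app.conf')      # config that imports resource libraries
factory = ImportConfigFactory(config)     # builds objects from config sections
parser = factory.instance('doc_parser')   # component named by its config section
```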

Similarly, [this example](example/fill-mask/harness.py) is also a command line
application, but uses a masked language model to fill in words.
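
Under the hood, filling a masked token can be sketched with the plain
[Huggingface Transformers] pipeline API; the model choice below is an
illustrative assumption and not necessarily how the example wires it:

```python
# Fill a masked token with a BERT masked language model.
from transformers import pipeline

fill_mask = pipeline('fill-mask', model='bert-base-uncased')
for pred in fill_mask('Paris is the [MASK] of France.'):
    print(pred['token_str'], round(pred['score'], 3))
```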


### Reference Models

If you're in a rush, you can dive right in to the [Clickbate Text
Classification] reference model, which is a working project that uses this
library.  However, you'll end up reading up on the [zensols deeplearn] library
either before or during the tutorial.

The usage of this library is explained in terms of the reference models:

* The [Clickbate Text Classification] is the best reference model to start
  with because the only code consists of the corpus reader and a module to
  remove sentence segmentation (the corpus is newline-delimited headlines).
  It also uses [resource libraries], which greatly reduce complexity, whereas
  the other reference models do not.  Also see the [Jupyter clickbate
  classification notebook].

* The [Movie Review Sentiment] model, trained and tested on the [Stanford
  movie review] and [Cornell sentiment polarity] data sets, assigns a positive
  or negative score to a natural language movie review by critics.  Also see
  the [Jupyter movie sentiment notebook].

* The [Named Entity Recognizer], trained and tested on the [CoNLL 2003 data
  set], labels named entities in natural language text.  Also see the [Jupyter
  NER notebook].

The unit test cases are also a good resource for more detailed programmatic
integration with various parts of the library.


## Attribution

This project, or reference model code, uses:
* [Gensim] for [GloVe], [Word2Vec] and [fastText] word embeddings.
* [Huggingface Transformers] for [BERT] contextual word embeddings.
* [h5py] for fast read access to word embedding vectors.
* [zensols nlparse] for feature generation from [spaCy] parsing.
* [zensols deeplearn] for deep learning network libraries.

Corpora used include:
* [Stanford movie review]
* [Cornell sentiment polarity]
* [CoNLL 2003 data set]


## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2020 - 2023 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.deepnlp/
[pypi-link]: https://pypi.python.org/pypi/zensols.deepnlp
[pypi-badge]: https://img.shields.io/pypi/v/zensols.deepnlp.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/deepnlp/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/deepnlp/actions

[PyTorch]: https://pytorch.org
[Gensim]: https://radimrehurek.com/gensim/
[Huggingface Transformers]: https://huggingface.co
[GloVe]: https://nlp.stanford.edu/projects/glove/
[Word2Vec]: https://code.google.com/archive/p/word2vec/
[fastText]: https://fasttext.cc
[BERT]: https://huggingface.co/transformers/model_doc/bert.html
[h5py]: https://www.h5py.org
[spaCy]: https://spacy.io

[Stanford movie review]: https://nlp.stanford.edu/sentiment/
[Cornell sentiment polarity]: https://www.cs.cornell.edu/people/pabo/movie-review-data/
[CoNLL 2003 data set]: https://www.clips.uantwerpen.be/conll2003/ner/

[zensols deeplearn]: https://github.com/plandes/deeplearn
[zensols nlparse]: https://github.com/plandes/nlparse

[full documentation]: https://plandes.github.io/deepnlp/index.html
[resource libraries]: https://plandes.github.io/util/doc/config.html#resource-libraries
[Natural language specific layers]: https://plandes.github.io/deepnlp/doc/layers.html
[Clickbate Text Classification]: https://plandes.github.io/deepnlp/doc/clickbate-example.html
[Movie Review Sentiment]: https://plandes.github.io/deepnlp/doc/movie-example.html
[Named Entity Recognizer]: https://plandes.github.io/deepnlp/doc/ner-example.html
[Embedding+BiLSTM-CRF]: https://plandes.github.io/deepnlp/doc/ner-example.html#bilstm-crf
[batched tensors]: https://plandes.github.io/deeplearn/doc/preprocess.html#batches
[NLP specific vectorizers]: https://plandes.github.io/deepnlp/doc/vectorizers.html
[Jupyter NER notebook]: https://github.com/plandes/deepnlp/blob/master/example/ner/notebook/ner.ipynb
[Jupyter movie sentiment notebook]: https://github.com/plandes/deepnlp/blob/master/example/movie/notebook/movie.ipynb
[Jupyter clickbate classification notebook]: https://github.com/plandes/deepnlp/blob/master/example/clickbate/notebook/clickbate.ipynb

            
