zensols.deeplearn

Name: zensols.deeplearn
Version: 1.11.1
Home page: https://github.com/plandes/deeplearn
Summary: General deep learning utility library
Upload time: 2024-03-14 20:50:48
Author: Paul Landes
Keywords: tooling
Requirements: No requirements were recorded.
# Deep Zensols Deep Learning Framework

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

This deep learning library was designed to provide consistent and reproducible
results.

* See the [full documentation].
* See the [paper](https://aclanthology.org/2023.nlposs-1.16).

Features:
* Easy to configure framework that allows for programmatic [debugging] of
  neural networks.
* [Reproducibility] of results (see the seed-state sketch after this list)
  * All [random seed state] is persisted in the trained model files.
  * Persisting of keys and key order across train, validation and test sets.
* Analysis of results with complete metrics available.
* A [vectorization] framework that allows for pickling tensors.
* Additional [layers]:
  * Full [BiLSTM-CRF] and stand-alone [CRF] implementations using
    easy-to-configure constituent layers.
  * Easy-to-configure *N*-deep convolution layers with automatic
    dimensionality calculation (see the dimensionality sketch below) and
    configurable pooling and batch centering.
  * [Convolutional layer factory] with dimensionality calculation.
  * [Recurrent layers] that abstract RNN, GRU and LSTM.
  * *N*-deep [linear layers].
  * Each layer is configurable with activation, dropout and batch
    normalization.
* [Pandas] integration to [load data][data load], [easily manage]
  [vectorized features], and [report results].
* Multi-process support for time-consuming CPU feature [vectorization] that
  requires little to no coding.
* Resource and tensor deallocation with memory management.
* [Real-time performance] and loss metrics with plotting while training.
* Thorough [unit test] coverage.
* [Debugging] layers using the easy-to-configure Python logging module and
  control points.
* A workflow and API to package and distribute models, then automatically
  download, install and run inference with them in (optionally) two separate
  code bases.
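
The reproducibility bullet above notes that all random seed state is persisted
in the trained model files.  The snippet below is a minimal, generic sketch of
that idea in plain PyTorch; it is not the zensols.deeplearn API, and the
checkpoint key names are hypothetical.

```python
# Generic sketch: persist RNG state alongside model weights so a training run
# can be reproduced later.  Key names are illustrative only.
import random
import numpy as np
import torch


def save_checkpoint(model: torch.nn.Module, path: str) -> None:
    torch.save({
        'model_state': model.state_dict(),
        'torch_rng': torch.get_rng_state(),
        'numpy_rng': np.random.get_state(),
        'python_rng': random.getstate(),
    }, path)


def load_checkpoint(model: torch.nn.Module, path: str) -> None:
    state = torch.load(path)
    model.load_state_dict(state['model_state'])
    torch.set_rng_state(state['torch_rng'])
    np.random.set_state(state['numpy_rng'])
    random.setstate(state['python_rng'])
```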

Much of the code provides convenience functionality for [PyTorch].  However,
there is functionality that could be used with other deep learning APIs.
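
The automatic dimensionality calculation mentioned in the layers list reduces
to the standard convolution output-size formula that PyTorch documents.  The
helper below only illustrates that calculation; it is not the API of the
[Convolutional layer factory].

```python
import math


def conv_out_dim(in_dim: int, kernel: int, stride: int = 1,
                 padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1D convolution (the formula PyTorch documents)."""
    return math.floor(
        (in_dim + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


# Chaining the calculation gives the size that feeds the next layer, e.g.:
# conv_out_dim(conv_out_dim(128, kernel=5), kernel=3, stride=2) == 61
```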


## Documentation

See the [full documentation].


## Obtaining

The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.deeplearn
```
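
As an optional sanity check, the package should be importable after
installation (assuming a standard Python environment):
```bash
python3 -c 'import zensols.deeplearn'
```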

Binaries are also available on [pypi].


## Workflow

This package provides a workflow for processing features, then training and
testing a model.  A high-level outline of this process follows (a minimal
sketch of these steps appears after the outline):
1. Container objects are used to represent and access data as features.
1. Instances of *data points* wrap the container objects.
1. Vectorize the features of each data point into tensors.
1. Store the vectorized tensor features to disk so they can be retrieved
   quickly and frequently.
1. At train time, load the vectorized features into memory and train.
1. Test the model and store the results to disk.
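
The sketch below illustrates steps 3 through 5 of the outline using only plain
PyTorch and the standard library; the function names and dictionary keys are
hypothetical and are not the zensols.deeplearn API.

```python
# Minimal, generic sketch of the vectorize -> persist -> load steps above.
import pickle
from pathlib import Path
from typing import Dict, List

import torch


def vectorize(data_points: List[Dict]) -> torch.Tensor:
    # Step 3: turn each data point's features into one tensor row.
    return torch.tensor([dp['features'] for dp in data_points],
                        dtype=torch.float32)


def store(batch: torch.Tensor, path: Path) -> None:
    # Step 4: persist vectorized features so later runs skip re-vectorizing.
    with path.open('wb') as f:
        pickle.dump(batch, f)


def load(path: Path) -> torch.Tensor:
    # Step 5: load the pre-vectorized features back into memory at train time.
    with path.open('rb') as f:
        return pickle.load(f)
```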

To jump right in, see the [examples](#examples) section.  However, it is
better to peruse the in-depth explanation of the [Iris example] code that
follows:
* The initial [data processing], which includes data representation to batch
  creation.
* Creating and configuring the [model].
* Using a [facade] to train, validate and test the model.
* Analysis of [results], including training/validation loss graphs and
  performance metrics.


## Examples

The [Iris example] (also see the [Iris example configuration]) is the most
basic example of how to use this framework.  This example is detailed in the
[workflow](#workflow) documentation.

There are also examples in the form of [Jupyter] notebooks, which include:
* the [Iris notebook], a small data set of flower dimensions used for
  three-label classification,
* the [MNIST notebook] for the handwritten digit data set,
* the [debugging notebook].


## Attribution

This project, or example code, uses:
* [PyTorch] as the underlying framework.
* Branched code from [Torch CRF](#torch-crf) for the [CRF] class.
* [pycuda] for Python integration with [CUDA].
* [scipy] for scientific utility.
* [Pandas] for prediction output.
* [matplotlib] for plotting loss curves.

Corpora used include:
* [Iris data set]
* [Adult data set]
* [MNIST data set]


### Torch CRF

The [CRF] class was taken and modified from Kemal Kurniawan's [pytorch_crf]
GitHub repository.  See the `README.md` module documentation for more
information.  The module was forked ([forked pytorch_crf]) with modifications;
however, the modifications were not merged upstream and the project appears to
be inactive.

**Important**: This project will switch to using [pytorch_crf] as a dependency
once the changes it needs are merged.  Until then, the code remains a separate
class in this project, which is easier to maintain since the only code involved
is the `CRF` class.

The [pytorch_crf] repository uses the same license as this repository, which is
the [MIT License].  For this reason, there are no software/package tainting
issues.


## See Also

The [zensols deepnlp] project, which builds on this project, is a deep learning
utility library for natural language processing that aids in feature
engineering and provides embedding layers.


## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star the project and let me know how and where you use this API.
Contributions as pull requests, feedback and any other input are welcome.


## License

[MIT License]

Copyright (c) 2020 - 2023 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.deeplearn/
[pypi-link]: https://pypi.python.org/pypi/zensols.deeplearn
[pypi-badge]: https://img.shields.io/pypi/v/zensols.deeplearn.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/util/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/deeplearn/actions

[MIT License]: LICENSE.md
[PyTorch]: https://pytorch.org
[Jupyter]: https://jupyter.org
[pycuda]: https://pypi.org/project/pycuda/
[CUDA]: https://developer.nvidia.com/cuda-toolkit
[scipy]: https://www.scipy.org
[Pandas]: https://pandas.pydata.org
[matplotlib]: https://matplotlib.org

[pytorch_crf]: https://github.com/kmkurn/pytorch-crf
[forked pytorch_crf]: https://github.com/plandes/pytorch-crf
[zensols.deeplearn.layer.CRF]: api/zensols.deeplearn.layer.html#zensols.deeplearn.layer.crf.CRF
[zensols deepnlp]: https://plandes.github.io/deepnlp

[full documentation]: https://plandes.github.io/deeplearn/index.html
[Iris notebook]: https://github.com/plandes/deeplearn/tree/master/notebook/iris.ipynb
[MNIST notebook]: https://github.com/plandes/deeplearn/tree/master/notebook/mnist.ipynb
[debugging notebook]: https://github.com/plandes/deeplearn/tree/master/notebook/debug.ipynb

[model]: https://plandes.github.io/deeplearn/doc/model.html
[facade]: https://plandes.github.io/deeplearn/doc/facade.html
[results]: https://plandes.github.io/deeplearn/doc/results.html
[data processing]: https://plandes.github.io/deeplearn/doc/preprocess.html
[layers]: https://plandes.github.io/deeplearn/doc/layers.html
[reproducibility]: https://plandes.github.io/deeplearn/doc/results.html#reproducibility
[debugging]: https://plandes.github.io/deeplearn/doc/facade.html#debugging-the-model
[random seed state]: api/zensols.deeplearn.html#zensols.deeplearn.torchconfig.TorchConfig.set_random_seed
[Real-time performance]: https://plandes.github.io/deeplearn/doc/results.html#plotting-loss
[Debugging]: https://plandes.github.io/deeplearn/doc/model.html#debugging
[unit test]: https://github.com/plandes/deeplearn/tree/master/test/python
[vectorization]: https://plandes.github.io/deeplearn/doc/preprocess.html#vectorizers
[Iris example]: https://github.com/plandes/deeplearn/blob/master/test/python/iris/model.py
[Iris example configuration]: https://github.com/plandes/deeplearn/blob/master/test-resources/iris

[Iris data set]: https://archive.ics.uci.edu/ml/datasets/iris
[Adult data set]: http://archive.ics.uci.edu/ml/datasets/Adult
[MNIST data set]: http://yann.lecun.com/exdb/mnist/

[data load]: https://plandes.github.io/deeplearn/api/zensols.dataframe.html?highlight=dataframestash#zensols.dataframe.stash.DataframeStash
[easily manage]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.dataframe.html?highlight=dataframefeaturevectorizermanager#zensols.deeplearn.dataframe.vectorize.DataframeFeatureVectorizerManager
[vectorized features]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.vectorize.html?highlight=seriesencodablefeaturevectorizer#zensols.deeplearn.vectorize.vectorizers.OneHotEncodedEncodableFeatureVectorizer
[report results]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.result.html?highlight=modelresultreporter#zensols.deeplearn.result.report.ModelResultReporter

[Convolutional layer factory]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.layer.html#zensols.deeplearn.layer.conv.ConvolutionLayerFactory
[CRF]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.layer.html#zensols.deeplearn.layer.crf.CRF
[BiLSTM-CRF]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.layer.html?highlight=recurrentcrf#zensols.deeplearn.layer.recurcrf.RecurrentCRF
[Recurrent layers]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.layer.html#zensols.deeplearn.layer.recur.RecurrentAggregation
[linear layers]: https://plandes.github.io/deeplearn/api/zensols.deeplearn.layer.html#zensols.deeplearn.layer.linear.DeepLinear
