ocrd-fork-tfaip

Name: ocrd-fork-tfaip
Version: 1.2.7
Home page: https://github.com/bertsky/tfaip
Summary: Python-based research framework for developing, organizing, and deploying Deep Learning models powered by Tensorflow.
Upload time: 2024-09-28 22:12:14
Maintainer: bertsky
Author: PLANET AI GmbH
Requires Python: >=3.7
License: GPL-v3.0
Keywords: machine learning, tensorflow, framework
[![Python Test](https://github.com/Planet-AI-GmbH/tfaip/actions/workflows/python-test.yml/badge.svg)](https://github.com/Planet-AI-GmbH/tfaip/actions/workflows/python-test.yml)
[![Python Publish](https://github.com/Planet-AI-GmbH/tfaip/actions/workflows/python-publish.yml/badge.svg)](https://github.com/Planet-AI-GmbH/tfaip/actions/workflows/python-publish.yml)

# _tfaip_ - A Generic and Powerful Research Framework for Deep Learning based on Tensorflow

*tfaip* is a Python-based research framework for developing, organizing, and deploying Deep Learning models powered by [Tensorflow](https://www.tensorflow.org/).
It lets you implement both simple and complex scenarios that are structured and highly configurable through parameters that can be modified directly from the command line (read the [docs](https://tfaip.readthedocs.io)).
For example, the [tutorial.full](examples/tutorial/full) scenario for learning MNIST allows you to modify the graph during training, as well as other hyper-parameters such as the optimizer:
```bash
export PYTHONPATH=$PWD  # set the PYTHONPATH so that the examples dir is found
# Change the graph
tfaip-train examples.tutorial.full --model.graph MLP --model.graph.nodes 200 100 50 --model.graph.activation relu
tfaip-train examples.tutorial.full --model.graph MLP --model.graph.nodes 200 100 50 --model.graph.activation tanh
tfaip-train examples.tutorial.full --model.graph CNN --model.graph.filters 40 20 --model.graph.dense 100
# Change the optimizer
tfaip-train examples.tutorial.full --trainer.optimizer RMSprop --trainer.optimizer.beta1 0.01 --trainer.optimizer.clip_global_norm 1
# ...
```

A trained model can then easily be integrated into a workflow to predict on provided `data`:
```python
# Illustrative imports -- see the tutorial code for the exact module paths:
from tfaip import PredictorParams
from examples.tutorial.full.scenario import TutorialScenario

predictor = TutorialScenario.create_predictor("PATH_TO_TRAINED_MODEL", PredictorParams())
for sample in predictor.predict(data):
    print(sample.outputs)
```

In practice, _tfaip_ follows the rules of object orientation, i.e., the code for a scenario (e.g., image classification (MNIST), text recognition, NLP, etc.) is organized by implementing classes.
By default, each [`Scenario`](https://tfaip.readthedocs.io/en/latest/doc.scenario.html) must implement a [`Model`](https://tfaip.readthedocs.io/en/latest/doc.model.html) and a [`Data`](https://tfaip.readthedocs.io/en/latest/doc.data.html) class.
See [here](examples/tutorial/full) for the complete code to run the example above for MNIST, and see [here](examples/tutorial/min) for the minimal setup.
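Schematically, this pattern looks as follows. Note that this is only a sketch of the class layout: the base-class and method names below are simplified stand-ins, not tfaip's actual API (see the [Minimum Tutorial](examples/tutorial/min) for the real classes).

```python
# Sketch of the Scenario/Model/Data pattern only -- class and method
# names are simplified stand-ins, NOT tfaip's actual base classes.
from abc import ABC, abstractmethod
from typing import Iterator


class DataBase(ABC):
    """Defines how samples are loaded and preprocessed."""

    @abstractmethod
    def generate(self) -> Iterator[dict]:
        ...


class ModelBase(ABC):
    """Defines the graph, loss, and metrics of a scenario."""

    @abstractmethod
    def build(self, inputs: dict) -> dict:
        ...


class MnistData(DataBase):
    def generate(self) -> Iterator[dict]:
        yield {"img": [0.0] * 784, "gt": 3}  # placeholder sample


class MnistModel(ModelBase):
    def build(self, inputs: dict) -> dict:
        return {"logits": inputs["img"]}  # placeholder graph


class MnistScenario:
    """A scenario binds one Data and one Model implementation together."""

    data_cls, model_cls = MnistData, MnistModel
```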


## Setup

To set up _tfaip_, create a virtual environment with Python 3.7 or later and install the `tfaip` pip package:
```bash
virtualenv -p python3 venv
source venv/bin/activate
pip install tfaip
pip install "tfaip[devel]"  # to install additional development/test requirements
```
Have a look at the [wiki](https://tfaip.readthedocs.io/en/latest/doc.installation.html) for further setup instructions.

## Run the Tutorial

After setup has succeeded, launch a training of the tutorial, which is an implementation of the common MNIST scenario:
```bash
export PYTHONPATH=$PWD  # set the PYTHONPATH so that the examples dir is found
tfaip-train examples.tutorial.full
# If you have a GPU, select it by specifying its ID
tfaip-train examples.tutorial.full --device.gpus 0
```

## Next Steps

Start by reading the [Minimum Tutorial](examples/tutorial/min); optionally have a look at the [Full Tutorial](examples/tutorial/full) to see more features.
The [docs](https://tfaip.readthedocs.io/en/latest) provide a full description of `tfaip`.

To set up a _new custom scenario_, copy the [general template](examples/template/general) and implement the abstract methods.
Consider renaming the classes!
Launch the training by providing the path or package name of the new scenario, which _must_ be located in the `PYTHONPATH`.

## Features of _tfaip_

_tfaip_ provides several features that allow designing generic scenarios with maximum flexibility and high performance.

### Code design

* _Fully object-oriented_: Implement classes and abstract functions, or overwrite any function to extend, adapt, or modify its default functionality.
* _Typing support_: _tfaip_ is fully typed, which simplifies working with an IDE (e.g., use PyCharm!).
* _Dataclasses as parameters_: Python's `dataclasses` module is used to set up parameters, which are automatically converted to command-line parameters by our [`paiargparse`](https://github.com/Planet-AI-GmbH/paiargparse) package (see the sketch below).
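For example (a hypothetical parameter class whose field names are invented for illustration), nested dataclasses map directly onto the dotted command-line flags shown earlier:

```python
# Hypothetical parameter dataclasses -- the field names are invented for
# illustration; paiargparse derives flags such as --model.graph.nodes
# from nested structures like this.
from dataclasses import dataclass, field
from typing import List


@dataclass
class GraphParams:
    nodes: List[int] = field(default_factory=lambda: [128, 64])
    activation: str = "relu"


@dataclass
class ModelParams:
    graph: GraphParams = field(default_factory=GraphParams)


# --model.graph.nodes 200 100 50 --model.graph.activation tanh
# would then populate:
# ModelParams(graph=GraphParams(nodes=[200, 100, 50], activation="tanh"))
```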

### Data-Pipeline
Every scenario requires the setup of a data pipeline to read and transform data.
*tfaip* makes it easy to implement and modify even complex pipelines by defining multiple `DataProcessors`, each of which usually implements a small operation mapping an input sample to an output sample.
E.g., one `DataProcessor` loads the data (`input=filename`, `output=image`), another applies normalization rules, yet another applies data augmentation, etc.
The **great advantage** of this setup is that the data processors run in Python and can automatically be parallelized by *tfaip* for a speed-up by setting `run_parallel=True`, as sketched below.
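Conceptually, such a pipeline is just a chain of sample-to-sample functions that can be distributed over worker processes. The following is a schematic sketch of that idea, not tfaip's actual `DataProcessor` interface:

```python
# Schematic sketch of a DataProcessor chain -- NOT tfaip's actual
# interface. Each processor maps one sample to one sample.
from multiprocessing import Pool


def load(sample):
    # input=filename, output=image (here just a placeholder string)
    return {"image": f"PIXELS OF {sample['filename']}"}


def normalize(sample):
    sample["image"] = sample["image"].lower()  # stand-in for real normalization
    return sample


def apply_pipeline(sample, processors=(load, normalize)):
    for proc in processors:
        sample = proc(sample)
    return sample


if __name__ == "__main__":
    samples = [{"filename": f"img_{i}.png"} for i in range(4)]
    # run_parallel=True in tfaip corresponds conceptually to mapping the
    # chain over a pool of worker processes:
    with Pool(2) as pool:
        print(pool.map(apply_pipeline, samples))
```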

### Deep-Learning-Features

Since _tfaip_ is based on Tensorflow, the full Tensorflow API is available for designing models, graphs, and even data pipelines.
Furthermore, *tfaip* supports additional common techniques for improving the performance of a Deep Learning model out of the box (one of them is sketched after the list):

* Warm-starting (i.e., loading a pretrained model)
* EMA-weights
* Early-Stopping
* Weight-Decay
* various optimizers and learning-rate schedules
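
As an illustration of the EMA-weights item above: a decayed running average of the trainable weights is maintained alongside training and is typically used for evaluation and export. This is a minimal generic sketch of the technique, not tfaip's implementation:

```python
# Minimal generic sketch of EMA (exponential moving average) weights --
# the idea behind the feature listed above, not tfaip's code.
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into the running average after each step."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


weights = [0.5, -1.2]   # toy "model weights"
ema = list(weights)     # initialize the average with the initial weights
for step in range(3):
    weights = [w + 0.1 for w in weights]  # stand-in for an optimizer update
    ema = ema_update(ema, weights)
print(ema)  # evaluation/export would typically use these smoothed weights
```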

## Contributing

We highly encourage users to contribute own scenarios and improvements of _tfaip_.
Please read the [contribution guidelines](https://tfaip.readthedocs.io/en/latest/doc.development.html).

## Benchmarks

All timings were obtained on an Intel Core i7 (10th Gen) CPU.

### MNIST

The following table compares the MNIST tutorial of Keras to the [Minimum Tutorial](examples/tutorial/min).
The Keras code was adapted to use the same network architecture and hyperparameter settings (batch size of 16, 10 epochs of training).

Code | Time Per Epoch | Train Acc | Val Acc | Best Val Acc
:---- | --------------: | ---------: | -------: | ------------: 
Keras |  16 s | 99.65% | 98.24% | 98.60% 
_tfaip_ | 18 s |  99.76% | 98.66% | 98.66% 

_tfaip_ and Keras achieve comparable accuracies, as is to be expected since the actual code for training the graph is fundamentally identical.
_tfaip_ is, however, a bit slower due to some overhead in the input pipeline and additional functionality (e.g., benchmarks, or automatic tracking of the best model).
This overhead is negligible for almost any real-world scenario because, with a clearly larger network architecture, the computation times for inference and backpropagation become the bottleneck.

### Data Pipeline

Integrating pure-Python operations (e.g., numpy) into a `tf.data.Dataset` to apply high-level preprocessing is slow by default, since [tf.data.Dataset.map](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map) in combination with [tf.py_function](https://www.tensorflow.org/api_docs/python/tf/py_function) does not run in parallel and is therefore blocked by Python's GIL.
_tfaip_ circumvents this issue by providing an (optional) parallelizable input pipeline.
The following table shows the time in seconds for two different tasks:

* PYTHON: applying some pure-Python functions to the data
* NUMPY: applying several numpy operations to the data


| Mode | Task | 1 thread | 2 threads | 4 threads | 6 threads |
|:-----|:-----|---------:|----------:|----------:|----------:|
| `tf.py_function` | PYTHON | 23.47 | 22.78 | 24.38 | 25.76 |
| _tfaip_ | PYTHON | 26.68 | 14.48 | 8.11 | 8.13 |
| `tf.py_function` | NUMPY | 104.10 | 82.78 | 76.33 | 77.56 |
| _tfaip_ | NUMPY | 97.07 | 56.93 | 43.78 | 42.73 |

The PYTHON task clearly shows that `tf.data.Dataset.map` is not able to utilize multiple threads.
The speed-up observed in the NUMPY task is possibly due to numpy releasing the GIL when dispatching to its C implementation, which permits some parallelization.
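
The underlying effect is easy to reproduce outside Tensorflow. The toy benchmark below (illustrative only; process-based workers stand in for tfaip's parallel pipeline) shows that CPU-bound pure-Python work does not benefit from in-process mapping but does from multiple processes:

```python
# Toy illustration of the bottleneck: CPU-bound pure-Python work does not
# scale when mapped in-process (GIL), but does across processes -- the
# same reason tfaip parallelizes its DataProcessors.
import time
from multiprocessing import Pool


def cpu_bound(n):
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    work = [200_000] * 16

    t0 = time.perf_counter()
    list(map(cpu_bound, work))        # serial, like tf.py_function
    print(f"serial:  {time.perf_counter() - t0:.2f}s")

    t0 = time.perf_counter()
    with Pool(4) as pool:             # parallel, like run_parallel=True
        pool.map(cpu_bound, work)
    print(f"4 procs: {time.perf_counter() - t0:.2f}s")
```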


            
