profit

Name	profit
Version	0.6
home_page	https://github.com/redmod-team/profit
Summary	Probabilistic response model fitting with interactive tools
upload_time	2022-12-23 22:09:11
maintainer
docs_url	None
author	Christopher Albert
requires_python	>=3.7
license	MIT
keywords	parameter study gaussian process regression hpc active learning

[![DOI](https://zenodo.org/badge/168945305.svg)](https://zenodo.org/badge/latestdoi/168945305)
[![PyPI](https://img.shields.io/pypi/v/profit)](https://pypi.org/project/profit/)
[![Python Versions](https://img.shields.io/pypi/pyversions/profit)](https://pypi.org/project/profit/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Coverage Status](https://coveralls.io/repos/github/redmod-team/profit/badge.svg)](https://coveralls.io/github/redmod-team/profit)

[![Documentation Status](https://readthedocs.org/projects/profit/badge/?version=latest)](https://profit.readthedocs.io/en/latest/?badge=latest)
[![Install & Test Status](https://github.com/redmod-team/profit/actions/workflows/install-and-test.yml/badge.svg?)](https://github.com/redmod-team/profit/actions/workflows/install-and-test.yml)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/redmod-team/profit/master.svg)](https://results.pre-commit.ci/latest/github/redmod-team/profit/master)
[![Publish to PyPI Status](https://github.com/redmod-team/profit/actions/workflows/publish-to-pypi.yml/badge.svg)](https://github.com/redmod-team/profit/actions/workflows/publish-to-pypi.yml)

<img src="https://raw.githubusercontent.com/redmod-team/profit/master/logo.png" width="208.5px">

# Probabilistic Response Model Fitting with Interactive Tools

This is a collection of tools for studying parametric dependencies of
black-box simulation codes or experiments and for constructing reduced-order
response models over the input parameter space.

proFit can be fed with a number of data points consisting of different
input parameter combinations and the resulting output of the simulation under
investigation. It then fits a response surface through the point cloud
using Gaussian process regression (GPR) models.
This probabilistic response model makes it possible to predict ("interpolate") the output
at yet unexplored parameter combinations, including uncertainty estimates.
It can also tell you where to put more training points to gain the most new
information (experimental design) and automatically generate and start
new simulation runs locally or on a cluster. Results can be explored and checked
visually in a web frontend.
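
To make the idea of a probabilistic response model concrete, here is a minimal sketch of Gaussian process regression written directly in NumPy. It is not proFit's implementation (proFit delegates to libraries such as GPy and scikit-learn); the kernel, its fixed hyperparameters, the function names and the toy "simulation" data are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of 1D points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Return the GP posterior mean and standard deviation at x_new."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tn = rbf_kernel(x_train, x_new)
    chol = np.linalg.cholesky(k_tt)
    # Solve K alpha = y via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tn.T @ alpha
    v = np.linalg.solve(chol, k_tn)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Hypothetical "simulation" results: five input points and their outputs.
x_train = np.linspace(0.0, 1.0, 5)
y_train = np.sin(2 * np.pi * x_train)

# One training point, one point between training points, one far away.
x_new = np.array([0.25, 0.375, 2.0])
mean, std = gp_predict(x_train, y_train, x_new)
for x, m, s in zip(x_new, mean, std):
    print(f"x={x:.3f}  mean={m:+.3f}  std={s:.3f}")
```

The predicted standard deviation is close to zero at a training point, grows between training points, and approaches the prior standard deviation far away from all data. This is the uncertainty estimate that the response model attaches to each prediction.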

Telling proFit how to interact with your existing simulations is easy
and requires no changes to your existing code. Current functionality covers
starting simulations locally or on a cluster via [Slurm](https://slurm.schedmd.com), subsequent
surrogate modelling using [GPy](https://github.com/SheffieldML/GPy) or
[scikit-learn](https://github.com/scikit-learn/scikit-learn),
an active learning algorithm to iteratively sample at interesting
points, and a Markov chain Monte Carlo (MCMC) algorithm. The web frontend for interactively exploring the point cloud
and surrogate is based on [plotly/dash](https://github.com/plotly/dash).
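
As an illustration of what "sampling at interesting points" can mean, the sketch below repeatedly picks the candidate with the largest Gaussian process posterior variance. This is one common acquisition rule and is only assumed here; the acquisition functions proFit actually offers may differ. The kernel, its fixed length scale and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel with unit prior variance."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def posterior_variance(x_train, x_cand, noise=1e-6):
    """GP posterior variance at the candidates; it depends only on the inputs."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tc = rbf_kernel(x_train, x_cand)
    solved = np.linalg.solve(k_tt, k_tc)
    return 1.0 - np.sum(k_tc * solved, axis=0)


def select_points(x_init, x_cand, n_new):
    """Greedily add the candidate where the model is most uncertain."""
    x_train = np.array(x_init, dtype=float)
    for _ in range(n_new):
        var = posterior_variance(x_train, x_cand)
        x_next = x_cand[np.argmax(var)]
        # In a real study, the simulation would be run at x_next here
        # and the surrogate refitted before choosing the next point.
        x_train = np.append(x_train, x_next)
    return x_train


candidates = np.linspace(0.0, 1.0, 101)
points = select_points([0.5], candidates, n_new=4)
print(points)
```

Because the hyperparameters are held fixed, the variance does not depend on the simulation output, so this sketch never calls a simulation. With refitted hyperparameters or an output-dependent acquisition function, each iteration would need a new run.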

## Features

* Compute evaluation points (e.g. from a random distribution) at which to run the simulation
* Template replacement and automatic generation of run directories
* Starting parallel runs locally or on a cluster (Slurm)
* Collection of result output and postprocessing
* Response-model fitting using Gaussian process regression and linear regression
* Active learning to reduce the number of samples needed
* MCMC to find a posterior parameter distribution (similar to active learning)
* Graphical user interface to explore the results
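
For readers unfamiliar with MCMC, the following is a minimal random-walk Metropolis sketch using only the Python standard library. It estimates the posterior of a single parameter from a few hypothetical measurements, assuming Gaussian noise with known standard deviation and a flat prior. The data, the prior bounds and the function names are assumptions for illustration and do not reflect how proFit implements its MCMC.

```python
import math
import random


def log_posterior(theta, data, sigma=0.5):
    """Log posterior of a mean parameter with a flat prior on [-10, 10]."""
    if not -10.0 <= theta <= 10.0:
        return -math.inf
    return -0.5 * sum((y - theta) ** 2 for y in data) / sigma**2


def metropolis(data, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler returning the chain of theta values."""
    rng = random.Random(seed)
    theta = 0.0
    logp = log_posterior(theta, data)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, data)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        samples.append(theta)
    return samples


# Hypothetical measurements of the quantity of interest.
data = [1.8, 2.1, 2.4, 1.9, 2.3]
samples = metropolis(data)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean estimate: {posterior_mean:.2f}")
```

The retained samples approximate the posterior distribution of the parameter; their mean should lie near the sample mean of the data, since the prior is flat.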

## Installation

Currently, the code is under heavy development, so it should be cloned
from GitHub via Git and pulled regularly.

### Requirements
```bash
sudo apt install python3-dev build-essential
```
To enable compilation of the Fortran modules, the following is also needed:
```bash
sudo apt install gfortran
```

### Dependencies
* numpy, scipy, matplotlib, sympy, pandas
* [ChaosPy](https://github.com/jonathf/chaospy)
* GPy
* scikit-learn
* h5py
* [plotly/dash](https://github.com/plotly/dash) - for the UI
* [ZeroMQ](https://github.com/zeromq/pyzmq) - for messaging
* sphinx - for documentation, only needed when `docs` is specified
* torch, GPyTorch - only needed when `gpu` is specified

All dependencies are configured in `setup.cfg` and should be installed automatically when using `pip`.

Automatic tests use `pytest`.

### Windows 10
To install proFit under Windows 10, we recommend using the *Windows Subsystem
for Linux (WSL2)* with the Ubuntu 20.04 LTS distribution ([install guide](https://docs.microsoft.com/en-us/windows/wsl/install-win10)).

After installing WSL2, execute the following steps in your Linux terminal (when asked, press `y` to continue).

Make sure you have the right version of Python installed and the basic developer toolset available:
   ```bash
   sudo apt update
   sudo apt install python3 python3-pip python3-dev build-essential
   ```

To install proFit from Git (see below), make sure that the project is located in the Linux file system,
not the Windows file system.

To configure the Python interpreter available in your Linux distribution in PyCharm
(tested with the Professional edition), follow this [guide](https://www.jetbrains.com/help/pycharm/using-wsl-as-a-remote-interpreter.html).

### Installation from PyPI
To install the latest stable version of proFit, use
```bash
pip install profit
```

For the latest pre-release, use
```bash
pip install --pre profit
```


### Installation from Git
To install proFit for the current user (`--user`) in development mode (`-e`), use:
```bash
git clone https://github.com/redmod-team/profit.git
cd profit
pip install -e . --user
```

### Fortran
Certain surrogates require a compiled Fortran backend. To enable compilation of the Fortran modules during installation, use:

    USE_FORTRAN=1 pip install .

### Troubleshooting installation problems
1. Make sure you have all the requirements mentioned above installed.

2. If `pip` is not recognized, try the following:
   ```bash
   python3 -m pip install -e . --user
   ```
3. If pip warns you about PATH or proFit is not found, close and reopen the terminal
   and type `profit --help` to check whether the installation was successful.
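
The same check can be scripted. The sketch below uses only the Python standard library to report whether a package can be imported and whether a command is found on `PATH`. It assumes that both the importable package and the console command are called `profit`; the function name `check_installation` is hypothetical.

```python
import importlib.util
import shutil


def check_installation(package="profit", command="profit"):
    """Report whether the package is importable and its command is on PATH."""
    return {
        "importable": importlib.util.find_spec(package) is not None,
        "on_path": shutil.which(command) is not None,
    }


status = check_installation()
print(status)
```

If `importable` is true but `on_path` is false, the package is installed but the directory containing the `profit` script is probably missing from `PATH`, which matches the pip warning mentioned above.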


### Documentation using *Sphinx*
Install the requirements for building the documentation using `sphinx`:

    pip install .[docs]

Additionally, `pandoc` is required at the system level:

    sudo apt install pandoc


## HowTo

Examples for different model codes are available under `examples/`:
* `fit`: Simple fit via the Python interface.
* `mockup`: Simple model called by console command based on template directory.

Also, the integration tests under `tests/integration_tests/` may be informative examples:
* `active_learning`:
  * 1D: One-dimensional mockup with active learning
  * 2D: Two-dimensional mockup with active learning
  * Log: Active learning with logarithmic search space
  * MCMC: Markov-Chain-Monte-Carlo application to mockup experimental data
* `mockup`:
  * 1D
  * 2D
  * Custom postprocessor: Instead of the prebuilt postprocessor, a user-built class is used.
  * Custom worker: A user-built worker function is used.
  * Independent: Output with an independent (linear) variable in addition to the input parameters: f(t; u, v).
  * KarhunenLoeve: Multi-output surrogate model with Karhunen-Loeve encoder.
  * Multi output: Multi-output surrogate with two different output variables.

### Steps

1. Create and enter a directory (e.g. `study`) containing `profit.yaml` for your run.
    If your code reads text configuration files for each run, copy the corresponding directory to `template` and
    replace the values of the parameters to be varied in the UQ/surrogate model with placeholders `{param}`.

2. Running the simulations:
   ```bash
   profit run
   ```
   This starts simulations at all the points. By default, the generated input variables are written to `input.txt` and the
   output data is collected in `output.txt`.

   For each run of the simulation, proFit creates a run directory, fills the templates with the generated input data and
   collects the results. Each step can be customized with the
   [configuration file](https://profit.readthedocs.io/en/latest/config.html).

3. To fit the model:
   ```bash
   profit fit
   ```
   Customization is again done via `profit.yaml`.

4. Explore data graphically:
   ```bash
   profit ui
   ```
   This starts a Dash-based browser UI.
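
The run and fit steps can also be chained from a script. The sketch below wraps the `profit run` and `profit fit` commands from the steps above with Python's `subprocess` module. The function name, the `dry_run` switch and the directory name `study` are assumptions for illustration, and `profit ui` is left out because it starts a server that keeps running.

```python
import subprocess
from pathlib import Path

# Commands taken from the steps above, in the order they are run.
WORKFLOW = (["profit", "run"], ["profit", "fit"])


def run_workflow(study_dir, dry_run=False):
    """Run each workflow command inside study_dir and return what was run."""
    study_dir = Path(study_dir)
    executed = []
    for cmd in WORKFLOW:
        if not dry_run:
            # check=True stops the workflow if a step fails.
            subprocess.run(cmd, cwd=study_dir, check=True)
        executed.append(" ".join(cmd))
    return executed


# Dry run: only list the commands without starting any simulation.
print(run_workflow("study", dry_run=True))
```

Calling `run_workflow("study")` without `dry_run` requires proFit to be installed and a `study` directory containing `profit.yaml`.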

The figure below gives a graphical representation of the typical proFit workflow described above.
The boxes in red describe user actions, while the steps in the blue boxes are carried out by proFit.

<img src="https://raw.githubusercontent.com/redmod-team/profit/master/doc/pics/profit_workflow.png" width="300px">

### Cluster
proFit supports scheduling the runs on a cluster using *Slurm*. This is set up entirely in the configuration files;
the commands you run stay the same.

`profit ui` starts a *Dash* server, which you can connect to remotely (e.g. via *SSH port forwarding*).

## User-supplied files

* a [configuration file](https://profit.readthedocs.io/en/latest/config.html) (default: `profit.yaml`)
  * Add parameters and their distributions via `variables`
  * Set paths and filenames
  * Configure the run backend (how to interact with the simulation)
  * Configure the fit / surrogate model

* the `template` directory
  * containing everything a simulation run needs (scripts, links to executables, input files, etc.)
  * input files use a template format where `{variable_name}` is substituted with the generated values

* a custom *Postprocessor* (optional)
  * if the default postprocessors don't work with your simulation, a custom one can be specified using the `include` parameter in the configuration.
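
The placeholder substitution described for the `template` directory can be pictured with a few lines of Python. This is a sketch of the general idea, not proFit's own template code: the function name `fill_template`, the example file content and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

# Matches placeholders of the form {variable_name}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_template(text, values):
    """Replace each {name} with values[name]; leave unknown names untouched."""

    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, text)


# Hypothetical input file of a simulation with two varied parameters.
template = "u = {u}\nv = {v}\nlabel = {unknown}\n"
filled = fill_template(template, {"u": 0.25, "v": 3})
print(filled)
```

A regular expression is used instead of `str.format` so that braces which do not name a known variable pass through unchanged rather than raising an error.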

Example directory structure:

<img src="https://raw.githubusercontent.com/redmod-team/profit/master/doc/pics/example_directory.png" width="200px">

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/redmod-team/profit",
    "name": "profit",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "Parameter Study,Gaussian Process,Regression,HPC,Active Learning",
    "author": "Christopher Albert",
    "author_email": "albert@tugraz.at",
    "download_url": "https://files.pythonhosted.org/packages/5f/5f/417ece7db8b8bb09805f3098974b3fb374631cee792e066c5a2cbb0a7712/profit-0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Probabilistic response model fitting with interactive tools",
    "version": "0.6",
    "split_keywords": [
        "parameter study",
        "gaussian process",
        "regression",
        "hpc",
        "active learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "3f7c596b7b7b15951aace1c258892207",
                "sha256": "892ca10cb892325be068bffaaaa566ffa992effd0c2b6052d189eb1bdfab4011"
            },
            "downloads": -1,
            "filename": "profit-0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3f7c596b7b7b15951aace1c258892207",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 101552,
            "upload_time": "2022-12-23T22:09:09",
            "upload_time_iso_8601": "2022-12-23T22:09:09.010383Z",
            "url": "https://files.pythonhosted.org/packages/a4/15/097897b8d9d6b17536b85de3c8c68b7b3c8300107bd3baf4832d8f13a20f/profit-0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "653e6da054a2af2cb90804579fcee72f",
                "sha256": "922e74ea6ae834a4bc18bfb9ef466970ba981a58791cfda6d3a731589e72d4c9"
            },
            "downloads": -1,
            "filename": "profit-0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "653e6da054a2af2cb90804579fcee72f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 2068313,
            "upload_time": "2022-12-23T22:09:11",
            "upload_time_iso_8601": "2022-12-23T22:09:11.024921Z",
            "url": "https://files.pythonhosted.org/packages/5f/5f/417ece7db8b8bb09805f3098974b3fb374631cee792e066c5a2cbb0a7712/profit-0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-23 22:09:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "redmod-team",
    "github_project": "profit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "profit"
}
