memento-ml

Name	memento-ml
Version	1.2.0
home_page	None
Summary	A Python library for running computationally expensive experiments
upload_time	2024-07-03 22:48:30
maintainer	None
docs_url	None
author	None
requires_python	>=3.8
license	BSD 3-Clause License Copyright (c) 2023, wickerlab Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
keywords	experiment parallel sklearn machine learning
requirements	cloudpickle networkx numpy pandas pandas-stubs python-dateutil pytz six types-pytz tzdata

[![PyPI](https://img.shields.io/pypi/v/memento-ml)](https://pypi.org/project/memento-ml/)
![Python versions](https://img.shields.io/pypi/pyversions/memento-ml)
[![DOI](https://zenodo.org/badge/608197208.svg)](https://zenodo.org/doi/10.5281/zenodo.10929405)


# MEMENTO

`MEMENTO` is a Python library for running computationally expensive experiments.

Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework.
As a result, researchers spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focussing on their project.
To simplify the process, we introduce `MEMENTO`, a Python package designed to help researchers and data scientists efficiently manage and execute computationally intensive experiments.
`MEMENTO` can streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to run experiments concurrently across multiple threads.

If you need to run a large number of time-consuming experiments, `MEMENTO` can help you:

- Structure your configuration
- Parallelize experiments across CPUs
- Save and restore results
- Checkpoint in-progress experiments
- Send notifications when experiments fail or finish
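To illustrate what "save and restore results" buys you, here is a minimal sketch of result caching keyed by configuration: the first run computes and stores a result, and a rerun with the same configuration loads it from disk instead. This is not MEMENTO's mechanism; `config_key`, `cached_run`, and `slow_experiment` are hypothetical names, and the sketch assumes JSON-serialisable configurations and uses the standard-library `pickle` module.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def config_key(config: Dict[str, Any]) -> str:
    """Derive a stable cache key by hashing the JSON form of the configuration."""
    payload = json.dumps(config, sort_keys=True)  # sort_keys makes key order irrelevant
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_run(
    experiment: Callable[[Dict[str, Any]], Any],
    config: Dict[str, Any],
    cache_dir: Path,
) -> Any:
    """Return a saved result if one exists; otherwise run the experiment and save it."""
    path = cache_dir / f"{config_key(config)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # restore instead of recomputing
    result = experiment(config)
    with path.open("wb") as f:
        pickle.dump(result, f)  # save for the next run
    return result


calls = []  # records every real execution


def slow_experiment(config: Dict[str, Any]) -> int:
    calls.append(config)
    return config["x"] * 10


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = cached_run(slow_experiment, {"x": 4}, cache_dir)
    second = cached_run(slow_experiment, {"x": 4}, cache_dir)  # served from disk

print(first, second, len(calls))  # 40 40 1
```

The experiment body runs only once; the second call is answered from the saved file.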

[![Demo video](https://img.youtube.com/vi/GEtdCl1ZUWc/0.jpg)](http://www.youtube.com/watch?v=GEtdCl1ZUWc)

## Getting Started

`MEMENTO` is officially available on [PyPI](https://pypi.org/project/memento-ml/).

### Install

To install the package:

```bash
pip install memento-ml
```

### The Configuration Matrix

The core of `MEMENTO` is a configuration `matrix` that describes the list of experiments you
want `MEMENTO` to run. It must contain a `parameters` key whose value is itself a dict mapping
each parameter you want to vary to the list of values it should take.

As an example, let's say you want to test a few simple classifiers on a number of
image recognition datasets. You might write something like this:

> Don't worry if you're not working on machine learning; this is just an example.

```python
import sklearn.linear_model
import sklearn.svm

matrix = {
  "parameters": {
    "model": [
      sklearn.svm.SVC,
      sklearn.linear_model.Perceptron,
      sklearn.linear_model.LogisticRegression
    ],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"]
  }
}
```

`MEMENTO` would then generate 12 configurations (3 models × 4 datasets) by taking the
_Cartesian product_ of the parameters.
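To make the count concrete, here is a minimal sketch of expanding a `parameters` dict into configurations with `itertools.product`. It only illustrates the idea and is not MEMENTO's own code; `expand` is a hypothetical helper, and plain strings stand in for the sklearn classes so it runs without scikit-learn.

```python
import itertools
from typing import Any, Dict, List

# Plain strings stand in for the sklearn classes so the sketch runs without scikit-learn.
parameters = {
    "model": ["SVC", "Perceptron", "LogisticRegression"],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"],
}


def expand(parameters: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one configuration dict per combination of parameter values."""
    keys = list(parameters)
    return [dict(zip(keys, values)) for values in itertools.product(*parameters.values())]


configs = expand(parameters)
print(len(configs))  # 12
print(configs[0])    # {'model': 'SVC', 'dataset': 'imagenet'}
```

`itertools.product` varies the last key fastest, so every dataset is paired with `SVC` before moving on to `Perceptron`.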

Frequently you might also want to set some global configuration values, such as a regularization
parameter, or even change your preprocessing pipeline. In this case, `MEMENTO` also
accepts a `settings` key. These settings apply to all experiments and can be accessed from the
configuration list as well as from individual configurations.

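```python
matrix = {
  "parameters": ...,  # the parameters dict shown above
  "settings": {
    "regularization": 1e-1,
    # make_preprocessing_pipeline stands for your own preprocessing code
    "preprocessing": make_preprocessing_pipeline()
  }
}
```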
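The difference between `parameters` and `settings` can be sketched in plain Python: every parameter combination gets its own configuration, while all of them share one settings dict. `SketchConfig` and `expand_with_settings` are hypothetical names, and MEMENTO's real `Config` object may expose settings differently.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchConfig:
    """Hypothetical configuration: the varied parameters plus the shared settings."""

    parameters: Dict[str, Any]
    settings: Dict[str, Any]


def expand_with_settings(matrix: Dict[str, Any]) -> List[SketchConfig]:
    """Pair every parameter combination with the same global settings dict."""
    settings = matrix.get("settings", {})
    keys = list(matrix["parameters"])
    return [
        SketchConfig(dict(zip(keys, values)), settings)
        for values in itertools.product(*matrix["parameters"].values())
    ]


matrix = {
    "parameters": {"model": ["SVC", "Perceptron"], "dataset": ["mnist", "cifar10"]},
    "settings": {"regularization": 1e-1},
}
configs = expand_with_settings(matrix)
print(len(configs))                           # 4
print(configs[0].settings["regularization"])  # 0.1
```

Settings do not multiply the number of experiments: two models and two datasets still give four configurations.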

You can also exclude specific parameter combinations. Returning to our machine learning
example, if you know SVCs perform poorly on cifar10, you might decide to skip that
experiment entirely. This is done with the `exclude` key:

```python
matrix = {
  "parameters": ...,
  "exclude": [
    {"model": sklearn.svm.SVC, "dataset": "cifar10"}
  ]
}
```
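A minimal sketch of how exclusion can be applied on top of the Cartesian product, assuming a rule removes every configuration that agrees with all of the rule's key/value pairs. That matching rule and the `expand` helper are assumptions for illustration, not MEMENTO's documented behaviour; strings again stand in for the sklearn classes.

```python
import itertools
from typing import Any, Dict, List


def expand(matrix: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Cartesian product of the parameters, minus any excluded combinations."""
    keys = list(matrix["parameters"])
    configs = [
        dict(zip(keys, values))
        for values in itertools.product(*matrix["parameters"].values())
    ]

    def is_excluded(config: Dict[str, Any]) -> bool:
        # Assumed rule semantics: a rule matches when every key/value pair in it agrees.
        return any(
            all(config.get(key) == value for key, value in rule.items())
            for rule in matrix.get("exclude", [])
        )

    return [config for config in configs if not is_excluded(config)]


matrix = {
    "parameters": {
        "model": ["SVC", "Perceptron", "LogisticRegression"],
        "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"],
    },
    "exclude": [{"model": "SVC", "dataset": "cifar10"}],
}
configs = expand(matrix)
print(len(configs))  # 11
```

Under this assumed matching rule, a partial rule such as `{"dataset": "imagenet"}` would drop all three imagenet runs; check MEMENTO's behaviour before relying on partial rules.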

### Running an experiment

Along with a configuration matrix, you need some code to run your experiments. This can be any
`Callable`, such as a function, lambda, class, or class method.

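```python
from memento import Memento, Config, Context

def experiment(context: Context, config: Config):
  # Each parameter is available as an attribute, e.g. config.model
  classifier = config.model()
  # fetch_dataset is your own loader, assumed here to return (X, y)
  dataset = fetch_dataset(config.dataset)

  classifier.fit(*dataset)

  return classifier

# Guard the entry point; recommended whenever work may run in separate processes
if __name__ == "__main__":
  Memento(experiment).run(matrix)
```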
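The only contract is that the callable accepts `(context, config)`. The sketch below shows four callable forms satisfying it and a toy runner that calls each one for every configuration. `toy_run` is a hypothetical stand-in, not MEMENTO's runner: it passes `None` as the context and uses threads to stay self-contained, whereas MEMENTO supplies its own `Context` and parallelizes across CPUs, so a process-based runner would also need to serialize the callable.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def square_function(context: Any, config: SimpleNamespace) -> int:
    return config.x ** 2


square_lambda = lambda context, config: config.x ** 2  # noqa: E731


class Squarer:
    """An instance with __call__ is callable like a function."""

    def __call__(self, context: Any, config: SimpleNamespace) -> int:
        return config.x ** 2


class Trainer:
    def run(self, context: Any, config: SimpleNamespace) -> int:
        """A bound method such as Trainer().run works too."""
        return config.x ** 2


def toy_run(experiment: Callable[..., Any], configs: List[Dict[str, Any]]) -> List[Any]:
    """Hypothetical runner: call experiment(context, config) for each config in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = [
            # context is None here; MEMENTO passes its own Context object instead
            pool.submit(experiment, None, SimpleNamespace(**config))
            for config in configs
        ]
        return [future.result() for future in futures]  # results in submission order


configs = [{"x": 1}, {"x": 2}, {"x": 3}]
for experiment in (square_function, square_lambda, Squarer(), Trainer().run):
    print(toy_run(experiment, configs))  # [1, 4, 9] each time
```

`SimpleNamespace` mirrors the attribute access (`config.model`, `config.dataset`) used in the example above.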

You can also perform a dry run to check that your matrix is correct before running any experiments.

```python
Memento(experiment).run(matrix, dry_run=True)
```

```text
Running configurations:
  {'model': sklearn.svm.SVC, 'dataset': 'imagenet'}
  {'model': sklearn.svm.SVC, 'dataset': 'mnist'}
  {'model': sklearn.svm.SVC, 'dataset': 'cifar10'}
  {'model': sklearn.svm.SVC, 'dataset': 'quickdraw'}
  {'model': sklearn.linear_model.Perceptron, 'dataset': 'imagenet'}
  ...
Exiting due to dry run
```

## Code demo

- Code demo can be found [here](demo).
- `MEMENTO` does not depend on any ML packages, e.g., `scikit-learn`. The `scikit-learn` and `jupyterlab` packages are required to run the demo (`./demo/*`).

```bash
pip install memento-ml scikit-learn jupyterlab
```

## Cite

If you find `MEMENTO` useful and use it in your research, please cite:

> Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments - 
> Z Pullar-Strecker, X Chang, L Brydon, I Ziogas, K Dost, J Wicker -
> Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2023 - Springer -
> https://link.springer.com/chapter/10.1007/978-3-031-43430-3_21

## Roadmap

- Finish HPC support
- Improve result serialisation
- Improve customization of notifications

## Contributors

- [Zac Pullar-Strecker](https://github.com/zacps)
- [Feras Albaroudi](https://github.com/NeedsSoySauce)
- [Liam Scott-Russell](https://github.com/Liam-Scott-Russell)
- [Joshua de Wet](https://github.com/Dewera)
- [Nipun Jasti](https://github.com/watefeenex)
- [James Lamberton](https://github.com/JamesLamberton)
- [Xinglong (Luke) Chang](https://github.com/changx03)
- [Liam Brydon](https://github.com/MyCreativityOutlet)
- [Ioannis Ziogas](mailto:izio995@aucklanduni.ac.nz)
- [Katharina Dost](mailto:katharina.dost@auckland.ac.nz)
- [Joerg Wicker](https://github.com/joergwicker)

## License

MEMENTO is licensed under the [3-Clause BSD License](https://opensource.org/licenses/BSD-3-Clause).

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "memento-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "experiment, parallel, sklearn, machine learning",
    "author": null,
    "author_email": "Wickerlab dev team <luke.x.chang@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/52/e0/cac895c320758ca1684c989453591447458e32877c1da88750a381c6722e/memento_ml-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License  Copyright (c) 2023, wickerlab  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ",
    "summary": "A Python library for running computationally expensive experiments",
    "version": "1.2.0",
    "project_urls": {
        "Homepage": "https://github.com/wickerlab/memento"
    },
    "split_keywords": [
        "experiment",
        " parallel",
        " sklearn",
        " machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1ea510530318b39e5bcdad7ca66b5c7ea33c047085a373f55d2dfd8386bbe3f2",
                "md5": "7c4631960a890ab84cd81a0e4fc2d6a5",
                "sha256": "030f155a27d0e8ea21c8d6ef662ee4b3832a563ebe7f7a3de458ce54a4850daa"
            },
            "downloads": -1,
            "filename": "memento_ml-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7c4631960a890ab84cd81a0e4fc2d6a5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 21407,
            "upload_time": "2024-07-03T22:48:28",
            "upload_time_iso_8601": "2024-07-03T22:48:28.381137Z",
            "url": "https://files.pythonhosted.org/packages/1e/a5/10530318b39e5bcdad7ca66b5c7ea33c047085a373f55d2dfd8386bbe3f2/memento_ml-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "52e0cac895c320758ca1684c989453591447458e32877c1da88750a381c6722e",
                "md5": "bba8af30cf5f88d0f134ef42db317e6a",
                "sha256": "6ad579f831acc4a656c7fd316c892f4aa15708a4a7789cd280da6511f8aa436f"
            },
            "downloads": -1,
            "filename": "memento_ml-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bba8af30cf5f88d0f134ef42db317e6a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 28166,
            "upload_time": "2024-07-03T22:48:30",
            "upload_time_iso_8601": "2024-07-03T22:48:30.854784Z",
            "url": "https://files.pythonhosted.org/packages/52/e0/cac895c320758ca1684c989453591447458e32877c1da88750a381c6722e/memento_ml-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-03 22:48:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wickerlab",
    "github_project": "memento",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "cloudpickle",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "3.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.24.3"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "pandas-stubs",
            "specs": [
                [
                    "==",
                    "2.0.1.230501"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2023.3"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "types-pytz",
            "specs": [
                [
                    "==",
                    "2023.3.0.0"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2023.3"
                ]
            ]
        }
    ],
    "lcname": "memento-ml"
}
