topsearch


* Name: topsearch
* Version: 0.0.3
* Home page: https://github.com/IBM/topography-searcher/
* Summary: A Python package for topographical analysis of machine learning models and physical systems
* Upload time: 2024-04-24 10:34:46
* Author: Luke Dicks
* Requires Python: `<3.13,>=3.10`
* License: MIT
* Keywords: machine-learning, chemistry, topography, landscapes, explainable-ai

<p align="center">
    <img src="./images/TopSearchLogo.png" height="390" width="390">
</p>

# TopSearch

## Introduction

The TopSearch package provides functionality to map the topography of surfaces, and can be used to map the solution space of machine learning algorithms. Many machine learning algorithms have non-convex loss (or cost) functions, and the aim of fitting is usually to locate low-valued or diverse solutions. Knowing how the solution space is organised gives insight into the reproducibility, explainability and performance of ML methods.

The methodology derives from the field of chemical physics, and is agnostic to the given surface, allowing application to a wide range of machine learning algorithms. Leveraging ideas from chemical physics, we can assess the performance and reliability of [neural networks](https://doi.org/10.1088/2632-2153/ac49a9), [Gaussian processes](https://arxiv.org/abs/2305.10748), Bayesian optimisation and [clustering algorithms](https://doi.org/10.1063/5.0078793), and understand the effect of [dataset roughness](https://doi.org/10.1039/D3ME00189J) on model performance. Applying the same framework to many different machine learning paradigms provides a route to transferable understanding; its application to machine learning is reviewed in [this paper](https://doi.org/10.1039/D3DD00204G).

## Overview

The topographical mapping is performed using the [energy landscape framework](https://doi.org/10.1146/annurev-physchem-050317-021219). The energy landscape framework, developed in chemical physics, encodes surfaces as a network of stationary points. Stationary points are points with zero gradient, and here they are either local minima or transition states. A transition state is a maximum in exactly one direction and a minimum in all others; local minimisation along that direction (both forwards and backwards) locates the two minima it connects. Each transition state gives the lowest barrier between its two connected minima, and provides information about the intermediate behaviour of the function. In the network, each minimum is a node and an edge exists between any two minima connected by a transition state. The complete network of minima and transition states constitutes the solution landscape, and we show an example landscape below.

| <img src="./images/StationaryPointsExample.png" height="317" width="425"> <img src="./images/NetworkExample.png" height="295" width="390"> |
|:--:|
| **Top**: A contour plot of the original surface with the stationary points, and their connections, overlaid. The minima are given in green, the transition states in red, and the connections between them with solid black lines. **Bottom**: The corresponding network abstraction of the surface. Here, the separation between connected nodes is specified by the height of the transition state between them. |
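
The distinction between minima and transition states can be made concrete by counting negative Hessian eigenvalues at a stationary point: none for a minimum, exactly one for a transition state. The following is a minimal sketch using NumPy on a toy double-well function. The function and the helper names (`double_well_gradient`, `double_well_hessian`, `classify_stationary_point`) are illustrative assumptions and are not part of the TopSearch API.

```python
import numpy as np


def double_well_gradient(point):
    """Analytic gradient of the toy surface f(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = point
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])


def double_well_hessian(point):
    """Analytic Hessian of the same toy surface."""
    x, _ = point
    return np.array([[12.0 * x**2 - 4.0, 0.0], [0.0, 2.0]])


def classify_stationary_point(point, tol=1e-8):
    """Label a point by its gradient norm and number of negative Hessian eigenvalues."""
    if np.linalg.norm(double_well_gradient(point)) > tol:
        return "not stationary"
    eigenvalues = np.linalg.eigvalsh(double_well_hessian(point))
    n_negative = int(np.sum(eigenvalues < -tol))
    if n_negative == 0:
        return "minimum"
    if n_negative == 1:
        return "transition state"
    return "higher-order saddle"


for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]:
    print(point, classify_stationary_point(np.array(point)))
```

On this surface the two minima at `(-1, 0)` and `(1, 0)` are connected by the transition state at `(0, 0)`, so the corresponding network has two nodes and one edge.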

Topographical mapping of surfaces involves two main steps:
* Global optimisation &rarr; location of the global minimum and other low-valued local minima of the surface
* Landscape exploration &rarr; transition state searches between pairs of minima to generate a fully connected set of minima

Global optimisation is usually performed using the [basin-hopping](https://arxiv.org/abs/cond-mat/9803344) algorithm within TopSearch. Basin-hopping is a modified Monte Carlo approach that includes local minimisation at each step to efficiently optimise complex solution spaces. During global optimisation we store all unique minima that we encounter, which gives us the initial set of minima that will be connected by transition state searches.
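
As a rough illustration of the basin-hopping idea, the sketch below uses SciPy's `scipy.optimize.basinhopping` as a stand-in rather than TopSearch's own implementation. The tilted double-well `surface`, the `store_minimum` callback and the distance threshold used to decide whether two minima are the same are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import basinhopping


def surface(point):
    """Toy tilted double well: two minima, the one at negative x is lower."""
    x, y = point
    return (x**2 - 1.0) ** 2 + 0.3 * x + y**2


# Each entry is (coordinates, function value) of a distinct local minimum
unique_minima = []


def store_minimum(coords, value, accepted):
    """Called by SciPy after every step; keep minima not seen before."""
    for known_coords, _ in unique_minima:
        if np.linalg.norm(coords - known_coords) < 1e-3:
            return
    unique_minima.append((coords.copy(), value))


result = basinhopping(
    surface,
    x0=np.array([1.0, 0.5]),
    niter=50,                                 # number of perturb-and-minimise steps
    stepsize=1.5,                             # maximum random displacement per step
    minimizer_kwargs={"method": "L-BFGS-B"},  # local minimiser used at each step
    callback=store_minimum,
    seed=0,
)

print("lowest minimum found:", result.x, result.fun)
print("number of distinct minima stored:", len(unique_minima))
```

The stored list plays the role of the initial set of minima described above: it is the input to the subsequent transition state searches.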

Landscape exploration involves selecting pairs of minima and attempting to find transition states between them. Transition state location is usually performed using a combination of double-ended and single-ended transition state searches. Double-ended searches aim to locate the lowest-valued path between a given pair of minima, and we use the [nudged elastic band algorithm](https://doi.org/10.1063/1.1329672) within TopSearch. Single-ended methods start from a single point and follow the lowest eigenmode towards the nearest transition state, and this is performed using [hybrid eigenvector-following](https://www-wales.ch.cam.ac.uk/pdf/CPL.341.185.2001.pdf). The nudged elastic band algorithm locates an approximate minimum energy path between two chosen minima, and the maxima on this path are refined to true transition states using hybrid eigenvector-following. There are a variety of schemes to decide which pairs of minima should be selected for transition state connections, all of which aim to produce a fully connected network of minima and explore important regions of solution space.
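
Whether a set of minima is already fully connected can be checked by computing the connected components of the network. The sketch below does this in plain Python for hypothetical data; the `(min_a, min_b, ts_value)` tuple layout and the function name `connected_components` are assumptions for illustration and do not reflect how TopSearch stores its network.

```python
from collections import defaultdict


def connected_components(n_minima, transition_states):
    """Group minima that are linked by chains of transition states."""
    neighbours = defaultdict(set)
    for min_a, min_b, _ts_value in transition_states:
        neighbours[min_a].add(min_b)
        neighbours[min_b].add(min_a)

    unvisited = set(range(n_minima))
    components = []
    while unvisited:
        # Start a depth-first search from the lowest-indexed unvisited minimum
        start = min(unvisited)
        unvisited.remove(start)
        stack = [start]
        component = {start}
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# Hypothetical data: five minima and three transition states
transition_states = [(0, 1, 0.8), (1, 2, 1.3), (3, 4, 0.5)]
components = connected_components(5, transition_states)
print(components)  # [[0, 1, 2], [3, 4]]
```

More than one component means the network is not yet fully connected, so at least one further pair of minima, one from each component, needs a transition state search.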

For more details of the methodology please refer to [`common_citations.md`](./common_citations.md).

## Installation

These instructions install TopSearch into a conda environment. First create and activate the environment
```
conda create -n topsearch_env python=3.11
conda activate topsearch_env
```
The package is available on PyPI and can be installed using pip
```
pip install topsearch==0.0.3
```
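
To confirm which version ended up in the active environment, the standard-library `importlib.metadata` module can be queried. This is a small sketch; the helper name `installed_version` is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("topsearch version:", installed_version("topsearch"))
```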

For the source code you can clone the git repository locally using
```
git clone https://github.com/IBM/topography-searcher.git
```
and then install the dependencies using either
```
pip install -r requirements.txt
```
or from the pyproject.toml file with
```
poetry install
```

We can test the environment build by running
```
cd tests
pytest
```
If all tests pass, the build was successful. Enjoy using TopSearch!

_Note_: By default we do not specify the dependencies for molecular potentials (`dft.py`, `ml_potentials.py`) because they greatly increase the environment size and are unnecessary for machine learning applications. Install the dependencies for a given potential separately if required. For the same reason, the tests in `molecular_potentials` are not run by default, but they can be run manually with
```
cd tests
cd molecular_potentials
pytest test*
```

## Examples

We provide several examples to illustrate the tasks that TopSearch can perform in [`examples`](./examples). Each example is provided as both an annotated Jupyter notebook and a set of Python scripts (the scripts are further separated for ease of use), and each has a detailed description of its content within the README. `example_function`, available as a [notebook](./examples/notebooks/example_function.ipynb) or as [scripts](./examples/scripts/example_function), is the best place to start for an introduction to the methodology, where we apply it to some simple test functions.

## Contributors

This package is written and maintained by Luke Dicks at IBM Research as part of the AI-Enriched Simulation team. Please contact Luke (<luke.dicks@ibm.com>) or Edward Pyzer-Knapp (<EPyzerK3@uk.ibm.com>) for questions about how to use and/or contribute.

## License

TopSearch is open-source software licensed under the MIT License. See the [`LICENSE`](./LICENSE) file for details.

## Citations

If you use this package please cite it using the 'Cite this repository' dropdown in the right sidebar. We also provide a bibliography of relevant previous work in [`common_citations.md`](./common_citations.md). This file provides references to energy landscape algorithms and their applications to different fields of machine learning, each given as a BibTeX entry whose handle summarises the content.


            
