gauche


* **Name:** gauche
* **Version:** 0.1.6
* **Summary:** Gaussian Process Library for Molecules, Chemical Reactions and Proteins.
* **Upload time:** 2023-12-11 00:36:18
* **Requires Python:** >=3.8
* **License:** MIT
* **Keywords:** machine-learning, gaussian-processes, kernels, pytorch, chemistry, biology, protein, ligand

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Docs](https://assets.readthedocs.org/static/projects/badges/passing-flat.svg)](https://leojklarner.github.io/gauche/)
[![CodeFactor](https://www.codefactor.io/repository/github/leojklarner/gauche/badge)](https://www.codefactor.io/repository/github/leojklarner/gauche)
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/leojklarner/gauche/HEAD)

<!--[![DOI:10.48550/arXiv.2212.04450](https://zenodo.org/badge/DOI/10.48550/arXiv.2212.04450.svg)](https://doi.org/10.48550/arXiv.2212.04450)
[comment]: #[![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8B%20%20%E2%97%8B%20%20%E2%97%8B-orange)](https://fair-software.eu)-->


<p align="left">
  <a href="https://arxiv.org/abs/2212.04450">
    <img src="imgs/gauche_banner_1.png"/>
  </a>
</p>

**GAUCHE** is a collaborative, open-source software library that aims to make state-of-the-art
probabilistic modelling and black-box optimisation techniques more easily accessible to scientific
experts in chemistry, materials science and beyond. We provide 30+ bespoke kernels for molecules, chemical reactions and proteins and illustrate how they can be used for Gaussian processes and Bayesian optimisation in 10+ easy-to-adapt tutorial notebooks. 

[Overview](#overview) | [Getting Started](#getting-started) | [Documentation](https://leojklarner.github.io/gauche/) | [Paper (NeurIPS 2023)](https://arxiv.org/abs/2212.04450)


## What's New?

* GAUCHE has been accepted to the [NeurIPS 2023 Main Track](https://neurips.cc/virtual/2023/poster/70081)! More details forthcoming!
* Check out our new [Molecular Preference Learning](https://github.com/leojklarner/gauche/blob/main/notebooks/Molecular%20Preference%20Learning.ipynb) and [Preferential Bayesian Optimisation](https://github.com/leojklarner/gauche/blob/main/notebooks/Preferential%20Bayesian%20Optimisation.ipynb) notebooks that show how you can use GAUCHE to learn the latent utility function of a human medicinal chemist from pairwise preference feedback!
* Check out our new [Sparse GP Regression for Big Molecular Data](https://github.com/leojklarner/gauche/blob/main/notebooks/Sparse%20GP%20Regression%20for%20Big%20Molecular%20Data.ipynb) notebook that shows how you can scale molecular GPs to thousands of data points with sparse inducing point kernels!

## Overview

General-purpose Gaussian process (GP) and Bayesian optimisation (BO) libraries do not cater for molecular representations. Likewise, general-purpose molecular machine learning libraries do not consider GPs and BO. To bridge this gap, GAUCHE provides a modular, robust and easy-to-use framework of 30+ parallelisable and batch-GP-compatible implementations of string, fingerprint and graph kernels that operate on a range of widely-used molecular representations.

<p align="left">
  <a href="https://leojklarner.github.io/gauche/">
    <img src="imgs/gauche_overview.png" width="100%" />
  </a>
</p>

### Kernels

Standard GP packages typically assume continuous input spaces of low and fixed dimensionality. This makes it difficult to apply them to common molecular representations: molecular graphs are discrete objects, SMILES strings vary in length and topological fingerprints tend to be high-dimensional and sparse. To bridge this gap, GAUCHE provides:

* **Fingerprint Kernels** that measure the similarity between bit/count vectors of descriptors by examining the degree to which their elements overlap.
* **String Kernels** that measure the similarity between strings by examining the degree to which their sub-strings overlap.
* **Graph Kernels** that measure the similarity between graphs by examining the degree to which certain substructural motifs overlap.
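
To make the idea of "overlap" concrete, here is a minimal pure-Python sketch of two such similarity measures: the Tanimoto similarity between bit/count vectors and a simple substring-count kernel between strings. It is an illustration only, not GAUCHE's implementation; the function names and toy inputs are made up for this example.

```python
from collections import Counter


def tanimoto(x, y):
    """Tanimoto similarity <x, y> / (|x|^2 + |y|^2 - <x, y>) for bit/count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Two all-zero vectors have nothing to compare; treat them as identical.
    return 1.0 if denom == 0 else dot / denom


def substring_counts(s, n):
    """Count every contiguous substring of length n in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def substring_kernel(s, t, n=2):
    """Unnormalised overlap: dot product of the two substring-count vectors."""
    cs, ct = substring_counts(s, n), substring_counts(t, n)
    return sum(cs[k] * ct[k] for k in cs)


fp_a = [1, 1, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1]
print(tanimoto(fp_a, fp_b))            # 2 / (3 + 3 - 2) = 0.5
print(substring_kernel("CCO", "CCN"))  # only "CC" is shared -> 1
```

Both functions return larger values the more elements or sub-strings the two inputs share, which is the property a GP kernel over molecules relies on.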

### Representations

GAUCHE supports any representation that is based on bit/count vectors, strings or graphs. For rapid prototyping and benchmarking, we also provide a range of standard featurisation techniques for molecules, chemical reactions and proteins:
 
<table>
<thead>
  <tr>
    <th>Domain</th>
    <th>Representation</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="1">Molecules</td>
    <td>ECFP Fingerprints [1], RDKit Fragments, Fragprints, Molecular Graphs [2], SMILES [3], SELFIES [4]</td>
  </tr>
  <tr>
    <td rowspan="1">Chemical Reactions</td>
    <td>One-Hot Encoding, Data-Driven Reaction Fingerprints [6], Differential Reaction Fingerprints [5], Reaction SMARTS</td>
  </tr>
  <tr>
    <td rowspan="1">Proteins</td>
    <td>Sequences, Graphs [7]</td>
  </tr>
</tbody>
</table>
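
As a rough illustration of the simplest entry in the table, the following sketch one-hot encodes a reaction described by categorical components into a single bit vector. It is a standalone example under assumed inputs, not GAUCHE's featuriser: the `vocab` and `reaction` dictionaries and the function name are hypothetical.

```python
def one_hot_reaction(reaction, vocab):
    """Concatenate one indicator vector per reaction component.

    `reaction` maps component name -> chosen value and `vocab` maps
    component name -> ordered list of allowed values (both hypothetical).
    """
    features = []
    for component, choices in vocab.items():
        features.extend(1 if reaction[component] == c else 0 for c in choices)
    return features


# Hypothetical screening space with two categorical components.
vocab = {
    "base": ["K2CO3", "Cs2CO3", "KOtBu"],
    "solvent": ["DMF", "THF"],
}
reaction = {"base": "Cs2CO3", "solvent": "THF"}
print(one_hot_reaction(reaction, vocab))  # [0, 1, 0, 0, 1]
```

The resulting bit vector can be passed to any kernel that operates on bit/count vectors.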

### Extensions

If there are any specific kernels or representations that you would like to see included in GAUCHE, please reach out or submit an issue/pull request.

## Getting Started

The easiest way to get started with GAUCHE is to check out our tutorial notebooks:

|   |   |
|---|---|
| [GP Regression on Molecules](https://leojklarner.github.io/gauche/notebooks/gp_regression_on_molecules.html)  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/GP%20Regression%20on%20Molecules.ipynb)   |
| [Bayesian Optimisation Over Molecules](https://leojklarner.github.io/gauche/notebooks/bayesian_optimisation_over_molecules.html)  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Bayesian%20Optimisation%20Over%20Molecules.ipynb)   |
| [Multioutput Gaussian Processes for Multitask Learning](https://leojklarner.github.io/gauche/notebooks/multitask_gp_regression_on_molecules.html)  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Multitask%20GP%20Regression%20on%20Molecules.ipynb)   |
| [Training GPs on Graphs](https://leojklarner.github.io/gauche/notebooks/training_gps_on_graphs.html)  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Training%20GPs%20on%20Graphs.ipynb)   |
| [Sparse GP Regression for Big Molecular Data](https://leojklarner.github.io/gauche/notebooks/sparse_gp_regression_for_big_molecular_data.html)  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Sparse%20GP%20Regression%20for%20Big%20Molecular%20Data.ipynb)   |
|[Molecular Preference Learning](https://github.com/leojklarner/gauche/blob/main/notebooks/Molecular%20Preference%20Learning.ipynb)|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Molecular%20Preference%20Learning.ipynb) |
|[Preferential Bayesian Optimisation](https://github.com/leojklarner/gauche/blob/main/notebooks/Preferential%20Bayesian%20Optimisation.ipynb)|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Preferential%20Bayesian%20Optimisation.ipynb) |

### Installation

We recommend using a conda virtual environment:
```bash
git clone https://github.com/leojklarner/gauche.git
cd gauche

conda env create -f conda_env.yml

pip install --no-deps rxnfp
pip install --no-deps drfp
pip install transformers
pip install mordred

# optional for running tests
pip install gpflow grakel
```

### Example Usage: GP Regression on Molecules

Fitting a GP model with a kernel from GAUCHE and using it to predict the properties of new molecules is as easy as this. For more detail, check out our [GP Regression on Molecules Tutorial](https://leojklarner.github.io/gauche/notebooks/gp_regression_on_molecules.html) and the corresponding section in the [Docs](https://leojklarner.github.io/gauche/modules/dataloader.html).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/GP%20Regression%20on%20Molecules.ipynb)

```python
import gpytorch
from botorch import fit_gpytorch_model
from gauche.kernels.fingerprint_kernels.tanimoto_kernel import TanimotoKernel

class TanimotoGP(gpytorch.models.ExactGP):
  def __init__(self, train_x, train_y, likelihood):
    super(TanimotoGP, self).__init__(train_x, train_y, likelihood)
    self.mean_module = gpytorch.means.ConstantMean()
    self.covar_module = gpytorch.kernels.ScaleKernel(TanimotoKernel())
  
  def forward(self, x):
    mean_x = self.mean_module(x)
    covar_x = self.covar_module(x)
    return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

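# initialise GP likelihood, model and
# marginal log likelihood objective
# (X_train, y_train and X_test are assumed to be torch tensors of
# fingerprint features and targets that you have already prepared)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = TanimotoGP(X_train, y_train, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# fit GP with BoTorch in order to use
# the L-BFGS-B optimiser (recommended);
# newer BoTorch releases replace fit_gpytorch_model with fit_gpytorch_mll
fit_gpytorch_model(mll)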

# use the trained GP to get predictions and 
# uncertainty estimates for new molecules
model.eval()
likelihood.eval()
preds = model(X_test)
pred_means, pred_vars = preds.mean, preds.variance
```
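
The predictive means and variances are often summarised as an interval around each prediction. The sketch below shows one way to do that in plain Python, with lists of made-up numbers standing in for `preds.mean` and `preds.variance`; the helper name `predictive_interval` is hypothetical. Note that in GPyTorch `model(X_test)` describes the latent function, whereas `likelihood(model(X_test))` also includes the observation noise.

```python
import math


def predictive_interval(means, variances, z=1.96):
    """Return (lower, upper) bounds of mean +/- z standard deviations."""
    lower, upper = [], []
    for m, v in zip(means, variances):
        std = math.sqrt(max(v, 0.0))  # guard against tiny negative variances
        lower.append(m - z * std)
        upper.append(m + z * std)
    return lower, upper


# Made-up numbers standing in for preds.mean and preds.variance.
pred_means = [0.2, 1.5]
pred_vars = [0.04, 0.25]
lower, upper = predictive_interval(pred_means, pred_vars)
print(lower, upper)
```

With `z=1.96` the bounds correspond to an approximate 95% interval under a Gaussian predictive distribution.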

### Example Usage: Bayesian Optimisation Over Molecules

|   |   |
|---|---|
| [Tutorial (Bayesian Optimisation Over Molecules)](https://leojklarner.github.io/gauche/notebooks/bayesian_optimisation_over_molecules.html) | [Docs](https://leojklarner.github.io/gauche/modules/kernels.html) |
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leojklarner/gauche/blob/main/notebooks/Bayesian%20Optimisation%20Over%20Molecules.ipynb) | |

```python
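from botorch.models.gp_regression import SingleTaskGP
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gauche.kernels.fingerprint_kernels.tanimoto_kernel import TanimotoKernel

# We define our custom GP surrogate model using the Tanimoto kernel
class TanimotoGP(SingleTaskGP):

    def __init__(self, train_X, train_Y):
        # pass the likelihood by keyword so the call does not depend on argument order
        super().__init__(train_X, train_Y, likelihood=GaussianLikelihood())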
        self.mean_module = ConstantMean()
        self.covar_module = ScaleKernel(base_kernel=TanimotoKernel())
        self.to(train_X)  # make sure we're on the right device/dtype

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)
```
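
Once a surrogate such as `TanimotoGP` is fitted, Bayesian optimisation scores each candidate molecule with an acquisition function and queries the highest-scoring one. The sketch below is a standalone, pure-Python illustration of one common choice, expected improvement for maximisation, applied to a finite candidate pool. It is not taken from GAUCHE or BoTorch; the candidate names and the predictive means and standard deviations are made up.

```python
import math


def expected_improvement(mean, std, best):
    """Closed-form EI for maximisation under a Gaussian predictive distribution."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def select_next(candidates, best):
    """Return the name of the candidate with the highest EI."""
    return max(candidates, key=lambda name: expected_improvement(*candidates[name], best))


# Hypothetical surrogate predictions: name -> (posterior mean, posterior std).
candidates = {
    "mol_a": (0.9, 0.05),
    "mol_b": (0.7, 0.40),
    "mol_c": (1.0, 0.00),
}
print(select_next(candidates, best=1.0))  # mol_b
```

Here `mol_b` is selected even though its predicted mean is the lowest, because its large uncertainty leaves the most room for improving on the best value observed so far.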

## Citing GAUCHE

If GAUCHE is useful for your work, please consider citing the following paper:

```bibtex
@misc{griffiths2022gauche,
      title={GAUCHE: A Library for Gaussian Processes in Chemistry}, 
      author={Ryan-Rhys Griffiths and Leo Klarner and Henry B. Moss and Aditya Ravuri and Sang Truong and Bojana Rankovic and Yuanqi Du and Arian Jamasb and Julius Schwartz and Austin Tripp and Gregory Kell and Anthony Bourached and Alex Chan and Jacob Moss and Chengzhi Guo and Alpha A. Lee and Philippe Schwaller and Jian Tang},
      year={2022},
      eprint={2212.04450},
      archivePrefix={arXiv},
      primaryClass={physics.chem-ph}
}

```


## References

[1] Rogers, D. and Hahn, M., 2010. [Extended-connectivity fingerprints](https://pubs.acs.org/doi/abs/10.1021/ci100050t). Journal of Chemical Information and Modeling, 50(5), pp.742-754.

[2] Fey, M. and Lenssen, J.E., 2019. [Fast graph representation learning with PyTorch Geometric](https://arxiv.org/abs/1903.02428). arXiv preprint arXiv:1903.02428.

[3] Weininger, D., 1988. [SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules.](https://pubs.acs.org/doi/pdf/10.1021/ci00057a005) Journal of Chemical Information and Computer Sciences, 28(1), pp.31-36.

[4] Krenn, M., Häse, F., Nigam, A., Friederich, P. and Aspuru-Guzik, A., 2020. [Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation](https://iopscience.iop.org/article/10.1088/2632-2153/aba947/meta). Machine Learning: Science and Technology, 1(4), p.045024.

[5] Probst, D., Schwaller, P. and Reymond, J.L., 2022. [Reaction classification and yield prediction using the differential reaction fingerprint DRFP](https://pubs.rsc.org/en/content/articlehtml/2022/dd/d1dd00006c). Digital Discovery, 1(2), pp.91-97.

[6] Schwaller, P., Probst, D., Vaucher, A.C., Nair, V.H., Kreutter, D., Laino, T. and Reymond, J.L., 2021. [Mapping the space of chemical reactions using attention-based neural networks](https://www.nature.com/articles/s42256-020-00284-w). Nature Machine Intelligence, 3(2), pp.144-152.

[7] Jamasb, A., Viñas Torné, R., Ma, E., Du, Y., Harris, C., Huang, K., Hall, D., Lió, P. and Blundell, T., 2022. [Graphein - a Python library for geometric deep learning and network analysis on biomolecular structures and interaction networks](https://proceedings.neurips.cc/paper_files/paper/2022/hash/ade039c1db0391106a3375bd2feb310a-Abstract-Conference.html). Advances in Neural Information Processing Systems, 35, pp.27153-27167.

            
