ChemSpaceAL


Name: ChemSpaceAL
Version: 2.0.1
Home page: https://github.com/batistagroup/ChemSpaceAL
Summary: ChemSpaceAL Python package: an efficient active learning methodology applied to protein-specific molecular generation
Upload time: 2024-02-24 04:04:03
Author: Gregory W. Kyro, Anton Morgunov & Rafael I. Brent
Requires Python: >=3.10
Keywords: active learning, artificial intelligence, deep learning, machine learning, molecular generation, drug discovery
Requirements: No requirements were recorded.

# ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

[![](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![codecov](https://codecov.io/gh/batistagroup/ChemSpaceAL/graph/badge.svg?token=ROJSISYJWC)](https://codecov.io/gh/batistagroup/ChemSpaceAL)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/batistagroup/ChemSpaceAL/blob/main/LICENSE)
[![image](https://img.shields.io/pypi/v/ChemSpaceAL.svg)](https://pypi.org/project/ChemSpaceAL/)
<a target="_blank" href="https://colab.research.google.com/github/batistagroup/ChemSpaceAL/blob/main/ChemSpaceAL.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

![A description of the active learning methodology](media/toc_figure.jpg)

## Abstract

The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.

## Preprint

The associated preprint is available on [arXiv](https://arxiv.org/abs/2309.05853). A second version of the preprint was posted on Dec 4, 2023.

## Installation

To install the [ChemSpaceAL package](https://pypi.org/project/ChemSpaceAL/), run:

```
pip install ChemSpaceAL
```

You can also open [ChemSpaceAL.ipynb in Google Colab](https://colab.research.google.com/github/batistagroup/ChemSpaceAL/blob/main/ChemSpaceAL.ipynb) to see an example of how to use the package.
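
After installing, it can be useful to confirm that the interpreter meets the Python 3.10+ requirement and that the distribution is visible to it. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and are not part of ChemSpaceAL.

```python
import sys
from importlib import metadata

# Minimum interpreter version, taken from the package's ">=3.10" requirement.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_version(dist_name="ChemSpaceAL"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helpers, not part of the ChemSpaceAL API.
print("Python supported:", python_is_supported())
print("ChemSpaceAL version:", installed_version())
```

If the second line prints `None`, the package was most likely installed into a different environment than the one running the script.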

## Contact

Please reach out to us at any of the following email addresses if you have questions or need additional files:

- <gregory.kyro@yale.edu>
- <anton.morgunov@yale.edu>
- <rafi.brent@yale.edu>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/batistagroup/ChemSpaceAL",
    "name": "ChemSpaceAL",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "active learning,artificial intelligence,deep learning,machine learning,molecular generation,drug discovery",
    "author": "Gregory W. Kyro, Anton Morgunov & Rafael I. Brent",
    "author_email": "gregory.kyro@yale.edu",
    "download_url": "https://files.pythonhosted.org/packages/b2/73/0b447f1bd04a93b2ae78355055b364f9a77376b04373120199a3897a3861/ChemSpaceAL-2.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "ChemSpaceAL Python package: an efficient active learning methodology applied to protein-specific molecular generation",
    "version": "2.0.1",
    "project_urls": {
        "Download": "https://github.com/gregory-kyro/ChemSpaceAL/archive/refs/tags/v1.0.3.tar.gz",
        "Homepage": "https://github.com/batistagroup/ChemSpaceAL"
    },
    "split_keywords": [
        "active learning",
        "artificial intelligence",
        "deep learning",
        "machine learning",
        "molecular generation",
        "drug discovery"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4d383bcf337b038ac45b9d16736f6e2b562470ee3f97d5bf7773ca217c30f32d",
                "md5": "c2722e7c6db62a1aa5b4114d373ff5d5",
                "sha256": "088e741bf70035203519fe5a6d1ea4ab149c37bee39c5bde4fcae97ca167ca77"
            },
            "downloads": -1,
            "filename": "ChemSpaceAL-2.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c2722e7c6db62a1aa5b4114d373ff5d5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 32087,
            "upload_time": "2024-02-24T04:04:01",
            "upload_time_iso_8601": "2024-02-24T04:04:01.858498Z",
            "url": "https://files.pythonhosted.org/packages/4d/38/3bcf337b038ac45b9d16736f6e2b562470ee3f97d5bf7773ca217c30f32d/ChemSpaceAL-2.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b2730b447f1bd04a93b2ae78355055b364f9a77376b04373120199a3897a3861",
                "md5": "f8b70ff205fed41c784130edce7c80f2",
                "sha256": "c431afd86c1cef3c37d4d31e8a047ecdad955542d56cc74b18b895e3dca0948a"
            },
            "downloads": -1,
            "filename": "ChemSpaceAL-2.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f8b70ff205fed41c784130edce7c80f2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 28151,
            "upload_time": "2024-02-24T04:04:03",
            "upload_time_iso_8601": "2024-02-24T04:04:03.581455Z",
            "url": "https://files.pythonhosted.org/packages/b2/73/0b447f1bd04a93b2ae78355055b364f9a77376b04373120199a3897a3861/ChemSpaceAL-2.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-24 04:04:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "batistagroup",
    "github_project": "ChemSpaceAL",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "chemspaceal"
}
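
The `digests` entries in the record above can be used to check a manually downloaded file. The following is a minimal sketch, assuming the wheel has been saved locally; the function names and the local path in the usage comment are illustrative, and the expected hash is the `sha256` value listed for `ChemSpaceAL-2.0.1-py3-none-any.whl`.

```python
import hashlib

# sha256 digest listed in the record for ChemSpaceAL-2.0.1-py3-none-any.whl.
EXPECTED_SHA256 = "088e741bf70035203519fe5a6d1ea4ab149c37bee39c5bde4fcae97ca167ca77"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (the local path is an assumption):
# matches_expected("ChemSpaceAL-2.0.1-py3-none-any.whl")
```

Installing with `pip` from the package index does not require this manual step; the check is mainly useful when a file has been obtained through another channel.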
        