evo-prot-grad

Name: evo-prot-grad
Version: 0.1.4
Home page: https://github.nrel.gov/NREL/EvoProtGrad/
Summary: Directed evolution of proteins with fast gradient-based discrete MCMC.
Upload time: 2023-07-28 23:51:24
Author: Patrick Emami
Requires Python: >=3.8
License: BSD 3-Clause
Keywords: protein engineering, directed evolution, huggingface, protein language models, mcmc
# EvoProtGrad

[![PyPI version](https://badge.fury.io/py/evo-prot-grad.svg)](https://badge.fury.io/py/evo-prot-grad)
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

A Python package for directed **evo**lution on a **pro**tein sequence with **grad**ient-based discrete Markov chain Monte Carlo (MCMC). Users can compose custom protein models that map sequence to function with various pretrained models, including protein language models (PLMs), to guide and constrain the search. The library is designed to natively integrate with 🤗 HuggingFace and supports PLMs from the [transformers](https://huggingface.co/docs/transformers/index) library.

The underlying search technique is based on a variant of discrete MCMC that uses gradients of a *differentiable* compositional target function to rapidly explore a protein's fitness landscape in sequence space. 
We allow users to compose their own custom target function for MCMC by leveraging the Product of Experts MCMC paradigm.
Each model is an "expert" that contributes its own knowledge about the protein's fitness landscape to the overall target function.
Our MCMC sampler is designed to be more efficient and effective than brute-force and random search while retaining much of their generality and flexibility.
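
Schematically, this product-of-experts target takes the standard PoE form (the exact weighting in the paper may differ; see the publication for details):

$$
p(x) \;\propto\; \prod_{i=1}^{N} p_i(x)^{1/T_i},
\qquad
\log p(x) = \sum_{i=1}^{N} \frac{f_i(x)}{T_i} + \text{const},
$$

where $f_i(x) = \log p_i(x)$ is expert $i$'s score for sequence $x$ and $T_i$ is that expert's temperature.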
 
See our [publication](https://doi.org/10.1088/2632-2153/accacd) and our [documentation](https://nrel.github.io/EvoProtGrad) for more details.


## Installation

EvoProtGrad is available on PyPI and can be installed with pip:

```bash
pip install evo_prot_grad
```

If you wish to run tests or register a new expert model with EvoProtGrad, please clone this repo and install in editable mode as follows:

```bash
git clone https://github.com/NREL/EvoProtGrad.git
cd EvoProtGrad
pip install -e .
```

## Basic Usage

Create a `ProtBERT` expert from a pretrained HuggingFace protein language model (PLM) using `evo_prot_grad.get_expert`:

```python
import evo_prot_grad

prot_bert_expert = evo_prot_grad.get_expert('bert', temperature = 1.0)
```
The default BERT-style PLM in `EvoProtGrad` is `Rostlab/prot_bert`. Normally, we would also need to specify the model and tokenizer; when using a default PLM expert, these are pulled automatically from the HuggingFace Hub. The temperature parameter rescales the expert scores and can be used to trade off the importance of different experts. For masked language models like `prot_bert`, variant sequences are scored by default with the sum of amino acid log probabilities.
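
To use a non-default checkpoint, the model and tokenizer can be supplied explicitly. A minimal sketch, assuming `get_expert` accepts `model` and `tokenizer` keyword arguments as the prose above implies (consult the API docs for the exact signature; the `Rostlab/prot_bert_bfd` checkpoint is used purely as an example):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

import evo_prot_grad

# Load a BERT-style PLM checkpoint explicitly instead of relying
# on the default Rostlab/prot_bert.
model = AutoModelForMaskedLM.from_pretrained('Rostlab/prot_bert_bfd')
tokenizer = AutoTokenizer.from_pretrained('Rostlab/prot_bert_bfd')

prot_bert_expert = evo_prot_grad.get_expert(
    'bert',
    temperature = 1.0,
    model = model,          # assumed keyword argument
    tokenizer = tokenizer,  # assumed keyword argument
)
```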

Then, create an instance of `DirectedEvolution` and run the search, returning a list of the best variant per Markov chain (as measured by the `prot_bert` expert):

```python
variants, scores = evo_prot_grad.DirectedEvolution(
                   wt_fasta = 'test/gfp.fasta',    # path to wild type fasta file
                   output = 'best',                # return 'best', 'last', or 'all' variants
                   experts = [prot_bert_expert],   # list of experts to compose
                   parallel_chains = 1,            # number of parallel chains to run
                   n_steps = 20,                   # number of MCMC steps per chain
                   max_mutations = 10,             # maximum number of mutations per variant
                   verbose = True                  # print debug info to command line
)()
```
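
Because the target function is a product of experts, combining models only requires passing more experts in the list. A minimal sketch using the built-in `'esm'` expert (one of the experts listed below; chain counts, step counts, and temperatures here are illustrative):

```python
# Combine the PLM expert with an ESM-style expert; each expert's
# temperature-scaled score contributes to the product-of-experts target.
esm_expert = evo_prot_grad.get_expert('esm', temperature = 0.5)

variants, scores = evo_prot_grad.DirectedEvolution(
    wt_fasta = 'test/gfp.fasta',
    output = 'best',
    experts = [prot_bert_expert, esm_expert],
    parallel_chains = 4,
    n_steps = 100,
    max_mutations = 10,
)()

# Inspect the best variant found by each chain.
for variant, score in zip(variants, scores):
    print(f'{score:.3f}  {variant}')
```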

We provide a few experts in `evo_prot_grad/experts` that you can use out of the box, such as:

Protein Language Models (PLMs)

- `bert`, BERT-style PLMs, default: `Rostlab/prot_bert`
- `causallm`, CausalLM-style PLMs, default: `lightonai/RITA_s`
- `esm`, ESM-style PLMs, default: `facebook/esm2_t6_8M_UR50D`

Potts models

- `evcouplings`

and a generic expert for supervised downstream regression models

- `onehot_downstream_regression`

See `demo.ipynb` to get started right away in a Jupyter notebook.

## Citation

If you use EvoProtGrad in your research, please cite the following publication:

```bibtex
@article{emami2023plug,
  title={Plug \& play directed evolution of proteins with gradient-based discrete MCMC},
  author={Emami, Patrick and Perreault, Aidan and Law, Jeffrey and Biagioni, David and John, Peter St},
  journal={Machine Learning: Science and Technology},
  volume={4},
  number={2},
  pages={025014},
  year={2023},
  publisher={IOP Publishing}
}
```

            
