moebius

* Name: moebius
* Version: 0.3.dev3
* Summary: Python package for optimizing peptide sequences using Bayesian optimization (BO)
* Upload time: 2023-06-12 17:21:08
* Requires Python: <=3.10
* License: Apache-2.0
* Keywords: drug design, peptide, bayesian optimization, helm

# Mobius

A Python package for optimizing peptide sequences using Bayesian optimization (BO).

## Installation

I highly recommend installing Mamba (https://github.com/conda-forge/miniforge#mambaforge) if you want a clean Python environment. To install everything properly with `mamba`, run:

```bash
mamba env create -f environment.yaml -n mobius
mamba activate mobius
```

We can now install the `mobius` package from the PyPI index:
```bash
# This is not a mistake, the package is called moebius on PyPI
pip install moebius
```

You can also get it directly from the source code:
```bash
pip install git+https://git.scicore.unibas.ch/schwede/mobius.git@v0.3
```

## Quick tutorial

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#

import numpy as np
from mobius import Planner, SequenceGA
from mobius import Map4Fingerprint
from mobius import GPModel, ExpectedImprovement, TanimotoSimilarityKernel
from mobius import LinearPeptideEmulator
from mobius import homolog_scanning, alanine_scanning
from mobius import convert_FASTA_to_HELM
```

First, we set up a simple linear peptide emulator (oracle) for MHC class I A*0201. The Position Specific Scoring Matrices
(PSSM) can be downloaded from the [IEDB](http://tools.iedb.org/mhci/download/) database (see the
`Scoring matrices of SMM and SMMPMBEC` section). WARNING: This is for benchmarking purposes only. In a real
project, this step would be an actual lab experiment.
```python
pssm_files = ['IEDB_MHC_I-2.9_matx_smm_smmpmbec/smmpmbec_matrix/HLA-A-02:01-8.txt',
              'IEDB_MHC_I-2.9_matx_smm_smmpmbec/smmpmbec_matrix/HLA-A-02:01-9.txt',
              'IEDB_MHC_I-2.9_matx_smm_smmpmbec/smmpmbec_matrix/HLA-A-02:01-10.txt',
              'IEDB_MHC_I-2.9_matx_smm_smmpmbec/smmpmbec_matrix/HLA-A-02:01-11.txt']
lpe = LinearPeptideEmulator(pssm_files)
```

Now we define the peptide sequence we want to optimize:
```python
lead_peptide = convert_FASTA_to_HELM('HMTEVVRRC')[0]
```
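
To make the HELM representation less opaque, here is a minimal sketch of what such a conversion might produce for a linear peptide made of natural amino acids. The function `fasta_to_helm_sketch` is hypothetical and is not the `mobius` implementation; it only assumes the `PEPTIDE1{...}$$$$V2.0` layout that also appears in the scaffold definition of the design protocol in this tutorial. Unlike `convert_FASTA_to_HELM`, which is indexed with `[0]` above, this sketch returns a single string.

```python
def fasta_to_helm_sketch(sequence, polymer_id='PEPTIDE1'):
    """Build a HELM string for a linear peptide (illustrative only).

    Each one-letter residue becomes one monomer, separated by dots.
    Connections, annotations and other HELM sections are left empty.
    """
    monomers = '.'.join(sequence)
    return '%s{%s}$$$$V2.0' % (polymer_id, monomers)


lead_helm = fasta_to_helm_sketch('HMTEVVRRC')
print(lead_helm)
```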

Then we generate the first seed library of 96 peptides using a combination of two sequence-based strategies,
alanine scanning and homolog scanning:
```python
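# Start from the lead peptide itself
seed_library = [lead_peptide]

# Add every alanine-scanning variant of the lead peptide
for seq in alanine_scanning(lead_peptide):
    seed_library.append(seq)

# Fill the rest of the library with homolog-scanning variants
for seq in homolog_scanning(lead_peptide):
    seed_library.append(seq)

    if len(seed_library) >= 96:
        print('Reached the maximum number of peptides allowed.')
        break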
```
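
For readers unfamiliar with alanine scanning, the idea can be sketched on plain one-letter sequences: each position is replaced by alanine in turn, producing one variant per position. The function `alanine_scan_sketch` below is a hypothetical illustration, not the `mobius.alanine_scanning` implementation, which operates on HELM strings and may differ in ordering or in how it treats existing alanines.

```python
def alanine_scan_sketch(sequence):
    """Yield single-point alanine variants of a one-letter peptide sequence.

    Positions that already hold an alanine are skipped, since substituting
    them would only reproduce the parent sequence.
    """
    for i, residue in enumerate(sequence):
        if residue != 'A':
            yield sequence[:i] + 'A' + sequence[i + 1:]


variants = list(alanine_scan_sketch('HMTEVVRRC'))
for variant in variants:
    print(variant)
```

Homolog scanning follows the same pattern, except that each residue is swapped for chemically similar residues instead of alanine.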

The seed library is then virtually tested (Make/Test) using the linear peptide emulator we defined earlier.
WARNING: This is for benchmarking purposes only. This step is supposed to be an actual lab experiment.
```python
pic50_seed_library = lpe.predict(seed_library)
```
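
As a rough picture of what a PSSM-based emulator does, the sketch below scores a peptide by adding up one matrix entry per position plus an intercept. The matrix values, the intercept and the function `score_peptide` are all made up for illustration; they are not IEDB data and not the `LinearPeptideEmulator` code. Because a matrix has one column per position, a separate matrix is needed for each peptide length, which is consistent with the four files (8 to 11 residues) passed to the emulator above.

```python
# Hypothetical scoring matrix for 3-residue peptides (made-up numbers).
# Each residue maps to one score per position.
TOY_PSSM = {
    'A': [0.1, -0.2, 0.0],
    'L': [-0.3, -0.8, 0.2],
    'V': [0.0, -0.1, -0.6],
}
TOY_INTERCEPT = 4.0  # hypothetical constant offset


def score_peptide(sequence, pssm=TOY_PSSM, intercept=TOY_INTERCEPT):
    """Return the intercept plus the sum of position-specific scores."""
    n_positions = len(next(iter(pssm.values())))
    if len(sequence) != n_positions:
        raise ValueError('This matrix only scores peptides of length %d' % n_positions)
    return intercept + sum(pssm[res][pos] for pos, res in enumerate(sequence))


print('%.3f' % score_peptide('ALV'))
```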

Once we have results from our first lab experiment, we can start the Bayesian optimization (BO). First,
we define the molecular fingerprint we want to use, as well as the surrogate model (Gaussian Process),
the acquisition function (Expected Improvement) and the optimization method (SequenceGA).
```python
map4 = Map4Fingerprint(input_type='helm_rdkit', dimensions=4096, radius=1)
gpmodel = GPModel(kernel=TanimotoSimilarityKernel(), input_transformer=map4)
acq = ExpectedImprovement(gpmodel, maximize=False)
optimizer = SequenceGA(total_attempts=5)
```
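
To give some intuition for the acquisition function, here is a minimal sketch of the textbook Expected Improvement formula for a minimization problem, which is the direction selected by `maximize=False` above. It assumes the surrogate model returns a Gaussian prediction with mean `mu` and standard deviation `sigma` for a candidate. The function name `expected_improvement_min` is hypothetical, and the `mobius` implementation may differ in details such as an exploration parameter.

```python
import math


def expected_improvement_min(mu, sigma, best):
    """Expected Improvement when lower values are better.

    mu, sigma: predicted mean and standard deviation for a candidate.
    best: lowest value observed so far.
    """
    improvement = best - mu
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf


# A candidate predicted to be better than the current best scores higher
# than one predicted to be worse, at equal uncertainty.
print(expected_improvement_min(mu=0.0, sigma=1.0, best=1.0))
print(expected_improvement_min(mu=2.0, sigma=1.0, best=1.0))
```

The second term rewards uncertainty, which is why a candidate with a mediocre predicted mean but a large `sigma` can still be selected for testing.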

Next, we define the search protocol in a YAML configuration file (`design_protocol.yaml`) that will be used
to optimize the peptide sequence. This file defines the design protocol, in which you need
to define the peptide scaffold (linear here). Additionally, you can specify the sets of monomers to be used at
specific positions during the optimization. You can also define filtering criteria to remove peptide sequences
that might exhibit problematic properties during synthesis, such as self-aggregation or poor solubility.

```YAML
design:
  monomers: 
    default: [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y]
    APOLAR: [A, F, G, I, L, P, V, W]
    POLAR: [C, D, E, H, K, N, Q, R, S, T, M]
    AROMATIC: [F, H, W, Y]
    POS_CHARGED: [K, R]
    NEG_CHARGED: [D, E]
  scaffolds:
    - PEPTIDE1{X.M.X.X.X.X.X.X.X}$$$$V2.0:
        PEPTIDE1:
          1: [AROMATIC, NEG_CHARGED]
          4: POLAR
          9: [A, V, I, L, M, T]
filters:
  - class_path: mobius.PeptideSelfAggregationFilter
  - class_path: mobius.PeptideSolubilityFilter
    init_args:
      hydrophobe_ratio: 0.5
      charged_per_amino_acids: 5

```
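
The sketch below shows one plausible reading of this protocol by expanding the per-position constraints into explicit monomer lists and counting the resulting design space. It rests on assumptions that are not confirmed by this README: that `X` marks a variable position, that any other symbol in the scaffold (here `M` at position 2) is fixed, and that unlisted positions fall back to the `default` set. The dictionaries simply mirror the YAML above and the helper `allowed_monomers` is hypothetical; the filters are ignored.

```python
from math import prod

# Monomer groups copied from the YAML design protocol
MONOMERS = {
    'default': list('ACDEFGHIKLMNPQRSTVWY'),
    'APOLAR': list('AFGILPVW'),
    'POLAR': list('CDEHKNQRSTM'),
    'AROMATIC': list('FHWY'),
    'POS_CHARGED': list('KR'),
    'NEG_CHARGED': list('DE'),
}
SCAFFOLD = 'X.M.X.X.X.X.X.X.X'.split('.')
CONSTRAINTS = {
    1: ['AROMATIC', 'NEG_CHARGED'],
    4: 'POLAR',
    9: ['A', 'V', 'I', 'L', 'M', 'T'],
}


def allowed_monomers(position):
    """Return the monomers allowed at a 1-based scaffold position."""
    symbol = SCAFFOLD[position - 1]
    if symbol != 'X':
        return [symbol]  # assumed to be a fixed position
    entries = CONSTRAINTS.get(position, 'default')
    if isinstance(entries, str):
        entries = [entries]
    allowed = []
    for entry in entries:
        # An entry is either a group name or a single monomer symbol
        for monomer in MONOMERS.get(entry, [entry]):
            if monomer not in allowed:
                allowed.append(monomer)
    return allowed


design_space_size = prod(len(allowed_monomers(p)) for p in range(1, len(SCAFFOLD) + 1))
print(allowed_monomers(1))
print(design_space_size)
```

Under these assumptions the protocol restricts the search to roughly 1.3 billion sequences, compared with 20^9 (about 512 billion) for an unconstrained 9-mer.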

Once the acquisition function and surrogate model are defined and the parameters are set in the YAML
configuration file, we can instantiate the planner.
```python
ps = Planner(acq, optimizer, design_protocol='design_protocol.yaml')
```

Now it is time to run the optimization.

```python
# Copy the seed data so the original sequences and scores are left untouched
peptides = list(seed_library)
pic50_scores = list(pic50_seed_library)

# Here we are going to do 3 DMT (Design-Make-Test) cycles
for i in range(3):
    # Run the optimization and recommend 96 new peptides based on existing data
    # Note: the method name is spelled `recommand`, as used here
    suggested_peptides, _ = ps.recommand(peptides, pic50_scores, batch_size=96)

    # Here you can add whatever methods you want to further filter out peptides
    
    # Get the pIC50 (Make/Test) of all the suggested peptides using the MHC emulator
    # WARNING: This is for benchmarking purposes only. This
    # step is supposed to be an actual lab experiment.
    pic50_suggested_peptides = lpe.predict(suggested_peptides)
    
    # Add all the new data
    peptides.extend(list(suggested_peptides))
    pic50_scores.extend(list(pic50_suggested_peptides))
    
    best_seq = peptides[np.argmin(pic50_scores)]
    best_pic50 = np.min(pic50_scores)
    print('Best peptide found so far: %s / %.3f' % (best_seq, best_pic50))
    print('')
```

## Documentation

The installation instructions, documentation and tutorials can be found on [readthedocs.org](https://mobius.readthedocs.io/en/latest/).

## Citation

* [J. Eberhardt, M. Lill, T. Schwede. (2023). Combining Bayesian optimization with sequence- or structure-based strategies for optimization of peptide-binding protein.](https://doi.org/10.26434/chemrxiv-2023-b7l81)

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "moebius",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<=3.10",
    "maintainer_email": "",
    "keywords": "drug design,peptide,Bayesian optimization,HELM",
    "author": "",
    "author_email": "Jerome Eberhardt <jerome.eberhardt@unibas.ch>, Markus Lill <markus.lill@unibas.ch>, Torsten Schwede <tosten.schwede@unibas.ch>",
    "download_url": "https://files.pythonhosted.org/packages/4a/14/037c81c5ae90fd6454c8970036dd705506f359283575a1eae301bfa5d0ad/moebius-0.3.dev3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Python package for optimizing peptide sequences using Bayesian optimization (BO)",
    "version": "0.3.dev3",
    "project_urls": {
        "Documentation": "https://mobius.readthedocs.io/en/latest/",
        "Homepage": "https://git.scicore.unibas.ch/schwede/mobius",
        "Source": "https://git.scicore.unibas.ch/schwede/mobius"
    },
    "split_keywords": [
        "drug design",
        "peptide",
        "bayesian optimization",
        "helm"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "079947002785fbcd252992fa14b81cb82a53f724e280f1dfa4137d589372d8d6",
                "md5": "9d23f9b5b01639407fafb4b414b151b4",
                "sha256": "936c3cd302cfd9eb465a9a007e4e8ab6e6e3db82cc05e4c4fe5dbaa6cf6658e3"
            },
            "downloads": -1,
            "filename": "moebius-0.3.dev3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9d23f9b5b01639407fafb4b414b151b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.10",
            "size": 8183,
            "upload_time": "2023-06-12T17:21:06",
            "upload_time_iso_8601": "2023-06-12T17:21:06.591149Z",
            "url": "https://files.pythonhosted.org/packages/07/99/47002785fbcd252992fa14b81cb82a53f724e280f1dfa4137d589372d8d6/moebius-0.3.dev3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4a14037c81c5ae90fd6454c8970036dd705506f359283575a1eae301bfa5d0ad",
                "md5": "2503f023b45dfbfda5173bfcad67b751",
                "sha256": "fc407779e36afb592d2cbcec657b7f3a17dcd56eee0cdb1067c9f6001788dbba"
            },
            "downloads": -1,
            "filename": "moebius-0.3.dev3.tar.gz",
            "has_sig": false,
            "md5_digest": "2503f023b45dfbfda5173bfcad67b751",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.10",
            "size": 8365,
            "upload_time": "2023-06-12T17:21:08",
            "upload_time_iso_8601": "2023-06-12T17:21:08.717378Z",
            "url": "https://files.pythonhosted.org/packages/4a/14/037c81c5ae90fd6454c8970036dd705506f359283575a1eae301bfa5d0ad/moebius-0.3.dev3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-12 17:21:08",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "moebius"
}
        