sapiens


- Name: sapiens
- Version: 1.0.4
- Home page: https://github.com/Merck/Sapiens
- Summary: Sapiens: Human antibody language model based on BERT
- Upload time: 2023-04-19 13:15:56
- Author: David Prihoda
- Requires Python: >=3.7
- License: MIT
- Keywords: sapiens, antibody humanization, bert, biophi

# Sapiens: Human antibody language model

```
    ____              _                
   / ___|  __ _ _ __ (_) ___ _ __  ___ 
   \___ \ / _` | '_ \| |/ _ \ '_ \/ __|
    ___| | |_| | |_| | |  __/ | | \__ \
   |____/ \__,_|  __/|_|\___|_| |_|___/
               |_|                    
```

<p>
<img src="https://github.com/Merck/Sapiens/actions/workflows/python-package-conda.yml/badge.svg"
    alt="Build & Test">
<a href="https://pypi.org/project/sapiens/">
    <img src="https://img.shields.io/pypi/dm/sapiens"
        alt="Pip Install"></a>
<a href="https://github.com/Merck/Sapiens/releases">
    <img src="https://img.shields.io/pypi/v/sapiens"
        alt="Latest release"></a>
</p>

Sapiens is a human antibody language model based on BERT.

Learn more about Sapiens, OASis and BioPhi in our publication:

> David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022) 
> BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203


For more information about BioPhi, see the [BioPhi repository](https://github.com/Merck/BioPhi).

## Features

- Infilling missing residues in human antibody sequences
- Suggesting mutations (in frameworks as well as CDRs)
- Creating vector representations (embeddings) of residues or sequences

![Sapiens Antibody t-SNE Example](notebooks/Embedding_t-SNE.png)

## Usage

Install Sapiens using pip:

```bash
# Recommended: Create dedicated conda environment
conda create -n sapiens python=3.8
conda activate sapiens
# Install Sapiens
pip install sapiens
```

❗️ Python 3.7 or 3.8 is currently required due to a fairseq bug that affects Python 3.9 and above: https://github.com/pytorch/fairseq/issues/3535

### Antibody sequence infilling

Positions marked with `*` or `X` are infilled with the most likely human residues, given the rest of the sequence:

```python
import sapiens

best = sapiens.predict_masked(
    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
    'H'
)
print(best)
# QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS
```
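
The masked input above can also be built programmatically. The following is a minimal sketch using only the standard library; `mask_positions` and `infilled_residues` are hypothetical helper names, not part of the Sapiens API. They mask chosen 0-based positions with `*` and then list which residues were filled in, given the masked input and an infilled sequence of the same length.

```python
def mask_positions(sequence, positions, mask="*"):
    """Return a copy of `sequence` with the given 0-based positions masked."""
    chars = list(sequence)
    for pos in positions:
        chars[pos] = mask
    return "".join(chars)


def infilled_residues(masked, infilled, masks=("*", "X")):
    """List (position, residue) pairs for positions that were masked in the input."""
    if len(masked) != len(infilled):
        raise ValueError("sequences must have the same length")
    return [
        (pos, new)
        for pos, (old, new) in enumerate(zip(masked, infilled))
        if old in masks
    ]


# Mask positions 0, 1 and 5 of a short toy fragment.
masked = mask_positions("QVQLVQSGVEVKKPGAS", [0, 1, 5])
print(masked)
# **QLV*SGVEVKKPGAS
```

The masked string could then be passed to `sapiens.predict_masked(masked, 'H')` as in the example above, and `infilled_residues(masked, best)` would report the positions the model filled in.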

### Suggesting mutations

Return residue scores for a given sequence:

```python
import sapiens

scores = sapiens.predict_scores(
    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
    'H'
)
scores.head()
#           A         C         D         E  ...
# 0  0.003272  0.004147  0.004011  0.004590  ... <- based on masked input
# 1  0.012038  0.003854  0.006803  0.008174  ... <- based on masked input
# 2  0.003384  0.003895  0.003726  0.004068  ... <- based on Q input
# 3  0.004612  0.005325  0.004443  0.004641  ... <- based on L input
# 4  0.005519  0.003664  0.003555  0.005269  ... <- based on V input
#
# Scores are given both for residues that are masked and that are present. 
# When inputting a non-human antibody sequence, the output scores can be used for humanization.
```
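
One way to turn these scores into mutation suggestions is to take the highest-scoring residue at each position and report the positions where it differs from the input. The sketch below is illustrative only: `suggest_mutations` is a hypothetical helper, the scores are toy values, and each row is a plain dict mapping residue to score. Assuming `predict_scores` returns a pandas DataFrame with one column per residue (as the `scores.head()` output suggests), such rows could be obtained with `scores.to_dict("records")`.

```python
def suggest_mutations(sequence, score_rows):
    """Return (position, original, suggested, score) wherever the
    top-scoring residue differs from the input character.

    Masked positions ('*' or 'X') are always reported, because the
    mask character never matches a residue column.
    """
    suggestions = []
    for pos, (original, row) in enumerate(zip(sequence, score_rows)):
        best = max(row, key=row.get)  # residue with the highest score
        if best != original:
            suggestions.append((pos, original, best, row[best]))
    return suggestions


# Toy data: three positions, three candidate residues (not real model output).
toy_sequence = "QKL"
toy_scores = [
    {"Q": 0.90, "K": 0.05, "L": 0.05},
    {"Q": 0.10, "K": 0.20, "L": 0.70},
    {"Q": 0.02, "K": 0.03, "L": 0.95},
]
print(suggest_mutations(toy_sequence, toy_scores))
# [(1, 'K', 'L', 0.7)]
```

Taking the single best residue is the simplest policy; a score threshold or a restriction to framework positions may be more appropriate in practice.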

### Antibody sequence embedding

Get a vector representation of each position in a sequence:

```python
import sapiens

residue_embed = sapiens.predict_residue_embedding(
    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS', 
    'H', 
    layer=None
)
residue_embed.shape
# (layer, position in sequence, features)
# (5, 119, 128)
```

Get a single vector for each sequence:

```python
seq_embed = sapiens.predict_sequence_embedding(
    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS', 
    'H', 
    layer=None
)
seq_embed.shape
# (layer, features)
# (5, 128)
```
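
Sequence embeddings are often compared with cosine similarity. The sketch below is a standard-library illustration on toy vectors, not part of the Sapiens API; `cosine_similarity` is a hypothetical helper. With Sapiens, the inputs could be single-layer vectors such as `seq_embed[-1]` for two different sequences, assuming the returned object is array-like as the `.shape` output suggests.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for two sequence embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```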

### Notebooks

Try out Sapiens in your browser using these example notebooks:

<table>
    <tr><th>Links</th><th>Notebook</th><th>Description</th></tr>
    <tr>
        <td>
            <a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F01_sapiens_antibody_infilling.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a>
        </td>
        <td><a href="notebooks/01_sapiens_antibody_infilling.ipynb">01_sapiens_antibody_infilling</a></td>
        <td>Predict missing positions in an antibody sequence</td>
    </tr>
    <tr>
        <td>
            <a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F02_sapiens_antibody_embedding.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a>
        </td>
        <td><a href="notebooks/02_sapiens_antibody_embedding.ipynb">02_sapiens_antibody_embedding</a></td>
        <td>Get vector representations and visualize them using t-SNE</td>
    </tr>
</table>


## Acknowledgements

Sapiens is based on antibody repertoires from the Observed Antibody Space:

> Kovaltsuk, A., Leem, J., Kelm, S., Snowden, J., Deane, C. M., & Krawczyk, K. (2018). Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology, 201(8), 2502–2509. https://doi.org/10.4049/jimmunol.1800708

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Merck/Sapiens",
    "name": "sapiens",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "sapiens,antibody humanization,bert,biophi",
    "author": "David Prihoda",
    "author_email": "david.prihoda@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/37/b1/38ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d/sapiens-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Sapiens: Human antibody language model based on BERT",
    "version": "1.0.4",
    "split_keywords": [
        "sapiens",
        "antibody humanization",
        "bert",
        "biophi"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dc47f29f6d317ed8fb88249c91f30b6756e1fad2a49e873d7e60a739838b4e0c",
                "md5": "7a4f1410d1b5f1cfca22af1d68102b57",
                "sha256": "36466bdc8caef2ba148c5051affafdd1c2938d0ab9f35c24f2febfb11a1f96b2"
            },
            "downloads": -1,
            "filename": "sapiens-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7a4f1410d1b5f1cfca22af1d68102b57",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 6965664,
            "upload_time": "2023-04-19T13:15:49",
            "upload_time_iso_8601": "2023-04-19T13:15:49.392400Z",
            "url": "https://files.pythonhosted.org/packages/dc/47/f29f6d317ed8fb88249c91f30b6756e1fad2a49e873d7e60a739838b4e0c/sapiens-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "37b138ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d",
                "md5": "c485d2b8ccb40077f68003f846eac0c6",
                "sha256": "805e620398078fa0ea08bbce5493e56acf42653c9f08908a7bcbaa7553daa00b"
            },
            "downloads": -1,
            "filename": "sapiens-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "c485d2b8ccb40077f68003f846eac0c6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 6928648,
            "upload_time": "2023-04-19T13:15:56",
            "upload_time_iso_8601": "2023-04-19T13:15:56.368720Z",
            "url": "https://files.pythonhosted.org/packages/37/b1/38ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d/sapiens-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-19 13:15:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "Merck",
    "github_project": "Sapiens",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sapiens"
}
        