# Sapiens: Human antibody language model
```
  ____              _
 / ___|  __ _ _ __ (_) ___ _ __  ___
 \___ \ / _` | '_ \| |/ _ \ '_ \/ __|
  ___| | |_| | |_| | |  __/ | | \__ \
 |____/ \__,_|  __/|_|\___|_| |_|___/
             |_|
```
<p>
<img src="https://github.com/Merck/Sapiens/actions/workflows/python-package-conda.yml/badge.svg"
alt="Build &amp; Test">
<a href="https://pypi.org/project/sapiens/">
<img src="https://img.shields.io/pypi/dm/sapiens"
alt="Pip Install"></a>
<a href="https://github.com/Merck/Sapiens/releases">
<img src="https://img.shields.io/pypi/v/sapiens"
alt="Latest release"></a>
</p>
Sapiens is a human antibody language model based on BERT.

Learn more about Sapiens, OASis and BioPhi in our publication:

> David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022)
> BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203

For more information about BioPhi, see the [BioPhi repository](https://github.com/Merck/BioPhi).
## Features
- Infilling missing residues in human antibody sequences
- Suggesting mutations (in frameworks as well as CDRs)
- Creating vector representations (embeddings) of residues or sequences
![Sapiens Antibody t-SNE Example](notebooks/Embedding_t-SNE.png)
## Usage
Install Sapiens using pip:
```bash
# Recommended: Create dedicated conda environment
conda create -n sapiens python=3.8
conda activate sapiens
# Install Sapiens
pip install sapiens
```
❗️ Python 3.7 or 3.8 is currently required due to a fairseq bug in Python 3.9 and above: https://github.com/pytorch/fairseq/issues/3535
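
To catch an unsupported interpreter before importing the package, a script can check the running Python version first. This is a minimal sketch based only on the version note above; `is_supported_python` is a hypothetical helper, not part of Sapiens:

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 3.7 or 3.8, the versions named in the note above."""
    return tuple(version_info[:2]) in {(3, 7), (3, 8)}


if not is_supported_python():
    print("Warning: Sapiens currently requires Python 3.7 or 3.8")
```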
### Antibody sequence infilling
Positions marked with `*` or `X` will be infilled with the most likely human residues, given the rest of the sequence:
```python
import sapiens
best = sapiens.predict_masked(
    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
    'H'
)
print(best)
# QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS
```
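
When the positions to infill are known by index rather than already marked in the string, the masked input can be built programmatically. The sketch below assumes 0-based positions; `mask_positions` is a hypothetical helper written for illustration, not a Sapiens function:

```python
def mask_positions(sequence, positions, mask_char="*"):
    """Return a copy of `sequence` with the given 0-based positions masked."""
    residues = list(sequence)
    for pos in positions:
        if not 0 <= pos < len(residues):
            raise IndexError(f"position {pos} is outside the sequence")
        residues[pos] = mask_char
    return "".join(residues)


# Mask the first two residues and the sixth one of a short fragment
masked = mask_positions("QVQLVQSGVEVKKPGAS", [0, 1, 5])
print(masked)
# **QLV*SGVEVKKPGAS
```

The resulting string can then be passed to `sapiens.predict_masked` together with the chain type, as in the example above.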
### Suggesting mutations
Return residue scores for a given sequence:
```python
import sapiens
scores = sapiens.predict_scores(
    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
    'H'
)
scores.head()
# A C D E ...
# 0 0.003272 0.004147 0.004011 0.004590 ... <- based on masked input
# 1 0.012038 0.003854 0.006803 0.008174 ... <- based on masked input
# 2 0.003384 0.003895 0.003726 0.004068 ... <- based on Q input
# 3 0.004612 0.005325 0.004443 0.004641 ... <- based on L input
# 4 0.005519 0.003664 0.003555 0.005269 ... <- based on V input
#
# Scores are given both for residues that are masked and that are present.
# When inputting a non-human antibody sequence, the output scores can be used for humanization.
```
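
One way to turn the score table into mutation suggestions is to compare, at each position, the highest-scoring residue with the residue present in the input. The sketch below is one possible approach, not the humanization procedure used by BioPhi. `suggest_mutations` and its `min_gain` threshold are hypothetical, and the function works on plain dictionaries so that it runs without the model:

```python
def suggest_mutations(sequence, score_rows, min_gain=0.0):
    """List positions where another residue scores higher than the input one.

    `score_rows` holds one dict per position, mapping residue letter to score.
    Masked positions (* or X) are skipped because they have no input residue.
    """
    suggestions = []
    for pos, (current, row) in enumerate(zip(sequence, score_rows)):
        if current in "*X":
            continue
        best = max(row, key=row.get)
        gain = row[best] - row.get(current, 0.0)
        if best != current and gain > min_gain:
            suggestions.append((pos, current, best))
    return suggestions


# Toy scores for a 3-residue sequence (made-up numbers, not model output)
toy_rows = [
    {"A": 0.1, "Q": 0.8, "V": 0.1},
    {"A": 0.2, "Q": 0.1, "V": 0.7},
    {"A": 0.6, "Q": 0.3, "V": 0.1},
]
print(suggest_mutations("QVQ", toy_rows))
# [(2, 'Q', 'A')]
```

Assuming `scores` is a pandas DataFrame with one column per residue letter, as the output above suggests, the rows can be obtained with `scores.to_dict("records")`.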
### Antibody sequence embedding
Get a vector representation of each position in a sequence:
```python
import sapiens
residue_embed = sapiens.predict_residue_embedding(
    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS',
    'H',
    layer=None
)
residue_embed.shape
# (layer, position in sequence, features)
# (5, 119, 128)
```
Get a single vector for each sequence:
```python
import sapiens

seq_embed = sapiens.predict_sequence_embedding(
    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS',
    'H',
    layer=None
)
seq_embed.shape
# (layer, features)
# (5, 128)
```
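
Sequence embeddings are often compared using a similarity measure. The sketch below computes cosine similarity between two vectors in pure Python. `cosine_similarity` is a hypothetical helper, not part of Sapiens, and cosine similarity is one common choice rather than a measure recommended by the authors:

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two equal-length 1-D vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(float(a) ** 2 for a in vec_a))
    norm_b = math.sqrt(sum(float(b) ** 2 for b in vec_b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors (not model output)
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))
# 0.7071
```

With two embeddings of shape `(5, 128)` as shown above, a single row along the layer axis can be compared, for example `cosine_similarity(seq_embed_a[-1], seq_embed_b[-1])`. Here `seq_embed_a` and `seq_embed_b` are placeholder names, and which layer index works best is an assumption to verify for your use case.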
### Notebooks
Try out Sapiens in your browser using these example notebooks:
<table>
<tr><th>Links</th><th>Notebook</th><th>Description</th></tr>
<tr>
<td>
<a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F01_sapiens_antibody_infilling.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a>
</td>
<td><a href="notebooks/01_sapiens_antibody_infilling.ipynb">01_sapiens_antibody_infilling</a></td>
<td>Predict missing positions in an antibody sequence</td>
</tr>
<tr>
<td>
<a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F02_sapiens_antibody_embedding.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a>
</td>
<td><a href="notebooks/02_sapiens_antibody_embedding.ipynb">02_sapiens_antibody_embedding</a></td>
<td>Get vector representations and visualize them using t-SNE</td>
</tr>
</table>
## Acknowledgements
Sapiens is based on antibody repertoires from the Observed Antibody Space:
> Kovaltsuk, A., Leem, J., Kelm, S., Snowden, J., Deane, C. M., & Krawczyk, K. (2018). Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology, 201(8), 2502–2509. https://doi.org/10.4049/jimmunol.1800708
Raw data
{
"_id": null,
"home_page": "https://github.com/Merck/Sapiens",
"name": "sapiens",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "sapiens,antibody humanization,bert,biophi",
"author": "David Prihoda",
"author_email": "david.prihoda@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/37/b1/38ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d/sapiens-1.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Sapiens: Human antibody language model based on BERT",
"version": "1.0.4",
"split_keywords": [
"sapiens",
"antibody humanization",
"bert",
"biophi"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dc47f29f6d317ed8fb88249c91f30b6756e1fad2a49e873d7e60a739838b4e0c",
"md5": "7a4f1410d1b5f1cfca22af1d68102b57",
"sha256": "36466bdc8caef2ba148c5051affafdd1c2938d0ab9f35c24f2febfb11a1f96b2"
},
"downloads": -1,
"filename": "sapiens-1.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7a4f1410d1b5f1cfca22af1d68102b57",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 6965664,
"upload_time": "2023-04-19T13:15:49",
"upload_time_iso_8601": "2023-04-19T13:15:49.392400Z",
"url": "https://files.pythonhosted.org/packages/dc/47/f29f6d317ed8fb88249c91f30b6756e1fad2a49e873d7e60a739838b4e0c/sapiens-1.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "37b138ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d",
"md5": "c485d2b8ccb40077f68003f846eac0c6",
"sha256": "805e620398078fa0ea08bbce5493e56acf42653c9f08908a7bcbaa7553daa00b"
},
"downloads": -1,
"filename": "sapiens-1.0.4.tar.gz",
"has_sig": false,
"md5_digest": "c485d2b8ccb40077f68003f846eac0c6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 6928648,
"upload_time": "2023-04-19T13:15:56",
"upload_time_iso_8601": "2023-04-19T13:15:56.368720Z",
"url": "https://files.pythonhosted.org/packages/37/b1/38ee24c99f7700fffdb1d2e6aef25ca6cb0c9510095265511f4985208e5d/sapiens-1.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-19 13:15:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "Merck",
"github_project": "Sapiens",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sapiens"
}