| Field | Value |
|---|---|
| Name | antiberty |
| Version | 0.1.3 |
| home_page | |
| Summary | |
| upload_time | 2023-07-16 23:15:31 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# AntiBERTy
Official repository for AntiBERTy, an antibody-specific transformer language model pre-trained on 558M natural antibody sequences, as described in [Deciphering antibody affinity maturation with language models and weakly supervised learning](https://arxiv.org/abs/2112.07782).
## Setup
To use AntiBERTy, install via pip:
```bash
pip install antiberty
```
Alternatively, you can clone this repository and install the package locally:
```bash
$ git clone git@github.com:jeffreyruffolo/AntiBERTy.git
$ pip install ./AntiBERTy
```
## Usage
### Embeddings
To use AntiBERTy to generate sequence embeddings, use the `embed` function. The output is a list of embedding tensors, where each tensor is the embedding for the corresponding sequence. Each embedding has dimension `[(Length + 2) x 512]`.
```python
from antiberty import AntiBERTyRunner
antiberty = AntiBERTyRunner()
sequences = [
"EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
embeddings = antiberty.embed(sequences)
```
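As a quick sanity check on the shapes described above, the following sketch computes the shape each embedding is expected to have from the sequence lengths alone. The helper name `expected_embedding_shapes` and its parameters are illustrative and not part of the `antiberty` package; the hidden size of 512 and the two extra positions are taken from the description above.

```python
def expected_embedding_shapes(sequences, hidden_size=512, extra_positions=2):
    """Return the (rows, columns) shape expected for each sequence's embedding.

    Hypothetical helper, not part of antiberty: it only restates the
    [(Length + 2) x 512] rule from the text as arithmetic.
    """
    return [(len(seq) + extra_positions, hidden_size) for seq in sequences]


# Toy sequences keep the example short; real antibody sequences work the same way.
shapes = expected_embedding_shapes(["EVQLV", "DVVMTQT"])
print(shapes)  # [(7, 512), (9, 512)]
```

Comparing these tuples with `tuple(e.shape)` for each tensor returned by `embed` may help catch truncated inputs, assuming the returned tensors expose a `shape` attribute as PyTorch tensors do.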
To access the attention matrices, pass the `return_attention` flag to the `embed` function. The function then returns the embeddings together with a list of attention matrices, one per input sequence. Each attention matrix has dimension `[Layer x Heads x (Length + 2) x (Length + 2)]`.
```python
from antiberty import AntiBERTyRunner
antiberty = AntiBERTyRunner()
sequences = [
"EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
embeddings, attentions = antiberty.embed(sequences, return_attention=True)
```
The `embed` function can also be used with masked sequences. Masked residues should be indicated with underscores.
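A minimal sketch of how a masked input could be prepared is shown here. The helper `mask_positions` is hypothetical and not part of the `antiberty` package; it simply replaces the residues at the chosen 0-based positions with the underscore character that the text above describes.

```python
def mask_positions(sequence, positions, mask_char="_"):
    """Return a copy of `sequence` with the residues at `positions` masked.

    Hypothetical helper, not part of antiberty. Positions are 0-based
    indices into the sequence string.
    """
    residues = list(sequence)
    for pos in positions:
        if not 0 <= pos < len(residues):
            raise IndexError(f"position {pos} is outside the sequence")
        residues[pos] = mask_char
    return "".join(residues)


# Mask the first and fourth residues of a short toy sequence.
masked = mask_positions("EVQLVQSG", [0, 3])
print(masked)  # _VQ_VQSG
```

The resulting strings can then be passed in the `sequences` list in place of the unmasked ones.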
### Classification
To use AntiBERTy to predict the species and chain type of sequences, use the `classify` function. The output is two lists, one of species predictions and one of chain-type predictions, each with one entry per input sequence.
```python
from antiberty import AntiBERTyRunner
antiberty = AntiBERTyRunner()
sequences = [
"EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
species_preds, chain_preds = antiberty.classify(sequences)
```
The `classify` function can also be used with masked sequences. Masked residues should be indicated with underscores.
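Because the two prediction lists are aligned with the input, they can be combined into one record per sequence. The sketch below assumes each list holds one entry per sequence in input order; `tabulate_predictions` is a hypothetical helper, and the labels used here are placeholders rather than real model output.

```python
def tabulate_predictions(sequences, species_preds, chain_preds):
    """Combine sequences and their two predictions into one dict per sequence.

    Hypothetical helper, not part of antiberty. It assumes the three
    lists are aligned, with one entry per input sequence.
    """
    if not (len(sequences) == len(species_preds) == len(chain_preds)):
        raise ValueError("all three lists must have the same length")
    return [
        {"sequence": seq, "species": species, "chain": chain}
        for seq, species, chain in zip(sequences, species_preds, chain_preds)
    ]


# Placeholder labels for illustration only; they are not real model output.
rows = tabulate_predictions(
    ["EVQLV", "DVVMT"],
    ["species_a", "species_b"],
    ["chain_a", "chain_b"],
)
for row in rows:
    print(row["sequence"], row["species"], row["chain"])
```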
### Mask prediction
To use AntiBERTy to predict the identity of masked residues, use the `fill_masks` function. Masked residues should be indicated with underscores. The output is a list of filled sequences, corresponding to the input masked sequences.
```python
from antiberty import AntiBERTyRunner
antiberty = AntiBERTyRunner()
sequences = [
"____VQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGN_NYAQKFQERVTITRDM__STAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFD____GTMVTVS",
"DVVMTQTPFSLPV__GDQASISCRSSQSLVHSNGNTY_HWYLQKPGQSPKLLIYKVSNRFSGVPDRFSG_GSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGG__KLEIK",
]
filled_sequences = antiberty.fill_masks(sequences)
```
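To see which residues were predicted, a filled sequence can be compared with its masked input. This sketch assumes each filled sequence is a string of the same length as the masked input, aligned position by position; `filled_positions` is a hypothetical helper, not part of the `antiberty` package.

```python
def filled_positions(masked, filled, mask_char="_"):
    """List (index, residue) pairs for every position that was masked.

    Hypothetical helper, not part of antiberty. It assumes `filled` is
    aligned with `masked` and has the same length.
    """
    if len(masked) != len(filled):
        raise ValueError("masked and filled sequences must have equal length")
    return [(i, filled[i]) for i, ch in enumerate(masked) if ch == mask_char]


# Toy example with three masked positions.
print(filled_positions("__QLV_", "EVQLVQ"))  # [(0, 'E'), (1, 'V'), (5, 'Q')]
```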
### Pseudo log-likelihood
To use AntiBERTy to calculate the pseudo log-likelihood of a sequence, use the `pseudo_log_likelihood` function. The pseudo log-likelihood of a sequence is calculated as the average of per-residue masked log-likelihoods. The output is a list of pseudo log-likelihoods, corresponding to the input sequences.
```python
from antiberty import AntiBERTyRunner
antiberty = AntiBERTyRunner()
sequences = [
"EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"DVVMTQSSTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
pll = antiberty.pseudo_log_likelihood(sequences, batch_size=16)
```
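The averaging step described above can be illustrated with plain arithmetic. In this sketch, `residue_probs` stands for the probability the model assigns to the true residue at each position when that position is masked; the function name and the use of the natural logarithm are assumptions made for illustration, not a description of the package internals.

```python
import math


def average_log_likelihood(residue_probs):
    """Average the log-probabilities of the true residues.

    Hypothetical helper, not part of antiberty. Each entry of
    `residue_probs` is the probability of the true residue at one
    position, obtained with that position masked.
    """
    if not residue_probs:
        raise ValueError("at least one residue probability is required")
    return sum(math.log(p) for p in residue_probs) / len(residue_probs)


# Three positions, each predicted with probability 0.5.
print(round(average_log_likelihood([0.5, 0.5, 0.5]), 3))  # -0.693
```

Because the sum is divided by the number of residues, values stay comparable across sequences of different lengths; values closer to zero indicate that the model finds the sequence more plausible.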
## Citing this work
```bibtex
@article{ruffolo2021deciphering,
title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
journal = {arXiv preprint arXiv:2112.07782},
year = {2021}
}
```
Raw data
{
"_id": null,
"home_page": "",
"name": "antiberty",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/a1/3b/2cf48ec21956252fdc5c5dd1b7f8bb8b12f5208bd3eaaad412ced3ed0ff5/antiberty-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "",
"version": "0.1.3",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9769ef028f0b04dde139c4656ea81b398fd238800c770c372ad4ffb780eec973",
"md5": "d2c4ad0cd64116b2ffa38736ebe83356",
"sha256": "30d910992b190013871bac49cdc032e01a19339f7d2b958ab99b0eb44638352a"
},
"downloads": -1,
"filename": "antiberty-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d2c4ad0cd64116b2ffa38736ebe83356",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 96631471,
"upload_time": "2023-07-16T23:11:38",
"upload_time_iso_8601": "2023-07-16T23:11:38.781377Z",
"url": "https://files.pythonhosted.org/packages/97/69/ef028f0b04dde139c4656ea81b398fd238800c770c372ad4ffb780eec973/antiberty-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a13b2cf48ec21956252fdc5c5dd1b7f8bb8b12f5208bd3eaaad412ced3ed0ff5",
"md5": "d3f2c92a3d79f5395f6faab5569c3f02",
"sha256": "899a401e8b0ef9586d27713b4867aa26149ec0b63387d0be55164f458b6c3bad"
},
"downloads": -1,
"filename": "antiberty-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "d3f2c92a3d79f5395f6faab5569c3f02",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 96629175,
"upload_time": "2023-07-16T23:15:31",
"upload_time_iso_8601": "2023-07-16T23:15:31.959845Z",
"url": "https://files.pythonhosted.org/packages/a1/3b/2cf48ec21956252fdc5c5dd1b7f8bb8b12f5208bd3eaaad412ced3ed0ff5/antiberty-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-16 23:15:31",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "antiberty"
}