antiberty

Name: antiberty
Version: 0.1.3
Upload time: 2023-07-16 23:15:31
Requirements: none recorded
# AntiBERTy
Official repository for AntiBERTy, an antibody-specific transformer language model pre-trained on 558M natural antibody sequences, as described in [Deciphering antibody affinity maturation with language models and weakly supervised learning](https://arxiv.org/abs/2112.07782).


## Setup
To use AntiBERTy, install via pip:
```bash
pip install antiberty
```

Alternatively, you can clone this repository and install the package locally:
```bash
git clone git@github.com:jeffreyruffolo/AntiBERTy.git
pip install ./AntiBERTy
```

## Usage

### Embeddings

To use AntiBERTy to generate sequence embeddings, use the `embed` function. The output is a list of embedding tensors, where each tensor is the embedding for the corresponding sequence. Each embedding has dimension `[(Length + 2) x 512]`.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
embeddings = antiberty.embed(sequences)
```
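
As a quick sanity check (assuming each returned embedding behaves like a PyTorch tensor, as described above), the shapes can be inspected directly:

```python
# Each embedding has shape [(sequence length + 2) x 512]
for seq, emb in zip(sequences, embeddings):
    print(len(seq), emb.shape)
```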

To access the attention matrices, pass the `return_attention` flag to the `embed` function. The output is a list of attention matrices, where each matrix is the attention matrix for the corresponding sequence. Each attention matrix has dimension `[Layer x Heads x (Length + 2) x (Length + 2)]`.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
embeddings, attentions = antiberty.embed(sequences, return_attention=True)
```

The `embed` function can also be used with masked sequences. Masked residues should be indicated with underscores.
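
For example, a minimal sketch embedding a masked heavy-chain sequence (reusing the masked sequence from the mask prediction example below):

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

# Masked residues are indicated with underscores
masked_sequences = [
    "____VQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGN_NYAQKFQERVTITRDM__STAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFD____GTMVTVS",
]
embeddings = antiberty.embed(masked_sequences)
```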

### Classification
To use AntiBERTy to predict the species and chain type of sequences, use the `classify` function. The output is two lists, containing the species and chain type classifications for each input sequence.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
species_preds, chain_preds = antiberty.classify(sequences)
```

The `classify` function can also be used with masked sequences. Masked residues should be indicated with underscores.

### Mask prediction
To use AntiBERTy to predict the identity of masked residues, use the `fill_masks` function. Masked residues should be indicated with underscores. The output is a list of filled sequences, corresponding to the input masked sequences.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "____VQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGN_NYAQKFQERVTITRDM__STAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFD____GTMVTVS",
    "DVVMTQTPFSLPV__GDQASISCRSSQSLVHSNGNTY_HWYLQKPGQSPKLLIYKVSNRFSGVPDRFSG_GSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGG__KLEIK",
]
filled_sequences = antiberty.fill_masks(sequences)
```
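
A short usage sketch for inspecting the predictions (assuming the filled sequences are returned as plain strings):

```python
# Compare each masked input with its filled counterpart
for masked, filled in zip(sequences, filled_sequences):
    print(masked)
    print(filled)
    print()
```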

### Pseudo log-likelihood
To use AntiBERTy to calculate the pseudo log-likelihood of a sequence, use the `pseudo_log_likelihood` function. The pseudo log-likelihood of a sequence is calculated as the average of per-residue masked log-likelihoods. The output is a list of pseudo log-likelihoods, corresponding to the input sequences.
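
Formally, for a sequence `s` of length `L`, this corresponds to `PLL(s) = (1/L) * Σᵢ log P(sᵢ | s with position i masked)`, where each position is masked and predicted in turn.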

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQSSTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]

pll = antiberty.pseudo_log_likelihood(sequences, batch_size=16)
```

## Citing this work

```bibtex
@article{ruffolo2021deciphering,
    title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
    author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
    journal = {arXiv preprint arXiv:2112.07782},
    year = {2021}
}
```

            
