| Name | epact |
| Version | 0.1.1 |
| home_page | None |
| Summary | Epitope-anchored contrastive transfer learning for paired CD8+ T Cell receptor-antigen Recognition |
| upload_time | 2024-08-16 10:30:22 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
## EPACT: Epitope-anchored Contrastive Transfer Learning for Paired CD8+ T Cell Receptor-antigen Recognition
This repository contains the source code for the paper [**Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition**](https://www.biorxiv.org/content/10.1101/2024.04.05.588255v1).

EPACT follows a divide-and-conquer paradigm: it combines **pre-training** on TCR or pMHC data with **transfer learning** to predict TCR$\alpha\beta$-pMHC binding specificity and interaction conformation via **epitope-anchored** **contrastive** **learning**.
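
To give a rough feel for what a contrastive objective with epitopes as anchors can look like, here is a generic InfoNCE-style loss in NumPy. It is only an illustrative sketch and not the objective implemented in EPACT: the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are all assumptions, so consult the paper and the training scripts for the actual formulation.

```python
import numpy as np


def info_nce_loss(epitope_emb, tcr_emb, temperature=0.1):
    """Generic InfoNCE loss with epitopes as anchors (illustrative only).

    Row i of `epitope_emb` and row i of `tcr_emb` form a positive pair;
    every other TCR in the batch acts as a negative for epitope i.
    """
    # L2-normalise so the dot product is a cosine similarity.
    e = epitope_emb / np.linalg.norm(epitope_emb, axis=1, keepdims=True)
    t = tcr_emb / np.linalg.norm(tcr_emb, axis=1, keepdims=True)
    logits = e @ t.T / temperature
    # Numerically stable log-softmax over the TCR candidates of each anchor.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the true partner.
    return -np.mean(np.diag(log_prob))


# Toy embeddings (made up): matched pairs should give a lower loss
# than the same embeddings paired in the wrong order.
toy = np.eye(4)
print(info_nce_loss(toy, toy), info_nce_loss(toy, toy[::-1]))
```

The loss is small when each epitope embedding is closest to its own TCR embedding and grows when the pairing is scrambled.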
### Colab Notebook <a href="https://colab.research.google.com/github/zhangyumeng1sjtu/EPACT/blob/main/EPACT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
### Installation
1. Clone the repository.
```bash
git clone https://github.com/zhangyumeng1sjtu/EPACT.git
```
2. Create a virtual environment with conda.
```bash
conda create -n EPACT_env python=3.10.12
conda activate EPACT_env
```
3. Install PyTorch>=2.0.1 built for your CUDA version, then install the other Python packages.
```bash
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia # for CUDA 11.7
pip install -r requirements.txt
```
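
After installation, a quick sanity check of the environment can save time before launching training. The helper below is a hypothetical convenience (the name `check_environment` is not part of EPACT); it only relies on the standard library plus `torch.__version__` and `torch.cuda.is_available()`, and it also runs when PyTorch is missing.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Report whether the interpreter and PyTorch install look usable."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the installed PyTorch build probably does not match the local CUDA driver.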
### Data and model checkpoints
The following data and model checkpoints are available at [Zenodo](https://zenodo.org/records/10996150).
- `data/binding`: binding data between paired TCR$\alpha\beta$ and pMHC derived from IEDB, VDJdb, McPAS, TBAdb, 10X, and Francis et al.
- `data/pretrained`: human peptides from IEDB, human CD8+ TCRs from 10X Genomics Datasets and STAPLER, peptide-MHC-I binding affinity data from NetMHCpan4.1, and peptide-MHC-I eluted ligand data from BigMHC.
- `data/structure`: crystal structures of TCR-pMHC protein complexes from STCRDab. Each entry of a distance matrix is the closest distance between heavy atoms of the two amino acid residues.
- `checkpoints/paired-cdr3-pmhc-binding`: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR3 sequences.
- `checkpoints/paired-cdr123-pmhc-binding`: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR1, CDR2, and CDR3 sequences.
- `checkpoints/paired-cdr123-pmhc-interaction`: model checkpoints for predicting CDR-epitope residue-level distance matrix and contact sites.
- `checkpoints/pretrained`: model checkpoints for the pre-trained language models of TCRs and peptides, and for the peptide-MHC models (binding affinity & eluted ligand).
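
The closest-heavy-atom distance used for `data/structure` can be sketched as follows. This is a minimal NumPy illustration, not the preprocessing code from the repository: the function name and the input layout (one coordinate array per residue, hydrogens already removed) are assumptions, and the coordinates are made up.

```python
import numpy as np


def closest_heavy_atom_distances(residues_a, residues_b):
    """Residue-level distance matrix from per-residue heavy-atom coordinates.

    residues_a, residues_b: lists of (n_atoms, 3) arrays, one per residue,
    holding heavy-atom coordinates in angstroms (hydrogens already removed).
    Returns an array of shape (len(residues_a), len(residues_b)).
    """
    dist = np.empty((len(residues_a), len(residues_b)))
    for i, atoms_a in enumerate(residues_a):
        for j, atoms_b in enumerate(residues_b):
            # All pairwise atom-atom distances between the two residues.
            diff = atoms_a[:, None, :] - atoms_b[None, :, :]
            dist[i, j] = np.sqrt((diff ** 2).sum(axis=-1)).min()
    return dist


# Toy input: two "CDR" residues and one "epitope" residue (made-up coordinates).
cdr = [np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
       np.array([[10.0, 0.0, 0.0]])]
epitope = [np.array([[4.0, 0.0, 0.0], [1.0, 3.0, 0.0]])]
print(closest_heavy_atom_distances(cdr, epitope))  # [[3.], [6.]]
```

Taking the minimum over atom pairs, rather than using only C-alpha atoms, makes the distance sensitive to side-chain contacts.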
### Usage
#### 1. Pre-training
- Pre-train peptide and TCR$\alpha\beta$ language models.
```bash
# pretrain epitope masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-epitope-lm.yml
# pretrain paired cdr3 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr3-lm.yml
# pretrain paired cdr123 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr123-lm.yml
```
- Train peptide-MHC binding affinity or eluted ligand models.
```bash
# pretrain peptide-MHC binding affinity model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-binding.yml
# pretrain peptide-MHC eluted ligand model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-elution.yml
```
#### 2. Predict binding specificity
- Train TCR$\alpha\beta$-pMHC binding models.
```bash
# fine-tune the paired TCR-pMHC binding model (CDR3).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr3-pmhc-binding.yml
# fine-tune the paired TCR-pMHC binding model (CDR123).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr123-pmhc-binding.yml
```
- Predict TCR$\alpha\beta$-pMHC binding specificity.
```bash
# predict cross-validation results
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_binding.py \
        --config configs/config-paired-cdr123-pmhc-binding.yml \
        --input_data_path data/binding/Full-TCR/k-fold-data/val_fold_${i}.csv \
        --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-fold-${i}.pt \
        --log_dir results/preds-cdr123-pmhc-binding/Fold_${i}/
done
```
- Predict TCR$\alpha\beta$-pMHC binding ranks relative to background TCRs.
```bash
# predict binding ranks for SARS-CoV-2 responsive TCR clonotypes
python scripts/predict/predict_tcr_pmhc_binding_rank.py \
    --config configs/config-paired-cdr123-pmhc-binding.yml \
    --log_dir results/ranking-covid-cdr123/ \
    --input_data_path data/binding/covid_clonotypes.csv \
    --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-all.pt \
    --bg_tcr_path data/pretrained/10x-paired-healthy-human-tcr-repertoire.csv \
    --num_bg_tcrs 20000
```
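
The idea of ranking a TCR against background TCRs can be illustrated with a percentile rank. The sketch below assumes the rank is the percentage of background scores at or above the query score; the exact definition used by `predict_tcr_pmhc_binding_rank.py` is not stated here, and the function name and toy scores are hypothetical.

```python
import numpy as np


def binding_rank(query_scores, background_scores):
    """Percentile rank of each query score within a background distribution.

    Returns the percentage of background scores that are greater than or
    equal to each query score, so a lower rank means a stronger prediction.
    """
    background = np.sort(np.asarray(background_scores, dtype=float))
    query = np.asarray(query_scores, dtype=float)
    # Number of background scores strictly below each query score.
    n_below = np.searchsorted(background, query, side="left")
    return 100.0 * (background.size - n_below) / background.size


background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(binding_rank([0.95, 0.5, 0.05], background))  # [ 10.  60. 100.]
```

Ranks of this kind are comparable across epitopes even when the raw score distributions differ, which is the usual motivation for scoring against a background repertoire.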
#### 3. Predict interaction conformation
- Train TCR$\alpha\beta$-pMHC interaction model.
```bash
# fine-tune the paired TCR-pMHC interaction model (CDR123).
python scripts/train/train_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml
```
- Predict TCR$\alpha\beta$-pMHC interaction conformations.
```bash
# predict distance matrices and contact sites between MEL8 TCR and HLA-A2-presented peptides.
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_interact.py \
        --config configs/config-paired-cdr123-pmhc-interact.yml \
        --input_data_path data/MEL8_A0201_peptides.csv \
        --model_location checkpoints/paired-cdr123-pmhc-interaction/paired-cdr123-pmhc-interaction-model-fold-${i}.pt \
        --log_dir results/interaction-MEL8-bg-cdr123-closest/Fold_${i}/
done
```
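
One simple way to turn per-fold distance predictions into contact sites is to average the matrices and apply a distance cutoff. The sketch below is illustrative only: the averaging step, the 5.0 angstrom cutoff, the array layout, and the function name are assumptions rather than EPACT's documented post-processing, and the numbers are made up.

```python
import numpy as np


def contact_sites(distance_matrix, cutoff=5.0):
    """Return (cdr_index, epitope_index) pairs closer than `cutoff` angstroms.

    The 5.0 default is an illustrative assumption, not a value from EPACT.
    """
    dist = np.asarray(distance_matrix, dtype=float)
    rows, cols = np.nonzero(dist < cutoff)
    return list(zip(rows.tolist(), cols.tolist()))


# Made-up predictions from two folds for 2 CDR residues x 3 epitope residues.
fold_predictions = np.array([
    [[3.0, 6.0, 9.0], [8.0, 4.0, 7.0]],
    [[4.0, 7.0, 9.0], [8.0, 5.0, 6.0]],
])
mean_distances = fold_predictions.mean(axis=0)  # simple ensemble average
print(contact_sites(mean_distances))  # [(0, 0), (1, 1)]
```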
### Citation
```tex
@article {Zhang2024.04.05.588255,
author = {Yumeng Zhang and Zhikang Wang and Yunzhe Jiang and Dene R Littler and Mark Gerstein and Anthony W Purcell and Jamie Rossjohn and Hong-Yu Ou and Jiangning Song},
title = {Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition},
elocation-id = {2024.04.05.588255},
year = {2024},
doi = {10.1101/2024.04.05.588255},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255},
eprint = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255.full.pdf},
journal = {bioRxiv}
}
```
### Contact
If you have any questions, please contact us at [zhangyumeng1@sjtu.edu.cn](mailto:zhangyumeng1@sjtu.edu.cn) or [jiangning.song@monash.edu](mailto:jiangning.song@monash.edu).
Raw data
{
"_id": null,
"home_page": null,
"name": "epact",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Yumeng Zhang <zhangyumeng1@sjtu.edu.cn>",
"download_url": "https://files.pythonhosted.org/packages/61/f2/5988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0/epact-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Epitope-anchored contrastive transfer learning for paired CD8+ T Cell receptor-antigen Rrcognition",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/zhangyumeng1sjtu/EPACT",
"Issues": "https://github.com/zhangyumeng1sjtu/EPACT/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e93b8ada381122cbded1da4ccff265a4216910cb03a40816f34506daea38def1",
"md5": "f6d70285529aa8f5afd15118b564f72e",
"sha256": "0864df13057a0707b4151b02bb7bb4e0d1e7099995fb7cd4066d6f8c0fea7d0e"
},
"downloads": -1,
"filename": "epact-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f6d70285529aa8f5afd15118b564f72e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 49715,
"upload_time": "2024-08-16T10:30:21",
"upload_time_iso_8601": "2024-08-16T10:30:21.059050Z",
"url": "https://files.pythonhosted.org/packages/e9/3b/8ada381122cbded1da4ccff265a4216910cb03a40816f34506daea38def1/epact-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "61f25988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0",
"md5": "be8b5b5f8cfa816d532ad3ea69453027",
"sha256": "710b583cf9112d2ccdc4a2ef040eae59131f83a7cea9d36037649beca6d413bd"
},
"downloads": -1,
"filename": "epact-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "be8b5b5f8cfa816d532ad3ea69453027",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 45658,
"upload_time": "2024-08-16T10:30:22",
"upload_time_iso_8601": "2024-08-16T10:30:22.534129Z",
"url": "https://files.pythonhosted.org/packages/61/f2/5988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0/epact-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-16 10:30:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "zhangyumeng1sjtu",
"github_project": "EPACT",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "epact"
}
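
The `digests` entries in the record above can be used to verify a downloaded archive. The following is a small sketch using only the standard library; the function name `sha256_matches` is hypothetical.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the given SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

For example, after downloading `epact-0.1.1.tar.gz`, calling `sha256_matches("epact-0.1.1.tar.gz", "710b583cf9112d2ccdc4a2ef040eae59131f83a7cea9d36037649beca6d413bd")` should return `True` if the file is intact.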