**epact** 0.1.1 (PyPI)

- Summary: Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition
- Upload time: 2024-08-16 10:30:22
- Requires Python: >=3.10

## EPACT: Epitope-anchored Contrastive Transfer Learning for Paired CD8+ T Cell Receptor-antigen Recognition

This repository contains the source code for the paper [**Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition**](https://www.biorxiv.org/content/10.1101/2024.04.05.588255v1).

![model](./model.png)

EPACT follows a divide-and-conquer paradigm: it combines **pre-training** on TCR or pMHC data with **transfer learning** to predict TCR$\alpha\beta$-pMHC binding specificity and interaction conformation via **epitope-anchored contrastive learning**.

### Colab Notebook <a href="https://colab.research.google.com/github/zhangyumeng1sjtu/EPACT/blob/main/EPACT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Installation

1. Clone the repository.

   ```bash
   git clone https://github.com/zhangyumeng1sjtu/EPACT.git
   ```
2. Create a virtual environment with conda.

   ```bash
   conda create -n EPACT_env python=3.10.12
   conda activate EPACT_env
   ```
3. Install PyTorch (>=2.0.1) built for your CUDA version, then install the other Python packages.

   ```bash
   conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia # for CUDA 11.7
   pip install -r requirements.txt
   ```
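
After installation, a quick sanity check of the environment can save a failed training run. The snippet below is a minimal sketch, not part of EPACT: the helper name `check_environment` is hypothetical, and it only verifies the Python version (>=3.10, as stated in the package metadata) and whether a module such as `torch` can be located, without importing it.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # package metadata: requires_python >= 3.10


def check_environment(required_modules=("torch",)):
    """Hypothetical helper: report whether the interpreter and modules look usable."""
    report = {"python>=3.10": sys.version_info[:2] >= MIN_PYTHON}
    for name in required_modules:
        # find_spec only locates a top-level module; it does not import it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

This does not confirm that the installed PyTorch build matches your CUDA version; that still has to be checked against the PyTorch installation instructions.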

### Data and model checkpoints

The following data and model checkpoints are available at [Zenodo](https://zenodo.org/records/10996150).

- `data/binding`: binding data between paired TCR$\alpha\beta$ and pMHC derived from IEDB, VDJdb, McPAS, TBAdb, 10X, and Francis et al.
- `data/pretrained`: human peptides from IEDB, human CD8+ TCRs from 10X Genomics Datasets and STAPLER, peptide-MHC-I binding affinity data from NetMHCpan4.1, and peptide-MHC-I eluted ligand data from BigMHC.
- `data/structure`: crystal structures of TCR-pMHC protein complexes from STCRDab. Distance matrices were calculated as the closest distance between heavy atoms of two amino acid residues.
- `checkpoints/paired-cdr3-pmhc-binding`: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR3 sequences.
- `checkpoints/paired-cdr123-pmhc-binding`: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR1, CDR2, and CDR3 sequences.
- `checkpoints/paired-cdr123-pmhc-interaction`: model checkpoints for predicting CDR-epitope residue-level distance matrix and contact sites.
- `checkpoints/pretrained`: model checkpoints for the pre-trained language models of TCRs and peptides, and for the peptide-MHC models (binding affinity and eluted ligand).
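
The distance definition used for `data/structure` (closest distance between heavy atoms of two residues) can be illustrated as follows. This is a sketch under stated assumptions rather than the code used to build the dataset: residues are represented as plain lists of `(element, (x, y, z))` tuples, all function names are hypothetical, and the 5.0 Å contact cutoff is an illustrative value that is not taken from the source.

```python
import math

HYDROGEN = "H"  # atoms with this element symbol are not heavy atoms


def heavy_atoms(residue):
    """Return coordinates of all non-hydrogen atoms in a residue."""
    return [xyz for element, xyz in residue if element != HYDROGEN]


def residue_distance(res_a, res_b):
    """Closest distance between any heavy atom of res_a and any heavy atom of res_b."""
    return min(
        math.dist(a, b)
        for a in heavy_atoms(res_a)
        for b in heavy_atoms(res_b)
    )


def distance_matrix(cdr_residues, epitope_residues):
    """Rows index CDR residues, columns index epitope residues."""
    return [[residue_distance(c, e) for e in epitope_residues] for c in cdr_residues]


def contact_sites(matrix, cutoff=5.0):
    """Mark residue pairs within the cutoff (assumed value, in angstroms)."""
    return [[d <= cutoff for d in row] for row in matrix]


# Toy coordinates, for illustration only.
cdr = [[("N", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 1.0))]]
epitope = [
    [("C", (3.0, 4.0, 0.0))],
    [("O", (0.0, 0.0, 10.0)), ("H", (0.0, 0.0, 0.5))],
]
matrix = distance_matrix(cdr, epitope)
print(matrix)                 # [[5.0, 10.0]]
print(contact_sites(matrix))  # [[True, False]]
```

Note that the hydrogen placed 0.5 Å from the CDR residue is ignored, so the second distance is 10.0 rather than 0.5.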

### Usage

#### 1. Pre-training

- Pre-train peptide and TCR$\alpha\beta$ language models.

  ```bash
  # pretrain epitope masked language model.
  python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-epitope-lm.yml

  # pretrain paired cdr3 masked language model.
  python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr3-lm.yml

  # pretrain paired cdr123 masked language model.
  python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr123-lm.yml
  ```
- Train peptide-MHC binding affinity or eluted ligand models.

  ```bash
  # pretrain peptide-MHC binding affinity model.
  python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-binding.yml

  # pretrain peptide-MHC eluted ligand model.
  python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-elution.yml
  ```
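
For readers unfamiliar with masked language modelling, the sketch below shows the general idea of the corruption step on an amino-acid sequence. It is a generic illustration, not EPACT's implementation: the mask token, the 15% mask rate, and the function name `mask_sequence` are all assumptions, and the actual settings live in the config files referenced above.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical token name


def mask_sequence(sequence, mask_rate=0.15, rng=None):
    """Replace a random subset of residues with MASK_TOKEN.

    Returns the corrupted token list and a dict {position: original residue}
    that a model would be trained to recover. mask_rate is an assumed value.
    """
    tokens = list(sequence)
    if not tokens:
        return [], {}
    rng = rng or random.Random()
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    labels = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, labels


# Illustrative peptide; a fixed seed keeps the example reproducible.
tokens, labels = mask_sequence("GILGFVFTL", rng=random.Random(0))
print(tokens, labels)
```

The training objective is then to predict the residues stored in `labels` from the surrounding unmasked context.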

#### 2. Predict binding specificity

- Train TCR$\alpha\beta$-pMHC binding models.

  ```bash
  # finetune Paired TCR-pMHC binding model (CDR3).
  python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr3-pmhc-binding.yml 

  # finetune Paired TCR-pMHC binding model (CDR123).
  python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr123-pmhc-binding.yml
  ```
- Predict TCR$\alpha\beta$-pMHC binding specificity.

  ```bash
  # predict cross-validation results
  for i in {1..5}
  do
      python scripts/predict/predict_tcr_pmhc_binding.py \
          --config configs/config-paired-cdr123-pmhc-binding.yml \
          --input_data_path data/binding/Full-TCR/k-fold-data/val_fold_${i}.csv \
          --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-fold-${i}.pt \
          --log_dir results/preds-cdr123-pmhc-binding/Fold_${i}/
  done
  ```
- Predict TCR$\alpha\beta$-pMHC binding ranks relative to background TCRs.

  ```bash
  # predict binding ranks for SARS-CoV-2 responsive TCR clonotypes
  python scripts/predict/predict_tcr_pmhc_binding_rank.py --config configs/config-paired-cdr123-pmhc-binding.yml \
                                          --log_dir results/ranking-covid-cdr123/ \
                                          --input_data_path data/binding/covid_clonotypes.csv \
                                          --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-all.pt \
                                          --bg_tcr_path data/pretrained/10x-paired-healthy-human-tcr-repertoire.csv \
                                          --num_bg_tcrs 20000
  ```
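
The README does not spell out how a rank against background TCRs is computed, so the following is only one common convention, given as a sketch: the rank of a query TCR is the fraction of background TCRs whose predicted score for the same pMHC is at least as high as the query's. The function name `binding_rank` and the toy scores are hypothetical.

```python
def binding_rank(query_score, background_scores):
    """Fraction of background scores >= query_score (lower means stronger binder).

    This definition is an assumption for illustration; the script
    predict_tcr_pmhc_binding_rank.py may define the rank differently.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    n_at_least = sum(1 for score in background_scores if score >= query_score)
    return n_at_least / len(background_scores)


# Toy scores: one of four background TCRs outscores the query.
print(binding_rank(0.9, [0.1, 0.5, 0.95, 0.2]))  # 0.25
```

With a larger background (for example the 20000 TCRs requested via `--num_bg_tcrs` above), the rank becomes a finer-grained percentile.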

#### 3. Predict interaction conformation

- Train TCR$\alpha\beta$-pMHC interaction model.

  ```bash
  # finetune Paired TCR-pMHC interaction model (CDR123).
  python scripts/train/train_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml
  ```
- Predict TCR$\alpha\beta$-pMHC interaction conformations.

  ```bash
  # predict distance matrices and contact sites between MEL8 TCR and HLA-A2-presented peptides.
  for i in {1..5}
  do
      python scripts/predict/predict_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml \
          --input_data_path data/MEL8_A0201_peptides.csv \
          --model_location checkpoints/paired-cdr123-pmhc-interaction/paired-cdr123-pmhc-interaction-model-fold-${i}.pt \
          --log_dir results/interaction-MEL8-bg-cdr123-closest/Fold_${i}/
  done
  ```
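
The per-fold shell loop above can also be driven from Python, which is convenient when the predictions feed into further analysis. The sketch below only builds and prints the commands; the script path and flags are copied from the example above, while the helper name `fold_commands` and its `{i}` template convention are assumptions.

```python
import shlex


def fold_commands(script, config, model_template, input_path, log_template,
                  folds=range(1, 6)):
    """Build one argument list per fold; '{i}' in a template is replaced by the fold index."""
    commands = []
    for i in folds:
        commands.append([
            "python", script,
            "--config", config,
            "--input_data_path", input_path.format(i=i),
            "--model_location", model_template.format(i=i),
            "--log_dir", log_template.format(i=i),
        ])
    return commands


if __name__ == "__main__":
    cmds = fold_commands(
        script="scripts/predict/predict_tcr_pmhc_interact.py",
        config="configs/config-paired-cdr123-pmhc-interact.yml",
        model_template=(
            "checkpoints/paired-cdr123-pmhc-interaction/"
            "paired-cdr123-pmhc-interaction-model-fold-{i}.pt"
        ),
        input_path="data/MEL8_A0201_peptides.csv",
        log_template="results/interaction-MEL8-bg-cdr123-closest/Fold_{i}/",
    )
    for cmd in cmds:
        # Dry run: print the command. Use subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Passing argument lists rather than a single shell string avoids quoting problems if a path contains spaces.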

### Citation

```bibtex
@article {Zhang2024.04.05.588255,
	author = {Yumeng Zhang and Zhikang Wang and Yunzhe Jiang and Dene R Littler and Mark Gerstein and Anthony W Purcell and Jamie Rossjohn and Hong-Yu Ou and Jiangning Song},
	title = {Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition},
	elocation-id = {2024.04.05.588255},
	year = {2024},
	doi = {10.1101/2024.04.05.588255},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255},
	eprint = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255.full.pdf},
	journal = {bioRxiv}
}
```

### Contact

If you have any questions, please contact us at [zhangyumeng1@sjtu.edu.cn](mailto:zhangyumeng1@sjtu.edu.cn) or [jiangning.song@monash.edu](mailto:jiangning.song@monash.edu).

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "epact",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Yumeng Zhang <zhangyumeng1@sjtu.edu.cn>",
    "download_url": "https://files.pythonhosted.org/packages/61/f2/5988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0/epact-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Epitope-anchored contrastive transfer learning for paired CD8+ T Cell receptor-antigen Rrcognition",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/zhangyumeng1sjtu/EPACT",
        "Issues": "https://github.com/zhangyumeng1sjtu/EPACT/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e93b8ada381122cbded1da4ccff265a4216910cb03a40816f34506daea38def1",
                "md5": "f6d70285529aa8f5afd15118b564f72e",
                "sha256": "0864df13057a0707b4151b02bb7bb4e0d1e7099995fb7cd4066d6f8c0fea7d0e"
            },
            "downloads": -1,
            "filename": "epact-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f6d70285529aa8f5afd15118b564f72e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 49715,
            "upload_time": "2024-08-16T10:30:21",
            "upload_time_iso_8601": "2024-08-16T10:30:21.059050Z",
            "url": "https://files.pythonhosted.org/packages/e9/3b/8ada381122cbded1da4ccff265a4216910cb03a40816f34506daea38def1/epact-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "61f25988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0",
                "md5": "be8b5b5f8cfa816d532ad3ea69453027",
                "sha256": "710b583cf9112d2ccdc4a2ef040eae59131f83a7cea9d36037649beca6d413bd"
            },
            "downloads": -1,
            "filename": "epact-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "be8b5b5f8cfa816d532ad3ea69453027",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 45658,
            "upload_time": "2024-08-16T10:30:22",
            "upload_time_iso_8601": "2024-08-16T10:30:22.534129Z",
            "url": "https://files.pythonhosted.org/packages/61/f2/5988742d9a3fbff922074fdb4c4c2462355ded96592e88ebae2b60fc0fa0/epact-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-16 10:30:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zhangyumeng1sjtu",
    "github_project": "EPACT",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "epact"
}
        