| Field | Value |
| --- | --- |
| Name | clape |
| Version | 0.0.3 |
| home_page | https://github.com/YAndrewL/CLAPE |
| Summary | CLAPE (Contrastive Learning And Pre-trained Encoder) for protein-ligand binding sites prediction |
| upload_time | 2024-08-23 20:55:43 |
| maintainer | None |
| docs_url | None |
| author | Yufan Andrew Liu |
| requires_python | >=3.8 |
| license | None |
| keywords | None |
| bugtrack_url | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

**If you have any questions regarding the code/data, please contact Yufan Liu via andyalbert97@gmail.com.**
# CLAPE framework
This repo holds the code of the CLAPE (Contrastive Learning And Pre-trained Encoder) framework for protein-ligand binding site prediction. We currently provide three binding site prediction tasks: protein-DNA, protein-RNA, and antibody-antigen. Weights for small-molecule binding site prediction will be provided in the future (see [CLAPE-SMB](https://github.com/JueWangTHU/CLAPE-SMB) for reference).
## Usage
CLAPE relies primarily on the large-scale pre-trained protein language model [ProtBert](https://huggingface.co/Rostlab/prot_bert), implemented with [HuggingFace's Transformers](https://huggingface.co/) and [PyTorch](https://pytorch.org/). Please install these dependencies in advance, or create a conda/mamba environment from the provided environment file. If you are using CLAPE-SMB, please also install [ESM](https://github.com/facebookresearch/esm).
```shell
wget https://github.com/YAndrewL/CLAPE/raw/main/environment.yaml
conda env create -f environment.yaml
conda activate clape
```
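Before running CLAPE it can help to confirm that the environment actually provides the libraries named above. The sketch below is a hypothetical helper, not part of the `clape` package; it assumes the import names are `torch` (PyTorch), `transformers` (HuggingFace Transformers) and `esm` (the ESM package, only needed for CLAPE-SMB), and it only looks modules up without importing them.

```python
import importlib.util

# Assumed import names for the dependencies listed in the Usage section.
REQUIRED = ["torch", "transformers"]
OPTIONAL = ["esm"]  # only needed for CLAPE-SMB


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it; None means absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    optional_missing = missing_modules(OPTIONAL)
    if optional_missing:
        print("Not installed (only needed for CLAPE-SMB):", ", ".join(optional_missing))
```

Run it inside the activated `clape` environment; an empty "missing" list means the core stack is importable.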
### 1. Python package from PyPI
We provide a Python package for predicting the ligand-binding sites of protein sequences given in FASTA format. Taking DNA-binding site prediction as an example, use CLAPE with the provided sample file as follows:
```shell
# download model weights and example file
wget https://github.com/YAndrewL/CLAPE/raw/main/example.fa
wget https://github.com/YAndrewL/CLAPE/raw/main/weights/DNA.pth
pip install clape # install clape from pypi
```
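A GitHub `/blob/` URL points to the HTML page that displays a file, so fetching it with `wget` saves that page rather than the file itself; the `/raw/` form of the same URL redirects to the file contents. The hypothetical helper below (not part of `clape`) rewrites such links, assuming the usual `github.com/<owner>/<repo>/blob/<ref>/<path>` layout with a branch name that contains no slashes.

```python
from urllib.parse import urlparse, urlunparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' page URL into its '/raw/' download URL."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    # Expected path: /<owner>/<repo>/blob/<ref>/<path/to/file>
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    parts[2] = "raw"  # swap the page view for the raw-file endpoint
    return urlunparse(parsed._replace(path="/" + "/".join(parts)))


BLOB_URLS = [
    "https://github.com/YAndrewL/CLAPE/blob/main/example.fa",
    "https://github.com/YAndrewL/CLAPE/blob/main/weights/DNA.pth",
]
for blob_url in BLOB_URLS:
    print(github_blob_to_raw(blob_url))
```

The printed URLs can be passed to `wget` (which follows redirects by default) or to `urllib.request.urlretrieve`.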
```python
# package usage example
from clape import Clape

# model_path points to the weights downloaded above (DNA.pth for DNA-binding sites)
model = Clape(model_path="DNA.pth", ligand="DNA")
results = model.predict(input_file="example.fa")
```
Set `keep_score` to `True` to keep the model's predicted scores, and use `switch_ligand` to switch to another binding site prediction task.
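With the raw scores kept, you can apply your own cutoff instead of the default one. The sketch below is a hypothetical helper, not part of `clape`: it assumes the scores are available as a plain list of floats in [0, 1], one per residue (how `results` stores them is not documented here), and it mirrors the `--threshold` rule of the command-line tool, assuming a residue counts as binding when its score is greater than or equal to the threshold.

```python
from typing import List, Sequence


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a residue 1 (binding) if its score reaches the threshold, else 0.

    Assumption: scores >= threshold count as binding; CLAPE's own boundary
    rule (>= vs >) is not documented in this README.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [1 if s >= threshold else 0 for s in scores]


def to_label_string(labels: Sequence[int]) -> str:
    """Render 0/1 labels as a compact string, one character per residue."""
    return "".join(str(label) for label in labels)


scores = [0.12, 0.67, 0.50, 0.91, 0.03]  # hypothetical per-residue scores
print(to_label_string(binarize(scores)))       # 01110
print(to_label_string(binarize(scores, 0.8)))  # 00010
```

Raising the threshold trades recall for precision: fewer residues are called binding, but the calls that remain have higher scores.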
### 2. Command line tools
The Python package also installs a command-line tool, which can be used as follows:
```shell
clape --input example.fa --output out.txt --ligand DNA --model /path/to/downloaded/model
```
This command first loads the pre-trained ProtBert parameters; use the `--cache` parameter to specify the directory they are downloaded to.
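For readers curious what loading ProtBert into a cache directory involves, here is a minimal sketch using HuggingFace Transformers directly. It is not CLAPE's own loading code, which may differ; `prepare_for_protbert` and `load_protbert` are hypothetical helpers. The input formatting follows the ProtBert model card (uppercase residues separated by spaces, with the rare residues U, Z, O and B mapped to X), and `cache_dir` plays the role the `--cache` parameter is described as playing.

```python
import re


def prepare_for_protbert(sequence: str) -> str:
    """Format a sequence as ProtBert expects: uppercase, U/Z/O/B -> X, space-separated."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.strip().upper())
    return " ".join(cleaned)


def load_protbert(cache_dir: str = "protbert"):
    """Download (or reuse) ProtBert weights under `cache_dir`.

    Hypothetical helper; the default directory mirrors the documented
    `--cache` default, but CLAPE's internals may load the model differently.
    """
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(
        "Rostlab/prot_bert", do_lower_case=False, cache_dir=cache_dir
    )
    model = BertModel.from_pretrained("Rostlab/prot_bert", cache_dir=cache_dir)
    return tokenizer, model


print(prepare_for_protbert("mkub"))  # M K X X
```

Calling `load_protbert()` a second time reuses the files already stored in `cache_dir` instead of downloading them again.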
Some parameters are described as follows:
| Parameters | Descriptions |
| ----------- | ------------------------------------------------------------ |
| --help | Show the help message. |
| --ligand | Ligand to predict binding sites for; DNA, RNA, and AB (antibody) are currently supported. |
| --threshold | Score threshold for identifying a binding site; must be between 0 and 1 (default: 0.5). |
| --input | Path to the input file in FASTA format. |
| --output | Path to the output file. The first and second lines are the same as in the input file, and the third line is the prediction result. |
| --cache | Directory for saving the pre-trained parameters (default: protbert). |
| --model | Path to the trained backbone model. |
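The output file can be post-processed with a few lines of Python. The sketch below is a hypothetical parser, not part of `clape`. It assumes each input sequence is written on a single line, so every record in the output occupies three lines (header, sequence, prediction), and that the prediction line holds one character per residue, with `1` marking a binding residue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    header: str    # FASTA header, without the leading ">"
    sequence: str  # protein sequence copied from the input
    labels: str    # prediction line, assumed one character per residue


def parse_clape_output(text: str) -> List[Prediction]:
    """Parse output assumed to be grouped as 3-line records: header, sequence, prediction."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of 3 non-empty lines")
    records = []
    for i in range(0, len(lines), 3):
        header, sequence, labels = lines[i : i + 3]
        records.append(Prediction(header.lstrip(">"), sequence, labels))
    return records


def binding_positions(record: Prediction) -> List[int]:
    """Return 1-based positions whose label is '1' (assumed to mean binding)."""
    return [i + 1 for i, label in enumerate(record.labels) if label == "1"]


sample = """>seq1
MKTAYIAK
00110001
"""
for rec in parse_clape_output(sample):
    print(rec.header, binding_positions(rec))  # seq1 [3, 4, 8]
```

For a real run, read the file written by `--output`, for example `parse_clape_output(open("out.txt").read())`.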
## Citation
If you find our work helpful, please cite it using the following BibTeX entry:
```
@article{10.1093/bib/bbad488,
author = {Liu, Yufan and Tian, Boxue},
title = "{Protein–DNA binding sites prediction based on pre-trained protein language model and contrastive learning}",
journal = {Briefings in Bioinformatics},
volume = {25},
number = {1},
pages = {bbad488},
year = {2024},
month = {01},
abstract = "{Protein–DNA interaction is critical for life activities such as replication, transcription and splicing. Identifying protein–DNA binding residues is essential for modeling their interaction and downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called Contrastive Learning And Pre-trained Encoder (CLAPE), which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein–DNA binding sites dataset and evaluated the model performance and generalization ability through various experiments. The results showed that the area under ROC curve values of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein–ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding sites prediction. To facilitate the scientific community, the benchmark datasets and codes are freely available at https://github.com/YAndrewL/clape.}",
issn = {1477-4054},
doi = {10.1093/bib/bbad488},
url = {https://doi.org/10.1093/bib/bbad488},
eprint = {https://academic.oup.com/bib/article-pdf/25/1/bbad488/55381199/bbad488.pdf},
}
```
## Update
- [Aug. 2024] CLAPE is now available as a Python package; see [clape on PyPI](https://pypi.org/project/clape/).
- [Mar. 2024] The training code has been released with CLAPE-SMB; see [this repo](https://github.com/JueWangTHU/CLAPE-SMB) for reference.
- [Jan. 2024] Our paper has been published in Briefings in Bioinformatics; see [the online version](https://academic.oup.com/bib/article/25/1/bbad488/7505238).
Raw data
{
"_id": null,
"home_page": "https://github.com/YAndrewL/CLAPE",
"name": "clape",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Yufan Andrew Liu",
"author_email": "andyalbert97@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f4/4f/a727733a0ececf600e069c65c34272001f4c2defc2f3be57a411e128e5f6/clape-0.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "CLAPE (Contrastive Learning And Pre-trained Encoder) for protein-ligand binding sites prediction",
"version": "0.0.3",
"project_urls": {
"Homepage": "https://github.com/YAndrewL/CLAPE"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0a30b612f871fe4265eb14dfecee2cfe574c4bafd36793b7866fba3c4d172ebe",
"md5": "eb84f67c7a0953289fe9ac9354010a3f",
"sha256": "9b376f3c8e27e05b835a152895721af60df942cc434063ea88a2da524903d97a"
},
"downloads": -1,
"filename": "clape-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eb84f67c7a0953289fe9ac9354010a3f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10433,
"upload_time": "2024-08-23T20:55:41",
"upload_time_iso_8601": "2024-08-23T20:55:41.425858Z",
"url": "https://files.pythonhosted.org/packages/0a/30/b612f871fe4265eb14dfecee2cfe574c4bafd36793b7866fba3c4d172ebe/clape-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f44fa727733a0ececf600e069c65c34272001f4c2defc2f3be57a411e128e5f6",
"md5": "241e416e1b4a8332e78ee9c120e31eab",
"sha256": "f1dd3c73cb2e62a50c83d40eba306fa058459ac0559ff3b9d61db9d015f00ea9"
},
"downloads": -1,
"filename": "clape-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "241e416e1b4a8332e78ee9c120e31eab",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 10944,
"upload_time": "2024-08-23T20:55:43",
"upload_time_iso_8601": "2024-08-23T20:55:43.103867Z",
"url": "https://files.pythonhosted.org/packages/f4/4f/a727733a0ececf600e069c65c34272001f4c2defc2f3be57a411e128e5f6/clape-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-23 20:55:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "YAndrewL",
"github_project": "CLAPE",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "clape"
}