fair-GPD


| Field | Value |
| --- | --- |
| Name | fair-GPD |
| Version | 0.0.2 |
| home_page | https://github.com/decodermu/GPD |
| Summary | Graphormer Based Protein Sequence Design Package: GPD |
| upload_time | 2024-01-20 06:38:21 |
| docs_url | None |
| author | Junximu |
| requires_python | >=3.8 |
| license | MIT |
| keywords | gpd |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# GPD
The Graphormer-based Protein Design (GPD) model applies the Transformer to a graph-based representation of 3D protein structures and adds Gaussian noise and a random sequence mask to the node features, which improves sequence recovery and diversity. GPD performed significantly better than ProteinMPNN, the state-of-the-art model, on multiple independent tests, especially in sequence diversity.

![image](http://yu.life.sjtu.edu.cn/ChenLab/GPDGenerator/static/imgs/workflow.png)
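
To make the idea of perturbing node features concrete, here is a minimal sketch in plain Python. It is not the GPD implementation: the function name `perturb_node_features`, the noise level, the mask rate, and the choice to zero out the whole feature vector of a masked residue are all assumptions made for illustration.

```python
import random


def perturb_node_features(features, noise_std=0.1, mask_rate=0.15, seed=None):
    """Return a noisy, partly masked copy of per-residue feature vectors.

    noise_std and mask_rate are hypothetical values, not taken from GPD.
    """
    rng = random.Random(seed)
    perturbed = []
    masked = []
    for vec in features:
        if rng.random() < mask_rate:
            # Masked residue: hide its features entirely (an assumption).
            perturbed.append([0.0] * len(vec))
            masked.append(True)
        else:
            # Unmasked residue: add zero-mean Gaussian noise to each value.
            perturbed.append([x + rng.gauss(0.0, noise_std) for x in vec])
            masked.append(False)
    return perturbed, masked


# Toy input: five residues with three features each.
toy = [[1.0, 0.0, 0.5] for _ in range(5)]
noisy, mask = perturb_node_features(toy, seed=0)
```

Returning the mask alongside the perturbed features lets a training loop know which positions were hidden.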

# Install
## Quick Start
One can install our package directly with pip:
```
pip install fair-GPD
```
## Install with conda
```
conda create -n GPD
conda activate GPD
conda install pytorch==1.12.1 -c pytorch
conda install -c conda-forge mdtraj==1.9.8
conda install -c anaconda networkx==3.1
```
Note that GPD can be used with CUDA; install the cudatoolkit package that matches your GPU.
Alternatively, use the provided ```environment.yml``` file to create an environment:
```
conda env create -f environment.yml
```
## Install with pip
Alternatively, install the dependencies with pip from the provided ```requirements.txt``` file:
```
pip install -r requirements.txt
```
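
After installing, it can be useful to confirm that the environment has the pinned versions. The sketch below uses the standard-library `importlib.metadata` module. The helper `check_pins` is hypothetical, and looking PyTorch up under the distribution name `torch` is an assumption that may not hold for every conda build.

```python
from importlib import metadata

# Versions pinned in the conda commands above. "torch" is the name pip uses
# for PyTorch; using it for the lookup here is an assumption.
PINNED = {"torch": "1.12.1", "mdtraj": "1.9.8", "networkx": "3.1"}


def check_pins(pins):
    """Map each distribution name to 'ok', 'missing', or the version found."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local build tags such as "1.12.1+cu113" still count as a match.
        report[name] = "ok" if found.split("+")[0] == wanted else f"found {found}"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINNED).items():
        print(f"{name}: {status}")
```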

# Example
```
cd example/
sh submit_example_2_fixed.sh  # simple example
sh submit_example_1.sh  # fix some residue positions
```

## Output example:
outputs/example_1_outputs/1tca.fasta
```
> predicted model_0	acc: 0.3501577287066246	length: 317
APTGAAPPLTLPPATLRAQLAAKGASPEDLKNPVLILHGPGTDGAEDFAGFLVRLLKSKGYTPAYVDPDPN
ALDDIADDLEALALAAKYLAAGLGNKPFNVITHSLGGVALLTALAYHPELRDKIKRVVLVSPLPTGSDSLR
ALLAANTLRLLQFLSVKGSALDDAARKAGALTPLVPTTVIGHANDPLHYPTSLGSPASGAYVPDARVIDLY
SVYGPDFTVDHAEAVFSSLVRKALKAALTSSSGYARASDVGKSLRVSDPAKDLSAEQREAFLNLLAPAAAA
IANGKTGNACPPLPPEYLPAAPGAKGAGGVLTP
> predicted model_1	acc: 0.334384858044164	length: 317
APTGEPLPLLLPDATLLANVEADGADIDEVTNPVLLLHGLGSDGEEALGASLVALLKALGYTPLGVDPDPN
YTDDILDDAQALAAAARALAAGLGNKPLLVVGHSLGGVVVLLALRYNPALADLIASVILVAPAPRGSSEAR
PLIAAKILRPEDFLLLYGSALADALRAAGLDVPLVPTTVIDSADDPLHSPNALLSAESAAYVPGGTVVDLS
DIFGPDFTVSHAGAVLSPFLRKLLEAALASPTGVPREEDVGASLLDLDLAADLTAEERAAALNALAAYAAR
IAAGARFNAYPALPPELVPAAKGATDAAGTLKP
```
*  **acc** is the sequence recovery: the proportion of positions at which the designed sequence has the same amino acid as the native sequence.
*  **length** is the length of the designed sequence.
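
The recovery definition and the output layout can be sketched in a few lines of Python. This is an illustration, not code from the package: the function names `recovery` and `parse_designs` are hypothetical, and the tab-separated header layout is read off the sample output above.

```python
def recovery(native, designed):
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def parse_designs(text):
    """Parse records shaped like the sample output into a list of dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header fields are tab-separated: name, "acc: x", "length: n".
            name, acc, length = line[1:].strip().split("\t")
            records.append({
                "name": name.strip(),
                "acc": float(acc.split(":")[1]),
                "length": int(length.split(":")[1]),
                "sequence": "",
            })
        elif records:
            # Sequence lines are wrapped; join them back together.
            records[-1]["sequence"] += line
    return records
```

The acc values in the sample are consistent with this definition: 0.3501577287066246 is what 111 matching positions out of 317 would give.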

# Training the GPD model
## Dataset
The GPD model was trained on the CATH 40% sequence non-redundant dataset, split 29868:1000:103 into training, validation, and test sets, respectively. We further evaluated the performance of GPD on 39 de novo proteins, including 14 de novo proteins that differ significantly in structure from proteins belonging to natural folds.
*  **data/cath-dataset-nonredundant-S40-v4_3_0.pdb** is the CATH 40% sequence non-redundant dataset, downloaded from http://download.cathdb.info/cath/releases/all-releases/v4_3_0/non-redundant-data-sets/cath-dataset-nonredundant-S40-v4_3_0.pdb.tgz
*  **data/sc103** contains the 103 single-chain proteins
*  **data/denovo39** contains the 39 de novo proteins
*  **data/denovo14** contains the 14 de novo proteins
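
A split with these sizes can be reproduced mechanically as sketched below. How entries were actually assigned to each set is not described here, so the seeded random shuffle, the helper `split_ids`, and the placeholder identifiers are all assumptions.

```python
import random


def split_ids(ids, n_valid, n_test, seed=0):
    """Shuffle ids reproducibly and cut them into train/valid/test lists."""
    if n_valid + n_test > len(ids):
        raise ValueError("validation and test sizes exceed the dataset size")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


# Hypothetical identifiers standing in for the 29868 + 1000 + 103 entries.
ids = [f"entry_{i}" for i in range(29868 + 1000 + 103)]
train, valid, test = split_ids(ids, n_valid=1000, n_test=103)
print(len(train), len(valid), len(test))  # prints: 29868 1000 103
```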

## Training the GPD model
**train/train_encoder3.py** is the training script. Training took 1 day on a single NVIDIA A100 GPU (40 GB).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/decodermu/GPD",
    "name": "fair-GPD",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "GPD",
    "author": "Junximu",
    "author_email": "mujunxi@126.com",
    "download_url": "https://files.pythonhosted.org/packages/0d/38/8d8d2b3680eb938f1086904c65d45e76e7af8fccdbe144cc9be6f1879881/fair-GPD-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Graphormer Based Protein Sequence Design Package: GPD",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/decodermu/GPD"
    },
    "split_keywords": [
        "gpd"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a98e50b9cce066c35e8ea983abc79e38f326d915e506c40011df7d438130f0a8",
                "md5": "d535d02206817c29030eecbad564f44f",
                "sha256": "eae79256d9da93cd57cc5f92e2925a2df16e80cfa2019f71d6f4f7c213b73b0d"
            },
            "downloads": -1,
            "filename": "fair_GPD-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d535d02206817c29030eecbad564f44f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10162645,
            "upload_time": "2024-01-20T06:38:15",
            "upload_time_iso_8601": "2024-01-20T06:38:15.994475Z",
            "url": "https://files.pythonhosted.org/packages/a9/8e/50b9cce066c35e8ea983abc79e38f326d915e506c40011df7d438130f0a8/fair_GPD-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0d388d8d2b3680eb938f1086904c65d45e76e7af8fccdbe144cc9be6f1879881",
                "md5": "a0f2c6ea8f33f31400bdf2a21ed870ca",
                "sha256": "feae75051e9a16b41b3fc590e7039a48d5b7300eef2591959bef5b232d8ba20b"
            },
            "downloads": -1,
            "filename": "fair-GPD-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a0f2c6ea8f33f31400bdf2a21ed870ca",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10163366,
            "upload_time": "2024-01-20T06:38:21",
            "upload_time_iso_8601": "2024-01-20T06:38:21.680973Z",
            "url": "https://files.pythonhosted.org/packages/0d/38/8d8d2b3680eb938f1086904c65d45e76e7af8fccdbe144cc9be6f1879881/fair-GPD-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-20 06:38:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "decodermu",
    "github_project": "GPD",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "fair-gpd"
}
        