<h1 align="center">PDB-score</h1>
[Click here for the Chinese documentation](https://github.com/SiriNatsume/PDB-score/blob/master/readme-zh.md)
## Features
Batch GDT scoring of large numbers of predicted protein structures against their experimental counterparts, plus RMSD after coordinate superposition, for evaluating structure prediction models.
## Installation
```
pip install PDB-score
```
## Usage
```
psc [-h] -c CONTROL -t TREATMENT -o OUTPUT [-T THREAT] [-B BATCH] [-m {default,prealign}] [-d THRESHOLD] [-i MAX_ITERATIONS] [-s SAVE_LIMIT]
```
- `-c` Directory containing the experimental (reference) PDB files.
- `-t` Directory containing the predicted PDB files.
- `-o` Directory in which to save the output scores.
- `-T` Number of CPU cores to use; default 4.
- `-B` Batch size; default 5000.
- `-m` Alignment method, `default` or `prealign`; `default` uses the Biopython implementation.
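The README does not spell out how `-T` and `-B` interact. One plausible reading, sketched below purely as an assumption, is that structure pairs are processed in batches of `-B` pairs, each batch spread over a pool of `-T` worker processes so that only one batch is in flight at a time. `batched` and `run_in_batches` are hypothetical names, not part of PDB-score.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List, Sequence, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most batch_size (cf. -B)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(
    pairs: Sequence[T],
    score_fn: Callable[[T], R],
    cores: int = 4,          # cf. -T (hypothetical mapping)
    batch_size: int = 5000,  # cf. -B (hypothetical mapping)
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[R]:
    """Score pairs batch by batch on a pool of `cores` workers.

    With ProcessPoolExecutor, score_fn must be a picklable module-level function.
    """
    results: List[R] = []
    with executor_cls(max_workers=cores) as pool:
        for batch in batched(pairs, batch_size):
            # Only the current batch is submitted, which bounds peak memory.
            results.extend(pool.map(score_fn, batch))
    return results
```

Passing `ThreadPoolExecutor` as `executor_cls` is convenient for quick experiments, since it avoids pickling the scoring function.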
---
The following parameters apply only with `-m prealign`:
- `-d` Distance threshold (in Å) used to exclude atoms in each optimization iteration; default 1.0.
- `-i` Maximum number of optimization iterations; default 10.
- `-s` Minimum fraction of atoms that must be retained; default 1.0.
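Taken together, `-d`, `-i` and `-s` suggest an iterative outlier-rejection loop. The sketch below is a hypothetical interpretation, not the `prealign` source: it superimposes matched Cα coordinates with the Kabsch (SVD least-squares) algorithm, keeps only atoms within `threshold` Å, refits, and stops after `max_iterations` rounds, when the kept set stops changing, or when trimming would leave fewer than a `save_limit` fraction of the atoms. With the default `save_limit=1.0` no atom may be dropped, so it reduces to a single least-squares fit.

```python
import math

import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rotation R and translation t minimising the RMSD of P @ R.T + t to Q."""
    p_c, q_c = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_c).T @ (Q - q_c)             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_c - R @ p_c


def iterative_superpose(P, Q, threshold=1.0, max_iterations=10, save_limit=1.0):
    """Refit repeatedly on the atoms lying within `threshold` Å after superposition.

    Parameter names only mirror -d / -i / -s conceptually (hypothetical reading).
    """
    n = len(P)
    # Smallest number of atoms we may keep; round() absorbs float noise such as 0.9 * 10.
    min_keep = max(3, math.ceil(round(save_limit * n, 6)))
    keep = np.ones(n, dtype=bool)
    for _ in range(max_iterations):
        R, t = kabsch(P[keep], Q[keep])
        dist = np.linalg.norm(P @ R.T + t - Q, axis=1)
        candidate = dist <= threshold
        if candidate.sum() < min_keep or np.array_equal(candidate, keep):
            break  # trimming would violate save_limit, or the atom set has converged
        keep = candidate
    R, t = kabsch(P[keep], Q[keep])  # final fit on the retained atoms
    return R, t, keep
```

Stopping (rather than clamping) when the retained fraction would fall below `save_limit` is a design choice of this sketch; the tool may handle that case differently.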
## Notes
- `-s` accepts at most two decimal places and must not exceed 1.0. When the two structures differ in size, the fraction refers to the smaller one.
- `-o` takes a directory only, not a file name.
- The output `.csv` file has a fixed name, so a new run into the same directory may overwrite earlier results.
- Only `.pdb` and `.ent` files whose names (without extension) appear in both input directories are analyzed.
- `prealign` mode has not been optimized for speed and may run slowly.
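The file-matching rule can be sketched with `pathlib`. This is a hypothetical helper, not PDB-score's code; it assumes a non-recursive scan, case-insensitive extensions, and that when one directory holds both `x.pdb` and `x.ent` only the first in sorted order is used. The README does not specify these details.

```python
from pathlib import Path
from typing import Dict, List, Tuple

STRUCTURE_SUFFIXES = {".pdb", ".ent"}


def index_structures(directory) -> Dict[str, Path]:
    """Map base name (no extension) -> path for .pdb/.ent files in one directory."""
    index: Dict[str, Path] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in STRUCTURE_SUFFIXES:
            index.setdefault(path.stem, path)  # keep the first file for a duplicate stem
    return index


def pair_structures(control_dir, treatment_dir) -> List[Tuple[str, Path, Path]]:
    """Return (name, experimental_path, predicted_path) for names present in both dirs."""
    control = index_structures(control_dir)
    treatment = index_structures(treatment_dir)
    common = sorted(control.keys() & treatment.keys())
    return [(name, control[name], treatment[name]) for name in common]
```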
## Output
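Results are written to a single file, `/path/to/output/protein_scores.csv`. `RMSD` is the Cα RMSD after superposition, in Å; columns `1A` through `128A` hold the score at each distance cutoff, and `Average` summarizes them.
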
| name | RMSD | 1A | 2A | ... | 128A | Average |
|:---------|:-------------|:------|:------|:----|:------|:--------|
| Protein1 | rmsd (float) | Score | Score | ... | Score | Score |
| Protein2 | rmsd (float) | Score | Score | ... | Score | Score |
| ... | ... | ... | ... | ... | ... | ... |
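To post-process results, the CSV can be loaded with the standard `csv` module. The sketch assumes the header row matches the table above (`name`, then numeric columns) and that a higher `Average` is better, as for GDT scores; `load_scores` and `rank_by_average` are hypothetical names.

```python
import csv
from typing import Dict, List, Optional


def load_scores(csv_path) -> List[Dict[str, object]]:
    """Read protein_scores.csv; every column except `name` is converted to float."""
    records = []
    # utf-8-sig also copes with a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            record: Dict[str, object] = {"name": row["name"]}
            for column, value in row.items():
                if column != "name":
                    record[column] = float(value)
            records.append(record)
    return records


def rank_by_average(records, top: Optional[int] = None):
    """Sort records by the `Average` column, best (highest) first."""
    ranked = sorted(records, key=lambda r: r["Average"], reverse=True)
    return ranked if top is None else ranked[:top]
```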
## Calculation Method
Default Mode:
- Coordinates are superimposed with Biopython.
- Scores are calculated with the GDT algorithm.
- All ligands are removed, and each residue is represented by the coordinate of its Cα (alpha carbon) atom.
- If the two structures contain different numbers of Cα atoms, the surplus residues are simply ignored, even if this reduces accuracy.
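The default-mode steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not PDB-score's code: it uses an SVD-based (Kabsch) least-squares superposition as a stand-in for Biopython's superimposer, reads only `ATOM` records of the first model (so `HETATM` ligands and waters are dropped), truncates both Cα lists to the shorter length as an assumed way of ignoring surplus residues, and scores every cutoff from a single superposition. Full GDT implementations such as LGA search many superpositions per cutoff, so scores from this sketch can be lower. The default cutoffs (1, 2, 4, 8 Å) follow the GDT_TS convention; PDB-score's own columns run from `1A` to `128A`.

```python
import numpy as np


def ca_coordinates(pdb_text: str) -> np.ndarray:
    """Cα coordinates from ATOM records of the first model (HETATM records are skipped)."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # assumption: score only the first model
        # PDB fixed columns: atom name 13-16, altLoc 17, x/y/z 31-54.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA" and line[16] in (" ", "A"):
            coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return np.array(coords, dtype=float).reshape(-1, 3)


def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rotation R and translation t minimising the RMSD of P @ R.T + t to Q."""
    p_c, q_c = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_c).T @ (Q - q_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_c - R @ p_c


def score_pair(ref: np.ndarray, model: np.ndarray, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return (rmsd, {cutoff: fraction of residues within cutoff}, average of fractions)."""
    n = min(len(ref), len(model))
    ref, model = ref[:n], model[:n]  # assumption: surplus residues at the end are ignored
    R, t = kabsch(model, ref)        # move the predicted model onto the reference
    dist = np.linalg.norm(model @ R.T + t - ref, axis=1)
    rmsd = float(np.sqrt(np.mean(dist ** 2)))
    scores = {c: float(np.mean(dist <= c)) for c in cutoffs}
    return rmsd, scores, float(np.mean(list(scores.values())))
```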
---
Prealign Mode:
- Coordinates are superimposed with a custom rigid-body `align` algorithm.
- Scores are calculated with the GDT algorithm.
- All ligands are removed, and each residue is represented by the coordinate of its Cα (alpha carbon) atom.
- If the two structures contain different numbers of Cα atoms, the best-matching chains and fragments are selected with a longest common subsequence (LCS) method.
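The LCS step can be illustrated with the textbook dynamic-programming longest common subsequence over residue names. The sketch below is an assumption about how such matching might work, not the tool's implementation: `lcs_alignment` returns matched index pairs, and the hypothetical `best_chain_pair` picks the chain pair with the longest LCS. The Cα coordinates at the matched indices could then be passed to a superposition routine.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def lcs_alignment(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def best_chain_pair(
    control: Dict[str, Sequence[Hashable]], treatment: Dict[str, Sequence[Hashable]]
) -> Optional[Tuple[str, str, List[Tuple[int, int]]]]:
    """Pick the (control chain, treatment chain) pair whose residue sequences share the longest LCS."""
    best = None
    for cid, cseq in control.items():
        for tid, tseq in treatment.items():
            pairs = lcs_alignment(cseq, tseq)
            if best is None or len(pairs) > len(best[2]):
                best = (cid, tid, pairs)
    return best
```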
## Performance
- Test environment:
  - Default parameters: `-T 4 -B 5000`
  - Test machine: `Windows 11 PC, Intel Core i5-12600K CPU`
  - Single sample: 146 KB, 154 residues
- Comparing 50,000 samples took 387,061 ms (about 6.5 minutes, or roughly 7.7 ms per sample).
- Memory usage stayed below 6 GB.
## Others
- Setting `-s` below 1.0 allows up to a fraction 1 − s of the atoms to be discarded during alignment. This often improves the superposition but can affect the accuracy of the results.
- If the input structures are noisy (for example, the centroid is not at the origin, irrelevant chains are present, or the Cα counts differ substantially), `prealign` usually performs better.
- For the most accurate results with the default alignment method, clean the input structures as thoroughly as possible beforehand.
## Acknowledgments
- [JetBrains](https://www.jetbrains.com/)
- [ChatGPT](https://www.chatgpt.com)
[@SiriNatsume](https://github.com/SiriNatsume)
Wishing you happiness :)