BioBrigit
===============
BioBrigit is a computational tool designed for the prediction of metal diffusion pathways through a protein. It uses a novel scoring function that combines deep learning with prior domain knowledge of bioinorganic interactions. The deep learning part of our hybrid approach consists of a 3D Convolutional Neural Network trained to interpret the biochemical environment and distinguish between metal-binding and non-binding protein regions.
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/RaulFD-creator/biobrigit/master/docs/figures/BioBrigit_dark_border.png" width="850" class="center">
<source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/RaulFD-creator/biobrigit/master/docs/figures/BioBrigit_light.png" width="850" class="center">
<img alt="Shows a stylised anvil with a neural network upon it." src="https://raw.githubusercontent.com/RaulFD-creator/biobrigit/master/docs/figures/BioBrigit_light.png">
</picture>
Features
--------
**Different options for customizing the search:**
* Search for the paths of specific metals.
* Provide a score that indicates how strong the protein-metal interaction will be in different positions.
* Scan the whole protein or only a region (in PDB format).
**Possible applications:**
* Identification of probable metal diffusion pathways through a protein.
* Identification of conformational changes that alter the formation of such paths.
* Metalloenzyme and metallodrug design.
* Aid in developing hypotheses in molecular physiopathology.
* Drug discovery.
**Modular design:**
* The modular design of this package allows it to be used as a command-line application or integrated into a larger Python program or pipeline. The [scripts](https://github.com/insilichem/BioBrigit/tree/main/scripts) directory contains some examples of how to integrate the program within a wider pipeline.
Installation
------------
BioBrigit can be installed directly in a conda environment with:
```bash
conda create -n biobrigit python
conda activate biobrigit
pip install biobrigit
```
Usage
-----
Once the environment is properly set up, using the program is relatively simple. The simplest invocation is:
```bash
biobrigit target metal
```
Many parameters can also be tuned, though the defaults are recommended.
* `--model`: CNN model to use. Two options are currently available: `BrigitCNN`, the default, and `TinyBrigit`, a smaller model that trades some accuracy for computational efficiency.
* `--device`: Whether to use GPU acceleration (`cuda`) or not (`cpu`). By default, it uses GPU if available.
* `--device_id`: GPU device to use for the calculations when more than one GPU is available. By default, it uses the device labelled as 0.
* `--outputfile`: Name of the output files. The file extensions (`.txt` and `.pdb`) are added automatically.
* `--max_coordinators`: Maximum number of coordinators expected. By default, 2. It only affects the range of values assigned to the probes.
* `--residues`: Number of most likely coordinating residues. By default, 10.
* `--stride`: Step at which the voxelized representation of the protein is scanned. By default, 1. A greater stride improves computational efficiency at the cost of prediction resolution.
* `--cluster_radius`: Radius of the clusters to be generated, in angstroms. By default, 5.
* `--cnn_threshold`: Threshold for considering CNN points as possible coordinations. Lower values reduce computational efficiency; greater values may hide possible coordinating regions. By default, 0.5. Values should be within the range [0, 1].
* `--combined_threshold`: Threshold for considering predictions combining BioMetAll and CNN scores as positive. By default, 0.5. Values should be within the range [0, 1].
* `--threads`: Number of threads available for multithreaded calculations. By default, it creates 2 threads per physical core.
* `--verbose`: Amount of information displayed. 0: only Moleculekit output; 1: all. By default, 1.
* `--residue_score`: Scoring function for residue coordination analysis. Can be either `discrete`, which only considers how likely a residue is to bind to a certain metal (more computationally efficient), or `gaussian`, which also considers the fitness of the geometrical descriptors for a certain residue and metal. By default, `gaussian`.
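To give a feel for the trade-off behind `--stride`, here is a rough sketch that counts how many grid positions would be evaluated for a given box. It assumes the stride is applied equally along all three axes of a regular grid; the function name `grid_positions` and the 60 Å box are hypothetical and are not taken from BioBrigit's code.

```python
import math


def grid_positions(box_size, voxelsize=1.0, stride=1):
    """Estimate how many grid positions are evaluated for a box.

    box_size: (x, y, z) edge lengths in angstroms (hypothetical input).
    voxelsize: voxel edge length in angstroms (mirrors --voxelsize).
    stride: step between evaluated voxels (mirrors --stride).
    Assumes the stride applies equally along all three axes.
    """
    voxels_per_axis = [math.ceil(length / voxelsize) for length in box_size]
    sampled_per_axis = [math.ceil(n / stride) for n in voxels_per_axis]
    return math.prod(sampled_per_axis)


box = (60.0, 60.0, 60.0)  # illustrative box, not a real protein
for stride in (1, 2, 3):
    print(stride, grid_positions(box, stride=stride))
```

Under these assumptions, the 60 Å cube goes from 216,000 positions at stride 1 to 8,000 at stride 3, which is why a larger stride suits a fast preliminary scan.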
The following parameters can also be tuned, but modifying them is **not** recommended, as it may result in unreliable predictions.
* `--cnn_weight`: Importance of the CNN score in the final score relative to the BioMetAll score. By default, 0.5. Values should be within the range [0, 1].
* `--voxelsize`: Resolution of the 3D representation, in angstroms. By default, 1 Å.
* `--pH`: pH of the medium at which the structure is to be evaluated. By default, 7.4.
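This README does not spell out how the CNN and BioMetAll scores are merged. One plausible reading of `--cnn_weight` is a weighted average of the two scores, which is then compared against `--combined_threshold`. The sketch below illustrates that reading only; the formula and the function names are assumptions, not BioBrigit's confirmed implementation.

```python
def combined_score(cnn_score, biometall_score, cnn_weight=0.5):
    """Weighted average of two scores in [0, 1].

    ASSUMPTION: this convex combination is an illustration of what
    --cnn_weight might control; it is not taken from BioBrigit's source.
    """
    for name, value in (
        ("cnn_score", cnn_score),
        ("biometall_score", biometall_score),
        ("cnn_weight", cnn_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return cnn_weight * cnn_score + (1.0 - cnn_weight) * biometall_score


def is_positive(score, combined_threshold=0.5):
    """Keep a probe only if its score exceeds the threshold."""
    return score > combined_threshold


score = combined_score(cnn_score=0.8, biometall_score=0.4)
print(round(score, 3), is_positive(score))
```

Under this reading, a weight of 1 would rely on the CNN alone and a weight of 0 on the BioMetAll score alone.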
**Examples:**
Searching for copper.
```bash
biobrigit 1dhy Cu
```
Searching with generic metal.
```bash
biobrigit 1dhy generic
```
Searching for multiple metals simultaneously.
```bash
biobrigit 1dhy fe,generic
```
Fast preliminary exploration for binding sites with 4 coordinations, no GPU, and only considering the 4 most likely coordinating residues.
```bash
biobrigit 1dhy Cu --stride 3 --max_coordinators 4 --device cpu --residues 4
```
Search for small clusters at acidic pH (5.2).
```bash
biobrigit 1dhy Cu --cluster_radius 3 --pH 5.2
```
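The commands above can also be driven from a Python pipeline. The sketch below does so through the command-line interface with `subprocess`, since BioBrigit's Python API is not described here. The helpers `build_command` and `run_biobrigit` are hypothetical names; they assume that keyword names match the flags listed under Usage and that the `biobrigit` executable is on the `PATH`.

```python
import shlex
import subprocess
from typing import List


def build_command(target: str, metals: List[str], **options) -> List[str]:
    """Assemble a `biobrigit` command line from Python values.

    Keyword names are assumed to match the documented flags,
    e.g. stride=3 becomes `--stride 3`.
    """
    # Several metals are passed as one comma-separated argument.
    command = ["biobrigit", target, ",".join(metals)]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


def run_biobrigit(target: str, metals: List[str], **options):
    """Run BioBrigit and raise CalledProcessError if it fails."""
    command = build_command(target, metals, **options)
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Build (without running) the fast CPU-only exploration shown earlier.
command = build_command("1dhy", ["Cu"], stride=3, device="cpu")
print(shlex.join(command))  # prints: biobrigit 1dhy Cu --stride 3 --device cpu
```

Keeping command construction separate from execution makes it possible to log or inspect the exact command before launching a long run.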
Output
------
The program generates 2 output files.
1. A `.txt` file that contains information regarding the clusters of probes ordered by the predicted strength of the interaction between protein and metal. This file also displays a list of possible coordinating residues.
2. A `.pdb` file that contains the coordinates of all probes with a score greater than `combined_threshold`. This is the recommended output format for visualizing the predicted paths. The probes are represented as He atoms and the centers of their clusters as Ar atoms. To easily visualise the score for each probe, simply colour the probes by their B-factor (β-factor) using your protein visualization tool of choice.
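For post-processing outside a molecular viewer, the probe scores can be read back from the `.pdb` output. The following is a minimal sketch that assumes the file uses standard fixed-column PDB `ATOM`/`HETATM` records, with the score stored in the B-factor column (columns 61 to 66). The `Probe` type, both helper functions, and the sample records are hypothetical; the sample is not real BioBrigit output.

```python
from typing import List, NamedTuple


class Probe(NamedTuple):
    element: str  # "He" for probes, "Ar" for cluster centers
    x: float
    y: float
    z: float
    score: float  # read from the B-factor column


def parse_probes(pdb_text: str) -> List[Probe]:
    """Extract He and Ar pseudo-atoms from fixed-column PDB records."""
    probes = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol (columns 77-78); fall back to the atom name.
        element = (line[76:78].strip() or line[12:16].strip()).capitalize()
        if element not in ("He", "Ar"):
            continue
        probes.append(Probe(element, float(line[30:38]), float(line[38:46]),
                            float(line[46:54]), float(line[60:66])))
    return probes


def sample_record(serial, element, x, y, z, score):
    """Format one synthetic HETATM record (illustrative data only)."""
    return (f"HETATM{serial:5d} {element.upper():<4s} PRB X{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{score:6.2f}"
            f"          {element.upper():>2s}")


sample = "\n".join([
    sample_record(1, "He", 1.0, 2.0, 3.0, 0.62),
    sample_record(2, "He", 1.5, 2.0, 3.0, 0.91),
    sample_record(3, "Ar", 1.25, 2.0, 3.0, 0.77),
])
probes = parse_probes(sample)
centers = [p for p in probes if p.element == "Ar"]
best = max((p for p in probes if p.element == "He"), key=lambda p: p.score)
print(len(probes), len(centers), best.score)
```

If the real output deviates from the standard column layout, the slice indices would need adjusting.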
License
-------
BioBrigit is open-source software licensed under the BSD 3-Clause License. Check the details in the [LICENSE](https://github.com/raulfd-creator/biobrigit/blob/master/LICENSE) file.
Development Team
----------------
- Project lead: [Jean-Didier Marechal](https://github.com/JeanDidier).
- Lead development: [Raul Fernandez-Diaz](https://github.com/RaulFD-creator).
Credits
-------
Special thanks to [Silvia González López](https://www.linkedin.com/in/silvia-gonz%C3%A1lez-l%C3%B3pez-717558221/) for designing the BioBrigit logo.