# Signaturizer
![alt text](http://gitlabsbnb.irbbarcelona.org/packages/signaturizer/raw/master/images/cc_signatures.jpg "Molecule Signaturization")
Bioactivity signatures are multi-dimensional vectors that capture biological
traits of the molecule (for example, its target profile) in a numerical vector
format that is akin to the structural descriptors or fingerprints used in the
field of chemoinformatics.
Our **signaturizers** relate to bioactivities of 25 different types (including
target profiles, cellular response and clinical outcomes) and can be used as
drop-in replacements for chemical descriptors in day-to-day chemoinformatics
tasks.
For an overview of the different bioactivity descriptors available, please check
the original Chemical Checker
[paper](https://www.nature.com/articles/s41587-020-0502-7) or
[website](https://chemicalchecker.com/).
## Installation
The only strong dependency for this resource is [RDKit](https://www.rdkit.org/docs/Install.html),
which can be installed in a local conda environment.
The installation procedure takes less than 5 minutes.
### Conda environment
```bash
conda create --no-default-packages -n sign -y python=3.7
conda activate sign
conda install -c conda-forge -y rdkit
```
### from PyPI
```bash
pip install signaturizer
```
### from Git repository
```bash
pip install git+http://gitlabsbnb.irbbarcelona.org/packages/signaturizer.git
```
## Basic Usage
### Generating Bioactivity Signatures
```python
from signaturizer import Signaturizer
# load the predictor for B1 space (representing the Mode of Action)
sign = Signaturizer('B1')
# prepare a list of SMILES strings
smiles = ['C', 'CCC']
# run prediction
results = sign.predict(smiles)
print(results.signature)
# [[-0.05777782 0.09858645 -0.09854423 ... -0.04505355 0.09859559
#    0.09859559]
#  [ 0.03842233 0.10035036 -0.10023173 ... -0.07104399 0.10035563
#    0.10035574]]
print(results.signature.shape)
# (2, 128)
# or save results as H5 file if you have many molecules
results = sign.predict(smiles, 'destination.h5')
```
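Because signatures are plain numerical vectors, they can be compared like any other descriptor. The following is a minimal sketch, not part of the package: `cosine_similarity_matrix` is a hypothetical helper, and random data with the same `(n_molecules, 128)` shape stands in for `results.signature` so that the example runs without the models installed.

```python
import numpy as np


def cosine_similarity_matrix(signatures):
    """Return the pairwise cosine similarity between row vectors."""
    sigs = np.asarray(signatures, dtype=float)
    norms = np.linalg.norm(sigs, axis=1, keepdims=True)
    # Clip the norms to avoid dividing by zero for all-zero rows.
    unit = sigs / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Stand-in for `results.signature` (assumption: a 2D array, one row per molecule).
rng = np.random.default_rng(0)
signatures = rng.normal(size=(3, 128))

sim = cosine_similarity_matrix(signatures)
print(sim.shape)
# (3, 3)
```

With real predictions, passing `results.signature` instead of the random array should give a molecule-by-molecule similarity matrix for the chosen bioactivity space.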
## Speed
Generating 1000 signatures for one bioactivity space takes less than 4 seconds on an average machine with 4 cores.
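To check throughput on your own hardware, a small timing wrapper is enough. This is a sketch under stated assumptions: `time_prediction` and `fake_predict` are hypothetical names introduced here, and `fake_predict` only stands in for a real predictor so the snippet runs on its own.

```python
import time


def time_prediction(predict, smiles):
    """Call `predict(smiles)` and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = predict(smiles)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_predict(smiles):
    # Hypothetical stand-in for a real predictor; returns the string lengths.
    return [len(s) for s in smiles]


smiles = ['C', 'CCC'] * 500  # 1000 input strings
result, elapsed = time_prediction(fake_predict, smiles)
print(f"{len(smiles)} molecules in {elapsed:.3f} s")
```

With the package installed, passing `sign.predict` in place of `fake_predict` would time an actual signature generation run.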
## Advanced Usage
For an example application, please check the IPython [notebook](http://gitlabsbnb.irbbarcelona.org/packages/signaturizer/blob/master/notebook/signaturizer.ipynb) in the `notebook` directory (you can download it and run it on [Google Colab](https://colab.research.google.com/notebooks/)).
## Citing
If you use this resource in the course of your research, please consider citing
these papers:
> Bertoni M, et al<br>
> "**Bioactivity descriptors for uncharacterized chemical compounds.**"<br>
> Nature Communications (2021) [[link]](https://doi.org/10.1038/s41467-021-24150-4)

> Duran-Frigola M, et al<br>
> "**Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.**"<br>
> Nature Biotechnology (2020) [[link]](https://www.nature.com/articles/s41587-020-0502-7)
You can use these bibtex entries:
```
@article{Bertoni2021,
doi = {10.1038/s41467-021-24150-4},
url = {https://doi.org/10.1038/s41467-021-24150-4},
year = {2021},
month = jun,
publisher = {Springer Science and Business Media {LLC}},
volume = {12},
number = {1},
author = {Martino Bertoni and Miquel Duran-Frigola and Pau Badia-i-Mompel and Eduardo Pauls and Modesto Orozco-Ruiz and Oriol Guitart-Pla and V{\'{\i}}ctor Alcalde and V{\'{\i}}ctor M. Diaz and Antoni Berenguer-Llergo and Isabelle Brun-Heath and N{\'{u}}ria Villegas and Antonio Garc{\'{\i}}a de Herreros and Patrick Aloy},
title = {Bioactivity descriptors for uncharacterized chemical compounds},
journal = {Nature Communications},
abstract = {Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.},
}
```
```
@Article{Duran-Frigola2020,
author={Duran-Frigola, Miquel and Pauls, Eduardo and Guitart-Pla, Oriol and Bertoni, Martino and Alcalde, V{\'i}ctor and Amat, David and Juan-Blanco, Teresa and Aloy, Patrick},
title={Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker},
journal={Nature Biotechnology},
year={2020},
month={May},
day={18},
abstract={Small molecules are usually compared by their chemical structure, but there is no unified analytic framework for representing and comparing their biological activity. We present the Chemical Checker (CC), which provides processed, harmonized and integrated bioactivity data on {\textasciitilde}800,000 small molecules. The CC divides data into five levels of increasing complexity, from the chemical properties of compounds to their clinical outcomes. In between, it includes targets, off-targets, networks and cell-level information, such as omics data, growth inhibition and morphology. Bioactivity data are expressed in a vector format, extending the concept of chemical similarity to similarity between bioactivity signatures. We show how CC signatures can aid drug discovery tasks, including target identification and library characterization. We also demonstrate the discovery of compounds that reverse and mimic biological signatures of disease models and genetic perturbations in cases that could not be addressed using chemical information alone. Overall, the CC signatures facilitate the conversion of bioactivity data to a format that is readily amenable to machine learning methods.},
issn={1546-1696},
doi={10.1038/s41587-020-0502-7},
url={https://doi.org/10.1038/s41587-020-0502-7}
}
```
Raw data
{
"_id": null,
"home_page": "http://gitlabsbnb.irbbarcelona.org/packages/signaturizer",
"name": "signaturizer",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "signaturizer bioactivity signatures chemicalchecker chemoinformatics",
"author": "Martino Bertoni",
"author_email": "martino.bertoni@irbbarcelona.org",
"download_url": "https://files.pythonhosted.org/packages/e6/6a/ca4a43ccace5dead3bc0185097d426392fedbd7d8fc00c8d15969c33b7de/signaturizer-1.1.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Generate Chemical Checker signatures from molecules SMILES.",
"version": "1.1.15",
"project_urls": {
"Homepage": "http://gitlabsbnb.irbbarcelona.org/packages/signaturizer"
},
"split_keywords": [
"signaturizer",
"bioactivity",
"signatures",
"chemicalchecker",
"chemoinformatics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2fd7170684e90f4d3dcac85dd553420e50e81317a36e3890a75d31d1d107347a",
"md5": "9c9ef81b52242d5929b6e40799a84459",
"sha256": "c08512c926a757c18dd4afbe02cecb6659bdccc77f56f27378f3a74c9ee928d2"
},
"downloads": -1,
"filename": "signaturizer-1.1.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9c9ef81b52242d5929b6e40799a84459",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15139,
"upload_time": "2024-05-23T18:54:13",
"upload_time_iso_8601": "2024-05-23T18:54:13.723731Z",
"url": "https://files.pythonhosted.org/packages/2f/d7/170684e90f4d3dcac85dd553420e50e81317a36e3890a75d31d1d107347a/signaturizer-1.1.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e66aca4a43ccace5dead3bc0185097d426392fedbd7d8fc00c8d15969c33b7de",
"md5": "1237194c81f3ca283e754592e2365e93",
"sha256": "84102845bc651c6c5e7b0456dc71c3a00e8d2e20e5592e555e7e17305e3adb01"
},
"downloads": -1,
"filename": "signaturizer-1.1.15.tar.gz",
"has_sig": false,
"md5_digest": "1237194c81f3ca283e754592e2365e93",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 89356822,
"upload_time": "2024-05-23T18:54:20",
"upload_time_iso_8601": "2024-05-23T18:54:20.930597Z",
"url": "https://files.pythonhosted.org/packages/e6/6a/ca4a43ccace5dead3bc0185097d426392fedbd7d8fc00c8d15969c33b7de/signaturizer-1.1.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-23 18:54:20",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "signaturizer"
}