:Name: chemicalchecker
:Version: 1.0.3
:Home page: http://gitlabsbnb.irbbarcelona.org/packages/chemical_checker
:Summary: Chemical Checker Package.
:Upload time: 2023-07-11 15:44:26
:Author: SBNB
:License: MIT License
:Keywords: chemicalchecker, bioactivity, signatures, chemoinformatics

The Chemical Checker
====================

The Chemical Checker (CC) is a data-driven resource of small molecule
bioactivity data. The main goal of the CC is to express data in a format
that can be used off-the-shelf in daily computational drug discovery
tasks. The resource is organized in **5 levels** of increasing
complexity, ranging from the chemical properties of the compounds to
their clinical outcomes. In between, we consider targets, off-targets,
perturbed biological networks and several cell-based assays, including
gene expression, growth inhibition, and morphological profiles. The CC
differs from other integrative compound databases in almost every
aspect. The classical, relational representation of the data is
superseded here by a less explicit, more machine-learning-friendly
abstraction of the data.

The CC resource is ever-growing and maintained by the 
`Structural Bioinformatics & Network Biology Laboratory`_ 
at the Institute for
Research in Biomedicine (`IRB Barcelona`_). Should you have any
questions, please send an email to miquel.duran@irbbarcelona.org or
patrick.aloy@irbbarcelona.org.

This project was first presented to the scientific community in the
following paper:  

    Duran-Frigola M, et al
    "**Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.**"
    Nature Biotechnology (2020) [`link`_]

and has since produced a number of `related publications`_.

.. note::
    For an overview of the CC universe please visit `bioactivitysignatures.org`_

.. _Structural Bioinformatics & Network Biology Laboratory: https://sbnb.irbbarcelona.org/
.. _IRB Barcelona: https://www.irbbarcelona.org/en
.. _related publications: https://www.bioactivitysignatures.org/publications.html
.. _link: https://www.nature.com/articles/s41587-020-0502-7
.. _BioactivitySignatures.org: https://www.bioactivitysignatures.org/


Source data and datasets
------------------------

The CC is built from public bioactivity data. We are committed to
updating the resource **every 6 months** (versions named accordingly,
e.g. ``chemical_checker_2019_01``). New datasets may be incorporated
upon request.
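
As a small illustration of the naming scheme, the sketch below parses a
version name into its year and month. The pattern
``chemical_checker_YYYY_MM`` is an assumption inferred from the single
example above, and ``parse_version`` and ``months_between`` are
hypothetical helpers, not part of the package.

.. code-block:: python

    import re

    # Assumed pattern, inferred only from ``chemical_checker_2019_01``.
    VERSION_RE = re.compile(r"^chemical_checker_(\d{4})_(\d{2})$")


    def parse_version(name):
        """Return the (year, month) encoded in a CC version name."""
        match = VERSION_RE.match(name)
        if match is None:
            raise ValueError(f"unrecognised version name: {name!r}")
        year, month = int(match.group(1)), int(match.group(2))
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month in version name: {name!r}")
        return year, month


    def months_between(older, newer):
        """Number of months separating two version names."""
        (y0, m0), (y1, m1) = parse_version(older), parse_version(newer)
        return (y1 - y0) * 12 + (m1 - m0)

With these helpers, ``parse_version("chemical_checker_2019_01")`` gives
``(2019, 1)``; a name such as ``chemical_checker_2019_07`` is used here
only as a made-up example of a release six months later.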

The basic data unit of the CC is the *dataset*. There are 5 data
*levels* (``A`` Chemistry, ``B`` Targets, ``C`` Networks, ``D`` Cells
and ``E`` Clinics) and, in turn, each level is divided into 5 sublevels
or *coordinates* (``A1``-``E5``). Each dataset belongs to one and only
one of the 25 coordinates, and each coordinate can have a finite number
of datasets (e.g. ``A1.001``), one of which is selected as being
*exemplary*.
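
The coordinate scheme can be sketched in a few lines of Python. The
snippet enumerates the 25 coordinates and splits a dataset code into its
parts. The three-digit dataset number is an assumption based on the
``A1.001`` example, and ``parse_dataset_code`` is a hypothetical helper
written for this illustration.

.. code-block:: python

    import re
    from itertools import product

    # The five levels and their names, as listed above.
    LEVELS = {
        "A": "Chemistry",
        "B": "Targets",
        "C": "Networks",
        "D": "Cells",
        "E": "Clinics",
    }
    SUBLEVELS = range(1, 6)

    # All 25 coordinates, from "A1" to "E5".
    COORDINATES = [f"{level}{sub}" for level, sub in product(LEVELS, SUBLEVELS)]

    # Assumed code layout: coordinate, a dot, then a three-digit number.
    DATASET_RE = re.compile(r"^([A-E])([1-5])\.(\d{3})$")


    def parse_dataset_code(code):
        """Split a dataset code such as ``A1.001`` into its components."""
        match = DATASET_RE.match(code)
        if match is None:
            raise ValueError(f"unrecognised dataset code: {code!r}")
        level, sublevel, number = match.groups()
        return {
            "level": level,
            "level_name": LEVELS[level],
            "coordinate": level + sublevel,
            "number": int(number),
        }

For example, ``parse_dataset_code("A1.001")`` reports level ``A``
(Chemistry), coordinate ``A1`` and dataset number 1.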

The CC is a chemistry-first biomedical resource and, as such, it
contains several predefined compound collections that are of interest to
drug discoverers, including approved drugs, natural products, and
commercial screening libraries.


Signaturization of the data
---------------------------

The main task of the CC is to convert raw data into formats that are
suitable inputs for machine-learning toolkits such as `scikit-learn`_.

Accordingly, the backbone pipeline of the CC is devoted to processing
every dataset and converting it to a series of formats that may be
readily useful for machine learning. The main assets of the CC are the
so-called *CC signatures*:

+-----------+--------------+------------------------------+------------------------------------+--------------------------------+
| Signature | Abbreviation | Description                  | Advantages                         | Disadvantages                  |
+===========+==============+==============================+====================================+================================+
| Type 0    | ``sign0``    | Raw dataset data, expressed  | Explicit data.                     | Possibly sparse,               |
|           |              | in a matrix format.          |                                    | heterogeneous, unprocessed.    |
+-----------+--------------+------------------------------+------------------------------------+--------------------------------+
| Type 1    | ``sign1``    | PCA/LSI projections of the   | Biological signatures of this type | Variable dimensions; they may  |
|           |              | data, accounting for 90% of  | can be obtained by simple          | still be sparse.               |
|           |              | the data.                    | projection. Easy to compute and    |                                |
|           |              |                              | require no fine-tuning.            |                                |
+-----------+--------------+------------------------------+------------------------------------+--------------------------------+
| Type 2    | ``sign2``    | Network embedding of the     | Fixed-length, usually acceptably   | Information leak due to        |
|           |              | similarity network.          | short. Suitable for machine        | similarity measures. Requires  |
|           |              |                              | learning. Captures global          | hyper-parameter tuning.        |
|           |              |                              | properties of the similarity       |                                |
|           |              |                              | network.                           |                                |
+-----------+--------------+------------------------------+------------------------------------+--------------------------------+
| Type 3    | ``sign3``    | Network embedding of the     | Fixed dimension and available for  | Possibly very noisy, hence     |
|           |              | inferred similarity network. | *any* molecule.                    | useless, especially for        |
|           |              |                              |                                    | low-data datasets.             |
+-----------+--------------+------------------------------+------------------------------------+--------------------------------+
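
To make the Type 0 row more concrete, here is a minimal sketch of how
raw annotations might be laid out as a binary matrix and compared with
cosine similarity. It does not use the CC package or reproduce its
pipeline. The compound identifiers, feature names and helper functions
are all invented for illustration.

.. code-block:: python

    import math

    # Invented raw annotations: compound identifier -> observed features.
    raw = {
        "compound_1": {"feat_a", "feat_b"},
        "compound_2": {"feat_b", "feat_c"},
        "compound_3": {"feat_d"},
    }


    def to_matrix(annotations):
        """Lay annotations out as a dense binary matrix (one row per compound)."""
        keys = sorted(annotations)
        features = sorted(set().union(*annotations.values()))
        matrix = [
            [1 if feature in annotations[key] else 0 for feature in features]
            for key in keys
        ]
        return keys, features, matrix


    def cosine_similarity(u, v):
        """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0


    keys, features, matrix = to_matrix(raw)

Most entries of such a matrix are zero and its width depends on the
dataset, which is the sparsity and heterogeneity listed above as
disadvantages of ``sign0``. The fixed-length ``sign2`` and ``sign3``
vectors can be compared with the same similarity function without this
drawback.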

.. note::
    A `Signaturizer`_ module for direct molecule signaturization is also available.

.. _scikit-learn: https://scikit-learn.org/
.. _Signaturizer: http://gitlabsbnb.irbbarcelona.org/packages/signaturizer


            
