aiondata

- Name: aiondata
- Version: 0.7.1
- Home page: https://www.github.com/aion-labs/aiondata
- Summary: A common data access layer for AI-driven drug discovery.
- Upload time: 2024-11-20 18:37:20
- Author: JJ Ben-Joseph
- Requires Python: >=3.10,<3.13
- License: Apache

📊 AionData
===========

AionData is a common data access layer designed for AI-driven drug discovery software. It provides a unified interface to access diverse biochemical databases.

Installation
------------

To install AionData, first make sure you have Python 3.10 or newer, but below 3.13, installed on your system. Then install AionData with pip:

```bash
pip install aiondata
```
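
If the install fails, the interpreter version is a likely cause. The following is a minimal sketch, not part of AionData, that checks the running interpreter against the range declared in the package metadata (`>=3.10,<3.13`). The helper name `is_supported` is hypothetical and chosen only for illustration.

```python
import sys

# Bounds taken from the package metadata: requires_python ">=3.10,<3.13".
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version is inside the declared range.

    `version` is a tuple such as (3, 11) or (3, 11, 4); when omitted, the
    running interpreter's version is used. This helper is hypothetical and
    is not provided by aiondata itself.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status} by aiondata 0.7.1")
```

Running the script before `pip install aiondata` shows whether the interpreter is inside the declared range.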

Datasets
--------

AionData provides access to the following datasets:

- **BindingDB**: A public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with small, drug-like molecules.

- **UniProt (Universal Protein Resource)**: UniProt provides a comprehensive, high-quality and freely accessible resource of protein sequence and functional information, which includes the manually annotated and reviewed dataset UniProtKB/Swiss-Prot.

- **ZINC**: ZINC is a free database of commercially available compounds for virtual screening.

- **MoleculeNet**: An extensive collection of datasets curated to support and benchmark machine learning models for drug discovery and chemical informatics. It covers a broad spectrum of molecular data, including quantum mechanical properties, physical chemistry, biophysics, and physiological effects.

    - **Tox21**: Features qualitative toxicity measurements for 12,000 compounds across 12 targets, used for toxicity prediction.
    - **ToxCast**: ToxCast is a large-scale dataset for toxicity prediction, which includes over 600 experiments across 185 assays.
    - **ESOL**: Contains water solubility data for 1,128 compounds, aiding in solubility prediction models.
    - **FreeSolv**: Provides experimental and calculated hydration free energy for small molecules, crucial for understanding solvation.
    - **Lipophilicity**: Includes experimental measurements of octanol/water distribution coefficients (logD) for 4,200 compounds.
    - **QM7**: A dataset of 7,165 molecules with quantum mechanical properties computed using density functional theory (DFT).
    - **QM8**: Features electronic spectra and excited state energies of over 20,000 small molecules computed with TD-DFT.
    - **QM9**: Offers geometric, energetic, electronic, and thermodynamic properties of ~134k molecules computed with DFT.
    - **MUV**: Datasets designed for the validation of virtual screening techniques, with about 93,000 compounds.
    - **HIV**: Contains data on the ability of compounds to inhibit HIV replication, for binary classification tasks.
    - **BACE**: Includes quantitative binding results for inhibitors of human beta-secretase 1, with both classification and regression tasks.
    - **BBBP**: Features compounds with information on permeability properties across the Blood-Brain Barrier.
    - **SIDER**: Contains information on marketed medicines and their recorded adverse drug reactions, for side effects prediction.
    - **ClinTox**: Compares drugs approved by the FDA and those that failed clinical trials for toxicity reasons, for binary classification and toxicity prediction.

- **PDB (Protein Data Bank)**: A comprehensive, publicly available repository of 3D structural data of biological molecules. This dataset includes atomic coordinates, biological macromolecules, and complex assemblies, which are essential for understanding molecular function and designing pharmaceuticals.

- **Foldswitch Proteins**: Datasets from the paper [AlphaFold2 fails to predict protein fold switching](https://pubmed.ncbi.nlm.nih.gov/35634782/) featuring information on fold-switching proteins. These datasets provide insights into the structural dynamics and functional versatility of proteins, highlighting cases where AlphaFold2's predictive capabilities are challenged.

    - **Table S1A**: Lists pairs of proteins (PDBIDs), their lengths, and the sequence of the fold-switching region. For some pairs, only the first fold's PDBID is available if the second fold has not been solved.
    - **Table S1B**: Offers RMSD and TM-scores for the whole protein and the fold-switching fragment specifically, along with sequence identities between the fold-switching pairs.
    - **Table S1C**: Provides a list of fold-switching protein pairs (PDBID and chain) used for analysis, including TM-scores of the predictions.

- **CodNas91**: A dataset curated from the paper [Impact of protein conformational diversity on AlphaFold predictions](https://pubmed.ncbi.nlm.nih.gov/35561203/), featuring 91 proteins with varying degrees of conformational diversity. This dataset focuses on apo–holo pairs selected for their significant structural changes associated with biological processes.

- **Weizmann 3CA**: The Curated Cancer Cell Atlas, a collection of annotated and analyzed cancer scRNA-seq datasets from the Weizmann Institute of Science.
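
Table S1B of the Foldswitch Proteins dataset reports RMSD values. As a reminder of what that metric measures, here is a minimal sketch of a root-mean-square deviation between two sets of paired 3D coordinates. It assumes the structures are already superimposed and that atoms are matched by index. It is not the procedure used in the paper and not part of AionData's API, and the function name `rmsd` is illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both structures are already superimposed and that the i-th
    point in `coords_a` corresponds to the i-th point in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared Euclidean distances between paired points.
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two toy "structures": the second is the first shifted by 1 unit along z.
fold_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fold_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(fold_a, fold_b))
```

Every point in the toy example is displaced by exactly one unit, so the deviation is one unit. Comparing real structures also requires an optimal superposition step, which this sketch omits.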


License
-------

AionData is licensed under the Apache License. See the LICENSE file for more details.

            

        