📊 AionData
===========
AionData is a common data access layer designed for AI-driven drug discovery software. It provides a unified interface to access diverse biochemical databases.
Installation
------------
AionData requires Python 3.10 or newer, up to but not including 3.13. You can install it via pip:
```bash
pip install aiondata
```
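The published package metadata declares `requires_python` as `<3.13,>=3.10`. A minimal sketch for checking an interpreter against that range before installing; the helper name `is_supported` is illustrative and not part of AionData:

```python
import sys

# Bounds copied from the package metadata: >=3.10,<3.13
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version falls in the supported range.

    `version` is any tuple-like value such as (3, 11) or sys.version_info;
    it defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status} by aiondata 0.7.1")
```

If the check fails, pip will normally refuse to install the release anyway, so this is mainly useful in setup scripts that want a clearer error message.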
Datasets
--------
AionData provides access to the following datasets:
- **BindingDB**: A public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with small, drug-like molecules.
- **UniProt (Universal Protein Resource)**: UniProt provides a comprehensive, high-quality and freely accessible resource of protein sequence and functional information, which includes the manually annotated and reviewed dataset UniProtKB/Swiss-Prot.
- **ZINC**: A free database of commercially available compounds for virtual screening.
- **MoleculeNet**: An extensive collection of datasets curated to support and benchmark the development of machine learning models in the realm of drug discovery and chemical informatics. Covers a broad spectrum of molecular data including quantum mechanical properties, physical chemistry, biophysics, and physiological effects.
  - **Tox21**: Features qualitative toxicity measurements for 12,000 compounds across 12 targets, used for toxicity prediction.
  - **ToxCast**: ToxCast is a large-scale dataset for toxicity prediction, which includes over 600 experiments across 185 assays.
  - **ESOL**: Contains water solubility data for 1,128 compounds, aiding in solubility prediction models.
  - **FreeSolv**: Provides experimental and calculated hydration free energy for small molecules, crucial for understanding solvation.
  - **Lipophilicity**: Includes experimental measurements of octanol/water distribution coefficients (logD) for 4,200 compounds.
  - **QM7**: A dataset of 7,165 molecules with quantum mechanical properties computed using density functional theory (DFT).
  - **QM8**: Features electronic spectra and excited state energies of over 20,000 small molecules computed with TD-DFT.
  - **QM9**: Offers geometric, energetic, electronic, and thermodynamic properties of ~134k molecules computed with DFT.
  - **MUV**: Datasets designed for the validation of virtual screening techniques, with about 93,000 compounds.
  - **HIV**: Contains data on the ability of compounds to inhibit HIV replication, for binary classification tasks.
  - **BACE**: Includes quantitative binding results for inhibitors of human beta-secretase 1, with both classification and regression tasks.
  - **BBBP**: Features compounds with information on permeability properties across the Blood-Brain Barrier.
  - **SIDER**: Contains information on marketed medicines and their recorded adverse drug reactions, for side effects prediction.
  - **ClinTox**: Compares drugs approved by the FDA and those that failed clinical trials for toxicity reasons, for binary classification and toxicity prediction.
- **PDB (Protein Data Bank)**: A comprehensive, publicly available repository of 3D structural data of biological molecules. This dataset includes atomic coordinates, biological macromolecules, and complex assemblies, which are essential for understanding molecular function and designing pharmaceuticals.
- **Foldswitch Proteins**: Datasets from the paper [AlphaFold2 fails to predict protein fold switching](https://pubmed.ncbi.nlm.nih.gov/35634782/) featuring information on fold-switching proteins. These datasets provide insights into the structural dynamics and functional versatility of proteins, highlighting cases where AlphaFold2's predictive capabilities are challenged.
  - **Table S1A**: Lists pairs of proteins (PDBIDs), their lengths, and the sequence of the fold-switching region. For some pairs, only the first fold's PDBID is available if the second fold has not been solved.
  - **Table S1B**: Offers RMSD and TM-scores for the whole protein and the fold-switching fragment specifically, along with sequence identities between the fold-switching pairs.
  - **Table S1C**: Provides a list of fold-switching protein pairs (PDBID and chain) used for analysis, including TM-scores of the predictions.
- **CodNas91**: A dataset curated from the paper [Impact of protein conformational diversity on AlphaFold predictions](https://pubmed.ncbi.nlm.nih.gov/35561203/), featuring 91 proteins with varying degrees of conformational diversity. This dataset focuses on apo–holo pairs selected for their significant structural changes associated with biological processes.
- **Weizmann 3CA**: The Curated Cancer Cell Atlas, a collection of annotated and analyzed cancer scRNA-seq datasets from the Weizmann Institute of Science.
License
-------
AionData is licensed under the Apache License. See the LICENSE file for more details.
Raw data
{
"_id": null,
"home_page": "https://www.github.com/aion-labs/aiondata",
"name": "aiondata",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "JJ Ben-Joseph",
"author_email": "jj@tensorspace.ai",
"download_url": "https://files.pythonhosted.org/packages/ef/53/7267371d73f5ec8bcd5dfd3062598cd3776b7b8826cb81b4b6e0b77c01ca/aiondata-0.7.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "A common data access layer for AI-driven drug discovery.",
"version": "0.7.1",
"project_urls": {
"Homepage": "https://www.github.com/aion-labs/aiondata"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "22f42bc061a1343d5404fbf4e09bcc43728c0559e8af6ac2d3c4bd587b6f47d9",
"md5": "a8a5fa91c55e73b0c816a4dc7a4f51aa",
"sha256": "1f38b9545d6450fc6b06f236e3004aaa94ff5da99b573a4a532c746b0d9b7781"
},
"downloads": -1,
"filename": "aiondata-0.7.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a8a5fa91c55e73b0c816a4dc7a4f51aa",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 23846,
"upload_time": "2024-11-20T18:37:19",
"upload_time_iso_8601": "2024-11-20T18:37:19.515993Z",
"url": "https://files.pythonhosted.org/packages/22/f4/2bc061a1343d5404fbf4e09bcc43728c0559e8af6ac2d3c4bd587b6f47d9/aiondata-0.7.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ef537267371d73f5ec8bcd5dfd3062598cd3776b7b8826cb81b4b6e0b77c01ca",
"md5": "893e7379474fc2b89e9ddbcd6927b8e7",
"sha256": "1b52816d5092967320f520964917a8fea56897f95e9f4904406f81f0cbc47aef"
},
"downloads": -1,
"filename": "aiondata-0.7.1.tar.gz",
"has_sig": false,
"md5_digest": "893e7379474fc2b89e9ddbcd6927b8e7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 22384,
"upload_time": "2024-11-20T18:37:20",
"upload_time_iso_8601": "2024-11-20T18:37:20.461373Z",
"url": "https://files.pythonhosted.org/packages/ef/53/7267371d73f5ec8bcd5dfd3062598cd3776b7b8826cb81b4b6e0b77c01ca/aiondata-0.7.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-20 18:37:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aion-labs",
"github_project": "aiondata",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "aiondata"
}
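
The `sha256` values under `digests` in the metadata above can be used to check a downloaded file. A minimal sketch, assuming the wheel or sdist has already been saved locally; the helper names `sha256_of` and `matches_digest` are illustrative and not part of AionData:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("aiondata-0.7.1.tar.gz", "1b52816d5092967320f520964917a8fea56897f95e9f4904406f81f0cbc47aef")` should return `True` for an intact copy of the sdist listed above. pip can also enforce hashes itself through its hash-checking mode.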