biochemical-data-connectors


Name: biochemical-data-connectors
Version: 3.2.2
Summary: A Python package to extract chemical, biochemical, and bioactivity data from public databases like ORD, ChEMBL and PubChem.
Upload time: 2025-08-05 00:28:55
Requires Python: >=3.10
License: MIT License
Keywords: bioinformatics, cheminformatics, drug discovery, chembl, pubchem, bioactivity, api
# biochemical-data-connectors

`biochemical-data-connectors` is a Python package for extracting chemical, biochemical, and bioactivity data from public databases like ChEMBL, PubChem, BindingDB, IUPHAR/BPS Guide to PHARMACOLOGY, and the Open Reaction Database (ORD).

## Overview
`biochemical-data-connectors` provides a simple, consistent interface for querying major cheminformatics and bioinformatics databases for compounds. It is designed as a modular, reusable tool for researchers and developers in computational chemistry and drug discovery, enabling rapid curation of high-quality datasets for machine learning and analysis.

### Key Features
1. **Bioactive Compounds**
   * **Unified Interface**: A single, easy-to-use abstract base class for fetching bioactives for a given target.
   * **Multiple Data Sources**: Includes concrete connectors for major public databases:
     1. ChEMBL (`ChemblBioactivesExtractor`)
     2. PubChem (`PubChemBioactivesExtractor`)
     3. BindingDB (`BindingDbBioactivesConnector`)
     4. IUPHAR/BPS Guide to PHARMACOLOGY (`IUPHARBioactivesConnector`)
   * **Powerful Filtering**: Filter compounds by bioactivity type (e.g., Kd, IC50) and a potency threshold.
   * **Efficient Fetching**: Issues API requests concurrently to speed up data retrieval.
2. **Chemical Reactions**
   * **Local ORD Processing**: Includes a connector (`OpenReactionDatabaseConnector`) to efficiently process a local copy of the Open Reaction Database.
   * **Reaction Role Correction**: Uses RDKit to automatically correct and reassign reactant/product roles from the source data, improving data quality.
   * **Robust SMILES Extraction**: Canonicalizes and validates SMILES strings for both reactants and products to ensure high-quality, standardized output.
   * **Memory-Efficient Processing**: Employs a generator-based extraction method, allowing for iteration over massive reaction datasets with a low memory footprint.
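The unified-interface and filtering features above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the names `SketchBioactivesConnector`, `InMemoryBioactivesConnector`, `Activity`, the `_fetch_activities` hook, and the record schema are all hypothetical. Only the `get_bioactive_compounds` method and the `bioactivity_measure`/`bioactivity_threshold` parameters are taken from this README's Quick Start.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One measured activity record (hypothetical schema)."""
    smiles: str
    measure: str     # e.g. "Kd", "IC50"
    value_nm: float  # potency in nM


class SketchBioactivesConnector(ABC):
    """Hypothetical base class: shared filtering, per-source fetching."""

    def __init__(self, bioactivity_measure: str, bioactivity_threshold: float):
        self.bioactivity_measure = bioactivity_measure
        self.bioactivity_threshold = bioactivity_threshold  # in nM

    @abstractmethod
    def _fetch_activities(self, target_uniprot_id: str) -> list[Activity]:
        """Return raw activity records for one target from one data source."""

    def get_bioactive_compounds(self, target_uniprot_id: str) -> list[str]:
        # Keep records of the requested measure at or below the threshold,
        # de-duplicating SMILES while preserving first-seen order.
        seen: set[str] = set()
        hits: list[str] = []
        for act in self._fetch_activities(target_uniprot_id):
            if (
                act.measure == self.bioactivity_measure
                and act.value_nm <= self.bioactivity_threshold
                and act.smiles not in seen
            ):
                seen.add(act.smiles)
                hits.append(act.smiles)
        return hits


class InMemoryBioactivesConnector(SketchBioactivesConnector):
    """Hypothetical stand-in source so the sketch runs without network access."""

    def __init__(self, records: dict[str, list[Activity]],
                 bioactivity_measure: str, bioactivity_threshold: float):
        super().__init__(bioactivity_measure, bioactivity_threshold)
        self._records = records

    def _fetch_activities(self, target_uniprot_id: str) -> list[Activity]:
        return self._records.get(target_uniprot_id, [])


demo = InMemoryBioactivesConnector(
    {
        "P00533": [
            Activity("CCO", "Kd", 250.0),
            Activity("c1ccccc1", "Kd", 5000.0),  # above threshold: dropped
            Activity("CCN", "IC50", 10.0),       # different measure: dropped
            Activity("CCO", "Kd", 300.0),        # duplicate SMILES: dropped
        ]
    },
    bioactivity_measure="Kd",
    bioactivity_threshold=1000.0,
)
print(demo.get_bioactive_compounds("P00533"))  # ['CCO']
```

Keeping the filtering in the base class means each concrete connector only has to translate its source's API responses into activity records.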

## Installation
Install the package from PyPI with pip:
```bash
pip install biochemical-data-connectors
```

## Quick Start
Here is a simple example of how to retrieve all compounds from ChEMBL with a measured Kd of less than or equal to 1000 nM for the EGFR protein (UniProt ID: `P00533`).
```python
import logging
from biochemical_data_connectors import ChEMBLConnector

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# 1. Instantiate the connector for the desired database
chembl_connector = ChEMBLConnector(
    bioactivity_measure='Kd',
    bioactivity_threshold=1000.0, # in nM
    logger=logger
)

# 2. Specify the target's UniProt ID
target_uniprot_id = "P00533" # EGFR

# 3. Get the bioactive compounds
print(f"Fetching bioactive compounds for {target_uniprot_id} from ChEMBL...")
smiles_list = chembl_connector.get_bioactive_compounds(target_uniprot_id)

# 4. Print the results
if smiles_list:
    print(f"\nFound {len(smiles_list)} compounds.")
    print("First 5 compounds:")
    for smiles in smiles_list[:5]:
        print(smiles)
else:
    print("No compounds found matching the criteria.")
```
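Because every connector exposes the same `get_bioactive_compounds` method, several targets can be queried in one call. The sketch below is a hypothetical helper (`fetch_many_targets` and `SupportsBioactives` are not part of the package) that fans requests out over a thread pool. It assumes the connector is safe to call from multiple threads, which this README does not state; with `max_workers=1` it simply runs the calls sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class SupportsBioactives(Protocol):
    """Any object with the Quick Start's get_bioactive_compounds method."""

    def get_bioactive_compounds(self, target_uniprot_id: str) -> list[str]: ...


def fetch_many_targets(
    connector: SupportsBioactives,
    uniprot_ids: list[str],
    max_workers: int = 4,
) -> dict[str, list[str]]:
    """Hypothetical helper: query several targets concurrently.

    Assumes the connector may be called from several threads at once; if
    that is not the case, pass max_workers=1 to run the calls sequentially.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they pair up with the IDs.
        results = list(pool.map(connector.get_bioactive_compounds, uniprot_ids))
    return dict(zip(uniprot_ids, results))


# Usage with the Quick Start connector (requires network access):
# hits = fetch_many_targets(chembl_connector, ["P00533", "P04626"])  # EGFR, HER2
# for uniprot_id, smiles_list in hits.items():
#     print(uniprot_id, len(smiles_list))
```

Threads suit this workload because each call spends most of its time waiting on network I/O rather than computing.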

## Package Structure
```
biochemical-data-connectors/
├── pyproject.toml
├── requirements-dev.txt
├── src/
│   └── biochemical_data_connectors/
│       ├── __init__.py
│       ├── constants.py
│       ├── models.py
│       ├── connectors/
│       │   ├── __init__.py
│       │   ├── ord_connectors.py
│       │   └── bioactive_compounds/
│       │       ├── __init__.py
│       │       ├── base_bioactives_connector.py
│       │       ├── bindingdb_bioactives_connector.py
│       │       ├── chembl_bioactives_connector.py
│       │       ├── iuphar_bioactives_connector.py
│       │       └── pubchem_bioactives_connector.py
│       └── utils/
│           ├── __init__.py
│           ├── files_utils.py
│           ├── iter_utils.py
│           ├── standardization_utils.py
│           └── api/
│               ├── __init__.py
│               ├── base_api.py
│               ├── bindingbd_api.py
│               ├── chembl_api.py
│               ├── iuphar_api.py
│               ├── mappings.py
│               └── pubchem_api.py
├── tests/
│   └── ...
└── README.md
```
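The ORD connector in `ord_connectors.py` is described in the feature list as using generator-based extraction to keep memory use low. The following is a minimal standard-library sketch of that idea, not the connector's implementation: the record schema (`"reactants"`/`"products"` lists of SMILES) and the functions `iter_reaction_smiles` and `batched` are hypothetical, and the RDKit-based role correction and SMILES canonicalization the package performs are omitted.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_reaction_smiles(records: Iterable[dict]) -> Iterator[str]:
    """Lazily yield 'reactants>>products' reaction SMILES from raw records.

    `records` may be any iterable, e.g. a generator reading a large file one
    record at a time, so only the current record is held in memory.
    The record schema used here is hypothetical.
    """
    for record in records:
        reactants = [s for s in record.get("reactants", []) if s]
        products = [s for s in record.get("products", []) if s]
        # Skip reactions missing either side rather than emitting partial data.
        if not reactants or not products:
            continue
        yield ".".join(reactants) + ">>" + ".".join(products)


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group a stream into lists of at most `size` items, consuming it lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Demo: five identical records streamed from a generator, batched in pairs.
records = (
    {"reactants": ["CCO", "CC(=O)O"], "products": ["CCOC(C)=O"]}
    for _ in range(5)
)
for batch in batched(iter_reaction_smiles(records), 2):
    print(len(batch))  # prints 2, 2, 1
```

Python 3.12 and later also provide `itertools.batched` (which yields tuples); a local helper is used here because the package supports Python 3.10.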

## License
This project is licensed under the terms of the [MIT License](https://opensource.org/license/mit).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "biochemical-data-connectors",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, cheminformatics, drug discovery, chembl, pubchem, bioactivity, api",
    "author": null,
    "author_email": "Christopher van den Berg <cvandenberg1105@googlemail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5a/e4/59e91a717041deb544cdeb8b3cfcecee31a938ea3978300c45013af28ea8/biochemical_data_connectors-3.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "A Python package to extract chemical, biochemical, and bioactivity data from public databases like ORD, ChEMBL and PubChem.",
    "version": "3.2.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/your-username/biochemical-data-connectors/issues",
        "Repository": "https://github.com/your-username/biochemical-data-connectors"
    },
    "split_keywords": [
        "bioinformatics",
        " cheminformatics",
        " drug discovery",
        " chembl",
        " pubchem",
        " bioactivity",
        " api"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9a2aa79d736199db9eb4afed367e3132f0ca8e87b238084a9262a8b19ec73935",
                "md5": "cfb89ee0faacedd708d765db60c1a121",
                "sha256": "0b31f8beda60056c6ad9063bc3b1be62f13caa69517ee1bfcedf2b89c373db42"
            },
            "downloads": -1,
            "filename": "biochemical_data_connectors-3.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cfb89ee0faacedd708d765db60c1a121",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 38869,
            "upload_time": "2025-08-05T00:28:53",
            "upload_time_iso_8601": "2025-08-05T00:28:53.793737Z",
            "url": "https://files.pythonhosted.org/packages/9a/2a/a79d736199db9eb4afed367e3132f0ca8e87b238084a9262a8b19ec73935/biochemical_data_connectors-3.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5ae459e91a717041deb544cdeb8b3cfcecee31a938ea3978300c45013af28ea8",
                "md5": "7049bd37b79a2b1dc3239267815d90ab",
                "sha256": "dcc588bc2ad3fbec1d0c7d836d971b5fbd464b5b49e354ada73a761da0933a68"
            },
            "downloads": -1,
            "filename": "biochemical_data_connectors-3.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "7049bd37b79a2b1dc3239267815d90ab",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 27627,
            "upload_time": "2025-08-05T00:28:55",
            "upload_time_iso_8601": "2025-08-05T00:28:55.511380Z",
            "url": "https://files.pythonhosted.org/packages/5a/e4/59e91a717041deb544cdeb8b3cfcecee31a938ea3978300c45013af28ea8/biochemical_data_connectors-3.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-05 00:28:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "your-username",
    "github_project": "biochemical-data-connectors",
    "github_not_found": true,
    "lcname": "biochemical-data-connectors"
}
        