<div align="center">
<img src="https://raw.githubusercontent.com/HzaCode/ChemInformant/main/images/logo.png" width="200px" />
# ChemInformant
*A Robust Data Acquisition Engine for the Modern Scientific Workflow*
<br>
<p>
<a href="https://doi.org/10.21105/joss.08341">
<img src="https://joss.theoj.org/papers/10.21105/joss.08341/status.svg" alt="DOI">
</a>
<a href="https://pypi.org/project/ChemInformant/">
<img src="https://img.shields.io/pypi/v/ChemInformant.svg" alt="PyPI version">
</a>
<a href="https://pypi.org/project/ChemInformant/">
<img src="https://img.shields.io/badge/python-%3E%3D3.8-blue.svg" alt="Python Version">
</a>
<a href="https://github.com/HzaCode/ChemInformant/blob/main/LICENSE.md">
<img src="https://img.shields.io/pypi/l/ChemInformant.svg" alt="License">
</a>
</p>
<p>
<a href="https://github.com/HzaCode/ChemInformant/actions/workflows/tests.yml">
<img src="https://img.shields.io/github/actions/workflow/status/HzaCode/ChemInformant/tests.yml?label=Build" alt="Build Status">
</a>
<a href="https://cdn.jsdelivr.net/gh/HzaCode/ChemInformant@gh-pages/coverage.svg">
<img src="https://cdn.jsdelivr.net/gh/HzaCode/ChemInformant@gh-pages/coverage.svg" alt="coverage">
</a>
<a href="https://github.com/astral-sh/ruff">
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff">
</a>
<a href="https://app.codacy.com/gh/HzaCode/ChemInformant/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade">
<img src="https://app.codacy.com/project/badge/Grade/ba35e3e2f5224858bcaeb8f9c4ee2838" alt="Codacy Badge">
</a>
</p>
</div>
---
**ChemInformant** is a robust data acquisition engine for the [PubChem](https://pubchem.ncbi.nlm.nih.gov/) database, engineered for the modern scientific workflow. It intelligently manages network requests, performs rigorous runtime data validation, and delivers analysis-ready results, providing a dependable foundation for any computational chemistry project in Python.
---
### ✨ Key Features
* **Analysis-Ready Pandas/SQL Output:** The core API (`get_properties`) returns a clean Pandas DataFrame, and `df_to_sql` writes that DataFrame straight to a SQL database. This removes data-wrangling boilerplate and lets results flow directly into the Python data science ecosystem and database workflows.
* **Automated Network Reliability:** Keeps workflows running through transient network failures with built-in persistent caching, rate limiting, and automatic retries. It also transparently handles API pagination (`ListKey`) for large-scale queries, delivering complete result sets without manual intervention.
* **Flexible & Fault-Tolerant Input:** Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.
* **A Dual API for Simplicity and Power:** Offers a clear `get_<property>()` convenience layer for quick lookups, backed by a powerful `get_properties` engine for high-performance batch operations.
* **Guaranteed Data Integrity:** Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
* **Terminal-Ready CLI Tools:** Includes `chemfetch` and `chemdraw` for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.
* **Modern and Actively Maintained:** Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
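
To make the "automatic retries" feature above more concrete, here is a minimal sketch of retrying with exponential backoff and jitter. It is an illustration only, not ChemInformant's actual implementation: the helper name `with_retries`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions made for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run `call`, retrying on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time (0.5s, 1s, 2s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


# Simulated flaky request: fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly.
result = with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["n"])
```

When you use ChemInformant itself you should not need anything like this; the sketch only shows the general pattern that such a retry layer follows.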
---
### 📦 Installation
Install the library from PyPI:
```bash
pip install ChemInformant
```
To include plotting capabilities for use with the tutorial, install the `[plot]` extra:
```bash
pip install "ChemInformant[plot]"
```
---
### 🚀 Quick Start
Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:
```python
import ChemInformant as ci
# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983] # 1983 is paracetamol's CID
# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]
# 3. Call the core function
df = ci.get_properties(identifiers, properties)
# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")
# 5. Analyze your results!
print(df)
```
**Output:**
```
  input_identifier   cid  status  molecular_weight  xlogp       cas
0          aspirin  2244      OK            180.16    1.2   50-78-2
1         caffeine  2519      OK            194.19   -0.1   58-08-2
2             1983  1983      OK            151.16    0.5  103-90-2
```
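
Step 4 of the Quick Start writes the DataFrame to a SQLite file, so the results can later be queried with plain SQL. The sketch below shows one way to do that with Python's standard `sqlite3` module. To stay runnable offline it builds an in-memory stand-in for `chem_data.db` from the rows shown above; the assumption that the `results` table has the same column names as the DataFrame is ours, not something the library documentation is quoted on here.

```python
import sqlite3

# Stand-in for chem_data.db: an in-memory database holding the rows from the
# output above. Table and column names are assumed to mirror the DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Successfully resolved compounds with a positive XLogP, heaviest first.
query = (
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE status = 'OK' AND xlogp > 0 ORDER BY molecular_weight DESC"
)
positive_xlogp = conn.execute(query).fetchall()
print(positive_xlogp)
conn.close()
```

Against the real file, replace `":memory:"` with `"chem_data.db"` and skip the `CREATE TABLE` and `INSERT` statements.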
<details>
<summary><b>➡️ Click to see Convenience API Cheatsheet</b></summary>
<br>
| Function | Description |
| -------------------------- | ------------------------------------------------------------- |
| `get_weight(id)` | Molecular weight *(float)* |
| `get_formula(id)` | Molecular formula *(str)* |
| `get_cas(id)` | CAS Registry Number *(str)* |
| `get_iupac_name(id)` | IUPAC name *(str)* |
| `get_canonical_smiles(id)` | Canonical SMILES with Canonical→Connectivity fallback *(str)* |
| `get_isomeric_smiles(id)` | Isomeric SMILES with Isomeric→SMILES fallback *(str)* |
| `get_xlogp(id)` | XLogP (calculated hydrophobicity) *(float)* |
| `get_synonyms(id)` | List of synonyms *(List\[str])* |
| `get_compound(id)` | Full, validated **`Compound`** object (Pydantic v2 model) |
*Note: This table shows key convenience functions for demonstration. ChemInformant provides **22 convenience functions** in total, covering molecular descriptors, mass properties, stereochemistry, and more.*
*All functions accept a **CID, name, or SMILES** and return `None`/`[]` on failure.*
</details>
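
Because the convenience functions return `None` or `[]` on failure rather than raising, a loop over many identifiers needs to check each result. The following sketch shows one possible pattern. The helper `partition_lookups` is hypothetical and not part of ChemInformant, and `fake_get_weight` is an offline stand-in used so the example runs without network access.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def partition_lookups(
    identifiers: Iterable[Any],
    getter: Callable[[Any], Any],
) -> Tuple[Dict[Any, Any], List[Any]]:
    """Hypothetical helper: split identifiers into resolved and failed."""
    found: Dict[Any, Any] = {}
    missing: List[Any] = []
    for identifier in identifiers:
        value = getter(identifier)
        # None (scalar getters) and [] (list getters) both signal failure
        if value is None or value == []:
            missing.append(identifier)
        else:
            found[identifier] = value
    return found, missing


# Offline stand-in for a convenience function such as ci.get_weight.
_fake_weights = {"aspirin": 180.16, "caffeine": 194.19}


def fake_get_weight(identifier: Any) -> Any:
    return _fake_weights.get(identifier)


found, missing = partition_lookups(
    ["aspirin", "not-a-compound", "caffeine"], fake_get_weight
)
print(found)
print(missing)
```

With the library installed, passing `ci.get_weight` (or any other convenience function from the table) as `getter` should work the same way. For more than a handful of identifiers, the batch `get_properties` call with its `status` column is the better fit.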
ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:
* **`chemfetch`**: Fetches properties for one or more compounds.
```bash
chemfetch aspirin --props "cas,molecular_weight,iupac_name"
```
* **`chemdraw`**: Renders the 2D structure of a compound.
```bash
chemdraw aspirin
```
<p align="center">
<img src="https://raw.githubusercontent.com/HzaCode/ChemInformant/main/wide-cli-demo.gif" width="100%">
</p>
---
### 📚 Documentation & Examples
For a deep dive, please see our detailed guides:
* **➡️ Online Documentation:** The **[official documentation site](https://hezhiang.com/ChemInformant)** contains complete API references, guides, and usage examples. **This is the most comprehensive resource.**
* **➡️ Interactive User Manual:** Our [**Jupyter Notebook Tutorial**](examples/ChemInformant_User_Manual_v1.0.ipynb) provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
* **➡️ Performance Benchmarks:** You can review and run our [**Benchmark Script**](./benchmark.py) to see the performance advantages of batching and caching.
---
### 🤔 Why ChemInformant?
> ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. By delivering clean, validated, and analysis-ready Pandas DataFrames, it enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.
A detailed comparison with existing tools is provided in our [JOSS paper](https://github.com/HzaCode/ChemInformant/blob/main/paper/paper.md).
### 🤝 Contributing
Contributions are welcome! For guidelines on how to get started, please read our [contributing guide](https://github.com/HzaCode/ChemInformant/blob/main/CONTRIBUTING.md). You can [open an issue](https://github.com/HzaCode/ChemInformant/issues) to report bugs or suggest features, or [submit a pull request](https://github.com/HzaCode/ChemInformant/pulls) to contribute code.
### 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.
### 📑 Citation
```bibtex
@article{He2025,
doi = {10.21105/joss.08341},
url = {https://doi.org/10.21105/joss.08341},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {112},
pages = {8341},
author = {He, Zhiang},
title = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
journal = {Journal of Open Source Software}
}
```