# scikit-fingerprints
---
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/logos/skfp-logo-horizontal-text-white.png">
<source media="(prefers-color-scheme: light)" srcset="docs/logos/skfp-logo-horizontal-text-black.png">
<img alt="change image for different github color schemes" src="docs/logos/skfp-logo-no-text.png">
</picture>
---
[scikit-fingerprints](https://scikit-fingerprints.readthedocs.io/latest/) is a Python library for efficient
computation of molecular fingerprints.
## Table of Contents
- [Description](#description)
- [Supported platforms](#supported-platforms)
- [Installation](#installation)
- [Quickstart](#quickstart)
- [Examples](#examples)
- [Project overview](#project-overview)
- [Citing](#citing)
- [Publications and usage](#publications-and-usage)
- [Contributing](#contributing)
- [License](#license)
---
## Description
Molecular fingerprints are crucial in various scientific fields, including drug discovery, materials science, and
chemical analysis. However, existing Python libraries for computing molecular fingerprints often lack performance,
user-friendliness, and support for modern programming standards. This project aims to address these shortcomings by
creating an efficient and accessible Python library for molecular fingerprint computation.
See [the documentation and API reference](https://scikit-fingerprints.readthedocs.io/latest/) for details.
Main features:
- scikit-learn compatible
- feature-rich, with >30 fingerprints
- parallelization
- sparse matrix support
- commercial-friendly MIT license
## Supported platforms
| | `python3.10` | `python3.11` | `python3.12` | `python3.13` |
|:-----------:|:------------:|:------------:|:------------:|:------------:|
| **Linux** | ✅ | ✅ | ✅ | ✅ |
| **Windows** | ✅ | ✅ | ✅ | ✅ |
| **macOS** | ✅ | ✅ | ✅ | ✅ |
Python 3.9 was supported up to scikit-fingerprints 1.13.0.
## Installation
You can install the library using pip:
```bash
pip install scikit-fingerprints
```
If you need bleeding-edge features and don't mind potentially unstable or undocumented functionalities,
you can also install directly from GitHub:
```bash
pip install git+https://github.com/scikit-fingerprints/scikit-fingerprints.git
```
## Quickstart
Most fingerprints are based on molecular graphs (topological, 2D-based), and you can use SMILES
input directly:
```python
from skfp.fingerprints import AtomPairFingerprint
smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O"]
atom_pair_fingerprint = AtomPairFingerprint()
X = atom_pair_fingerprint.transform(smiles_list)
print(X)
```
For fingerprints using conformers (conformational, 3D-based), you need to create molecules first
and compute conformers. Those fingerprints have the `requires_conformers` attribute set
to `True`.
```python
from skfp.preprocessing import ConformerGenerator, MolFromSmilesTransformer
from skfp.fingerprints import WHIMFingerprint
smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O"]
mol_from_smiles = MolFromSmilesTransformer()
conf_gen = ConformerGenerator()
fp = WHIMFingerprint()
print(fp.requires_conformers) # True
mols_list = mol_from_smiles.transform(smiles_list)
mols_list = conf_gen.transform(mols_list)
X = fp.transform(mols_list)
print(X)
```
You can also use scikit-learn tools such as pipelines and feature unions
to build complex workflows. Popular datasets, e.g. from the MoleculeNet benchmark,
can be loaded directly.
```python
from skfp.datasets.moleculenet import load_clintox
from skfp.metrics import multioutput_auroc_score, extract_pos_proba
from skfp.model_selection import scaffold_train_test_split
from skfp.fingerprints import ECFPFingerprint, MACCSFingerprint
from skfp.preprocessing import MolFromSmilesTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline, make_union

smiles, y = load_clintox()
smiles_train, smiles_test, y_train, y_test = scaffold_train_test_split(
    smiles, y, test_size=0.2
)

pipeline = make_pipeline(
    MolFromSmilesTransformer(),
    make_union(ECFPFingerprint(count=True), MACCSFingerprint()),
    RandomForestClassifier(random_state=0),
)
pipeline.fit(smiles_train, y_train)

y_pred_proba = pipeline.predict_proba(smiles_test)
y_pred_proba = extract_pos_proba(y_pred_proba)
auroc = multioutput_auroc_score(y_test, y_pred_proba)
print(f"AUROC: {auroc:.2%}")
```
## Examples
You can find Jupyter Notebooks with examples and tutorials [in the documentation](https://scikit-fingerprints.readthedocs.io/latest/examples.html),
as well as in the ["examples" directory](https://github.com/scikit-fingerprints/scikit-fingerprints/tree/master/examples).
Examples and tutorials:
1. [Introduction to scikit-fingerprints](examples/01_skfp_introduction.ipynb)
2. [Fingerprint types](examples/02_fingerprint_types.ipynb)
3. [Molecular pipelines](examples/03_pipelines.ipynb)
4. [Conformers and conformational fingerprints](examples/04_conformers.ipynb)
5. [Hyperparameter tuning](examples/05_hyperparameter_tuning.ipynb)
6. [Dataset splits](examples/06_dataset_splits.ipynb)
7. [Datasets and benchmarking](examples/07_datasets_and_benchmarking.ipynb)
8. [Similarity and distance metrics](examples/08_similarity_and_distance_metrics.ipynb)
9. [Molecular filters](examples/09_molecular_filters.ipynb)
## Project overview
`scikit-fingerprints` brings molecular fingerprints and related functionalities into
the scikit-learn ecosystem. With a familiar class-based design and the `.transform()` method,
fingerprints can be computed from SMILES strings or RDKit `Mol` objects. The resulting NumPy
arrays or SciPy sparse arrays can be used directly in ML pipelines.
Main features:
1. **Scikit-learn compatible:** `scikit-fingerprints` uses the familiar scikit-learn
   interface and conforms to its API requirements. You can include molecular
   fingerprints in pipelines, concatenate them with feature unions, and process them with
   ML algorithms.
2. **Performance optimization:** both speed and memory usage are optimized by
   utilizing parallelism (with Joblib) and sparse CSR matrices (with SciPy). Heavy
   computation is typically delegated to the C++ code of RDKit.
3. **Feature-rich:** in addition to computing fingerprints, you can load popular
   benchmark datasets (e.g. from MoleculeNet), perform splitting (e.g. scaffold
   split), generate conformers, and tune hyperparameters with optimized cross-validation.
4. **Well-documented:** each public function and class has extensive documentation,
   including relevant implementation details, caveats, and literature references.
5. **Extensibility:** any functionality can be easily modified or extended by
   inheriting from existing classes.
6. **High code quality:** pre-commit hooks scan each commit for code quality (e.g. `black`,
   `flake8`), typing (`mypy`), and security (e.g. `bandit`, `pip-audit`). The CI/CD process with
   GitHub Actions also includes over 250 unit and integration tests.
## Citing
If you use scikit-fingerprints in your work, please cite our main publication,
[available on SoftwareX (open access)](https://www.sciencedirect.com/science/article/pii/S2352711024003145):
```
@article{scikit_fingerprints,
title = {Scikit-fingerprints: Easy and efficient computation of molecular fingerprints in Python},
author = {Jakub Adamczyk and Piotr Ludynia},
journal = {SoftwareX},
volume = {28},
pages = {101944},
year = {2024},
issn = {2352-7110},
doi = {https://doi.org/10.1016/j.softx.2024.101944},
url = {https://www.sciencedirect.com/science/article/pii/S2352711024003145},
keywords = {Molecular fingerprints, Chemoinformatics, Molecular property prediction, Python, Machine learning, Scikit-learn},
}
```
Its preprint is also [available on arXiv](https://arxiv.org/abs/2407.13291).
## Publications and usage
Publications using scikit-fingerprints:
1. [J. Adamczyk, W. Czech "Molecular Topological Profile (MOLTOP) - Simple and Strong Baseline for Molecular Graph Classification" ECAI 2024](https://ebooks.iospress.nl/doi/10.3233/FAIA240663)
2. [J. Adamczyk, P. Ludynia "Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python" SoftwareX](https://www.sciencedirect.com/science/article/pii/S2352711024003145)
3. [J. Adamczyk, P. Ludynia, W. Czech "Molecular Fingerprints Are Strong Models for Peptide Function Prediction" ArXiv preprint](https://arxiv.org/abs/2501.17901)
4. [M. Fitzner et al. "BayBE: a Bayesian Back End for experimental planning in the low-to-no-data regime" RSC Digital Discovery](https://pubs.rsc.org/en/content/articlehtml/2025/dd/d5dd00050e)
5. [J. Xiong "Bridging 3D Molecular Structures and Artificial Intelligence by a Conformation Description Language"](https://www.biorxiv.org/content/10.1101/2025.05.07.652440v1.abstract)
## Contributing
Please read [CONTRIBUTING.md](CONTRIBUTING.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for details on our code of
conduct, and the process for submitting pull requests to us.
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.