# scikit-fingerprints

- **Version:** 1.12.0
- **Summary:** Library for effective molecular fingerprints calculation
- **Home page:** https://github.com/scikit-fingerprints/scikit-fingerprints
- **Uploaded:** 2024-12-03 12:15:33
- **Author:** Scikit-Fingerprints Development Team
- **Requires Python:** >=3.9, <3.13
- **License:** MIT
- **Keywords:** molecular fingerprints, molecular descriptors, cheminformatics

[![PyPI version](https://badge.fury.io/py/scikit-fingerprints.svg)](https://badge.fury.io/py/scikit-fingerprints)
[![](https://img.shields.io/pypi/dm/scikit-fingerprints)](https://pypi.org/project/scikit-fingerprints/)
[![Downloads](https://static.pepy.tech/badge/scikit-fingerprints)](https://pepy.tech/project/scikit-fingerprints)
![Libraries.io dependency status for latest release](https://img.shields.io/librariesio/release/pypi/scikit-fingerprints)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE.md)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/scikit-fingerprints.svg)](https://pypi.org/project/scikit-fingerprints/)
[![Contributors](https://img.shields.io/github/contributors/scikit-fingerprints/scikit-fingerprints)](https://github.com/scikit-fingerprints/scikit-fingerprints/graphs/contributors)
[![check](https://github.com/scikit-fingerprints/scikit-fingerprints/actions/workflows/python-test.yml/badge.svg)](https://github.com/scikit-fingerprints/scikit-fingerprints/actions/workflows/python-test.yml)

[scikit-fingerprints](https://scikit-fingerprints.github.io/scikit-fingerprints/) is a Python library for efficient
computation of molecular fingerprints.

## Table of Contents

- [Description](#description)
- [Supported platforms](#supported-platforms)
- [Installation](#installation)
- [Quickstart](#quickstart)
- [Project overview](#project-overview)
- [Contributing](#contributing)
- [Citing](#citing)
- [License](#license)

---

## Description

Molecular fingerprints are crucial in various scientific fields, including drug discovery, materials science, and
chemical analysis. However, existing Python libraries for computing molecular fingerprints often lack performance,
user-friendliness, and support for modern programming standards. This project aims to address these shortcomings by
creating an efficient and accessible Python library for molecular fingerprint computation.

You can find the documentation [here](https://scikit-fingerprints.github.io/scikit-fingerprints/).

Main features:
- scikit-learn compatible
- feature-rich, with >30 fingerprints
- parallelization
- sparse matrix support
- commercial-friendly MIT license

## Supported platforms

|                      | `python3.9`            | `python3.10` | `python3.11` | `python3.12` |
|----------------------|------------------------|--------------|--------------|--------------|
| **Ubuntu - latest**  | ✅                      | ✅            | ✅            | ✅            |
| **Windows - latest** | ✅                      | ✅            | ✅            | ✅            |
| **macOS - latest**   | only macOS 13 or newer | ✅            | ✅            | ✅            |

## Installation

You can install the library using pip:

```bash
pip install scikit-fingerprints
```

If you need bleeding-edge features and don't mind potentially unstable or undocumented functionality,
you can also install directly from GitHub:
```bash
pip install git+https://github.com/scikit-fingerprints/scikit-fingerprints.git
```

## Quickstart

Most fingerprints are based on molecular graphs (2D-based), and you can use SMILES
input directly:
```python
from skfp.fingerprints import AtomPairFingerprint

smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O"]

atom_pair_fingerprint = AtomPairFingerprint()

X = atom_pair_fingerprint.transform(smiles_list)
print(X)
```

For fingerprints that use conformers (3D-based), you need to create molecules first
and compute their conformers. Such fingerprints have the `requires_conformers` attribute
set to `True`.
```python
from skfp.preprocessing import ConformerGenerator, MolFromSmilesTransformer
from skfp.fingerprints import WHIMFingerprint

smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O"]

mol_from_smiles = MolFromSmilesTransformer()
conf_gen = ConformerGenerator()
fp = WHIMFingerprint()
print(fp.requires_conformers)  # True

mols_list = mol_from_smiles.transform(smiles_list)
mols_list = conf_gen.transform(mols_list)

X = fp.transform(mols_list)
print(X)
```

You can also use scikit-learn functionality, such as pipelines and feature unions,
to build complex workflows. Popular datasets, e.g. from the MoleculeNet benchmark,
can be loaded directly.
```python
from skfp.datasets.moleculenet import load_clintox
from skfp.metrics import multioutput_auroc_score, extract_pos_proba
from skfp.model_selection import scaffold_train_test_split
from skfp.fingerprints import ECFPFingerprint, MACCSFingerprint
from skfp.preprocessing import MolFromSmilesTransformer

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline, make_union


smiles, y = load_clintox()
smiles_train, smiles_test, y_train, y_test = scaffold_train_test_split(
    smiles, y, test_size=0.2
)

pipeline = make_pipeline(
    MolFromSmilesTransformer(),
    make_union(ECFPFingerprint(count=True), MACCSFingerprint()),
    RandomForestClassifier(random_state=0),
)
pipeline.fit(smiles_train, y_train)

y_pred_proba = pipeline.predict_proba(smiles_test)
y_pred_proba = extract_pos_proba(y_pred_proba)
auroc = multioutput_auroc_score(y_test, y_pred_proba)
print(f"AUROC: {auroc:.2%}")
```
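
The scaffold split used above keeps molecules that share a core structure in the
same subset, so the test set contains scaffolds the model has not seen. The sketch
below illustrates one common variant of the idea in plain Python. It is not
necessarily how `scaffold_train_test_split` is implemented, and the scaffold labels
and the `scaffold_split_sketch` function are hypothetical: in practice scaffolds
would be computed from the molecules, e.g. with RDKit.
```python
from collections import defaultdict


def scaffold_split_sketch(scaffolds, test_size=0.2):
    """Split sample indices so that no scaffold appears in both subsets."""
    # Group sample indices by their scaffold key.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first index to stay deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n_train_target = len(scaffolds) - round(test_size * len(scaffolds))
    train_idx, test_idx = [], []
    for group in ordered:
        # Whole groups are assigned at once, so a scaffold never straddles the split.
        if len(train_idx) + len(group) <= n_train_target:
            train_idx.extend(group)
        else:
            test_idx.extend(group)
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, precomputed scaffold labels for 10 molecules (illustrative only).
scaffolds = [
    "benzene", "benzene", "benzene", "pyridine", "pyridine",
    "indole", "furan", "benzene", "pyridine", "thiophene",
]
train_idx, test_idx = scaffold_split_sketch(scaffolds, test_size=0.2)
print(train_idx)
print(test_idx)
```

In this variant the rarest scaffolds end up in the test set, which usually makes
the evaluation harder, and more realistic, than a random split.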

## Project overview

`scikit-fingerprints` brings molecular fingerprints and related functionality into
the scikit-learn ecosystem. With a familiar class-based design and the `.transform()` method,
fingerprints can be computed from SMILES strings or RDKit `Mol` objects. The resulting NumPy
arrays or SciPy sparse arrays can be used directly in ML pipelines.

Main features:

1. **Scikit-learn compatible:** `scikit-fingerprints` uses the familiar scikit-learn
   interface and conforms to its API requirements. You can include molecular
   fingerprints in pipelines, concatenate them with feature unions, and process them
   with ML algorithms.

2. **Performance optimization:** both speed and memory usage are optimized by
   using parallelism (with Joblib) and sparse CSR matrices (with SciPy). Heavy
   computation is typically delegated to the C++ code of RDKit.

3. **Feature-rich:** in addition to computing fingerprints, you can load popular
   benchmark datasets (e.g. from MoleculeNet), perform splitting (e.g. scaffold
   split), generate conformers, and tune hyperparameters with optimized cross-validation.

4. **Well-documented:** each public function and class has extensive documentation,
   including relevant implementation details, caveats, and literature references.

5. **Extensibility:** any functionality can be easily modified or extended by
   inheriting from existing classes.

6. **High code quality:** pre-commit hooks scan each commit for code quality (e.g. `black`,
   `flake8`), typing (`mypy`), and security (e.g. `bandit`, `safety`). The CI/CD process with
   GitHub Actions also includes over 250 unit and integration tests.
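
To illustrate why the sparse CSR matrices mentioned in point 2 save memory, here is
a minimal pure-Python sketch of the CSR layout. The helper names `to_csr` and
`csr_row` are hypothetical and the toy fingerprints are made up; the library itself
relies on SciPy for this, not on code like the following.
```python
def to_csr(dense_rows):
    """Convert a list of equal-length rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row by row
                indices.append(col)   # column of each stored value
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def csr_row(data, indices, indptr, i, n_cols):
    """Rebuild dense row `i` from the CSR arrays."""
    row = [0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


# Toy "fingerprints": 3 molecules, 8 bits each, mostly zeros.
fingerprints = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
]
data, indices, indptr = to_csr(fingerprints)
print(data, indices, indptr)
```

Only the four non-zero entries are stored, plus one pointer per row. Since
fingerprints with thousands of bits typically set only a small fraction of them,
this layout can be far smaller than the dense array.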

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for details on our code of
conduct and the process for submitting pull requests.

## Citing

If you use scikit-fingerprints in your work, please cite our main publication,
[available in SoftwareX (open access)](https://doi.org/10.1016/j.softx.2024.101944):
```bibtex
@article{scikit_fingerprints,
   title = {Scikit-fingerprints: Easy and efficient computation of molecular fingerprints in Python},
   author = {Jakub Adamczyk and Piotr Ludynia},
   journal = {SoftwareX},
   volume = {28},
   pages = {101944},
   year = {2024},
   issn = {2352-7110},
   doi = {https://doi.org/10.1016/j.softx.2024.101944},
   url = {https://www.sciencedirect.com/science/article/pii/S2352711024003145},
   keywords = {Molecular fingerprints, Chemoinformatics, Molecular property prediction, Python, Machine learning, Scikit-learn},
}
```

Its preprint is also [available on arXiv](https://arxiv.org/abs/2407.13291).

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

            
