scikit-mol

- Name: scikit-mol
- Version: 0.3.1
- Home page: https://github.com/EBjerrum/scikit-mol
- Summary: scikit-learn classes for molecule transformation
- Upload time: 2024-04-26 09:54:46
- Author: Esben Jannik Bjerrum
- Requires Python: >=3.7
- License: LGPL-3.0
- Requirements: no requirements were recorded

# scikit-mol

![Fancy logo](./ressources/logo/ScikitMol_Logo_DarkBG_300px.png#gh-dark-mode-only)
![Fancy logo](./ressources/logo/ScikitMol_Logo_LightBG_300px.png#gh-light-mode-only)

## Scikit-Learn classes for molecular vectorization using RDKit

The intended use is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the `mol_list_train` and `mol_list_test` lists:

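    from rdkit import Chem
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline
    from scikit_mol.fingerprints import MorganFingerprintTransformer

    # mol_list_train/mol_list_test hold RDKit Mol objects; y_train/y_test hold the targets
    pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
    pipe.fit(mol_list_train, y_train)
    pipe.score(mol_list_test, y_test)
    pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])
    # array([4.93858815])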
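
To make the data flow concrete, here is a dependency-free sketch of what a pipeline does with a transformer step: `fit` calls `fit_transform` on each transformer and hands the resulting features to the final estimator, and `predict` reuses the already fitted transformers. The classes `ToySmilesFeaturizer`, `NearestNeighbourRegressor` and `ToyPipeline` are hypothetical stand-ins for `MorganFingerprintTransformer`, `Ridge` and scikit-learn's `Pipeline`; the featurizer only counts a few SMILES characters and is not a real molecular descriptor.

```python
from typing import List, Sequence


class ToySmilesFeaturizer:
    """Hypothetical stand-in for a scikit-mol transformer: SMILES -> fixed-length vector."""

    def __init__(self, symbols: str = "cCON"):
        self.symbols = symbols

    def fit(self, X, y=None):
        # Stateless like a fingerprint transformer: there is nothing to learn.
        return self

    def transform(self, X: Sequence[str]) -> List[List[int]]:
        # One count per tracked character; a toy feature, not a real descriptor.
        return [[smi.count(s) for s in self.symbols] for smi in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class NearestNeighbourRegressor:
    """Hypothetical stand-in for Ridge: predicts the target of the closest training vector."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(zip(self.X_, self.y_), key=lambda xy: dist(xy[0], x))[1] for x in X]


class ToyPipeline:
    """Minimal imitation of the fit/predict data flow of sklearn.pipeline.Pipeline."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            X = step.fit_transform(X, y)  # features flow into the next step
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)  # reuse the already fitted transformers
        return self.steps[-1][1].predict(X)


pipe = ToyPipeline([("mol_transformer", ToySmilesFeaturizer()), ("Regressor", NearestNeighbourRegressor())])
pipe.fit(["CCO", "c1ccccc1", "CC(=O)O"], [1.0, 2.0, 3.0])
print(pipe.predict(["c1ccccc1C(=O)C"]))  # [2.0]
```

Because the featurizer lives inside the pipeline, callers only ever pass raw inputs to `fit` and `predict`; the real scikit-mol pipeline follows the same pattern with RDKit molecules and fingerprint vectors.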

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
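
scikit-learn addresses the parameters of pipeline steps as `<step name>__<parameter name>`, so with the step names `mol_transformer` and `Regressor` used above, a single search space can mix fingerprint and model settings. The standard-library sketch below expands such a grid into every candidate combination, which is what scikit-learn's `ParameterGrid` (used internally by `GridSearchCV`) does. It assumes `MorganFingerprintTransformer` exposes a `radius` parameter; the listed values are illustrative, not recommended settings.

```python
from itertools import product
from typing import Any, Dict, List

# Keys follow scikit-learn's "<step name>__<parameter>" convention.
# The values below are illustrative assumptions only.
param_grid: Dict[str, List[Any]] = {
    "mol_transformer__radius": [1, 2, 3],
    "Regressor__alpha": [0.1, 1.0],
}


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return every combination of values, one dict per candidate setting."""
    keys = sorted(grid)  # fixed key order for reproducible output
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
print(len(candidates))  # 6
print(candidates[0])  # {'Regressor__alpha': 0.1, 'mol_transformer__radius': 1}

# Splitting a key shows which step and which parameter a search would set.
step, param = "mol_transformer__radius".split("__", 1)
print(step, param)  # mol_transformer radius
```

With scikit-learn installed, the same `param_grid` dict can be passed directly as `GridSearchCV(pipe, param_grid)`, so the fingerprint radius is tuned together with the regularization strength.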

The first draft of the project was created at the [RDKit UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) on 14 October 2022.

## Implemented

- Descriptors
  - MolecularDescriptorTransformer

<br>

- Fingerprints
  - MorganFingerprintTransformer
  - MACCSKeysFingerprintTransformer
  - RDKitFingerprintTransformer
  - AtomPairFingerprintTransformer
  - TopologicalTorsionFingerprintTransformer
  - MHFingerprintTransformer
  - SECFingerprintTransformer
  - AvalonFingerprintTransformer

<br>

- Conversions
  - SmilesToMol

<br>

- Standardizer
  - Standardizer

<br>

- Utilities
  - CheckSmilesSanitazion
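
The fingerprint transformers encode every molecule as a vector of the same length, which is what lets any scikit-learn estimator consume their output. Hashed fingerprints such as Morgan fingerprints reach a fixed length by hashing substructure identifiers and folding them into a fixed number of bits. The dependency-free sketch below shows only that hashing-and-folding step; `toy_folded_fingerprint` is hypothetical and uses SMILES character n-grams as stand-ins for the atom environments RDKit actually enumerates, so its bits carry no chemical meaning.

```python
import zlib
from typing import List


def toy_folded_fingerprint(smiles: str, n_bits: int = 64, n: int = 3) -> List[int]:
    """Hash character n-grams of a SMILES string into a fixed-length bit vector.

    Toy illustration of hashing and folding only: real Morgan fingerprints hash
    circular atom environments computed by RDKit, not SMILES substrings.
    """
    bits = [0] * n_bits
    grams = {smiles[i : i + n] for i in range(max(len(smiles) - n + 1, 1))}
    for gram in grams:
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        bits[zlib.crc32(gram.encode()) % n_bits] = 1  # fold into n_bits positions
    return bits


small = toy_folded_fingerprint("CCO")
large = toy_folded_fingerprint("c1ccccc1C(=O)CCCCCCCCN")
print(len(small), len(large))  # 64 64
print(sum(small))  # 1 (the only 3-gram is "CCO")
```

Small and large inputs map to vectors of identical length; the price of folding is that distinct features can collide on the same bit.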

## Installation

Install the latest tagged release from PyPI with pip:

    pip install scikit-mol

Or install the bleeding-edge version directly from GitHub:

    pip install git+https://github.com/EBjerrum/scikit-mol.git

## Documentation

The notebooks directory contains a collection of notebooks that demonstrate different aspects and use cases:

- [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/01_basic_usage.ipynb)
- [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/02_descriptor_transformer.ipynb)
- [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/03_example_pipeline.ipynb)
- [Molecular standardization](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/04_standardizer.ipynb)
- [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb)
- [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb)
- [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb)
- [Using skopt for hyperparameter tuning](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/08_external_library_skopt.ipynb)
- [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
- [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)

A software note on scikit-mol is also available on ChemRxiv: [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)

## Contributing

More information about how to contribute to the project is in [CONTRIBUTION.md](https://github.com/EBjerrum/scikit-mol/blob/main/CONTRIBUTION.md).

## Bugs

There are probably still bugs. Please check the issues on GitHub and report any new ones there.

## Contributors

- Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), esbenbjerrum+scikit_mol@gmail.com
- Carmen Esposito [@cespos](https://github.com/cespos)
- Son Ha, sonha@uni-mainz.de
- Oh-hyeon Choung, ohhyeon.choung@gmail.com
- Andreas Poehlmann, [@ap--](https://github.com/ap--)
- Ya Chen, [@anya-chen](https://github.com/anya-chen)
- Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz)
- Adrien Chaton [@adrienchaton](https://github.com/adrienchaton)
- [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
- [@RiesBen](https://github.com/RiesBen)
- [@enricogandini](https://github.com/enricogandini)
