Name | scikit-mol |
Version | 0.5.2 |
home_page | None |
Summary | scikit-learn classes for molecule transformation |
upload_time | 2025-02-19 19:36:09 |
maintainer | None |
docs_url | None |
author | None |
requires_python | <3.14,>=3.9 |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# scikit-mol
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/EBjerrum/scikit-mol/blob/30c74b3648c0087bdb1b659bc67ba757d7498e9a/ressources/logo/ScikitMol_Logo_DarkBG_300px.png?raw=true">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/EBjerrum/scikit-mol/blob/30c74b3648c0087bdb1b659bc67ba757d7498e9a/ressources/logo/ScikitMol_Logo_LightBG_300px.png?raw=true">
<img src="https://github.com/EBjerrum/scikit-mol/blob/30c74b3648c0087bdb1b659bc67ba757d7498e9a/ressources/logo/ScikitMol_Logo_LightBG_300px.png?raw=true" alt="Fancy logo">
</picture>
## Scikit-Learn classes for molecular vectorization using RDKit
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the `mol_list_train` and `mol_list_test` lists:

    pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
    pipe.fit(mol_list_train, y_train)
    pipe.score(mol_list_test, y_test)
    pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

    >>> array([4.93858815])
The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
The first draft of the project was created at the [RDKit UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) on 14 October 2022.
## Installation
Install the latest tagged release from PyPI with pip:
```sh
pip install scikit-mol
```
or from conda-forge
```sh
conda install -c conda-forge scikit-mol
```
The conda-forge package should be updated shortly after a new tagged release on PyPI.
To install the bleeding-edge version directly from GitHub:
```sh
pip install git+https://github.com/EBjerrum/scikit-mol.git
```
## Documentation
Example notebooks and API documentation are now hosted on [https://scikit-mol.readthedocs.io](https://scikit-mol.readthedocs.io/en/latest/)
- [Basic Usage and fingerprint transformers](https://scikit-mol.readthedocs.io/en/latest/notebooks/01_basic_usage/)
- [Descriptor transformer](https://scikit-mol.readthedocs.io/en/latest/notebooks/02_descriptor_transformer/)
- [Pipelining with Scikit-Learn classes](https://scikit-mol.readthedocs.io/en/latest/notebooks/03_example_pipeline/)
- [Molecular standardization](https://scikit-mol.readthedocs.io/en/latest/notebooks/04_standardizer/)
- [Sanitizing SMILES input](https://scikit-mol.readthedocs.io/en/latest/notebooks/05_smiles_sanitaztion/)
- [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://scikit-mol.readthedocs.io/en/latest/notebooks/06_hyperparameter_tuning/)
- [Using parallel execution to speed up descriptor and fingerprint calculations](https://scikit-mol.readthedocs.io/en/latest/notebooks/07_parallel_transforms/)
- [Using skopt for hyperparameter tuning](https://scikit-mol.readthedocs.io/en/latest/notebooks/08_external_library_skopt/)
- [Testing different fingerprints as part of the hyperparameter optimization](https://scikit-mol.readthedocs.io/en/latest/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers/)
- [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://scikit-mol.readthedocs.io/en/latest/notebooks/10_pipeline_pandas_output/)
- [Working with pipelines and estimators in safe inference mode for handling prediction on batches with invalid smiles or molecules](https://scikit-mol.readthedocs.io/en/latest/notebooks/11_safe_inference/)
We have also published a software note on ChemRxiv: [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)
## Other use-examples
Scikit-Mol has been featured in blog posts and used in research, some examples of which are listed below:
- [Useful ML package for cheminformatics iwatobipen.wordpress.com](https://iwatobipen.wordpress.com/2023/11/12/useful-ml-package-for-cheminformatics-rdkit-cheminformatics-ml/)
- [Boosted trees Data_in_life_blog](https://jhylin.github.io/Data_in_life_blog/posts/19_ML2-3_Boosted_trees/1_adaboost_xgb.html)
- [Konnektor: A Framework for Using Graph Theory to Plan Networks for Free Energy Calculations](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.4c01710)
- [Moldrug algorithm for an automated ligand binding site exploration by 3D aware molecular enumerations](https://chemrxiv.org/engage/chemrxiv/article-details/67688633fa469535b97c1b73)
- [RandomNets Improve Neural Network Regression Performance via Implicit Ensembling](https://chemrxiv.org/engage/chemrxiv/article-details/67656cfa81d2151a02603f48)
- [WAE-DTI: Ensemble-based architecture for drug–target interaction prediction using descriptors and embeddings](https://www.sciencedirect.com/science/article/pii/S2352914824001618)
- [Data Driven Estimation of Molecular Log-Likelihood using Fingerprint Key Counting](https://chemrxiv.org/engage/chemrxiv/article-details/661402ee21291e5d1d646651)
- [AUTONOMOUS DRUG DISCOVERY](https://www.proquest.com/openview/3e830e36bc618f263905a99e787c66c6/1?pq-origsite=gscholar&cbl=18750&diss=y)
## Roadmap and Contributing
_Help wanted!_ Are you a PhD student who wants a "side-quest" to procrastinate on your thesis writing, or are you simply interested in computational chemistry, cheminformatics, QSAR modelling, Python programming or open-source software? Do you want to learn more about machine learning with Scikit-Learn? Or do you use scikit-mol in your current work and would like to give a little back to the project and see it improved as well?
With a little bit of help, this project can be improved much faster! Reach out to me (Esben) for a discussion about how we can proceed.
Currently we are working on fixing some deprecation warnings. It's not the most exciting work, but a little maintenance is important. Later on we need to go over the scikit-learn compatibility and update to some of the newer features of their estimator classes. We're also working on some feature enhancements and tests, such as new fingerprints and a more versatile standardizer.
There is more information about how to contribute to the project in [CONTRIBUTING](CONTRIBUTING.md).
## BUGS
There are probably still some. Please check the issues on GitHub and report any you find there.
## Contributors:
Scikit-Mol has been developed as a community effort with contributions from people from many different companies, consortia, foundations and academic institutions.
[Cheminformania Consulting](https://www.cheminformania.com), [Aptuit](https://www.linkedin.com/company/aptuit/), [BASF](https://www.basf.com), [Bayer AG](https://www.bayer.com), [Boehringer Ingelheim](https://www.boehringer-ingelheim.com/), [Chodera Lab (MSKCC)](https://www.choderalab.org/), [EPAM Systems](https://www.epam.com/), [ETH Zürich](https://ethz.ch/en.html), [Evotec](https://www.evotec.com/), [Johannes Gutenberg University](https://www.uni-mainz.de/en/), [Martin Luther University](https://www.uni-halle.de/?lang=en), [Odyssey Therapeutics](https://odysseytx.com/), [Open Molecular Software Foundation](https://omsf.io/), [Openfree.energy](https://openfree.energy/), [Polish Academy of Sciences](https://pasific.pan.pl/polish-academy-of-sciences/), [Productivista](https://www.productivista.com), [Simulations-Plus Inc.](https://www.simulations-plus.com/), [University of Vienna](https://www.univie.ac.at/en/)
- Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), esbenbjerrum+scikit_mol@gmail.com
- Carmen Esposito [@cespos](https://github.com/cespos)
- Son Ha, sonha@uni-mainz.de
- Oh-hyeon Choung, ohhyeon.choung@gmail.com
- Andreas Poehlmann, [@ap--](https://github.com/ap--)
- Ya Chen, [@anya-chen](https://github.com/anya-chen)
- Anton Siomchen [@asiomchen](https://github.com/asiomchen)
- Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz)
- Adrien Chaton [@adrienchaton](https://github.com/adrienchaton)
- [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
- [@RiesBen](https://github.com/RiesBen)
- [@enricogandini](https://github.com/enricogandini)
- [@mikemhenry](https://github.com/mikemhenry)
- [@c-feldmann](https://github.com/c-feldmann)
Raw data
{
"_id": null,
"home_page": null,
"name": "scikit-mol",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Esben Jannik Bjerrum <esben@cheminformania.com>",
"download_url": "https://files.pythonhosted.org/packages/42/22/903d7488ec884291af05c56d34f6c0465be4796fcbb7d6cbbd8a719db7f8/scikit_mol-0.5.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "scikit-learn classes for molecule transformation",
"version": "0.5.2",
"project_urls": {
"Download": "https://github.com/EBjerrum/scikit-mol",
"Homepage": "https://github.com/EBjerrum/scikit-mol"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "40832b30c69db9b10556f217800c8c56f6900540f653cb24c220115e21868454",
"md5": "94374df7182bc70d9200dec25c1c7fe3",
"sha256": "2c1f77a1bff1c4d34a5ffa7e2f6dbcb0c3558a502f2fe153cc68ebcb63a42cd8"
},
"downloads": -1,
"filename": "scikit_mol-0.5.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "94374df7182bc70d9200dec25c1c7fe3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.9",
"size": 35350,
"upload_time": "2025-02-19T19:36:07",
"upload_time_iso_8601": "2025-02-19T19:36:07.183822Z",
"url": "https://files.pythonhosted.org/packages/40/83/2b30c69db9b10556f217800c8c56f6900540f653cb24c220115e21868454/scikit_mol-0.5.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4222903d7488ec884291af05c56d34f6c0465be4796fcbb7d6cbbd8a719db7f8",
"md5": "ac2b73373e7fbeeb12b778b261271913",
"sha256": "48c9ddf5ac0f470160324f8cf9d3a6849bc7784a81a4440398553c63cdd26b85"
},
"downloads": -1,
"filename": "scikit_mol-0.5.2.tar.gz",
"has_sig": false,
"md5_digest": "ac2b73373e7fbeeb12b778b261271913",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.9",
"size": 25580,
"upload_time": "2025-02-19T19:36:09",
"upload_time_iso_8601": "2025-02-19T19:36:09.129472Z",
"url": "https://files.pythonhosted.org/packages/42/22/903d7488ec884291af05c56d34f6c0465be4796fcbb7d6cbbd8a719db7f8/scikit_mol-0.5.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-19 19:36:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "EBjerrum",
"github_project": "scikit-mol",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "scikit-mol"
}