Development Status :: 3 - Alpha
# SMILES featurizer
<p align="left">
<a href="https://github.com/dsdanielpark/SMILES-featurizer"><img alt="PyPI package" src="https://img.shields.io/badge/pypi-SMILES featurizer-black"></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
<a href="https://hits.seeyoufarm.com"><img src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fdsdanielpark%2FSMILES-featurizer&count_bg=%23000000&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false"/></a>
<a href="https://pypi.org/project/smilesfeaturizer/"><img alt="PyPI" src="https://img.shields.io/pypi/v/smilesfeaturizer"></a>
</p>
A Python package that automatically generates derived feature variables from a column of SMILES (Simplified Molecular-Input Line-Entry System) strings.
![](./assets/smilesfeaturizer.gif)
SMILES Featurizer is a Python package for quickly and painlessly exploring baseline and key features in projects that use SMILES strings. It is still in development, and certain SMILES strings raise errors because of the package's dependencies. Updates do not follow a regular schedule, and pull requests are welcome at any time. *The package is intentionally kept as a set of functions rather than wrapped in classes, because it operates on a single data frame and the code is likely to change.*
<br>
## Install
Install the released version from PyPI:
```
$ pip install smilesfeaturizer
```
Or install the latest version directly from GitHub:
```
$ pip install git+https://github.com/dsdanielpark/SMILES-featurizer.git
```
<br>
## Usage [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1BHTtOEvl577FyrQ5kLK-yJ9h9EDVUvGg/view?usp=sharing)
The functions below expect a data frame with the SMILES strings in a column named `SMILES`. See the [tutorial notebook](https://github.com/dsdanielpark/SMILES-featurizer/blob/main/tutorial.ipynb).
### *Feature generation*
- Create fingerprint columns from SMILES representations using several packages: [RDKit](https://www.rdkit.org/), [Mol2Vec](https://github.com/samoturk/mol2vec), [DataMol](https://github.com/datamolorg/datamol), [MolFeat](https://github.com/cplassier/molfeat), and [Scikit-Learn](https://scikit-learn.org/stable/).
```python
from smilesfeaturizer import generate_smiles_feature
# df is your data frame with the SMILES strings in a column named "SMILES"
df = generate_smiles_feature(df)  # default method="simple"
df = generate_smiles_feature(df, method="specific")
```
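Because certain SMILES strings can raise errors in the underlying dependencies, it may help to screen the column before featurizing. The sketch below is a rough syntactic check in plain Python. `looks_like_smiles` is a hypothetical helper and not part of smilesfeaturizer; it only checks for a non-empty string with balanced parentheses and brackets, and says nothing about chemical validity.

```python
def looks_like_smiles(s):
    """Rough syntactic screen: non-empty string, no whitespace, balanced () and []."""
    if not isinstance(s, str) or not s or any(ch.isspace() for ch in s):
        return False
    depth = 0           # number of currently open "(" branches
    in_bracket = False  # True while inside a "[...]" atom
    for ch in s:
        if ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and not in_bracket


# Illustrative inputs only; "CC(C" has an unclosed branch and "" is empty.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CC(C", "[Na+].[Cl-]", ""]
kept = [s for s in smiles if looks_like_smiles(s)]
rejected = [s for s in smiles if not looks_like_smiles(s)]
```

A stricter screen would parse each string with RDKit, where `Chem.MolFromSmiles` returns `None` for input it cannot parse.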
### *Create dashboard*
- The dashboard shows how well each compound is predicted by comparing its true and predicted values.
```python
from smilesfeaturizer import create_inline_dash_dashboard
# df is your DataFrame; specify the true and predicted columns
true_col = 'pIC50'
predicted_col = 'predicted_pIC50'
# Create and run the Dash dashboard
create_inline_dash_dashboard(df, true_col, predicted_col)
```
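To make "prediction performance per compound" concrete, the following sketch ranks compounds by absolute error in plain Python. It is not how the dashboard is implemented; the rows and their values are made up for illustration, and only the column names `SMILES`, `pIC50`, and `predicted_pIC50` come from the examples in this README.

```python
# Made-up example rows; in practice these would come from your data frame.
rows = [
    {"SMILES": "CCO", "pIC50": 5.0, "predicted_pIC50": 5.5},
    {"SMILES": "c1ccccc1", "pIC50": 6.0, "predicted_pIC50": 4.0},
    {"SMILES": "CC(=O)O", "pIC50": 7.0, "predicted_pIC50": 7.25},
]
true_col = "pIC50"
predicted_col = "predicted_pIC50"

# Absolute error per compound
errors = [
    {"SMILES": r["SMILES"], "abs_error": abs(r[true_col] - r[predicted_col])}
    for r in rows
]

# Worst-predicted compounds first
worst_first = sorted(errors, key=lambda e: e["abs_error"], reverse=True)
for e in worst_first:
    print(f'{e["SMILES"]}: {e["abs_error"]:.2f}')
```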
### *Save reporting images*
- Molecular images, basic information, and bar graphs of predicted versus actual values are saved as report images for easy viewing.
```python
from smilesfeaturizer import smiles_insight_plot
selected_metric = 'RMSE' # Choose the error metric you want to display
true_col = 'pIC50' # Replace with your true column name
predicted_col = 'predicted_pIC50' # Replace with your predicted column name
# df[:1] passes only the first row; pass a larger slice to cover more rows
smiles_insight_plot(df[:1], true_col, predicted_col, selected_metric, 'output_folder', show=True)
```
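The example above selects `'RMSE'` as the error metric. As a reminder of what that metric measures, here is a minimal plain-Python sketch of root mean squared error. The `rmse` function is a hypothetical helper written for this README, not the package's own implementation, and the input values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up true and predicted pIC50 values
y_true = [5.0, 6.0, 7.0]
y_pred = [5.0, 6.0, 10.0]
score = rmse(y_true, y_pred)  # sqrt((0 + 0 + 9) / 3) = sqrt(3)
```

Because the errors are squared before averaging, a single badly predicted compound raises RMSE more than several small misses would.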
<br>
## License
[Apache 2.0](https://opensource.org/license/apache-2-0/) <br>
## Bugs and Issues
Bug reports and feature requests are sincerely appreciated, as is any feedback on the code.
## Contacts
- Core maintainer: [Daniel Park, South Korea](https://github.com/DSDanielPark) <br>
- E-mail: parkminwoo1991@gmail.com <br>
<br>
*Copyright (c) 2023 MinWoo Park, South Korea*<br>
Raw data
{
"_id": null,
"home_page": "https://github.com/dsdanielpark/SMILES-feature",
"name": "smilesfeaturizer",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "Python,SMILES,Cheminformatics,Molecular Informatics,Molecular Descriptor Generation,Chemical Data Analysis,Computational Chemistry",
"author": "daniel park",
"author_email": "parkminwoo1991@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/1f/ce/58b7e2d6657f7344d10c936069d52aeda6b8a069bf3225d1c355b7f262fd/smilesfeaturizer-0.1.3.tar.gz",
"platform": null,
"description": "Development Status :: 3 - Alpha\r\n\r\n\r\n# SMILES featurizer\r\n\r\n<p align=\"left\">\r\n<a href=\"https://github.com/dsdanielpark/SMILES-featurizer\"><img alt=\"PyPI package\" src=\"https://img.shields.io/badge/pypi-SMILES featurizer-black\"></a>\r\n<a href=\"https://github.com/psf/black\"><img alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"></a>\r\n<a href=\"https://hits.seeyoufarm.com\"><img src=\"https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fdsdanielpark%2FSMILES-featurizer&count_bg=%23000000&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false\"/></a>\r\n<a href=\"https://pypi.org/project/smilesfeaturizer/\"><img alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/smilesfeaturizer\"></a>\r\n</p>\r\n\r\nA Python package that automatically generates derived feature variables from a column with SMILES (Simplified Molecular-Input Line-Entry System)\r\n\r\n![](./assets/smilesfeaturizer.gif)\r\n\r\n\r\nThe python package, SMILES Featurizer helps quickly and painlessly explore the baseline and key features for many projects that use SMILES strings. It's still in the development phase, and there are some errors with certain SMILES strings due to dependencies in the package. There are no scheduled regular updates, and I welcome pull requests at any time. *I intentionally did not encapsulate it highly as a class, and I maintain it in the form of functions. 
This is because it is based on the processing of a single data frame and because the service is highly likely to be modified.*\r\n\r\n<br>\r\n\r\n## Install\r\n```\r\n$ pip install smilesfeaturizer\r\n```\r\n```\r\n$ pip install git+https://github.com/dsdanielpark/SMILES-featurizer.git\r\n```\r\n<br>\r\n\r\n## Usage [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1BHTtOEvl577FyrQ5kLK-yJ9h9EDVUvGg/view?usp=sharing) \r\nThe dataset assumes the presence of SMILES strings in a column named SMILES. See [tutorial notebook](https://github.com/dsdanielpark/SMILES-featurizer/blob/main/tutorial.ipynb).\r\n### *Feature generation*\r\n- Create fingerprint columns for SMILES representations based on various packages [RDKit](https://www.rdkit.org/), [Mol2Vec](https://github.com/samoturk/mol2vec), [DataMol](https://github.com/datamolorg/datamol), [MolFeat](https://github.com/cplassier/molfeat), [Scikit-Learn](https://scikit-learn.org/stable/).\r\n\r\n ```python\r\n from smilesfeaturizer import generate_smiles_feature\r\n\r\n df = generate_smiles_feature(df) # default method=\"simple\"\r\n\r\n df = generate_smiles_feature(df, method=\"specific\") \r\n ```\r\n\r\n### *Create dashboard* \r\n- Through the dashboard, you can determine which compounds exhibit what prediction performance. 
\r\n\r\n ```python\r\n from smilesfeaturizer import create_inline_dash_dashboard\r\n\r\n # Load your DataFrame and specify the columns\r\n true_col = 'pIC50'\r\n predicted_col = 'predicted_pIC50'\r\n\r\n # Create and run the Dash dashboard\r\n create_inline_dash_dashboard(df, true_col, predicted_col)\r\n ```\r\n\r\n### *Save reporting images*\r\n- Molecular images, basic information, and the prediction versus actual values are visually represented in bar graphs for easy viewing.\r\n ```python\r\n from smilesfeaturizer import smiles_insight_plot\r\n\r\n selected_metric = 'RMSE' # Choose the error metric you want to display\r\n true_col = 'pIC50' # Replace with your true column name\r\n predicted_col = 'predicted_pIC50' # Replace with your predicted column name\r\n smiles_insight_plot(df[:1], true_col, predicted_col, selected_metric, 'output_folder', show=True)\r\n ```\r\n\r\n<br>\r\n\r\n## License\r\n[Apache 2.0](https://opensource.org/license/apache-2-0/) <br>\r\n\r\n\r\n## Bugs and Issues\r\nSincerely grateful for any reports on new features or bugs. Your valuable feedback on the code is highly appreciated.\r\n\r\n## Contacts\r\n- Core maintainer: [Daniel Park, South Korea](https://github.com/DSDanielPark) <br>\r\n- E-mail: parkminwoo1991@gmail.com <br>\r\n\r\n<br>\r\n\r\n*Copyright (c) 2023 MinWoo Park, South Korea*<br>\r\n",
"bugtrack_url": null,
"license": "",
"summary": "A Python package that automatically generates derived variables from a column with SMILES (Simplified Molecular-Input Line-Entry System).",
"version": "0.1.3",
"project_urls": {
"Homepage": "https://github.com/dsdanielpark/SMILES-feature"
},
"split_keywords": [
"python",
"smiles",
"cheminformatics",
"molecular informatics",
"molecular descriptor generation",
"chemical data analysis",
"computational chemistry"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "abd13f2fd9abb7734b324c92da1815acdab98a872033df8be542fc349a812b45",
"md5": "c60dc9ea4ea5fccf8819604a9aa89a80",
"sha256": "8a1fc03f0c3665afdc6873bccb3becba0a0e65f4897554fcd8009b4be81d6b4d"
},
"downloads": -1,
"filename": "smilesfeaturizer-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c60dc9ea4ea5fccf8819604a9aa89a80",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 19321342,
"upload_time": "2023-10-11T09:33:48",
"upload_time_iso_8601": "2023-10-11T09:33:48.461484Z",
"url": "https://files.pythonhosted.org/packages/ab/d1/3f2fd9abb7734b324c92da1815acdab98a872033df8be542fc349a812b45/smilesfeaturizer-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1fce58b7e2d6657f7344d10c936069d52aeda6b8a069bf3225d1c355b7f262fd",
"md5": "1f03c5acf9f85fa5e48033e1d8154e6e",
"sha256": "c9abc16b66011fc9b3bfd1954f3d08b107d07694a68ff43518516ba8f0acd9c8"
},
"downloads": -1,
"filename": "smilesfeaturizer-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "1f03c5acf9f85fa5e48033e1d8154e6e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 19303906,
"upload_time": "2023-10-11T09:34:00",
"upload_time_iso_8601": "2023-10-11T09:34:00.808348Z",
"url": "https://files.pythonhosted.org/packages/1f/ce/58b7e2d6657f7344d10c936069d52aeda6b8a069bf3225d1c355b7f262fd/smilesfeaturizer-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-11 09:34:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dsdanielpark",
"github_project": "SMILES-feature",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "smilesfeaturizer"
}