smilesfeaturizer


| Field | Value |
|---|---|
| Name | smilesfeaturizer |
| Version | 0.1.3 |
| home_page | https://github.com/dsdanielpark/SMILES-feature |
| Summary | A Python package that automatically generates derived variables from a column with SMILES (Simplified Molecular-Input Line-Entry System). |
| upload_time | 2023-10-11 09:34:00 |
| maintainer | |
| docs_url | None |
| author | daniel park |
| requires_python | >=3.6 |
| license | |
| keywords | python, smiles, cheminformatics, molecular informatics, molecular descriptor generation, chemical data analysis, computational chemistry |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
Development Status :: 3 - Alpha


# SMILES featurizer

<p align="left">
<a href="https://github.com/dsdanielpark/SMILES-featurizer"><img alt="PyPI package" src="https://img.shields.io/badge/pypi-SMILES%20featurizer-black"></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
<a href="https://hits.seeyoufarm.com"><img src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fdsdanielpark%2FSMILES-featurizer&count_bg=%23000000&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false"/></a>
<a href="https://pypi.org/project/smilesfeaturizer/"><img alt="PyPI" src="https://img.shields.io/pypi/v/smilesfeaturizer"></a>
</p>

A Python package that automatically generates derived feature variables from a column of SMILES (Simplified Molecular-Input Line-Entry System) strings.

![](./assets/smilesfeaturizer.gif)


SMILES Featurizer is a Python package that helps you quickly explore baseline and key features for projects that use SMILES strings. It is still in development, and some SMILES strings raise errors because of the package's dependencies. There is no regular update schedule, but pull requests are welcome at any time. *The package is intentionally kept as a set of functions rather than wrapped in a class, because it operates on a single DataFrame and the interface is likely to change.*

<br>

## Install
Install the released version from PyPI:
```
$ pip install smilesfeaturizer
```
Or install directly from the GitHub repository:
```
$ pip install git+https://github.com/dsdanielpark/SMILES-featurizer.git
```
<br>

## Usage [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1BHTtOEvl577FyrQ5kLK-yJ9h9EDVUvGg/view?usp=sharing) 
The functions expect the input DataFrame to hold SMILES strings in a column named `SMILES`. See the [tutorial notebook](https://github.com/dsdanielpark/SMILES-featurizer/blob/main/tutorial.ipynb).
### *Feature generation*
- Create fingerprint and descriptor columns from the SMILES strings, built on [RDKit](https://www.rdkit.org/), [Mol2Vec](https://github.com/samoturk/mol2vec), [DataMol](https://github.com/datamolorg/datamol), [MolFeat](https://github.com/cplassier/molfeat), and [Scikit-Learn](https://scikit-learn.org/stable/).

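    ```python
    import pandas as pd
    from smilesfeaturizer import generate_smiles_feature

    # Example input (illustrative): SMILES strings in a column named "SMILES"
    df = pd.DataFrame({"SMILES": ["CCO", "c1ccccc1"]})

    df = generate_smiles_feature(df)  # default method="simple"

    df = generate_smiles_feature(df, method="specific")
    ```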
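
To give a sense of what "derived variables" means here, the sketch below computes a few string-level counts from a SMILES string using only the standard library. It is an illustration, not the package's implementation: `toy_smiles_features` and its feature names are hypothetical, and the real fingerprints and descriptors come from the libraries listed above.

```python
def toy_smiles_features(smiles: str) -> dict:
    """Count a few surface-level tokens in a SMILES string (hypothetical helper)."""
    return {
        "length": len(smiles),
        "n_branches": smiles.count("("),      # each "(" opens one branch
        "n_double_bonds": smiles.count("="),  # "=" denotes a double bond
        "n_triple_bonds": smiles.count("#"),  # "#" denotes a triple bond
    }


# Illustrative input rows, each with a "SMILES" field
rows = [{"SMILES": s} for s in ["CCO", "c1ccccc1", "CC(=O)O"]]

# One input column in, several derived columns out
featurized = [{**row, **toy_smiles_features(row["SMILES"])} for row in rows]
```

Each entry in `featurized` keeps the original `SMILES` value and gains the new count fields, which mirrors the shape of the result: the input rows plus extra feature columns.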

### *Create dashboard* 
- The dashboard shows how well each compound is predicted, so you can see which compounds the model handles well and which it does not.

    ```python
    from smilesfeaturizer import create_inline_dash_dashboard

    # Load your DataFrame and specify the columns
    true_col = 'pIC50'
    predicted_col = 'predicted_pIC50'

    # Create and run the Dash dashboard
    create_inline_dash_dashboard(df, true_col, predicted_col)
    ```
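
The question the dashboard answers, which compounds are predicted well and which poorly, can also be checked by hand. The following is a minimal sketch in plain Python, assuming records with the same `pIC50` and `predicted_pIC50` fields as above. The values and the helper `rank_by_abs_error` are made up for illustration and are not part of the package.

```python
def rank_by_abs_error(records, true_col, predicted_col):
    """Return records sorted from worst to best prediction, with the error attached."""
    scored = [
        {**r, "abs_error": abs(r[true_col] - r[predicted_col])}
        for r in records
    ]
    return sorted(scored, key=lambda r: r["abs_error"], reverse=True)


# Illustrative values only
records = [
    {"SMILES": "CCO", "pIC50": 5.0, "predicted_pIC50": 5.5},
    {"SMILES": "c1ccccc1", "pIC50": 6.0, "predicted_pIC50": 4.0},
    {"SMILES": "CC(=O)O", "pIC50": 7.0, "predicted_pIC50": 7.25},
]

ranked = rank_by_abs_error(records, "pIC50", "predicted_pIC50")
worst = ranked[0]["SMILES"]  # compound with the largest absolute error
```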

### *Save reporting images*
- Saves report images that show the molecule drawing, basic information, and a bar graph of predicted versus actual values.
    ```python
    from smilesfeaturizer import smiles_insight_plot

    selected_metric = 'RMSE'  # Choose the error metric you want to display
    true_col = 'pIC50'  # Replace with your true column name
    predicted_col = 'predicted_pIC50'  # Replace with your predicted column name
    smiles_insight_plot(df[:1], true_col, predicted_col, selected_metric, 'output_folder', show=True)
    ```
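
For reference, RMSE is the square root of the mean squared difference between true and predicted values. The sketch below is a plain-Python version of that formula, not the package's code: `rmse` is a hypothetical helper and the numbers are invented. It also shows that for a single row, as with `df[:1]` above, RMSE reduces to the absolute error.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired sequences of equal length."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only
y_true = [5.0, 6.0, 7.0]
y_pred = [5.0, 4.0, 8.0]

overall = rmse(y_true, y_pred)           # sqrt((0 + 4 + 1) / 3)
single = rmse(y_true[1:2], y_pred[1:2])  # one row: equals |6.0 - 4.0|
```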

<br>

## License
[Apache 2.0](https://opensource.org/license/apache-2-0/) <br>


## Bugs and Issues
Bug reports and feature requests are welcome, and feedback on the code is greatly appreciated.

## Contacts
- Core maintainer: [Daniel Park, South Korea](https://github.com/DSDanielPark) <br>
- E-mail: parkminwoo1991@gmail.com <br>

<br>

*Copyright (c) 2023 MinWoo Park, South Korea*<br>

            

        