MoleculaPy


- Name: MoleculaPy
- Version: 1.0.1
- Home page: https://github.com/kamilpytlak/MoleculaPy
- Summary: A command-line application that utilizes the RDKit library to compute molecular descriptors and fingerprints, aiding in the analysis and characterization of chemical structures
- Upload time: 2023-09-01 20:58:42
- Author: Kamil Pytlak
- Requires Python: >=3.9,<4.0
- License: MIT
- Keywords: cli, fingerprints, rdkit, chemoinformatics, qsar, molecular-descriptors

            <h1 align="center">
  🧪 MoleculaPy 🧪
  <br>
</h1>

<h4 align="center">A command-line application that utilizes the RDKit library to compute molecular descriptors and fingerprints, aiding in the analysis and characterization of chemical structures.</h4>

<p align="center">
  <a href="#key-features">Key Features</a> •
  <a href="#installation">Installation</a> •
  <a href="#how-to-use">How To Use</a> •
  <a href="#contact">Contact</a> •
  <a href="#credits">Credits</a> •
  <a href="#license">License</a>
</p>

<p align="center">MoleculaPy is a powerful command-line interface (CLI) application developed in Python, designed for chemoinformatics enthusiasts and researchers. Leveraging the renowned RDKit library, MoleculaPy empowers users to effortlessly compute a diverse set of molecular descriptors and fingerprints for compounds specified in the SMILES (Simplified Molecular Input Line Entry System) format, all within the convenience of their terminal.</p>

## Key Features
*  📝 **SMILES Compatibility**: MoleculaPy processes chemical data in the SMILES format, the industry-standard notation for representing molecular structures.

*  🧬 **Comprehensive Descriptors**: The application computes 209 molecular descriptors, giving a broad view of the properties and characteristics of chemical compounds.

*  🔍 **Fingerprint Generation**: MoleculaPy generates molecular fingerprints, a critical component of tasks such as similarity analysis and virtual screening. Supported types are n-dimensional Atom, Morgan, RDKit and Topological fingerprints, and 166-dimensional MACCS.

*  📁 **CSV File Support**: Import and process large datasets of compounds with MoleculaPy's CSV file support, streamlining high-throughput data analysis.

*  🧪 **Scientific Accuracy**: MoleculaPy relies on the RDKit library, known for its scientific rigor and reliability in chemoinformatics, ensuring trustworthy results for research and analysis.

*  🖥️ **User-Friendly Command Line**: The command-line interface is designed to be intuitive, catering to both seasoned researchers and newcomers in the field.

*  🧂 **Salt Removal Option**: MoleculaPy lets users choose whether to remove salts from molecules during processing. This is particularly valuable when working with complex chemical datasets, allowing for cleaner and more accurate analyses.

*  📄 **Logging for Transparency**: MoleculaPy keeps detailed logs of application activity, which supports debugging, progress tracking, auditing, and reproducibility.

## Installation
To install the application, run the following command in your terminal:
```commandline
pip install moleculapy
```

To verify the installation, run `moleculapy -h` in the terminal:
```commandline
>>> moleculapy -h

usage: MoleculaPy [-h] [--method {descriptors,fingerprints}] [--fp_type {Atom,MACCS,Morgan,Topological,RDKit}]
                  [--remove_salt | --no-remove_salt] [--n_bits N_BITS]
                  input_file output_file

Calculate molecular descriptors and fingerprints for molecules provided in a CSV file.

positional arguments:
  input_file            Path to the input file
  output_file           Path to the output file

options:
  -h, --help            show this help message and exit
  --method {descriptors,fingerprints}
                        (Optional) Calculation method: descriptors or fingeprints (default: descriptors)
  --fp_type {Atom,MACCS,Morgan,Topological,RDKit}
                        (Optional) Fingerprint type (default: Morgan)
  --remove_salt, --no-remove_salt
                        (Optional) Remove salts from SMILE. (default: --remove_salt)
  --n_bits N_BITS       (Optional) Number of bits of a given fingerprints type (default: 2048)
```
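
The help text above can be read as an `argparse` declaration. The following is a minimal sketch reconstructed only from that help output, not taken from MoleculaPy's source; the function name `build_parser` is hypothetical. It shows that `input_file` and `output_file` are positional (passed without a leading `--`) and that `--remove_salt` / `--no-remove_salt` behave as a single on/off switch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the interface shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="MoleculaPy",
        description="Calculate molecular descriptors and fingerprints "
                    "for molecules provided in a CSV file.",
    )
    # Positional arguments: given by position, without a leading `--`.
    parser.add_argument("input_file", help="Path to the input file")
    parser.add_argument("output_file", help="Path to the output file")
    parser.add_argument("--method", choices=["descriptors", "fingerprints"],
                        default="descriptors")
    parser.add_argument("--fp_type",
                        choices=["Atom", "MACCS", "Morgan", "Topological", "RDKit"],
                        default="Morgan")
    # BooleanOptionalAction (Python 3.9+) registers both --remove_salt
    # and --no-remove_salt for the same destination.
    parser.add_argument("--remove_salt", action=argparse.BooleanOptionalAction,
                        default=True)
    parser.add_argument("--n_bits", type=int, default=2048)
    return parser


# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(
    ["smiles_sample.csv", "smiles_desc_output.csv", "--no-remove_salt"]
)
print(args.method, args.fp_type, args.remove_salt, args.n_bits)
```

With these assumed defaults, the sample call leaves `--method`, `--fp_type` and `--n_bits` at the values listed in the help text and only switches salt removal off.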

## How To Use
The application is fully compatible with Python 3.9+.

### Setting Up

The program requires two positional arguments, `input_file` and `output_file`: the path to the CSV file containing SMILES strings and the path to the output file, respectively.

Suppose we have a file `smiles_sample.csv` that contains SMILES strings along with other columns, which do not matter here. The column containing SMILES must be named "SMILES" (case-insensitive).
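
As an illustration of that input layout, here is a minimal sketch that builds a small CSV in memory and locates the SMILES column regardless of letter case. The helper `find_smiles_column`, the extra `Name` column, and the two sample compounds are assumptions made for the example; how MoleculaPy performs this lookup internally is not documented here.

```python
import csv
import io


def find_smiles_column(fieldnames):
    """Return the header whose name is 'smiles', ignoring case (hypothetical helper)."""
    for name in fieldnames:
        if name.strip().lower() == "smiles":
            return name
    raise ValueError("No column named 'SMILES' found")


# Build a tiny CSV in memory; on disk this would be smiles_sample.csv.
sample = io.StringIO(newline="")
writer = csv.writer(sample)
writer.writerow(["Name", "Smiles"])
writer.writerow(["ethanol", "CCO"])
writer.writerow(["sodium acetate", "CC(=O)[O-].[Na+]"])  # contains a salt counter-ion
sample.seek(0)

reader = csv.DictReader(sample)
column = find_smiles_column(reader.fieldnames)
smiles = [row[column] for row in reader]
print(column, smiles)
```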

### Calculate molecular descriptors
To calculate molecular descriptors, no optional parameters are needed. It is sufficient to call:

```commandline
moleculapy .\smiles_sample.csv .\smiles_desc_output.csv
```

By default, MoleculaPy removes salts from chemical compounds. To disable this, use the `--no-remove_salt` parameter:

```commandline
moleculapy .\smiles_sample.csv .\smiles_desc_output.csv --no-remove_salt
```

### Calculate fingerprints
With MoleculaPy, you can calculate various n-dimensional vectors of molecules, known as fingerprints: n-dimensional Atom, Morgan, RDKit, Topological and 166-dimensional MACCS.

To do this, set two optional arguments: `--method` and `--fp_type`. The first specifies the calculation method (molecular descriptors or fingerprints); the second specifies the fingerprint type.

For example, to calculate 2048-dimensional Morgan fingerprints:

```commandline
moleculapy .\smiles_sample.csv .\smiles_morgan_output.csv --method fingerprints --fp_type Morgan
```

Atom, Morgan, RDKit and Topological fingerprints are computed as 2048-dimensional vectors by default, and MACCS fingerprints as 166-dimensional vectors. To change the vector length, specify the optional parameter `--n_bits`.

For example, to calculate 512-dimensional Atom fingerprints:
```commandline
moleculapy .\smiles_sample.csv .\smiles_atom_output.csv --method fingerprints --fp_type Atom --n_bits 512
```
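
Fingerprints like these are commonly compared with the Tanimoto coefficient, the number of bits set in both vectors divided by the number set in either. The sketch below is a plain-Python illustration on two made-up 8-bit vectors; it assumes each fingerprint has already been loaded as a sequence of 0/1 values, and the layout of MoleculaPy's output file is not covered here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 bit vectors."""
    if len(a) != len(b):
        raise ValueError("Fingerprints must have the same number of bits")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


# Toy 8-bit fingerprints (illustrative values, not real molecules).
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(tanimoto(fp_a, fp_b))  # 3 shared bits / 5 bits set overall = 0.6
```

Both vectors must come from the same fingerprint type and the same `--n_bits` setting; otherwise the bit positions are not comparable.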

### Logging
All calculations performed by the application are logged. The logs are stored in the `logs` folder in the path where the application was installed. The path to the logs will be displayed in the CLI after the calculation session is completed.

## Contact
If you have any problems, ideas or general feedback, please don't hesitate to contact me at [kam.pytlak@gmail.com](mailto:kam.pytlak@gmail.com). I'd really appreciate it!

## Credits

This software uses the following open source packages:

- [pandas](https://pandas.pydata.org/)
- [RDKit](https://www.rdkit.org/docs/index.html#)
- [tqdm](https://github.com/tqdm/tqdm)

## License
MIT

---

> GitHub [@kamilpytlak](https://github.com/kamilpytlak) &nbsp;&middot;&nbsp;
> LinkedIn [kamil-pytlak](https://www.linkedin.com/in/kamil-pytlak/)

            

        