ChemDescriptors


- Name: ChemDescriptors
- Version: 0.0.8
- Home page: https://github.com/AhmedAlhilal14/chemical-descriptors.git
- Summary: Chemical descriptors is a powerful Python package facilitating calculation of fingerprints for CSV files
- Upload time: 2025-01-05 19:26:28
- Author: Ahmed Alhilal
- License: MIT
- Keywords: cheminformatics, molecular descriptors, fingerprints, rdkit, mordred, padelpy

# Python Library: `ChemDescriptors`

This library provides various functions for calculating molecular descriptors and fingerprints in cheminformatics. It supports the calculation of a wide range of molecular descriptors and fingerprints, such as RDKit, Lipinski, Morgan, Mordred, and more.

## Importance of Fingerprint Types:
- **Distinct Representation:** Different fingerprint types capture various aspects of a molecule’s structure, allowing for versatile molecular comparisons.
- **Diverse Applications:** Depending on the task (such as similarity searching, classification, or clustering), choosing a suitable fingerprint type can improve performance in chemical analysis and predictive modeling.
- **Accuracy in Modeling:** The right fingerprint type can significantly improve the accuracy of machine learning models and predictions based on molecular features.


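## Number of Fingerprints:
The library supports a wide variety of fingerprint types: Morgan fingerprints, 12 fingerprint types through padelpy, and 19 through molfeat. It also calculates RDKit, Lipinski, and Mordred descriptors and the Wiener and Zagreb indices.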

## Tutorial & Example

For a tutorial and worked examples, see the [tutorial notebook](https://gist.github.com/AhmedAlhilal14/0efb8ff1b15c1227367cacea6ae16e2c#file-chemdescriptors-tutorial-ipynb) and the [Chemical Descriptors repository](https://github.com/AhmedAlhilal14/chemical-descriptors).

## Functions

### **add_rdkit_descriptor**(input_file, smiles_column)

**Description:**  
This function calculates RDKit molecular descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The calculated descriptors are appended as additional columns to the original data and saved in a new CSV file named `<input_file_name>_rdkit_descriptor.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_rdkit_descriptor.csv`.
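
As a minimal sketch of preparing an input file, the snippet below writes a small CSV with a SMILES column, checks that the column exists, and derives the documented output name. It assumes that `<input_file_name>` means the input file name without its `.csv` extension, which the description does not state explicitly. The helpers `expected_output_name`, `write_example_input`, and `has_column`, and the column name `smiles`, are hypothetical and not part of the package.

```python
import csv
import tempfile
from pathlib import Path


def expected_output_name(input_file: str, suffix: str) -> str:
    """Build '<input_file_name>_<suffix>.csv' (assumption: the stem of the input file is used)."""
    return f"{Path(input_file).stem}_{suffix}.csv"


def write_example_input(path: Path) -> None:
    """Write a tiny input CSV with a 'smiles' column (hypothetical column name)."""
    rows = [
        {"name": "ethanol", "smiles": "CCO"},
        {"name": "benzene", "smiles": "c1ccccc1"},
        {"name": "acetic acid", "smiles": "CC(=O)O"},
    ]
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "smiles"])
        writer.writeheader()
        writer.writerows(rows)


def has_column(path: Path, column: str) -> bool:
    """Check that the CSV header contains the column you plan to pass as smiles_column."""
    with path.open(newline="") as handle:
        header = next(csv.reader(handle), [])
    return column in header


with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "molecules.csv"
    write_example_input(input_path)
    column_ok = has_column(input_path, "smiles")
    output_name = expected_output_name(str(input_path), "rdkit_descriptor")
    print(column_ok, output_name)
```

Checking the header first avoids passing a `smiles_column` value that does not exist in the file.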

---

### **add_lipinski_descriptors**(file_path, smiles_column, verbose=False)

**Description:**  
This function calculates Lipinski descriptors for molecules specified in a CSV file (`file_path`) using SMILES strings from a specified column (`smiles_column`). It automatically saves the calculated descriptors to an output file named `<input_file_name>_lipinski_descriptors.csv`.

**Parameters:**
- `file_path` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `verbose` (bool, optional): If `True`, the function will print additional processing details. Default is `False`.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_lipinski_descriptors.csv`.
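
The description does not list which columns are produced. Lipinski's rule of five is conventionally evaluated from molecular weight, logP, hydrogen-bond donors, and hydrogen-bond acceptors, so the sketch below assumes those four values are available and counts rule violations. The function name, argument names, and numeric values are illustrative only and are not taken from the package.

```python
def rule_of_five_violations(mol_wt: float, logp: float, h_donors: int, h_acceptors: int) -> int:
    """Count violations of Lipinski's rule of five for one molecule."""
    checks = [
        mol_wt > 500,      # molecular weight above 500 Da
        logp > 5,          # octanol-water partition coefficient (logP) above 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Made-up descriptor values for two hypothetical molecules.
small = rule_of_five_violations(mol_wt=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large = rule_of_five_violations(mol_wt=812.0, logp=6.3, h_donors=7, h_acceptors=12)
print(small, large)
```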

---

### **add_morgan_fp**(input_file, smiles_column)

**Description:**  
Calculates Morgan fingerprints for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the calculated fingerprints to an output file named `<input_file_name>_calculate_morgan_fpts.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_calculate_morgan_fpts.csv`.

---

### **add_mordred_descriptors**(input_file, smiles_column)

**Description:**  
Computes Mordred descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `<input_file_name>_mordred_descriptors.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_mordred_descriptors.csv`.

---



### **add_WienerIndex_ZagrebIndex**(input_file, smiles_column)

**Description:**  
Computes the Wiener index and Zagreb index for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `add_WienerIndex_ZagrebIndex_<input_file_name>_.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `add_WienerIndex_ZagrebIndex_<input_file_name>_.csv`.
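
For orientation, both indices can be computed by hand on a hydrogen-suppressed molecular graph: the Wiener index is the sum of shortest-path distances over all pairs of heavy atoms, and the first Zagreb index is the sum of squared atom degrees. The sketch below applies these textbook definitions to hand-written adjacency lists. It does not call the package, and the source does not say which Zagreb variant the package reports.

```python
from collections import deque


def wiener_index(adjacency: dict[int, list[int]]) -> int:
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # each pair was counted once from each end


def first_zagreb_index(adjacency: dict[int, list[int]]) -> int:
    """Sum of squared vertex degrees."""
    return sum(len(neighbors) ** 2 for neighbors in adjacency.values())


# n-butane (SMILES "CCCC"): a chain of four carbons.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# isobutane (SMILES "CC(C)C"): one central carbon bonded to three others.
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(butane), first_zagreb_index(butane))        # 10 10
print(wiener_index(isobutane), first_zagreb_index(isobutane))  # 9 12
```

The branched isomer has the smaller Wiener index and the larger first Zagreb index, which is why these indices are used as simple measures of molecular branching.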

---
### **add_padelpy_fps**(input_file, smiles_column)

**Description:**  
This function lets the user add any of 12 molecular fingerprint types computed with the padelpy library. The supported fingerprints are:

- `AtomPairs2DCount`
- `AtomPairs2D`
- `EState`
- `CDKextended`
- `CDK`
- `CDKgraphonly`
- `KlekotaRothCount`
- `KlekotaRoth`
- `MACCS`
- `PubChem`
- `SubstructureCount`
- `Substructure`

Each selected fingerprint type is saved to its own CSV file, with the fingerprint type name appended to the file name.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

When the function runs, it lists the available fingerprints; select one or more to add to your file.

**Output:**  
Each fingerprint type will be saved as a separate CSV file with the respective fingerprint type name appended.  
For example: `<input_file_name>_AtomPairs2DCount.csv`.
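
Because each selected fingerprint produces its own file, it can help to work out the expected file names up front. This is a minimal sketch, assuming `<input_file_name>` is the input file name without its extension. The helper `expected_padel_outputs` is hypothetical: it only builds names following the pattern above and does not run padelpy.

```python
from pathlib import Path

# The 12 fingerprint type names listed in this section.
PADEL_FINGERPRINTS = [
    "AtomPairs2DCount", "AtomPairs2D", "EState", "CDKextended",
    "CDK", "CDKgraphonly", "KlekotaRothCount", "KlekotaRoth",
    "MACCS", "PubChem", "SubstructureCount", "Substructure",
]


def expected_padel_outputs(input_file: str, selected: list[str]) -> list[str]:
    """Return one '<input_file_name>_<fingerprint>.csv' name per selected fingerprint."""
    stem = Path(input_file).stem  # assumption: the extension is dropped
    return [f"{stem}_{name}.csv" for name in selected]


outputs = expected_padel_outputs("molecules.csv", ["MACCS", "PubChem"])
print(outputs)
```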

---

### **add_molfeat_fps**(filename, smiles_column)

**Description:**  
This function lets the user add any of 19 molecular fingerprint types computed with the molfeat library. The supported fingerprints are:

  - `maccs`
  - `avalon`
  - `pattern`
  - `layered`
  - `map4`
  - `secfp`
  - `erg`
  - `estate`
  - `avalon-count`
  - `ecfp`
  - `fcfp`
  - `topological`
  - `atompair`
  - `rdkit`
  - `ecfp-count`
  - `fcfp-count`
  - `topological-count`
  - `atompair-count`
  - `rdkit-count`

**Parameters:**
- `filename` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

When the function runs, it lists the available fingerprints; select one or more to add to your file.


**Output:**  
The output will be saved as a CSV file named `<input_file_name>_<fp_type>.csv`, where `<fp_type>` is the selected fingerprint type.
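
Since the fingerprint types are chosen by name, a typo in a selection is easy to make. The sketch below normalizes and validates a comma-separated selection against the 19 names listed in this section. The comma-separated format and the helper `parse_selection` are assumptions for illustration; the source does not describe how the package reads the selection.

```python
# The 19 fingerprint type names listed in this section.
MOLFEAT_FINGERPRINTS = [
    "maccs", "avalon", "pattern", "layered", "map4", "secfp", "erg",
    "estate", "avalon-count", "ecfp", "fcfp", "topological", "atompair",
    "rdkit", "ecfp-count", "fcfp-count", "topological-count",
    "atompair-count", "rdkit-count",
]


def parse_selection(text: str) -> list[str]:
    """Split a comma-separated selection, normalize it, and reject unknown names."""
    names = [part.strip().lower() for part in text.split(",") if part.strip()]
    unknown = [name for name in names if name not in MOLFEAT_FINGERPRINTS]
    if unknown:
        raise ValueError(f"Unsupported fingerprint type(s): {', '.join(unknown)}")
    return names


selection = parse_selection("ecfp, MACCS ,rdkit-count")
print(selection)
```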




### References:
1. Emmanuel Noutahi, Cas Wognum, Hadrien Mary, Honoré Hounwanou, Kyle M. Kovary, Desmond Gilmour, thibaultvarin-r, Jackson Burns, Julien St-Laurent, t, DomInvivo, Saurav Maheshkar, & rbyrne-momatx. (2023). datamol-io/molfeat: 0.9.4 (0.9.4). Zenodo. [https://doi.org/10.5281/zenodo.8373019](https://doi.org/10.5281/zenodo.8373019)  
   [GitHub Repository](https://github.com/datamol-io/molfeat/tree/main)

2. RDKit: Open-source cheminformatics software. [https://rdkit.org](https://rdkit.org)

3. Moriwaki, H., Tian, YS., Kawashita, N. et al. (2018). Mordred: a molecular descriptor calculator. *Journal of Cheminformatics*, 10, 4. [https://doi.org/10.1186/s13321-018-0258-y](https://doi.org/10.1186/s13321-018-0258-y)

4. PaDELPy: A Python wrapper for PaDEL-Descriptor software. [GitHub Repository](https://github.com/ecrl/padelpy)

5. Ahmed Alhilal. Chemical Descriptors Repository. [GitHub Repository](https://github.com/AhmedAlhilal14/chemical-descriptors)


Ahmed Alhilal
=============

0.0.8 (05/01/2025)
-------------------
- First Release

            
