# Python Library: `ChemDescriptors`
This library provides various functions for calculating molecular descriptors and fingerprints in cheminformatics. It supports the calculation of a wide range of molecular descriptors and fingerprints, such as RDKit, Lipinski, Morgan, Mordred, and more.
## Importance of Fingerprint Types:
- **Distinct Representation:** Different fingerprint types capture various aspects of a molecule’s structure, allowing for versatile molecular comparisons.
- **Diverse Applications:** Depending on the task (such as similarity searching, classification, or clustering), choosing the right fingerprint type ensures better performance in chemical analysis and predictive modeling.
- **Accuracy in Modeling:** The right fingerprint type can significantly improve the accuracy of machine learning models and predictions based on molecular features.
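To make "molecular comparisons" concrete: fingerprints are commonly compared with the Tanimoto coefficient, the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below is plain Python and does not use this library; the function name and the bit positions are made up for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


# Hypothetical on-bit indices for two molecules (illustrative only).
mol_a = {1, 5, 9, 42}
mol_b = {1, 9, 17, 42, 80}

print(tanimoto(mol_a, mol_b))  # 3 shared bits / 6 distinct bits = 0.5
```

Different fingerprint types set different bits for the same pair of molecules, so the same comparison can give different similarity values depending on the type chosen.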
## Number of Fingerprints:
In addition to Morgan fingerprints, the library exposes 12 fingerprint types through PaDELPy and 19 through molfeat (see `add_padelpy_fps` and `add_molfeat_fps` under Functions), enabling a range of analyses for molecular datasets.
## Tutorial & Example
For a tutorial and worked examples, see the [tutorial notebook](https://gist.github.com/AhmedAlhilal14/0efb8ff1b15c1227367cacea6ae16e2c#file-chemdescriptors-tutorial-ipynb) and the [Chemical Descriptors repository](https://github.com/AhmedAlhilal14/chemical-descriptors).
## Functions
### **add_rdkit_descriptor**(input_file, smiles_column)
**Description:**
This function calculates RDKit molecular descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The calculated descriptors are appended as additional columns to the original data and saved in a new CSV file named `<input_file_name>_rdkit_descriptor.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_rdkit_descriptor.csv`.
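As a rough sketch of the input this function expects, the snippet below writes a small CSV with a SMILES column and derives the output name from the documented pattern. It uses only the standard library. `write_smiles_csv` and `expected_output` are hypothetical helpers, not part of ChemDescriptors, and the assumption that `<input_file_name>` means the file name without its extension, written next to the input, is unverified.

```python
import csv
import tempfile
from pathlib import Path


def write_smiles_csv(path: Path, smiles_column: str, smiles: list[str]) -> None:
    """Write a one-column CSV whose header is the SMILES column name."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([smiles_column])
        writer.writerows([s] for s in smiles)


def expected_output(input_file: Path, suffix: str) -> Path:
    """Hypothetical helper applying the `<input_file_name>_<suffix>.csv` pattern.

    Assumes the output is written next to the input and that the input's
    extension is dropped; the library itself may behave differently.
    """
    return input_file.with_name(f"{input_file.stem}_{suffix}.csv")


workdir = Path(tempfile.mkdtemp())
input_file = workdir / "molecules.csv"
# Ethanol, benzene, and acetic acid as example SMILES strings.
write_smiles_csv(input_file, "SMILES", ["CCO", "c1ccccc1", "CC(=O)O"])

print(expected_output(input_file, "rdkit_descriptor").name)
# molecules_rdkit_descriptor.csv
```

With such a file in place, the documented call would be `add_rdkit_descriptor(str(input_file), "SMILES")`. The import path for the function is not stated in this README, so check the tutorial notebook for it.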
---
### **add_lipinski_descriptors**(file_path, smiles_column, verbose=False)
**Description:**
This function calculates Lipinski descriptors for molecules specified in a CSV file (`file_path`) using SMILES strings from a specified column (`smiles_column`). It automatically saves the calculated descriptors to an output file named `<input_file_name>_lipinski_descriptors.csv`.
**Parameters:**
- `file_path` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `verbose` (bool, optional): If `True`, the function will print additional processing details. Default is `False`.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_lipinski_descriptors.csv`.
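Lipinski descriptors are typically used to check the rule of five: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The sketch below counts violations from those four values. It is independent of this library: the function name is hypothetical, the input values are illustrative rather than computed from real molecules, and the columns the library actually writes are not documented here.

```python
def lipinski_violations(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count rule-of-five violations for one molecule."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water logP above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Illustrative descriptor values only, not taken from real molecules.
print(lipinski_violations(320.4, 2.1, 2, 5))    # 0
print(lipinski_violations(712.9, 6.3, 4, 12))   # 3
```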
---
### **add_morgan_fp**(input_file, smiles_column)
**Description:**
Calculates Morgan fingerprints for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the calculated fingerprints to an output file named `<input_file_name>_calculate_morgan_fpts.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_calculate_morgan_fpts.csv`.
---
### **add_mordred_descriptors**(input_file, smiles_column)
**Description:**
Computes Mordred descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `<input_file_name>_mordred_descriptors.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_mordred_descriptors.csv`.
---
### **add_WienerIndex_ZagrebIndex**(input_file, smiles_column)
**Description:**
Computes WienerIndex and ZagrebIndex descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `add_WienerIndex_ZagrebIndex_<input_file_name>_.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `add_WienerIndex_ZagrebIndex_<input_file_name>_.csv`.
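For readers unfamiliar with these topological indices, here is a minimal sketch of how they are defined on a hydrogen-suppressed molecular graph. The Wiener index is the sum of shortest-path distances over all atom pairs; the first Zagreb index sums squared atom degrees, and the second sums degree products over bonds. This is plain Python, not the library's implementation, and which Zagreb variant the library reports is not stated in this README.

```python
from collections import deque


def wiener_index(adjacency: dict[int, list[int]]) -> int:
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # every pair was counted once from each end


def zagreb_indices(adjacency: dict[int, list[int]]) -> tuple[int, int]:
    """Return (first, second) Zagreb indices of the graph."""
    degree = {atom: len(neighbors) for atom, neighbors in adjacency.items()}
    first = sum(d * d for d in degree.values())
    # Visit each bond once by requiring a < b.
    second = sum(degree[a] * degree[b]
                 for a in adjacency for b in adjacency[a] if a < b)
    return first, second


# Carbon skeletons as adjacency lists (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), zagreb_indices(n_butane))    # 10 (10, 8)
print(wiener_index(isobutane), zagreb_indices(isobutane))  # 9 (12, 9)
```

The two isomers have the same atoms but different indices, which is what makes these descriptors useful for distinguishing branching patterns.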
---
### **add_padelpy_fps**(input_file, smiles_column)
**Description:**
This function lets the user add up to 12 different types of molecular fingerprints using the padelpy library. The supported fingerprints are:
- `AtomPairs2DCount`
- `AtomPairs2D`
- `EState`
- `CDKextended`
- `CDK`
- `CDKgraphonly`
- `KlekotaRothCount`
- `KlekotaRoth`
- `MACCS`
- `PubChem`
- `SubstructureCount`
- `Substructure`
For each selected fingerprint type, the dataset with the added fingerprint columns is saved as a separate CSV file whose name ends with that fingerprint type.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- After you run the function, it shows a list of the available fingerprints; select one or more to add to your file.
**Output:**
Each fingerprint type will be saved as a separate CSV file with the respective fingerprint type name appended.
For example: `<input_file_name>_AtomPairs2DCount.csv`.
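The naming scheme above can be sketched as a small lookup that checks a selection against the 12 supported names and lists the files to expect. `plan_outputs` is a hypothetical helper written for this illustration, not a ChemDescriptors function, and it assumes `<input_file_name>` is the input's file name without its extension.

```python
from pathlib import Path

# The 12 fingerprint types listed in this section.
PADEL_FINGERPRINTS = [
    "AtomPairs2DCount", "AtomPairs2D", "EState", "CDKextended",
    "CDK", "CDKgraphonly", "KlekotaRothCount", "KlekotaRoth",
    "MACCS", "PubChem", "SubstructureCount", "Substructure",
]


def plan_outputs(input_file: str, selected: list[str]) -> dict[str, str]:
    """Map each selected fingerprint type to its expected output file name."""
    unknown = [name for name in selected if name not in PADEL_FINGERPRINTS]
    if unknown:
        raise ValueError(f"Unsupported fingerprint type(s): {', '.join(unknown)}")
    stem = Path(input_file).stem  # assumption: the extension is dropped
    return {name: f"{stem}_{name}.csv" for name in selected}


print(plan_outputs("molecules.csv", ["MACCS", "PubChem"]))
# {'MACCS': 'molecules_MACCS.csv', 'PubChem': 'molecules_PubChem.csv'}
```

Note that the names are case-sensitive as listed, so `maccs` would be rejected by this check even though it is a valid molfeat name.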
---
### **add_molfeat_fps**(filename, smiles_column)
**Description:**
This function lets the user add up to 19 different types of molecular fingerprints using the molfeat library. The supported fingerprints are:
- `maccs`
- `avalon`
- `pattern`
- `layered`
- `map4`
- `secfp`
- `erg`
- `estate`
- `avalon-count`
- `ecfp`
- `fcfp`
- `topological`
- `atompair`
- `rdkit`
- `ecfp-count`
- `fcfp-count`
- `topological-count`
- `atompair-count`
- `rdkit-count`
**Parameters:**
- `filename` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- After you run the function, it shows a list of the available fingerprints; select one or more to add to your file.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_<fp_type>.csv` depending on the fingerprint type chosen.
## References
1. Emmanuel Noutahi, Cas Wognum, Hadrien Mary, Honoré Hounwanou, Kyle M. Kovary, Desmond Gilmour, thibaultvarin-r, Jackson Burns, Julien St-Laurent, t, DomInvivo, Saurav Maheshkar, & rbyrne-momatx. (2023). datamol-io/molfeat: 0.9.4 (0.9.4). Zenodo. [https://doi.org/10.5281/zenodo.8373019](https://doi.org/10.5281/zenodo.8373019)
[GitHub Repository](https://github.com/datamol-io/molfeat/tree/main)
2. RDKit: Open-source cheminformatics software. [https://rdkit.org](https://rdkit.org)
3. Moriwaki, H., Tian, YS., Kawashita, N. et al. (2018). Mordred: a molecular descriptor calculator. *Journal of Cheminformatics*, 10, 4. [https://doi.org/10.1186/s13321-018-0258-y](https://doi.org/10.1186/s13321-018-0258-y)
4. PaDELPy: A Python wrapper for PaDEL-Descriptor software. [GitHub Repository](https://github.com/ecrl/padelpy)
5. Ahmed Alhilal. Chemical Descriptors Repository. [GitHub Repository](https://github.com/AhmedAlhilal14/chemical-descriptors)
Ahmed Alhilal
=============
0.0.8 (05/01/2025)
-------------------
- First Release
Raw data
{
"_id": null,
"home_page": "https://github.com/AhmedAlhilal14/chemical-descriptors.git",
"name": "ChemDescriptors",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Cheminformatics, Molecular Descriptors, Fingerprints, RDKit, Mordred, Padelpy",
"author": "Ahmed Alhilal",
"author_email": "aalhilal@udel.edu",
"download_url": "https://files.pythonhosted.org/packages/4f/5f/fa7e3708f62d2827bf39e404b5987bebc5f72191f8428f0419ad823cc3e8/chemdescriptors-0.0.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Chemical descriptors is a powerful Python package facilitating calculation of fingerprints for CSV files",
"version": "0.0.8",
"project_urls": {
"Homepage": "https://github.com/AhmedAlhilal14/chemical-descriptors.git"
},
"split_keywords": [
"cheminformatics",
" molecular descriptors",
" fingerprints",
" rdkit",
" mordred",
" padelpy"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bd3542aec2ac31ad3af830ea208c3ae7b59df2f99e4b8a91a48464adea07773f",
"md5": "9bfba7c5841bffc1aa4ed6a23abc1d73",
"sha256": "13222e5bbc5dbeb27bfdf5894e4530f964a08261ec080a420d7426c937393af1"
},
"downloads": -1,
"filename": "ChemDescriptors-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9bfba7c5841bffc1aa4ed6a23abc1d73",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9426,
"upload_time": "2025-01-05T19:26:25",
"upload_time_iso_8601": "2025-01-05T19:26:25.206465Z",
"url": "https://files.pythonhosted.org/packages/bd/35/42aec2ac31ad3af830ea208c3ae7b59df2f99e4b8a91a48464adea07773f/ChemDescriptors-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f5ffa7e3708f62d2827bf39e404b5987bebc5f72191f8428f0419ad823cc3e8",
"md5": "83083e1ef2fb65701687266c8a60a9a2",
"sha256": "e0aee373e9b47d34aa2f1e93937c4182837d3ba251ab513ab127fabff3674810"
},
"downloads": -1,
"filename": "chemdescriptors-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "83083e1ef2fb65701687266c8a60a9a2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9321,
"upload_time": "2025-01-05T19:26:28",
"upload_time_iso_8601": "2025-01-05T19:26:28.702142Z",
"url": "https://files.pythonhosted.org/packages/4f/5f/fa7e3708f62d2827bf39e404b5987bebc5f72191f8428f0419ad823cc3e8/chemdescriptors-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-05 19:26:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AhmedAlhilal14",
"github_project": "chemical-descriptors",
"github_not_found": true,
"lcname": "chemdescriptors"
}