# Python Library: `Chemical_Descriptors`
This library calculates molecular descriptors and **molecular fingerprints** for molecules stored as SMILES strings in CSV files. Several fingerprint types are available, each suited to specific tasks in cheminformatics and computational chemistry.
### Importance of Fingerprint Types:
- **Distinct Representation:** Different fingerprint types encode different aspects of a molecule’s structure (for example, predefined substructure keys, atom pairs, or circular atom environments), so molecules can be compared in several complementary ways.
- **Diverse Applications:** The most suitable fingerprint type depends on the task (such as similarity searching, classification, or clustering), and choosing an appropriate one can improve performance in chemical analysis and predictive modeling.
- **Accuracy in Modeling:** The fingerprint type used as input can substantially change the accuracy of machine learning models trained on molecular features.
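Similarity searching, one of the tasks listed above, commonly compares binary fingerprints with the Tanimoto coefficient (shared on-bits divided by the union of on-bits). The following is a minimal standard-library sketch, not part of this library, assuming each fingerprint is represented as the set of indices of its on bits; the toy bit sets are invented for illustration.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    # shared bits divided by union size, where union size = |A| + |B| - shared
    return shared / (len(fp_a) + len(fp_b) - shared)


# Toy on-bit sets, invented for illustration (not real molecules).
query = {1, 5, 9, 12}
library = {"mol_a": {1, 5, 9, 12}, "mol_b": {1, 5, 20}, "mol_c": {30, 31}}

# Rank library entries by similarity to the query, most similar first.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)  # ['mol_a', 'mol_b', 'mol_c']
```

Representing fingerprints as on-bit sets keeps the sketch dependency-free; with RDKit installed, its own bit-vector similarity functions would be the usual choice.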
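### Number of Fingerprints:
The library exposes Morgan fingerprints through `cal_morgan_fpts`, 12 fingerprint types through `calculate_selected_fingerprints`, and 19 fingerprint types through `fps`. MACCS and EState appear in both of the latter two lists.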
## Functions
### **cal_rdkit_descriptor**(input_file, output_file, smiles_column)
**Description:**
This function calculates RDKit molecular descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The calculated descriptors are appended as additional columns to the original data and saved in a new CSV file named `<input_file_name>_rdkit_descriptor.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `output_file` (str): Path where the output CSV file will be saved (optional if using the default naming convention).
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_rdkit_descriptor.csv`.
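Every function in this library reads a CSV file with a SMILES column. The sketch below (standard library only) writes such a file and predicts where the default output lands. `write_smiles_csv` and `expected_output_path` are hypothetical helpers, not part of the library, and the path logic assumes that `<input_file_name>` means the input file name without its `.csv` extension and that the output is written next to the input; passing an explicit `output_file` may change the location.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path


def write_smiles_csv(path: str, rows: list[tuple[str, str]], smiles_column: str = "smiles") -> None:
    """Write a minimal input CSV with a 'name' column and a SMILES column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", smiles_column])
        writer.writerows(rows)


def expected_output_path(input_file: str, suffix: str) -> Path:
    """Hypothetical helper: assumes the output sits next to the input and is
    named <input stem>_<suffix>.csv."""
    src = Path(input_file)
    return src.with_name(f"{src.stem}_{suffix}.csv")


with tempfile.TemporaryDirectory() as tmp:
    input_file = str(Path(tmp) / "molecules.csv")
    write_smiles_csv(input_file, [("ethanol", "CCO"), ("benzene", "c1ccccc1")])
    # Read the header back to confirm the SMILES column name.
    with open(input_file, newline="") as handle:
        header = next(csv.reader(handle))
    print(header)  # ['name', 'smiles']
    print(expected_output_path(input_file, "rdkit_descriptor").name)  # molecules_rdkit_descriptor.csv
```

Changing `suffix` (for example to `lipinski_descriptors` or `mordred_descriptors`) gives the documented default names for the other functions under the same assumptions.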
---
### **cal_lipinski_descriptors**(file_path, smiles_column, verbose=False)
**Description:**
This function calculates Lipinski descriptors for molecules specified in a CSV file (`file_path`) using SMILES strings from a specified column (`smiles_column`). It automatically saves the calculated descriptors to an output file named `<input_file_name>_lipinski_descriptors.csv`.
**Parameters:**
- `file_path` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `verbose` (bool, optional): If `True`, the function will print additional processing details. Default is `False`.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_lipinski_descriptors.csv`.
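Lipinski descriptors are typically used to apply the rule of five: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with at most one violation commonly tolerated. A minimal sketch of that filter on one row of descriptor values; the column names `MW`, `LogP`, `NumHDonors`, and `NumHAcceptors` are assumptions, not confirmed names from this library's output file, and the descriptor values are made up.

```python
from __future__ import annotations

# Assumed column names (RDKit-style); check the header of your
# *_lipinski_descriptors.csv file and adjust if they differ.
LIMITS = {"MW": 500.0, "LogP": 5.0, "NumHDonors": 5, "NumHAcceptors": 10}


def rule_of_five_violations(row: dict[str, str | float]) -> int:
    """Count how many rule-of-five limits a molecule exceeds.

    Values may be strings, as returned by csv.DictReader.
    """
    return sum(1 for column, limit in LIMITS.items() if float(row[column]) > limit)


def passes_rule_of_five(row: dict[str, str | float], max_violations: int = 1) -> bool:
    """Common filter: tolerate at most `max_violations` violations."""
    return rule_of_five_violations(row) <= max_violations


# Made-up descriptor values for illustration.
small = {"MW": 180.2, "LogP": 1.3, "NumHDonors": 1, "NumHAcceptors": 3}
large = {"MW": 720.0, "LogP": 6.2, "NumHDonors": 2, "NumHAcceptors": 12}
print(rule_of_five_violations(small), passes_rule_of_five(small))  # 0 True
print(rule_of_five_violations(large), passes_rule_of_five(large))  # 3 False
```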
---
### **cal_morgan_fpts**(input_file, smiles_column)
**Description:**
Calculates Morgan fingerprints for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the calculated fingerprints to an output file named `<input_file_name>_calculate_morgan_fpts.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_calculate_morgan_fpts.csv`.
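A fingerprint CSV is often easier to work with as one set of on-bit indices per molecule, for example for similarity comparisons. A minimal sketch, assuming that every column except the SMILES column and any listed metadata columns holds one 0/1 bit; the column layout of `<input_file_name>_calculate_morgan_fpts.csv` is not documented, so `load_on_bits`, the `bit_*` column names, and the bit values below are hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def load_on_bits(handle: TextIO, smiles_column: str, metadata_columns: set[str]) -> dict[str, set[int]]:
    """Map each SMILES string to the positions of its bit columns that equal 1.

    Assumes every column other than the SMILES column and metadata_columns
    holds one 0/1 fingerprint bit, in bit order.
    """
    reader = csv.DictReader(handle)
    bit_columns = [
        column for column in (reader.fieldnames or [])
        if column != smiles_column and column not in metadata_columns
    ]
    on_bits: dict[str, set[int]] = {}
    for row in reader:
        on_bits[row[smiles_column]] = {
            index for index, column in enumerate(bit_columns) if float(row[column]) == 1.0
        }
    return on_bits


# Tiny stand-in for a *_calculate_morgan_fpts.csv file; column names and bit
# values are invented, and real Morgan fingerprints have many more bits.
sample = io.StringIO(
    "name,smiles,bit_0,bit_1,bit_2,bit_3\n"
    "ethanol,CCO,1,0,1,0\n"
    "benzene,c1ccccc1,0,1,0,1\n"
)
print(load_on_bits(sample, "smiles", {"name"}))  # {'CCO': {0, 2}, 'c1ccccc1': {1, 3}}
```

Rows with duplicate SMILES overwrite earlier ones; key the dictionary by an ID column instead if your data contains duplicates.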
---
### **cal_mordred_descriptors**(input_file, smiles_column)
**Description:**
Computes Mordred descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `<input_file_name>_mordred_descriptors.csv`.
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
**Output:**
The output will be saved as a CSV file named `<input_file_name>_mordred_descriptors.csv`.
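Mordred computes well over a thousand descriptors, and some cannot be calculated for every molecule; Mordred represents those failures as error values rather than numbers. If the output file contains such non-numeric entries, a common preprocessing step is to keep only fully numeric columns before modeling. A minimal standard-library sketch, assuming failed descriptors appear as non-numeric text or as `nan`/`inf`; `numeric_columns` is a hypothetical helper, and the sample uses real Mordred descriptor names with invented values.

```python
from __future__ import annotations

import csv
import io
import math
from typing import TextIO


def numeric_columns(handle: TextIO, skip: set[str]) -> list[str]:
    """Return columns (other than those in skip) whose values all parse as finite floats."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    keep: list[str] = []
    for column in reader.fieldnames or []:
        if column in skip:
            continue
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # at least one non-numeric entry, such as a failed descriptor
        if all(math.isfinite(value) for value in values):
            keep.append(column)
    return keep


# Toy data: real Mordred descriptor names, invented values and error text.
sample = io.StringIO(
    "smiles,ABC,nAcid,SpMax_A\n"
    "CCO,1.4,0,some error message\n"
    "c1ccccc1,4.2,0,2.0\n"
)
print(numeric_columns(sample, {"smiles"}))  # ['ABC', 'nAcid']
```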
---
### **calculate_selected_fingerprints**(input_file, smiles_column)
**Description:**
Before using this function, execute the following code snippet to download and unzip the necessary files:
```bash
wget https://github.com/dataprofessor/padel/raw/main/fingerprints_xml.zip
unzip fingerprints_xml.zip
```
In a Jupyter or Google Colab notebook, prefix each command with `!` (for example, `!unzip fingerprints_xml.zip`) to run it as a shell command.

This function calculates 12 types of molecular fingerprints:
- `AtomPairs2DCount`
- `AtomPairs2D`
- `EState`
- `CDKextended`
- `CDK`
- `CDKgraphonly`
- `KlekotaRothCount`
- `KlekotaRoth`
- `MACCS`
- `PubChem`
- `SubstructureCount`
- `Substructure`
**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**
Each fingerprint type is saved as a separate CSV file containing the input data with that fingerprint's columns appended. The file name is the input file name followed by the fingerprint type, for example `<input_file_name>_AtomPairs2DCount.csv`.
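Because this function writes 12 files per input, it helps to know in advance which names to expect and which are still missing after a run. A minimal sketch, assuming the files are written next to the input and named `<input stem>_<fingerprint type>.csv` as in the example above; `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of the library.

```python
from __future__ import annotations

from pathlib import Path

# The 12 fingerprint types documented for calculate_selected_fingerprints.
FINGERPRINT_TYPES = [
    "AtomPairs2DCount", "AtomPairs2D", "EState", "CDKextended",
    "CDK", "CDKgraphonly", "KlekotaRothCount", "KlekotaRoth",
    "MACCS", "PubChem", "SubstructureCount", "Substructure",
]


def expected_outputs(input_file: str) -> list[Path]:
    """Per-fingerprint output paths, assuming they sit next to the input file
    and are named <input stem>_<fingerprint type>.csv."""
    src = Path(input_file)
    return [src.with_name(f"{src.stem}_{fp_type}.csv") for fp_type in FINGERPRINT_TYPES]


def missing_outputs(input_file: str) -> list[str]:
    """Names of expected output files that do not exist (yet)."""
    return [path.name for path in expected_outputs(input_file) if not path.exists()]


print(expected_outputs("molecules.csv")[0])  # molecules_AtomPairs2DCount.csv
print(missing_outputs("molecules.csv"))
```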
---
### **fps**(filename, smiles_column, fp_type)
**Description:**
This function calculates one molecular fingerprint type (`fp_type`) for every molecule in a CSV file. Provide:
- The CSV file (`filename`)
- The name of the SMILES column in the file (`smiles_column`)
- One of the following fingerprint types:
  - `maccs`
  - `avalon`
  - `pattern`
  - `layered`
  - `map4`
  - `secfp`
  - `erg`
  - `estate`
  - `avalon-count`
  - `ecfp`
  - `fcfp`
  - `topological`
  - `atompair`
  - `rdkit`
  - `ecfp-count`
  - `fcfp-count`
  - `topological-count`
  - `atompair-count`
  - `rdkit-count`
**Parameters:**
- `filename` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `fp_type` (str): The type of molecular fingerprint to calculate; must be one of the values listed above.

**Output:**
The output is saved as a CSV file named `<input_file_name>_<fp_type>.csv`, where `<fp_type>` is the chosen fingerprint type.
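Since `fp_type` is a free-form string, a typo only surfaces when the function runs. A minimal sketch that checks the value against the documented list before calling `fps` and predicts the output file name. `FP_TYPES`, `check_fp_type`, and `fps_output_path` are hypothetical helpers; lowercasing the input and placing the output next to the input file are assumptions, since it is not stated whether `fps` is case-sensitive or where exactly it writes the file.

```python
from __future__ import annotations

from pathlib import Path

# The fp_type values documented for fps().
FP_TYPES = frozenset({
    "maccs", "avalon", "pattern", "layered", "map4", "secfp", "erg", "estate",
    "avalon-count", "ecfp", "fcfp", "topological", "atompair", "rdkit",
    "ecfp-count", "fcfp-count", "topological-count", "atompair-count", "rdkit-count",
})


def check_fp_type(fp_type: str) -> str:
    """Normalize fp_type (strip, lowercase) and reject values not in the documented list."""
    normalized = fp_type.strip().lower()
    if normalized not in FP_TYPES:
        raise ValueError(f"unknown fp_type {fp_type!r}; expected one of {sorted(FP_TYPES)}")
    return normalized


def fps_output_path(filename: str, fp_type: str) -> Path:
    """Expected output path, assuming it sits next to the input file."""
    src = Path(filename)
    return src.with_name(f"{src.stem}_{check_fp_type(fp_type)}.csv")


print(fps_output_path("molecules.csv", "ECFP-count"))  # molecules_ecfp-count.csv
```

Pass the normalized value returned by `check_fp_type` to `fps` so that the predicted file name and the actual call agree.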
Ahmed Alhilal
=============
0.0.1 (2025-01-05)
-------------------
- First Release
Raw data
{
"_id": null,
"home_page": "https://github.com/AhmedAlhilal14/chemical-descriptors.git",
"name": "Chemical-Descriptors",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Cheminformatics, Molecular Descriptors, Fingerprints, RDKit, Mordred, Padelpy",
"author": "Ahmed Alhilal",
"author_email": "aalhilal@udel.edu",
"download_url": "https://files.pythonhosted.org/packages/87/f2/0589fe48d330a62a621344dd074c6c94743937944da92c4c125a3b2b550d/chemical_descriptors-0.0.1.tar.gz",
"platform": null,
"description": "# Python Library: `Chemical_Descriptors`\r\n\r\nThis function generates one of several fingerprint types from the list of **molecular fingerprints** available, each serving specific tasks in cheminformatics and computational chemistry.\r\n\r\n### Importance of Fingerprint Types:\r\n- **Distinct Representation:** Different fingerprint types capture various aspects of a molecule\u2019s structure, allowing for versatile molecular comparisons.\r\n\r\n- **Diverse Applications:** Depending on the task (such as similarity searching, classification, or clustering), choosing the right fingerprint type ensures better performance in chemical analysis and predictive modeling.\r\n\r\n- **Accuracy in Modeling:** The right fingerprint type can significantly improve the accuracy of machine learning models and predictions based on molecular features.\r\n\r\n\r\n### Number of Fingerprints:\r\n\r\n\r\n## Functions\r\n\r\n### **cal_rdkit_descriptor**(input_file, output_file, smiles_column)\r\n**Description:** \r\nThis function calculates RDKit molecular descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). 
The calculated descriptors are appended as additional columns to the original data and saved in a new CSV file named `<input_file_name>_rdkit_descriptor.csv`.\r\n\r\n**Parameters:**\r\n- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `output_file` (str): Path where the output CSV file will be saved (optional if using the default naming convention).\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n\r\n**Output:** \r\nThe output will be saved as a CSV file named `<input_file_name>_rdkit_descriptor.csv`.\r\n\r\n---\r\n\r\n### **cal_lipinski_descriptors**(file_path, smiles_column, verbose=False)\r\n**Description:** \r\nThis function calculates Lipinski descriptors for molecules specified in a CSV file (`file_path`) using SMILES strings from a specified column (`smiles_column`). It automatically saves the calculated descriptors to an output file named `<input_file_name>_lipinski_descriptors.csv`.\r\n\r\n**Parameters:**\r\n- `file_path` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n- `verbose` (bool, optional): If `True`, the function will print additional processing details. Default is `False`.\r\n\r\n**Output:** \r\nThe output will be saved as a CSV file named `<input_file_name>_lipinski_descriptors.csv`.\r\n\r\n---\r\n\r\n### **cal_morgan_fpts**(input_file, smiles_column)\r\n**Description:** \r\nCalculates Morgan fingerprints for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). 
The function saves the calculated fingerprints to an output file named `<input_file_name>_calculate_morgan_fpts.csv`.\r\n\r\n**Parameters:**\r\n- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n\r\n**Output:** \r\nThe output will be saved as a CSV file named `<input_file_name>_calculate_morgan_fpts.csv`.\r\n\r\n---\r\n\r\n### **cal_mordred_descriptors**(input_file, smiles_column)\r\n**Description:** \r\nComputes Mordred descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `<input_file_name>_mordred_descriptors.csv`.\r\n\r\n**Parameters:**\r\n- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n\r\n**Output:** \r\nThe output will be saved as a CSV file named `<input_file_name>_mordred_descriptors.csv`.\r\n\r\n---\r\n\r\n### **calculate_selected_fingerprints**(input_file, smiles_column)\r\n**Description:** \r\nBefore using this function, execute the following code snippet to download and unzip the necessary files:\r\n\r\n```bash\r\n! wget https://github.com/dataprofessor/padel/raw/main/fingerprints_xml.zip\r\n! 
unzip fingerprints_xml.zip\r\n## Molecular Fingerprint Calculation\r\n\r\nThis function calculates 12 different types of molecular fingerprints:\r\n\r\n- `AtomPairs2DCount`\r\n- `AtomPairs2D`\r\n- `EState`\r\n- `CDKextended`\r\n- `CDK`\r\n- `CDKgraphonly`\r\n- `KlekotaRothCount`\r\n- `KlekotaRoth`\r\n- `MACCS`\r\n- `PubChem`\r\n- `SubstructureCount`\r\n- `Substructure`\r\n\r\nEach enhanced dataset with fingerprints is saved as separate CSV files, appended with the respective fingerprint type name.\r\n\r\n### Parameters:\r\n- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n\r\n### Output:\r\nEach fingerprint type will be saved as a separate CSV file with the respective fingerprint type name appended. \r\nFor example: `<input_file_name>_AtomPairs2DCount.csv`.\r\n\r\n---\r\n\r\n## **fps**(filename, smiles_column, fp_type)\r\n\r\n**Description:** \r\nThis function calculates a specified molecular fingerprint (`fp_type`) for each molecule in a CSV file. 
The user must provide:\r\n- The CSV file (`filename`)\r\n- The SMILES column name in the file (`smiles_column`)\r\n- One of the following fingerprint types:\r\n\r\n - `maccs`\r\n - `avalon`\r\n - `pattern`\r\n - `layered`\r\n - `map4`\r\n - `secfp`\r\n - `erg`\r\n - `estate`\r\n - `avalon-count`\r\n - `ecfp`\r\n - `fcfp`\r\n - `topological`\r\n - `atompair`\r\n - `rdkit`\r\n - `ecfp-count`\r\n - `fcfp-count`\r\n - `topological-count`\r\n - `atompair-count`\r\n - `rdkit-count`\r\n\r\n### Parameters:\r\n- `filename` (str): Path to the input CSV file containing molecular data in SMILES format.\r\n- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.\r\n- `fp_type` (str): The type of molecular fingerprint to calculate (choose from the list of fingerprint types above).\r\n\r\n### Output:\r\nThe output will be saved as a CSV file named `<input_file_name>_<fp_type>.csv` depending on the fingerprint type chosen.\r\n\r\n\r\nAhmed Alhilal\r\n=============\r\n\r\n0.0.1 (05/01/2025)\r\n-------------------\r\n- First Release\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Chemical descriptors is a powerful Python package facilitating calculation of fingerprints for CSV files",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/AhmedAlhilal14/chemical-descriptors.git"
},
"split_keywords": [
"cheminformatics",
" molecular descriptors",
" fingerprints",
" rdkit",
" mordred",
" padelpy"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d440f1c1208f40e3ae37d7b5f281a30e58da8f6877c6798d457d6e9ec2451205",
"md5": "78661b9afa0f0cd70692103b9dd7c51a",
"sha256": "9ee6c8e06212dc31bd8a173df43dd5c64c155c4bd879010055c90b972a931535"
},
"downloads": -1,
"filename": "Chemical_Descriptors-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "78661b9afa0f0cd70692103b9dd7c51a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7978,
"upload_time": "2025-01-05T01:19:45",
"upload_time_iso_8601": "2025-01-05T01:19:45.982760Z",
"url": "https://files.pythonhosted.org/packages/d4/40/f1c1208f40e3ae37d7b5f281a30e58da8f6877c6798d457d6e9ec2451205/Chemical_Descriptors-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "87f20589fe48d330a62a621344dd074c6c94743937944da92c4c125a3b2b550d",
"md5": "b71a456dc454354d1ea11bc611709dc8",
"sha256": "74724b9afb9f33566e28f76a38b9bb38be83704446dc73ee823a06e2998507b0"
},
"downloads": -1,
"filename": "chemical_descriptors-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "b71a456dc454354d1ea11bc611709dc8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7689,
"upload_time": "2025-01-05T01:19:48",
"upload_time_iso_8601": "2025-01-05T01:19:48.547846Z",
"url": "https://files.pythonhosted.org/packages/87/f2/0589fe48d330a62a621344dd074c6c94743937944da92c4c125a3b2b550d/chemical_descriptors-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-05 01:19:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AhmedAlhilal14",
"github_project": "chemical-descriptors",
"github_not_found": true,
"lcname": "chemical-descriptors"
}