Chemical-Descriptors


- Name: Chemical-Descriptors
- Version: 0.0.1
- Home page: https://github.com/AhmedAlhilal14/chemical-descriptors.git
- Summary: Chemical descriptors is a powerful Python package facilitating calculation of fingerprints for CSV files
- Upload time: 2025-01-05 01:19:48
- Author: Ahmed Alhilal
- License: MIT
- Keywords: cheminformatics, molecular descriptors, fingerprints, rdkit, mordred, padelpy
- Requirements: No requirements were recorded.

# Python Library: `Chemical_Descriptors`

This library calculates molecular descriptors and fingerprints for molecules given as SMILES strings in a CSV file. It offers several types of **molecular fingerprints**, each suited to specific tasks in cheminformatics and computational chemistry.

### Importance of Fingerprint Types:
- **Distinct Representation:** Different fingerprint types capture various aspects of a molecule’s structure, allowing for versatile molecular comparisons.

- **Diverse Applications:** Depending on the task (such as similarity searching, classification, or clustering), choosing a suitable fingerprint type can improve performance in chemical analysis and predictive modeling.

- **Accuracy in Modeling:** The right fingerprint type can significantly improve the accuracy of machine learning models and predictions based on molecular features.


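### Number of Fingerprints:

`calculate_selected_fingerprints` computes 12 fingerprint types in one call, and `fps` supports 19 fingerprint types, one per call. Both lists appear under the respective function in the next section.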

## Functions

### **cal_rdkit_descriptor**(input_file, output_file, smiles_column)
**Description:**  
This function calculates RDKit molecular descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The calculated descriptors are appended as additional columns to the original data and saved in a new CSV file named `<input_file_name>_rdkit_descriptor.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `output_file` (str): Path where the output CSV file will be saved (optional if using the default naming convention).
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_rdkit_descriptor.csv`.
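
The naming convention above can be sketched with the standard library. This is a minimal sketch, not the package's own code: `default_output_path` is a hypothetical helper, the column name `smiles` is only an example, and it assumes that `<input_file_name>` means the input file name without its `.csv` extension and that the output lands next to the input file.

```python
import csv
import tempfile
from pathlib import Path


def default_output_path(input_file: str, suffix: str) -> Path:
    """Build <input_file_name>_<suffix>.csv beside the input file (assumed layout)."""
    path = Path(input_file)
    return path.with_name(f"{path.stem}_{suffix}.csv")


# Write a small input CSV with one SMILES string per row.
workdir = Path(tempfile.mkdtemp())
input_path = workdir / "molecules.csv"
with input_path.open("w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["name", "smiles"])
    writer.writerow(["ethanol", "CCO"])
    writer.writerow(["benzene", "c1ccccc1"])

expected = default_output_path(str(input_path), "rdkit_descriptor")
print(expected.name)  # molecules_rdkit_descriptor.csv

# The call itself would then take this shape (the import path is not stated here):
# cal_rdkit_descriptor(str(input_path), str(expected), "smiles")
```

The same helper covers the other functions by changing the suffix, for example `lipinski_descriptors` or `mordred_descriptors`.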

---

### **cal_lipinski_descriptors**(file_path, smiles_column, verbose=False)
**Description:**  
This function calculates Lipinski descriptors for molecules specified in a CSV file (`file_path`) using SMILES strings from a specified column (`smiles_column`). It automatically saves the calculated descriptors to an output file named `<input_file_name>_lipinski_descriptors.csv`.

**Parameters:**
- `file_path` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `verbose` (bool, optional): If `True`, the function will print additional processing details. Default is `False`.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_lipinski_descriptors.csv`.
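
Lipinski descriptors are commonly used for a rule-of-five screen. The sketch below illustrates that use and is not part of the package. It assumes the output rows carry molecular weight, LogP, and hydrogen-bond donor and acceptor counts under the hypothetical column names `MW`, `LogP`, `NumHDonors`, and `NumHAcceptors`, which this README does not confirm. The two rows are made-up values chosen only to exercise the check.

```python
def lipinski_violations(row: dict) -> int:
    """Count rule-of-five violations for one descriptor row."""
    checks = [
        float(row["MW"]) > 500,            # molecular weight above 500 Da
        float(row["LogP"]) > 5,            # LogP above 5
        float(row["NumHDonors"]) > 5,      # more than 5 hydrogen-bond donors
        float(row["NumHAcceptors"]) > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Values arrive as strings when rows are read with csv.DictReader.
small = {"MW": "180.2", "LogP": "1.3", "NumHDonors": "1", "NumHAcceptors": "3"}
large = {"MW": "650.0", "LogP": "6.2", "NumHDonors": "4", "NumHAcceptors": "12"}

print(lipinski_violations(small))  # 0
print(lipinski_violations(large))  # 3
```

A compound is often treated as drug-like when it has at most one violation, so the threshold applied to the count is a choice left to the caller.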

---

### **cal_morgan_fpts**(input_file, smiles_column)
**Description:**  
Calculates Morgan fingerprints for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the calculated fingerprints to an output file named `<input_file_name>_calculate_morgan_fpts.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_calculate_morgan_fpts.csv`.
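
Morgan fingerprints are often compared with the Tanimoto coefficient in similarity searches. The following is a minimal sketch in plain Python, assuming each fingerprint is available as an equal-length list of 0/1 bits. How the output CSV lays out its bit columns is not specified here, the example bit vectors are made up, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either vector."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(bits_a, bits_b) if a and b)
    either = sum(1 for a, b in zip(bits_a, bits_b) if a or b)
    return both / either if either else 0.0


fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]

# 3 bits are set in both vectors and 5 bits are set in at least one.
print(tanimoto(fp_a, fp_b))  # 0.6
```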

---

### **cal_mordred_descriptors**(input_file, smiles_column)
**Description:**  
Computes Mordred descriptors for molecules specified in a CSV file (`input_file`) using SMILES strings from a specified column (`smiles_column`). The function saves the computed descriptors to an output file named `<input_file_name>_mordred_descriptors.csv`.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_mordred_descriptors.csv`.

---

### **calculate_selected_fingerprints**(input_file, smiles_column)
**Description:**  
Before using this function, download and unzip the fingerprint XML files it needs:

```bash
wget https://github.com/dataprofessor/padel/raw/main/fingerprints_xml.zip
unzip fingerprints_xml.zip
```

In a Jupyter or Colab notebook, prefix each command with `!`.

This function calculates 12 different types of molecular fingerprints:

- `AtomPairs2DCount`
- `AtomPairs2D`
- `EState`
- `CDKextended`
- `CDK`
- `CDKgraphonly`
- `KlekotaRothCount`
- `KlekotaRoth`
- `MACCS`
- `PubChem`
- `SubstructureCount`
- `Substructure`

For each fingerprint type, the input data with the fingerprint columns added is saved as a separate CSV file whose name ends with the fingerprint type name.

**Parameters:**
- `input_file` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.

**Output:**  
Each fingerprint type will be saved as a separate CSV file with the respective fingerprint type name appended.  
For example: `<input_file_name>_AtomPairs2DCount.csv`.
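
Since one call produces 12 files, it can help to list the expected file names up front, for instance to check afterwards that every file was written. This is a sketch under the same naming assumption as the example above (`<input_file_name>` is the input file name without `.csv`, and outputs sit beside the input); `expected_outputs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# The 12 fingerprint types listed above.
FINGERPRINT_TYPES = [
    "AtomPairs2DCount", "AtomPairs2D", "EState", "CDKextended",
    "CDK", "CDKgraphonly", "KlekotaRothCount", "KlekotaRoth",
    "MACCS", "PubChem", "SubstructureCount", "Substructure",
]


def expected_outputs(input_file: str) -> list:
    """Return one expected output path per fingerprint type."""
    path = Path(input_file)
    return [path.with_name(f"{path.stem}_{fp}.csv") for fp in FINGERPRINT_TYPES]


outputs = expected_outputs("molecules.csv")
for output in outputs:
    print(output.name)

# After a run, any path in `outputs` for which output.exists() is False
# points at a fingerprint type that was not written.
```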

---

### **fps**(filename, smiles_column, fp_type)

**Description:**  
This function calculates a specified molecular fingerprint (`fp_type`) for each molecule in a CSV file. The user must provide:
- The CSV file (`filename`)
- The SMILES column name in the file (`smiles_column`)
- One of the following fingerprint types:

  - `maccs`
  - `avalon`
  - `pattern`
  - `layered`
  - `map4`
  - `secfp`
  - `erg`
  - `estate`
  - `avalon-count`
  - `ecfp`
  - `fcfp`
  - `topological`
  - `atompair`
  - `rdkit`
  - `ecfp-count`
  - `fcfp-count`
  - `topological-count`
  - `atompair-count`
  - `rdkit-count`

**Parameters:**
- `filename` (str): Path to the input CSV file containing molecular data in SMILES format.
- `smiles_column` (str): The name of the column in the input CSV file that contains the SMILES strings.
- `fp_type` (str): The type of molecular fingerprint to calculate (choose from the list of fingerprint types above).

**Output:**  
The output will be saved as a CSV file named `<input_file_name>_<fp_type>.csv` depending on the fingerprint type chosen.
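
Because `fp_type` is a free-form string, a typo only surfaces once the call runs. A small guard can catch it earlier. This is a sketch, not the package's own validation: `check_fp_type` is a hypothetical helper, and lowercasing the input is a choice made here, since this README does not say whether `fps` is case-sensitive.

```python
# The 19 fingerprint type names listed above.
SUPPORTED_FP_TYPES = frozenset({
    "maccs", "avalon", "pattern", "layered", "map4", "secfp", "erg",
    "estate", "avalon-count", "ecfp", "fcfp", "topological", "atompair",
    "rdkit", "ecfp-count", "fcfp-count", "topological-count",
    "atompair-count", "rdkit-count",
})


def check_fp_type(fp_type: str) -> str:
    """Return the listed spelling of fp_type, or raise ValueError if it is unknown."""
    normalized = fp_type.strip().lower()
    if normalized not in SUPPORTED_FP_TYPES:
        options = ", ".join(sorted(SUPPORTED_FP_TYPES))
        raise ValueError(f"unknown fp_type {fp_type!r}; choose one of: {options}")
    return normalized


print(check_fp_type(" ECFP "))  # ecfp

# The validated name can then be passed on (import path not stated here):
# fps("molecules.csv", "smiles", check_fp_type("ecfp-count"))
```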


Ahmed Alhilal
=============

0.0.1 (05/01/2025)
-------------------
- First Release

            
