# Chem Func
Useful functions and scripts for working with small molecules.
## Installation
Optionally, create a conda environment.
```bash
conda create -y -n chemfunc python=3.10
conda activate chemfunc
```
Install the latest version of Chem Func using pip.
```bash
pip install chemfunc
```
Alternatively, clone the repository and install the local version of the package.
```bash
git clone https://github.com/swansonk14/chemfunc.git
cd chemfunc
pip install -e .
```
If there are version issues with the required packages, install the specific working versions listed in `requirements.txt` from within the cloned repository (optionally inside the conda environment created above) as follows.
```bash
pip install -r requirements.txt
pip install -e .
```
**Note:** If you get the error `ImportError: libXrender.so.1: cannot open shared object file: No such file or directory`, run `conda install -c conda-forge xorg-libxrender`.
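To confirm that the installation succeeded, one option is to check from Python whether the package can be found. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Chem Func.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a top-level package, or None if it cannot be found.

    Hypothetical helper for illustration; not part of Chem Func.
    """
    # find_spec returns None when a top-level package cannot be located.
    if find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a bare source checkout).
        return "unknown"


if __name__ == "__main__":
    version = installed_version("chemfunc")
    if version is None:
        print("chemfunc is not installed in this environment")
    else:
        print(f"chemfunc version: {version}")
```

Run this inside the same environment (for example the `chemfunc` conda environment) in which the package was installed.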
## Features
Chem Func contains a variety of useful functions and scripts for working with small molecules.
Functions can be imported from the `chemfunc` package. For example:
```python
from pathlib import Path
from chemfunc.sdf_to_smiles import sdf_to_smiles
sdf_to_smiles(
    data_path=Path('molecules.sdf'),
    save_path=Path('molecules.csv')
)
```
Most modules can also be run as scripts from the command line using the `chemfunc` command along with the appropriate function name. For example:
```bash
chemfunc sdf_to_smiles \
    --data_path molecules.sdf \
    --save_path molecules.csv
```
To see a list of available scripts, run `chemfunc -h`.
For each script, run `chemfunc <script_name> -h` to see a description of the arguments for that script.
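When scripts need to be launched from Python (for example inside a larger pipeline), the command line form above can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_chemfunc_command` is a hypothetical helper, not part of Chem Func, and it only uses the `chemfunc <script_name> --argument value` and `-h` forms shown above.

```python
import shutil
import subprocess
from typing import List


def build_chemfunc_command(script_name: str, **arguments: object) -> List[str]:
    """Build the argument list for `chemfunc <script_name> --name value ...`.

    Hypothetical helper for illustration; not part of Chem Func.
    """
    command = ["chemfunc", script_name]
    for name, value in arguments.items():
        command += [f"--{name}", str(value)]
    return command


command = build_chemfunc_command(
    "sdf_to_smiles",
    data_path="molecules.sdf",
    save_path="molecules.csv",
)
print(" ".join(command))

if __name__ == "__main__":
    # Only call the CLI if the `chemfunc` executable is actually on PATH.
    if shutil.which("chemfunc") is not None:
        # Show the argument description for the script rather than running it.
        subprocess.run(["chemfunc", "sdf_to_smiles", "-h"], check=False)
    else:
        print("chemfunc executable not found on PATH")
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file paths that contain spaces.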
## Contents
Below is a list of the contents of the package.
[`canonicalize_smiles.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/canonicalize_smiles.py) (function, script)
Canonicalizes SMILES using RDKit canonicalization and optionally strips salts.
[`chemical_diversity.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/chemical_diversity.py) (function, script)
Computes the chemical diversity of a set of molecules in terms of Tanimoto distances.
[`cluster_molecules.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/cluster_molecules.py) (function, script)
Performs k-means clustering to cluster molecules based on Morgan fingerprints.
[`compute_property_distribution.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/compute_property_distribution.py) (function, script)
Computes one or more molecular properties for a set of molecules.
[`deduplicate_smiles.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/deduplicate_smiles.py) (function, script)
Deduplicates a CSV file by SMILES.
[`filter_molecules.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/filter_molecules.py) (function, script)
Filters molecules to those with values in a certain range.
[`measure_experimental_reproducibility.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/measure_experimental_reproducibility.py) (function, script)
Measures the experimental reproducibility of two biological replicates by using one replicate to predict the other.
[`molecular_fingerprints.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/molecular_fingerprints.py) (functions, script)
Contains functions to compute fingerprints for molecules. Parallelized for speed. The function `save_fingerprints` can be used as a script to compute fingerprints from a CSV file and save them as an NPZ file.
[`molecular_properties.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/molecular_properties.py) (functions)
Contains functions to compute molecular properties. Parallelized for speed.
[`molecular_similarities.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/molecular_similarities.py) (functions)
Contains functions to compute similarities between molecules. Parallelized for speed.
[`nearest_neighbor.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/nearest_neighbor.py) (function, script)
Given a dataset of molecules, computes the nearest neighbor molecule in a second dataset using one of several similarity metrics.
[`plot_property_distribution.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/plot_property_distribution.py) (function, script)
Plots the distribution of molecular properties of a set of molecules.
[`plot_tsne.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/plot_tsne.py) (function, script)
Runs t-SNE on molecular fingerprints from one or more chemical libraries.
[`regression_to_classification.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/regression_to_classification.py) (function, script)
Converts regression data to classification data using given thresholds.
[`sample_molecules.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/sample_molecules.py) (function, script)
Samples molecules from a CSV file, either uniformly at random across the entire dataset or uniformly at random from each cluster within the data.
[`sdf_to_smiles.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/sdf_to_smiles.py) (function, script)
Converts an SDF file to a CSV file with SMILES.
[`select_from_clusters.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/select_from_clusters.py) (function, script)
Selects the best molecule from each cluster.
[`smiles_to_svg.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/smiles_to_svg.py) (function, script)
Converts a SMILES string to an SVG image of the molecule.
[`visualize_molecules.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/visualize_molecules.py) (function, script)
Converts a file of SMILES to images of molecular structures.
[`visualize_reactions.py`](https://github.com/swansonk14/chemfunc/blob/main/chemfunc/visualize_reactions.py) (function, script)
Converts a file of reaction SMARTS to images of chemical reactions.
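Several of the modules above (for example `chemical_diversity.py`, `molecular_similarities.py`, and `nearest_neighbor.py`) work with similarities or distances between molecular fingerprints. As a rough illustration of the underlying idea, and not Chem Func's implementation, the sketch below computes Tanimoto similarity between fingerprints represented as sets of on-bit indices, then uses it to compute an average pairwise distance and a nearest neighbor. The bit sets are made-up toy data and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def tanimoto_similarity(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention chosen here).
    return len(a & b) / union if union else 1.0


def mean_pairwise_distance(fingerprints: List[Set[int]]) -> float:
    """Average Tanimoto distance (1 - similarity) over all unordered pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto_similarity(a, b) for a, b in pairs) / len(pairs)


def nearest_neighbor(query: Set[int], reference: List[Set[int]]) -> int:
    """Index of the reference fingerprint most similar to the query."""
    return max(
        range(len(reference)),
        key=lambda i: tanimoto_similarity(query, reference[i]),
    )


# Toy fingerprints: each set holds the indices of the bits that are on.
library = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]

print(tanimoto_similarity(library[0], library[1]))
print(mean_pairwise_distance(library))
print(nearest_neighbor({1, 2, 3}, library))
```

In practice the fingerprints would come from a cheminformatics toolkit (the package description mentions Morgan fingerprints and RDKit) rather than hand-written sets; the set-based form is used here only to keep the example dependency-free.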
Raw data
{
"_id": null,
"home_page": "https://github.com/swansonk14/chemfunc",
"name": "chemfunc",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "",
"keywords": "computational chemistry",
"author": "Kyle Swanson",
"author_email": "swansonk.14@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d4/bb/357145ab3d20bc18f5eaa961cb5323f2431c88b96d91dcd0df0dba6caf39/chemfunc-1.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Chem Func",
"version": "1.0.5",
"project_urls": {
"Download": "https://github.com/swansonk14/chemfunc/v_1.0.5.tar.gz",
"Homepage": "https://github.com/swansonk14/chemfunc",
"PyPi": "https://pypi.org/project/chemfunc/",
"Source": "https://github.com/swansonk14/chemfunc"
},
"split_keywords": [
"computational",
"chemistry"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d4bb357145ab3d20bc18f5eaa961cb5323f2431c88b96d91dcd0df0dba6caf39",
"md5": "3bd8a54a4ba935096b03225ab065e793",
"sha256": "ae45bda9308568c3107906ee1652f3b6d4aaf1ad9e62697b6f43d5c110725183"
},
"downloads": -1,
"filename": "chemfunc-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "3bd8a54a4ba935096b03225ab065e793",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 20899,
"upload_time": "2023-12-29T16:37:36",
"upload_time_iso_8601": "2023-12-29T16:37:36.858893Z",
"url": "https://files.pythonhosted.org/packages/d4/bb/357145ab3d20bc18f5eaa961cb5323f2431c88b96d91dcd0df0dba6caf39/chemfunc-1.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-29 16:37:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "swansonk14",
"github_project": "chemfunc",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "chemfunc"
}