# Filterzyme
Structural filtering pipeline that uses docking and active-site heuristics to prioritize ML-predicted enzyme variants for experimental validation.
This tool processes superimposed ligand poses and filters them using geometric criteria such as distances and angles and, optionally, esterase-specific filters or nucleophile proximity.
---
## Features
- Analysis of enzyme-ligand docking using multiple docking tools (ML- and physics-based).
- Optional catalytic nucleophile-focused analysis for esterases or other enzymes with nucleophilic catalytic residues.
- User-friendly pipeline that needs only a pickled pandas DataFrame (`.pkl`) with enzyme sequences and ligand SMILES strings as input.
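
To make the geometric criteria mentioned above more concrete, here is a minimal sketch of a distance-and-angle check between a candidate nucleophile atom and a ligand carbonyl group. It is not Filterzyme's implementation: the function names, the atom roles, and the thresholds (`max_dist`, `min_angle`, `max_angle`) are assumptions chosen purely for illustration.

```python
import math

Point = tuple[float, float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 3D coordinates (in Angstrom)."""
    return math.dist(a, b)


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle a-b-c in degrees, with b at the vertex."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] to guard against floating-point round-off
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def passes_filter(
    nucleophile: Point,
    carbonyl_c: Point,
    carbonyl_o: Point,
    max_dist: float = 4.0,     # hypothetical distance cutoff
    min_angle: float = 80.0,   # hypothetical lower angle bound
    max_angle: float = 120.0,  # hypothetical upper angle bound
) -> bool:
    """Keep a pose if the nucleophile is close enough and the attack angle is in range."""
    d = distance(nucleophile, carbonyl_c)
    theta = angle_deg(nucleophile, carbonyl_c, carbonyl_o)
    return d <= max_dist and min_angle <= theta <= max_angle


# Toy coordinates: nucleophile 3 Angstrom from the carbonyl carbon, at a right angle to C=O
close_pose = passes_filter((0.0, 0.0, 3.0), (0.0, 0.0, 0.0), (1.2, 0.0, 0.0))
far_pose = passes_filter((0.0, 0.0, 6.0), (0.0, 0.0, 0.0), (1.2, 0.0, 0.0))
print(close_pose, far_pose)
```

In this toy case the first pose passes both criteria and the second fails the distance cutoff. The cutoffs the pipeline actually applies may differ.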
---
## Installation
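## Environment Setup
### Clone the repository
```bash
git clone https://github.com/HelenSchmid/EnzymeStructuralFiltering.git
cd EnzymeStructuralFiltering
```
### Using conda
Create the environment from the repository root (this assumes `environment.yml` ships with the repository), then install the package into it:
```bash
conda env create -f environment.yml
conda activate filterpipeline
pip install .
```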
### Install via pip
The package is published on PyPI as `filterzyme`:
```bash
pip install filterzyme
```
## Usage Example
The input pandas **DataFrame** must include the following columns:
- `Entry` – unique identifier for each enzyme
- `Sequence` – amino acid sequence of the enzyme
- `substrate_name` – name of the substrate
- `substrate_smiles` – SMILES string of the substrate, e.g. MEHP: `CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)O`
- `substrate_moiety` – SMARTS pattern defining the chemical moiety of interest within the substrate, e.g. general ester SMARTS: `[C](=O)(O)(O)`

If cofactors are included, add:
- `cofactor_name` – name of the cofactor
- `cofactor_smiles` – SMILES string of the cofactor, e.g. PLP: `CC1=NC=C(C(=C1O)C=O)COP(=O)(O)O`
- `cofactor_moiety` – SMARTS pattern defining the chemical moiety of interest within the cofactor
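
The column requirements above can be sketched as a small pandas example that builds a one-row input table and checks it before use. The column names and the MEHP strings come from the lists above. The `missing_columns` helper, the `enzyme_001` identifier, and the short sequence are hypothetical placeholders and are not part of Filterzyme.

```python
import pandas as pd

# Column names as listed in this README
REQUIRED_COLUMNS = [
    "Entry",
    "Sequence",
    "substrate_name",
    "substrate_smiles",
    "substrate_moiety",
]
COFACTOR_COLUMNS = ["cofactor_name", "cofactor_smiles", "cofactor_moiety"]


def missing_columns(df: pd.DataFrame, with_cofactor: bool = False) -> list[str]:
    """Return the expected columns that are absent from df (hypothetical helper)."""
    expected = REQUIRED_COLUMNS + (COFACTOR_COLUMNS if with_cofactor else [])
    return [col for col in expected if col not in df.columns]


df = pd.DataFrame(
    [
        {
            "Entry": "enzyme_001",    # placeholder identifier
            "Sequence": "MKTAYIAKQR",  # placeholder sequence, not a real enzyme
            "substrate_name": "MEHP",
            "substrate_smiles": "CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)O",
            "substrate_moiety": "[C](=O)(O)(O)",
        }
    ]
)

print(missing_columns(df))                      # substrate-only input
print(missing_columns(df, with_cofactor=True))  # cofactor columns not yet added
```

Saving the table with `df.to_pickle("example_df.pkl")` would produce a file that `pd.read_pickle` can load again.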
```python
import pandas as pd

from filterzyme.pipeline import Pipeline

# Load the input DataFrame with the columns described above
df = pd.read_pickle("example_df.pkl")

pipeline = Pipeline(
    df=df,
    ligand_name="TPP",
    max_matches=1000,  # maximum number of matches during substructure search
    esterase=0,  # set to 1 if interested specifically in esterases
    find_closest_nuc=1,
    num_threads=1,
    skip_catalytic_residue_prediction=False,
    alternative_structure_for_vina="Chai",
    # Path to the Squidly models; replace with the location on your machine
    squidly_dir="/nvme2/helen/EnzymeStructuralFiltering/filterzyme/squidly_final_models/",
    base_output_dir="pipeline_output",
)

pipeline.run()
```
Raw data
{
"_id": null,
"home_page": "https://github.com/HelenSchmid/Filterzyme",
"name": "filterzyme",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "util",
"author": "Helen Schmid",
"author_email": "schmid.helen2@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a8/02/d7181316561f51538a8ae4b586afccb7eb69f65a7a5d62b795c24bd78a4e/filterzyme-0.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": null,
"version": "0.0.6",
"project_urls": {
"Bug Tracker": "https://github.com/HelenSchmid/Filterzyme/issues",
"Documentation": "https://github.com/HelenSchmid/Filterzyme",
"Homepage": "https://github.com/HelenSchmid/Filterzyme",
"Source Code": "https://github.com/HelenSchmid/Filterzyme"
},
"split_keywords": [
"util"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "afe37170f71f9b464a13dcfa92dde501bf9bf36d7e3eda583960ec3ad1763b3b",
"md5": "669090a1e9403d70361817fe81bed344",
"sha256": "63ba26019c7b177e2298dc8985a1adfcba0f8f0dd3c7c9d78b8757f30671582c"
},
"downloads": -1,
"filename": "filterzyme-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "669090a1e9403d70361817fe81bed344",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 79736,
"upload_time": "2025-09-08T07:13:21",
"upload_time_iso_8601": "2025-09-08T07:13:21.674959Z",
"url": "https://files.pythonhosted.org/packages/af/e3/7170f71f9b464a13dcfa92dde501bf9bf36d7e3eda583960ec3ad1763b3b/filterzyme-0.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a802d7181316561f51538a8ae4b586afccb7eb69f65a7a5d62b795c24bd78a4e",
"md5": "63ac8174c8c59dd2daf74ccbcb5b5d68",
"sha256": "1723426831189c79f04867b36208a43216ef3041dcad85aa5e63e0aef6e4377e"
},
"downloads": -1,
"filename": "filterzyme-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "63ac8174c8c59dd2daf74ccbcb5b5d68",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 52702,
"upload_time": "2025-09-08T07:13:23",
"upload_time_iso_8601": "2025-09-08T07:13:23.315742Z",
"url": "https://files.pythonhosted.org/packages/a8/02/d7181316561f51538a8ae4b586afccb7eb69f65a7a5d62b795c24bd78a4e/filterzyme-0.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-08 07:13:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "HelenSchmid",
"github_project": "Filterzyme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "filterzyme"
}