Name | enzymm |
Version | 0.1.7 |
home_page | None |
Summary | Detect catalytic enzyme residues in protein structures by matching a library of known templates. |
upload_time | 2025-08-26 22:34:08 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.7 |
license | MIT License
Copyright (c) 2024-2025 Raymund Hackett <r.e.hackett@lumc.nl>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords | enzyme, catalytic, active site, protein |
requirements | No requirements were recorded. |
# EnzyMM - The Enzyme Motif Miner
## Overview
Enzyme Motif Miner (`EnzyMM`) uses geometric template matching to identify known arrangements of catalytic residues, called templates, in protein structures. It searches user-provided protein structures against a database of templates. `EnzyMM` ships with a library of catalytic templates derived from the [Mechanism and Catalytic Site Atlas](https://www.ebi.ac.uk/thornton-srv/m-csa/) (M-CSA), but you can also generate your own. These templates represent consensus arrangements of catalytic sites found in active sites of experimental protein structures.
As catalytic sites are both highly conserved and critical for the function of a protein, identifying them offers many biological insights. This method has two key advantages. First, because it does not rely on sequence or (global) fold similarity, similar catalytic arrangements can be found across great evolutionary distances, offering insights into the divergence or even convergence of enzymes. Second, because geometric matching is very fast, `EnzyMM` scales alongside databases of predicted protein structures. Expect to scan a protein structure in a matter of seconds on a consumer laptop.
As a database-driven method, `EnzyMM` is inherently limited by the coverage of residue arrangements in its template library. The provided template library covers nearly the entire M-CSA and thus around 3/4 of enzyme mechanisms classified by the Enzyme Commission to the 3rd level. Catalytic arrangements not found in the PDBe won't be included in the M-CSA. While `EnzyMM` is primarily intended for catalytic sites, you are invited to search with your own library of templates.
For the actual geometric matching `EnzyMM` relies on [PyJess](https://github.com/althonos/pyjess) - a [Cython](https://cython.org/) wrapper of [Jess](https://github.com/iriziotis/jess).
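To give a feel for what "geometric matching" means, here is a deliberately simplified sketch. It is not the Jess algorithm and not `EnzyMM` code: it only checks whether two equally sized sets of atoms, assumed to be already paired by index, have the same pairwise distances within a tolerance. The function names, coordinates and the tolerance value are all made up for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return all inter-atom distances in a fixed index-pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distances_agree(template, candidate, tolerance):
    """True if every pairwise distance differs by at most `tolerance`."""
    if len(template) != len(candidate):
        return False
    return all(
        abs(dt - dc) <= tolerance
        for dt, dc in zip(pairwise_distances(template), pairwise_distances(candidate))
    )


# Hypothetical three-atom "template" (a 3-4-5 triangle, units arbitrary).
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# The same triangle translated by (10, 10, 10): distances are unchanged.
shifted = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0)]
# One side doubled in length: distances no longer agree.
stretched = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

print(distances_agree(template, shifted, tolerance=0.5))    # True
print(distances_agree(template, stretched, tolerance=0.5))  # False
```

Pairwise distances do not change when a structure is rotated or translated, which is why a comparison like this needs no prior global alignment of the proteins.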
## 🔧 Installing EnzyMM
`EnzyMM` is implemented in [Python](https://www.python.org/),
and supports [all versions](https://endoflife.date/python) from Python 3.8 on Linux and macOS. It requires
additional libraries that can be installed directly from
[PyPI](https://pypi.org), the Python Package Index.
Use [`pip`](https://pip.pypa.io/en/stable/) to install `EnzyMM` on your
machine:
```bash
$ pip install enzymm
```
This installs `EnzyMM` and also downloads a library of catalytic templates together with important metadata. Around 16 MB of data needs to be downloaded.
It should also run on Windows (though this is not tested on release).
### 🖼️ Images
Lightweight images built from [`python:3.13-alpine`](https://hub.docker.com/_/python/tags?page=1&name=3.13-alpine) are available:
Pull the latest [Docker](https://www.docker.com/) image from GHCR:
```bash
docker pull ghcr.io/rayhackett/enzymm:latest
```
Pull the latest [Apptainer](https://apptainer.org/) image via ORAS from GHCR:
```bash
apptainer pull oras://ghcr.io/rayhackett/enzymm:latest
```
## 🔎 Running EnzyMM
Once `EnzyMM` is installed, you can run it from the terminal. Provide either the path to a single protein structure with `-i` or, to run multiple queries at once, the path to a text file with `-l` that contains a list of paths to protein structures.
Optionally, supply an output directory for PDB structures of the identified matches per query protein with the `--pdbs` flag.
```bash
$ enzymm -i some_structure.pdb -o results.tsv --pdbs dir_to_save_matches
```
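For batch runs, the list file passed to `-l` has to be written first. The sketch below shows one way to do that from Python. It assumes the list file holds one path per line (the text above only says it contains a list of paths), and the helper names `write_query_list` and `enzymm_command` are made up for this example. Only the `-l` and `-o` flags shown above are used.

```python
from pathlib import Path


def write_query_list(structure_dir, list_path, pattern="*.pdb"):
    """Write one structure path per line; return the number of paths written.

    Assumption: the list file format is one path per line.
    """
    paths = sorted(Path(structure_dir).glob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


def enzymm_command(list_path, output_tsv):
    """Build the argument list for a batch run with the -l and -o flags."""
    return ["enzymm", "-l", str(list_path), "-o", str(output_tsv)]
```

A typical use would be `write_query_list("structures", "queries.txt")` followed by `subprocess.run(enzymm_command("queries.txt", "results.tsv"), check=True)`, where `structures` is a hypothetical directory of `.pdb` files.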
Additional parameters of interest are:
- `--n-jobs` or `-n`, which controls the number of threads used to parallelize the search.
By default, it will use one thread less than available on your system using
[`os.cpu_count`](https://docs.python.org/3/library/os.html#os.cpu_count).
- `--unfiltered` or `-u`, which disables filtering of matches by RMSD and residue orientation.
By default, filtering is enabled.
- `--skip-smaller-hits`, which skips searches with smaller templates on a query
if a match to a larger template has already been found.
- `--jess` or `-j`, which controls the RMSD threshold and pairwise distance threshold applied. By default, sensible thresholds are selected. Refer to the Docs for details.
- `--template-dir` or `-t`, through which the user may supply their own template library. By default, a library of catalytic templates derived from the M-CSA is loaded.
- `--conservation-cutoff` or `-c`, which can be set to exclude atoms with B-factors or pLDDT scores below this threshold from matching. This is not set by default.
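To make the idea behind `--conservation-cutoff` concrete, the sketch below drops atoms whose B-factor column falls below a cutoff. It is an illustration only, not `EnzyMM`'s implementation: the helper names are made up, and whether the real comparison is strict or inclusive is an assumption here. It relies on the fixed-width PDB format, where the temperature factor sits in columns 61-66 (predicted models such as AlphaFold's store pLDDT in that same column).

```python
def atom_line(serial, name, res, chain, resseq, x, y, z, bfactor, occupancy=1.0):
    """Format a minimal fixed-width PDB ATOM record (demo helper)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{bfactor:6.2f}"
    )


def filter_atoms_by_bfactor(pdb_lines, cutoff):
    """Keep ATOM/HETATM records with B-factor >= cutoff; keep other records as-is."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            try:
                value = float(line[60:66])  # columns 61-66: temperature factor
            except ValueError:
                continue  # drop records whose B-factor field cannot be parsed
            if value < cutoff:
                continue
        kept.append(line)
    return kept


# Hypothetical records: one confident atom, one low-confidence atom.
records = [
    "HEADER    DEMO",
    atom_line(1, "CA", "SER", "A", 195, 1.0, 2.0, 3.0, bfactor=92.5),
    atom_line(2, "CA", "HIS", "A", 57, 4.0, 5.0, 6.0, bfactor=35.0),
]
kept = filter_atoms_by_bfactor(records, cutoff=70.0)
print(len(kept))  # 2: the header and the atom with B-factor 92.5
```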
Further, `EnzyMM` is designed with modularity in mind and comes with a fully usable internal API.
Please refer to the Docs for further reference.
## 🖹 Results
`EnzyMM` will create a single output file:
- `{output}.tsv`: A `.tsv` file containing a summary of all results. One row is printed per match.
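The summary file can be read with any TSV reader. The sketch below uses only the Python standard library. It assumes the file starts with a header row, and since the column names are not listed here, the column to group by is passed in by the caller. The function name `summarize_results` is made up for this example.

```python
import csv
from collections import Counter


def summarize_results(tsv_path, key_column=None):
    """Return (column names, number of matches, matches per value of key_column).

    Assumption: the first row of the file is a header row.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        columns = reader.fieldnames or []
    counts = Counter(row[key_column] for row in rows) if key_column else Counter()
    return columns, len(rows), counts
```

Calling `summarize_results("results.tsv")` first is a quick way to see which columns are available before choosing one to group by.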
For visual exploration of matches, you can optionally save an alignment of the template and the matched query residues to a pdb file which can be viewed with any pdb viewer.
To do so, supply an output directory after the `--pdbs` flag for the `.pdb` files.
This will also create:
- `{pdbs_dir}/{query_identifier}_matches.pdb`: One `.pdb` file per query with a structural alignment between template and query residues. This can be further configured.
Add additional information to each `.pdb` file with the following flags:
- `--transform`, which causes the query to be aligned to the template instead of vice versa.
- `--include-query`, which also writes the entire query PDB structure to the `.pdb` file.
Currently, `--transform` and `--include-query` should not be used together.
Hopefully I'll get around to fixing this soon.
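If you drive `EnzyMM` from a script, a small guard can catch this combination before a run starts. The helper below is a sketch, not part of `EnzyMM`: the name `pdb_output_args` is made up, and it only assembles the `--pdbs`, `--transform` and `--include-query` flags described above.

```python
def pdb_output_args(pdbs_dir, transform=False, include_query=False):
    """Build the PDB-output flags, refusing the unsupported combination."""
    if transform and include_query:
        raise ValueError("--transform and --include-query should not be used together")
    args = ["--pdbs", str(pdbs_dir)]
    if transform:
        args.append("--transform")
    if include_query:
        args.append("--include-query")
    return args
```

The result can be appended to a command list, for example `["enzymm", "-i", "some_structure.pdb", "-o", "results.tsv", *pdb_output_args("dir_to_save_matches", transform=True)]`.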
## 💭 Feedback
### ⚠️ Issue Tracker
Please report any bugs or feature requests through the [GitHub issue tracker](https://github.com/RayHackett/enzymm/issues).
Please also feel free to ask any questions and I will do my best to answer them.
If reporting a bug, please include as much information as you can about the issue and try to recreate the same bug.
Ideally include a little test example so I can quickly troubleshoot.
### 🏗️ Contributing
Contributions are more than welcome!
Raise an issue or shoot me an email at `r.e.hackett` AT `lumc.nl`.
I'm happy to help.
## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/rayhackett/enzymm/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.
## ⚖️ License
This software is provided under the open source [MIT](https://choosealicense.com/licenses/mit/) licence.
Though conceived at the [EMBL-EBI](https://www.ebi.ac.uk/) in Hinxton, UK in the [Thornton Group](https://www.ebi.ac.uk/research/thornton/), `EnzyMM` is now developed by Raymund Hackett and the [Zeller Group](https://zellerlab.org/) at the [Leiden University Medical Center](https://www.lumc.nl/en/) in Leiden in the Netherlands with continuing support from the Thornton Group.
## 🔖 Citations
`EnzyMM` is academic software but relies on many previous approaches.
`EnzyMM` itself cannot yet be cited, but a preprint is in preparation.
We intend to publish during the summer of 2025.
We kindly ask you to cite both:
- PyJess, for instance as:
> PyJess, a Python library binding to Jess (Barker *et al.*, 2003).
- Mechanism and Catalytic Site Atlas as:
> Ribeiro AJM et al. (2017), Nucleic Acids Res, 46, D618-D623. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. DOI:10.1093/nar/gkx1012. PMID:29106569.
Raw data
{
"_id": null,
"home_page": null,
"name": "enzymm",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "enzyme, catalytic, active site, protein",
"author": null,
"author_email": "Raymund Hackett <r.e.hackett@lumc.nl>",
"download_url": "https://files.pythonhosted.org/packages/c3/74/e1d8c7226cf9cb67cc2d07ad2c80c3f70ec83f940b87cb0ce9b26e54604f/enzymm-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2024-2025 Raymund Hackett <r.e.hackett@lumc.nl>\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.",
"summary": "Detect catalytic enzyme residues in protein structures by matching a library of known templates.",
"version": "0.1.7",
"project_urls": {
"Bug Tracker": "https://github.com/RayHackett/enzymm/issues",
"CI": "https://github.com/RayHackett/enzymm/actions",
"Changelog": "https://github.com/RayHackett/enzymm/blob/main/CHANGELOG.md"
},
"split_keywords": [
"enzyme",
" catalytic",
" active site",
" protein"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "09f72703ca22de321b75932ac7816acbcae9c19cc8152870fcac5e79999a5702",
"md5": "ac19575994fa6c70264b0f0dee21f14b",
"sha256": "e2730d19ac2d82b35d0d9e11c2df0c0b4bc772758ea8adcdc5974c5ed9eb9366"
},
"downloads": -1,
"filename": "enzymm-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ac19575994fa6c70264b0f0dee21f14b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 17962503,
"upload_time": "2025-08-26T22:34:06",
"upload_time_iso_8601": "2025-08-26T22:34:06.077936Z",
"url": "https://files.pythonhosted.org/packages/09/f7/2703ca22de321b75932ac7816acbcae9c19cc8152870fcac5e79999a5702/enzymm-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c374e1d8c7226cf9cb67cc2d07ad2c80c3f70ec83f940b87cb0ce9b26e54604f",
"md5": "1c554c8ef395149ba60de988f757a005",
"sha256": "3ec9d5fc593adb6b2edb0277cc37512030050106539f765defc32214ffd5b69c"
},
"downloads": -1,
"filename": "enzymm-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "1c554c8ef395149ba60de988f757a005",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 12800874,
"upload_time": "2025-08-26T22:34:08",
"upload_time_iso_8601": "2025-08-26T22:34:08.705231Z",
"url": "https://files.pythonhosted.org/packages/c3/74/e1d8c7226cf9cb67cc2d07ad2c80c3f70ec83f940b87cb0ce9b26e54604f/enzymm-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-26 22:34:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "RayHackett",
"github_project": "enzymm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "enzymm"
}