# 🚀 `mini3di` [![Stars](https://img.shields.io/github/stars/althonos/mini3di.svg?style=social&maxAge=3600&label=Star)](https://github.com/althonos/mini3di/stargazers)
*A [NumPy](https://numpy.org/) port of the [`foldseek`](https://github.com/steineggerlab/foldseek) code for encoding structures to 3di.*
[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/mini3di/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/mini3di/actions)
[![Coverage](https://img.shields.io/codecov/c/gh/althonos/mini3di?style=flat-square&maxAge=3600)](https://codecov.io/gh/althonos/mini3di/)
[![License](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg?style=flat-square&maxAge=2678400)](https://choosealicense.com/licenses/bsd-3-clause/)
[![PyPI](https://img.shields.io/pypi/v/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/mini3di?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/mini3di)
[![Wheel](https://img.shields.io/pypi/wheel/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di/#files)
[![Python Versions](https://img.shields.io/pypi/pyversions/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di/#files)
[![Python Implementations](https://img.shields.io/badge/impl-universal-success.svg?style=flat-square&maxAge=3600&label=impl)](https://pypi.org/project/mini3di/#files)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/mini3di/)
[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400)](https://git.embl.de/larralde/mini3di/)
[![GitHub issues](https://img.shields.io/github/issues/althonos/mini3di.svg?style=flat-square&maxAge=600)](https://github.com/althonos/mini3di/issues)
[![Docs](https://img.shields.io/readthedocs/mini3di/latest?style=flat-square&maxAge=600)](https://mini3di.readthedocs.io)
[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
[![Downloads](https://img.shields.io/pypi/dm/mini3di?style=flat-square&color=303f9f&maxAge=86400&label=downloads)](https://pepy.tech/project/mini3di)
## 🗺️ Overview
[`foldseek`](https://github.com/steineggerlab/foldseek) is a method developed
by van Kempen *et al.*[\[1\]](#ref1) for the fast and accurate search of
protein structures. In order to search protein structures at a large scale,
it first encodes the 3D structure into sequences over a structural alphabet,
3di, which captures tertiary amino acid interactions.
`mini3di` is a pure-Python package to encode 3D structures of proteins into
the 3di alphabet, using the trained weights from the `foldseek` VQ-VAE model.
This library only depends on NumPy and is available for all modern Python
versions (3.7+).
<!-- ### 📋 Features -->
## 🔧 Installing
Install the `mini3di` package directly from [PyPI](https://pypi.org/project/mini3di),
which hosts universal wheels that can be installed with `pip`:
```console
$ pip install mini3di
```
<!-- Otherwise, `mini3di` is also available as a [Bioconda](https://bioconda.github.io/)
package:
```console
$ conda install -c bioconda mini3di
``` -->
<!-- ## 📖 Documentation
A complete [API reference](https://mini3di.readthedocs.io/en/stable/api.html)
can be found in the [online documentation](https://mini3di.readthedocs.io/),
or directly from the command line using
[`pydoc`](https://docs.python.org/3/library/pydoc.html):
```console
$ pydoc mini3di
``` -->
## 💡 Example
`mini3di` provides a single `Encoder` class, which expects the 3D coordinates
of the **Cα**, **Cβ**, **N** and **C** atoms from each peptide residue. For
residues without **Cβ** (Gly), simply write the coordinates as `math.nan`.
Call the `encode_atoms` method to get a sequence of 3di states:
```python
from math import nan
import mini3di
encoder = mini3di.Encoder()
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
```
The states returned as output will be a NumPy array of state indices. To turn
it into a sequence, use the `build_sequence` method of the encoder:
```python
sequence = encoder.build_sequence(states)
print(sequence)
```
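
When the coordinates come from a custom parser rather than literal lists, the four arguments have to be assembled residue by residue. The following is a minimal sketch of that step, assuming a hypothetical input layout (one dictionary per residue, mapping PDB-style atom names to `(x, y, z)` tuples); the `split_backbone` helper is illustrative and not part of `mini3di`:

```python
from math import nan

# Placeholder coordinates for an atom that is absent from a residue.
MISSING = (nan, nan, nan)


def split_backbone(residues):
    """Group per-residue atom coordinates into one list per atom type.

    `residues` is a hypothetical layout: a list of dicts mapping the atom
    names "CA", "N", "C" and optionally "CB" to (x, y, z) tuples.
    """
    coords = {"ca": [], "cb": [], "n": [], "c": []}
    for residue in residues:
        # Backbone atoms are required here: a missing one raises KeyError.
        coords["ca"].append(list(residue["CA"]))
        coords["n"].append(list(residue["N"]))
        coords["c"].append(list(residue["C"]))
        # CB is optional (absent in glycine) and is filled with NaN.
        coords["cb"].append(list(residue.get("CB", MISSING)))
    return coords


residues = [
    # First residue has no CB, as for a glycine.
    {"N": (32.1, 51.2, 29.8), "CA": (32.9, 51.9, 28.8), "C": (34.4, 51.7, 29.1)},
    {"N": (35.3, 51.5, 28.1), "CA": (35.0, 51.9, 26.6), "C": (36.1, 51.1, 25.8),
     "CB": (35.3, 53.3, 26.4)},
]
coords = split_backbone(residues)
```

Since the dictionary keys match the keyword arguments shown above, the result can then be passed on with `encoder.encode_atoms(**coords)`.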
The encoder can work directly with Biopython objects, if Biopython is available.
The `encode_chain` helper method extracts the atom coordinates from
a [`Bio.PDB.Chain`](https://biopython.org/docs/latest/api/Bio.PDB.Chain.html)
and encodes them directly. For instance, to encode all the chains from a
[PDB file](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)):
```python
import pathlib
import mini3di
from Bio.PDB import PDBParser
encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))
for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)
```
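
To keep the encoded sequences for later use, they can be written out as FASTA records. This is a small sketch using only the standard library; the `format_fasta` helper, the `8crb_A` record name and the sequence string are placeholders for illustration, not actual `mini3di` output:

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record, wrapping the sequence at `width` columns."""
    # Slice the sequence into fixed-width chunks, one per output line.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">{}\n{}\n".format(name, "\n".join(lines))


# Placeholder sequence of 70 letters, standing in for an encoded chain.
record = format_fasta("8crb_A", "DVLVVLCVVP" * 7, width=60)
print(record, end="")
```

In the loop above, the record name could for instance be built from the structure identifier and `chain.get_id()`.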
## 💭 Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/mini3di/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.
### 🏗️ Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/mini3di/blob/main/CONTRIBUTING.md)
for more details.
## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.
## ⚖️ License
This library is provided under the [BSD 3-clause license](https://choosealicense.com/licenses/bsd-3-clause/).
It includes some code ported from `foldseek`, which is licensed under the
[GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/),
and relicensed with the permission of the authors.
*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `foldseek` authors](https://github.com/steineggerlab).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*
## 📚 References
- <a id="ref1">\[1\]</a> Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. [doi:10.1038/s41587-023-01773-0](https://doi.org/10.1038/s41587-023-01773-0).
Raw data
{
"_id": null,
"home_page": "https://github.com/althonos/mini3di",
"name": "mini3di",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "bioinformatics, protein, structure, foldseek, 3di",
"author": "Martin Larralde",
"author_email": "martin.larralde@embl.de",
"download_url": "https://files.pythonhosted.org/packages/e4/e4/a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72/mini3di-0.2.1.tar.gz",
"platform": "posix",
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "A NumPy port of the foldseek code for encoding structures to 3di.",
"version": "0.2.1",
"project_urls": {
"Bug Tracker": "https://github.com/althonos/mini3di/issues",
"Builds": "https://github.com/althonos/mini3di/actions",
"Changelog": "https://github.com/althonos/mini3di/blob/master/CHANGELOG.md",
"Coverage": "https://codecov.io/gh/althonos/mini3di/",
"Homepage": "https://github.com/althonos/mini3di",
"PyPI": "https://pypi.org/project/mini3di"
},
"split_keywords": [
"bioinformatics",
" protein",
" structure",
" foldseek",
" 3di"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "eea6a62382d676ab32faf6007935743676c4f97240d55e1829188fe7762a217c",
"md5": "2204abb5c302c2b739289093c53d11d2",
"sha256": "304479e7e6c81065b616fb2b745a3e17c41e69d4f382ee83b57e7c3f4a401e85"
},
"downloads": -1,
"filename": "mini3di-0.2.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "2204abb5c302c2b739289093c53d11d2",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.7",
"size": 14189,
"upload_time": "2024-09-15T21:48:21",
"upload_time_iso_8601": "2024-09-15T21:48:21.478475Z",
"url": "https://files.pythonhosted.org/packages/ee/a6/a62382d676ab32faf6007935743676c4f97240d55e1829188fe7762a217c/mini3di-0.2.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e4e4a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72",
"md5": "456d8d9f692ddb0ae3e321acb800a6a2",
"sha256": "dfc4a63aaba175b05e9cc1f260e1bfcd2832ff3235537146a334b6a2179f8a96"
},
"downloads": -1,
"filename": "mini3di-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "456d8d9f692ddb0ae3e321acb800a6a2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 18577,
"upload_time": "2024-09-15T21:48:23",
"upload_time_iso_8601": "2024-09-15T21:48:23.085474Z",
"url": "https://files.pythonhosted.org/packages/e4/e4/a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72/mini3di-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-15 21:48:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "althonos",
"github_project": "mini3di",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "mini3di"
}