mini3di


- Name: mini3di
- Version: 0.2.1
- Home page: https://github.com/althonos/mini3di
- Summary: A NumPy port of the foldseek code for encoding structures to 3di.
- Upload time: 2024-09-15 21:48:23
- Author: Martin Larralde
- Requires Python: >=3.7
- License: BSD-3-Clause
- Keywords: bioinformatics, protein, structure, foldseek, 3di

# 🚀 `mini3di` [![Stars](https://img.shields.io/github/stars/althonos/mini3di.svg?style=social&maxAge=3600&label=Star)](https://github.com/althonos/mini3di/stargazers)

*A [NumPy](https://numpy.org/) port of the [`foldseek`](https://github.com/steineggerlab/foldseek) code for encoding structures to 3di.*

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/mini3di/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/mini3di/actions)
[![Coverage](https://img.shields.io/codecov/c/gh/althonos/mini3di?style=flat-square&maxAge=3600)](https://codecov.io/gh/althonos/mini3di/)
[![License](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg?style=flat-square&maxAge=2678400)](https://choosealicense.com/licenses/bsd-3-clause/)
[![PyPI](https://img.shields.io/pypi/v/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/mini3di?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/mini3di)
[![Wheel](https://img.shields.io/pypi/wheel/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di/#files)
[![Python Versions](https://img.shields.io/pypi/pyversions/mini3di.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/mini3di/#files)
[![Python Implementations](https://img.shields.io/badge/impl-universal-success.svg?style=flat-square&maxAge=3600&label=impl)](https://pypi.org/project/mini3di/#files)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/mini3di/)
[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400)](https://git.embl.de/larralde/mini3di/)
[![GitHub issues](https://img.shields.io/github/issues/althonos/mini3di.svg?style=flat-square&maxAge=600)](https://github.com/althonos/mini3di/issues)
[![Docs](https://img.shields.io/readthedocs/mini3di/latest?style=flat-square&maxAge=600)](https://mini3di.readthedocs.io)
[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
[![Downloads](https://img.shields.io/pypi/dm/mini3di?style=flat-square&color=303f9f&maxAge=86400&label=downloads)](https://pepy.tech/project/mini3di)

## 🗺️ Overview

[`foldseek`](https://github.com/steineggerlab/foldseek) is a method developed
by van Kempen *et al.*[\[1\]](#ref1) for the fast and accurate search of
protein structures. To search protein structures at a large scale,
it first encodes each 3D structure as a sequence over a structural alphabet,
3di, which captures tertiary amino acid interactions.

`mini3di` is a pure-Python package to encode 3D structures of proteins into
the 3di alphabet, using the trained weights from the `foldseek` VQ-VAE model.

This library only depends on NumPy and is available for all modern Python
versions (3.7+).

<!-- ### 📋 Features -->


## 🔧 Installing

Install the `mini3di` package directly from [PyPI](https://pypi.org/project/mini3di),
which hosts universal wheels that can be installed with `pip`:
```console
$ pip install mini3di
```

<!-- Otherwise, `mini3di` is also available as a [Bioconda](https://bioconda.github.io/)
package:
```console
$ conda install -c bioconda mini3di
``` -->

<!-- ## 📖 Documentation

A complete [API reference](https://mini3di.readthedocs.io/en/stable/api.html)
can be found in the [online documentation](https://mini3di.readthedocs.io/),
or directly from the command line using
[`pydoc`](https://docs.python.org/3/library/pydoc.html):
```console
$ pydoc mini3di
``` -->

## 💡 Example

`mini3di` provides a single `Encoder` class, which expects the 3D coordinates
of the **Cα**, **Cβ**, **N** and **C** atoms from each peptide residue. For
residues without **Cβ** (Gly), simply write the coordinates as `math.nan`.
Call the `encode_atoms` method to get a sequence of 3di states:
```python
from math import nan
import mini3di

encoder = mini3di.Encoder()
# One [x, y, z] triplet per residue; "..." stands for the remaining residues.
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
```
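
When coordinates come from a custom parser rather than hand-written lists, the four inputs can be assembled residue by residue. The sketch below assumes a hypothetical input layout (one dict per residue, keyed by atom name); neither this layout nor the `split_backbone` helper is part of `mini3di`. It fills in NaN coordinates wherever **Cβ** is missing:

```python
from math import isnan, nan

# Hypothetical helper, not part of mini3di: split per-residue atom records
# into the four coordinate lists expected by `Encoder.encode_atoms`.
def split_backbone(residues):
    missing = [nan, nan, nan]
    ca, cb, n, c = [], [], [], []
    for atoms in residues:
        ca.append(list(atoms["CA"]))
        # Residues without a C-beta (Gly) get NaN coordinates as a placeholder.
        cb.append(list(atoms.get("CB", missing)))
        n.append(list(atoms["N"]))
        c.append(list(atoms["C"]))
    return ca, cb, n, c

# The two residues from the example above; the first one has no CB.
residues = [
    {"CA": (32.9, 51.9, 28.8), "N": (32.1, 51.2, 29.8), "C": (34.4, 51.7, 29.1)},
    {"CA": (35.0, 51.9, 26.6), "CB": (35.3, 53.3, 26.4),
     "N": (35.3, 51.5, 28.1), "C": (36.1, 51.1, 25.8)},
]
ca, cb, n, c = split_backbone(residues)
```

The resulting lists can then be passed as the `ca`, `cb`, `n` and `c` keyword arguments of `encode_atoms`.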

The states returned as output will be a NumPy array of state indices. To turn
it into a sequence, use the `build_sequence` method of the encoder:
```python
sequence = encoder.build_sequence(states)
print(sequence)
```
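
Conceptually, building the sequence amounts to looking up one letter per state index. The sketch below illustrates the idea in plain Python. The `ALPHABET` string (the 20 amino-acid letters) and the `X` placeholder for undefined states are assumptions made for illustration, and `states_to_sequence` is a hypothetical helper; `Encoder.build_sequence` remains the authoritative mapping:

```python
# Assumed alphabet, for illustration only: one letter per 3di state.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def states_to_sequence(states, unknown="X"):
    """Map state indices to letters; missing or out-of-range states become `unknown`."""
    letters = []
    for state in states:
        if state is not None and 0 <= state < len(ALPHABET):
            letters.append(ALPHABET[state])
        else:
            letters.append(unknown)
    return "".join(letters)

# Arbitrary state indices, with one undefined state marked as None.
example = states_to_sequence([2, 2, 15, 17, None, 0])
```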

The encoder can work directly with Biopython objects if Biopython is available.
The helper method `encode_chain` extracts the atom coordinates from
a [`Bio.PDB.Chain`](https://biopython.org/docs/latest/api/Bio.PDB.Chain.html)
and encodes them directly. For instance, to encode all the chains from a
[PDB file](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)):
```python
import pathlib

import mini3di
from Bio.PDB import PDBParser

encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))

for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)
```
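
To keep the encoded chains for later use, the sequences can be written in FASTA format. The `format_fasta` helper below is a hypothetical sketch, not part of `mini3di`, and the chain identifiers and sequences are made-up placeholders standing in for the values produced by the loop above:

```python
def format_fasta(name, sequence, width=60):
    """Format one record as FASTA, wrapping the sequence at `width` columns."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">{}\n{}\n".format(name, "\n".join(chunks))

# Placeholder records: (chain identifier, 3di sequence).
records = [("A", "DDSVVLVLL"), ("B", "DPPVVSVVC")]
fasta = "".join(format_fasta(name, seq, width=5) for name, seq in records)
```

In the loop above, `format_fasta(chain.get_id(), sequence)` would replace the `print` call, with the returned text written to a file.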

## 💭 Feedback

### ⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/mini3di/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### 🏗️ Contributing

Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/mini3di/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog

This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License

This library is provided under the [BSD 3-clause license](https://choosealicense.com/licenses/bsd-3-clause/).
It includes some code ported from `foldseek`, which is licensed under the
[GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/),
and relicensed with the permission of the authors.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `foldseek` authors](https://github.com/steineggerlab).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*


## 📚 References

- <a id="ref1">\[1\]</a> Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. [doi:10.1038/s41587-023-01773-0](https://doi.org/10.1038/s41587-023-01773-0).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/althonos/mini3di",
    "name": "mini3di",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "bioinformatics, protein, structure, foldseek, 3di",
    "author": "Martin Larralde",
    "author_email": "martin.larralde@embl.de",
    "download_url": "https://files.pythonhosted.org/packages/e4/e4/a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72/mini3di-0.2.1.tar.gz",
    "platform": "posix",
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "A NumPy port of the foldseek code for encoding structures to 3di.",
    "version": "0.2.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/althonos/mini3di/issues",
        "Builds": "https://github.com/althonos/mini3di/actions",
        "Changelog": "https://github.com/althonos/mini3di/blob/master/CHANGELOG.md",
        "Coverage": "https://codecov.io/gh/althonos/mini3di/",
        "Homepage": "https://github.com/althonos/mini3di",
        "PyPI": "https://pypi.org/project/mini3di"
    },
    "split_keywords": [
        "bioinformatics",
        " protein",
        " structure",
        " foldseek",
        " 3di"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eea6a62382d676ab32faf6007935743676c4f97240d55e1829188fe7762a217c",
                "md5": "2204abb5c302c2b739289093c53d11d2",
                "sha256": "304479e7e6c81065b616fb2b745a3e17c41e69d4f382ee83b57e7c3f4a401e85"
            },
            "downloads": -1,
            "filename": "mini3di-0.2.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2204abb5c302c2b739289093c53d11d2",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7",
            "size": 14189,
            "upload_time": "2024-09-15T21:48:21",
            "upload_time_iso_8601": "2024-09-15T21:48:21.478475Z",
            "url": "https://files.pythonhosted.org/packages/ee/a6/a62382d676ab32faf6007935743676c4f97240d55e1829188fe7762a217c/mini3di-0.2.1-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e4e4a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72",
                "md5": "456d8d9f692ddb0ae3e321acb800a6a2",
                "sha256": "dfc4a63aaba175b05e9cc1f260e1bfcd2832ff3235537146a334b6a2179f8a96"
            },
            "downloads": -1,
            "filename": "mini3di-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "456d8d9f692ddb0ae3e321acb800a6a2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 18577,
            "upload_time": "2024-09-15T21:48:23",
            "upload_time_iso_8601": "2024-09-15T21:48:23.085474Z",
            "url": "https://files.pythonhosted.org/packages/e4/e4/a478c514aef8759e5e55fb2bed4fd242f19a82ba7d780cc24dd7952b1d72/mini3di-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-15 21:48:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "althonos",
    "github_project": "mini3di",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mini3di"
}
        