_____________________.___.____ .____
\__ ___/\______ \ | | | |
| | | _/ | | | |
| | | | \ | |___| |___
|____| |____|_ /___|_______ \_______ \
\/ \/ \/
[![pypi version](https://img.shields.io/pypi/v/trill-proteins?color=blueviolet&style=flat-square)](https://pypi.org/project/trill-proteins)
![Downloads](https://pepy.tech/badge/trill-proteins)
[![license](https://img.shields.io/pypi/l/trill-proteins?color=blueviolet&style=flat-square)](LICENSE)
[![Documentation Status](https://readthedocs.org/projects/trill/badge/?version=latest&style=flat-square)](https://trill.readthedocs.io/en/latest/?badge=latest)
<!---![status](https://github.com/martinez-zacharya/TRILL/workflows/CI/badge.svg?style=flat-square&color=blueviolet)--->
# Intro
TRILL (**TR**aining and **I**nference using the **L**anguage of **L**ife) is a sandbox for creative protein engineering and discovery. As a bioengineer myself, I find deep-learning-based approaches to protein design and analysis of great interest. However, many of these models are unwieldy because of their sheer size, especially for researchers who are not ML practitioners. TRILL not only lets researchers run inference on their proteins of interest with a variety of models, but also democratizes the efficient fine-tuning of large language models. Whether using Google Colab with one GPU or a supercomputer with many, TRILL empowers scientists to leverage models with millions to billions of parameters without worrying (too much) about hardware constraints. As of v1.8.0, TRILL supports the following models:
## Breakdown of TRILL's Commands
| **Command** | **Function** | **Available Models** |
|:-----------:|:------------:|:--------------------:|
| **Embed** | Generates numerical representations or "embeddings" of protein sequences for quantitative analysis and comparison. | [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtT5-XL](https://doi.org/10.1109/TPAMI.2021.3095381), [ProstT5](https://doi.org/10.1101/2023.07.23.550085), [Ankh](https://doi.org/10.48550/arXiv.2301.06568)|
| **Visualize** | Creates interactive 2D visualizations of embeddings for exploratory data analysis. | PCA, t-SNE, UMAP |
| **Finetune** | Finetunes protein language models for specific tasks. | [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtGPT2](https://doi.org/10.1038/s41467-022-32007-7), [ZymCTRL](https://www.mlsb.io/papers_2022/ZymCTRL_a_conditional_language_model_for_the_controllable_generation_of_artificial_enzymes.pdf) |
| **Language Model Protein Generation** | Generates proteins using pretrained language models. | [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtGPT2](https://doi.org/10.1038/s41467-022-32007-7), [ZymCTRL](https://www.mlsb.io/papers_2022/ZymCTRL_a_conditional_language_model_for_the_controllable_generation_of_artificial_enzymes.pdf) |
| **Inverse Folding Protein Generation** | Designs proteins to fold into specific 3D structures. | [ESM-IF1](https://doi.org/10.1101/2022.04.10.487779), [LigandMPNN](https://doi.org/10.1101/2023.12.22.573103), [ProstT5](https://doi.org/10.1101/2023.07.23.550085) |
| **Diffusion Based Protein Generation** | Uses denoising diffusion models to generate proteins. | [RFDiffusion](https://doi.org/10.1101/2022.12.09.519842) |
| **Fold** | Predicts 3D protein structures. | [ESMFold](https://doi.org/10.1101/2022.07.20.500902), [ProstT5](https://doi.org/10.1101/2023.07.23.550085) |
| **Dock** | Simulates protein-ligand interactions. | [DiffDock](https://doi.org/10.48550/arXiv.2210.01776), [Smina](https://doi.org/10.1021/ci300604z), [AutoDock Vina](https://doi.org/10.1021/acs.jcim.1c00203), [LightDock](https://doi.org/10.1093/bioinformatics/btx555), [GeoDock](https://doi.org/10.1101/2023.06.29.547134) |
| **Classify** | Predicts protein properties with pretrained models or trains custom classifiers. | [TemStaPro](https://doi.org/10.1101/2023.03.27.534365), [EpHod](https://doi.org/10.1101/2023.06.22.544776), [ECPICK](https://github.com/datax-lab/ECPICK?tab=readme-ov-file), [LightGBM](https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html), [XGBoost](https://doi.org/10.48550/arXiv.1603.02754), [Isolation Forest](https://doi.org/10.1109/ICDM.2008.17) |
| **Regress** | Trains custom regression models. | [LightGBM](https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html), [Linear](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)|
| **Simulate** | Uses molecular dynamics to simulate protein-ligand interactions. | [OpenMM](https://doi.org/10.1371/journal.pcbi.1005659) |
| **Score** | Uses ESM1v or ESM2 to score protein sequences, or ProteinMPNN to score protein structures, in a zero-shot manner. | [COMPSS](https://www.nature.com/articles/s41587-024-02214-2#change-history) |
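
The **Embed** row mentions using embeddings for "quantitative analysis and comparison". As a rough illustration of what such a comparison can look like, the sketch below ranks proteins by cosine similarity to a query. It does not call TRILL and is not TRILL's implementation: the protein names and the 4-dimensional vectors are made up for the example, and real embeddings from the listed models have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical data: made-up vectors standing in for per-protein embeddings.
embeddings = {
    "protein_a": [0.1, 0.8, -0.3, 0.5],
    "protein_b": [0.2, 0.7, -0.2, 0.4],
    "protein_c": [-0.6, 0.1, 0.9, -0.2],
}

# Rank every other protein by similarity to the query, most similar first.
query = "protein_a"
ranked = sorted(
    (name for name in embeddings if name != query),
    key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    reverse=True,
)
print(ranked)  # ['protein_b', 'protein_c']
```

Cosine similarity ignores vector length and compares direction only, which is one common choice for embedding comparison; other distance measures may suit a given analysis better.
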
## Documentation
Check out the documentation and examples at https://trill.readthedocs.io/en/latest/index.html
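
Because the command table above is stated for v1.8.0, it can help to confirm which release is installed before following the documentation. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the package was installed under its PyPI distribution name, `trill-proteins`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "trill-proteins") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the distribution is installed, otherwise None.
print(installed_version())
```
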
## Raw data
{
"_id": null,
"home_page": "https://github.com/martinez-zacharya/TRILL",
"name": "trill-proteins",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "NLP, Natural Language Processing, Protein Design, ESM2, ESMFold, ProteinMPNN, ProtGPT2, ZymCTRL, RFDiffusion",
"author": "Zachary Martinez",
"author_email": "martinez.zacharya@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/3f/49/1dbd272a391cb3ceda064ca622b4bcdae916b1ed6c5a70a13c5e31293007/trill_proteins-1.8.2.tar.gz",
"platform": null,
"description": " _____________________.___.____ .____ \n \\__ ___/\\______ \\ | | | | \n | | | _/ | | | | \n | | | | \\ | |___| |___ \n |____| |____|_ /___|_______ \\_______ \\\n \\/ \\/ \\/\n\n[![pypi version](https://img.shields.io/pypi/v/trill-proteins?color=blueviolet&style=flat-square)](https://pypi.org/project/trill-proteins)\n![Downloads](https://pepy.tech/badge/trill-proteins)\n[![license](https://img.shields.io/pypi/l/trill-proteins?color=blueviolet&style=flat-square)](LICENSE)\n[![Documentation Status](https://readthedocs.org/projects/trill/badge/?version=latest&style=flat-square)](https://trill.readthedocs.io/en/latest/?badge=latest)\n<!---![status](https://github.com/martinez-zacharya/TRILL/workflows/CI/badge.svg?style=flat-square&color=blueviolet)--->\n# Intro\nTRILL (**TR**aining and **I**nference using the **L**anguage of **L**ife) is a sandbox for creative protein engineering and discovery. As a bioengineer myself, deep-learning based approaches for protein design and analysis are of great interest to me. However, many of these deep-learning models are rather unwieldy, especially for non ML-practitioners due to their sheer size. Not only does TRILL allow researchers to perform inference on their proteins of interest using a variety of models, but it also democratizes the efficient fine-tuning of large-language models. Whether using Google Colab with one GPU or a supercomputer with many, TRILL empowers scientists to leverage models with millions to billions of parameters without worrying (too much) about hardware constraints. Currently, TRILL supports using these models as of v1.8.0:\n\n## Breakdown of TRILL's Commands\n\n| **Command** | **Function** | **Available Models** |\n|:-----------:|:------------:|:--------------------:|\n| **Embed** | Generates numerical representations or \"embeddings\" of protein sequences for quantitative analysis and comparison. 
| [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtT5-XL](https://doi.org/10.1109/TPAMI.2021.3095381), [ProstT5](https://doi.org/10.1101/2023.07.23.550085), [Ankh](https://doi.org/10.48550/arXiv.2301.06568)|\n| **Visualize** | Creates interactive 2D visualizations of embeddings for exploratory data analysis. | PCA, t-SNE, UMAP |\n| **Finetune** | Finetunes protein language models for specific tasks. | [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtGPT2](https://doi.org/10.1038/s41467-022-32007-7), [ZymCTRL](https://www.mlsb.io/papers_2022/ZymCTRL_a_conditional_language_model_for_the_controllable_generation_of_artificial_enzymes.pdf) |\n| **Language Model Protein Generation** | Generates proteins using pretrained language models. | [ESM2](https://doi.org/10.1101/2022.07.20.500902), [ProtGPT2](https://doi.org/10.1038/s41467-022-32007-7), [ZymCTRL](https://www.mlsb.io/papers_2022/ZymCTRL_a_conditional_language_model_for_the_controllable_generation_of_artificial_enzymes.pdf) |\n| **Inverse Folding Protein Generation** | Designs proteins to fold into specific 3D structures. | [ESM-IF1](https://doi.org/10.1101/2022.04.10.487779), [LigandMPNN](https://doi.org/10.1101/2023.12.22.573103), [ProstT5](https://doi.org/10.1101/2023.07.23.550085) |\n| **Diffusion Based Protein Generation** | Uses denoising diffusion models to generate proteins. | [RFDiffusion](https://doi.org/10.1101/2022.12.09.519842) |\n| **Fold** | Predicts 3D protein structures. | [ESMFold](https://doi.org/10.1101/2022.07.20.500902), [ProstT5](https://doi.org/10.1101/2023.07.23.550085) |\n| **Dock** | Simulates protein-ligand interactions. 
| [DiffDock](https://doi.org/10.48550/arXiv.2210.01776), [Smina](https://doi.org/10.1021/ci300604z), [Autodock Vina](https://doi.org/10.1021/acs.jcim.1c00203), [Lightdock](https://doi.org/10.1093/bioinformatics/btx555), [GeoDock](https://doi.org/10.1101/2023.06.29.547134) |\n| **Classify** | Predicts protein properties with pretrained models or train custom classifiers | [TemStaPro](https://doi.org/10.1101/2023.03.27.534365), [EpHod](https://doi.org/10.1101/2023.06.22.544776), [ECPICK](https://github.com/datax-lab/ECPICK?tab=readme-ov-file), [LightGBM](https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html), [XGBoost](https://doi.org/10.48550/arXiv.1603.02754), [Isolation Forest](https://doi.org/10.1109/ICDM.2008.17) |\n| **Regress** | Train custom regression models. | [LightGBM](https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html), [Linear](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)|\n| **Simulate** | Uses molecular dynamics to simulate protein-ligand interactions. | [OpenMM](https://doi.org/10.1371/journal.pcbi.1005659) |\n| **Score** | Utilize ESM1v or ESM2 to score protein sequences or ProteinMPNN to score protein structures in a zero-shot manner. | [COMPSS](https://www.nature.com/articles/s41587-024-02214-2#change-history) |\n\n\n## Documentation\nCheck out the documentation and examples at https://trill.readthedocs.io/en/latest/index.html\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Sandbox for Computational Protein Design",
"version": "1.8.2",
"project_urls": {
"Documentation": "https://trill.readthedocs.io/en/latest/home.html",
"Homepage": "https://github.com/martinez-zacharya/TRILL",
"Repository": "https://github.com/martinez-zacharya/TRILL"
},
"split_keywords": [
"nlp",
" natural language processing",
" protein design",
" esm2",
" esmfold",
" proteinmpnn",
" protgpt2",
" zymctrl",
" rfdiffusion"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "966947d8e32ef08dc389afb15dae8a3d0cf7f8c1a8bdbd66dab8441f2e236899",
"md5": "0e9be68b95012ce1dd885bc40e1752c5",
"sha256": "74224681ed6e6c5ad8a0fab58bfa5294a7e4605a0a6f91766fd81e91751f2125"
},
"downloads": -1,
"filename": "trill_proteins-1.8.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0e9be68b95012ce1dd885bc40e1752c5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 11025044,
"upload_time": "2024-06-26T22:47:01",
"upload_time_iso_8601": "2024-06-26T22:47:01.120607Z",
"url": "https://files.pythonhosted.org/packages/96/69/47d8e32ef08dc389afb15dae8a3d0cf7f8c1a8bdbd66dab8441f2e236899/trill_proteins-1.8.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3f491dbd272a391cb3ceda064ca622b4bcdae916b1ed6c5a70a13c5e31293007",
"md5": "9d9b04a98c21567d53e54895ba812ae4",
"sha256": "94d3be8ee3354d8c389c96ae5af413a0fd55058369b6f5ac75a004f290088f00"
},
"downloads": -1,
"filename": "trill_proteins-1.8.2.tar.gz",
"has_sig": false,
"md5_digest": "9d9b04a98c21567d53e54895ba812ae4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 11006655,
"upload_time": "2024-06-26T22:47:04",
"upload_time_iso_8601": "2024-06-26T22:47:04.993991Z",
"url": "https://files.pythonhosted.org/packages/3f/49/1dbd272a391cb3ceda064ca622b4bcdae916b1ed6c5a70a13c5e31293007/trill_proteins-1.8.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-26 22:47:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "martinez-zacharya",
"github_project": "TRILL",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "trill-proteins"
}
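
Each entry under `urls` in the record above carries a `packagetype` and a `sha256` digest, which can be used to check a downloaded file. The following is a sketch of that check under stated assumptions, not an official tool: the helper names `select_file` and `sha256_matches` are hypothetical, and the demo uses a small stand-in record and a temporary file rather than the real TRILL archives.

```python
import hashlib
import tempfile
from pathlib import Path


def select_file(metadata: dict, packagetype: str) -> dict:
    """Return the first entry in metadata["urls"] with the given packagetype."""
    for entry in metadata["urls"]:
        if entry["packagetype"] == packagetype:
            return entry
    raise LookupError(f"no file with packagetype {packagetype!r}")


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare against the expected hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Stand-in record shaped like the "urls" list above (values are made up).
payload = b"not a real wheel"
demo_metadata = {
    "urls": [
        {
            "packagetype": "bdist_wheel",
            "filename": "demo-0.0.0-py3-none-any.whl",
            "digests": {"sha256": hashlib.sha256(payload).hexdigest()},
        },
        {
            "packagetype": "sdist",
            "filename": "demo-0.0.0.tar.gz",
            "digests": {"sha256": hashlib.sha256(b"other bytes").hexdigest()},
        },
    ]
}

wheel = select_file(demo_metadata, "bdist_wheel")
with tempfile.TemporaryDirectory() as tmp:
    # Write the stand-in payload to disk, then verify it against the record.
    wheel_path = Path(tmp) / wheel["filename"]
    wheel_path.write_bytes(payload)
    verified = sha256_matches(wheel_path, wheel["digests"]["sha256"])
print(verified)  # True
```

With the real record, the same two helpers would be applied to the parsed JSON and to the downloaded `.whl` or `.tar.gz` file.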