# AmyloDeep
**Prediction of amyloid propensity from amino acid sequences using deep learning**

AmyloDeep is a Python package that predicts amyloidogenic regions in protein sequences by scoring a rolling (sliding) window along the sequence with a 5-model ensemble. The ensemble combines ESM2 transformer models, UniRep embeddings, a support vector machine (SVM), and XGBoost to estimate amyloid propensity.
## Features
- **Multi-model ensemble**: Combines 5 different models for robust predictions
- **Rolling window analysis**: Analyzes sequences using sliding windows of configurable size
- **Pre-trained models**: Uses models trained on amyloid sequence databases
- **Calibrated probabilities**: Includes probability calibration for better confidence estimates
- **Easy-to-use API**: Simple Python interface and command-line tool
- **Streamlit web interface**: Optional web interface for interactive predictions
## Installation
### From PyPI (recommended)
```bash
pip install amylodeep
```
### From source
```bash
git clone https://github.com/AlisaDavtyan/protein_classification.git
cd protein_classification
pip install -e .
```
For development:
```bash
pip install "amylodeep[dev]"
```
## Quick Start
### Python API
```python
from amylodeep import predict_ensemble_rolling
# Predict amyloid propensity for a protein sequence
sequence = "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"
result = predict_ensemble_rolling(sequence, window_size=6)
print(f"Average probability: {result['avg_probability']:.4f}")
print(f"Maximum probability: {result['max_probability']:.4f}")
# Access position-wise probabilities
for position, probability in result['position_probs']:
    print(f"Position {position}: {probability:.4f}")
```
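The `position_probs` list returned above lends itself to simple post-processing. The helper below is hypothetical (it is not part of amylodeep): it groups consecutive window positions whose probability meets a threshold into candidate regions. It assumes positions are integers that advance by 1 per window, and the 0.5 cutoff is an arbitrary illustrative default, not a value recommended by the authors.

```python
from typing import List, Optional, Tuple


def merge_high_windows(
    position_probs: List[Tuple[int, float]],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Group consecutive window positions with probability >= threshold.

    Hypothetical helper, not part of amylodeep. Returns inclusive
    (first_position, last_position) pairs and assumes positions are
    integers that increase by 1 from one window to the next.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for position, probability in sorted(position_probs):
        if probability >= threshold:
            if start is not None and position == prev + 1:
                prev = position  # extend the current run
            else:
                if start is not None:
                    regions.append((start, prev))  # close the previous run
                start = prev = position  # open a new run
        elif start is not None:
            regions.append((start, prev))  # a low window ends the run
            start = prev = None
    if start is not None:
        regions.append((start, prev))  # close a run that reaches the end
    return regions
```

With the Python API, pass the result list directly, e.g. `merge_high_windows(result['position_probs'], threshold=0.5)`. The returned pairs are window positions as reported in `position_probs`, not residue spans; mapping them to residues also depends on `window_size`.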
### Command Line Interface
```bash
# Basic prediction
amylodeep "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"
# With custom window size
amylodeep "SEQUENCE" --window-size 10
# Save results to file
amylodeep "SEQUENCE" --output results.json --format json
# CSV output
amylodeep "SEQUENCE" --output results.csv --format csv
```
## Model Architecture
AmyloDeep uses an ensemble of 5 models:
1. **ESM2-150M**: Fine-tuned ESM2 transformer (150M parameters)
2. **UniRep**: UniRep-based neural network classifier
3. **ESM2-650M**: Custom classifier using ESM2-650M embeddings
4. **SVM**: Support Vector Machine with ESM2 embeddings
5. **XGBoost**: Gradient boosting with ESM2 embeddings

The models are combined by averaging their predicted probabilities, with some models using probability calibration (Platt scaling or isotonic regression) for better confidence estimates.
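To illustrate what probability averaging with calibration means, here is a minimal sketch in plain Python. It is not AmyloDeep's code: the model names, which models are calibrated, and the calibration parameters are all hypothetical. Platt scaling maps a raw score f to 1 / (1 + exp(A*f + B)); isotonic calibration maps scores through a fitted non-decreasing function, represented here by breakpoints with linear interpolation and clipping at the ends. Uncalibrated models pass through unchanged, and the ensemble output is an unweighted mean.

```python
import math
from bisect import bisect_right
from statistics import mean
from typing import Callable, Dict, Sequence

Calibrator = Callable[[float], float]


def platt(a: float, b: float) -> Calibrator:
    """Platt scaling: p = 1 / (1 + exp(a * f + b)) for a raw score f."""
    return lambda f: 1.0 / (1.0 + math.exp(a * f + b))


def isotonic(xs: Sequence[float], ys: Sequence[float]) -> Calibrator:
    """Piecewise-linear map through fitted breakpoints (xs increasing,
    ys non-decreasing); scores outside [xs[0], xs[-1]] are clipped."""
    def calibrate(f: float) -> float:
        if f <= xs[0]:
            return ys[0]
        if f >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, f)  # xs[i - 1] <= f < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (f - x0) / (x1 - x0)
    return calibrate


def ensemble_probability(
    raw_scores: Dict[str, float],
    calibrators: Dict[str, Calibrator],
) -> float:
    """Calibrate each model's score if a calibrator is given, then average."""
    calibrated = [
        calibrators[name](score) if name in calibrators else score
        for name, score in raw_scores.items()
    ]
    return mean(calibrated)


# Hypothetical example: three models, two of them calibrated.
calibrators = {
    "svm": platt(a=-2.0, b=1.0),
    "xgboost": isotonic([0.0, 1.0], [0.1, 0.9]),
}
scores = {"svm": 0.5, "xgboost": 0.5, "esm2_150m": 0.8}
print(ensemble_probability(scores, calibrators))
```

Calibrating before averaging keeps every model on a comparable 0-1 probability scale. An unweighted mean is the simplest combination rule; the README does not say whether AmyloDeep weights its models.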
## Requirements
- Python >= 3.8
- PyTorch >= 1.9.0
- Transformers >= 4.15.0
- NumPy >= 1.20.0
- scikit-learn >= 1.0.0
- XGBoost >= 1.5.0
- jax-unirep >= 2.0.0
- wandb >= 0.12.0
## API Reference

### Main Functions
#### `predict_ensemble_rolling(sequence, window_size=6)`

Predict amyloid propensity for a protein sequence using rolling window analysis.

**Parameters:**

- `sequence` (str): Protein sequence as one-letter amino acid codes
- `window_size` (int): Size of the rolling window (default: 6)

**Returns:**

A dictionary containing:

- `position_probs`: List of (position, probability) tuples
- `avg_probability`: Average probability across all windows
- `max_probability`: Maximum probability across all windows
- `sequence_length`: Length of the input sequence
- `num_windows`: Number of windows analyzed

The package also provides individual model classes for ESM- and UniRep-based predictions.
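The return fields above suggest how a rolling-window pass fits together. The sketch below is a hypothetical reconstruction, not the package's implementation: `score_window` stands in for the ensemble, start positions are assumed to be 0-based with a step of 1 and no padding (so a sequence of length L yields L - window_size + 1 windows), and returning 0.0 for a sequence shorter than the window is an arbitrary choice.

```python
from typing import Callable, Dict, List, Tuple


def rolling_windows(sequence: str, window_size: int = 6) -> List[Tuple[int, str]]:
    """Return (start_position, window) pairs for every full window, step 1."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        (i, sequence[i:i + window_size])
        for i in range(len(sequence) - window_size + 1)
    ]


def summarize_windows(
    sequence: str,
    score_window: Callable[[str], float],
    window_size: int = 6,
) -> Dict[str, object]:
    """Score each window and build a dict with the documented result keys."""
    position_probs = [
        (pos, score_window(window))
        for pos, window in rolling_windows(sequence, window_size)
    ]
    probs = [p for _, p in position_probs]
    return {
        "position_probs": position_probs,
        "avg_probability": sum(probs) / len(probs) if probs else 0.0,
        "max_probability": max(probs) if probs else 0.0,
        "sequence_length": len(sequence),
        "num_windows": len(position_probs),
    }


# Toy stand-in scorer (NOT an amyloid model): fraction of phenylalanines.
def toy_scorer(window: str) -> float:
    return window.count("F") / len(window)


result = summarize_windows("AFFFA", toy_scorer, window_size=3)
print(result["num_windows"], result["max_probability"])
```

Under these assumptions, `num_windows` always equals `sequence_length - window_size + 1` for sequences at least as long as the window; if AmyloDeep pads sequence ends or uses a different step, its counts will differ.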
## Contributing
We welcome contributions! Please see our contributing guidelines for more information.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Citation
If you use AmyloDeep in your research, please cite:
```bibtex
@software{amylodeep2025,
title={AmyloDeep: Prediction of amyloid propensity from amino acid sequences using deep learning},
author={Alisa Davtyan},
year={2025},
url={https://github.com/AlisaDavtyan/protein_classification}
}
```
## Support
For questions and support:
- Open an issue on the [GitHub issue tracker](https://github.com/AlisaDavtyan/protein_classification/issues)
- Contact: alisadavtyan7@gmail.com
## Changelog
### v0.1.0
- Initial release
- 5-model ensemble implementation
- Rolling window prediction
- Command-line interface
- Python API

## Raw data

{
"_id": null,
"home_page": null,
"name": "amylodeep",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "bioinformatics, amyloid, deep learning, protein, sequence classification",
"author": null,
"author_email": "Alisa Davtyan <alisadavtyan7@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/04/fc/e4f58efb34c14b6fb031bcb21d805311cb9aa7db54000205a79c72b5f1f6/amylodeep-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Prediction of amyloid propensity from amino acid sequences using ensemble deep learning and LLM models",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/AlisaDavtyan/protein_classification/issues",
"Repository": "https://github.com/AlisaDavtyan/protein_classification"
},
"split_keywords": [
"bioinformatics",
" amyloid",
" deep learning",
" protein",
" sequence classification"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "1a4d2eae8742b54f75cac6952c22e86dadd55d2126d33efff57cea311402acdb",
"md5": "d604ab8fc846b872a7b9d1c6945997c4",
"sha256": "980b047e77eda2655f571fad1e22faf98bfca7181dcf7c25f66cf6ba6f7636ac"
},
"downloads": -1,
"filename": "amylodeep-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d604ab8fc846b872a7b9d1c6945997c4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 12428,
"upload_time": "2025-07-27T11:09:02",
"upload_time_iso_8601": "2025-07-27T11:09:02.595374Z",
"url": "https://files.pythonhosted.org/packages/1a/4d/2eae8742b54f75cac6952c22e86dadd55d2126d33efff57cea311402acdb/amylodeep-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "04fce4f58efb34c14b6fb031bcb21d805311cb9aa7db54000205a79c72b5f1f6",
"md5": "93617b436bc10fea8b7adc997ea8b398",
"sha256": "ea3cd0ee685d80ba730e60318dad7bb17ee763a3b88853cfcdf5ce9e3bf9e69d"
},
"downloads": -1,
"filename": "amylodeep-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "93617b436bc10fea8b7adc997ea8b398",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 12229,
"upload_time": "2025-07-27T11:09:03",
"upload_time_iso_8601": "2025-07-27T11:09:03.813369Z",
"url": "https://files.pythonhosted.org/packages/04/fc/e4f58efb34c14b6fb031bcb21d805311cb9aa7db54000205a79c72b5f1f6/amylodeep-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-27 11:09:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AlisaDavtyan",
"github_project": "protein_classification",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "accelerate",
"specs": [
[
"==",
"1.7.0"
]
]
},
{
"name": "alembic",
"specs": [
[
"==",
"1.16.1"
]
]
},
{
"name": "altair",
"specs": [
[
"==",
"5.5.0"
]
]
},
{
"name": "annotated-types",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "appnope",
"specs": [
[
"==",
"0.1.4"
]
]
},
{
"name": "asttokens",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "attrs",
"specs": [
[
"==",
"25.3.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
"==",
"4.13.4"
]
]
},
{
"name": "blinker",
"specs": [
[
"==",
"1.9.0"
]
]
},
{
"name": "bs4",
"specs": [
[
"==",
"0.0.2"
]
]
},
{
"name": "cachetools",
"specs": [
[
"==",
"6.1.0"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2025.7.9"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.4.2"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.2.1"
]
]
},
{
"name": "colorlog",
"specs": [
[
"==",
"6.9.0"
]
]
},
{
"name": "comm",
"specs": [
[
"==",
"0.2.2"
]
]
},
{
"name": "contourpy",
"specs": [
[
"==",
"1.3.2"
]
]
},
{
"name": "cycler",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "debugpy",
"specs": [
[
"==",
"1.8.14"
]
]
},
{
"name": "decorator",
"specs": [
[
"==",
"5.2.1"
]
]
},
{
"name": "executing",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "filelock",
"specs": [
[
"==",
"3.18.0"
]
]
},
{
"name": "Flask",
"specs": [
[
"==",
"3.1.1"
]
]
},
{
"name": "fonttools",
"specs": [
[
"==",
"4.58.3"
]
]
},
{
"name": "fsspec",
"specs": [
[
"==",
"2025.5.1"
]
]
},
{
"name": "gitdb",
"specs": [
[
"==",
"4.0.12"
]
]
},
{
"name": "GitPython",
"specs": [
[
"==",
"3.1.44"
]
]
},
{
"name": "h11",
"specs": [
[
"==",
"0.16.0"
]
]
},
{
"name": "hf-xet",
"specs": [
[
"==",
"1.1.3"
]
]
},
{
"name": "huggingface-hub",
"specs": [
[
"==",
"0.32.4"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.10"
]
]
},
{
"name": "ipykernel",
"specs": [
[
"==",
"6.29.5"
]
]
},
{
"name": "ipython",
"specs": [
[
"==",
"9.3.0"
]
]
},
{
"name": "ipython_pygments_lexers",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "itsdangerous",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "jax",
"specs": [
[
"==",
"0.6.1"
]
]
},
{
"name": "jax-unirep",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "jaxlib",
"specs": [
[
"==",
"0.6.1"
]
]
},
{
"name": "jedi",
"specs": [
[
"==",
"0.19.2"
]
]
},
{
"name": "Jinja2",
"specs": [
[
"==",
"3.1.6"
]
]
},
{
"name": "joblib",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "jsonschema",
"specs": [
[
"==",
"4.24.1"
]
]
},
{
"name": "jsonschema-specifications",
"specs": [
[
"==",
"2025.4.1"
]
]
},
{
"name": "jupyter_client",
"specs": [
[
"==",
"8.6.3"
]
]
},
{
"name": "jupyter_core",
"specs": [
[
"==",
"5.8.1"
]
]
},
{
"name": "kiwisolver",
"specs": [
[
"==",
"1.4.8"
]
]
},
{
"name": "Mako",
"specs": [
[
"==",
"1.3.10"
]
]
},
{
"name": "MarkupSafe",
"specs": [
[
"==",
"3.0.2"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.10.3"
]
]
},
{
"name": "matplotlib-inline",
"specs": [
[
"==",
"0.1.7"
]
]
},
{
"name": "ml_dtypes",
"specs": [
[
"==",
"0.5.1"
]
]
},
{
"name": "mpmath",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "multipledispatch",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "narwhals",
"specs": [
[
"==",
"1.47.1"
]
]
},
{
"name": "nest-asyncio",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "networkx",
"specs": [
[
"==",
"3.5"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.3.0"
]
]
},
{
"name": "opt_einsum",
"specs": [
[
"==",
"3.4.0"
]
]
},
{
"name": "optuna",
"specs": [
[
"==",
"4.3.0"
]
]
},
{
"name": "outcome",
"specs": [
[
"==",
"1.3.0.post0"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"25.0"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.3.0"
]
]
},
{
"name": "parso",
"specs": [
[
"==",
"0.8.4"
]
]
},
{
"name": "pexpect",
"specs": [
[
"==",
"4.9.0"
]
]
},
{
"name": "pillow",
"specs": [
[
"==",
"11.2.1"
]
]
},
{
"name": "platformdirs",
"specs": [
[
"==",
"4.3.8"
]
]
},
{
"name": "prompt_toolkit",
"specs": [
[
"==",
"3.0.51"
]
]
},
{
"name": "protobuf",
"specs": [
[
"==",
"6.31.1"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "ptyprocess",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "pure_eval",
"specs": [
[
"==",
"0.2.3"
]
]
},
{
"name": "pyarrow",
"specs": [
[
"==",
"21.0.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.11.7"
]
]
},
{
"name": "pydantic_core",
"specs": [
[
"==",
"2.33.2"
]
]
},
{
"name": "pydeck",
"specs": [
[
"==",
"0.9.1"
]
]
},
{
"name": "Pygments",
"specs": [
[
"==",
"2.19.1"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"==",
"3.2.3"
]
]
},
{
"name": "PySocks",
"specs": [
[
"==",
"1.7.1"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "pyzmq",
"specs": [
[
"==",
"26.4.0"
]
]
},
{
"name": "referencing",
"specs": [
[
"==",
"0.36.2"
]
]
},
{
"name": "regex",
"specs": [
[
"==",
"2024.11.6"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.4"
]
]
},
{
"name": "rpds-py",
"specs": [
[
"==",
"0.26.0"
]
]
},
{
"name": "safetensors",
"specs": [
[
"==",
"0.5.3"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.7.0"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.15.3"
]
]
},
{
"name": "seaborn",
"specs": [
[
"==",
"0.13.2"
]
]
},
{
"name": "selenium",
"specs": [
[
"==",
"4.34.2"
]
]
},
{
"name": "sentry-sdk",
"specs": [
[
"==",
"2.33.1"
]
]
},
{
"name": "setuptools",
"specs": [
[
"==",
"80.9.0"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "smmap",
"specs": [
[
"==",
"5.0.2"
]
]
},
{
"name": "sniffio",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "sortedcontainers",
"specs": [
[
"==",
"2.4.0"
]
]
},
{
"name": "soupsieve",
"specs": [
[
"==",
"2.7"
]
]
},
{
"name": "SQLAlchemy",
"specs": [
[
"==",
"2.0.41"
]
]
},
{
"name": "stack-data",
"specs": [
[
"==",
"0.6.3"
]
]
},
{
"name": "streamlit",
"specs": [
[
"==",
"1.47.0"
]
]
},
{
"name": "streamlit-option-menu",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "sympy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "tenacity",
"specs": [
[
"==",
"9.1.2"
]
]
},
{
"name": "threadpoolctl",
"specs": [
[
"==",
"3.6.0"
]
]
},
{
"name": "tokenizers",
"specs": [
[
"==",
"0.21.1"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "torch",
"specs": [
[
"==",
"2.6.0"
]
]
},
{
"name": "tornado",
"specs": [
[
"==",
"6.5.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.67.1"
]
]
},
{
"name": "traitlets",
"specs": [
[
"==",
"5.14.3"
]
]
},
{
"name": "transformers",
"specs": [
[
"==",
"4.52.4"
]
]
},
{
"name": "trio",
"specs": [
[
"==",
"0.30.0"
]
]
},
{
"name": "trio-websocket",
"specs": [
[
"==",
"0.12.2"
]
]
},
{
"name": "typing-inspection",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
"==",
"4.14.0"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.5.0"
]
]
},
{
"name": "wandb",
"specs": [
[
"==",
"0.21.0"
]
]
},
{
"name": "wcwidth",
"specs": [
[
"==",
"0.2.13"
]
]
},
{
"name": "websocket-client",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "Werkzeug",
"specs": [
[
"==",
"3.1.3"
]
]
},
{
"name": "wsproto",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "xgboost",
"specs": [
[
"==",
"3.0.2"
]
]
}
],
"lcname": "amylodeep"
}