instanexus


- Name: instanexus
- Version: 0.1.0
- Summary: End-to-end workflow for de novo protein sequencing based on InstaNovo
- Upload time: 2025-10-16 13:40:20
- Requires Python: >=3.10
- License: MIT
- Keywords: assembly, bioinformatics, de novo, mass spectrometry, protein sequencing, proteomics

<p align="center">
  <img src="images/instanexus_logo 2.svg" width="600" alt="InstaNexus logo">
</p>

<p align="center"><em>A de novo protein sequencing workflow</em></p>

<p align="center">
  <img src="https://img.shields.io/badge/environment-conda-blue" alt="Conda">
  <img src="https://img.shields.io/badge/license-MIT-green" alt="License">
  <img src="https://img.shields.io/badge/python-3.10+-blue" alt="Python">
</p>

---

## Table of Contents
- [Introduction](#introduction)
- [Features](#features)
- [Workflow Diagram](#workflow-diagram)
- [Repository Structure](#repository-structure)
- [Installation](#installation)
- [Getting Started](#getting-started)
- [Command-Line Usage](#command-line-usage)
- [Hyperparameter Optimization](#hyperparameter-optimization)
- [License](#license)
- [Acknowledgments](#acknowledgments)
- [References](#references)
- [Citation](#citation)

---

## Introduction

InstaNexus is a generalizable, end-to-end workflow for direct protein sequencing, tailored to reconstruct full-length protein therapeutics such as antibodies and nanobodies. It integrates AI-driven de novo peptide sequencing with optimized assembly and scoring strategies to maximize accuracy, coverage, and functional relevance.

This pipeline enables robust reconstruction of critical protein regions, advancing applications in therapeutic discovery, immune profiling, and protein engineering.

---

## Features

- 🧬 Supports De Bruijn Graph and Greedy-based assembly
- ⚗️ Handles multiple protease digestions (Trypsin, LysC, GluC, etc.)
- 🧹 Integrated contaminant removal and confidence filtering
- 🧩 Clustering, alignment, and consensus sequence reconstruction
- 🔗 Integrates with external tools:
  - [MMseqs2](https://github.com/soedinglab/MMseqs2) for fast clustering
  - [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) for high-quality alignment
- 📊 Output-ready for downstream analysis and visualization
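
To give a feel for the consensus step listed above, here is a minimal sketch of a column-wise majority vote over already-aligned sequences. It is an illustration only and not the InstaNexus implementation: the function name `consensus`, the gap character, and the toy sequences are assumptions made for this example.

```python
from collections import Counter


def consensus(aligned: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        # Ignore gaps; skip the column entirely if it contains only gaps.
        residues = [r for r in column if r != gap]
        if residues:
            out.append(Counter(residues).most_common(1)[0][0])
    return "".join(out)


# Toy alignment (hypothetical data, not taken from the repository).
aligned = ["EVQLVESGG-", "EVQLVQSGGG", "-VQLVESGGG"]
print(consensus(aligned))  # EVQLVESGGG
```

Ties are resolved by first occurrence in the column, which is a simplification; a real pipeline would likely weigh residues by confidence scores.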

---

## Workflow Diagram

<p align="center">
  <img src="images/instanexus_panel.png" width="900" alt="InstaNexus Workflow">
</p>

---

## Repository Structure


| Folder / File | Description |
|----------------|-------------|
| `environment.linux.yml` | Conda environment for Linux systems |
| `environment.osx-arm64.yaml` | Conda environment for macOS (Apple Silicon) |
| `src/instanexus/` | Core InstaNexus package (modules + CLI) |
| `src/instanexus/__main__.py` | Entry point for CLI (`instanexus` command) |
| `src/instanexus/script_dbg.py` | De Bruijn Graph-based assembly |
| `src/instanexus/script_greedy.py` | Greedy-based peptide assembly |
| `src/opt/` | Grid search and optimization workflows |
| `fasta/` | FASTA reference and contaminant sequences |
| `inputs/` | Example input CSV files |
| `json/` | Metadata and parameter configuration files |
| `notebooks/` | Jupyter notebooks for analysis and visualization |
| `images/` | Logos and workflow figures |
| `outputs/` | Generated results (created during execution) |

---

## Installation

Prerequisites:

- [Conda](https://docs.conda.io/en/latest/)
- [MMseqs2](https://github.com/soedinglab/MMseqs2)
- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)

> [!IMPORTANT]
> MMseqs2 and Clustal Omega are available through Conda, but compatibility depends on your system architecture.
> - 🔍 [Clustal Omega on Anaconda.org](https://anaconda.org/search?q=clustalo)

---

## Getting Started

Follow these steps to clone the repository and set up the environment using Conda:

### 1. Clone the repository

Clone the repository over SSH and move into the project directory:

```bash
git clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git
cd InstaNexus
```

### 2. Create and activate the Conda environment

Create the `instanexus` Conda environment on Linux:

```bash
conda env create -f environment.linux.yml
```

Create the `instanexus` Conda environment on macOS (Apple Silicon):

```bash
conda env create -f environment.osx-arm64.yaml
```

Activate:

```bash
conda activate instanexus
```

---

### 3. Install InstaNexus as a local package

```bash
pip install -e .
```

Then verify the CLI installation:

```bash
instanexus --version
```
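
In addition to the CLI check, it can help to confirm that the external tools are reachable from the active environment. The sketch below assumes the usual executable names, `mmseqs` for MMseqs2 and `clustalo` for Clustal Omega; the helper `find_tools` is hypothetical and not part of InstaNexus.

```python
import shutil


def find_tools(names=("mmseqs", "clustalo")):
    """Return a mapping of tool name to its resolved path (None if missing)."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as missing, install it into the `instanexus` environment or adjust your `PATH` before running the assembly commands.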

---

## Command-Line Usage

After activating the environment, you can run InstaNexus directly from the terminal:
```bash
instanexus --help
```

### Run De Bruijn graph assembly

```bash
instanexus dbg --input_csv inputs/sample.csv --chain light --folder_outputs outputs --reference
```
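
For readers new to the technique, the following is a minimal sketch of De Bruijn graph assembly on peptide strings. It is a generic illustration, not the code in `script_dbg.py`: the function names, the k-mer size, the start node, and the toy peptides are all assumptions.

```python
from collections import defaultdict


def build_graph(peptides, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while exactly one successor exists."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy overlapping peptides (hypothetical data).
peptides = ["EVQLVES", "QLVESGG", "VESGGGL"]
graph = build_graph(peptides, k=4)
print(extend_contig(graph, "EVQ"))  # EVQLVESGGGL
```

The walk stops at branch points and dead ends, so ambiguous regions yield several shorter contigs rather than one guessed path.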

### Run greedy assembly

```bash
instanexus greedy --input_csv inputs/sample.csv --folder_outputs outputs
```
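
As a rough illustration of the greedy strategy, the sketch below repeatedly merges the pair of sequences with the longest suffix-prefix overlap. It is a textbook version under stated assumptions, not the code in `script_greedy.py`: the function names, the `min_overlap` default, and the toy peptides are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides, min_overlap=3):
    """Merge the best-overlapping pair until no overlap >= min_overlap remains."""
    seqs = list(dict.fromkeys(peptides))  # de-duplicate, keep input order
    while True:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return seqs
        merged = seqs[i] + seqs[j][n:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (i, j)] + [merged]


# Toy peptides (hypothetical data); the last one shares no overlap.
peptides = ["QLVESGG", "EVQLVES", "VESGGGL", "DIQMTQ"]
print(greedy_assemble(peptides))  # ['DIQMTQ', 'EVQLVESGGGL']
```

This version is quadratic per merge and is meant for reading, not for large peptide sets.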




---

## Hyperparameter Optimization

To launch the hyperparameter grid search, run the following command from the project root (the folder containing `src/` and `json/`):

```bash
python -m src.opt.gridsearch
```

**Adjusting Parameters**

Grid search parameters for both the De Bruijn graph (dbg) and Greedy (greedy) assembly methods are defined in:

```bash
json/gridsearch_params.json
```

To test more (or fewer) combinations, edit the arrays for each parameter in this file.
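
To see how array lengths translate into the number of runs, the sketch below expands a parameter grid into every combination. The JSON layout and the parameter names (`kmer_size`, `min_overlap`, `conf`) are hypothetical; the real keys in `json/gridsearch_params.json` may differ.

```python
import itertools
import json

# Hypothetical parameter names and values, for illustration only.
params_text = """
{
  "dbg": {"kmer_size": [6, 7], "conf": [0.8, 0.9, 0.95]},
  "greedy": {"min_overlap": [3, 4], "conf": [0.9]}
}
"""


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    keys = list(grid)
    value_lists = (grid[k] for k in keys)
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


params = json.loads(params_text)
for method, grid in params.items():
    combos = expand_grid(grid)
    print(method, len(combos), combos[0])
```

The number of runs is the product of the array lengths, so adding one value to a parameter multiplies the total rather than adding to it.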

---

## License

This project is licensed under the [MIT License](LICENSE).

---

## Acknowledgments

InstaNexus was developed at **DTU Biosustain** and **DTU Bioengineering**.

We are grateful to the **DTU Bioengineering Proteomics Core Facility** for maintenance and operation of mass spectrometry instrumentation.

We also thank the **Informatics Platform at DTU Biosustain** for their support during the development and optimization of InstaNexus.

Special thanks to the users and developers of:
- [MMseqs2](https://github.com/soedinglab/MMseqs2)
- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)

---

## References

1. Steinegger, M. & Söding, J. **MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets**. *Nature Biotechnology* 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988
2. Sievers, F., et al. **Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega**. *Molecular Systems Biology* 7, 539 (2011). https://doi.org/10.1038/msb.2011.75
3. Eloff, K., Kalogeropoulos, K., Mabona, A., Morell, O., Catzel, R., Rivera-de-Torre, E., ... & Jenkins, T. P. (2025). **InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments.** Nature Machine Intelligence, 1-15.

---

## Citation

If you find this project useful in your research or work, please cite it as:

Reverenna M., Nielsen M. W., Wolff D. S., Lytra E., Colaianni P. D., Ljungars A., Laustsen A. H., Schoof E. M., Van Goey J., Jenkins T. P., Lukassen M. V., Santos A., Kalogeropoulos K. (2025). *Generalizable direct protein sequencing with InstaNexus* [Preprint]. bioRxiv. https://doi.org/10.1101/2025.07.25.666861

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "instanexus",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "assembly, bioinformatics, de novo, mass spectrometry, protein sequencing, proteomics",
    "author": null,
    "author_email": "Marco Reverenna <marcor@dtu.dk>",
    "download_url": "https://files.pythonhosted.org/packages/94/a3/b38478e8f9ac1451034466a457b6806738115a9bab9c9b1849b94ed29c83/instanexus-0.1.0.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\n  <img src=\"images/instanexus_logo 2.svg\" width=\"600\" alt=\"InstaNexus logo\">\n</p>\n\n<p align=\"center\"><em>A de novo protein sequencing workflow</em></p>\n\n<p align=\"center\">\n  <img src=\"https://img.shields.io/badge/environment-conda-blue\" alt=\"Conda\">\n  <img src=\"https://img.shields.io/badge/license-MIT-green\" alt=\"License\">\n  <img src=\"https://img.shields.io/badge/python-3.9+-blue\" alt=\"Python\">\n</p>\n\n---\n\n## Table of Contents\n- [Introduction](#introduction)\n- [Features](#features)\n- [Workflow Diagram](#workflow-diagram)\n- [Repository Structure](#repository-structure)\n- [Installation](#installation)\n- [Command-Line Usage](#command-line-usage)\n- [Hyperparameter Optimization](#hyperparameter-optimization)\n- [License](#license)\n- [Acknowledgments](#acknowledgments)\n- [References](#references)\n- [Citation](#citation)\n\n---\n\n## Introduction\n\nInstaNexus is a generalizable, end-to-end workflow for direct protein sequencing, tailored to reconstruct full-length protein therapeutics such as antibodies and nanobodies. 
It integrates AI-driven de novo peptide sequencing with optimized assembly and scoring strategies to maximize accuracy, coverage, and functional relevance.\n\nThis pipeline enables robust reconstruction of critical protein regions, advancing applications in therapeutic discovery, immune profiling, and protein engineering.\n\n---\n\n## Features\n\n- \ud83e\uddec Supports De Bruijn Graph and Greedy-based assembly\n- \u2697\ufe0f Handles multiple protease digestions (Trypsin, LysC, GluC, etc.)\n- \ud83e\uddf9 Integrated contaminant removal and confidence filtering\n- \ud83e\udde9 Clustering, alignment, and consensus sequence reconstruction\n- \ud83d\udd17 Integrates with external tools:\n  - [MMseqs2](https://github.com/soedinglab/MMseqs2) for fast clustering\n  - [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) for high-quality alignment\n- \ud83d\udcca Output-ready for downstream analysis and visualization\n\n---\n\n## Workflow Diagram\n\n<p align=\"center\">\n  <img src=\"images/instanexus_panel.png\" width=\"900\" alt=\"InstaNexus Workflow\">\n</p>\n\n---\n\n## Repository Structure\n\n\n| Folder / File | Description |\n|----------------|-------------|\n| `environment.linux.yml` | Conda environment for Linux systems |\n| `environment.osx-arm64.yaml` | Conda environment for macOS (Apple Silicon) |\n| `src/instanexus/` | Core InstaNexus package (modules + CLI) |\n| `src/instanexus/__main__.py` | Entry point for CLI (`instanexus` command) |\n| `src/instanexus/script_dbg.py` | De Bruijn Graph-based assembly |\n| `src/instanexus/script_greedy.py` | Greedy-based peptide assembly |\n| `src/opt/` | Grid search and optimization workflows |\n| `fasta/` | FASTA reference and contaminant sequences |\n| `inputs/` | Example input CSV files |\n| `json/` | Metadata and parameter configuration files |\n| `notebooks/` | Jupyter notebooks for analysis and visualization |\n| `images/` | Logos and workflow figures |\n| `outputs/` | Generated results (created during execution) 
|\n\n---\n\n## Installation\n\n- [Conda](https://docs.conda.io/en/latest/)\n- [MMseqs2](https://github.com/soedinglab/MMseqs2)\n- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)\n\n> [!IMPORTANT]\n> MMseqs2 and Clustal Omega are available through Conda, but compatibility depends on your system architecture.\n> - \ud83d\udd0d [Clustal Omega on Anaconda.org](https://anaconda.org/search?q=clustalo)   \n\n---\n\n## Getting Started\n\nFollow these steps to clone the repository and set up the environment using Conda:\n\n### 1. Clone the repository\n\nTo clone and set up the environment:\n\n```bash\ngit clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git\ncd instanexus\n```\n\n### 2. Create and activate the Conda environment\n\nCreate instanexus conda environment for linux.\n\n```bash\nconda env create -f environment.linux.yml\n```\n\nCreate instanexus conda environment for OS.\n\n```bash\nconda env create -f environment.osx-arm64.yaml\n```\n\nActivate:\n\n```bash\nconda activate instanexus\n```\n\n---\n\n### 3. 
Install InstaNexus as a local package\n\n```\npip install -e .\n```\n\nThen verify the CLI installation:\n\n```\ninstanexus --version\n```\n\n---\n\n## Command-line usage\n\nAfter activating the environment, you can run InstaNexus directly from the terminal:\n```bash\ninstanexus --help\n```\n\n### Run De Bruijn graph assembly\n\n```\ninstanexus dbg --input_csv inputs/sample.csv --chain light --folder_outputs outputs --reference\n```\n\n### Run greedy assembly\n\n```\ninstanexus greedy --input_csv inputs/sample.csv --folder_outputs outputs\n```\n\n\n\n\n---\n\n## Hyperparameter Optimization\n\nTo launch the hyperparameter grid search, run the following command from the project root (the folder containing ```src/``` and ```json/```):\n\n```bash\npython -m src.opt.gridsearch\n```\n**Adjusting Parameters**\n\nGrid search parameters for both the De Bruijn graph (dbg) and Greedy (greedy) assembly methods are defined in:\n\n```bash\njson/gridsearch_params.json\n```\n\nTo test more (or fewer) combinations, edit the arrays for each parameter in this file.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\n---\n\n## Acknowledgments\n\nInstaNexus was developed at **DTU Biosustain** and **DTU Bioengineering**.\n\nWe are grateful to the **DTU Bioengineering Proteomics Core Facility** for maintenance and operation of mass spectrometry instrumentation.\n\nWe also thank the **Informatics Platform at DTU Biosustain** for their support during the development and optimization of InstaNexus.\n\nSpecial thanks to the users and developers of:\n- [MMseqs2](https://github.com/soedinglab/MMseqs2)\n- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)\n\n---\n\n## References\n\n1. Hauser, M., et al. **MMseqs2: ultra fast and sensitive sequence searching**. *Nature Biotechnology* 35, 1026\u20131028 (2016). https://doi.org/10.1038/nbt.3988  \n2. Sievers, F., et al. 
**Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega**. *Molecular Systems Biology* 7, 539 (2011). https://doi.org/10.1038/msb.2011.75\n3. Eloff, K., Kalogeropoulos, K., Mabona, A., Morell, O., Catzel, R., Rivera-de-Torre, E., ... & Jenkins, T. P. (2025). **InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments.** Nature Machine Intelligence, 1-15.\n\n---\n\n## Citation\n\nIf you find this project useful in your research or work, please cite it as:\n\nReverenna M., Nielsen M. W., Wolff D. S., Lytra E., Colaianni P. D., Ljungars A., Laustsen A. H., Schoof E. M., Van Goey J., Jenkins T. P., Lukassen M. V., Santos A., Kalogeropoulos K. (2025). *Generalizable direct protein sequencing with InstaNexus* [Preprint]. bioRxiv. https://doi.org/10.1101/2025.07.25.666861\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "End-to-end workflow for de novo protein sequencing based on InstaNovo",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/Multiomics-Analytics-Group/InstaNexus",
        "Issues": "https://github.com/Multiomics-Analytics-Group/InstaNexus/issues"
    },
    "split_keywords": [
        "assembly",
        " bioinformatics",
        " de novo",
        " mass spectrometry",
        " protein sequencing",
        " proteomics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c91aef2ded820d1b6ec944b10b78c1c3297c1b60e7dd74ee4617bda6b209193",
                "md5": "0e95807402d6e6ce01393213b67ca0ec",
                "sha256": "4e1d342c98494a4f05b28fd0ac5905524ce882b3d9dc11da0b604cbb36492859"
            },
            "downloads": -1,
            "filename": "instanexus-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0e95807402d6e6ce01393213b67ca0ec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 53829,
            "upload_time": "2025-10-16T13:39:23",
            "upload_time_iso_8601": "2025-10-16T13:39:23.600930Z",
            "url": "https://files.pythonhosted.org/packages/2c/91/aef2ded820d1b6ec944b10b78c1c3297c1b60e7dd74ee4617bda6b209193/instanexus-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "94a3b38478e8f9ac1451034466a457b6806738115a9bab9c9b1849b94ed29c83",
                "md5": "81af9e05399bd9c87a265718e863751d",
                "sha256": "d3e0ab1317daba4c3b22e61dbc04d632302aafd67a653b2586d87e2d5eaaa7ea"
            },
            "downloads": -1,
            "filename": "instanexus-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "81af9e05399bd9c87a265718e863751d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 9855254,
            "upload_time": "2025-10-16T13:40:20",
            "upload_time_iso_8601": "2025-10-16T13:40:20.677977Z",
            "url": "https://files.pythonhosted.org/packages/94/a3/b38478e8f9ac1451034466a457b6806738115a9bab9c9b1849b94ed29c83/instanexus-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-16 13:40:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Multiomics-Analytics-Group",
    "github_project": "InstaNexus",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "instanexus"
}
        