sublyme


| Field | Value |
|-------|-------|
| Name | sublyme |
| Version | 1.2.1 |
| Summary | A SUBLYME pipeline for Uncovering Bacteriophage Lysins in Metagenomic Datasets |
| Upload time | 2025-09-04 15:48:49 |
| Authors | Alexandre Boulay, Elsa Rousseau, Roberto Vazquez |
| Requires Python | >=3.11.5 |
| Keywords | bacteriophage, bioinformatics, cell wall depolymerase, endolysin, lysin, phage |

<h1 align="center">SUBLYME</h1>
<div align="center"> <strong>S</strong>oftware for <strong>U</strong>ncovering <strong>B</strong>acteriophage <strong>LY</strong>sins in <strong>ME</strong>tagenomic datasets</div>
<br>

<!-- TABLE OF CONTENTS -->
<details open>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#about-the-project">About the Project</a>
    </li>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#prerequisites">Prerequisites</a></li>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
    <li><a href="#usage-details">Usage details</a></li>
    <li><a href="#output-format">Output format</a></li>
  </ol>
</details>

## About the Project

SUBLYME is a tool for identifying bacteriophage lysins. It makes its predictions from ProtT5
protein embeddings and was trained on proteins from the [PHALP](https://phalp.ugent.be/) database.


## Getting started
SUBLYME is available on [PyPI](https://pypi.org/project/sublyme/) for ease of use. The source code can be downloaded from [GitHub](https://github.com/Rousseau-Team/sublyme).


### Prerequisites
A GPU is recommended to compute embeddings for large datasets.

The full list of dependencies can be found in [requirements.txt](https://github.com/Rousseau-Team/sublyme/blob/main/requirements.txt).

pip installs the dependencies automatically. The pinned versions are:
```
python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
torch==2.3.0
scipy==1.13.1
scikit-learn==1.3.0
transformers==4.43.1
sentencepiece==0.2.0
```
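
If an installation misbehaves, it can help to compare the installed package versions against the pins listed above. The following is a minimal sketch using only the standard library; the `check_pins` helper is hypothetical and not part of SUBLYME. Note that some builds report a local suffix (for example a CUDA tag on `torch`), which this strict comparison would flag as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above (the Python interpreter itself is excluded).
PINNED = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.3.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```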


### Installation

First, create a virtual environment with Python 3.11.5. For example:
```
conda create -n sublyme_env python=3.11.5
conda activate sublyme_env
```


**From PyPI**:
```
pip install sublyme
```

Usage:
```
sublyme test/input.faa -t 4
```

**From Apptainer**:

Install [Apptainer](https://apptainer.org/docs/admin/main/installation.html) or Singularity. On Windows, this requires a virtual machine;
[WSL](https://learn.microsoft.com/en-us/windows/wsl/install) works well.

Fetch SUBLYME from [Sylabs](https://cloud.sylabs.io/library/alexandre_boulay/sublyme/sublyme):
```
apptainer pull sublyme.sif library://alexandre_boulay/sublyme/sublyme
```

Usage:
```
apptainer run sublyme.sif test/input.fa path/to/output_folder {protein|genome} nb_threads [--no-dedup]
```

The Apptainer image accepts either protein or genomic sequences.
If genomes are given as input, Prodigal is run first to predict coding sequences.
Proteins are deduplicated with MMseqs unless `--no-dedup` is specified, and lysins are then predicted within the resulting set of proteins.
Arguments must be given in the order shown above.

The script outputs 2-4 files: 
 - genes.fna: genes predicted by Prodigal.
 - proteins.faa: proteins predicted by Prodigal.
 - proteins.csv: protein embeddings computed using ProtT5.
 - sublyme_predictions.csv: predictions obtained from sublyme.
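
Because the Apptainer image takes positional arguments in a fixed order, it can be convenient to build the command programmatically when processing many inputs. The sketch below only assembles the argument list described above; the `apptainer_command` helper and its parameter names are hypothetical and not part of SUBLYME.

```python
import shlex


def apptainer_command(input_path, output_folder, mode, threads, dedup=True,
                      image="sublyme.sif"):
    """Build the `apptainer run` argument list in the documented order."""
    if mode not in ("protein", "genome"):
        raise ValueError("mode must be 'protein' or 'genome'")
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    # Order: image, input, output folder, sequence type, number of threads.
    cmd = ["apptainer", "run", image, str(input_path), str(output_folder),
           mode, str(threads)]
    if not dedup:
        cmd.append("--no-dedup")  # the optional flag goes last
    return cmd


cmd = apptainer_command("test/input.fa", "path/to/output_folder", "genome", 4)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems with paths that contain spaces.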

**From source**:
```
git clone https://github.com/Rousseau-Team/sublyme.git
cd sublyme
pip install -r requirements.txt
```

Example: `python3 src/sublyme/sublyme.py test/input.faa -t 4 --models_folder src/sublyme/models`


### Usage details
The input can be either a FASTA file of protein sequences or a CSV file of protein embeddings.

With the `--only_embeddings` option, SUBLYME computes the embeddings and stops. This step is much faster with a GPU.
The resulting embeddings file can then be passed back to the same command (without `--only_embeddings`) as the input file.
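
The two-step workflow described above (embeddings first, predictions later) can be sketched as a pair of command lines built in Python. This is only an illustration: the `two_step_commands` helper is hypothetical, and the embeddings file name used in the example is an assumption, since the name SUBLYME gives this file is not stated here. Check the output folder after the first step and pass the actual path.

```python
def two_step_commands(fasta, output_folder, embeddings_file, threads=1):
    """Return the two `sublyme` invocations as argument lists.

    embeddings_file is the CSV written by the first step; its real name
    must be looked up in output_folder (the value used below is a guess).
    """
    common = ["-t", str(threads), "-o", str(output_folder)]
    # Step 1: compute embeddings only (ideally on a machine with a GPU).
    embed = ["sublyme", str(fasta), *common, "--only_embeddings"]
    # Step 2: same command, embeddings as input, without --only_embeddings.
    predict = ["sublyme", str(embeddings_file), *common]
    return embed, predict


embed_cmd, predict_cmd = two_step_commands(
    "test/input.faa",
    "outputs",
    "outputs/embeddings.csv",  # hypothetical file name
    threads=4,
)
print(" ".join(embed_cmd))
print(" ".join(predict_cmd))
# To run them in sequence: subprocess.run(embed_cmd, check=True), then predict_cmd.
```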

Options:
- **input_file**:           Path to input file containing protein sequences (.fa*) or protein embeddings (.csv) that you wish to annotate.
- **--threads** (-t):       Number of threads (default 1).
- **--output_folder** (-o): Path to the output folder. Default folder is ./outputs/.
- **--models_folder** (-m): Path to folder containing pretrained models (lysin_miner.pkl, val_endo_clf.pkl). Default is src/sublyme/models.
- **--only_embeddings**:    Whether to only calculate embeddings (no lysin prediction).

### Output format
The output is a CSV file with one column for the final prediction and one column each for the probabilities associated with lysins, endolysins and VALs (virion-associated lysins).

Example:

|            pred           |lysin|endolysin|VAL |
|---------------------------|-----|---------|----|
|      lysin\|endolysin     |0.98 |0.95     |0.05|
|             Na            |0.01 |Na       |Na  |
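
A table like the one above can be loaded with Python's standard `csv` module. The sketch below assumes the column names shown in the example (`pred`, `lysin`, `endolysin`, `VAL`) and that missing values appear as `Na` or as empty fields; a real output file may contain additional columns, such as a protein identifier, which this code leaves as strings.

```python
import csv
import io

# Sample mirroring the example table (an assumption about the on-disk layout).
SAMPLE = """pred,lysin,endolysin,VAL
lysin|endolysin,0.98,0.95,0.05
Na,0.01,Na,Na
"""

PROBA_COLUMNS = ("lysin", "endolysin", "VAL")


def to_float(value):
    """Convert a probability field to float, mapping 'Na' or '' to None."""
    return None if value in ("", "Na") else float(value)


def read_predictions(handle):
    """Read prediction rows, converting the probability columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        for col in PROBA_COLUMNS:
            row[col] = to_float(row[col])
        rows.append(row)
    return rows


rows = read_predictions(io.StringIO(SAMPLE))
lysins = [r for r in rows if r["pred"] != "Na"]
print(f"{len(lysins)} of {len(rows)} proteins predicted as lysins")
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`.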

Note that endolysins and VALs are distinguished by a single multiclass classifier, so their two probabilities always sum to one and one of the two classes is always assigned.

This endolysin/VAL classifier is only applied to proteins first predicted to be lysins (lysin probability > 0.5).
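
The two-stage logic described above can be illustrated with a small sketch. It is not SUBLYME's implementation: the `combine` function is hypothetical, the `lysin|VAL` label is assumed by analogy with `lysin|endolysin`, and the tie-breaking rule at equal probabilities is an arbitrary choice.

```python
LYSIN_THRESHOLD = 0.5  # from the text: the second stage runs when lysin proba > 0.5


def combine(lysin_proba, endolysin_proba):
    """Return (pred, lysin, endolysin, VAL) for one protein.

    endolysin_proba is the second-stage probability; it is ignored when
    the protein is not predicted to be a lysin.
    """
    if lysin_proba <= LYSIN_THRESHOLD:
        # Not a lysin: the endolysin/VAL classifier is never applied.
        return ("Na", lysin_proba, None, None)
    val_proba = 1.0 - endolysin_proba  # the two classes always sum to one
    # Assumed tie-break: equal probabilities are labelled endolysin.
    subtype = "endolysin" if endolysin_proba >= val_proba else "VAL"
    return (f"lysin|{subtype}", lysin_proba, endolysin_proba, val_proba)


print(combine(0.98, 0.95)[0])
print(combine(0.01, 0.95)[0])
```

One consequence worth keeping in mind is that a high endolysin or VAL probability says nothing on its own about whether a protein is a lysin; it is only meaningful together with the first-stage lysin probability.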


            
