empathi

 - Name: empathi
 - Version: 1.0.3
 - Summary: An embedding-based phage protein annotation tool by hierarchical assignment
 - Upload time: 2025-01-28 19:06:04
 - Authors: Alexandre Boulay, Clovis Galiez, Elsa Rousseau
 - Requires Python: >=3.11.5
 - Keywords: bacteriophages, bioinformatics, phages, protein functions

<span style="font-size:2em;">**Empathi**</span><br>
<span style="font-size:1.15em;">**Embedding-based Phage Protein Annotation Tool by Hierarchical Assignment**</span>


<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#about-the-project">About the Project</a>
    </li>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#prerequisites">Prerequisites</a></li>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
    <li><a href="#usage-details">Usage details</a></li>
  </ol>
</details>

## About the Project

Empathi is a tool for predicting the functions of bacteriophage proteins. It makes its predictions from the highly
informative ProtT5 protein embeddings. In addition, new functional groups were defined that are better suited to
machine learning than the often-overlapping [PHROG](https://phrogs.lmge.uca.fr/) categories.

A preprint is available [here](https://doi.org/10.1101/2024.12.31.630607).


## Getting Started
Empathi is available on [PyPI](https://pypi.org/project/empathi/) and as an 
[Apptainer container](https://cloud.sylabs.io/library/alexandreboulay/empathi/empathi) for ease of use. \
The source code can also be downloaded from [HuggingFace](https://huggingface.co/AlexandreBoulay/empathi).

### Prerequisites
The full list of dependencies and versions can be found in [requirements.txt](https://huggingface.co/AlexandreBoulay/EmPATHi/blob/main/requirements.txt).

Either git-lfs or Apptainer is required; see the installation instructions below.

The remaining dependencies, listed here, are handled by pip or by the Apptainer container.
```
python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
torch==2.3.0
scipy==1.13.1
scikit-learn==1.5.0
transformers==4.43.1
sentencepiece==0.2.0
```
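
For a pip or source install, the pinned versions above can be compared with what is actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical and not part of Empathi, and whether Empathi tolerates other versions is not stated here.

```python
from importlib import metadata

# Pins copied from the list above (the Python version itself is not checked here).
PINNED = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.5.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
}


def check_pins(pins):
    """Return {package: (expected, found)} for every package that deviates.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            found = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin.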


### Installation
There are three ways to install Empathi: through PyPI, as an Apptainer container, or from source code.


#### 1. PIP
First, create a conda environment with Python 3.11.5.
```
conda create -n empathi_env python=3.11.5
conda activate empathi_env
```

Download the Empathi models. 
This requires git-lfs: on WSL or Linux, run `sudo apt-get install git-lfs`; on Windows, either use
[Git Bash](https://git-scm.com/downloads) or get it from the [git-lfs releases page](https://github.com/git-lfs/git-lfs/releases). Then:
```
git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi
export PATH="/path/to/empathi/models:$PATH"
```
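
If the repository is cloned without git-lfs active, large files are checked out as small text stubs (LFS pointers) rather than the real model files. The sketch below detects such stubs by their standard first line. The helper names are hypothetical, and the folder to scan is assumed to be the `models` directory of the clone.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if `path` is a Git LFS pointer stub rather than the real file."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(folder):
    """List files under `folder` that are still LFS pointer stubs."""
    return sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("/path/to/empathi/models")` should return an empty list after a successful clone; any path it returns can be fetched by running `git lfs pull` inside the repository.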

Install Empathi (pip also installs its dependencies):
```
pip install empathi
```

Usage (from a Python session)
```
# Start an interpreter with `python`, then run:
from empathi import empathi
empathi.empathi("input_file", "name", output_folder="path/to/output")
```
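
When several input files are annotated in one session, each run needs its own name so that results are not overwritten. The sketch below derives distinct names from the file stems and reuses the call shown above. `unique_run_names` and `annotate_all` are hypothetical helpers, not part of Empathi, and the import is deferred so the naming logic can be used without the package installed.

```python
from pathlib import Path


def unique_run_names(input_files):
    """Map each input path to a distinct run name derived from its file stem."""
    names = {}
    used = set()
    for path in input_files:
        base = Path(path).stem
        name, n = base, 1
        # Append a counter until the name has not been used yet.
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        names[str(path)] = name
    return names


def annotate_all(input_files, output_folder):
    """Run Empathi once per input file, each under its own name."""
    # Imported lazily; requires `pip install empathi` and the downloaded models.
    from empathi import empathi

    for path, name in unique_run_names(input_files).items():
        empathi.empathi(path, name, output_folder=output_folder)
```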


#### 2. Apptainer
Install [Apptainer](https://apptainer.org/docs/admin/main/installation.html) or Singularity. On Windows, this requires a virtual machine; 
[WSL](https://learn.microsoft.com/en-us/windows/wsl/install) works well.

Fetch Empathi from [Sylabs](https://cloud.sylabs.io/library/alexandreboulay/empathi/empathi):
```
apptainer pull empathi.sif library://alexandreboulay/empathi/empathi
```

Usage
```
apptainer run empathi.sif path/to/input_file name
```


#### 3. From source code
First, create a conda environment with Python 3.11.5.
```
conda create -n empathi_env python=3.11.5
conda activate empathi_env
```

Clone the repository. 
This requires git-lfs: on WSL or Linux, run `sudo apt-get install git-lfs`; on Windows, either use 
[Git Bash](https://git-scm.com/downloads) or get it from the [git-lfs releases page](https://github.com/git-lfs/git-lfs/releases). Then:
```
git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi
```

Install dependencies:
```
cd empathi
pip install -r requirements.txt
```

Usage
```
python src/empathi/empathi.py input_file name
```

## Usage details
The input can be either a FASTA file of protein sequences or a file of protein embeddings (.pkl or .csv).

With the option --only_embeddings, Empathi computes the embeddings and skips functional prediction. This step is much faster with a GPU.
The resulting embeddings file can then be passed back in with the same command (without --only_embeddings) as the new input file. 
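
Before starting a long embedding run, it can help to check whether PyTorch sees a GPU. This is a minimal sketch; `gpu_available` is a hypothetical helper, and it only reports what PyTorch detects, not how Empathi selects its device.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so no GPU can be used from this environment.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA GPU detected" if gpu_available() else "No CUDA GPU detected")
```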

Options:
 - input_file: Path to the input file containing protein sequences (.fa*) or protein embeddings (.pkl/.csv).
 - name: Name of the file to save results to (without extension). Use a different name for each run to avoid overwriting files.
 - --models_folder: Path to the folder containing the EmPATHi models. Can be left unspecified if it was added to PATH earlier.
 - --only_embeddings: Whether to only calculate embeddings (no functional prediction).
 - --output_folder: Path to the output folder. Default is ./empathi_out/.
 - --mode: Which types of proteins you want to predict. Accepted arguments are "all", "pvp", "rbp", "lysin", "regulator"...

When launching from Python, omit the '--' in front of the argument names.
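
The options above can be assembled into a command line programmatically, for instance to script the two-step workflow (embeddings first, prediction second). The sketch below builds the argument list for the from-source invocation. `build_command` is a hypothetical helper, the file names are placeholders, and treating --only_embeddings as a switch that takes no value is an assumption.

```python
import shlex


def build_command(input_file, name, *, models_folder=None, only_embeddings=False,
                  output_folder=None, mode=None, script="src/empathi/empathi.py"):
    """Assemble the argument list for a from-source Empathi run."""
    cmd = ["python", script, str(input_file), name]
    if models_folder is not None:
        cmd += ["--models_folder", str(models_folder)]
    if only_embeddings:
        # Assumption: --only_embeddings is a switch that takes no value.
        cmd.append("--only_embeddings")
    if output_folder is not None:
        cmd += ["--output_folder", str(output_folder)]
    if mode is not None:
        cmd += ["--mode", mode]
    return cmd


# Step 1: compute embeddings only. Step 2 would pass the resulting file as input_file.
step1 = build_command("proteins.faa", "run1_embeddings", only_embeddings=True)
print(shlex.join(step1))
# prints: python src/empathi/empathi.py proteins.faa run1_embeddings --only_embeddings
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the root of the cloned repository.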

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "empathi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11.5",
    "maintainer_email": null,
    "keywords": "bacteriophages, bioinformatics, phages, protein functions",
    "author": "Alexandre Boulay, Clovis Galiez, Elsa Rousseau",
    "author_email": "Alexandre Boulay <alexandre.boulay.6@ulaval.ca>",
    "download_url": "https://files.pythonhosted.org/packages/42/14/22916c884368ff593d5d8eb8fb24c6b5d099fe70f6f06f4c48b6bbb34ab3/empathi-1.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "An embedding-based phage protein annotation tool by hierarchical assignment",
    "version": "1.0.3",
    "project_urls": {
        "Homepage": "https://huggingface.co/AlexandreBoulay/EmPATHi"
    },
    "split_keywords": [
        "bacteriophages",
        " bioinformatics",
        " phages",
        " protein functions"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "43d1821c5687c6d26d02fa6779cd63a0df4b1d11eaf5c82020b79a4413e5c5f9",
                "md5": "66afda5d34544c8981c8c7d0fd501765",
                "sha256": "92aba3f4b9efe6c7cf4e905a7c7b66eb8f8cfe7f6509ee68ff0d93abef6d8f42"
            },
            "downloads": -1,
            "filename": "empathi-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66afda5d34544c8981c8c7d0fd501765",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11.5",
            "size": 21906,
            "upload_time": "2025-01-28T19:06:02",
            "upload_time_iso_8601": "2025-01-28T19:06:02.303596Z",
            "url": "https://files.pythonhosted.org/packages/43/d1/821c5687c6d26d02fa6779cd63a0df4b1d11eaf5c82020b79a4413e5c5f9/empathi-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "421422916c884368ff593d5d8eb8fb24c6b5d099fe70f6f06f4c48b6bbb34ab3",
                "md5": "a6d2e05b3a3134d9e3dcd00ba9df761b",
                "sha256": "88e445d542325f5d3acab882b1765f4a36d737c79bfa2f85ed2af3b8388ae61a"
            },
            "downloads": -1,
            "filename": "empathi-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "a6d2e05b3a3134d9e3dcd00ba9df761b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11.5",
            "size": 32193,
            "upload_time": "2025-01-28T19:06:04",
            "upload_time_iso_8601": "2025-01-28T19:06:04.137405Z",
            "url": "https://files.pythonhosted.org/packages/42/14/22916c884368ff593d5d8eb8fb24c6b5d099fe70f6f06f4c48b6bbb34ab3/empathi-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-28 19:06:04",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "empathi"
}
        