# Py Protein Inference
**PyProteinInference** is a Python package for running various protein inference algorithms on tandem mass spectrometry search results and generating protein-to-peptide mappings with protein-level false discovery rates.
## Key Features
* **Protein Inference and Scoring**:
  * Maps peptides to proteins.
  * Generates protein scores from provided PSMs.
  * Calculates set-based protein-level false discovery rates for MS data filtering.
* **Supported Input Formats**:
  * Search Result File Types: __idXML__, __mzIdentML__, or __pepXML__.
  * PSM files from [Percolator](https://github.com/percolator/percolator).
  * Custom tab-delimited files.
* **Output**:
  * User-friendly CSV file containing Proteins, Peptides, q-values, and Protein Scores.
* **Supported Inference Procedures**:
  * Parsimony - Returns the minimal set of proteins based on the input peptides.
  * Exclusion - Removes all non-distinguishing peptides on the protein level.
  * Inclusion - Returns all possible proteins.
  * Peptide Centric - Returns protein groups based on peptide assignments.
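To make the four procedures more concrete, here is a minimal Python sketch on a toy peptide-to-protein map. It is not the package's implementation: the peptide and protein identifiers are invented, parsimony is approximated with a greedy set cover rather than an exact solver, and the exclusion and peptide centric functions are simplified readings of the one-line descriptions in the list.

```python
from typing import Dict, List, Set

# Toy peptide -> proteins map (hypothetical identifiers, not real data).
PEPTIDE_TO_PROTEINS: Dict[str, Set[str]] = {
    "PEPTIDEA": {"P1"},
    "PEPTIDEB": {"P1", "P2"},
    "PEPTIDEC": {"P2", "P3"},
    "PEPTIDED": {"P3"},
}


def invert(peptide_map: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build protein -> peptides from peptide -> proteins."""
    protein_map: Dict[str, Set[str]] = {}
    for peptide, proteins in peptide_map.items():
        for protein in proteins:
            protein_map.setdefault(protein, set()).add(peptide)
    return protein_map


def inclusion(peptide_map: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep every protein that at least one peptide maps to."""
    return invert(peptide_map)


def exclusion(peptide_map: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Assumed reading: drop peptides shared by more than one protein."""
    unique = {pep: prots for pep, prots in peptide_map.items() if len(prots) == 1}
    return invert(unique)


def greedy_parsimony(peptide_map: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick the protein explaining the most uncovered peptides."""
    protein_map = invert(peptide_map)
    # Ignore peptides with no protein so the loop always makes progress.
    uncovered = {pep for pep, prots in peptide_map.items() if prots}
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(protein_map), key=lambda prot: len(protein_map[prot] & uncovered))
        chosen.append(best)
        uncovered -= protein_map[best]
    return chosen


def peptide_centric(peptide_map: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Assumed reading: group peptides by the full set of proteins they map to."""
    groups: Dict[str, Set[str]] = {}
    for peptide, proteins in peptide_map.items():
        groups.setdefault(";".join(sorted(proteins)), set()).add(peptide)
    return groups
```

On this toy map, inclusion keeps all three proteins, exclusion keeps only `P1` and `P3` (each supported by its unique peptide), and the greedy parsimony pass selects `P1` and `P3` because together they explain all four peptides. A greedy cover is not guaranteed to be minimal in general, which is one reason to treat this as an illustration only.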
## Requirements
1. __Python 3.9__ or greater.
2. __Python Packages__:
__numpy__, __pyteomics__, __pulp__, __PyYAML__, __matplotlib__, __pyopenms__, __lxml__, __tqdm__, __pywebview__, __nicegui__. These should be installed automatically during installation.
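If you want to confirm that the dependencies were actually pulled in, a small check along these lines may help. It is only a sketch using the standard library's `importlib.metadata`; the helper names `installed_versions` and `python_is_supported` are made up for this example and are not part of the package.

```python
import sys
from importlib import metadata

# Distribution names copied from the requirements list above.
REQUIRED_PACKAGES = [
    "numpy", "pyteomics", "pulp", "PyYAML", "matplotlib",
    "pyopenms", "lxml", "tqdm", "pywebview", "nicegui",
]


def installed_versions(packages):
    """Return {name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


if __name__ == "__main__":
    print("Python 3.9+:", python_is_supported())
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'MISSING'}")
```

Any package reported as `MISSING` can be installed manually with pip.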
## Quick Start Guide
### Install the package using pip
```shell
pip install pyproteininference
```
### Running the command line tool
To run the CLI tool either call `protein_inference_cli.py` like so:
```shell
protein_inference_cli.py --help
```
Or run the script through your Python interpreter.
First, locate the script that was placed on your PATH during installation:
```shell
which protein_inference_cli.py
/path/to/venv/bin/protein_inference_cli.py
```
Then pass that path to your Python interpreter:
```shell
python /path/to/venv/bin/protein_inference_cli.py --help
```
Alternatively, download `protein_inference_cli.py` from the GitHub repository:
https://github.com/thinkle12/pyproteininference/blob/master/scripts/protein_inference_cli.py
Then run the downloaded script with your Python interpreter as shown above.
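The locate-then-run steps can also be scripted. The sketch below uses only the standard library; `build_cli_command` is a hypothetical helper written for this example, and it assumes the script is on your `PATH` in the active environment.

```python
import shutil
import sys


def build_cli_command(args, script_name="protein_inference_cli.py", which=shutil.which):
    """Return [python, /path/to/script, *args], mirroring the manual steps.

    `which` is injectable so the lookup can be replaced in tests.
    """
    script_path = which(script_name)
    if script_path is None:
        raise FileNotFoundError(
            f"{script_name} was not found on PATH; "
            "is the package installed in this environment?"
        )
    # sys.executable is the interpreter running this code, so the script
    # runs in the same environment the package was installed into.
    return [sys.executable, script_path, *args]
```

The result can be handed to `subprocess.run`, for example `subprocess.run(build_cli_command(["--help"]), check=True)`.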
### Running the graphical user interface
To run the GUI tool either call `protein_inference_gui.py` like so:
```shell
protein_inference_gui.py
```
Or, as with the CLI, run the script through your Python interpreter.
First, locate the script that was placed on your PATH during installation:
```shell
which protein_inference_gui.py
/path/to/venv/bin/protein_inference_gui.py
```
Then pass that path to your Python interpreter:
```shell
python /path/to/venv/bin/protein_inference_gui.py
```
Alternatively, download `protein_inference_gui.py` from the GitHub repository:
https://github.com/thinkle12/pyproteininference/blob/master/scripts/protein_inference_gui.py
Then run the downloaded script with your Python interpreter as shown above.
### Executables
You can also download a standalone executable version of the GUI for both Windows and macOS from the releases page on GitHub:
https://github.com/thinkle12/pyproteininference/releases
When launching the GUI from an executable, please wait for the user interface to appear. It usually takes a minute or so.
## More Options for calling the CLI
1. Run the standard command line tool on an idXML file:
```shell
protein_inference_cli.py \
-f /path/to/target/file.idXML \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
```
2. Run the standard command line tool on an mzIdentML file:
```shell
protein_inference_cli.py \
-f /path/to/target/file.mzid \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
```
3. Run the standard command line tool on a pepXML file:
```shell
protein_inference_cli.py \
-f /path/to/target/file.pep.xml \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
```
4. Run the standard command line tool on tab-delimited results taken directly from Percolator. If no parameter file is specified, peptide centric inference is used by default:
```shell
protein_inference_cli.py \
-t /path/to/target/file.txt \
-d /path/to/decoy/file.txt \
-db /path/to/database/file.fasta
```
5. Specifying parameters.
The two parameters changed most often are the inference type and the decoy symbol (the string used to tell decoy proteins from target proteins).
To change them, create a file called `params.yaml` as follows:
```yaml
parameters:
  inference:
    inference_type: parsimony
  identifiers:
    decoy_symbol: "decoy_"
```
The inference type can be one of: `parsimony`, `peptide_centric`, `inclusion`, `exclusion`, or `first_protein`.
All parameters are optional, so you only need to define the ones you want to alter. Parameters that are not defined are set to default values.
See the package documentation for the default parameters.
6. Run the standard command line tool again, this time specifying the parameters as above:
```shell
protein_inference_cli.py \
-t /path/to/target/file.txt \
-d /path/to/decoy/file.txt \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
```
7. Running with Docker
    - Either pull the image from Docker Hub:
        - `docker pull hinklet/pyproteininference:latest`
    - Or clone the repository and build the image yourself:
        - `git clone REPOSITORY_URL`
        - `cd pyproteininference`
        - `docker build -t pyproteininference:latest .`
    - Run the tool, making sure to volume mount the directory that holds your input data and parameters. In the example below, the local directory is `/path/to/local/directory` and it is mounted at `/data` inside the container:
```shell
docker run -v /path/to/local/directory/:/data \
-it hinklet/pyproteininference:latest \
python /usr/local/bin/protein_inference_cli.py \
-f /data/input_file.txt \
-db /data/database.fasta \
-y /data/parameters.yaml \
-o /data/
```
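When the CLI is driven from a script, the parameter file and the argument list can be generated in Python. The following is a rough sketch, not part of the package: `render_params`, `write_params`, and `percolator_command` are hypothetical helpers, only the flags shown in the examples above (`-t`, `-d`, `-db`, `-y`) are used, and the YAML layout assumes `inference` and `identifiers` are nested under `parameters`.

```python
from pathlib import Path

# Inference types listed in the "Specifying parameters" step.
VALID_INFERENCE_TYPES = {
    "parsimony", "peptide_centric", "inclusion", "exclusion", "first_protein",
}


def render_params(inference_type="parsimony", decoy_symbol="decoy_"):
    """Return params.yaml text; assumes a simple decoy symbol without quotes."""
    if inference_type not in VALID_INFERENCE_TYPES:
        raise ValueError(f"unknown inference type: {inference_type!r}")
    return (
        "parameters:\n"
        "  inference:\n"
        f"    inference_type: {inference_type}\n"
        "  identifiers:\n"
        f'    decoy_symbol: "{decoy_symbol}"\n'
    )


def write_params(path, **kwargs):
    """Write the rendered parameters to `path` and return it as a Path."""
    path = Path(path)
    path.write_text(render_params(**kwargs))
    return path


def percolator_command(target, decoy, database, params=None):
    """Build the argument list for Percolator target/decoy input."""
    command = [
        "protein_inference_cli.py",
        "-t", str(target),
        "-d", str(decoy),
        "-db", str(database),
    ]
    if params is not None:
        # Without -y the tool falls back to its default parameters.
        command += ["-y", str(params)]
    return command
```

The returned list can be passed to `subprocess.run`. Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.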
## Building the Bundled Application Package using PyInstaller
_Note: This is only necessary if you want to build the application package yourself. The package is already available on
PyPI and can be installed using pip, and bundled executables can be downloaded from the releases page on
GitHub (https://github.com/thinkle12/pyproteininference/releases)._
1. After cloning the source code repository, create a new Python virtual environment under the project directory:
```shell
python -m venv venv
```
2. Activate the virtual environment:
```shell
source venv/bin/activate
```
3. Install the required packages:
```shell
pip install -r requirements.txt pyinstaller==6.11.1
```
4. Run the PyInstaller command to build the executable:
```shell
pyinstaller pyProteinInference.spec
```
5. The executable will be located in the `dist` directory.
## Documentation
For more information please see the full package documentation (https://thinkle12.github.io/pyproteininference/).
Raw data
{
"_id": null,
"home_page": "https://github.com/thinkle12/pyproteininference",
"name": "pyproteininference",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "proteininference",
"author": "Trent Hinkle",
"author_email": "hinklet@gene.com",
"download_url": "https://files.pythonhosted.org/packages/11/db/65f81f00dbb7ffa2f70919379c187efdec74220fa343290925110a06fc24/pyproteininference-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2",
"summary": "Python Package for running custom protein inference algorithms on tab-formatted tandem MS/MS search results.",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/thinkle12/pyproteininference"
},
"split_keywords": [
"proteininference"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ab21c8c1503d15d9c54415f749edc20e67b63e5476f366afd978711e0b71d0d5",
"md5": "0f00dd6d0e7b4c1e4bf80115c0aedfa4",
"sha256": "605db8b50a1f489682c872d57799b21b07957a741c936556b7bfe5d1b09bfde8"
},
"downloads": -1,
"filename": "pyproteininference-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0f00dd6d0e7b4c1e4bf80115c0aedfa4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 78998,
"upload_time": "2025-01-19T20:52:58",
"upload_time_iso_8601": "2025-01-19T20:52:58.052363Z",
"url": "https://files.pythonhosted.org/packages/ab/21/c8c1503d15d9c54415f749edc20e67b63e5476f366afd978711e0b71d0d5/pyproteininference-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "11db65f81f00dbb7ffa2f70919379c187efdec74220fa343290925110a06fc24",
"md5": "0c26c3effd5e8d03c6123795315a3769",
"sha256": "498dc46a23300599c7fc86d3f307567c3eee9ad4f0d69f70035ea7b227f0ec41"
},
"downloads": -1,
"filename": "pyproteininference-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "0c26c3effd5e8d03c6123795315a3769",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 660411,
"upload_time": "2025-01-19T20:53:00",
"upload_time_iso_8601": "2025-01-19T20:53:00.097021Z",
"url": "https://files.pythonhosted.org/packages/11/db/65f81f00dbb7ffa2f70919379c187efdec74220fa343290925110a06fc24/pyproteininference-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-19 20:53:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thinkle12",
"github_project": "pyproteininference",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.19.2"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "pyteomics",
"specs": [
[
">=",
"4.7.5"
],
[
"<",
"5.0.0"
]
]
},
{
"name": "pulp",
"specs": [
[
">=",
"2.8"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"<",
"6.0.2"
],
[
">=",
"6.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"<",
"4.0.0"
],
[
">=",
"3.3.4"
]
]
},
{
"name": "lxml",
"specs": [
[
"<",
"6.0.0"
],
[
">=",
"5.3.0"
]
]
},
{
"name": "nicegui",
"specs": [
[
">=",
"2.8.0"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "pyopenms",
"specs": [
[
"<",
"4.0.0"
],
[
">=",
"3.2.0"
]
]
},
{
"name": "pywebview",
"specs": [
[
"<",
"6.0.0"
],
[
">=",
"5.3.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"<",
"5.0.0"
],
[
">=",
"4.67.0"
]
]
}
],
"tox": true,
"lcname": "pyproteininference"
}