# AlphaPeptDeep (PeptDeep)
[![Default installation and tests](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml)
[![Publish on PyPi and release on GitHub](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_and_release.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_and_release.yml)
[![Documentation Status](https://readthedocs.org/projects/alphapeptdeep/badge/?version=latest)](https://alphapeptdeep.readthedocs.io/en/latest/?badge=latest)
[![pypi](https://img.shields.io/pypi/v/peptdeep)](https://pypi.org/project/peptdeep)
[![GitHub release](https://img.shields.io/github/v/release/mannlabs/alphapeptdeep?display_name=tag)](https://github.com/MannLabs/alphapeptdeep/releases)
[![GitHub downloads](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/total?label=github%20downloads)](https://github.com/MannLabs/alphapeptdeep/releases)
[![Downloads@pre-train-models](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/pre-trained-models/total)](https://github.com/MannLabs/alphapeptdeep/releases/tag/pre-trained-models)
[![pip downloads](https://img.shields.io/pypi/dm/peptdeep?color=blue&label=pip%20downloads)](https://pypi.org/project/peptdeep)
![Python](https://img.shields.io/pypi/pyversions/peptdeep)
- [**About**](#about)
- [**License**](#license)
- [**Installation**](#installation)
  - [**One-click GUI**](#one-click-gui)
  - [**Pip installer**](#pip)
  - [**Use GPU**](#use-gpu)
  - [**Developer installer**](#developer)
- [**Usage**](#usage)
  - [**GUI**](#gui)
  - [**CLI**](#cli)
  - [**Python and jupyter notebooks**](#python-and-jupyter-notebooks)
- [**Troubleshooting**](#troubleshooting)
- [**Citations**](#citations)
- [**How to contribute**](#how-to-contribute)
- [**Changelog**](#changelog)
------------------------------------------------------------------------
## About
AlphaPeptDeep (`peptdeep` for short) aims to make it easy to build new deep
learning models for shotgun proteomics studies and to apply transfer
learning to them.
It contains built-in models that predict the retention time (RT), collision
cross section (CCS), and tandem mass spectrum (MS2) of given peptides.
With these models, one can easily generate a predicted spectral library
from fasta files.
For details, check out our [publications](#citations).
For documentation, see [readthedocs](https://alphapeptdeep.readthedocs.io/en/latest/).
### AlphaX repositories:
- [**alphabase**](https://github.com/MannLabs/alphabase): Infrastructure for the AlphaX ecosystem
- [**alphapept**](https://github.com/MannLabs/alphapept): DDA search engine
- [**alphapeptdeep**](https://github.com/MannLabs/alphapeptdeep): Deep learning for proteomics
- [**alpharaw**](https://github.com/MannLabs/alpharaw): Raw data access
- [**alphaviz**](https://github.com/MannLabs/alphaviz): MS data and result visualization
- [**alphatims**](https://github.com/MannLabs/alphatims): timsTOF data access
### Subsequent projects of AlphaPeptDeep
- [**peptdeep_hla**](https://github.com/MannLabs/PeptDeep-HLA): a DL model that predicts whether or not a peptide is presented by individual HLA alleles.
### Other pre-trained MS2/RT/CCS models
- [**Dimethyl**](https://github.com/MannLabs/alphapeptdeep/releases/tag/dimethyl-models): the MS2/RT/CCS models for Dimethyl-labeled peptides.
------------------------------------------------------------------------
## Citations
Wen-Feng Zeng, Xie-Xuan Zhou, Sander Willems, Constantin Ammar, Maria Wahle, Isabell Bludau, Eugenia Voytik, Maximillian T. Strauss & Matthias Mann. AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics. Nat Commun 13, 7238 (2022). https://doi.org/10.1038/s41467-022-34904-3
------------------------------------------------------------------------
## License
AlphaPeptDeep was developed by the [Mann Labs at the Max Planck
Institute of Biochemistry](https://www.biochem.mpg.de/mann) and the
[University of
Copenhagen](https://www.cpr.ku.dk/research/proteomics/mann/) and is
freely available with an [Apache License](LICENSE.txt). External Python
packages (available in the [requirements](requirements) folder) have
their own licenses, which can be consulted on their respective websites.
------------------------------------------------------------------------
## Installation
AlphaPeptDeep can be installed and used on all major operating systems
(Windows, macOS and Linux).
There are three different types of installation possible:
- [**One-click GUI installer:**](#one-click-gui) Choose this
installation if you only want the GUI and/or keep things as simple as
possible.
- [**Pip installer:**](#pip) Choose this installation if you want to use peptdeep as a Python package in an existing Python (recommended Python 3.8 or 3.9) environment (e.g. a Jupyter notebook). If needed, the GUI and CLI
can be installed with pip as well.
- [**Developer installer:**](#developer) Choose this installation if you
are familiar with CLI tools, [conda](https://docs.conda.io/en/latest/)
and Python. This installation allows access to all available features
of peptdeep and even allows you to modify its source code directly.
Generally, the developer version of peptdeep outperforms the
precompiled versions, which makes this the installation of choice for
high-throughput experiments.
### One-click GUI
The GUI of peptdeep is a completely stand-alone tool that requires no
knowledge of Python or CLI tools. Click on one of the links below to
download the latest release for:
- [**Windows**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-windows-amd64.exe)
- [**macOS**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-macos-darwin-x64.pkg)
- [**macOS ARM**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-macos-darwin-arm64.pkg)
- [**Linux**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-linux-x64.deb)
Older releases remain available on the [release
page](https://github.com/MannLabs/alphapeptdeep/releases), but no
backwards compatibility is guaranteed.
Note that, as GitHub does not allow large release files, these installers do not have GPU support.
To create GPU version installers, clone the source code, install the GPU version of PyTorch ([see here](#use-gpu)),
and then use the `build_installer_*.sh` and `build_package_*.sh`
scripts in the respective `release/[macos, linux, windows]` folder to build the installer locally.
For Linux, you additionally need to pass the "GPU" flag, i.e. run
```bash
release/linux/build_installer_linux.sh GPU
release/linux/build_package_linux.sh
```
### Pip
> PythonNET must be installed to access Thermo or Sciex raw data.
>
> *Legacy, should be replaced by AlphaRaw in the near future.*
>
> #### PythonNET in Windows
>
> Automatically installed for Windows.
>
> #### PythonNET in Linux
>
> 1. Install Mono from the mono-project website: [Mono
> Linux](https://www.mono-project.com/download/stable/#download-lin).
> NOTE: the installed Mono version should be at least 6.10, which
> requires you to add the Mono repository to your trusted sources!
> 2. Install PythonNET with `pip install pythonnet`.
>
> #### PythonNET in MacOS
>
> 1. Install [brew](https://brew.sh) and pkg-config:
>    `brew install pkg-config`
> 2. Install Mono from the mono-project website: [Mono Mac](https://www.mono-project.com/download/stable/)
> 3. Register the Mono path to your system. For macOS Catalina, open
>    the configuration of zsh via the terminal:
>
>    - Type `nano ~/.zshrc` to open the configuration of the terminal
>    - Append the Mono path to your `PKG_CONFIG_PATH`:
>      `export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/Library/Frameworks/Mono.framework/Versions/Current/lib/pkgconfig:$PKG_CONFIG_PATH`.
>    - Save everything and execute `. ~/.zshrc`
>
> 4. Install PythonNET with `pip install pythonnet`.
peptdeep can be installed in an existing Python environment with a
single `bash` command. *This `bash` command can also be run directly
from within a Jupyter notebook by prepending it with a `!`*:
``` bash
pip install peptdeep
```
Installing peptdeep like this avoids conflicts when integrating it in
other tools, as this does not enforce strict versioning of dependencies.
However, if new versions of dependencies are released, they are not
guaranteed to be fully compatible with peptdeep. This should only occur
in rare cases where dependencies are not backwards compatible.
> **TODO** You can always force peptdeep to use dependency versions
> which are known to be compatible:
>
> ``` bash
> pip install "peptdeep[stable]"
> ```
>
> NOTE: You might need to run `pip install --upgrade pip` before installing
> peptdeep like this. Also note the double quotes `"`.
For those who are really adventurous, it is also possible to directly
install any branch (e.g. `@development`) with any extras
(e.g. `#egg=peptdeep[stable,development-stable]`) from GitHub with e.g.
``` bash
pip install "git+https://github.com/MannLabs/alphapeptdeep.git@development#egg=peptdeep[stable,development-stable]"
```
### Use GPU
To enable GPU support, the GPU version of PyTorch is required. It can be installed
with:
``` bash
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade
```
Note that the CUDA build (here `cu116`) must be supported by your NVIDIA driver version. Run the following command
to check your NVIDIA driver:
``` bash
nvidia-smi
```
For the latest PyTorch version and the matching install command, see [pytorch.org](https://pytorch.org/get-started/locally/).
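To confirm that the installed PyTorch build can actually use the GPU, a quick check from Python may help. The sketch below relies only on standard PyTorch calls (`torch.cuda.is_available()` and, on Apple silicon, `torch.backends.mps.is_available()`). The helper name `detect_torch_device` is hypothetical, and its return values simply mirror the `device_type_choices` (`gpu`, `mps`, `cpu`) of the settings file; it is a diagnostic aid, not how peptdeep itself selects a device.

```python
import importlib.util


def detect_torch_device() -> str:
    """Report which accelerator the installed PyTorch build can use.

    Hypothetical helper: the return values mirror the device_type_choices
    (gpu, mps, cpu) listed in peptdeep's settings file.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    if torch.cuda.is_available():
        return "gpu"  # a CUDA-enabled build found a usable NVIDIA GPU
    mps_backend = getattr(torch.backends, "mps", None)  # absent in old builds
    if mps_backend is not None and mps_backend.is_available():
        return "mps"  # Apple silicon GPU
    return "cpu"


if __name__ == "__main__":
    print(detect_torch_device())
```

If this reports `cpu` on a machine with an NVIDIA GPU, the CPU-only PyTorch wheel is most likely installed, and re-running the install command above is a reasonable next step.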
### Developer
peptdeep can also be installed in editable (i.e. developer) mode with a
few `bash` commands. This allows you to fully customize the software and
even modify the source code to your specific needs. When an editable
Python package is installed, its source code is stored in a transparent
location of your choice. While optional, it is advised to first (create
and) navigate to e.g. a general software folder:
``` bash
mkdir -p ~/alphapeptdeep/project/folder
cd ~/alphapeptdeep/project/folder
```
***The following commands assume you do not perform any additional `cd`
commands anymore***.
Next, download the peptdeep repository from GitHub either directly or
with a `git` command. This creates a new `alphapeptdeep` subfolder in your
current directory.
``` bash
git clone https://github.com/MannLabs/alphapeptdeep.git
```
For any Python package, it is highly recommended to use a separate
[conda virtual environment](https://docs.conda.io/en/latest/), as
otherwise *dependency conflicts can occur with already existing
packages*.
``` bash
conda create --name peptdeep python=3.9 -y
conda activate peptdeep
```
Finally, peptdeep and all its [dependencies](requirements) need to be
installed. To take advantage of all features and allow development (with
the `-e` flag), this is best done by also installing the [development
dependencies](requirements/requirements_development.txt) instead of only
the [core dependencies](requirements/requirements.txt):
``` bash
pip install -e "./alphapeptdeep[development]"
```
By default this installs loose dependencies (no explicit versioning),
although it is also possible to use stable dependencies
(e.g. `pip install -e "./alphapeptdeep[stable,development-stable]"`).
***By using the editable flag `-e`, all modifications to the [peptdeep
source code folder](peptdeep) are directly reflected when running
peptdeep. Note that the peptdeep folder cannot be moved and/or renamed
if an editable version is installed. In case of confusion, you can
always retrieve the location of any Python module with e.g. the command
`import module` followed by `module.__file__`.***
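As an alternative that does not execute the package on import, the standard library's `importlib.util.find_spec` can report where a module would be loaded from. This is a small sketch; the helper name `module_location` is hypothetical and not part of peptdeep.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the folder a top-level module would be loaded from, or None.

    Hypothetical helper; find_spec locates the module without importing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None  # not installed, built-in, or a namespace package
    return Path(spec.origin).parent


if __name__ == "__main__":
    # For an editable install, this should point into the cloned alphapeptdeep folder
    print(module_location("peptdeep"))
```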
------------------------------------------------------------------------
## Usage
There are three ways to use peptdeep:
- [**GUI**](#gui)
- [**CLI**](#cli)
- [**Python**](#python-and-jupyter-notebooks)
NOTE: The first time you use a fresh installation of peptdeep, it is
often quite slow because some functions might still need compilation on
your local operating system and architecture. Subsequent use should be a
lot faster.
### GUI
If the GUI was not installed through a one-click GUI installer, it can
be launched with the following `bash` command:
``` bash
peptdeep gui
```
This command will start a web server and automatically open the default
browser:
![](https://user-images.githubusercontent.com/4646029/189301730-ac1f92cc-0e9d-4ba3-be1d-07c4d66032cd.jpg)
There are several options in the GUI (left panel):
- Server: Start/stop the task server, check tasks in the task queue
- Settings: Configure common settings, load/save current settings
- Model: Configure DL models for prediction or transfer learning
- Transfer: Refine the models
- Library: Predict a library
- Rescore: Perform ML feature extraction and Percolator rescoring
------------------------------------------------------------------------
### CLI
The CLI can be run with the following command (after activating the
`conda` environment with `conda activate peptdeep` or if an alias was
set to the peptdeep executable):
``` bash
peptdeep -h
```
Help for each command and its (required) parameters is available via
the `-h` flag. AlphaPeptDeep provides several commands for different
tasks:
- [**export-settings**](#export-settings)
- [**cmd-flow**](#cmd-flow)
- [**library**](#library)
- [**transfer**](#transfer)
- [**rescore**](#rescore)
- [**install-models**](#install-models)
- [**gui**](#gui)
To check the usage of a command, run:
``` bash
peptdeep $command -h
```
For example:
``` bash
peptdeep library -h
```
#### export-settings
``` bash
peptdeep export-settings C:/path/to/settings.yaml
```
This command exports the default settings into `settings.yaml` as a
template; users can then edit the yaml file to run other commands.
Here is a section of the yaml file which controls global parameters for
different tasks:
```
model_url: "https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip"
task_type: library
task_type_choices:
  - library
  - train
  - rescore
thread_num: 8
torch_device:
  device_type: gpu
  device_type_choices:
    - gpu
    - mps
    - cpu
  device_ids: []
log_level: info
log_level_choices:
  - debug
  - info
  - warning
  - error
  - critical
common:
  modloss_importance_level: 1.0
  user_defined_modifications: {}
  # For example,
  # user_defined_modifications:
  #   "Dimethyl2@Any_N-term":
  #     composition: "H(2)2H(2)C(2)"
  #     modloss_composition: "H(0)" # can be without if no modloss
  #   "Dimethyl2@K":
  #     composition: "H(2)2H(2)C(2)"
  #   "Dimethyl6@Any_N-term":
  #     composition: "2H(4)13C(2)"
  #   "Dimethyl6@K":
  #     composition: "2H(4)13C(2)"
peak_matching:
  ms2_ppm: True
  ms2_tol_value: 20.0
  ms1_ppm: True
  ms1_tol_value: 20.0
model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: True
  model_type: generic
  model_choices:
    - generic
    - phos
    - hla # same as generic
    - digly
  external_ms2_model: ''
  external_rt_model: ''
  external_ccs_model: ''
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos # not important
    Elite: Lumos # not important
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: True
    multiprocessing: True
```
The `model_mgr` section in the yaml defines the common settings for
MS2/RT/CCS prediction.
------------------------------------------------------------------------
#### cmd-flow
``` bash
peptdeep cmd-flow ...
```
This command exposes `global_settings` as CLI parameters for CLI users. It supports three workflows: `train`, `library`, or `train library`, selected by the CLI parameter `--task_workflow`, for example `--task_workflow train library`. All settings in [global_settings](peptdeep/constants/default_settings.yaml) are converted to CLI parameters using `--` as the dict level indicator; for example, `global_settings["library"]["var_mods"]` corresponds to `--library--var_mods`. See [test_cmd_flow.sh](tests/test_cmd_flow.sh) for an example.
There are three kinds of parameter types:
1. value type (int, float, bool, str): The CLI parameter has a single value, for instance `--model_mgr--default_nce 30.0`.
2. list type (list): The CLI parameter has a list of values separated by spaces, for instance `--library--var_mods "Oxidation@M" "Acetyl@Protein_N-term"`.
3. dict type (dict): Only three parameters are of dict type: `--library--labeling_channels`, `--model_mgr--transfer--psm_modification_mapping`, and `--common--user_defined_modifications`. Examples:
   - `--library--labeling_channels`: labeling channels for the library. Example: `--library--labeling_channels "0:Dimethyl@Any_N-term;Dimethyl@K" "4:xx@Any_N-term;xx@K"`
   - `--model_mgr--transfer--psm_modification_mapping`: converts other search engines' modification names to alphabase modifications for transfer learning. Example: `--model_mgr--transfer--psm_modification_mapping "Dimethyl@Any_N-term:_(Dimethyl-n-0);_(Dimethyl)" "Dimethyl@K:K(Dimethyl-K-0);K(Dimethyl)"`. Note that the `X(UniMod:id)` format can be recognized directly by alphabase.
   - `--common--user_defined_modifications`: user-defined modifications. Example: `--common--user_defined_modifications "NewMod1@Any_N-term:H(2)2H(2)C(2)" "NewMod2@K:H(100)O(2)C(2)"`
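The naming rule above (`--` as the dict level indicator) can be illustrated with a small, hypothetical helper that flattens a nested settings dict into cmd-flow style arguments. It is a sketch of the convention only, not peptdeep's own argument parser: it covers the value and list types and deliberately skips the three dict-type parameters, whose string encoding is shown in the examples above.

```python
import shlex
from typing import Any, Dict, List

# The three dict-type parameters use a special "key:value" string encoding
# (see the examples above), so this sketch leaves them out.
DICT_TYPE_PARAMS = {
    "--library--labeling_channels",
    "--model_mgr--transfer--psm_modification_mapping",
    "--common--user_defined_modifications",
}


def settings_to_cmd_flow_args(settings: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested settings into `--level1--level2 value` tokens (hypothetical helper)."""
    args: List[str] = []
    for key, value in settings.items():
        flag = f"{prefix}--{key}"
        if flag in DICT_TYPE_PARAMS:
            continue
        if isinstance(value, dict):
            # a nested section: recurse, extending the flag by one more '--' level
            args.extend(settings_to_cmd_flow_args(value, flag))
        elif isinstance(value, list):
            # list type: one flag followed by space-separated values
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            # value type (int, float, bool, str): exactly one value
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    example = {
        "model_mgr": {"default_nce": 30.0},
        "library": {"var_mods": ["Oxidation@M", "Acetyl@Protein_N-term"]},
    }
    tokens = ["peptdeep", "cmd-flow", "--task_workflow", "library"]
    print(shlex.join(tokens + settings_to_cmd_flow_args(example)))
```

Only the settings you want to override need to be passed; anything not given on the command line presumably keeps its value from `global_settings`.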
#### library
``` bash
peptdeep library settings_yaml
```
This command predicts a spectral library for the given settings_yaml
file (exported by [export-settings](#export-settings)). All the
essential settings are in the `library` section of the settings_yaml
file:
```
library:
  infile_type: fasta
  infile_type_choices:
    - fasta
    - sequence_table
    - peptide_table # sequence with mods and mod_sites
    - precursor_table # peptide with charge state
  infiles:
    - xxx.fasta
  fasta:
    protease: 'trypsin'
    protease_choices:
      - 'trypsin'
      - '([KR])'
      - 'trypsin_not_P'
      - '([KR](?=[^P]))'
      - 'lys-c'
      - 'K'
      - 'lys-n'
      - '\w(?=K)'
      - 'chymotrypsin'
      - 'asp-n'
      - 'glu-c'
    max_miss_cleave: 2
    add_contaminants: False
  fix_mods:
    - Carbamidomethyl@C
  var_mods:
    - Acetyl@Protein_N-term
    - Oxidation@M
  special_mods: [] # normally for Phospho or GlyGly@K
  special_mods_cannot_modify_pep_n_term: False
  special_mods_cannot_modify_pep_c_term: False
  labeling_channels: {}
  # For example,
  # labeling_channels:
  #   0: ['Dimethyl@Any_N-term','Dimethyl@K']
  #   4: ['Dimethyl:2H(2)@Any_N-term','Dimethyl:2H(2)@K']
  #   8: [...]
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 4
  min_peptide_len: 7
  max_peptide_len: 35
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: pseudo_reverse
  decoy_choices:
    - pseudo_reverse
    - diann
    - None
  max_frag_charge: 2
  frag_types:
    - b
    - y
  rt_to_irt: True
  generate_precursor_isotope: False
  output_folder: "{PEPTDEEP_HOME}/spec_libs"
  output_tsv:
    enabled: False
    min_fragment_mz: 200
    max_fragment_mz: 2000
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 1000000
    translate_mod_to_unimod_id: False
```
peptdeep loads sequence data for library prediction based on
`library:infile_type` and `library:infiles`.
`library:infiles` lists the input files, and `library:infile_type`
(one of `library:infile_type_choices`) specifies their format:
- fasta: Protein fasta files, peptdeep will digest the protein sequences
into peptide sequences.
- [sequence_table](#sequence_table): Tab/comma-delimited txt/tsv/csv
(text) files which contain the column `sequence` for peptide
sequences.
- [peptide_table](#peptide_table): Tab/comma-delimited txt/tsv/csv
(text) files which contain the columns `sequence`, `mods`, and
`mod_sites`. peptdeep will not add modifications for peptides of this
file type.
- [precursor_table](#precursor_table): Tab/comma-delimited txt/tsv/csv
(text) files which contain the columns `sequence`, `mods`,
`mod_sites`, and `charge`. peptdeep will not add modifications and
charge states for peptides of this file type.
See examples:
``` python
import pandas as pd
df = pd.DataFrame({
'sequence': ['ACDEFGHIK','LMNPQRSTVK','WYVSTR'],
'mods': ['Carbamidomethyl@C','Acetyl@Protein_N-term;Phospho@S',''],
'mod_sites': ['2','0;7',''],
'charge': [2,3,1],
})
```
##### sequence_table
``` python
df[['sequence']]
```
| | sequence |
| --- | --- |
| 0 | ACDEFGHIK |
| 1 | LMNPQRSTVK |
| 2 | WYVSTR |
##### peptide_table
``` python
df[['sequence','mods','mod_sites']]
```
| | sequence | mods | mod_sites |
| --- | --- | --- | --- |
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 |
| 2 | WYVSTR | | |
##### precursor_table
``` python
df
```
| | sequence | mods | mod_sites | charge |
| --- | --- | --- | --- | --- |
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 | 3 |
| 2 | WYVSTR | | | 1 |
> The columns `proteins` and `genes` are optional for these txt/tsv/csv
> files.
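In the tables above, `mods` and `mod_sites` are parallel, `;`-separated lists. Judging from the example rows, sites appear to be 1-based residue positions, with `0` marking the peptide N-terminus (e.g. `Phospho@S` at site 7 of `LMNPQRSTVK`). The hypothetical helper below pairs each modification with its site and checks residue-specific ones under that assumption; it is a pre-flight sanity check for hand-made tables, not a peptdeep or alphabase function.

```python
from typing import List, Tuple


def parse_mods(sequence: str, mods: str, mod_sites: str) -> List[Tuple[int, str]]:
    """Pair each modification with its site (hypothetical helper).

    Assumes 1-based residue positions and site 0 for the peptide N-terminus,
    as in the example table. Terminal mods (e.g. '@Protein_N-term') are not checked.
    """
    if not mods:
        return []  # unmodified peptide: both columns are empty strings
    names = mods.split(";")
    sites = [int(s) for s in mod_sites.split(";")]
    if len(names) != len(sites):
        raise ValueError(f"{len(names)} mods but {len(sites)} sites for {sequence}")
    pairs = []
    for name, site in zip(names, sites):
        target = name.split("@")[-1]  # e.g. 'C', 'S', or 'Protein_N-term'
        if len(target) == 1:
            # residue-specific mod: the residue at this 1-based site must match
            if not 1 <= site <= len(sequence) or sequence[site - 1] != target:
                raise ValueError(f"{name} cannot sit at position {site} of {sequence}")
        pairs.append((site, name))
    return pairs


if __name__ == "__main__":
    print(parse_mods("LMNPQRSTVK", "Acetyl@Protein_N-term;Phospho@S", "0;7"))
```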
peptdeep supports multiple files for library prediction, for example (in
the yaml file):
```
library:
  ...
  infile_type: fasta
  infiles:
    - /path/to/fasta/human.fasta
    - /path/to/fasta/yeast.fasta
  ...
```
The library in HDF5 (.hdf) format will be saved into
`library:output_folder`. If `library:output_tsv:enabled` is True, a TSV
spectral library that can be processed by DIA-NN and Spectronaut will
also be saved into `library:output_folder`.
------------------------------------------------------------------------
#### transfer
``` bash
peptdeep transfer settings_yaml
```
This command will apply transfer learning to refine RT/CCS/MS2 models
based on `model_mgr:transfer:psm_files` and
`model_mgr:transfer:psm_type`. All yaml settings (exported by
[export-settings](#export-settings)) related to this command are:
```
model_mgr:
  transfer:
    model_output_folder: "{PEPTDEEP_HOME}/refined_models"
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: False
    grid_nce_search: False
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument: ['Lumos']
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
      - maxquant
      - diann
      - speclib_tsv
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw
      - mgf
      - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
    # alphabase modification to modifications of other search engines
    # For example,
    # psm_modification_mapping:
    #   Dimethyl@Any_N-term:
    #     - _(Dimethyl-n-0)
    #     - _(Dimethyl)
    #   Dimethyl:2H(2)@K:
    #     - K(Dimethyl-K-2)
    #   ...
```
For DDA data, peptdeep can also extract MS2 intensities for all PSMs from
the spectrum files given in `model_mgr:transfer:ms_files` (of type
`model_mgr:transfer:ms_file_type`). This enables transfer learning of
the MS2 model.
For DIA data, only the RT and CCS (if timsTOF) models will be refined.
For example, in the settings yaml:
```
model_mgr:
  transfer:
    ...
    psm_type: pfind
    psm_files:
      - /path/to/pFind.spectra
      - /path/to/other/pFind.spectra
    ms_file_type: thermo_raw
    ms_files:
      - /path/to/raw1.raw
      - /path/to/raw2.raw
    ...
```
The refined models will be saved in
`model_mgr:transfer:model_output_folder`. After transfer learning, users
can apply the new models by setting `model_mgr:external_ms2_model`,
`model_mgr:external_rt_model` and `model_mgr:external_ccs_model` to
the saved `ms2.pth`, `rt.pth` and `ccs.pth` files in
`model_mgr:transfer:model_output_folder`. This is useful for
sample-specific library prediction.
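When working with a settings dict in Python (for example a loaded settings yaml), this replacement step can be scripted. The sketch below is a hypothetical helper, not part of peptdeep: it assumes the refined models were saved as `ms2.pth`, `rt.pth` and `ccs.pth` in the given folder, and only points a setting at a file that actually exists, since for DIA data only the RT/CCS models may have been refined.

```python
import os
from typing import Any, Dict


def use_refined_models(settings: Dict[str, Any], model_folder: str) -> Dict[str, Any]:
    """Point model_mgr:external_*_model at refined models (hypothetical helper).

    Assumes ms2.pth, rt.pth and ccs.pth were written to model_folder by the
    transfer step; models missing on disk leave the existing setting unchanged.
    """
    model_mgr = settings.setdefault("model_mgr", {})
    for kind in ("ms2", "rt", "ccs"):
        path = os.path.join(model_folder, f"{kind}.pth")
        if os.path.isfile(path):
            model_mgr[f"external_{kind}_model"] = path
    return settings
```

The updated dict can then be written back to the settings yaml or used for the next library prediction.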
------------------------------------------------------------------------
#### rescore
This command applies Percolator to rescore the DDA PSMs listed in
`percolator:input_files:psm_files` (of type
`percolator:input_files:psm_type`). All yaml settings (exported by
[export-settings](#export-settings)) related to this command are:
```
percolator:
  require_model_tuning: True
  raw_num_to_tune: 8
  require_raw_specific_tuning: True
  raw_specific_ms2_tuning: False
  psm_num_per_raw_to_tune: 200
  epoch_per_raw_to_tune: 5
  multiprocessing: True
  top_k_frags_to_calc_spc: 10
  calibrate_frag_mass_error: False
  max_perc_train_sample: 1000000
  min_perc_train_sample: 100
  percolator_backend: sklearn
  percolator_backend_choices:
    - sklearn
    - pytorch
  percolator_model: linear
  percolator_model_choices:
    pytorch_as_backend:
      - linear # not fully tested, performance may be unstable
      - mlp # not implemented yet
    sklearn_as_backend:
      - linear # logistic regression
      - random_forest
  lr_percolator_torch_model: 0.1 # learning rate, only used when percolator_backend==pytorch
  percolator_iter_num: 5 # percolator iteration number
  cv_fold: 1
  fdr: 0.01
  fdr_level: psm
  fdr_level_choices:
    - psm
    - precursor
    - peptide
    - sequence
  use_fdr_for_each_raw: False
  frag_types: ['b_z1','b_z2','y_z1','y_z2']
  input_files:
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw # if alpharaw is installed
      - mgf
      - mzml
    ms_files: []
    other_score_column_mapping:
      alphapept: {}
      pfind:
        raw_score: Raw_Score
      msfragger:
        hyperscore: hyperscore
        nextscore: nextscore
      maxquant: {}
  output_folder: "{PEPTDEEP_HOME}/rescore"
```
Transfer learning will be applied when rescoring if `percolator:require_model_tuning`
is True.
The corresponding MS files (`percolator:input_files:ms_files` and
`percolator:input_files:ms_file_type`) must be provided to extract
experimental fragment intensities.
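For orientation, `fdr: 0.01` with `fdr_level: psm` refers to a false discovery rate threshold on PSMs. The sketch below shows the generic target-decoy q-value estimate (decoys divided by targets above a score threshold), assuming that a higher score means a better match. It is a textbook illustration, not peptdeep's Percolator code, which may differ in details such as tie handling or per-raw FDR (`use_fdr_for_each_raw`).

```python
from typing import List, Sequence


def target_decoy_qvalues(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Generic target-decoy q-values (higher score = better match).

    Illustrative sketch only; not peptdeep's implementation.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = [0.0] * len(scores)
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        # estimated FDR among all PSMs scoring at least this high
        fdrs[i] = n_decoy / max(n_target, 1)
    # q-value: the lowest FDR threshold at which this PSM is still accepted
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return qvals


if __name__ == "__main__":
    q = target_decoy_qvalues([10, 9, 8, 7], [False, False, True, False])
    print([round(v, 3) for v in q])
```

PSMs that are targets with a q-value at or below the `fdr` setting would be kept; at other `fdr_level` values, the same idea is applied after collapsing PSMs to precursors, peptides or sequences.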
------------------------------------------------------------------------
#### install-models
``` bash
peptdeep install-models [--model-file url_or_local_model_zip] --overwrite True
```
When peptdeep is run for the first time, it downloads and installs the models
from [models on github](https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip)
defined by `model_url` in the default yaml settings. This command
updates `pretrained_models.zip` from `--model-file url_or_local_model_zip`.
It is also possible to use other models instead of the pretrained models by setting `model_mgr:external_ms2_model`,
`model_mgr:external_rt_model` and `model_mgr:external_ccs_model`.
------------------------------------------------------------------------
### Python and Jupyter notebooks
Using peptdeep from a Python script or notebook is the most flexible
way to access all of its features.
The following sections introduce several ways to use peptdeep from Python:
- [**global_settings**](#global_settings)
- [**Pipeline APIs**](#pipeline-apis)
- [**ModelManager**](#modelmanager)
- [**Library Prediction**](#library-prediction)
- [**DDA Rescoring**](#dda-rescoring)
- [**HLA Peptide Prediction**](#hla-peptide-prediction)
------------------------------------------------------------------------
#### global_settings
Most of the default parameters and attributes of peptdeep functions and
classes are controlled by `peptdeep.settings.global_settings`, which is a
`dict`.
``` python
from peptdeep.settings import global_settings
```
The default values of `global_settings` are defined in
[default_settings.yaml](https://github.com/MannLabs/alphapeptdeep/blob/main/peptdeep/constants/default_settings.yaml).
#### Pipeline APIs
The pipeline APIs provide the same functionality as the [CLI](#cli),
including [library prediction](#library), [transfer
learning](#transfer), and [rescoring](#rescore).
``` python
from peptdeep.pipeline_api import (
generate_library,
transfer_learn,
rescore,
)
```
All these functions take a `settings_dict` as input; its structure is the same as that of the settings yaml file. See the documentation of `generate_library`, `transfer_learn`, and `rescore` at https://alphapeptdeep.readthedocs.io/en/latest/module_pipeline_api.html.
#### ModelManager
``` python
from peptdeep.pretrained_models import ModelManager
```
[`ModelManager`](https://alphapeptdeep.readthedocs.io/en/latest/module_pretrained_models.html#peptdeep.pretrained_models.ModelManager) class is the main entry to access MS2/RT/CCS models. It provides functionalities to train/refine the models and then use the new models to predict the data.
Check [tutorial_model_manager.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs/docs/tutorial_model_manager.ipynb) for details.
#### Library Prediction
``` python
from peptdeep.protein.fasta import PredictSpecLibFasta
```
[`PredictSpecLibFasta`](https://alphapeptdeep.readthedocs.io/en/latest/protein/fasta.html#peptdeep.protein.fasta.PredictSpecLibFasta) class provides functionalities to deal with fasta files or protein
sequences and spectral libraries.
Check out
[tutorial_speclib_from_fasta.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/docs/nbs/tutorial_speclib_from_fasta.ipynb)
for details.
#### DDA Rescoring
``` python
from peptdeep.rescore.percolator import Percolator
```
The `Percolator` class provides functionality to rescore DDA PSMs searched by `pFind` and
`AlphaPept` (and `MaxQuant` if the output FDR is set to 100%), …
Check out [test_percolator.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs_tests/test_percolator.ipynb)
for details.
#### HLA Peptide Prediction
``` python
from peptdeep.model.model_interface import ModelInterface
import peptdeep.model.generic_property_prediction # model shop
```
Building new DL models for peptide property prediction is one of the key features of AlphaPeptDeep. The key functionalities are [`ModelInterface`](https://alphapeptdeep.readthedocs.io/en/latest/model/model_interface.html#peptdeep.model.model_interface.ModelInterface) and the pre-designed models and model interfaces in the model shop (module [`peptdeep.model.generic_property_prediction`](https://alphapeptdeep.readthedocs.io/en/latest/model/generic_property_prediction.html)).
For example, one can build an HLA classifier that distinguishes HLA peptides from non-HLA peptides; see https://github.com/MannLabs/PeptDeep-HLA for details.
------------------------------------------------------------------------
## Troubleshooting
In case of issues, check out the following:
- [Issues](https://github.com/MannLabs/alphapeptdeep/issues). Try a few
different search terms to find out if a similar problem has been
encountered before.
- [Discussions](https://github.com/MannLabs/alphapeptdeep/discussions).
Check if your problem or feature request has been discussed before.
------------------------------------------------------------------------
## How to contribute
If you like this software, you can give us a
[star](https://github.com/MannLabs/alphapeptdeep/stargazers) to boost
our visibility! All direct contributions are also welcome. Feel free to
post a new [issue](https://github.com/MannLabs/alphapeptdeep/issues) or
clone the repository and create a [pull
request](https://github.com/MannLabs/alphapeptdeep/pulls) with a new
branch. For an even more interactive participation, check out the
[discussions](https://github.com/MannLabs/alphapeptdeep/discussions) and
the [Contributors License Agreement](misc/CLA.md).
### Notes for developers
#### Tagging of changes
In order to have release notes automatically generated, changes need to be tagged with labels.
The following labels are used (they should be self-explanatory):
`breaking-change`, `bug`, `enhancement`.
#### Release a new version
This package uses a shared release process defined in the
[alphashared](https://github.com/MannLabs/alphashared) repository. Please see the instructions
[there](https://github.com/MannLabs/alphashared/blob/reusable-release-workflow/.github/workflows/README.md#release-a-new-version).
#### pre-commit hooks
It is highly recommended to use the provided pre-commit hooks, as the CI pipeline enforces all checks therein to
pass in order to merge a branch.
The hooks need to be installed once by
```bash
pre-commit install
```
You can run the checks yourself using:
```bash
pre-commit run --all-files
```
------------------------------------------------------------------------
## Changelog
See [HISTORY.md](HISTORY.md) for a full overview of the changes made
in each version.
Raw data
{
"_id": null,
"home_page": "https://github.com/MannLabs/peptdeep",
"name": "peptdeep",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "deep learning, proteomics, AlphaX ecosystem",
"author": "Mann Labs",
"author_email": "jalew.zwf@qq.com",
"download_url": "https://files.pythonhosted.org/packages/a7/87/34fce94f522c1feb1e28264ec66228790b57f3e0c281232ac24ab05aba0e/peptdeep-1.3.0.tar.gz",
"platform": null,
"description": "# AlphaPeptDeep (PeptDeep)\n\n[![Default installation and tests](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml)\n[![Publish on PyPi and release on GitHub](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_and_release.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_and_release.yml)\n[![Documentation Status](https://readthedocs.org/projects/alphapeptdeep/badge/?version=latest)](https://alphapeptdeep.readthedocs.io/en/latest/?badge=latest)\n[![pypi](https://img.shields.io/pypi/v/peptdeep)](https://pypi.org/project/peptdeep)\n[![GitHub release](https://img.shields.io/github/v/release/mannlabs/alphapeptdeep?display_name=tag)](https://github.com/MannLabs/alphapeptdeep/releases)\n[![GitHub downloads](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/total?label=github%20downloads)](https://github.com/MannLabs/alphapeptdeep/releases)\n[![Downloads@pre-train-models](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/pre-trained-models/total)](https://github.com/MannLabs/alphapeptdeep/releases/tag/pre-trained-models)\n[![pip downloads](https://img.shields.io/pypi/dm/peptdeep?color=blue&label=pip%20downloads)](https://pypi.org/project/peptdeep)\n![Python](https://img.shields.io/pypi/pyversions/peptdeep)\n\n- [**About**](#about)\n- [**License**](#license)\n- [**Installation**](#installation)\n - [**One-click GUI**](#one-click-gui)\n - [**Pip installer**](#pip)\n - [**Use GPU**](#use-gpu)\n - [**Developer installer**](#developer)\n- [**Usage**](#usage)\n - [**GUI**](#gui)\n - [**CLI**](#cli)\n - [**Python and jupyter notebooks**](#python-and-jupyter-notebooks)\n- [**Troubleshooting**](#troubleshooting)\n- [**Citations**](#citations)\n- [**How to contribute**](#how-to-contribute)\n- 
[**Changelog**](#changelog)\n\n------------------------------------------------------------------------\n\n## About\n\nAlphaPeptDeep (`peptdeep` for short) aims to easily build new deep\nlearning models for shotgun proteomics studies. Transfer learning is\nalso easy to apply using AlphaPeptDeep.\n\nIt contains some built-in models such as retention time (RT), collision\ncross section (CCS), and tandem mass spectrum (MS2) prediction for given\npeptides. With these models, one can easily generate a predicted library\nfrom fasta files.\n\nFor details, check out our [publications](#citations).\n\nFor documentation, see [readthedocs](https://alphapeptdeep.readthedocs.io/en/latest/).\n\n### AlphaX repositories:\n\n- [**alphabase**](https://github.com/MannLabs/alphabase): Infrastructure for AlphaX Ecosystem\n- [**alphapept**](https://github.com/MannLabs/alphapept): DDA search\n engine\n- [**alphapeptdeep**](https://github.com/MannLabs/alphapeptdeep): Deep\n learning for proteomics\n- [**alpharaw**](https://github.com/MannLabs/alpharaw): Raw data\n accessing\n- [**alphaviz**](https://github.com/MannLabs/alphaviz): MS data and\n result visualization\n- [**alphatims**](https://github.com/MannLabs/alphatims): timsTOF data\n accessing\n\n### Subsequent projects of AlphaPeptDeep\n\n- [**peptdeep_hla**](https://github.com/MannLabs/PeptDeep-HLA): the DL model that predict if a peptide is presented by indivudual HLA or not.\n\n### Other pre-trained MS2/RT/CCS models\n\n- [**Dimethyl**](https://github.com/MannLabs/alphapeptdeep/releases/tag/dimethyl-models): the MS2/RT/CCS models for Dimethyl-labeled peptides.\n\n------------------------------------------------------------------------\n\n## Citations\n\nWen-Feng Zeng, Xie-Xuan Zhou, Sander Willems, Constantin Ammar, Maria Wahle, Isabell Bludau, Eugenia Voytik, Maximillian T. Strauss & Matthias Mann. AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics. Nat Commun 13, 7238 (2022). 
https://doi.org/10.1038/s41467-022-34904-3\n\n\n------------------------------------------------------------------------\n\n## License\n\nAlphaPeptDeep was developed by the [Mann Labs at the Max Planck\nInstitute of Biochemistry](https://www.biochem.mpg.de/mann) and the\n[University of\nCopenhagen](https://www.cpr.ku.dk/research/proteomics/mann/) and is\nfreely available with an [Apache License](LICENSE.txt). External Python\npackages (available in the [requirements](requirements) folder) have\ntheir own licenses, which can be consulted on their respective websites.\n\n------------------------------------------------------------------------\n\n## Installation\n\nAlphaPeptDeep can be installed and used on all major operating systems\n(Windows, macOS and Linux).\n\nThere are three different types of installation possible:\n\n- [**One-click GUI installer:**](#one-click-gui) Choose this\n installation if you only want the GUI and/or keep things as simple as\n possible.\n- [**Pip installer:**](#pip) Choose this installation if you want to use peptdeep as a Python package in an existing Python (recommended Python 3.8 or 3.9) environment (e.g.\u00a0a Jupyter notebook). If needed, the GUI and CLI\n can be installed with pip as well.\n- [**Developer installer:**](#developer) Choose this installation if you\n are familiar with CLI tools, [conda](https://docs.conda.io/en/latest/)\n and Python. This installation allows access to all available features\n of peptdeep and even allows to modify its source code directly.\n Generally, the developer version of peptdeep outperforms the\n precompiled versions which makes this the installation of choice for\n high-throughput experiments.\n\n### One-click GUI\n\nThe GUI of peptdeep is a completely stand-alone tool that requires no\nknowledge of Python or CLI tools. 
Click on one of the links below to\ndownload the latest release for:\n\n- [**Windows**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-windows-amd64.exe)\n- [**macOS**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-macos-darwin-x64.pkg)\n- [**macOS ARM**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-macos-darwin-arm64.pkg )\n- [**Linux**](https://github.com/MannLabs/alphapeptdeep/releases/latest/download/peptdeep-1.3.0-linux-x64.deb)\n\nOlder releases remain available on the [release\npage](https://github.com/MannLabs/alphapeptdeep/releases), but no\nbackwards compatibility is guaranteed.\n\nNote that, as GitHub does not allow large release files, these installers do not have GPU support.\nTo create GPU version installers: clone the source code, install the GPU-version of pytorch [see here](#use-gpu),\nand then use the `build_installer_*.sh` and `build_package_*.sh`\nscript in the respective `release/[macos, linux, windows]` folder to build the installer locally.\nFor Linux you need to additionally pass the \"GPU\" flag, i.e. run\n```bash\nrelease/linux/build_installer_linux.sh GPU\nrelease/linux/build_package_linux.sh\n```\n\n### Pip\n\n> PythonNET must be installed to access Thermo or Sciex raw data.\n>\n> *Legacy, should be replaced by AlphaRaw in the near future.*\n>\n> #### PythonNET in Windows\n>\n> Automatically installed for Windows.\n>\n> #### PythonNET in Linux\n>\n> 1. Install Mono from mono-project website [Mono\n> Linux](https://www.mono-project.com/download/stable/#download-lin).\n> NOTE, the installed mono version should be at least 6.10, which\n> requires you to add the ppa to your trusted sources!\n> 2. Install PythonNET with `pip install pythonnet`.\n>\n> #### PythonNET in MacOS\n>\n> 1. Install [brew](https://brew.sh) and pkg-config:\n> `brew install pkg-config` 3. 
Install Mono from mono-project\n> website [Mono Mac](https://www.mono-project.com/download/stable/)\n> 2. Register the Mono-Path to your system: For macOS Catalina, open\n> the configuration of zsh via the terminal:\n>\n> - Type `nano ~/.zshrc` to open the configuration of the terminal\n> - Append the mono path to your `PKG_CONFIG_PATH`:\n> `export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/Library/Frameworks/Mono.framework/Versions/Current/lib/pkgconfig:$PKG_CONFIG_PATH`.\n> - Save everything and execute `. ~/.zshrc`\n>\n> 3. Install PythonNET with `pip install pythonnet`.\n\npeptdeep can be installed in an existing Python environment with a\nsingle `bash` command. *This `bash` command can also be run directly\nfrom within a Jupyter notebook by prepending it with a `!`*:\n\n``` bash\npip install peptdeep\n```\n\nInstalling peptdeep like this avoids conflicts when integrating it in\nother tools, as this does not enforce strict versioning of dependancies.\nHowever, if new versions of dependancies are released, they are not\nguaranteed to be fully compatible with peptdeep. This should only occur\nin rare cases where dependencies are not backwards compatible.\n\n> **TODO** You can always force peptdeep to use dependancy versions\n> which are known to be compatible with:\n>\n> ``` bash\n> pip install \"peptdeep[stable]\"\n> ```\n>\n> NOTE: You might need to run `pip install pip` before installing\n> peptdeep like this. 
Also note the double quotes `\"`.\n\nFor those who are really adventurous, it is also possible to directly\ninstall any branch (e.g.\u00a0`@development`) with any extras\n(e.g.\u00a0`#egg=peptdeep[stable,development-stable]`) from GitHub with e.g.\n\n``` bash\npip install \"git+https://github.com/MannLabs/alphapeptdeep.git@development#egg=peptdeep[stable,development-stable]\"\n```\n\n### Use GPU\n\nTo enable GPU, GPU version of PyTorch is required, it can be installed\nwith:\n\n``` bash\npip install torch --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade\n```\n\nNote that this may depend on your NVIDIA driver version. Run the command\nto check your NVIDIA driver:\n\n``` bash\nnvidia-smi\n```\n\nFor latest pytorch version, see [pytorch.org](https://pytorch.org/get-started/locally/).\n\n### Developer\n\npeptdeep can also be installed in editable (i.e.\u00a0developer) mode with a\nfew `bash` commands. This allows to fully customize the software and\neven modify the source code to your specific needs. When an editable\nPython package is installed, its source code is stored in a transparent\nlocation of your choice. While optional, it is advised to first (create\nand) navigate to e.g.\u00a0a general software folder:\n\n``` bash\nmkdir ~/alphapeptdeep/project/folder\ncd ~/alphapeptdeep/project/folder\n```\n\n***The following commands assume you do not perform any additional `cd`\ncommands anymore***.\n\nNext, download the peptdeep repository from GitHub either directly or\nwith a `git` command. 
This creates a new peptdeep subfolder in your\ncurrent directory.\n\n``` bash\ngit clone https://github.com/MannLabs/alphapeptdeep.git\n```\n\nFor any Python package, it is highly recommended to use a separate\n[conda virtual environment](https://docs.conda.io/en/latest/), as\notherwise *dependancy conflicts can occur with already existing\npackages*.\n\n``` bash\nconda create --name peptdeep python=3.9 -y\nconda activate peptdeep\n```\n\nFinally, peptdeep and all its [dependancies](requirements) need to be\ninstalled. To take advantage of all features and allow development (with\nthe `-e` flag), this is best done by also installing the [development\ndependencies](requirements/requirements_development.txt) instead of only\nthe [core dependencies](requirements/requirements.txt):\n\n``` bash\npip install -e \".[development]\"\n```\n\nBy default this installs loose dependancies (no explicit versioning),\nalthough it is also possible to use stable dependencies\n(e.g.\u00a0`pip install -e \".[stable,development-stable]\"`).\n\n***By using the editable flag `-e`, all modifications to the [peptdeep\nsource code folder](peptdeep) are directly reflected when running\npeptdeep. Note that the peptdeep folder cannot be moved and/or renamed\nif an editable version is installed. In case of confusion, you can\nalways retrieve the location of any Python module with e.g.\u00a0the command\n`import module` followed by `module.__file__`.***\n\n------------------------------------------------------------------------\n\n## Usage\n\nThere are three ways to use peptdeep:\n\n- [**GUI**](#gui)\n- [**CLI**](#cli)\n- [**Python**](#python-and-jupyter-notebooks)\n\nNOTE: The first time you use a fresh installation of peptdeep, it is\noften quite slow because some functions might still need compilation on\nyour local operating system and architecture. 
Subsequent use should be a\nlot faster.\n\n### GUI\n\nIf the GUI was not installed through a one-click GUI installer, it can\nbe launched with the following `bash` command:\n\n``` bash\npeptdeep gui\n```\n\nThis command will start a web server and automatically open the default\nbrowser:\n![](https://user-images.githubusercontent.com/4646029/189301730-ac1f92cc-0e9d-4ba3-be1d-07c4d66032cd.jpg)\n\nThere are several options in the GUI (left panel):\n\n- Server: Start/stop the task server, check tasks in the task queue\n- Settings: Configure common settings, load/save current settings\n- Model: Configure DL models for prediction or transfer learning\n- Transfer: Refine the models\n- Library: Predict a library\n- Rescore: Perform ML feature extraction and Percolator\n\n------------------------------------------------------------------------\n\n### CLI\n\nThe CLI can be run with the following command (after activating the\n`conda` environment with `conda activate peptdeep` or if an alias was\nset to the peptdeep executable):\n\n``` bash\npeptdeep -h\n```\n\nIt is possible to get help about each function and their (required)\nparameters by using the `-h` flag. 
AlphaPeptDeep provides several\ncommands for different tasks:\n\n- [**export-settings**](#export-settings)\n- [**cmd-flow**](#cmd-flow)\n- [**library**](#library)\n- [**transfer**](#transfer)\n- [**rescore**](#rescore)\n- [**install-models**](#install-models)\n- [**gui**](#gui)\n\nRun a command to check usages:\n\n``` bash\npeptdeep $command -h\n```\n\nFor example:\n\n``` bash\npeptdeep library -h\n```\n\n#### export-settings\n\n``` bash\npeptdeep export-settings C:/path/to/settings.yaml\n```\n\nThis command will export the default settings into the `settings.yaml`\nas a template, users can edit the yaml file to run other commands.\n\nHere is a section of the yaml file which controls global parameters for\ndifferent tasks:\n\n```\nmodel_url: \"https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip\"\n\ntask_type: library\ntask_type_choices:\n - library\n - train\n - rescore\nthread_num: 8\ntorch_device:\n device_type: gpu\n device_type_choices:\n - gpu\n - mps\n - cpu\n device_ids: []\n\nlog_level: info\nlog_level_choices:\n - debug\n - info\n - warning\n - error\n - critical\n\ncommon:\n modloss_importance_level: 1.0\n user_defined_modifications: {}\n # For example,\n # user_defined_modifications:\n # \"Dimethyl2@Any_N-term\":\n # composition: \"H(2)2H(2)C(2)\"\n # modloss_composition: \"H(0)\" # can be without if no modloss\n # \"Dimethyl2@K\":\n # composition: \"H(2)2H(2)C(2)\"\n # \"Dimethyl6@Any_N-term\":\n # composition: \"2H(4)13C(2)\"\n # \"Dimethyl6@K\":\n # composition: \"2H(4)13C(2)\"\n\npeak_matching:\n ms2_ppm: True\n ms2_tol_value: 20.0\n ms1_ppm: True\n ms1_tol_value: 20.0\n\nmodel_mgr:\n default_nce: 30.0\n default_instrument: Lumos\n mask_modloss: True\n model_type: generic\n model_choices:\n - generic\n - phos\n - hla # same as generic\n - digly\n external_ms2_model: ''\n external_rt_model: ''\n external_ccs_model: ''\n instrument_group:\n ThermoTOF: ThermoTOF\n Astral: ThermoTOF\n Lumos: Lumos\n QE: 
QE\n timsTOF: timsTOF\n SciexTOF: SciexTOF\n Fusion: Lumos\n Eclipse: Lumos\n Velos: Lumos # not important\n Elite: Lumos # not important\n OrbitrapTribrid: Lumos\n ThermoTribrid: Lumos\n QE+: QE\n QEHF: QE\n QEHFX: QE\n Exploris: QE\n Exploris480: QE\n predict:\n batch_size_ms2: 512\n batch_size_rt_ccs: 1024\n verbose: True\n multiprocessing: True\n```\n\nThe `model_mgr` section in the yaml defines the common settings for\nMS2/RT/CCS prediction.\n\n------------------------------------------------------------------------\n\n### cmd-flow\n\n``` bash\npeptdeep cmd-flow ...\n```\n\nSupport CLI parameters to control `global_settings` for CLI users. It supports three workflows: `train`, `library` or `train library`, controlled by CLI parameter `--task_workflow`, for example, `--task_workflow train library`. All settings in [global_settings](peptdeep/constants/default_settings.yaml) are converted to CLI parameters using `--` as the dict level indicator, for example, `global_settings[\"library\"][\"var_mods\"]` corresponds to `--library--var_mods`. See [test_cmd_flow.sh](tests/test_cmd_flow.sh) for example.\n\nThere are three kinds of parameter types:\n 1. value type (int, float, bool, str): The CLI parameter only has a single value, for instance: `--model_mgr--default_instrument 30.0`.\n 2. list type (list): The CLI parameter has a list of values seperated by a space, for instance `--library--var_mods \"Oxidation@M\" \"Acetyl@Protein_N-term\"`.\n 3. dict type (dict): Only three parameters are `dict type`, `--library--labeling_channels`, `--model_mgr--transfer--psm_modification_mapping`, and `--common--user_defined_modifications`. Here are the examples:\n - `--library--labeling_channels`: labeling channels for the library. 
Example: `--library--labeling_channels \"0:Dimethyl@Any_N-term;Dimethyl@K\" \"4:xx@Any_N-term;xx@K\"`\n - `--model_mgr--transfer--psm_modification_mapping`: converting other search engines' modification names to alphabase modifications for transfer learning. Example: `--model_mgr--transfer--psm_modification_mapping \"Dimethyl@Any_N-term:_(Dimethyl-n-0);_(Dimethyl)\" \"Dimethyl@K:K(Dimethyl-K-0);K(Dimethyl)\"`. Note that `X(UniMod:id)` format can directly be recognized by alphabase.\n - `--common--user_defined_modification`: user defined modifications. Example:`--common--user_defined_modification \"NewMod1@Any_N-term:H(2)2H(2)C(2)\" \"NewMod2@K:H(100)O(2)C(2)\"`\n\n#### library\n\n``` bash\npeptdeep library settings_yaml\n```\n\nThis command will predict a spectral library for given settings_yaml\nfile (exported by [export-settings](#export-settings)). All the\nessential settings are in the `library` section in the settings_yaml\nfile:\n\n```\nlibrary:\n infile_type: fasta\n infile_type_choices:\n - fasta\n - sequence_table\n - peptide_table # sequence with mods and mod_sites\n - precursor_table # peptide with charge state\n infiles:\n - xxx.fasta\n fasta:\n protease: 'trypsin'\n protease_choices:\n - 'trypsin'\n - '([KR])'\n - 'trypsin_not_P'\n - '([KR](?=[^P]))'\n - 'lys-c'\n - 'K'\n - 'lys-n'\n - '\\w(?=K)'\n - 'chymotrypsin'\n - 'asp-n'\n - 'glu-c'\n max_miss_cleave: 2\n add_contaminants: False\n fix_mods:\n - Carbamidomethyl@C\n var_mods:\n - Acetyl@Protein_N-term\n - Oxidation@M\n special_mods: [] # normally for Phospho or GlyGly@K\n special_mods_cannot_modify_pep_n_term: False\n special_mods_cannot_modify_pep_c_term: False\n labeling_channels: {}\n # For example,\n # labeling_channels:\n # 0: ['Dimethyl@Any_N-term','Dimethyl@K']\n # 4: ['Dimethyl:2H(2)@Any_N-term','Dimethyl:2H(2)@K']\n # 8: [...]\n min_var_mod_num: 0\n max_var_mod_num: 2\n min_special_mod_num: 0\n max_special_mod_num: 1\n min_precursor_charge: 2\n max_precursor_charge: 4\n min_peptide_len: 
7\n max_peptide_len: 35\n min_precursor_mz: 200.0\n max_precursor_mz: 2000.0\n decoy: pseudo_reverse\n decoy_choices:\n - pseudo_reverse\n - diann\n - None\n max_frag_charge: 2\n frag_types:\n - b\n - y\n rt_to_irt: True\n generate_precursor_isotope: False\n output_folder: \"{PEPTDEEP_HOME}/spec_libs\"\n output_tsv:\n enabled: False\n min_fragment_mz: 200\n max_fragment_mz: 2000\n min_relative_intensity: 0.001\n keep_higest_k_peaks: 12\n translate_batch_size: 1000000\n translate_mod_to_unimod_id: False\n```\n\npeptdeep will load sequence data based on `library:infile_type`\nand `library:infiles` for library prediction.\n`library:infiles` contains the list of files with\n`library:infile_type` defined in\n`library:infile_type_choices`:\n\n- fasta: Protein fasta files, peptdeep will digest the protein sequences\n into peptide sequences.\n- [sequence_table](#sequence_table): Tab/comma-delimited txt/tsv/csv\n (text) files which contain the column `sequence` for peptide\n sequences.\n- [peptide_table](#peptide_table): Tab/comma-delimited txt/tsv/csv\n (text) files which contain the columns `sequence`, `mods`, and\n `mod_sites`. peptdeep will not add modifications for peptides of this\n file type.\n- [precursor_table](#precursor_table): Tab/comma-delimited txt/tsv/csv\n (text) files which contain the columns `sequence`, `mods`,\n `mod_sites`, and `charge`. 
peptdeep will not add modifications and\n charge states for peptides of this file type.\n\nSee examples:\n\n``` python\nimport pandas as pd\ndf = pd.DataFrame({\n 'sequence': ['ACDEFGHIK','LMNPQRSTVK','WYVSTR'],\n 'mods': ['Carbamidomethyl@C','Acetyl@Protein_N-term;Phospho@S',''],\n 'mod_sites': ['2','0;7',''],\n 'charge': [2,3,1],\n})\n```\n\n##### sequence_table\n\n``` python\ndf[['sequence']]\n```\n\n| | sequence |\n| --- | --- |\n| 0 | ACDEFGHIK |\n| 1 | LMNPQRSTVK |\n| 2 | WYVSTR |\n\n\n##### peptide_table\n\n``` python\ndf[['sequence','mods','mod_sites']]\n```\n\n| | sequence | mods | mod_sites |\n| --- | --- | --- | --- |\n| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 |\n| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 |\n| 2 | WYVSTR | | |\n\n##### precursor_table\n\n``` python\ndf\n```\n\n| | sequence | mods | mod_sites | charge |\n| --- | --- | --- | --- | --- |\n| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 | 2 |\n| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 | 3 |\n| 2 | WYVSTR | | | 1 |\n\n> Columns of `proteins` and `genes` are optional for these txt/tsv/csv\n> files.\n\npeptdeep supports multiple files for library prediction, for example (in\nthe yaml file):\n\n```\nlibrary:\n ...\n infile_type: fasta\n infiles:\n - /path/to/fasta/human.fasta\n - /path/to/fasta/yeast.fasta\n ...\n```\n\nThe library in HDF5 (.hdf) format will be saved into\n`library:output_folder`. If `library:output_tsv:enabled` is True, a TSV\nspectral library that can be processed by DIA-NN and Spectronaut will\nalso be saved into `library:output_folder`.\n\n------------------------------------------------------------------------\n\n#### transfer\n\n``` bash\npeptdeep transfer settings_yaml\n```\n\nThis command will apply transfer learning to refine RT/CCS/MS2 models\nbased on `model_mgr:transfer:psm_files` and\n`model_mgr:transfer:psm_type`. 
All yaml settings (exported by\n[export-settings](#export-settings)) related to this command are:\n\n```\nmodel_mgr:\n transfer:\n model_output_folder: \"{PEPTDEEP_HOME}/refined_models\"\n epoch_ms2: 20\n warmup_epoch_ms2: 10\n batch_size_ms2: 512\n lr_ms2: 0.0001\n epoch_rt_ccs: 40\n warmup_epoch_rt_ccs: 10\n batch_size_rt_ccs: 1024\n lr_rt_ccs: 0.0001\n verbose: False\n grid_nce_search: False\n grid_nce_first: 15.0\n grid_nce_last: 45.0\n grid_nce_step: 3.0\n grid_instrument: ['Lumos']\n psm_type: alphapept\n psm_type_choices:\n - alphapept\n - pfind\n - maxquant\n - diann\n - speclib_tsv\n psm_files: []\n ms_file_type: alphapept_hdf\n ms_file_type_choices:\n - alphapept_hdf\n - thermo_raw\n - mgf\n - mzml\n ms_files: []\n psm_num_to_train_ms2: 100000000\n psm_num_per_mod_to_train_ms2: 50\n psm_num_to_test_ms2: 0\n psm_num_to_train_rt_ccs: 100000000\n psm_num_per_mod_to_train_rt_ccs: 50\n psm_num_to_test_rt_ccs: 0\n top_n_mods_to_train: 10\n psm_modification_mapping: {}\n # alphabase modification to modifications of other search engines\n # For example,\n # psm_modification_mapping:\n # Dimethyl@Any_N-term:\n # - _(Dimethyl-n-0)\n # - _(Dimethyl)\n # Dimethyl:2H(2)@K:\n # - K(Dimethyl-K-2)\n # ...\n```\nFor DDA data, peptdeep can also extract MS2 intensities from the\nspectrum files from `model_mgr:transfer:ms_files` and\n`model_mgr:transfer:ms_file_type` for all PSMs. This will enable the\ntransfer learning of the MS2 model.\n\nFor DIA data, only RT and CCS (if timsTOF) models will be refined.\n\nFor example of the settings yaml:\n\n```\nmodel_mgr:\n transfer:\n ...\n psm_type: pfind\n psm_files:\n - /path/to/pFind.spectra\n - /path/to/other/pFind.spectra\n\n ms_file_type: thermo_raw\n ms_files:\n - /path/to/raw1.raw\n - /path/to/raw2.raw\n ...\n```\n\nThe refined models will be saved in\n`model_mgr:transfer:model_output_folder`. 
After transfer learning, users\ncan apply the new models by replacing `model_mgr:external_ms2_model`,\n`model_mgr:external_rt_model` and `model_mgr:external_ccs_model` with\nthe saved `ms2.pth`, `rt.pth` and `ccs.pth` in\n`model_mgr:transfer:model_output_folder`. This is useful to perform\nsample-specific library prediction.\n\n------------------------------------------------------------------------\n\n#### rescore\n\nThis command will apply Percolator to rescore DDA PSMs in\n`percolator:input_files:psm_files` and\n`percolator:input_files:psm_type`. All yaml settings (exported by\n[export-settings](#export-settings)) related to this command are:\n\n```\npercolator:\n require_model_tuning: True\n raw_num_to_tune: 8\n\n require_raw_specific_tuning: True\n raw_specific_ms2_tuning: False\n psm_num_per_raw_to_tune: 200\n epoch_per_raw_to_tune: 5\n\n multiprocessing: True\n\n top_k_frags_to_calc_spc: 10\n calibrate_frag_mass_error: False\n max_perc_train_sample: 1000000\n min_perc_train_sample: 100\n\n percolator_backend: sklearn\n percolator_backend_choices:\n - sklearn\n - pytorch\n percolator_model: linear\n percolator_model_choices:\n pytorch_as_backend:\n - linear # not fully tested, performance may be unstable\n - mlp # not implemented yet\n sklearn_as_backend:\n - linear # logistic regression\n - random_forest\n lr_percolator_torch_model: 0.1 # learning rate, only used when percolator_backend==pytorch\n percolator_iter_num: 5 # percolator iteration number\n cv_fold: 1\n fdr: 0.01\n fdr_level: psm\n fdr_level_choices:\n - psm\n - precursor\n - peptide\n - sequence\n use_fdr_for_each_raw: False\n frag_types: ['b_z1','b_z2','y_z1','y_z2']\n input_files:\n psm_type: alphapept\n psm_type_choices:\n - alphapept\n - pfind\n psm_files: []\n ms_file_type: alphapept_hdf\n ms_file_type_choices:\n - alphapept_hdf\n - thermo_raw # if alpharaw is installed\n - mgf\n - mzml\n ms_files: []\n other_score_column_mapping:\n alphapept: {}\n pfind:\n raw_score: Raw_Score\n 
msfragger:\n hyperscore: hyperscore\n nextscore: nextscore\n maxquant: {}\n output_folder: \"{PEPTDEEP_HOME}/rescore\"\n```\n\nTransfer learning will be applied when rescoring if `percolator:require_model_tuning`\nis True.\n\nThe corresponding MS files (`percolator:input_files:ms_files` and\n`percolator:input_files:ms_file_type`) must be provided to extract\nexperimental fragment intensities.\n\n------------------------------------------------------------------------\n\n#### install-models\n\n``` bash\npeptdeep install-models [--model-file url_or_local_model_zip] --overwrite True\n```\n\nRunning peptdeep for the first time, it will download and install models\nfrom [models on github](https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip)\ndefined in \u2018model_url\u2019 in the default yaml settings. This command will\nupdate `pretrained_models.zip` from `--model-file url_or_local_model_zip`.\n\nIt is also possible to use other models instead of the pretrained_models by providing `model_mgr:external_ms2_model`,\n`model_mgr:external_rt_model` and `model_mgr:external_ccs_model`.\n\n------------------------------------------------------------------------\n\n### Python and Jupyter notebooks\n\nUsing peptdeep from Python script or notebook provides the most flexible\nway to access all features in peptdeep.\n\nWe will introduce several usages of peptdeep via Python notebook:\n\n- [**global_settings**](#global_settings)\n- [**Pipeline APIs**](#pipeline-apis)\n- [**ModelManager**](#modelmanager)\n- [**Library Prediction**](#library-prediction)\n- [**DDA Rescoring**](#dda-rescoring)\n- [**HLA Peptide Prediction**](#hla-peptide-prediction)\n\n------------------------------------------------------------------------\n\n#### global_settings\n\nMost of the default parameters and attributes peptdeep functions and\nclasses are controlled by `peptdeep.settings.global_settings` which is a\n`dict`.\n\n``` python\nfrom peptdeep.settings 
import global_settings\n```\n\nThe default values of `global_settings` is defined in\n[default_settings.yaml](https://github.com/MannLabs/alphapeptdeep/blob/main/peptdeep/constants/default_settings.yaml).\n\n#### Pipeline APIs\n\nPipeline APIs provides the same functionalities with [CLI](#cli),\nincluding [library prediction](#library), [transfer\nlearning](#transfer), and [rescoring](#rescore).\n\n``` python\nfrom peptdeep.pipeline_api import (\n generate_library,\n transfer_learn,\n rescore,\n)\n```\n\nAll these functionalities take a `settings_dict` as the inputs, the dict\nstructure is the same as the settings yaml file. See the documatation of `generate_library`, `transfer_learn`, `rescore` in https://alphapeptdeep.readthedocs.io/en/latest/module_pipeline_api.html.\n\n#### ModelManager\n\n``` python\nfrom peptdeep.pretrained_models import ModelManager\n```\n\n[`ModelManager`](https://alphapeptdeep.readthedocs.io/en/latest/module_pretrained_models.html#peptdeep.pretrained_models.ModelManager) class is the main entry to access MS2/RT/CCS models. 
It provides functionalities to train/refine the models and then use the new models to predict the data.\n\nCheck [tutorial_model_manager.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs/docs/tutorial_model_manager.ipynb) for details.\n\n#### Library Prediction\n\n``` python\nfrom peptdeep.protein.fasta import PredictSpecLibFasta\n```\n\n[`PredictSpecLibFasta`](https://alphapeptdeep.readthedocs.io/en/latest/protein/fasta.html#peptdeep.protein.fasta.PredictSpecLibFasta) class provides functionalities to deal with fasta files or protein\nsequences and spectral libraries.\n\nCheck out\n[tutorial_speclib_from_fasta.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/docs/nbs/tutorial_speclib_from_fasta.ipynb)\nfor details.\n\n#### DDA Rescoring\n\n``` python\nfrom peptdeep.rescore.percolator import Percolator\n```\n\n`Percolator` class provides functionalities to rescore DDA PSMs search by `pFind` and\n`AlphaPept`, (and `MaxQuant` if output FDR=100%), \u2026\n\nCheck out [test_percolator.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs_tests/test_percolator.ipynb)\nfor details.\n\n#### HLA Peptide Prediction\n\n``` python\nfrom peptdeep.model.model_interface import ModelInterface\nimport peptdeep.model.generic_property_prediction # model shop\n```\n\nBuilding new DL models for peptide property prediction is one of the key features of AlphaPeptDeep. 
The key functionalities are [`ModelInterface`](https://alphapeptdeep.readthedocs.io/en/latest/model/model_interface.html#peptdeep.model.model_interface.ModelInterface) and the pre-designed models and model interfaces in the model shop (module [`peptdeep.model.generic_property_prediction`](https://alphapeptdeep.readthedocs.io/en/latest/model/generic_property_prediction.html)).\n\nFor example, we can built a HLA classifier that distinguishes HLA peptides from non-HLA peptides, see https://github.com/MannLabs/PeptDeep-HLA for details.\n\n------------------------------------------------------------------------\n\n## Troubleshooting\n\nIn case of issues, check out the following:\n\n- [Issues](https://github.com/MannLabs/alphapeptdeep/issues). Try a few\n different search terms to find out if a similar problem has been\n encountered before.\n\n- [Discussions](https://github.com/MannLabs/alphapeptdeep/discussions).\n Check if your problem or feature requests has been discussed before.\n\n------------------------------------------------------------------------\n\n## How to contribute\n\nIf you like this software, you can give us a\n[star](https://github.com/MannLabs/alphapeptdeep/stargazers) to boost\nour visibility! All direct contributions are also welcome. Feel free to\npost a new [issue](https://github.com/MannLabs/alphapeptdeep/issues) or\nclone the repository and create a [pull\nrequest](https://github.com/MannLabs/alphapeptdeep/pulls) with a new\nbranch. 
For even more interactive participation, check out the\n[discussions](https://github.com/MannLabs/alphapeptdeep/discussions) and\nthe [Contributors License Agreement](misc/CLA.md).\n\n### Notes for developers\n\n#### Tagging of changes\nIn order to have release notes automatically generated, changes need to be tagged with labels.\nThe following labels are used (they should be self-explanatory):\n`breaking-change`, `bug`, `enhancement`.\n\n#### Release a new version\nThis package uses a shared release process defined in the\n[alphashared](https://github.com/MannLabs/alphashared) repository. Please see the instructions\n[there](https://github.com/MannLabs/alphashared/blob/reusable-release-workflow/.github/workflows/README.md#release-a-new-version).\n\n#### pre-commit hooks\nIt is highly recommended to use the provided pre-commit hooks, as the CI pipeline requires all of their checks to\npass before a branch can be merged.\n\nThe hooks need to be installed once with:\n```bash\npre-commit install\n```\nYou can run the checks yourself using:\n```bash\npre-commit run --all-files\n```\n------------------------------------------------------------------------\n\n## Changelog\n\nSee [HISTORY.md](HISTORY.md) for a full overview of the changes made\nin each version.\n",
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "The AlphaX deep learning framework for Proteomics",
"version": "1.3.0",
"project_urls": {
"Docs": "https://alphapeptdeep.readthedocs.io/en/latest/",
"GitHub": "https://github.com/MannLabs/peptdeep",
"Homepage": "https://github.com/MannLabs/peptdeep",
"Mann Labs at CPR": "https://www.cpr.ku.dk/research/proteomics/mann/",
"Mann Labs at MPIB": "https://www.biochem.mpg.de/mann",
"PyPi": "https://pypi.org/project/peptdeep/"
},
"split_keywords": [
"deep learning",
" proteomics",
" alphax ecosystem"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2ca79c8e3c94fccf2a67c1804dae796f55515a1e6fcb1cc51e8ab75d5031e594",
"md5": "eb46aa2ba3204423a46bfce56aea8cc5",
"sha256": "c26a0266829ed625bfff8ddf1a6d5282749707af54f8a447e8cde0c4144479e8"
},
"downloads": -1,
"filename": "peptdeep-1.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eb46aa2ba3204423a46bfce56aea8cc5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 560245,
"upload_time": "2024-09-13T09:29:59",
"upload_time_iso_8601": "2024-09-13T09:29:59.699716Z",
"url": "https://files.pythonhosted.org/packages/2c/a7/9c8e3c94fccf2a67c1804dae796f55515a1e6fcb1cc51e8ab75d5031e594/peptdeep-1.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a78734fce94f522c1feb1e28264ec66228790b57f3e0c281232ac24ab05aba0e",
"md5": "9d95192e357d9369d41885e7cdfbcd38",
"sha256": "4b2ec1cb919529d5c039b7eb99b91b57716093c6a1c497408f54bdd22cdd5c4e"
},
"downloads": -1,
"filename": "peptdeep-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "9d95192e357d9369d41885e7cdfbcd38",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 563939,
"upload_time": "2024-09-13T09:30:02",
"upload_time_iso_8601": "2024-09-13T09:30:02.064465Z",
"url": "https://files.pythonhosted.org/packages/a7/87/34fce94f522c1feb1e28264ec66228790b57f3e0c281232ac24ab05aba0e/peptdeep-1.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-13 09:30:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MannLabs",
"github_project": "peptdeep",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "peptdeep"
}