triqler

Name	triqler
Version	0.8.0
home_page	https://github.com/statisticalbiotechnology/triqler
Summary	Triqler: TRansparent Identification-Quantification-Linked Error Rates
upload_time	2025-01-07 12:31:12
maintainer	None
docs_url	None
author	Matthew The
requires_python	<3.13,>=3.9
license	Apache-2.0
keywords	mass spectrometry missing values proteomics quantification

# Triqler: TRansparent Identification-Quantification-Linked Error Rates

[![PyPI version](https://img.shields.io/pypi/v/triqler.svg?logo=pypi&logoColor=FFE873)](https://pypi.org/project/triqler/)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/triqler.svg?logo=python&logoColor=FFE873)](https://pypi.org/project/triqler/)
[![PyPI downloads](https://img.shields.io/pypi/dm/triqler.svg)](https://pypistats.org/packages/triqler)

Triqler is a probabilistic graphical model that propagates error
information through all steps from MS1 feature to protein level,
employing distributions in favor of point estimates, most notably for
missing value imputation. The model outputs posterior probabilities for
fold changes between treatment groups, highlighting uncertainty rather
than hiding it.

For a detailed explanation of how to install and run Triqler
(stand-alone or in combination with MaxQuant, Quandenser or Dinosaur) as
well as how to interpret the results, please read our [Triqler user
manual](https://www.biorxiv.org/content/10.1101/2020.09.24.311605v1).

Brief instructions for installing and running Triqler as well as
descriptions of the input and output formats can be found below.
Instructions for running the converters to the Triqler input format are
available in our
[wiki](https://github.com/statisticalbiotechnology/triqler/wiki).

## Method description / Citation

The, M. & Käll, L. (2019). Integrated identification and quantification
error probabilities for shotgun proteomics. *Molecular & Cellular
Proteomics, 18* (3), 561-570. <https://doi.org/10.1074/mcp.RA118.001018>

Truong, P., The, M., & Käll, L. (2023). Triqler for Protein
Summarization of Data from Data-Independent Acquisition Mass
Spectrometry. *Journal of Proteome Research, 22* (4), 1359-1366.
<https://doi.org/10.1021/acs.jproteome.2c00607>

### Installation via `pip`

    pip install triqler

### Installation from source

    git clone https://github.com/statisticalbiotechnology/triqler.git
    cd triqler
    pip install .

## Usage

    usage: triqler [-h] [--out_file OUT] [--fold_change_eval F]
                 [--decoy_pattern P] [--missing_value_prior D] [--min_samples N]
                 [--num_threads N] [--ttest] [--write_spectrum_quants]
                 [--write_protein_posteriors P_OUT]
                 [--write_group_posteriors G_OUT]
                 [--write_fold_change_posteriors F_OUT]
                 [--csv-field-size-limit CSV_FIELD_SIZE_LIMIT]
                 IN_FILE

    positional arguments:
      IN_FILE               List of PSMs with abundances (not log transformed!)
                            and search engine score. See README for a detailed
                            description of the columns.

    optional arguments:
      -h, --help            show this help message and exit
      --out_file OUT        Path to output file (writing in TSV format). N.B. if
                            more than 2 treatment groups are present, suffixes
                            will be added before the file extension. (default:
                            proteins.tsv)
      --fold_change_eval F  log2 fold change evaluation threshold. (default: 1.0)
      --decoy_pattern P     Prefix for decoy proteins. (default: decoy_)
      --missing_value_prior D
                            Distribution to fit for missing value prior. Use "DIA"
                            for using means of NaNs to fit the censored normal
                            distribution. The "default" option fits the censored
                            normal distribution with all observed XIC values.
                            (default: default)
      --min_samples N       Minimum number of samples a peptide needed to be
                            quantified in. (default: 2)
      --num_threads N       Number of threads, by default this is equal to the
                            number of CPU cores available on the device. (default:
                            6)
      --ttest               Use t-test for evaluating differential expression
                            instead of posterior probabilities. (default: False)
      --write_spectrum_quants
                            Write quantifications for consensus spectra. Only
                            works if consensus spectrum index are given in input.
                            (default: False)
      --write_protein_posteriors P_OUT
                            Write raw data of protein posteriors to the specified
                            file in TSV format. (default: )
      --write_group_posteriors G_OUT
                            Write raw data of treatment group posteriors to the
                            specified file in TSV format. (default: )
      --write_fold_change_posteriors F_OUT
                            Write raw data of fold change posteriors to the
                            specified file in TSV format. (default: )
      --csv-field-size-limit CSV_FIELD_SIZE_LIMIT
                            Set a new maximum CSV field size (default: None)

## Example

A sample file `iPRG2016.tsv` is provided in the `example` folder. You
can run Triqler on this file by running the following command:

    python -m triqler --fold_change_eval 0.8 example/iPRG2016.tsv

A detailed example of the different levels of Triqler output can be
found in [Supplementary Note
2](https://www.nature.com/articles/s41467-020-17037-3#Sec13) of the
Quandenser publication.
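
If you prefer to launch the same run from a Python script, the command can be assembled from the options listed under Usage. The following is a minimal sketch, not part of Triqler itself: the helper names `build_triqler_command` and `run_triqler` are hypothetical, and it assumes Triqler is installed in the interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path


def build_triqler_command(in_file, out_file="proteins.tsv", fold_change_eval=1.0):
    """Assemble the argument list for one Triqler run (hypothetical helper)."""
    return [
        sys.executable, "-m", "triqler",
        "--fold_change_eval", str(fold_change_eval),
        "--out_file", str(out_file),
        str(in_file),
    ]


def run_triqler(in_file, out_file="proteins.tsv", fold_change_eval=1.0):
    """Run Triqler as a subprocess; raises CalledProcessError on a non-zero exit."""
    cmd = build_triqler_command(in_file, out_file, fold_change_eval)
    subprocess.run(cmd, check=True)
    # With more than 2 treatment groups Triqler adds suffixes to this name,
    # so the returned path is only the base name that was requested.
    return Path(out_file)
```

Calling `run_triqler("example/iPRG2016.tsv", "iPRG2016.proteins.tsv", fold_change_eval=0.8)` would then correspond to the command shown above with an explicit output path.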

## Interface

The simplest input format is a tab-separated file consisting of a header
line followed by one PSM per line in the following format:

    run <tab> condition <tab> charge <tab> searchScore <tab> intensity <tab> peptide     <tab> proteins
    r1  <tab> 1         <tab> 2      <tab> 1.345       <tab> 21359.123 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
    r2  <tab> 1         <tab> 2      <tab> 1.945       <tab> 24837.398 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
    r3  <tab> 2         <tab> 2      <tab> 1.684       <tab> 25498.869 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB
    ...
    r1  <tab> 1         <tab> 3      <tab> 0.452       <tab> 13642.232 <tab> A.NTPEPTIDE.- <tab> decoy_proteinA
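
As an illustration of this layout, the sketch below writes PSM records in the simple format using Python's `csv` module. The function name `write_triqler_input` and the dictionary structure are assumptions made for this example; the column names and sample values are taken from the format description.

```python
import csv
import io

HEADER = ["run", "condition", "charge", "searchScore", "intensity", "peptide", "proteins"]


def write_triqler_input(handle, psms):
    """Write PSM dicts as tab-separated rows (hypothetical helper).

    Each dict needs the keys in HEADER; "proteins" is a list of protein names.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for psm in psms:
        fixed = [psm[key] for key in HEADER[:-1]]
        # Proteins go at the end of the line, one per tab-separated field.
        writer.writerow(fixed + list(psm["proteins"]))


psms = [
    {"run": "r1", "condition": 1, "charge": 2, "searchScore": 1.345,
     "intensity": 21359.123, "peptide": "A.PEPTIDE.A",
     "proteins": ["proteinA", "proteinB"]},
    # Decoy PSMs are written the same way; only the protein prefix differs.
    {"run": "r1", "condition": 1, "charge": 3, "searchScore": 0.452,
     "intensity": 13642.232, "peptide": "A.NTPEPTIDE.-",
     "proteins": ["decoy_proteinA"]},
]

buffer = io.StringIO()
write_triqler_input(buffer, psms)
print(buffer.getvalue())
```

To produce a file instead of an in-memory string, pass a handle opened with `open(path, "w", newline="")`.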

Alternatively, if you have match-between-run probabilities, you can use
a slightly more detailed input format:

    run <tab> condition <tab> charge <tab> searchScore <tab> spectrumId <tab> linkPEP <tab> featureClusterId <tab> intensity <tab> peptide     <tab> proteins
    r1  <tab> 1         <tab> 2      <tab> 1.345       <tab> 3          <tab> 0.0     <tab> 1                <tab> 21359.123 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
    r2  <tab> 1         <tab> 2      <tab> 1.345       <tab> 3          <tab> 0.021   <tab> 1                <tab> 24837.398 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
    r3  <tab> 2         <tab> 2      <tab> 1.684       <tab> 4          <tab> 0.0     <tab> 1                <tab> 25498.869 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB
    ...
    r1  <tab> 1         <tab> 3      <tab> 0.452       <tab> 6568       <tab> 0.15    <tab> 9845             <tab> 13642.232 <tab> A.NTPEPTIDE.- <tab> decoy_proteinA

Some remarks:

-   Triqler also needs decoy PSMs to work, preferably obtained by
    searching against a reversed protein sequence database
    concatenated to the target database.
-   The intensities should **not** be log transformed; Triqler will do
    this transformation for you.
-   An intensity of 0 is considered a missing value and the row will be
    discarded.
-   The search engine scores should be such that higher scores indicate
    a higher confidence in the PSM.
-   We recommend using well-calibrated search engine scores, e.g. the
    SVM scores from Percolator.
-   Do **not** set `--fold_change_eval` to 0 or a very low value (<0.2).
    The fold change posterior distribution always has a certain width,
    reflecting the uncertainty of the estimate. Even if the true log2
    fold change is 0, this distribution will necessarily spill over into
    small non-zero fold change values, without there being any grounds
    for differential expression.
-   Multiple proteins can be specified at the end of the line, separated
    by tabs. Note, however, that Triqler currently discards shared
    peptides.
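
The warning about low `--fold_change_eval` values can be illustrated with a small calculation. This is a sketch under stated assumptions, not Triqler's actual model: it assumes the fold change posterior is a normal distribution centred at a log2 fold change of 0 with a hypothetical standard deviation of 0.3, and it evaluates the threshold two-sidedly.

```python
import math


def prob_abs_exceeds(threshold, mean=0.0, sd=0.3):
    """P(|X| > threshold) for X ~ Normal(mean, sd); the sd value is hypothetical."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return 1.0 - (cdf(threshold) - cdf(-threshold))


for threshold in (0.1, 0.2, 0.5, 1.0):
    prob = prob_abs_exceeds(threshold)
    print(f"threshold {threshold}: P(|log2 FC| > threshold) = {prob:.3f}")
```

Under these assumptions, well over half of the posterior mass lies beyond a threshold of 0.1 even though there is no true change, while less than 1% lies beyond the default threshold of 1.0.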

The output format is a tab-separated file consisting of a header line
followed by one protein per line in the following format:

    q_value <tab> posterior_error_prob <tab> protein <tab> num_peptides <tab> protein_id_PEP <tab> log2_fold_change <tab> diff_exp_prob_<FC> <tab> <condition1>:<run1> <tab> <condition1>:<run2> <tab> ... <tab> <conditionM>:<runN> <tab> peptides

Some remarks:

-   The `q_value` and `posterior_error_prob` columns represent
    respectively the FDR and PEP for the hypothesis that the protein was
    correctly identified and has a fold change larger than the specified
    `--fold_change_eval`.
-   The `protein_id_PEP` and `diff_exp_prob_<FC>` columns are simply
    the separate probabilities that make up the above hypothesis test,
    i.e. for correct identification and for fold change respectively.
-   The reported fold change is log2 transformed and is the expected
    value based on the posterior distribution of the fold change.
-   If more than 2 treatment groups are present, separate files will be
    written out for each pairwise comparison with suffixes added before
    the file extension, e.g. `proteins.1vs3.tsv`.
-   The reported protein expressions per run are the expected value of
    the protein's expression in that run. The values are relative to
    the protein's mean expression across all runs and are **not** log
    transformed, so the mean itself corresponds to the value 1.0. For
    example, a value of 1.5 means that the expression in this sample is
    50% higher than the mean across all runs. A second example comparing
    values across samples: if sample1 has a value of 2.0 and sample2 a
    value of 1.5, the expression in sample1 is 33% higher than in
    sample2 (2.0/1.5=1.33). We don't necessarily recommend using these
    values for downstream analysis, as the actual value of interest is
    the fold change between treatment groups rather than between
    samples.
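
The following sketch shows one way to read the per-run columns from an output file and to reproduce the percentage arithmetic described above. The function names are hypothetical, the sample row is made up purely for illustration and is not real Triqler output, and the header name `diff_exp_prob_1.0` assumes that `<FC>` is filled in with the threshold value.

```python
import csv
import io


def read_run_expressions(handle):
    """Yield (protein, {run column: relative expression}) for each output row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    protein_idx = header.index("protein")
    # Run columns sit between the diff_exp_prob_<FC> column and "peptides".
    start = next(i for i, name in enumerate(header)
                 if name.startswith("diff_exp_prob_")) + 1
    end = header.index("peptides")
    for row in reader:
        yield row[protein_idx], {header[i]: float(row[i]) for i in range(start, end)}


def percent_higher(a, b):
    """How much higher relative expression a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Made-up example row; all numbers are illustrative only.
sample = (
    "q_value\tposterior_error_prob\tprotein\tnum_peptides\tprotein_id_PEP\t"
    "log2_fold_change\tdiff_exp_prob_1.0\t1:r1\t2:r2\tpeptides\n"
    "0.01\t0.01\tproteinA\t1\t0.005\t0.5\t0.02\t1.2\t0.8\tA.PEPTIDE.A\n"
)

for protein, runs in read_run_expressions(io.StringIO(sample)):
    print(protein, runs)

# The 2.0 versus 1.5 comparison from the text: roughly 33% higher.
print(f"{percent_higher(2.0, 1.5):.1f}%")
```

For a real file, replace the `io.StringIO(sample)` argument with a handle from `open("proteins.tsv", newline="")`.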

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/statisticalbiotechnology/triqler",
    "name": "triqler",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": null,
    "keywords": "mass spectrometry, missing values, proteomics, quantification",
    "author": "Matthew The",
    "author_email": "matthew.the@tum.de",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Triqler: TRansparent Identification-Quantification-Linked Error Rates",
    "version": "0.8.0",
    "project_urls": {
        "Homepage": "https://github.com/statisticalbiotechnology/triqler",
        "Repository": "https://github.com/statisticalbiotechnology/triqler"
    },
    "split_keywords": [
        "mass spectrometry",
        " missing values",
        " proteomics",
        " quantification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27255d6684245ac0afd413c8df10896ed35490f89b52b6d0316aa412c64943ae",
                "md5": "ef98ea1e1bdbeedbf21251be58b24c13",
                "sha256": "eb84ed9f071da4d38e668de0b739a8dd39c76113b8c268ab435bc50245d0442a"
            },
            "downloads": -1,
            "filename": "triqler-0.8.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ef98ea1e1bdbeedbf21251be58b24c13",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.9",
            "size": 64570,
            "upload_time": "2025-01-07T12:31:12",
            "upload_time_iso_8601": "2025-01-07T12:31:12.923873Z",
            "url": "https://files.pythonhosted.org/packages/27/25/5d6684245ac0afd413c8df10896ed35490f89b52b6d0316aa412c64943ae/triqler-0.8.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-07 12:31:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "statisticalbiotechnology",
    "github_project": "triqler",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "triqler"
}
