ibaqpy


- Name: ibaqpy
- Version: 0.0.2
- Home page: https://github.com/bigbio/ibaqpy/
- Summary: Python package to compute intensity base absolute expression values
- Upload time: 2023-05-22 08:40:20
- Author: Yasset Perez-Riverol
- License: MIT
- Keywords: proteomics, label-free, absolute quantification

# ibaqpy

[![Python application](https://github.com/bigbio/ibaqpy/actions/workflows/python-app.yml/badge.svg)](https://github.com/bigbio/ibaqpy/actions/workflows/python-app.yml)

iBAQ (intensity-Based Absolute Quantification) estimates the abundance of a protein by dividing its total precursor intensity by the number of theoretically observable peptides of the protein. The TPA (Total Protein Approach) value is obtained by summing the peptide intensities of each protein and dividing by its molecular mass, which gives the relative concentration of each protein. Using [ProteomicRuler](https://www.sciencedirect.com/science/article/pii/S1535947620337749), it is also possible to calculate the protein copy number and absolute concentration. **ibaqpy** computes IBAQ values, TPA values, copy numbers and concentrations for proteins, starting from an msstats input file and an SDRF experimental design file. This package provides multiple tools: 

- `peptide_file_generation.py`: Generate a peptide file from an msstats input file and an SDRF experimental design file. 

- `peptide_normalization.py`: Normalize the output file of `peptide_file_generation.py`. It includes multiple steps such as peptidoform normalization, peptidoform to peptide summarization, peptide intensity normalization, and imputation. 

- `compute_ibaq.py`: Compute IBAQ values from the output file of `peptide_normalization.py`.

- `compute_tpa.py`: Compute TPA values, protein copy numbers and concentrations from the output file of `peptide_file_generation.py`.


### How to install ibaqpy

ibaqpy is available on PyPI and can be installed using pip:

```asciidoc
pip install ibaqpy
```

You can also install the package from source: 

1. Clone the repository:

```asciidoc
>$ git clone https://github.com/bigbio/ibaqpy
>$ cd ibaqpy
```

2. Create the conda environment:

```asciidoc
>$ mamba create --name ibaqpy python=3.7
>$ conda activate ibaqpy
>$ mamba install -c conda-forge --file requirements.txt
```

3. Install ibaqpy:

```asciidoc
>$ python setup.py install
```

### Collecting intensity files 

The absolute quantification files are stored at the following URL: 

```
https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression/
```

Inside each project reanalysis folder, the `proteomicslfq` folder contains the msstats input file, named following the pattern `{Name of the project}_msstats_in.csv`. 

E.g. http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression/PXD003947/proteomicslfq/PXD003947.sdrf_openms_design_msstats_in.csv 
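
As a small illustration of the folder layout described above, the following sketch builds the download URL for a project accession. The file name pattern is copied from the PXD003947 example and is only an assumption for other projects; `msstats_url` is a hypothetical helper, not part of ibaqpy.

```python
BASE_URL = "https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression"


def msstats_url(project: str) -> str:
    """Build the msstats input URL for a project accession such as PXD003947."""
    # Assumption: the file name follows the PXD003947 example; other projects may differ.
    file_name = f"{project}.sdrf_openms_design_msstats_in.csv"
    return f"{BASE_URL}/{project}/proteomicslfq/{file_name}"


print(msstats_url("PXD003947"))
```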

### Generate Peptidoform Intensity file - peptide_file_generation.py

```asciidoc
python peptide_file_generation.py --msstats PXD003947.sdrf_openms_design_msstats_in.csv --sdrf PXD003947.sdrf.tsv --min_aa 7 --min_unique 2 --output PXD003947-peptides.csv
```

The command also provides a `--compress` flag for reading compressed msstats and SDRF files. 

```asciidoc
python peptide_file_generation.py --help
Usage: peptide_file_generation.py [OPTIONS]

  Conversion of peptide intensity information into a file that containers
  peptide intensities but also the metadata around the conditions, the
  retention time, charge states, etc.

  :param msstats: MsStats file import generated by quantms 
  :param sdrf: SDRF file import generated by quantms 
  :param min_aa: Minimum number of amino acids to filter peptides
  :param min_unique: Minimum number of unique peptides to filter proteins
  :param compress: Read all files compress
  :param output: Peptide intensity file including other all properties for
                 normalization

Options:
  -m, --msstats TEXT  MsStats file import generated by quantms  [required]
  -s, --sdrf TEXT     SDRF file import generated by quantms  [required]
  --min_aa            Minimum number of amino acids to filter peptides (default: 7)
  --min_unique        Minimum number of unique peptides to filter proteins (default: 2)
  --compress          Read all files compress
  -o, --output TEXT   Peptide intensity file including other all properties
                      for normalization
  --help              Show this message and exit.
```

### Peptide Normalization - peptide_normalization.py

```asciidoc
python peptide_normalization.py --peptides PXD003947-peptides.csv --impute --normalize --contaminants data/contaminants_ids.tsv --output PXD003947-peptides-norm.tsv
``` 

The command also provides the following additional flags: `--skip_normalization`, `--impute`, `--pnormalization`, `--compress`, `--log2`, `--violin` and `--verbose`.

```asciidoc
python peptide_normalization.py --help
Usage: peptide_normalization.py [OPTIONS]

  Normalize the peptide intensities using different methods for a file output
  of peptides with the format described in peptide_file_generation.py.

  :param peptides: Peptides files from the peptide file generation tool
  :param contaminants: Contaminants and high abundant proteins to be removed
  :param output: Peptide intensity file including other all properties for normalization
  :param skip_normalization: Skip normalization step
  :param nmethod: Normalization method used to normalize intensities for all samples
  :param impute: Impute the missing values using MissForest
  :param pnormalization: Normalize the peptide intensities using different methods
  :param compress: Read the input peptides file in compress gzip file
  :param log2: Transform to log2 the peptide intensity values before normalization
  :param violin: Use violin plot instead of boxplot for distribution representations
  :param verbose: Print addition information about the distributions of the intensities, 
                  number of peptides remove after normalization, etc.

Options:
  --peptides TEXT       Peptides files from the peptide file generation tool  [required]
  --contaminants TEXT   Contaminants and high abundant proteins to be removed
  --skip_normalization  Skip normalization step (default: False)
  --nmethod             Normalization method used to normalize intensities for all samples
                        (default: qnorm)
  --impute              Impute the missing values using MissForest
  --pnormalization      Normalize the peptide intensities using different methods
  --compress            Read the input peptides file in compress gzip file
  --log2                Transform to log2 the peptide intensity values before normalization
  --violin              Use violin plot instead of boxplot for distribution representations
  --verbose             Print addition information about the distributions of the intensities, 
                        number of peptides remove after normalization, etc.
  --output TEXT         Peptide intensity file including other all properties for normalization
  --help                Show this message and exit.
```

Peptide normalization starts from the output file of `peptide_file_generation.py`. The input file contains the following columns: 

- ProteinName: Protein name
- PeptideSequence: Peptide sequence including post-translational modifications `(e.g. .(Acetyl)ASPDWGYDDKN(Deamidated)GPEQWSK)`
- PrecursorCharge: Precursor charge
- FragmentIon: Fragment ion
- ProductCharge: Product charge
- IsotopeLabelType: Isotope label type
- Condition: Condition label `(e.g. heart)`
- BioReplicate: Biological replicate index `(e.g. 1)`
- Run: Run index `(e.g. 1)`
- Fraction: Fraction index `(e.g. 1)`
- Intensity: Peptide intensity
- Reference: Name of the RAW file containing the peptide intensity `(e.g. Adult_Heart_Gel_Elite_54_f16)`
- SampleID: Sample ID `(e.g. PXD003947-Sample-3)`
- StudyID: Study ID `(e.g. PXD003947)`. In most cases, the study ID is the same as the ProteomeXchange ID.
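
Before running the normalization, it can help to verify that a peptide file contains the columns listed above. The following is a minimal sketch using only the standard library; it assumes a comma-separated file, and `missing_columns` is a hypothetical helper, not part of ibaqpy.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
    "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
    "Run", "Fraction", "Intensity", "Reference", "SampleID", "StudyID",
]


def missing_columns(handle) -> list:
    """Return the expected columns that are absent from a CSV header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Tiny in-memory example with an incomplete header (made-up data).
sample = io.StringIO("ProteinName,PeptideSequence,Intensity\nP12345,PEPTIDEK,1000.0\n")
print(missing_columns(sample))
```

For a real file, pass an open file handle instead of the in-memory `StringIO` object.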

#### Removing Contaminants and Decoys

The first step is to remove contaminants and decoys from the input file. The script `peptide_normalization.py` provides a `--contaminants` parameter that takes a file listing the protein accessions of the contaminants. An example file can be seen in `data/contaminants.txt`. In addition to the listed protein accessions, the tool removes all proteins with the following prefixes: `CONTAMINANT` and `DECOY`.
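
The filtering rule can be sketched as follows. This is an illustration of the logic described above, not the package's actual code; the accessions and the `is_contaminant_or_decoy` helper are hypothetical.

```python
def is_contaminant_or_decoy(protein_name: str, contaminant_ids: set) -> bool:
    """Return True if a protein should be removed before normalization."""
    # Proteins flagged by prefix are removed regardless of the accession list.
    if protein_name.startswith(("CONTAMINANT", "DECOY")):
        return True
    # Otherwise remove only the accessions listed in the contaminants file.
    return protein_name in contaminant_ids


rows = [
    {"ProteinName": "P12345", "Intensity": 1000.0},
    {"ProteinName": "DECOY_P12345", "Intensity": 50.0},
    {"ProteinName": "CONTAMINANT_P00761", "Intensity": 75.0},
    {"ProteinName": "P02768", "Intensity": 900.0},
]
contaminants = {"P02768"}  # hypothetical content of the contaminants file
kept = [row for row in rows if not is_contaminant_or_decoy(row["ProteinName"], contaminants)]
print([row["ProteinName"] for row in kept])
```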

#### Peptidoform Normalization

A peptidoform is the combination of `PeptideSequence(Modifications) + Charge + BioReplicate + Fraction`. In the current version of the file, each row corresponds to one peptidoform. 

The current version of the tool uses the package [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities of each peptidoform. **qnorm** implements a quantile normalization method. 
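
To make the idea of quantile normalization concrete, here is a minimal pure-Python sketch. It is not the qnorm implementation: it assumes complete data (no missing values) and breaks ties by input order, whereas qnorm may treat ties and missing values differently.

```python
def quantile_normalize(samples: dict) -> dict:
    """Quantile-normalize equally sized intensity lists, one list per sample."""
    names = list(samples)
    n_rows = len(samples[names[0]])
    # Mean of the k-th smallest value across all samples, for each rank k.
    sorted_columns = [sorted(samples[name]) for name in names]
    rank_means = [sum(col[k] for col in sorted_columns) / len(names) for k in range(n_rows)]

    normalized = {}
    for name in names:
        values = samples[name]
        # Row indices ordered from the smallest to the largest intensity.
        order = sorted(range(n_rows), key=lambda i: values[i])
        result = [0.0] * n_rows
        for rank, index in enumerate(order):
            result[index] = rank_means[rank]
        normalized[name] = result
    return normalized


intensities = {"sample_1": [5.0, 2.0, 3.0], "sample_2": [4.0, 1.0, 6.0]}
print(quantile_normalize(intensities))
```

After normalization, every sample contains the same set of values (the rank means), arranged according to the original ordering within that sample.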

#### Peptidoform to Peptide Summarization

For each peptidoform, a canonical peptide sequence without modifications is generated. The intensities of all peptides, grouped by biological replicate, are combined using `sum`. 

Then, the intensities of the peptides across different biological replicates are summarized using the function `median`. 

At the end of this step, for each peptide, the corresponding protein and the intensity of the peptide are stored. 
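
The two summarization steps can be sketched in plain Python as shown below. This is an illustration under simplifying assumptions: modifications are written in parentheses without nesting, and grouping uses only protein, canonical sequence and biological replicate. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import median


def canonical_sequence(peptidoform: str) -> str:
    """Strip parenthesised modifications and dots, e.g. '.(Acetyl)ASK' -> 'ASK'."""
    return re.sub(r"\([^)]*\)", "", peptidoform).replace(".", "")


def summarize(rows) -> dict:
    """Sum intensities per replicate, then take the median across replicates."""
    per_replicate = defaultdict(float)
    for row in rows:
        key = (row["ProteinName"], canonical_sequence(row["PeptideSequence"]), row["BioReplicate"])
        per_replicate[key] += row["Intensity"]

    per_peptide = defaultdict(list)
    for (protein, peptide, _replicate), total in per_replicate.items():
        per_peptide[(protein, peptide)].append(total)
    return {key: median(values) for key, values in per_peptide.items()}


rows = [
    {"ProteinName": "P12345", "PeptideSequence": ".(Acetyl)ASPDWGYDDK", "BioReplicate": 1, "Intensity": 100.0},
    {"ProteinName": "P12345", "PeptideSequence": "ASPDWGYDDK", "BioReplicate": 1, "Intensity": 50.0},
    {"ProteinName": "P12345", "PeptideSequence": "ASPDWGYDDK", "BioReplicate": 2, "Intensity": 250.0},
]
print(summarize(rows))
```

In this made-up example the two peptidoforms of replicate 1 are summed to 150, and the median of 150 and 250 gives a peptide intensity of 200.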

#### Peptide Intensity Imputation and Normalization

Before the final two steps (peptide normalization and imputation), the algorithm removes peptides that are a significant source of missing values: peptides with more than 80% missing values and peptides that do not appear in more than one sample. 
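
A minimal sketch of this filter, assuming each peptide is represented by one intensity per sample with `None` marking a missing value (the data layout and the `keep_peptide` helper are assumptions for illustration):

```python
def keep_peptide(intensities, max_missing_fraction=0.8, min_samples=2) -> bool:
    """Decide whether a peptide passes the missing-value filter.

    `intensities` holds one value per sample; None marks a missing value.
    """
    observed = sum(value is not None for value in intensities)
    missing_fraction = 1 - observed / len(intensities)
    # Keep peptides with at most 80% missing values that appear in more than one sample.
    return missing_fraction <= max_missing_fraction and observed >= min_samples


matrix = {
    "PEPTIDEK": [10.0, None, 12.0, 11.0, None],   # 40% missing, seen in 3 samples
    "ASPDWGYDDK": [None, None, None, None, 9.0],  # seen in a single sample
}
print({peptide: keep_peptide(values) for peptide, values in matrix.items()})
```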

Finally, two extra steps are performed: 

- ``peptide intensity imputation``: Imputation is performed using the package [missingpy](https://pypi.org/project/missingpy/), which imputes the missing values with a Random Forest algorithm.
- ``peptide intensity normalization``: Similar to the normalization of the peptidoform intensities, the peptide intensities are normalized using the package [qnorm](https://pypi.org/project/qnorm/).

### Compute IBAQ - compute_ibaq.py
IBAQ is an intensity-based absolute quantification method that can be used to estimate the relative abundance of proteins in a sample. The IBAQ value is the total intensity of a protein divided by its number of theoretical peptides.

```asciidoc
python compute_ibaq.py --fasta Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta --peptides PXD003947-peptides.csv --enzyme "Trypsin" --normalize --output PXD003947-ibaq.tsv
``` 

The command also provides a `--normalize` flag to normalize the IBAQ values.

```asciidoc
python compute_ibaq.py --help
Usage: compute_ibaq.py [OPTIONS]

  Compute the IBAQ values for a file output of peptides with the format described in
  peptide_normalization.py.

  :param min_aa: Minimum number of amino acids to consider a peptide
  :param max_aa: Maximum number of amino acids to consider a peptide
  :param fasta: Fasta file used to perform the peptide identification
  :param peptides: Peptide intensity file
  :param enzyme: Enzyme used to digest the protein sample
  :param normalize: use some basic normalization steps
  :param output: output format containing the ibaq values

Options:
  -f, --fasta TEXT      Protein database to compute IBAQ values  [required]
  -p, --peptides TEXT   Peptide identifications with intensities following the peptide intensity output  [required]
  -e, --enzyme          Enzyme used during the analysis of the dataset (default: Trypsin)
  -n, --normalize       Normalize IBAQ values using by using the total IBAQ of the experiment
  --min_aa              Minimum number of amino acids to consider a peptide (default: 7)
  --max_aa              Maximum number of amino acids to consider a peptide (default: 30)
  -o, --output TEXT     Output format containing the ibaq values
  --help                Show this message and exit.
```

#### Performs the Enzymatic Digestion
The current version of this tool uses OpenMS to load the FASTA file and [ProteaseDigestion](https://openms.de/current_doxygen/html/classOpenMS_1_1ProteaseDigestion.html) to digest the protein sequences enzymatically, which yields the number of theoretical peptides of each protein.
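
To illustrate what counting theoretical peptides means, the sketch below performs a simplified in-silico tryptic digestion with a regular expression instead of OpenMS. It assumes full cleavage (no missed cleavages) and the common rule of cutting after K or R unless followed by P, so its counts may differ from those produced by `ProteaseDigestion`. The sequences are made up.

```python
import re


def count_tryptic_peptides(sequence: str, min_aa: int = 7, max_aa: int = 30) -> int:
    """Count fully cleaved tryptic peptides with a length within [min_aa, max_aa]."""
    # Cleave after K or R unless the next residue is P.
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return sum(min_aa <= len(peptide) <= max_aa for peptide in peptides if peptide)


print(count_tryptic_peptides("AAAAAAKAAAAAAAR"))   # two peptides of 7 and 8 residues
print(count_tryptic_peptides("AAAAAAKPAAAAAAAR"))  # K followed by P is not cleaved
```

The default length limits mirror the `--min_aa` and `--max_aa` defaults shown in the help output above.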

#### Calculate the IBAQ Value
First, the peptide intensity dataframe is grouped by protein name, sample name and condition, and the intensities within each group are summed. The summed intensity of each protein is then divided by its number of theoretical peptides.

If a protein group is present in the peptide intensity dataframe, the intensity of all proteins in the protein group is summed following the steps above and then multiplied by the number of proteins in the protein group.
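
The basic calculation for single proteins can be sketched as follows. This is an illustration in plain Python rather than the package's dataframe code, and it leaves out the protein-group handling; the number of theoretical peptides is passed in as a hypothetical precomputed dictionary.

```python
from collections import defaultdict


def compute_ibaq(rows, theoretical_peptides: dict) -> dict:
    """Sum intensities per (protein, sample, condition) and divide by the peptide count."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["ProteinName"], row["SampleID"], row["Condition"])] += row["Intensity"]
    return {
        key: total / theoretical_peptides[key[0]]
        for key, total in totals.items()
        if theoretical_peptides.get(key[0], 0) > 0  # skip proteins without theoretical peptides
    }


rows = [
    {"ProteinName": "P12345", "SampleID": "PXD003947-Sample-3", "Condition": "heart", "Intensity": 300.0},
    {"ProteinName": "P12345", "SampleID": "PXD003947-Sample-3", "Condition": "heart", "Intensity": 100.0},
]
ibaq = compute_ibaq(rows, {"P12345": 4})  # made-up peptide count
print(ibaq)
```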

#### IBAQ Normalization  
The IBAQ values are normalized by the total IBAQ of the sample. The resulting values are then multiplied by 100,000,000 (PRIDE database normalization) to obtain the IBAQ ppb, and log10-transformed and shifted by 10 (ProteomicsDB).
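
One possible reading of these normalization steps is sketched below. The output key names are hypothetical, the scaling factor is the one quoted in the text, and the log-transformed value is assumed to be `log10` of the relative IBAQ plus 10; all input values are assumed to be positive.

```python
import math


def normalize_ibaq(ibaq_by_protein: dict) -> dict:
    """Normalize the IBAQ values of one sample by the sample's total IBAQ."""
    total = sum(ibaq_by_protein.values())
    normalized = {}
    for protein, value in ibaq_by_protein.items():
        relative = value / total
        normalized[protein] = {
            "relative": relative,                     # fraction of the total IBAQ
            "ppb": relative * 100_000_000,            # scaling factor quoted in the text
            "log10_shifted": math.log10(relative) + 10,
        }
    return normalized


result = normalize_ibaq({"P11111": 25.0, "P22222": 75.0})  # made-up IBAQ values
print(result["P11111"])
```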

### Compute TPA - compute_tpa.py
The total protein approach (TPA) is a label-free and standard-free method for the absolute quantitation of proteins using large-scale proteomic data. In the current version of the tool, the TPA value is the total intensity of the protein divided by its theoretical molecular mass.

[ProteomicRuler](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256500/) is a method for protein copy number and concentration estimation that does not require the use of isotope-labeled standards. It uses the mass spectrometry signal of histones as a "proteomic ruler" because it is proportional to the amount of DNA in the sample, which in turn depends on cell count. Thus, this approach can add an absolute scale to the mass spectrometry readout and allow estimates of the copy number of individual proteins in each cell.

```asciidoc
python compute_tpa.py --fasta Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta --contaminants contaminants_ids.tsv --peptides PXD003947-peptides.csv --ruler --ploidy 2 --cpc 200 --output PXD003947-tpa.tsv
``` 

Note: The peptide intensity file used to calculate the TPA and ProteomicRuler values is the non-normalized file, that is, the output of `peptide_file_generation.py`.

```asciidoc
python compute_tpa.py --help
Usage: compute_tpa.py [OPTIONS]

  Compute the protein copy numbers and concentrations according to a file output of peptides with the
  format described in peptide_file_generation.py.

  :param fasta: Fasta file used to perform the peptide identification
  :param peptides: Peptide intensity file
  :param contaminants: Contaminants file
  :param ruler: Whether to compute protein copy number, weight and concentration.
  :param ploidy: Ploidy number
  :param cpc: Cellular protein concentration(g/L)
  :param output: Output format containing the TPA values, protein copy numbers and concentrations

Options:
  -f, --fasta TEXT      Protein database to compute IBAQ values  [required]
  -p, --peptides TEXT   Peptide identifications with intensities following the peptide intensity output  [required]
  --contaminants        Contaminants and high abundant proteins to be removed
  -r, --ruler           Calculate protein copy number and concentration according to ProteomicRuler
  -n, --ploidy          Ploidy number (default: 2)
  -c, --cpc             Cellular protein concentration(g/L) (default: 200)
  -o, --output TEXT     Output format containing the TPA values, protein copy numbers and concentrations
  --help                Show this message and exit.
```

#### Calculate the TPA Value
OpenMS is used to calculate the theoretical molecular mass of each protein. As in the IBAQ calculation, the TPA value of a protein group is its summed intensity divided by the sum of the theoretical molecular masses of its proteins.
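
The calculation can be sketched as follows, with the molecular masses supplied as a precomputed dictionary instead of being derived with OpenMS. The `;` separator for protein groups, the masses and the `compute_tpa` helper are assumptions for illustration.

```python
def compute_tpa(protein_group: str, intensity: float, masses: dict, sep: str = ";") -> float:
    """TPA of a protein or protein group: intensity over the summed theoretical masses."""
    # Assumption: accessions in a protein group are joined with `sep`.
    total_mass = sum(masses[accession] for accession in protein_group.split(sep))
    return intensity / total_mass


masses = {"P11111": 50000.0, "P22222": 30000.0}  # hypothetical masses in Da
print(compute_tpa("P11111", 1_000_000.0, masses))         # single protein
print(compute_tpa("P11111;P22222", 1_600_000.0, masses))  # protein group
```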

#### Calculate the Cellular Protein Copy Number and Concentration
The protein copy number is calculated with the following formula:
```
protein copies per cell = protein MS-signal *  (avogadro / molecular mass) * (DNA mass / histone MS-signal)
```
To calculate the cellular protein copy number, the UniProt accessions of the histones of the species are obtained first, and the molecular mass of the DNA is calculated. The dataframe is then grouped by condition, and the copy number, number of moles and mass of each protein are calculated.

To calculate the protein concentration, the cell volume is first derived from the cellular protein concentration; the protein mass is then divided by this volume to obtain the intracellular protein concentration.
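
The formula and the concentration step can be sketched as shown below. This is an illustration, not the package's code: the DNA mass per cell is passed in as a parameter rather than derived from the genome size and ploidy, the units (g/mol, g, g/L) are assumptions, and all input numbers are made up.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(protein_signal, molecular_mass, histone_signal, dna_mass):
    """Copy number from MS signals; molecular_mass in g/mol, dna_mass in g per cell."""
    return protein_signal * (AVOGADRO / molecular_mass) * (dna_mass / histone_signal)


def mass_concentration(protein_mass, total_protein_mass, cpc=200.0):
    """Protein concentration in g/L from per-cell masses in grams."""
    volume = total_protein_mass / cpc  # cell volume in litres
    return protein_mass / volume


copies = copies_per_cell(
    protein_signal=1e8, molecular_mass=50000.0, histone_signal=1e9, dna_mass=6.5e-12
)
protein_mass = copies / AVOGADRO * 50000.0  # grams of this protein per cell
print(copies, mass_concentration(protein_mass, total_protein_mass=2e-10))
```

The default `cpc` of 200 g/L matches the default of the `--cpc` option shown in the help output above.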

### Credits 

- Julianus Pfeuffer
- Yasset Perez-Riverol
- Hong Wang

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/bigbio/ibaqpy/",
    "name": "ibaqpy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Proteomics,Label-free,absolute quantification",
    "author": "Yasset Perez-Riverol",
    "author_email": "ypriverol@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/90/f8/cfefc3e5211ece7791ae22036d204bf1289e273988c6954b0512713b2859/ibaqpy-0.0.2.tar.gz",
    "platform": null,
    "description": "# ibaqpy\n\n[![Python application](https://github.com/bigbio/ibaqpy/actions/workflows/python-app.yml/badge.svg)](https://github.com/bigbio/ibaqpy/actions/workflows/python-app.yml)\n\niBAQ (intensity Based Absolute Quantification) determines the abundance of a protein by dividing the total precursor intensities by the number of theoretically observable peptides of the protein. The TPA (Total Protein Approach) value is determined by summing peptide intensities of each protein and then dividing by the molecular mass to determine the relative concentration of each protein. By using [ProteomicRuler](https://www.sciencedirect.com/science/article/pii/S1535947620337749), it is possible to calculate the protein copy number and absolute concentration. **ibaqpy** compute IBAQ values, TPA values, copy numbers and concentration for proteins starting from a msstats input file and a SDRF experimental design file. This package provides multiple tools: \n\n- `peptide_file_generation.py`: generate a peptide file from a msstats input file and a SDRF experimental design file. \n\n- `peptide_normalization.py`: Normalize the input output file from script `peptide_file_generation.py`. It includes multiple steps such as peptidoform normalization, peptidorom to peptide summarization, peptide intensity normalization, and imputation. \n\n- `compute_ibaq.py`: Compute IBAQ values from the output file from script `peptide_normalization.py`.\n\n- `compute_tpa.py`: Compute TPA values, protein copy numbers and concentration from the output file from script `peptide_file_generation.py`.\n\n\n### How to install ibaqpy\n\nIbaqpy is available in PyPI and can be installed using pip:\n\n```asciidoc\npip install ibaqpy\n```\n\nYou can install the package from code: \n\n1. Clone the repository:\n\n```asciidoc\n>$ git clone https://github.com/bigbio/ibaqpy\n>$ cd ibaqpy\n```\n\n2. 
Install conda environment:\n\n```asciidoc\n>$ mamba create --name ibaqpy python=3.7\n>$ conda activate ibaqpy\n>$ mamba install -c conda-forge --file requirements.txt\n```\n\n3. Install ibaqpy:\n\n```asciidoc\n>$ python setup.py install\n```\n\n### Collecting intensity files \n\nAbsolute quantification files has been store in the following url: \n\n```\nhttps://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression/\n```\n\nInside each project reanalysis folder, the folder proteomicslfq contains the msstats input file with the structure `{Name of the project}_msstats_in.csv`. \n\nE.g. http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression/PXD003947/proteomicslfq/PXD003947.sdrf_openms_design_msstats_in.csv \n\n### Generate Peptidoform Intesity file - peptide_file_generation.py\n\n```asciidoc\npython  peptide_file_generation.py --msstats PXD003947.sdrf_openms_design_msstats_in.csv --sdrf PXD003947.sdrf.tsv --min_aa 7 --min_unique 2 --output PXD003947-peptides.csv\n```\n\nThe command provides an additional `flag` for compression data analysis where the msstats and sdrf files are compressed. 
\n\n```asciidoc\npython peptide_file_generation.py --help\nUsage: peptide_file_generation.py [OPTIONS]\n\n  Conversion of peptide intensity information into a file that containers\n  peptide intensities but also the metadata around the conditions, the\n  retention time, charge states, etc.\n\n  :param msstats: MsStats file import generated by quantms \n  :param sdrf: SDRF file import generated by quantms \n  :param min_aa: Minimum number of amino acids to filter peptides\n  :param min_unique: Minimum number of unique peptides to filter proteins\n  :param compress: Read all files compress\n  :param output: Peptide intensity file including other all properties for\n                 normalization\n\nOptions:\n  -m, --msstats TEXT  MsStats file import generated by quantms  [required]\n  -s, --sdrf TEXT     SDRF file import generated by quantms  [required]\n  --min_aa            Minimum number of amino acids to filter peptides (default: 7)\n  --min_unique        Minimum number of unique peptides to filter proteins (default: 2)\n  --compress          Read all files compress\n  -o, --output TEXT   Peptide intensity file including other all properties\n                      for normalization\n  --help              Show this message and exit.\n```\n\n### Peptide Normalization - peptide_normalization.py\n\n```asciidoc\npython peptide_normalization.py --peptides PXD003947-peptides.csv --impute --normalize --contaminants data/contaminants_ids.tsv --output PXD003947-peptides-norm.tsv\n``` \n\nThe command provides an additional `flag` for skip_normalization, impute, pnormalization, compress, log2, violin, verbose.\n\n```asciidoc\npython peptide_normalization.py --help\nUsage: peptide_normalization.py [OPTIONS]\n\n  Normalize the peptide intensities using different methods for a file output\n  of peptides with the format described in peptide_file_generation.py.\n\n  :param peptides: Peptides files from the peptide file generation tool\n  :param contaminants: Contaminants and high 
abundant proteins to be removed\n  :param output: Peptide intensity file including other all properties for normalization\n  :param skip_normalization: Skip normalization step\n  :param nmethod: Normalization method used to normalize intensities for all samples\n  :param impute: Impute the missing values using MissForest\n  :param pnormalization: Normalize the peptide intensities using different methods\n  :param compress: Read the input peptides file in compress gzip file\n  :param log2: Transform to log2 the peptide intensity values before normalization\n  :param violin: Use violin plot instead of boxplot for distribution representations\n  :param verbose: Print addition information about the distributions of the intensities, \n                  number of peptides remove after normalization, etc.\n\nOptions:\n  --peptides TEXT       Peptides files from the peptide file generation tool  [required]\n  --contaminants TEXT   Contaminants and high abundant proteins to be removed\n  --skip_normalization  Skip normalization step (default: False)\n  --nmethod             Normalization method used to normalize intensities for all samples\n                        (default: qnorm)\n  --impute              Impute the missing values using MissForest\n  --pnormalization      Normalize the peptide intensities using different methods\n  --compress            Read the input peptides file in compress gzip file\n  --log2                Transform to log2 the peptide intensity values before normalization\n  --violin              Use violin plot instead of boxplot for distribution representations\n  --verbose             Print addition information about the distributions of the intensities, \n                        number of peptides remove after normalization, etc.\n  --output TEXT         Peptide intensity file including other all properties for normalization\n  --help                Show this message and exit.\n```\n\nPeptide normalization starts from the output file from script 
`peptide_file_generation.py`. The structure of the input contains the following columns: \n\n- ProteinName: Protein name\n- PeptideSequence: Peptide sequence including post-translation modifications `(e.g. .(Acetyl)ASPDWGYDDKN(Deamidated)GPEQWSK)`\n- PrecursorCharge: Precursor charge\n- FragmentIon: Fragment ion\n- ProductCharge: Product charge\n- IsotopeLabelType: Isotope label type\n- Condition: Condition label `(e.g. heart)`\n- BioReplicate: Biological replicate index `(e.g. 1)`\n- Run: Run index `(e.g. 1)`\n- Fraction: Fraction index `(e.g. 1)`\n- Intensity: Peptide intensity\n- Reference: Name of the RAW file containing the peptide intensity `(e.g. Adult_Heart_Gel_Elite_54_f16)`\n- SampleID: Sample ID `(e.g. PXD003947-Sample-3)`\n- StudyID: Study ID `(e.g. PXD003947)`. In most of the cases the study ID is the same as the ProteomeXchange ID.\n\n#### Removing Contaminants and Decoys\n\nThe first step is to remove contaminants and decoys from the input file. The script `peptide_normalization.py` provides a parameter `--contaminants` for the user to provide a file with a list of protein accessions which represent each contaminant in the file. An example file can be seen in `data/contaminants.txt`. In addition to all the proteins accessions, the tool remove all the proteins with the following prefixes: `CONTAMINANT` and `DECOY`.\n\n#### Peptidoform Normalization\n\nA peptidoform is a combination of a `PeptideSequence(Modifications) + Charge + BioReplicate + Fraction`. In the current version of the file, each row correspond to one peptidoform. \n\nThe current version of the tool uses the parackage [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities for each peptidofrom. **qnorm** implements a quantile normalization method. \n\n#### Peptidoform to Peptide Summarization\n\nFor each peptidoform a peptide sequence (canonical) with not modification is generated. The intensity of all peptides group by biological replicate are `sum`. 
\n\nThen, the intensities of the peptides across different biological replicates are summarize using the function `median`. \n\nAt the end of this step, for each peptide, the corresponding protein + the intensity of the peptide is stored. \n\n#### Peptide Intensity Imputation and Normalization\n\nBefore the final two steps (peptide normalization and imputation), the algorithm removes all peptides that are source of missing values significantly. The algorithm removes all peptides that have more than 80% of missing values and peptides that do not appear in more than 1 sample. \n\nFinally, two extra steps are performed: \n\n- ``peptide intensity imputation``: Imputation is performed using the package [missingpy](https://pypi.org/project/missingpy/). The algorithm uses a Random Forest algorithm to perform the imputation.\n- ``peptide intensity normalization``: Similar to the normalization of the peptidoform intensities, the peptide intensities are normalized using the package [qnorm](https://pypi.org/project/qnorm/).\n\n### Compute IBAQ - compute_ibaq.py\nIBAQ is an absolute quantitative method based on strength that can be used to estimate the relative abundance of proteins in a sample. 
IBAQ value is the total intensity of a protein divided by the number of theoretical peptides.\n\n```asciidoc\npython compute_ibaq.py --fasta Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta --peptides PXD003947-peptides.csv --enzyme \"Trypsin\" --normalize --output PXD003947-ibaq.tsv\n``` \n\nThe command provides an additional `flag` for normalize IBAQ values.\n\n```asciidoc\npython compute_ibaq.py --help\nUsage: compute_ibaq.py [OPTIONS]\n\n  Compute the IBAQ values for a file output of peptides with the format described in\n  peptide_normalization.py.\n\n  :param min_aa: Minimum number of amino acids to consider a peptide\n  :param max_aa: Maximum number of amino acids to consider a peptide\n  :param fasta: Fasta file used to perform the peptide identification\n  :param peptides: Peptide intensity file\n  :param enzyme: Enzyme used to digest the protein sample\n  :param normalize: use some basic normalization steps\n  :param output: output format containing the ibaq values\n\nOptions:\n  -f, --fasta TEXT      Protein database to compute IBAQ values  [required]\n  -p, --peptides TEXT   Peptide identifications with intensities following the peptide intensity output  [required]\n  -e, --enzyme          Enzyme used during the analysis of the dataset (default: Trypsin)\n  -n, --normalize       Normalize IBAQ values using by using the total IBAQ of the experiment\n  --min_aa              Minimum number of amino acids to consider a peptide (default: 7)\n  --max_aa              Maximum number of amino acids to consider a peptide (default: 30)\n  -o, --output TEXT     Output format containing the ibaq values\n  --help                Show this message and exit.\n```\n\n#### Performs the Enzymatic Digestion\nThe current version of this tool uses OpenMS method to load fasta file, and use [ProteaseDigestion](https://openms.de/current_doxygen/html/classOpenMS_1_1ProteaseDigestion.html) to enzyme digestion of protein sequences, and finally get the theoretical 
peptide number of each protein.\n\n#### Calculate the IBAQ Value\nFirst, peptide intensity dataframe was grouped according to protein name, sample name and condition. The protein intensity of each group was summed. Finally, the sum of the intensity of the protein is divided by the number of theoretical peptides.\n\nIf protein-group exists in the peptide intensity dataframe, the intensity of all proteins in the protein-group is summed based on the above steps, and then multiplied by the number of proteins in the protein-group.\n\n#### IBAQ Normalization  \nNormalize the ibaq values using the total ibaq of the sample. The resulted ibaq values are then multiplied by 100'000'000 (PRIDE database noramalization), for the ibaq ppb and log10 shifted by 10 (ProteomicsDB)\n\n### Compute TPA - compute_tpa.py\nThe total protein approach (TPA) is a label- and standard-free method for absolute protein quantitation of proteins using large-scale proteomic data. In the current version of the tool, the TPA value is the total intensity of the protein divided by its theoretical molecular mass.\n\n[ProteomicRuler](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256500/) is a method for protein copy number and concentration estimation that does not require the use of isotope labeled standards. It uses the mass spectrum signal of histones as a \"proteomic ruler\" because it is proportional to the amount of DNA in the sample, which in turn depends on cell count. 
Thus, this approach can add an absolute scale to the mass spectrometry readout and allow estimates of the copy number of individual proteins in each cell.\n\n```asciidoc\npython compute_tpa.py --fasta Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta --contaminants contaminants_ids.tsv --peptides PXD003947-peptides.csv --ruler --ploidy 2 --cpc 200 --output PXD003947-tpa.tsv\n``` \n\nNote: The peptide intensity file used in the calculation of TPA and the ProteomicRuler is not normalized!\n\n```asciidoc\npython compute_tpa.py --help\nUsage: compute_tpa.py [OPTIONS]\n\n  Compute the protein copy numbers and concentrations according to a file output of peptides with the\n  format described in peptide_file_generation.py.\n\n  :param fasta: Fasta file used to perform the peptide identification\n  :param peptides: Peptide intensity file\n  :param contaminants: Contaminants file\n  :param ruler: Whether to compute protein copy number, weight and concentration.\n  :param ploidy: Ploidy number\n  :param cpc: Cellular protein concentration(g/L)\n  :param output: Output format containing the TPA values, protein copy numbers and concentrations\n\nOptions:\n  -f, --fasta TEXT      Protein database to compute IBAQ values  [required]\n  -p, --peptides TEXT   Peptide identifications with intensities following the peptide intensity output  [required]\n  --contaminants        Contaminants and high abundant proteins to be removed\n  -r, --ruler           Calculate protein copy number and concentration according to ProteomicRuler\n  -n, --ploidy          Ploidy number (default: 2)\n  -c, --cpc             Cellular protein concentration(g/L) (default: 200)\n  -o, --output TEXT     Output format containing the TPA values, protein copy numbers and concentrations\n  --help                Show this message and exit.\n```\n\n#### Calculate the TPA Value\nThe OpenMS tool was used to calculate the theoretical molecular mass of each protein. 
Similar to the calculation of IBAQ, the TPA value of a protein group is the sum of its intensities divided by the sum of the theoretical molecular masses.\n\n#### Calculate the Cellular Protein Copy Number and Concentration\nThe protein copy number is calculated with the following formula:\n```\nprotein copies per cell = protein MS-signal * (avogadro / molecular mass) * (DNA mass / histone MS-signal)\n```\nTo calculate the cellular protein copy number, the UniProt accessions of the histones are first obtained for the species, and the molecular mass of the DNA is calculated. The dataframe is then grouped by condition, and the copy number, number of moles and mass of each protein are calculated.\n\nTo calculate the protein concentration, the volume is first derived from the cellular protein concentration, and the protein mass is then divided by this volume to give the intracellular protein concentration.\n\n### Credits\n\n- Julianus Pfeuffer\n- Yasset Perez-Riverol\n- Hong Wang\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package to compute intensity base absolute expression values",
    "version": "0.0.2",
    "project_urls": {
        "Download": "https://github.com/bigbio/ibaqpy/",
        "Homepage": "https://github.com/bigbio/ibaqpy/"
    },
    "split_keywords": [
        "proteomics",
        "label-free",
        "absolute quantification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8abeac61a599d58519bb83c1f84268d5538b56e0736050c6361312abf557d809",
                "md5": "d15590bdf2d4edd46032b686ba129cd3",
                "sha256": "50349e00e6504b3ff898ce2dcaa11f29fd5824f64d3e469ee6992211fa56f618"
            },
            "downloads": -1,
            "filename": "ibaqpy-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d15590bdf2d4edd46032b686ba129cd3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 43617,
            "upload_time": "2023-05-22T08:40:18",
            "upload_time_iso_8601": "2023-05-22T08:40:18.214143Z",
            "url": "https://files.pythonhosted.org/packages/8a/be/ac61a599d58519bb83c1f84268d5538b56e0736050c6361312abf557d809/ibaqpy-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90f8cfefc3e5211ece7791ae22036d204bf1289e273988c6954b0512713b2859",
                "md5": "211b491b8c26ac957c54f6bea399cf6b",
                "sha256": "6e6ec017d19a586b4b070acd0f1ac2f16aed710688580edd3e14299e51f6b867"
            },
            "downloads": -1,
            "filename": "ibaqpy-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "211b491b8c26ac957c54f6bea399cf6b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 33996,
            "upload_time": "2023-05-22T08:40:20",
            "upload_time_iso_8601": "2023-05-22T08:40:20.521668Z",
            "url": "https://files.pythonhosted.org/packages/90/f8/cfefc3e5211ece7791ae22036d204bf1289e273988c6954b0512713b2859/ibaqpy-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-22 08:40:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bigbio",
    "github_project": "ibaqpy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "ibaqpy"
}
        