- **Name**: pmultiqc
- **Version**: 0.0.30
- **Summary**: Python package for quality control of proteomics datasets, based on multiqc package
- **Upload time**: 2025-07-22 15:16:48
- **Author**: Yasset Perez-Riverol
- **Requires Python**: `>=3.10,<3.13`
- **License**: MIT
- **Keywords**: quantms, proteomics, quality control, multiqc
- **Requirements**: `multiqc>=1.29`, `pandas>=1.5`, `pyteomics`, `pyopenms`, `sdrf-pipelines>=0.0.32`, `lxml`, `numpy>=1.23`, `pyarrow`, `scikit-learn>=1.2`

# pmultiqc

[![Python application](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml)
[![Upload Python Package](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml/badge.svg)](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml)
![PyPI - Version](https://img.shields.io/pypi/v/pmultiqc?style=flat)
![PyPI - Downloads](https://img.shields.io/pypi/dm/pmultiqc)
![Pepy Total Downloads](https://img.shields.io/pepy/dt/pmultiqc)
![GitHub Repo stars](https://img.shields.io/github/stars/bigbio/pmultiqc)

## What is pmultiqc?

pmultiqc is a MultiQC plugin for comprehensive quality control reporting of proteomics data. It generates interactive HTML reports with visualizations and metrics to help you assess the quality of your mass spectrometry-based proteomics experiments.

### Key Features

- Works with multiple proteomics data formats and analysis pipelines
- Generates interactive HTML reports with visualizations
- Provides comprehensive QC metrics for MS data
- Supports different quantification methods (LFQ, TMT, DIA)
- Integrates with the MultiQC framework

## Supported Data Sources

pmultiqc supports the following data sources:

1. **[quantms pipeline](https://github.com/nf-core/quantms)** output files:
   - `experimental_design.tsv`: Experimental design file
   - `*.mzTab`: Results of the identification
   - `*msstats*.csv`: MSstats/MSstatsTMT input files
   - `*.mzML`: Spectra files
   - `*ms_info.tsv`: MS quality control information
   - `*.idXML`: Identification results
   - `*.yml`: Pipeline parameters (optional)
   - `diann_report.tsv` or `diann_report.parquet`: DIA-NN main report (DIA analysis only)

2. **[MaxQuant](https://www.maxquant.org)** result files:
   - `parameters.txt`: Analysis parameters
   - `proteinGroups.txt`: Protein identification results
   - `summary.txt`: Summary statistics
   - `evidence.txt`: Peptide evidence
   - `msms.txt`: MS/MS scan information
   - `msmsScans.txt`: MS/MS scan details

3. **[DIA-NN](https://aptila.bio)** result files:
   - `*ms_info.parquet`: mzML statistics after Raw-to-mzML conversion (using **[quantms-utils](https://github.com/bigbio/quantms-utils)**)
   - `report.tsv` or `report.parquet`: DIA-NN main report

4. **[ProteoBench](https://proteobench.readthedocs.io)** file:
   - `result_performance.csv`: ProteoBench result file

5. **mzIdentML** files:
   - `*.mzid`: Identification results
   - `*.mzML` or `*.mgf`: Corresponding spectra files
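
The file-name patterns above can be used to check a results directory before running a report. The following is a minimal sketch, not pmultiqc's own detection logic: the `SOURCE_PATTERNS` keys and the helpers `match_sources` and `scan_directory` are hypothetical names introduced for illustration, and generic patterns such as `*.mzML` and `*.yml` are deliberately left out because they appear under more than one data source.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Patterns copied from the list above. The dictionary keys are labels chosen
# for this sketch; they are not identifiers used by pmultiqc.
SOURCE_PATTERNS = {
    "quantms": [
        "experimental_design.tsv", "*.mzTab", "*msstats*.csv", "*ms_info.tsv",
        "*.idXML", "diann_report.tsv", "diann_report.parquet",
    ],
    "maxquant": [
        "parameters.txt", "proteinGroups.txt", "summary.txt",
        "evidence.txt", "msms.txt", "msmsScans.txt",
    ],
    "diann": ["*ms_info.parquet", "report.tsv", "report.parquet"],
    "proteobench": ["result_performance.csv"],
    "mzidentml": ["*.mzid"],
}


def match_sources(file_names):
    """Map each data source to the sorted file names that match its patterns."""
    hits = {}
    for source, patterns in SOURCE_PATTERNS.items():
        matched = sorted(
            name
            for name in file_names
            if any(fnmatchcase(name, pattern) for pattern in patterns)
        )
        if matched:
            hits[source] = matched
    return hits


def scan_directory(analysis_dir):
    """Collect file names below analysis_dir and match them against the patterns."""
    names = [path.name for path in Path(analysis_dir).rglob("*") if path.is_file()]
    return match_sources(names)
```

An empty result from `scan_directory("/path/to/results")` suggests that none of the listed files are present under that directory.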

## Installation

### Install from PyPI

```bash
# To install the stable release from PyPI:
pip install pmultiqc
```

### Install from Source (Without PyPI)

```bash
# Fork the repository on GitHub

# Clone the repository
git clone https://github.com/your-username/pmultiqc.git
cd pmultiqc

# Install the package locally
pip install .

# Now you can run pmultiqc on your own dataset
```
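
The package metadata for version 0.0.30 declares `requires_python` as `>=3.10,<3.13`, so installation may fail on interpreters outside that range. A minimal sketch for checking an interpreter before installing; `python_supported` is a hypothetical helper and not part of pmultiqc:

```python
import sys

# Bounds taken from the 0.0.30 package metadata: requires_python ">=3.10,<3.13".
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def python_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_supported() else "outside the declared range"
    print(f"Python {current} is {status}")
```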

## Usage

pmultiqc is used as a plugin for MultiQC. After installation, you can run it using the MultiQC command-line interface.

### Basic Usage

```bash
multiqc {analysis_dir} -o {output_dir}
```

Where:
- `{analysis_dir}` is the directory containing your proteomics data files
- `{output_dir}` is the directory where you want to save the report

### Examples

#### For quantms pipeline results

```bash
# Basic usage
multiqc /path/to/quantms/results -o ./report

# With specific options
multiqc /path/to/quantms/results -o ./report --remove_decoy --condition factor
```

#### For MaxQuant results

```bash
multiqc --parse_maxquant /path/to/maxquant/results -o ./report
```

#### For DIA-NN results

```bash
multiqc /path/to/diann/results -o ./report
```

#### For ProteoBench files

```bash
multiqc --parse_proteobench /path/to/proteobench/files -o ./report
```

#### For mzIdentML files

```bash
multiqc --mzid_plugin /path/to/mzid/files -o ./report
```
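
When reports are generated from a script, the examples above reduce to one flag per data source. The sketch below assembles the same command lines as Python argument lists; the `SOURCE_FLAGS` keys and the `command_for` helper are hypothetical names for this illustration, and only flags shown in the examples are used.

```python
import subprocess

# Flags taken from the examples above. quantms and DIA-NN results need no
# extra flag there. The keys are labels chosen for this sketch.
SOURCE_FLAGS = {
    "quantms": [],
    "maxquant": ["--parse_maxquant"],
    "diann": [],
    "proteobench": ["--parse_proteobench"],
    "mzidentml": ["--mzid_plugin"],
}


def command_for(source, analysis_dir, output_dir="./report"):
    """Build the multiqc argument list for one of the data sources above."""
    if source not in SOURCE_FLAGS:
        raise ValueError(f"unknown data source: {source!r}")
    return ["multiqc", *SOURCE_FLAGS[source], analysis_dir, "-o", output_dir]


def run_report(source, analysis_dir, output_dir="./report"):
    """Run multiqc and raise CalledProcessError if it exits with an error."""
    subprocess.run(command_for(source, analysis_dir, output_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.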


### Command-line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--raw` | Keep filenames in experimental design output as raw | `False` |
| `--condition` | Create conditions from provided columns | - |
| `--remove_decoy` | Remove decoy peptides when counting | `True` |
| `--decoy_affix` | Prefix or suffix that marks decoy proteins in their accession | `DECOY_` |
| `--contaminant_affix` | The contaminant prefix or suffix | `CONT` |
| `--affix_type` | Location of the decoy marker (prefix or suffix) | `prefix` |
| `--disable_plugin` | Disable pmultiqc plugin | `False` |
| `--quantification_method` | Quantification method for LFQ experiments | `feature_intensity` |
| `--disable_table` | Disable protein/peptide table plots for large datasets | `False` |
| `--ignored_idxml` | Ignore idXML files for faster processing | `False` |
| `--parse_maxquant` | Generate reports based on MaxQuant results | `False` |
| `--parse_proteobench` | Generate reports based on ProteoBench result | `False` |
| `--mzid_plugin` | Generate reports based on mzIdentML files | `False` |

## QC Metrics and Visualizations

pmultiqc generates a comprehensive report with multiple sections:

### General Report

- **Experimental Design**: Overview of the dataset structure
- **Pipeline Performance Overview**: Key metrics including:
  - Contaminants Score
  - Peptide Intensity
  - Charge Score
  - Missed Cleavages
  - ID rate over RT
  - MS2 OverSampling
  - Peptide Missing Value
- **Summary Table**: Spectra counts, identification rates, peptide and protein counts
- **MS1 Information**: Quality metrics at MS1 level
- **Pipeline Results Statistics**: Overall identification results
- **Number of Peptides per Protein**: Distribution of peptide counts per protein
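
As an illustration of what a missed-cleavage metric measures, the sketch below counts internal tryptic sites in a peptide sequence. It assumes the common trypsin convention (cleavage after K or R unless the next residue is P); how pmultiqc derives the value, for example from search-engine output, is not specified here, and `count_missed_cleavages` is a hypothetical helper.

```python
def count_missed_cleavages(peptide):
    """Count K/R residues that are not C-terminal and not followed by P."""
    return sum(
        1
        for residue, next_residue in zip(peptide, peptide[1:])
        if residue in "KR" and next_residue != "P"
    )
```

A fully cleaved tryptic peptide ends in K or R and contains no other cleavable site, so it scores zero.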

### Results Tables

- **Peptide Table**: First 500 peptides in the dataset
- **PSM Table**: First 500 PSMs (Peptide-Spectrum Matches)

### Identification Statistics

- **Spectra Tracking**: Summary of identification results by file
- **Search Engine Scores**: Distribution of search engine scores
- **Precursor Charges Distribution**: Distribution of precursor ion charges
- **Number of Peaks per MS/MS Spectrum**: Peak count distribution
- **Peak Intensity Distribution**: MS2 peak intensity distribution
- **Oversampling Distribution**: Analysis of MS2 oversampling
- **Delta Mass**: Mass accuracy distribution
- **Peptide/Protein Quantification Tables**: Quantitative levels across conditions
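
Mass accuracy is commonly reported either as an absolute difference in daltons or as a relative error in parts per million. The sketch below shows both forms under that general convention; which unit the Delta Mass plot uses is not stated here, and the function names are hypothetical.

```python
def delta_mass_da(observed_mass, theoretical_mass):
    """Absolute mass error in daltons."""
    return observed_mass - theoretical_mass


def delta_mass_ppm(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6
```

For example, an observed mass of 500.0025 against a theoretical mass of 500.0 is an error of about 5 ppm.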

## Example Reports

You can find example reports on the [docs page](https://bigbio.github.io/pmultiqc).

## Development

To contribute to pmultiqc:

1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR-USERNAME/pmultiqc`
3. Create a feature branch: `git checkout -b new-feature`
4. Make your changes
5. Install in development mode: `pip install -e .`
6. Test your changes: `cd tests && multiqc resources/LFQ -o ./`
7. Commit your changes: `git commit -am 'Add new feature'`
8. Push to the branch: `git push origin new-feature`
9. Submit a pull request

## License

This project is licensed under the MIT License; see the LICENSE file included in the repository for the full terms.

## Citation

If you use pmultiqc in your research, please cite:

```
pmultiqc: A MultiQC plugin for proteomics quality control
https://github.com/bigbio/pmultiqc
```

            
