pmultiqc

Name	pmultiqc
Version	0.0.38
home_page	None
Summary	Python package for quality control of proteomics datasets, based on multiqc package
upload_time	2025-11-01 06:21:18
maintainer	None
docs_url	None
author	Yasset Perez-Riverol
requires_python	>=3.10
license	MIT
keywords	quantms proteomics quality control multiqc
requirements	multiqc pandas pyteomics pyopenms sdrf-pipelines lxml numpy pyarrow scikit-learn statsmodels

# pmultiqc

[![Python application](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml)
[![Upload Python Package](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml/badge.svg)](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml)
![PyPI - Version](https://img.shields.io/pypi/v/pmultiqc?style=flat)
![PyPI - Downloads](https://img.shields.io/pypi/dm/pmultiqc)
![Pepy Total Downloads](https://img.shields.io/pepy/dt/pmultiqc)
![GitHub Repo stars](https://img.shields.io/github/stars/bigbio/pmultiqc)

## What is pmultiqc?

pmultiqc is a MultiQC plugin for comprehensive quality control reporting of proteomics data. It generates interactive HTML reports with visualizations and metrics to help you assess the quality of your mass spectrometry-based proteomics experiments.

### Key Features

- Works with multiple proteomics data formats and analysis pipelines
- Generates interactive HTML reports with visualizations
- Provides comprehensive QC metrics for MS data
- Supports different quantification methods (LFQ, TMT, DIA)
- Integrates with the MultiQC framework

## Supported Data Sources

pmultiqc supports the following data sources:

1. **[quantms pipeline](https://github.com/nf-core/quantms)** output files:
   - `experimental_design.tsv`: Experimental design file
   - `*.mzTab`: Results of the identification
   - `*msstats*.csv`: MSstats/MSstatsTMT input files
   - `*.mzML`: Spectra files
   - `*ms_info.tsv`: MS quality control information
   - `*.idXML`: Identification results
   - `*.yml`: Pipeline parameters (optional)
   - `diann_report.tsv` or `diann_report.parquet`: DIA-NN main report (DIA analysis only)

2. **[MaxQuant](https://www.maxquant.org)** result files:
   - `parameters.txt`: Analysis parameters
   - `proteinGroups.txt`: Protein identification results
   - `summary.txt`: Summary statistics
   - `evidence.txt`: Peptide evidence
   - `msms.txt`: MS/MS scan information
   - `msmsScans.txt`: MS/MS scan details
   - `*sdrf.tsv`: SDRF-Proteomics (optional)

3. **[DIA-NN](https://aptila.bio)** result files:
   - `report.tsv` or `report.parquet`: DIA-NN main report
   - `*sdrf.tsv`: SDRF-Proteomics (optional)
   - `*ms_info.parquet`: mzML statistics after RAW-to-mzML conversion (using **[quantms-utils](https://github.com/bigbio/quantms-utils)**) (optional)

4. **[ProteoBench](https://proteobench.readthedocs.io)** file:
   - `result_performance.csv`: ProteoBench result file

5. **mzIdentML** files:
   - `*.mzid`: Identification results
   - `*.mzML` or `*.mgf`: Corresponding spectra files
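
As a rough illustration of how the file patterns above relate to the plugin flags documented under Usage, the sketch below scans a directory and suggests a flag. The helper `guess_plugin_flag` and its `MARKERS` table are hypothetical and not part of pmultiqc; they only pick one distinctive file name per data source from the list above.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical table: one distinctive file pattern per data source (taken
# from the list above) paired with the matching pmultiqc flag.
MARKERS = [
    ("proteinGroups.txt", "--maxquant_plugin"),
    ("result_performance.csv", "--proteobench_plugin"),
    ("*.mzid", "--mzid_plugin"),
    ("*.mzTab", "--quantms_plugin"),
    ("diann_report.*", "--quantms_plugin"),
    ("report.tsv", "--diann_plugin"),
    ("report.parquet", "--diann_plugin"),
]


def guess_plugin_flag(analysis_dir):
    """Return the first flag whose marker pattern matches a file, else None."""
    names = [p.name for p in Path(analysis_dir).rglob("*") if p.is_file()]
    for pattern, flag in MARKERS:
        if any(fnmatch(name, pattern) for name in names):
            return flag
    return None
```

The order of `MARKERS` is a design choice of this sketch: more specific file names are checked first so that a directory containing several kinds of files resolves predictably.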

## Installation

### Install from PyPI

```bash
# To install the stable release from PyPI:
pip install pmultiqc
```
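
To confirm the installation from Python, a small check like the following can help. It is a sketch using only the standard library; the helper names are hypothetical, and the 3.10 minimum comes from the package's `requires_python` metadata.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum=(3, 10)):
    """pmultiqc declares requires_python >= 3.10."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python >= 3.10:", python_is_supported())
    print("pmultiqc:", installed_version("pmultiqc") or "not installed")
```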

### Install from Source (Without PyPI)

```bash
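# Fork the repository on GitHub (optional; to install without contributing,
# clone https://github.com/bigbio/pmultiqc.git directly)

# Clone the repository (replace your-username with your GitHub username)
git clone https://github.com/your-username/pmultiqc.git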
cd pmultiqc

# Install the package locally
pip install .

# Now you can run pmultiqc on your own dataset
```

## Usage

pmultiqc is used as a plugin for MultiQC. After installation, you can run it using the MultiQC command-line interface.

### Basic Usage

```bash
multiqc {analysis_dir} -o {output_dir}
```

Where:
- `{analysis_dir}` is the directory containing your proteomics data files
- `{output_dir}` is the directory where you want to save the report

### Examples

#### For quantms pipeline results

```bash
# Basic usage
multiqc --quantms_plugin /path/to/quantms/results -o ./report

# With specific options
multiqc --quantms_plugin /path/to/quantms/results -o ./report --remove_decoy --condition factor
```

#### For MaxQuant results

```bash
multiqc --maxquant_plugin /path/to/maxquant/results -o ./report
```

#### For DIA-NN results

```bash
multiqc --diann_plugin /path/to/diann/results -o ./report
```

#### For ProteoBench files

```bash
multiqc --proteobench_plugin /path/to/proteobench/files -o ./report
```

#### For mzIdentML files

```bash
multiqc --mzid_plugin /path/to/mzid/files -o ./report
```
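
The example invocations above all share one shape: `multiqc`, a plugin flag, the input directory, `-o`, and the output directory. If you drive MultiQC from a script, a thin wrapper along these lines may be convenient. This is a sketch; `build_multiqc_command` and `run_multiqc` are hypothetical helpers, not part of pmultiqc or MultiQC.

```python
import subprocess


def build_multiqc_command(analysis_dir, output_dir, plugin_flag=None, extra_args=()):
    """Assemble the argument list for one of the invocations shown above."""
    cmd = ["multiqc"]
    if plugin_flag:
        cmd.append(plugin_flag)  # e.g. "--quantms_plugin"
    cmd += [str(analysis_dir), "-o", str(output_dir)]
    cmd += list(extra_args)  # e.g. ["--condition", "factor"]
    return cmd


def run_multiqc(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.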


### Command-line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--raw` | Keep filenames in experimental design output as raw | `False` |
| `--condition` | Create conditions from provided columns | - |
| `--remove_decoy` | Remove decoy peptides when counting | `True` |
| `--decoy_affix` | Prefix or suffix that marks decoy proteins in their accessions | `DECOY_` |
| `--contaminant_affix` | The contaminant prefix or suffix | `CONT` |
| `--affix_type` | Location of the decoy marker (prefix or suffix) | `prefix` |
| `--disable_plugin` | Disable pmultiqc plugin | `False` |
| `--quantification_method` | Quantification method for LFQ experiments | `feature_intensity` |
| `--disable_table` | Disable protein/peptide table plots for large datasets | `False` |
| `--ignored_idxml` | Ignore idXML files for faster processing | `False` |
| `--quantms_plugin` | Generate reports based on quantms results | `False` |
| `--diann_plugin` | Generate reports based on DIA-NN results | `False` |
| `--maxquant_plugin` | Generate reports based on MaxQuant results | `False` |
| `--proteobench_plugin` | Generate reports based on ProteoBench results | `False` |
| `--mzid_plugin` | Generate reports based on mzIdentML files | `False` |
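
To make the affix options concrete, the sketch below shows how `--decoy_affix`, `--contaminant_affix`, and `--affix_type` could be applied to protein accessions, using the defaults from the table. It is an illustration only, not pmultiqc's implementation: the function names and example accessions are hypothetical, and the sketch assumes that `--affix_type` applies to the contaminant affix as well, which the table does not state.

```python
def has_affix(accession, affix, affix_type="prefix"):
    """Check whether an accession carries the affix at the given position."""
    if affix_type == "prefix":
        return accession.startswith(affix)
    if affix_type == "suffix":
        return accession.endswith(affix)
    raise ValueError(f"affix_type must be 'prefix' or 'suffix', got {affix_type!r}")


def classify_accession(accession, decoy_affix="DECOY_",
                       contaminant_affix="CONT", affix_type="prefix"):
    """Label an accession as 'decoy', 'contaminant', or 'target'."""
    if has_affix(accession, decoy_affix, affix_type):
        return "decoy"
    # Assumption: the same affix_type governs the contaminant marker.
    if has_affix(accession, contaminant_affix, affix_type):
        return "contaminant"
    return "target"
```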

## QC Metrics and Visualizations

A pmultiqc report is organized into the following sections:

### General Report

- **Experimental Design**: Overview of the dataset structure
- **Pipeline Performance Overview**: Key metrics including:
  - Contaminants Score
  - Peptide Intensity
  - Charge Score
  - Missed Cleavages
  - ID rate over RT
  - MS2 OverSampling
  - Peptide Missing Value
- **Summary Table**: Spectra counts, identification rates, peptide and protein counts
- **MS1 Information**: Quality metrics at MS1 level
- **Pipeline Results Statistics**: Overall identification results
- **Number of Peptides per Protein**: Distribution of peptide counts per protein
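
The Missed Cleavages metric listed above concerns enzymatic cleavage sites left uncut inside a peptide. A minimal sketch, assuming trypsin specificity (cleavage after K or R unless the next residue is P); the function name is hypothetical, and pmultiqc may obtain this value differently, for example directly from search-engine output.

```python
def count_missed_cleavages(peptide):
    """Count internal K/R residues not followed by P (assumed trypsin rule)."""
    count = 0
    # The C-terminal residue is the expected cleavage site, so skip it.
    for i, residue in enumerate(peptide[:-1]):
        if residue in "KR" and peptide[i + 1] != "P":
            count += 1
    return count
```

For example, `count_missed_cleavages("PEPKTIDER")` counts the internal K as one missed cleavage, while the K in `"AKPR"` is not counted because it precedes a proline.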

### Results Tables

- **Peptide Table**: First 500 peptides in the dataset
- **PSM Table**: First 500 PSMs (Peptide-Spectrum Matches)

### Identification Statistics

- **Spectra Tracking**: Summary of identification results by file
- **Search Engine Scores**: Distribution of search engine scores
- **Precursor Charges Distribution**: Distribution of precursor ion charges
- **Number of Peaks per MS/MS Spectrum**: Peak count distribution
- **Peak Intensity Distribution**: MS2 peak intensity distribution
- **Oversampling Distribution**: Analysis of MS2 oversampling
- **Delta Mass**: Mass accuracy distribution
- **Peptide/Protein Quantification Tables**: Quantitative levels across conditions
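
Delta Mass compares the measured precursor mass with the theoretical mass of the identified peptide. The sketch below expresses that difference in Da and in parts per million (ppm). The function names are hypothetical, and the unit pmultiqc actually plots is not stated here, so treat this as background rather than a description of the report.

```python
def delta_mass_da(observed_mass, theoretical_mass):
    """Absolute mass error in Da (observed minus theoretical)."""
    return observed_mass - theoretical_mass


def delta_mass_ppm(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    if theoretical_mass <= 0:
        raise ValueError("theoretical_mass must be positive")
    return delta_mass_da(observed_mass, theoretical_mass) / theoretical_mass * 1e6
```

A well-calibrated instrument typically yields a distribution centred near zero; a systematic shift points to a calibration problem.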

## Example Reports

You can find example reports on the [docs page](https://bigbio.github.io/pmultiqc).

## Reporting Issues

The repository provides issue templates for the following kinds of reports:

- **Bug Reports**: For crashes, incorrect metrics, or unexpected behavior
- **Metric Requests**: For new proteomics quality control metrics (we actively encourage these!)
- **Feature Requests**: For new visualizations, data format support, or functionality
- **Service Issues**: For problems with the PRIDE web service
- **General Issues**: For questions, suggestions, or issues that don't fit other categories

## Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) for detailed instructions.

### Quick Start for Contributors

1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR-USERNAME/pmultiqc`
3. Create a feature branch: `git checkout -b new-feature`
4. Make your changes
5. Install in development mode: `pip install -e .`
6. Test your changes: `cd tests && multiqc resources/LFQ -o ./`
7. Commit your changes: `git commit -am 'Add new feature'`
8. Push to the branch: `git push origin new-feature`
9. Submit a pull request

## License

This project is licensed under the MIT License; see the LICENSE file included in the repository for the full terms.

## Citation

If you use pmultiqc in your research, please cite:

```
pmultiqc: A MultiQC plugin for proteomics quality control
https://github.com/bigbio/pmultiqc
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pmultiqc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "quantms, proteomics, quality control, MultiQC",
    "author": "Yasset Perez-Riverol",
    "author_email": "ypriverol@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/03/40/c43f77753139708a543157ed9206648b7213e068d2dd373a56a4bfb8a268/pmultiqc-0.0.38.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package for quality control of proteomics datasets, based on multiqc package",
    "version": "0.0.38",
    "project_urls": {
        "GitHub": "https://github.com/bigbio/pmultiqc",
        "LICENSE": "https://github.com/bigbio/pmultiqc/blob/main/LICENSE",
        "Quantms": "https://quantms.org"
    },
    "split_keywords": [
        "quantms",
        " proteomics",
        " quality control",
        " multiqc"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a009f1cd0437c828bb2e8f2eb42aef4509f7aa074de9080f39fdc76e11284166",
                "md5": "71dbbab825cb568b7dda80a2ed5ec53b",
                "sha256": "c3e766cdc2887aaadc873284f7d4be1e633eb7caa269575dab02fbcc35ae8633"
            },
            "downloads": -1,
            "filename": "pmultiqc-0.0.38-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "71dbbab825cb568b7dda80a2ed5ec53b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 952511,
            "upload_time": "2025-11-01T06:21:17",
            "upload_time_iso_8601": "2025-11-01T06:21:17.082663Z",
            "url": "https://files.pythonhosted.org/packages/a0/09/f1cd0437c828bb2e8f2eb42aef4509f7aa074de9080f39fdc76e11284166/pmultiqc-0.0.38-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0340c43f77753139708a543157ed9206648b7213e068d2dd373a56a4bfb8a268",
                "md5": "e620508c6dc4a030ec8e93f36df1147f",
                "sha256": "a7e71b3a4fd5242426568c92d669fef664a8b9abcaeac4e28fbde7eeaf7cd897"
            },
            "downloads": -1,
            "filename": "pmultiqc-0.0.38.tar.gz",
            "has_sig": false,
            "md5_digest": "e620508c6dc4a030ec8e93f36df1147f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 924494,
            "upload_time": "2025-11-01T06:21:18",
            "upload_time_iso_8601": "2025-11-01T06:21:18.380501Z",
            "url": "https://files.pythonhosted.org/packages/03/40/c43f77753139708a543157ed9206648b7213e068d2dd373a56a4bfb8a268/pmultiqc-0.0.38.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-01 06:21:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bigbio",
    "github_project": "pmultiqc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "multiqc",
            "specs": [
                [
                    "<=",
                    "1.32"
                ],
                [
                    ">=",
                    "1.29"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5"
                ]
            ]
        },
        {
            "name": "pyteomics",
            "specs": []
        },
        {
            "name": "pyopenms",
            "specs": []
        },
        {
            "name": "sdrf-pipelines",
            "specs": [
                [
                    ">=",
                    "0.0.32"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.23"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.2"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": []
        }
    ],
    "lcname": "pmultiqc"
}
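
The `digests` entries in the raw data can be used to check a downloaded file. The following sketch computes a file's SHA-256 with the standard library and compares it with an expected hex digest; the helper names are hypothetical, and the expected value would be the `sha256` string listed for the file you downloaded.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_sha256):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Reading in chunks keeps memory use constant regardless of file size.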
