# Pyquet
Memory-efficient mzML to Parquet converter for mass spectrometry files.
## Overview
Pyquet provides streaming conversion of mzML files to Parquet with minimal memory usage, so it can process large mass spectrometry datasets without running out of memory. It started as a side project inspired by Google Summer of Code (GSoC) 2025 with OpenMS. The goal is a simple CLI for converting .mzML files to .parquet, a format that is especially useful in big data projects (e.g., machine learning).
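To illustrate what "streaming" means for mzML, here is a minimal standard-library sketch. It is not Pyquet's implementation. It assumes a plain mzML layout: each spectrum's arrays are described by `cvParam` children directly inside each `binaryDataArray`, and referenceable param groups are ignored. The helper names (`iter_spectra`, `batched`, `_decode_array`) are hypothetical. Spectra are parsed one at a time with `xml.etree.ElementTree.iterparse` and cleared after use instead of loading the whole XML tree.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# PSI-MS controlled-vocabulary accessions that mzML uses to describe arrays.
MZ_ARRAY = "MS:1000514"
INTENSITY_ARRAY = "MS:1000515"
FLOAT32 = "MS:1000521"  # if absent, assume 64-bit float (MS:1000523)
ZLIB = "MS:1000574"
SCAN_START_TIME = "MS:1000016"


def _local(tag: str) -> str:
    """Drop the XML namespace: '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def _decode_array(array_elem: ET.Element) -> Tuple[Optional[str], List[float]]:
    """Decode one <binaryDataArray> into ('mz' | 'intensity' | None, values)."""
    accessions = {c.get("accession") for c in array_elem if _local(c.tag) == "cvParam"}
    if MZ_ARRAY in accessions:
        kind = "mz"
    elif INTENSITY_ARRAY in accessions:
        kind = "intensity"
    else:
        return None, []  # skip other array types in this sketch
    binary = next((c for c in array_elem if _local(c.tag) == "binary"), None)
    payload = base64.b64decode(binary.text or "") if binary is not None else b""
    if ZLIB in accessions and payload:
        payload = zlib.decompress(payload)
    fmt = "f" if FLOAT32 in accessions else "d"
    count = len(payload) // struct.calcsize(fmt)
    # mzML binary arrays are little-endian
    return kind, list(struct.unpack(f"<{count}{fmt}", payload))


def iter_spectra(path: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per spectrum instead of building the whole XML tree."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        row: Dict[str, object] = {"id": elem.get("id"), "time": None, "mz": [], "intensity": []}
        for node in elem.iter():
            name = _local(node.tag)
            if name == "cvParam" and node.get("accession") == SCAN_START_TIME:
                row["time"] = float(node.get("value"))  # unit as stored in the file
            elif name == "binaryDataArray":
                kind, values = _decode_array(node)
                if kind is not None:
                    row[kind] = values
        yield row
        elem.clear()  # release this spectrum's children before parsing the next one


def batched(rows: Iterable[Dict[str, object]], batch_size: int) -> Iterator[List[Dict[str, object]]]:
    """Group rows into lists of at most batch_size, holding one batch at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `batched(iter_spectra("input.mzML"), 5000)` yields lists of up to 5,000 spectrum dicts. This is loosely analogous to the CLI's `--batch-size` option, although what exactly Pyquet counts per batch is not documented here. Note that cleared elements stay attached to their parent, so memory use is bounded per spectrum rather than strictly constant.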
## Installation
### From PyPI
```bash
pip install pyquetms
```
### From source
```bash
git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install .
```
### Development installation
```bash
git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install -e ".[dev]"
```
## Usage
### CLI
Basic conversion:
```bash
pyquet input.mzML
```
The input can also be a path outside the working directory:
```bash
pyquet ~/Downloads/input.mzML
```
Specify the output file (by default, the output is written to the current working directory):
```bash
pyquet input.mzML -o output.parquet
```
Customize batch size and compression. I recommend the following settings:
```bash
pyquet input.mzML --batch-size 5000 --compression gzip
```
Get file information without converting:
```bash
pyquet input.mzML --info
```
## Output Format
The columns in the converted Parquet file depend on the type of mzML file, so they can differ slightly from file to file.
Some columns may be blank, which is perfectly okay! It doesn't mean your mzML file is wrong.
The main expected values are time, m/z, and intensity.
## Contributions
It's quite a small project, so feel free to open a PR or an issue!
Raw data
{
"_id": null,
"home_page": null,
"name": "PyquetMS",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "mass spectrometry, mzML, parquet, proteomics, metabolomics",
"author": null,
"author_email": "Avni Badiwale <avnibadiwale@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a6/9b/4af44e77cb48bf0e0f7646eeaf0f2f9d2067d297e9c191180875140fec70/pyquetms-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Memory-efficient mzML to Parquet converter for mass spectrometry files",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/Avni2000/pyquet"
},
"split_keywords": [
"mass spectrometry",
" mzml",
" parquet",
" proteomics",
" metabolomics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e003806c4b5c34dcce6d18888af564dc55163d98b474134e8a1235172fdf742a",
"md5": "367474e61ff2a191d3f75c09348cf0c7",
"sha256": "8ee88a379cfab2861a9577c040b8929c335fe9737827b8c37d51b9820ae157f5"
},
"downloads": -1,
"filename": "pyquetms-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "367474e61ff2a191d3f75c09348cf0c7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 9874,
"upload_time": "2025-08-31T23:51:15",
"upload_time_iso_8601": "2025-08-31T23:51:15.083179Z",
"url": "https://files.pythonhosted.org/packages/e0/03/806c4b5c34dcce6d18888af564dc55163d98b474134e8a1235172fdf742a/pyquetms-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a69b4af44e77cb48bf0e0f7646eeaf0f2f9d2067d297e9c191180875140fec70",
"md5": "55552a580d4524b56cff8a5d9e43253c",
"sha256": "70785b57d68113a679385d5cb9995e89db488eeb6bb8b3762735ae2446e46342"
},
"downloads": -1,
"filename": "pyquetms-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "55552a580d4524b56cff8a5d9e43253c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 10484,
"upload_time": "2025-08-31T23:51:16",
"upload_time_iso_8601": "2025-08-31T23:51:16.134967Z",
"url": "https://files.pythonhosted.org/packages/a6/9b/4af44e77cb48bf0e0f7646eeaf0f2f9d2067d297e9c191180875140fec70/pyquetms-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-31 23:51:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Avni2000",
"github_project": "pyquet",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pyquetms"
}