# pyquetms
Memory-efficient mzML to Parquet converter for mass spectrometry files.
## Overview
pyquetms converts mzML files to Parquet in a streaming fashion, keeping memory usage low so that large mass spectrometry datasets can be processed without running out of memory. It started as a side project inspired by GSoC '25 with OpenMS. The goal is a simple CLI for converting .mzML files to .parquet, which is especially useful in big data projects (e.g., machine learning).
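To illustrate what "streaming" means here, the following is a minimal sketch using only the Python standard library. It is not pyquetms's actual implementation: the function name `iter_spectrum_ids` and the tiny sample document are made up for illustration. The idea is to handle one `<spectrum>` element at a time and release it before moving on, instead of loading the whole file.

```python
import io
import xml.etree.ElementTree as ET


def iter_spectrum_ids(source):
    """Yield the id of each <spectrum> element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace prefix of the form "{uri}spectrum".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "spectrum":
            yield elem.get("id")
            # Drop the element's children and text so fully parsed
            # spectra do not pile up in memory.
            elem.clear()


# Tiny stand-in document; a real mzML file carries far more metadata.
sample = io.BytesIO(
    b"<mzML><run><spectrumList>"
    b'<spectrum id="scan=1"/><spectrum id="scan=2"/>'
    b"</spectrumList></run></mzML>"
)
ids = list(iter_spectrum_ids(sample))
print(ids)  # ['scan=1', 'scan=2']
```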
## Installation
### From PyPI
```bash
pip install pyquetms
```
### From source
```bash
git clone https://github.com/Avni2000/pyquetms.git
cd pyquetms
pip install .
```
### Development installation
```bash
git clone https://github.com/Avni2000/pyquetms.git
cd pyquetms
pip install -e ".[dev]"
```
## Usage
### CLI
Basic conversion:
```bash
pyquetms input.mzML
```
or
```bash
pyquetms ~/Downloads/input.mzML
```
Specify the output file (by default, output goes to the working directory):
```bash
pyquetms input.mzML -o output.parquet
```
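If you have a whole directory of files, one option is to drive the CLI from Python. The sketch below only uses the `-o` flag shown above; the helper names `build_command` and `convert_all`, the directory names, and the choice to name each output after its input file are assumptions for illustration, not part of pyquetms.

```python
from pathlib import Path
import subprocess


def build_command(mzml_path, out_dir):
    """Return the pyquetms argument list for one input file."""
    mzml_path = Path(mzml_path)
    # Assumption: name the output after the input, e.g. input.parquet.
    out_path = Path(out_dir) / (mzml_path.stem + ".parquet")
    return ["pyquetms", str(mzml_path), "-o", str(out_path)]


def convert_all(in_dir, out_dir):
    """Convert every .mzML file in in_dir, one subprocess per file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for mzml_path in sorted(Path(in_dir).glob("*.mzML")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(mzml_path, out_dir), check=True)


cmd = build_command("data/input.mzML", "converted")
print(cmd)
```

Calling `convert_all("data", "converted")` would then run the converter once per file and stop at the first failure.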
Customize batch size and compression. I recommend:
```bash
pyquetms input.mzML --batch-size 5000 --compression gzip
```
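For intuition on why batch size matters, here is a small standard-library sketch of batching. It assumes that `--batch-size` is the number of spectra buffered before each write, which this README does not spell out, and the `batched` helper is hypothetical rather than pyquetms code.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 12 fake spectra in batches of 5: only one batch is held at a time.
sizes = [len(batch) for batch in batched(range(12), 5)]
print(sizes)  # [5, 5, 2]
```

Under that assumption, a larger batch means fewer writes but more data held in memory at once.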
Get file information without converting:
```bash
pyquetms input.mzML --info
```
## Output Format
The columns in the converted Parquet file depend on the type of mzML file, so they can differ slightly from one file to another.
Some columns may be blank, which is perfectly okay! It doesn't mean your mzML is wrong.
The main expected values are time, m/z, and intensity.
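To see which columns came out blank for a given file, a small helper like the one below can work. It is a sketch under two assumptions: blank values are read back as `None`, and the column names shown are invented (the real names depend on your mzML file). The input is a plain mapping from column name to values, which you could get from `pyarrow.parquet.read_table("output.parquet").to_pydict()` if pyarrow is installed.

```python
def blank_columns(columns):
    """Return the names of columns whose values are all None."""
    return [
        name
        for name, values in columns.items()
        if all(value is None for value in values)
    ]


# Hypothetical column names; the real ones depend on the mzML file.
columns = {
    "time": [0.1, 0.2],
    "mz": [100.5, 200.7],
    "intensity": [1e3, 2e3],
    "precursor_mz": [None, None],
}
print(blank_columns(columns))  # ['precursor_mz']
```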
## Contributions
It's quite a small project, so feel free to open a PR or an issue!