bmxp

Name	bmxp
Version	0.3.11
home_page	None
Summary	LCMS Processing tools used by the Metabolomics Platform at the Broad Institute.
upload_time	2025-04-25 21:26:25
maintainer	None
docs_url	None
author	Daniel S. Hitchcock, Jesse Krejci, Jayanth Mani, Chloe Sturgeon
requires_python	None
license	MIT
keywords	lcms alignment processing metabolomics clustering batch correction drift correction qc filtering
requirements	networkx pandas scipy statsmodels matplotlib scikit-learn xlsxwriter openpyxl tqdm
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# BMXP - The Metabolomics Platform at the Broad Institute
`pip install bmxp`

Please cite:
https://www.biorxiv.org/content/10.1101/2023.06.09.544417v1.full

This is a collection of tools for processing our data, and it powers our cloud processing workflow. Each tool is a standalone module that performs one step of our processing pipeline. The tools are written in Python and C and are designed to be performant and cloud-compatible.

* [Eclipse](https://github.com/broadinstitute/bmxp/blob/main/bmxp/eclipse/readme.md) - Align two or more same-method nontargeted LCMS datasets.
* [Gravity](https://github.com/broadinstitute/bmxp/blob/main/bmxp/gravity/readme.md) - Cluster redundant LCMS features based on RT and correlation (and, eventually, XIC shape)
* [Blueshift](https://github.com/broadinstitute/bmxp/blob/main/bmxp/blueshift/readme.md) - Drift Correction via pooled technical replicates and internal standards
* [Formation](https://github.com/broadinstitute/bmxp/blob/main/bmxp/formation/readme.md) - Formatting and Final QC
* [Chroma](https://github.com/broadinstitute/bmxp/blob/main/bmxp/chroma/readme.md) - Read .raw and .mzml files

We expect users to be familiar with Python, to understand LCMS metabolomics data processing, and to know which specific steps they want to perform.

While the tools are and always will be standalone, we are integrating them more closely through a shared schema, and we may eventually offer a pipeline mode that runs all steps from a single set of parameters.

We are open to feedback and suggestions, especially regarding performance and use in pipelines.

# Shared Schema
All BMXP modules use a shared schema and file formats with our preferred column headers. These files (along with their schema labels) are:
* Feature Metadata `bmxp.FMDATA` - Describes the feature. Index default is `Compound_ID`
* Injection Metadata `bmxp.IMDATA` - Describes the Injection. Index default is `injection_id`
* Sample Metadata `bmxp.SMDATA` - Describes the biospecimen from which the Injection is derived. Index default is `broad_id` 
* Feature Abundances - Pivot table of Feature x Injection (`Compound_ID` x `injection_id`) containing the abundances.

Some modules (Blueshift, Eclipse) require merging Feature Metadata + Feature Abundances.
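As a rough illustration of how Feature Abundances relate to Feature Metadata, here is a minimal pandas sketch that pivots long-format measurements into the `Compound_ID` x `injection_id` table and joins it onto the feature metadata. It assumes the default schema labels; the feature IDs, injection names, method name, and values are made up, and the exact merged layout each module expects is described in that module's readme.

```python
import pandas as pd

# Illustrative long-format abundances: one row per (feature, injection) measurement
long_abund = pd.DataFrame({
    "Compound_ID": ["QI_1", "QI_1", "QI_2", "QI_2"],
    "injection_id": ["inj01", "inj02", "inj01", "inj02"],
    "abundance": [1.0e5, 1.2e5, 3.4e4, 3.1e4],
})

# Feature Abundances: pivot to a Compound_ID x injection_id table
abund = long_abund.pivot(index="Compound_ID", columns="injection_id", values="abundance")
abund.columns.name = None

# Feature Metadata, indexed by Compound_ID
fmdata = pd.DataFrame(
    {"RT": [1.52, 3.87], "MZ": [180.0634, 255.2330], "Method": ["HILIC-pos", "HILIC-pos"]},
    index=pd.Index(["QI_1", "QI_2"], name="Compound_ID"),
)

# Merged table: metadata columns first, then one column per injection
merged = fmdata.join(abund, how="inner")
print(merged)
```

This sketch keeps `Compound_ID` as the index; calling `merged.reset_index()` turns it back into an ordinary column if a module expects it there.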
 
The default column names in this schema can be changed globally so that all modules use the same terminology.
To update the schema, modify these dictionaries on the `bmxp` module before running any other code. For example:
```python
import bmxp
from bmxp.eclipse import MSAligner
from bmxp.blueshift import DriftCorrection
from bmxp.gravity import cluster

# Rename the default index columns before running any module
bmxp.FMDATA['Compound_ID'] = 'Feature_ID'
bmxp.IMDATA['injection_id'] = 'Filename'

# continue with work...
```
With the changes above, Eclipse, Blueshift, and Gravity will use "Feature_ID" and "Filename" as column headers instead of "Compound_ID" and "injection_id".
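The reason a single assignment is enough is that the schema is one shared, mutable dictionary. The self-contained sketch below mimics that idea with local stand-ins; `FMDATA`, `IMDATA`, and `feature_ids` here are hypothetical illustrations, not bmxp's actual internals. A function that looks up the label in the dictionary when it is called picks up the renamed column automatically.

```python
import pandas as pd

# Hypothetical stand-ins for bmxp.FMDATA / bmxp.IMDATA:
# each maps a default schema label to the column header actually used in your files
FMDATA = {"Compound_ID": "Compound_ID"}
IMDATA = {"injection_id": "injection_id"}


def feature_ids(df: pd.DataFrame) -> list:
    """Hypothetical module function that reads the column label at call time."""
    return df[FMDATA["Compound_ID"]].tolist()


# Rename globally before doing any work
FMDATA["Compound_ID"] = "Feature_ID"
IMDATA["injection_id"] = "Filename"

features = pd.DataFrame({"Feature_ID": ["f1", "f2"], "RT": [1.1, 2.2]})
print(feature_ids(features))  # ['f1', 'f2']
```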

## Feature Metadata - bmxp.FMDATA
Feature Metadata describes each LCMS feature. It combines core nontargeted feature information, annotation information, and any other feature-level data.

### Feature Specific
* `Compound_ID` - Index, Project-unique feature ID (a bit of a misnomer)
* `RT` - Unitless retention time, may or may not be scaled
* `MZ` - Unsigned mass-to-charge ratio
* `Intensity` - Average feature intensity
* `Method` - Human-readable name of the LCMS method used
* `__extraction_method` - Name of the extraction method/software used. Used to denote mixed targeted/nontargeted data
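A quick sanity check on the feature-specific columns can catch problems before a module runs. The sketch below is a hypothetical helper, not part of bmxp: it assumes the default `Compound_ID` index and treats `RT`, `MZ`, `Intensity`, and `Method` as required, which is our illustrative choice (each module's readme lists what it actually needs). It checks that IDs are project-unique and that `MZ` is unsigned.

```python
import pandas as pd

# Illustrative choice of required columns; check each module's readme for its real requirements
REQUIRED = ["RT", "MZ", "Intensity", "Method"]


def check_feature_metadata(fm: pd.DataFrame) -> list:
    """Hypothetical validator: returns a list of human-readable problems (empty if none)."""
    problems = []
    if fm.index.name != "Compound_ID":
        problems.append("index should be Compound_ID")
    if not fm.index.is_unique:
        problems.append("Compound_ID values must be unique within the project")
    missing = [c for c in REQUIRED if c not in fm.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if "MZ" in fm.columns and (fm["MZ"] < 0).any():
        problems.append("MZ must be unsigned (non-negative)")
    return problems


good = pd.DataFrame(
    {"RT": [1.52, 3.87], "MZ": [180.0634, 255.2330],
     "Intensity": [1.1e5, 3.2e4], "Method": ["HILIC-pos"] * 2},
    index=pd.Index(["QI_1", "QI_2"], name="Compound_ID"),
)

# Duplicate ID and a negative MZ
bad = good.copy()
bad.index = pd.Index(["QI_1", "QI_1"], name="Compound_ID")
bad["MZ"] = [180.0634, -255.2330]

print(check_feature_metadata(good))  # []
print(check_feature_metadata(bad))
```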

### Annotation
* `Annotation_ID` - Method-unique annotation label
* `Adduct` - Adduct form of the annotation
* `__annotation_id` - Globally unique annotation identifier
* `Metabolite` - Preferred display/reporting name of metabolite
* `Non_Quant` - Boolean denoting that a feature is not quantifiable
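The annotation columns make it straightforward to pull out a reportable subset. This minimal pandas sketch keeps features that have a `Metabolite` name and are not flagged `Non_Quant`. It assumes that a missing `Non_Quant` value means the feature is quantifiable (our assumption, not a documented rule), and the metabolite names and adducts are illustrative.

```python
import pandas as pd

fm = pd.DataFrame(
    {
        "Metabolite": ["glucose", None, "alanine"],
        "Adduct": ["[M+Na]+", None, "[M+H]+"],
        "Non_Quant": [False, False, True],
    },
    index=pd.Index(["QI_1", "QI_2", "QI_3"], name="Compound_ID"),
)

# Annotated features that are not flagged as non-quantifiable.
# .eq(True) treats missing Non_Quant values as quantifiable (an assumption).
is_annotated = fm["Metabolite"].notna()
is_quant = ~fm["Non_Quant"].eq(True)
reportable = fm[is_annotated & is_quant]
print(reportable["Metabolite"])
```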

### Generated by Gravity
* `Cluster_Num` - Cluster number assigned during Gravity clustering
* `Cluster_Size` - Number of members in the cluster
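The two Gravity columns are related: `Cluster_Size` is the number of features sharing a `Cluster_Num`. The pandas sketch below recomputes it from `Cluster_Num` and then shows one hypothetical way to collapse redundant features downstream, keeping the most intense member of each cluster. That selection rule is an illustrative choice, not necessarily what Gravity or our pipeline does, and the values are made up.

```python
import pandas as pd

fm = pd.DataFrame(
    {"Cluster_Num": [1, 1, 1, 2, 3, 3],
     "Intensity": [5e5, 2e4, 8e3, 1e5, 3e4, 9e4]},
    index=pd.Index([f"QI_{i}" for i in range(1, 7)], name="Compound_ID"),
)

# Cluster_Size: number of features that share each Cluster_Num
fm["Cluster_Size"] = fm.groupby("Cluster_Num")["Cluster_Num"].transform("size")

# Hypothetical downstream step: keep the most intense feature per cluster
representatives = fm.loc[fm.groupby("Cluster_Num")["Intensity"].idxmax()]
print(representatives)
```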

### Generated by Blueshift
* `Batches Skipped` - Batches that were skipped because they lacked pooled reference injections (PREFs)

## Injection Metadata - bmxp.IMDATA
* `injection_id` - Index, Injection name, usually filename without the extension
* `broad_id` - Assigned biospecimen label
* `program_id` - Biospecimen label as received (inherited from Sample Metadata)
* `injection_type` - Type of injection ("sample", "prefa", "prefb", "blank", "other-", "not_used-")
* `comments` - Comments about the injection
* `column_number` - Column number, in multi-column studies
* `injection_order` - Injection sequence number; blanks and other non-sample injections are counted too
* `batches` - Marks batch boundaries ('batch start' or 'batch end')
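Because `batches` only marks boundaries, a per-injection batch number has to be derived. This pandas sketch sorts by `injection_order` and starts a new batch at each 'batch start' marker, then selects the sample injections. The `batch_num` column is hypothetical (it is not part of the schema), and the injection list is made up.

```python
import pandas as pd

im = pd.DataFrame(
    {
        "injection_type": ["prefa", "sample", "sample", "prefb", "prefa", "sample", "blank"],
        "injection_order": [1, 2, 3, 4, 5, 6, 7],
        "batches": ["batch start", None, None, "batch end", "batch start", None, "batch end"],
    },
    index=pd.Index([f"inj{i:02d}" for i in range(1, 8)], name="injection_id"),
)

im = im.sort_values("injection_order")

# Hypothetical batch_num: each 'batch start' marker opens a new batch
im["batch_num"] = im["batches"].eq("batch start").cumsum()

# Biological sample injections only
samples = im[im["injection_type"] == "sample"]
print(im[["injection_type", "batch_num"]])
```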

### Generated by Blueshift
* `QCRole` - Role in drift correction ("QC-drift_correction", "QC-pooled_ref", "QC-not_used", "sample")
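Injections with `QCRole` "QC-pooled_ref" are useful for checking results, since the same pooled material is measured repeatedly. A common metabolomics QC metric is the per-feature coefficient of variation (CV) across those pooled references. The pandas sketch below computes it from an abundance table; it is an illustrative check, not necessarily the metric Blueshift reports, and the values are made up.

```python
import pandas as pd

im = pd.DataFrame(
    {"QCRole": ["QC-drift_correction", "sample", "QC-pooled_ref", "sample", "QC-pooled_ref"]},
    index=pd.Index(["inj01", "inj02", "inj03", "inj04", "inj05"], name="injection_id"),
)

# Feature Abundances: Compound_ID x injection_id
abund = pd.DataFrame(
    [[100.0, 150.0, 90.0, 160.0, 110.0],
     [50.0, 40.0, 50.0, 45.0, 50.0]],
    index=pd.Index(["QI_1", "QI_2"], name="Compound_ID"),
    columns=im.index,
)

# Columns belonging to pooled reference injections
pooled = im.index[im["QCRole"] == "QC-pooled_ref"]
ref = abund[pooled]

# CV = sample standard deviation / mean, per feature
cv = ref.std(axis=1, ddof=1) / ref.mean(axis=1)
print(cv)
```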

## Sample Metadata - bmxp.SMDATA
* `broad_id` - Assigned biospecimen label
* Arbitrary Metadata Columns - Any column label not already used in Injection Metadata
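Sample Metadata attaches to injections through `broad_id`. This pandas sketch checks the rule that Sample Metadata columns must not reuse Injection Metadata labels, then left-joins sample fields onto every injection so that PREFs and blanks (which have no `broad_id`) are kept with empty sample fields. The `age` and `diagnosis` columns and all IDs are illustrative.

```python
import pandas as pd

im = pd.DataFrame(
    {"broad_id": ["BRD_A", "BRD_B", None],
     "injection_type": ["sample", "sample", "prefa"]},
    index=pd.Index(["inj01", "inj02", "inj03"], name="injection_id"),
)

sm = pd.DataFrame(
    {"age": [54, 61], "diagnosis": ["control", "case"]},
    index=pd.Index(["BRD_A", "BRD_B"], name="broad_id"),
)

# Sample Metadata columns may not reuse Injection Metadata labels
overlap = set(sm.columns) & set(im.columns)
assert not overlap, f"Sample Metadata reuses Injection Metadata labels: {overlap}"

# Left join keeps every injection; injections without a broad_id get NaN sample fields
annotated = im.join(sm, on="broad_id", how="left")
print(annotated)
```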

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "bmxp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "LCMS, Alignment, Processing, Metabolomics, Clustering, Batch Correction, Drift Correction, QC, Filtering",
    "author": "Daniel S. Hitchcock, Jesse Krejci, Jayanth Mani, Chloe Sturgeon",
    "author_email": "\"Daniel S. Hitchcock\" <daniel.s.hitchcock@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3c/3e/e28f8bc0aacddbf69304f67154f6706db8ebbda09781c91b5d7e656c0050/bmxp-0.3.11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "LCMS Processing tools used by the Metabolomics Platform at the Broad Institute.",
    "version": "0.3.11",
    "project_urls": {
        "Repository": "https://github.com/broadinstitute/bmxp"
    },
    "split_keywords": [
        "lcms",
        " alignment",
        " processing",
        " metabolomics",
        " clustering",
        " batch correction",
        " drift correction",
        " qc",
        " filtering"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "57bb2e96f0157251338d9736499120dbbbfb8065ecc882d2b48add633ab9d7de",
                "md5": "9d13ca78bfe4eb51bb23b3df9415d176",
                "sha256": "13796625dea6c6f59cc095b6007c34c6a8645c78347e1ad1e864f27b0ce616d3"
            },
            "downloads": -1,
            "filename": "bmxp-0.3.11-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9d13ca78bfe4eb51bb23b3df9415d176",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 214024,
            "upload_time": "2025-04-25T21:26:23",
            "upload_time_iso_8601": "2025-04-25T21:26:23.216036Z",
            "url": "https://files.pythonhosted.org/packages/57/bb/2e96f0157251338d9736499120dbbbfb8065ecc882d2b48add633ab9d7de/bmxp-0.3.11-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3c3ee28f8bc0aacddbf69304f67154f6706db8ebbda09781c91b5d7e656c0050",
                "md5": "65630407759eaa6f91c11183c66eb67a",
                "sha256": "599b6c3d18bef5c456b7016efb1e9b65dd88b3a614a360be28661dbbf4d65bdd"
            },
            "downloads": -1,
            "filename": "bmxp-0.3.11.tar.gz",
            "has_sig": false,
            "md5_digest": "65630407759eaa6f91c11183c66eb67a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 1511878,
            "upload_time": "2025-04-25T21:26:25",
            "upload_time_iso_8601": "2025-04-25T21:26:25.213179Z",
            "url": "https://files.pythonhosted.org/packages/3c/3e/e28f8bc0aacddbf69304f67154f6706db8ebbda09781c91b5d7e656c0050/bmxp-0.3.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-04-25 21:26:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "broadinstitute",
    "github_project": "bmxp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "networkx",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "xlsxwriter",
            "specs": []
        },
        {
            "name": "openpyxl",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        }
    ],
    "lcname": "bmxp"
}
