=====
MQuad
=====
MQuad: Mixture Model for Mitochondrial Mutation detection in single-cell omics data
MQuad is a tool that detects mitochondrial mutations that are informative for clonal substructure inference. It uses a binomial mixture model to assess the heteroplasmy of mtDNA variants among background noise.
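To make the model-selection idea concrete, the sketch below compares a single binomial against a two-component binomial mixture for one variant and reports the difference in BIC. It is an illustration only, not MQuad's implementation: the function names, the EM initialisation, the parameter counts (1 versus 3) and the sign convention ``BIC(one component) - BIC(two components)`` are all assumptions made for this example.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k alternative reads out of n under Binomial(n, p)."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp so that log() stays finite
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower means a better trade-off)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def delta_bic(ad, dp, n_iter=100):
    """BIC(one binomial) - BIC(two-component mixture) for a single variant.

    ad, dp: per-cell alternative-allele and total depths (every dp > 0).
    """
    cells = list(zip(ad, dp))
    n_cells = len(cells)

    # Model 1: one allele frequency shared by all cells (1 parameter).
    p_all = sum(ad) / sum(dp)
    ll1 = sum(log_binom_pmf(k, n, p_all) for k, n in cells)

    # Model 2: two allele frequencies plus a mixing weight (3 parameters),
    # fitted by EM and initialised at the extreme observed frequencies.
    freqs = [k / n for k, n in cells]
    p_lo, p_hi, weight = min(freqs), max(freqs), 0.5
    for _ in range(n_iter):
        # E-step: probability that each cell belongs to the high component.
        resp = []
        for k, n in cells:
            lo = math.log(1.0 - weight) + log_binom_pmf(k, n, p_lo)
            hi = math.log(weight) + log_binom_pmf(k, n, p_hi)
            resp.append(math.exp(hi - log_add(lo, hi)))
        # M-step: responsibility-weighted allele frequencies and weight.
        hi_depth = sum(r * n for r, (k, n) in zip(resp, cells))
        lo_depth = sum((1.0 - r) * n for r, (k, n) in zip(resp, cells))
        if hi_depth > 0:
            p_hi = sum(r * k for r, (k, n) in zip(resp, cells)) / hi_depth
        if lo_depth > 0:
            p_lo = sum((1.0 - r) * k for r, (k, n) in zip(resp, cells)) / lo_depth
        weight = min(max(sum(resp) / n_cells, 1e-9), 1.0 - 1e-9)

    ll2 = sum(
        log_add(
            math.log(1.0 - weight) + log_binom_pmf(k, n, p_lo),
            math.log(weight) + log_binom_pmf(k, n, p_hi),
        )
        for k, n in cells
    )
    return bic(ll1, 1, n_cells) - bic(ll2, 3, n_cells)


# Toy data: a variant present at ~50% in half of the cells, and a variant
# with the same low-level signal in every cell.
clonal = delta_bic([0] * 20 + [25] * 20, [50] * 40)
noise = delta_bic([1] * 40, [50] * 40)
```

Under these assumptions the variant that splits the cells into two groups gets a large positive score, while the uniformly noisy variant is penalised for the two extra parameters and scores below zero.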
A recommended pipeline to generate the necessary files:
1. Use `cellSNP <https://github.com/single-cell-genetics/cellSNP>`_ or `cellsnp-lite <https://github.com/single-cell-genetics/cellsnp-lite>`_ (a faster version of cellSNP that is still in testing and might be unstable) to pile up mtDNA variants from raw .bam file(s).
2. Use MQuad to differentiate informative mtDNA variants from the noisy background.
3. Use `vireoSNP <https://github.com/single-cell-genetics/vireo>`_ to assign cells to clones based on their mtDNA variant profiles.
Different upstream/downstream packages can also be used if the necessary file formats are available.
If you would rather not dig into cellSNP's usage, ``preprocessing_cmds.sh`` in the examples folder shows the shell commands for cellSNP (with more updates coming).
OS requirements
===============
This package has been tested on the following systems with Python 3.8.8:
* Windows: Windows 10
* Linux: CentOS Linux 7 (Core)
Installation
============
MQuad is available through `PyPI <https://pypi.org/project/mquad/>`_. To install it, run the following command (``-U`` upgrades an existing installation):

.. code-block:: bash

    pip install -U mquad

Alternatively, to install the latest (often development) version from this GitHub repository, run:

.. code-block:: bash

    pip install -U git+https://github.com/single-cell-genetics/MQuad

Installation time: < 1 min
Manual
======
Once installed, you can first check the version and input parameters with ``mquad -h``
MQuad recognizes 3 types of input:

1. A cellSNP output folder containing the AD and DP sparse matrices (.mtx)

.. code-block:: bash

    mquad -c $INPUT_DIR -o $OUT_DIR -p 20

2. A .vcf file only

.. code-block:: bash

    mquad --vcfData $VCF -o $OUT_DIR -p 20

3. AD and DP sparse matrices (.mtx), comma separated with no space after the comma

.. code-block:: bash

    mquad -m cellSNP.tag.AD.mtx,cellSNP.tag.DP.mtx -o $OUT_DIR -p 20

For droplet-based sequencing data, e.g. 10x Chromium CNV, scATAC, etc., it is recommended to add ``--minDP 5`` or a smaller value to prevent errors during fitting. The default value is 10, which is suitable for Smart-seq2 data but might be too stringent for data with low sequencing depth.
The output files will be explained below in the 'Example' section.
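If MQuad is driven from a Python script, the three input modes and the ``--minDP`` option described above can be assembled into an argument list. The helper below is a hypothetical convenience wrapper written for this example (``build_mquad_command`` is not part of MQuad); it only uses the command-line flags shown in this manual.

```python
import shlex


def build_mquad_command(out_dir, cellsnp_dir=None, vcf=None, mtx_pair=None,
                        n_proc=20, min_dp=None):
    """Return the argv list for one mquad run (hypothetical helper).

    Exactly one of cellsnp_dir, vcf or mtx_pair (an (AD, DP) tuple of
    .mtx paths) must be given.
    """
    sources = [s for s in (cellsnp_dir, vcf, mtx_pair) if s is not None]
    if len(sources) != 1:
        raise ValueError("give exactly one of cellsnp_dir, vcf or mtx_pair")

    cmd = ["mquad"]
    if cellsnp_dir is not None:
        cmd += ["-c", cellsnp_dir]
    elif vcf is not None:
        cmd += ["--vcfData", vcf]
    else:
        ad_mtx, dp_mtx = mtx_pair
        # The two paths form a single argument, joined by a bare comma.
        cmd += ["-m", f"{ad_mtx},{dp_mtx}"]

    cmd += ["-o", out_dir, "-p", str(n_proc)]
    if min_dp is not None:
        # A lower depth threshold, e.g. for droplet-based data.
        cmd += ["--minDP", str(min_dp)]
    return cmd


cmd = build_mquad_command(
    "mquad_out",
    mtx_pair=("cellSNP.tag.AD.mtx", "cellSNP.tag.DP.mtx"),
    min_dp=5,
)
printable = shlex.join(cmd)  # shell-quoted form, handy for logging
```

Passing ``cmd`` to ``subprocess.run(cmd, check=True)`` would then execute the run; building a list rather than a shell string avoids quoting problems with paths that contain spaces.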
Example
=======
MQuad comes with an example dataset for you to test things out. The mtDNA mutations of this dataset are extracted from `Ludwig et al, Cell, 2019 <https://doi.org/10.1016/j.cell.2019.01.022>`_. It contains 500 background variants, along with 9 variants used in Supp Fig. 2F (and main Fig. 2F). There is also 1 additional variant that is informative but not mentioned in the paper. In total, there are 510 variants in the example dataset.
Run the following command line:

.. code-block:: bash

    mquad --vcfData example/example.vcf.gz -o example_test -p 5

or using batch mode tailored for mixture-binomial modelling:

.. code-block:: bash

    mquad --vcfData example/example.vcf.gz -o example_test -p 5 --batchFit 1 --batchSize 5

The output files should include:
* passed_ad.mtx, passed_dp.mtx: Sparse matrix files of the AD/DP of qualified variants for downstream clonal analysis
* top variants heatmap.pdf: Heatmap of the allele frequency of qualified variants

.. image:: images/top_variants_heatmap.png
    :width: 600px
    :align: center
    :height: 400px

* deltaBIC_cdf.pdf: A CDF plot of the deltaBIC distribution of all variants, including the cutoff determined by MQuad

.. image:: images/deltaBIC_cdf.png
    :width: 600px
    :align: center
    :height: 400px

* BIC_params.csv: A spreadsheet containing detailed parameters/statistics of all variants, sorted from highest deltaBIC to lowest
* debug_unsorted_BIC_params.csv: The same spreadsheet as BIC_params.csv but unsorted, intended for developers' debugging; it will probably be removed in later versions of MQuad
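As a rough illustration of how the passed AD/DP matrices can be consumed downstream, the sketch below parses two Matrix Market coordinate files with the standard library and computes the per-entry allele frequency AD/DP. The small inline matrices are made-up toy data, the helper names are hypothetical, and the layout (rows as variants, columns as cells, 1-based indices) is an assumption; in practice a library reader such as ``scipy.io.mmread`` is more convenient.

```python
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into (shape, {(row, col): value})."""
    shape = None
    entries = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        fields = line.split()
        if shape is None:
            # First data line: number of rows, columns and stored entries.
            shape = (int(fields[0]), int(fields[1]))
            continue
        entries[(int(fields[0]), int(fields[1]))] = float(fields[2])
    return shape, entries


def allele_frequency(ad, dp):
    """AD/DP for every (variant, cell) entry that has non-zero depth."""
    return {key: ad.get(key, 0.0) / depth for key, depth in dp.items() if depth > 0}


# Toy stand-ins for passed_ad.mtx and passed_dp.mtx (2 variants x 3 cells).
AD_TEXT = """%%MatrixMarket matrix coordinate integer general
%
2 3 2
1 1 5
2 3 1
"""
DP_TEXT = """%%MatrixMarket matrix coordinate integer general
%
2 3 3
1 1 10
1 2 8
2 3 4
"""

ad_shape, ad = read_mtx(io.StringIO(AD_TEXT))
dp_shape, dp = read_mtx(io.StringIO(DP_TEXT))
af = allele_frequency(ad, dp)
```

With real output, ``io.StringIO(...)`` would be replaced by ``open("example_test/passed_ad.mtx")`` and its DP counterpart. Entries missing from the AD matrix are treated as zero alternative reads.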
Typical run time: ~10 seconds
Column description for BIC_params.csv:
* num_cells: number of cells passing the sequencing depth threshold (default 10)
* deltaBIC: score of informativeness, higher is better
* params1, params2, model1BIC, model2BIC: fitted parameters for the binomial model, for debugging purposes
* num_cells_nonzero_AD, total_DP, median_DP, total_AD, median_AD: self-explanatory
* new_mutations, as_mutation: classification criteria that do not affect the filtering, again for debugging purposes
* fraction_b_allele: the fraction of the minor allele in the minor component (NOT equal to allele frequency)
* num_cells_minor_cpt: number of cells in the minor component, used to filter out variants that occur in only 1 or 2 cells
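The columns above can also be used to re-filter variants by hand. The sketch below reads a table in the shape of BIC_params.csv with the standard ``csv`` module and keeps rows with a high deltaBIC and enough cells in the minor component. The thresholds are arbitrary illustrative values (MQuad determines its own deltaBIC cutoff), and the ``variant`` column and the rows in the toy table are made up for this example.

```python
import csv
import io


def select_variants(handle, min_delta_bic, min_minor_cells=3):
    """Return the rows whose deltaBIC and minor-component size pass the cutoffs."""
    kept = []
    for row in csv.DictReader(handle):
        delta_bic = float(row["deltaBIC"])
        minor_cells = float(row["num_cells_minor_cpt"])
        if delta_bic >= min_delta_bic and minor_cells >= min_minor_cells:
            kept.append(row)
    return kept


# Toy table; the "variant" column name is hypothetical.
TOY_CSV = """variant,deltaBIC,num_cells_minor_cpt
v1,850.2,12
v2,430.0,2
v3,-7.4,0
"""

selected = select_variants(io.StringIO(TOY_CSV), min_delta_bic=100.0)
selected_names = [row["variant"] for row in selected]
```

Here ``v2`` is dropped despite its high deltaBIC because its minor component contains only 2 cells, which mirrors the purpose of num_cells_minor_cpt described above.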