Name | luxglm |
Version | 1.0.0 |
home_page | https://github.com/tare/LuxGLM |
Summary | A probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs |
upload_time | 2024-10-21 23:43:27 |
maintainer | None |
docs_url | None |
author | Tarmo Äijö |
requires_python | <4.0,>=3.10 |
license | MIT |
# LuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs
## Overview
LuxGLM [1] is a method for quantifying oxi-mC species with arbitrary covariate structures from bisulphite-based sequencing data. Its probabilistic modeling framework combines the hierarchical generative model of Lux [2], previously proposed for oxi-mC-seq data, with a general linear model component that accounts for confounding effects.
## Features
- Model-based integration and analysis of BS-seq/oxBS-seq/TAB-seq/fCAB-seq/CAB-seq/redBS-seq/MAB-seq/etc. data from whole-genome, reduced-representation, or targeted experiments
- Accounts for confounding effects through a general linear model component
- Models non-ideal experimental parameters, e.g., bisulphite conversion and oxidation efficiencies and various chemical labeling and protection steps
- Model-based integration of biological replicates
- Detects differential methylation (DMRs) using Bayes factors
- Full Bayesian inference using NumPyro [5]
## Quick introduction
A typical LuxGLM pipeline has the following steps:
1. Alignment of BS-seq and oxBS-seq data (e.g., [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark/) [3] or [BSmooth](http://rafalab.jhsph.edu/bsmooth/) [4])
2. Extraction of converted and unconverted counts (e.g., [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark/) [3] or [BSmooth](http://rafalab.jhsph.edu/bsmooth/) [4])
3. Integrative methylation analysis
4. Analysis of the obtained methylation estimates, e.g., using Bayes factors
This documentation focuses on the third and fourth steps.
## Installation
### PyPI
```console
$ pip install luxglm
```
### GitHub
To install the version from the main branch, run:
```console
$ pip install git+https://github.com/tare/LuxGLM.git
```
## Usage
### Metadata
Count data and covariates are defined in the metadata file, a tab-separated table with one row per experiment, for example:
| name | basal/tgf-beta | vitc | ra | timepoint | count_file | control_count_file | control_definition_file |
| ---------- | -------------- | ---- | --- | --------- | ----------------------- | ------------------------------ | ----------------------- |
| TGFb_1_24h | 1 | 0 | 0 | 24 | wildtype/TGFb_1_24h.tsv | control/TGFb_1_24h_control.tsv | control_definitions.tsv |
| TGFb_1_38h | 1 | 0 | 0 | 38 | wildtype/TGFb_1_38h.tsv | control/TGFb_1_38h_control.tsv | control_definitions.tsv |
| TGFb_1_48h | 1 | 0 | 0 | 48 | wildtype/TGFb_1_48h.tsv | control/TGFb_1_48h_control.tsv | control_definitions.tsv |
The following columns are mandatory: `name`, `count_file`, `control_count_file`, and `control_definition_file`. Additionally, there must be at least one covariate column. In the example above, there are four covariates: `basal/tgf-beta`, `vitc`, `ra`, and `timepoint`.
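Before running LuxGLM, it can help to check a metadata file against these requirements. The following is a minimal sketch using only the Python standard library; the helper name `find_covariates` is hypothetical and not part of LuxGLM, and it assumes a tab-separated file in which every non-mandatory column is a covariate.

```python
import csv
import io

# Mandatory columns, named as in the example table above.
MANDATORY = {"name", "count_file", "control_count_file", "control_definition_file"}


def find_covariates(metadata_tsv: str) -> list[str]:
    """Validate a metadata TSV and return its covariate column names.

    Hypothetical helper: assumes every non-mandatory column is a covariate.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    header = reader.fieldnames or []
    missing = MANDATORY - set(header)
    if missing:
        raise ValueError(f"missing mandatory columns: {sorted(missing)}")
    covariates = [column for column in header if column not in MANDATORY]
    if not covariates:
        raise ValueError("at least one covariate column is required")
    return covariates


# Usage with the header and first row of the example metadata table.
example = (
    "name\tbasal/tgf-beta\tvitc\tra\ttimepoint\tcount_file\tcontrol_count_file\tcontrol_definition_file\n"
    "TGFb_1_24h\t1\t0\t0\t24\twildtype/TGFb_1_24h.tsv\tcontrol/TGFb_1_24h_control.tsv\tcontrol_definitions.tsv\n"
)
print(find_covariates(example))
```

To check a real file, pass its contents, e.g., `find_covariates(open("metadata.tsv").read())`.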
### Control cytosines
The control cytosine data are supplied in the control count files, one file per experiment. Each file contains location information (`chromosome` and `position`) and control type information (`control_type`) for the control cytosines, as well as the number of C read-outs and the total number of read-outs from the BS-seq and oxBS-seq experiments (`bs_c`, `bs_total`, `oxbs_c`, and `oxbs_total`).
| chromosome | position | control_type | bs_c | bs_total | oxbs_c | oxbs_total |
| ----------- | -------- | ------------ | ---- | -------- | ------ | ---------- |
| Lambda_ctrl | 22924 | C | 2 | 343 | 1 | 562 |
| Lambda_ctrl | 22928 | C | 2 | 341 | 1 | 561 |
| Lambda_ctrl | 47359 | 5mC | 3770 | 3857 | 4767 | 4877 |
| Lambda_ctrl | 47367 | 5mC | 3895 | 3962 | 4855 | 4979 |
| Lambda_ctrl | 23789 | 5hmC | 3792 | 3964 | 79 | 865 |
| Lambda_ctrl | 23794 | 5hmC | 3901 | 4115 | 62 | 934 |
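To see what the count columns encode, a naive per-cytosine estimate can be computed by assuming ideal chemistry: in BS-seq both 5mC and 5hmC read as C, whereas in oxBS-seq only 5mC reads as C, so the 5hmC level is roughly the difference of the two C fractions. This sketch is for intuition only and is not LuxGLM's model, which instead accounts for non-ideal conversion and oxidation efficiencies; `Counts` and `naive_levels` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One row of a count file (hypothetical container)."""
    bs_c: int
    bs_total: int
    oxbs_c: int
    oxbs_total: int


def naive_levels(row: Counts) -> dict[str, float]:
    """Naive C/5mC/5hmC fractions assuming perfect conversion and oxidation."""
    bs_frac = row.bs_c / row.bs_total        # 5mC + 5hmC read as C in BS-seq
    oxbs_frac = row.oxbs_c / row.oxbs_total  # only 5mC reads as C in oxBS-seq
    return {
        "C": 1.0 - bs_frac,
        "5mC": oxbs_frac,
        "5hmC": bs_frac - oxbs_frac,  # can become negative with noisy counts
    }


# First 5hmC control and first C control from the table above.
print(naive_levels(Counts(3792, 3964, 79, 865)))
print(naive_levels(Counts(2, 343, 1, 562)))
```

For the first `5hmC` control (position 23789), this gives a 5hmC fraction of about 0.87. For the `C` controls, the BS-seq C fraction is about 0.006, i.e., an apparent bisulphite conversion efficiency of roughly 99.4%.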
Prior knowledge about the control cytosines is supplied in the control definition file. The `control_type` column links the control count data to the control definitions.
| control_type | C_pseudocount | 5mC_pseudocount | 5hmC_pseudocount |
| ------------ | ------------- | --------------- | ---------------- |
| C | 998 | 1 | 1 |
| 5mC | 1 | 998 | 1 |
| 5hmC | 6 | 2 | 72 |
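The column names suggest that the pseudocounts act as Dirichlet concentration parameters over the (C, 5mC, 5hmC) proportions of each control type, as in the Lux model [2]; this interpretation is an assumption here. Under it, the prior mean of each proportion is its pseudocount divided by the row sum, and the row sum indicates how strong the prior is. The sketch below (hypothetical helper `prior_mean`) hard-codes the example definitions and looks them up by `control_type`, the key that links the two files.

```python
# Example control definitions from the table above:
# control_type -> (C_pseudocount, 5mC_pseudocount, 5hmC_pseudocount)
CONTROL_DEFINITIONS = {
    "C": (998, 1, 1),
    "5mC": (1, 998, 1),
    "5hmC": (6, 2, 72),
}


def prior_mean(control_type: str) -> tuple[float, ...]:
    """Mean of a Dirichlet prior with the given pseudocounts (assumed interpretation)."""
    try:
        alpha = CONTROL_DEFINITIONS[control_type]
    except KeyError:
        raise ValueError(f"no control definition for control_type {control_type!r}") from None
    total = sum(alpha)
    return tuple(a / total for a in alpha)


# Link control count rows to their definitions via control_type.
control_rows = [("Lambda_ctrl", 22924, "C"), ("Lambda_ctrl", 23789, "5hmC")]
for chromosome, position, control_type in control_rows:
    print(chromosome, position, control_type, prior_mean(control_type))
```

Under this reading, the `5hmC` controls have a prior mean of (0.075, 0.025, 0.9) and a total concentration of 80, a much weaker prior than the `C` and `5mC` controls, whose pseudocounts sum to 1000.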
### Non-control cytosines
The non-control cytosine data are supplied in the count files, one file per experiment. Each file contains location information (`chromosome` and `position`) for the non-control cytosines, as well as the number of C read-outs and the total number of read-outs from the BS-seq and oxBS-seq experiments (`bs_c`, `bs_total`, `oxbs_c`, and `oxbs_total`).
| chromosome | position | bs_c | bs_total | oxbs_c | oxbs_total |
| ---------- | -------- | ---- | -------- | ------ | ---------- |
| chrX | 7159069 | 1083 | 1563 | 850 | 2736 |
| chrX | 7159186 | 1341 | 1534 | 2119 | 2719 |
| chrX | 7159222 | 4949 | 5575 | 3886 | 4639 |
| chrX | 7159235 | 4831 | 5588 | 4354 | 4641 |
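Step 2 of the pipeline produces per-cytosine counts that must be reshaped into this count-file layout. The sketch below assumes Bismark-style coverage files (tab-separated, no header, columns: chromosome, start, end, methylation percentage, count methylated, count unmethylated), one from the BS-seq and one from the oxBS-seq library, and keeps only positions covered in both. It also assumes the count file carries a header row named as in the table above. `read_cov` and `to_count_file` are hypothetical helpers, not part of LuxGLM or Bismark; strand merging and context filtering are left out.

```python
import csv
import io


def read_cov(text: str) -> dict[tuple[str, int], tuple[int, int]]:
    """Parse coverage lines into {(chromosome, position): (n_c, n_total)}."""
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chromosome, start, _end, _percent, n_meth, n_unmeth = row
        n_c = int(n_meth)  # methylated calls are C read-outs
        counts[(chromosome, int(start))] = (n_c, n_c + int(n_unmeth))
    return counts


def to_count_file(bs_cov: str, oxbs_cov: str) -> str:
    """Merge BS-seq and oxBS-seq coverage into a count-file TSV string."""
    bs, oxbs = read_cov(bs_cov), read_cov(oxbs_cov)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chromosome", "position", "bs_c", "bs_total", "oxbs_c", "oxbs_total"])
    for chromosome, position in sorted(bs.keys() & oxbs.keys()):
        writer.writerow([chromosome, position, *bs[(chromosome, position)], *oxbs[(chromosome, position)]])
    return out.getvalue()


# Usage: coverage lines consistent with the first rows of the table above.
bs_cov = "chrX\t7159069\t7159069\t69.29\t1083\t480\nchrX\t7159186\t7159186\t87.42\t1341\t193\n"
oxbs_cov = "chrX\t7159069\t7159069\t31.07\t850\t1886\n"
print(to_count_file(bs_cov, oxbs_cov), end="")
```

Position 7159186 is dropped here because it has no oxBS-seq coverage in this toy input.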
### LuxGLM analysis
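The following lines are sufficient to run LuxGLM:
```python
import numpyro
from jax import random
from luxglm.inference import run_nuts
from luxglm.utils import get_input_data

# use double-precision (64-bit) floats in JAX/NumPyro
numpyro.enable_x64()

# read the count data and covariates referenced by the metadata file
lux_input_data = get_input_data("metadata.tsv")

key = random.PRNGKey(0)
key, subkey = random.split(key)

# run LuxGLM using NUTS with 4 chains, 1,000 warm-up and 1,000 sampling iterations
lux_result = run_nuts(
    subkey,
    lux_input_data,
    ["basal/tgf-beta"],  # covariate column(s) from the metadata file
    num_warmup=1_000,
    num_samples=1_000,
    num_chains=4,
)

# ensure convergence: no parameter should have r_hat > 1.05
print(lux_result.inference_metrics["summary"].query("r_hat > 1.05"))
```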
To obtain posterior samples of the methylation levels of the control and non-control cytosines, call `lux_result.methylation_controls()` and `lux_result.methylation()`, respectively.
Posterior samples of the experimental parameters can be obtained by calling `lux_result.experimental_parameters()`.
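The return types of these methods are not described here, so the following sketch works on a plain list of posterior samples for a single quantity (e.g., the 5hmC level of one cytosine in one experiment) and summarizes it by its posterior mean and a central credible interval. `quantile` and `summarize` are hypothetical helpers using only the standard library.

```python
import math
import statistics


def quantile(sorted_samples: list[float], q: float) -> float:
    """Linear-interpolation quantile (the same convention as NumPy's default)."""
    pos = q * (len(sorted_samples) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_samples) - 1)
    frac = pos - lo
    return sorted_samples[lo] * (1.0 - frac) + sorted_samples[hi] * frac


def summarize(samples: list[float], level: float = 0.95) -> dict[str, float]:
    """Posterior mean and central credible interval at the given level."""
    xs = sorted(samples)
    tail = (1.0 - level) / 2.0
    return {
        "mean": statistics.fmean(xs),
        "lower": quantile(xs, tail),
        "upper": quantile(xs, 1.0 - tail),
    }
```

For example, `summarize(samples)` returns the 2.5% and 97.5% posterior quantiles as `lower` and `upper`; a narrow interval indicates that the data pin the methylation level down well.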
To study the effects of the covariates, obtain posterior samples of the covariate coefficients using `lux_result.coefficients()`.
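One standard way to turn posterior samples of a coefficient into a Bayes factor for "this covariate has no effect" is the Savage-Dickey density ratio: for nested models, BF01 equals the posterior density at zero divided by the prior density at zero. Whether LuxGLM computes its Bayes factors exactly this way, and which prior it places on the coefficients, are assumptions here. The sketch takes a plain list of samples for one coefficient, assumes a Normal(0, `prior_sd`) prior, and estimates the posterior density at zero with a Gaussian kernel density estimate; `savage_dickey_bf10` and `prior_sd` are hypothetical.

```python
import math
import statistics


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kde_at(x: float, samples: list[float]) -> float:
    """Gaussian kernel density estimate at x (normal-reference rule-of-thumb bandwidth)."""
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    return sum(normal_pdf(x, s, h) for s in samples) / n


def savage_dickey_bf10(coef_samples: list[float], prior_sd: float) -> float:
    """Bayes factor for 'coefficient != 0' over 'coefficient == 0'.

    Savage-Dickey: BF01 = p(beta = 0 | data) / p(beta = 0), so BF10 = 1 / BF01.
    Assumes a Normal(0, prior_sd) prior on the coefficient (hypothetical).
    """
    posterior_at_zero = kde_at(0.0, coef_samples)
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    if posterior_at_zero == 0.0:
        return math.inf  # no posterior mass near zero at floating-point precision
    return prior_at_zero / posterior_at_zero
```

Values well above 1 favor a nonzero coefficient and values below 1 favor zero. KDE-based estimates are unreliable when few samples lie near zero, so very large Bayes factors should be read as strong evidence rather than precise numbers.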
### Examples
Please see the [examples](examples/) directory for the tutorial notebooks.
## References
[1] T. Äijö, X. Yue, A. Rao and H. Lähdesmäki, “LuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs,” Bioinformatics, 32.17:i511-i519, Sep 2016.
[2] T. Äijö, Y. Huang, H. Mannerström, L. Chavez, A. Tsagaratou, A. Rao and H. Lähdesmäki, “A probabilistic generative model for quantification of DNA modifications enables analysis of demethylation pathways,” Genome Biol, 17.1:1, Mar 2016.
[3] F. Krueger and S. R. Andrews, “Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications,” Bioinformatics, 27.11:1571-1572, Jun 2011.
[4] K. D. Hansen, B. Langmead and R. A. Irizarry, “BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions,” Genome Biol, 13.10:1, Oct 2012.
[5] D. Phan, N. Pradhan and M. Jankowiak, “Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro,” arXiv preprint 1912.11554, Dec 2019.
Raw data
{
"_id": null,
"home_page": "https://github.com/tare/LuxGLM",
"name": "luxglm",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Tarmo \u00c4ij\u00f6",
"author_email": "tarmo.aijo@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/1d/44/5e96215266cb090c44c99a55735123a9e0e8e0ee687c38705c7baa0d9796/luxglm-1.0.0.tar.gz",
"platform": null,
"description": "# LuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs\n\n## Overview\n\nLuxGLM is a method for quantifying oxi-mC species with arbitrary covariate structures from bisulphite based sequencing data. LuxGLM's probabilistic modeling framework combines a previously proposed hierarchical generative model of Lux for oxi-mC-seq data and a general linear model component to account for confounding effects.\n\n## Features\n\n- Model-based integration and analysis of BS-seq/oxBS-seq/TAB-seq/fCAB-seq/CAB-seq/redBS-seq/MAB-seq/etc. data from whole genome, reduced representation or targeted experiments\n- Accounts for confounding effects through general linear model component\n- Considers nonideal experimental parameters through modeling, including e.g. bisulphite conversion and oxidation efficiencies, various chemical labeling and protection steps etc.\n- Model-based integration of biological replicates\n- Detects differential methylation using Bayes factors (DMRs)\n- Full Bayesian inference using NumPyro\n\n## Quick introduction\n\nAn usual LuxGLM pipeline has the following steps\n\n1. Alignment of BS-seq and oxBS-seq data (e.g., [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark/) or [BSmooth](http://rafalab.jhsph.edu/bsmooth/))\n\n2. Extraction of converted and unconverted counts (e.g., [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark/) or [BSmooth](http://rafalab.jhsph.edu/bsmooth/))\n\n3. Integrative methylation analysis\n\n4. 
Analysis of obtained methylation estimates, e.g., using Bayes factors\n\nThis documentation focuses on the third and fourth points.\n\n## Installation\n\n### PyPI\n\n```console\n$ pip install luxglm\n```\n\n### GitHub\n\nInstall the version from the main branch as follows\n\n```console\n$ pip install git+https://github.com/tare/LuxGLM.git\n```\n\n## Usage\n\n### Metadata\n\nCount data and covariates are defined in the metadata file.\n\n| name | basal/tgf-beta | vitc | ra | timepoint | count_file | control_count_file | control_definition_file |\n| ---------- | -------------- | ---- | --- | --------- | ----------------------- | ------------------------------ | ----------------------- |\n| TGFb_1_24h | 1 | 0 | 0 | 24 | wildtype/TGFb_1_24h.tsv | control/TGFb_1_24h_control.tsv | control_definitions.tsv |\n| TGFb_1_38h | 1 | 0 | 0 | 38 | wildtype/TGFb_1_38h.tsv | control/TGFb_1_38h_control.tsv | control_definitions.tsv |\n| TGFb_1_48h | 1 | 0 | 0 | 48 | wildtype/TGFb_1_48h.tsv | control/TGFb_1_48h_control.tsv | control_definitions.tsv |\n\nThe following columns are mandatory: `name`, `count_file`, `control_count_file`, and `control_definition`. Additionally, there has to be at least one covariate. In the above example, we have four covariates: `basal/tgf-beta`, `vitc`, `ra`, and `timepoint`.\n\n### Control cytosines\n\nThe control cytosine data are supplied in the control count files. Each experiment will have its own file. The files contain location information (`chromosome` and `position`) and control type information (`control_cytosine`) for the control cytosines. 
Additionally, we have the number of Cs and and total number of read-outs from BS-seq and oxBS-seq experiments (`bs_c`, `bs_total`, `oxbs_c`, and `oxbs_total`).\n\n| chromosome | position | control_type | bs_c | bs_total | oxbs_c | oxbs_total |\n| ----------- | -------- | ------------ | ---- | -------- | ------ | ---------- |\n| Lambda_ctrl | 22924 | C | 2 | 343 | 1 | 562 |\n| Lambda_ctrl | 22928 | C | 2 | 341 | 1 | 561 |\n| Lambda_ctrl | 47359 | 5mC | 3770 | 3857 | 4767 | 4877 |\n| Lambda_ctrl | 47367 | 5mC | 3895 | 3962 | 4855 | 4979 |\n| Lambda_ctrl | 23789 | 5hmC | 3792 | 3964 | 79 | 865 |\n| Lambda_ctrl | 23794 | 5hmC | 3901 | 4115 | 62 | 934 |\n\nThe prior knowledge on the control cytosines is supplied in the control definition file. Note that `control_type` is used to link the control count data and control definitions.\n\n| control_type | C_pseudocount | 5mC_pseudocount | 5hmC_pseudocount |\n| ------------ | ------------- | --------------- | ---------------- |\n| C | 998 | 1 | 1 |\n| 5mC | 1 | 998 | 1 |\n| 5hmC | 6 | 2 | 72 |\n\n### Noncontrol cytosines\n\nThe non-control cytosine data are supplied in the count files. Each experiment will have its own file. The files contain location information (`chromosome` and `position`) for the non-control cytosines. 
Additionally, we have the number of Cs and and total number of read-outs from BS-seq and oxBS-seq experiments (`bs_c`, `bs_total`, `oxbs_c`, and `oxbs_total`).\n\n| chromosome | position | bs_c | bs_total | oxbs_c | oxbs_total |\n| ---------- | -------- | ---- | -------- | ------ | ---------- |\n| chrX | 7159069 | 1083 | 1563 | 850 | 2736 |\n| chrX | 7159186 | 1341 | 1534 | 2119 | 2719 |\n| chrX | 7159222 | 4949 | 5575 | 3886 | 4639 |\n| chrX | 7159235 | 4831 | 5588 | 4354 | 4641 |\n\n### LuxGLM analysis\n\nThe following lines are sufficient to run LuxGLM\n\n```python\nimport numpyro\nfrom jax import random\nfrom luxglm.inference import run_nuts\nfrom luxglm.utils import get_input_data\n\nnumpyro.enable_x64()\n\n# read input data\nlux_input_data = get_input_data(\"metadata.tsv\")\n\nkey = random.PRNGKey(0)\nkey, key_ = random.split(key)\n\n# run LuxGLM\nlux_result = run_nuts(\n key,\n lux_input_data,\n [\"basal/tgf-beta\"],\n num_warmup=1_000,\n num_samples=1_000,\n num_chains=4,\n)\n\n# ensure convergence\nlux_result.inference_metrics[\"summary\"].query(\"r_hat > 1.05\")\n```\n\nTo get the posterior samples of methylation levels of control and non-control cytosines one can call `lux_result.methylation_controls()` and `lux_result.methylation()`, respectively.\n\nThe posterior samples of experimental parameters can be obtained by calling `lux_result.experimental_parameters()`.\n\nTo study the effects of the covariates, one can get the posterior samples of coefficients of covariates using `lux_result.coefficients()`.\n\n### Examples\n\nPlease see the [examples](examples/) directory for the tutorial notebooks.\n\n## References\n\n[1] T. \u00c4ij\u00f6, X. Yue, A. Rao and H. L\u00e4hdesm\u00e4ki, \u201cLuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs.,\u201d Bioinformatics, 32.17:i511-i519, Sep 2016.\n\n[2] T. \u00c4ij\u00f6, Y. Huang, H. Mannerstr\u00f6m, L. Chavez, A. Tsagaratou, A. 
Rao and H. L\u00e4hdesm\u00e4ki, \u201cA probabilistic generative model for quantification of DNA modifications enables analysis of demethylation pathways.,\u201d Genome Biol, 17.1:1, Mar 2016.\n\n[3] F. Krueger and S. R. Andrews, \u201cBismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.,\u201d Bioinformatics, 27.11:1571-1572, Jun 2011.\n\n[4] K. D. Hansen, B. Langmead and R. A. Irizarry, \u201cBSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions.,\u201d Genome Biol, 13.10:1, Oct 2012.\n\n[5] D. Phan, N. Pradhan and M. Jankowiak, \u201cComposable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro.,\u201d arXiv preprint 1912.11554, Dec 2019.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs",
"version": "1.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/tare/LuxGLM/issues",
"Homepage": "https://github.com/tare/LuxGLM",
"Repository": "https://github.com/tare/LuxGLM"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "76a6602096a14a406322b22d50b399e5517c3d405a493785864356a678432a3c",
"md5": "c4a554040b6fbed27313086faa2630f1",
"sha256": "8ff1c8f97537a9cb46182253ce94edda6c9b108d53b56237f625b714d0c8afd8"
},
"downloads": -1,
"filename": "luxglm-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c4a554040b6fbed27313086faa2630f1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 11282,
"upload_time": "2024-10-21T23:43:25",
"upload_time_iso_8601": "2024-10-21T23:43:25.049183Z",
"url": "https://files.pythonhosted.org/packages/76/a6/602096a14a406322b22d50b399e5517c3d405a493785864356a678432a3c/luxglm-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1d445e96215266cb090c44c99a55735123a9e0e8e0ee687c38705c7baa0d9796",
"md5": "d07ce325b1bebe5821c1bd6deb208e5a",
"sha256": "3c94e22b5a63bd6e1675c310d597436bfd3906c2dcc01f09cbe7653c348c2c0b"
},
"downloads": -1,
"filename": "luxglm-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "d07ce325b1bebe5821c1bd6deb208e5a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 13932,
"upload_time": "2024-10-21T23:43:27",
"upload_time_iso_8601": "2024-10-21T23:43:27.084458Z",
"url": "https://files.pythonhosted.org/packages/1d/44/5e96215266cb090c44c99a55735123a9e0e8e0ee687c38705c7baa0d9796/luxglm-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-21 23:43:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tare",
"github_project": "LuxGLM",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "luxglm"
}