# MuVI
A multi-view latent variable model with domain-informed structured sparsity that integrates noisy domain expertise in the form of feature sets.
[Examples](examples/1_basic_tutorial.ipynb) | [Paper](https://proceedings.mlr.press/v206/qoku23a/qoku23a.pdf) | [BibTeX](citation.bib)
[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/master/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
## Basic usage
The `MuVI` class is the main entry point for loading the data and performing the inference:
```py
import numpy as np
import pandas as pd
import anndata as ad
import mudata as md
import muvi
# Load processed input data (missing values are allowed)
# Matrix of dimensions n_samples x n_rna_features
rna_df = pd.read_csv(...)
# Matrix of dimensions n_samples x n_prot_features
prot_df = pd.read_csv(...)
# Load prior feature sets, e.g. gene sets
gene_sets = muvi.fs.from_gmt(...)
# Binary matrix of dimensions n_gene_sets x n_rna_features
gene_sets_mask = gene_sets.to_mask(rna_df.columns)
# Create a MuVI object by passing both input data and prior information
# `device` is assumed to be defined by the caller beforehand
model = muvi.MuVI(
    observations={"rna": rna_df, "prot": prot_df},
    prior_masks={"rna": gene_sets_mask},
    # ...
    device=device,
)
# Alternatively, create a MuVI model from AnnData (single-view)
rna_adata = ad.AnnData(rna_df, dtype=np.float32)
# Store the transposed mask (n_rna_features x n_gene_sets) alongside the features
rna_adata.varm["gene_sets_mask"] = gene_sets_mask.T
model = muvi.tl.from_adata(
    rna_adata,
    prior_mask_key="gene_sets_mask",
    # ...
    device=device,
)
# Alternatively, create a MuVI model from MuData (multi-view)
prot_adata = ad.AnnData(prot_df, dtype=np.float32)
mdata = md.MuData({"rna": rna_adata, "prot": prot_adata})
model = muvi.tl.from_mdata(
    mdata,
    prior_mask_key="gene_sets_mask",
    # ...
    device=device,
)
# Fit the model for a given number of training epochs
model.fit(batch_size, n_epochs, ...)
# Continue with the downstream analysis (see below)
```
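To make the shape of the prior mask concrete, here is a minimal standard-library sketch of how a binary `n_gene_sets x n_features` matrix can be derived from GMT-formatted text (one set per line: name, description, then tab-separated members). The helpers `parse_gmt` and `to_binary_mask` are hypothetical names used for illustration only; they are not part of `muvi`, whose own `muvi.fs.from_gmt` and `to_mask` should be used in practice.

```python
from typing import Dict, List, Set, Tuple


def parse_gmt(text: str) -> Dict[str, Set[str]]:
    """Parse GMT text: name <TAB> description <TAB> member1 <TAB> member2 ..."""
    gene_sets: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def to_binary_mask(
    gene_sets: Dict[str, Set[str]], features: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return set names and a binary matrix (one row per set, one column per feature)."""
    names = list(gene_sets)
    mask = [[1 if f in gene_sets[name] else 0 for f in features] for name in names]
    return names, mask


# Toy input (made up for illustration); G9 is not among the measured features
toy_gmt = "SET_A\tdescription\tG1\tG2\nSET_B\tdescription\tG2\tG3\tG9\n"
features = ["G1", "G2", "G3", "G4"]

names, mask = to_binary_mask(parse_gmt(toy_gmt), features)
for name, row in zip(names, mask):
    print(name, row)
```

Members that do not appear among the measured features (here `G9`) simply drop out of the mask, which is why the mask is always built against the columns of the data matrix.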
## Submodules
The package consists of three additional submodules for analysing the results post-training:
- [`muvi.tl`](muvi/tools/utils.py) provides tools for downstream analysis, e.g.,
  - compute `muvi.tl.variance_explained` across all factors and views
  - `muvi.tl.test` the significance of the relationship between the prior feature sets and the inferred factors
  - apply clustering on the latent space such as `muvi.tl.leiden`
  - `muvi.tl.save` the model in order to `muvi.tl.load` it at a later point in time
- [`muvi.pl`](muvi/tools/plotting.py) works in tandem with `muvi.tl` by providing visualization methods such as
  - `muvi.pl.variance_explained` (see above)
  - plotting the latent space via `muvi.pl.tsne`, `muvi.pl.scatter` or `muvi.pl.stripplot`
  - investigating factors in terms of their inferred loadings with `muvi.pl.inspect_factor`
- [`muvi.fs`](muvi/tools/feature_sets.py) provides the data structures and methods for loading, processing and storing the prior information from feature sets
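As a rough illustration of what "variance explained per factor" can mean in a linear factor model, the sketch below uses one common definition, R² = 1 - SS_res / SS_tot, on centered data for a single view. This is an assumption made for illustration: the function name and the exact formula are not taken from `muvi`, and `muvi.tl.variance_explained` may compute the quantity differently.

```python
from typing import List

Matrix = List[List[float]]


def variance_explained_per_factor(y: Matrix, z: Matrix, w: Matrix) -> List[float]:
    """R^2 of each factor k for one view.

    y: centered data, n_samples x n_features
    z: factor scores, n_samples x n_factors
    w: factor loadings, n_factors x n_features
    """
    n, d = len(y), len(y[0])
    ss_tot = sum(y[i][j] ** 2 for i in range(n) for j in range(d))
    r2 = []
    for k in range(len(w)):
        # Residual after reconstructing the view from factor k alone
        ss_res = sum(
            (y[i][j] - z[i][k] * w[k][j]) ** 2 for i in range(n) for j in range(d)
        )
        r2.append(1.0 - ss_res / ss_tot)
    return r2


# Toy example: the view is generated entirely by the first factor,
# while the second factor has all-zero loadings.
z = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]
w = [[2.0, 1.0], [0.0, 0.0]]
y = [[2.0, 1.0], [-2.0, -1.0], [0.0, 0.0]]

r2 = variance_explained_per_factor(y, z, w)
print(r2)
```

In this toy case the first factor reconstructs the view exactly and the second contributes nothing, so the two values sit at the extremes of the scale.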
## Tutorials
Check out our [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with `MuVI`, or jump straight to a [single-cell multiome](examples/3a_single-cell_multi-omics_integration.ipynb) analysis!
`R` users can readily export a trained `MuVI` model into `R` with a single line of code and resume the analysis with the [`MOFA2`](https://biofam.github.io/MOFA2) package.
```py
muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
```
See [this vignette](https://raw.githack.com/MLO-lab/MuVI/master/examples/4_single-cell_multi-omics_integration_R.html) for more details!
## Installation
We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and [pip](https://pypi.org/project/pip/) to install `muvi` as a Python package. Follow these steps to get `muvi` up and running!
1. Create a Python environment in `conda`:
```bash
conda create -n muvi python=3.9
```
2. Activate the freshly created environment:
```bash
conda activate muvi
```
3. Install `muvi` with `pip`:
```bash
python3 -m pip install muvi
```
4. Alternatively, install the latest development version from GitHub with `pip`:
```bash
python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
```
Make sure to install a GPU version of [PyTorch](https://pytorch.org/) to significantly speed up the inference.
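A quick way to check whether the installed PyTorch build can see a GPU is sketched below. The helper `pick_device` is a hypothetical name, and passing the resulting string as the `device` argument shown in Basic usage is an assumption rather than something confirmed here.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch build is most likely CPU-only and should be reinstalled following the instructions on the PyTorch website.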
## Citation
If you use `MuVI` in your work, please use this [BibTeX](citation.bib) entry:
> **Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity**
>
> Arber Qoku and Florian Buettner
>
> _International Conference on Artificial Intelligence and Statistics (AISTATS)_ 2023
>
> <https://proceedings.mlr.press/v206/qoku23a.html>