plntree

Name	plntree
Version	0.1.0
home_page	None
Summary	Package implementing PLN-Tree models and TaxaPLN augmentation.
upload_time	2025-07-08 12:18:45
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords	python count multivariate tree metagenomics taxonomy pln
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

![PyPI](https://img.shields.io/pypi/v/plntree)
![GitHub](https://img.shields.io/github/license/AlexandreChaussard/plntree-package)
![Python Versions](https://img.shields.io/badge/python-3.9+-blue)
![GPU Support](https://img.shields.io/badge/GPU-Supported-brightgreen)
# PLN-Tree: Hierarchical Poisson Log-Normal models
> Poisson Log-Normal (PLN) models are used for the
> analysis of multivariate count data. PLN-Tree extends this framework to 
> hierarchically organized count data by incorporating tree-like structures
> into the analysis. This package provides efficient algorithms to perform PLN-Tree inference
> by leveraging PyTorch with GPU acceleration.
> 
> PLN-Tree has been applied to metagenomics, using the taxonomy
> as a guide for microbiome modeling.

> Typical applications include:
> - **Hierarchical Modeling**: investigate relationships between taxa within and across levels of the taxonomy, and the impact of covariates.
> - **Data Augmentation**: generate synthetic samples to enlarge training sets and improve performance. See TaxaPLN augmentation.
> - **Counts Preprocessing**: transform counts using the LTP-CLR transform to tackle the challenges of compositionality and integer constraints of count data.

## 📖 Documentation and tutorials

Want to learn how to use the package?
Start with the Quickstart guide below,
then explore the [documentation](https://github.com/AlexandreChaussard/PLNTree-package/wiki).

If you are specifically interested in the TaxaPLN augmentation strategy for microbiome data, check out our [TaxaPLN starting guide](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/taxapln/README.md).

## 🛠 Installation

**PLN-Tree** is available on [PyPI](https://pypi.org/project/plntree/) and can be installed with `pip`.

```sh
pip install plntree
```

## ⚡️ Quickstart

This package comes with human microbiome data from the [curatedMetagenomicData](https://waldronlab.io/curatedMetagenomicData/index.html) library.
```python
from plntree.data import cMD

taxa_abundance = cMD.get_study(
    study='ZhuF_2020',           # Study name
    taxonomic_levels=('c', 's'), # Taxonomic levels to retrieve
    prevalence=0.15,             # Minimum prevalence of taxa to include
    total_reads=100_000          # Total abundance of each sample (proportions to counts)
)

covariates = cMD.metadata(
    study='ZhuF_2020',           # Study name
)
```
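To make the `prevalence` and `total_reads` arguments more concrete, here is a minimal pandas sketch of what such preprocessing could look like. It is an illustration under assumptions, not the code behind `cMD.get_study`: the helpers `filter_by_prevalence` and `to_counts`, the taxon names, and the choice to renormalize rows before scaling are all hypothetical.

```python
import pandas as pd

# Toy relative abundances: rows are samples, columns are taxa (hypothetical names)
proportions = pd.DataFrame(
    {
        "taxon_a": [0.50, 0.60, 0.70, 0.40],
        "taxon_b": [0.50, 0.40, 0.00, 0.60],
        "taxon_c": [0.00, 0.00, 0.30, 0.00],
    },
    index=["s1", "s2", "s3", "s4"],
)

def filter_by_prevalence(df, min_prevalence):
    """Keep taxa present (abundance > 0) in at least `min_prevalence` of the samples."""
    prevalence = (df > 0).mean(axis=0)
    return df.loc[:, prevalence >= min_prevalence]

def to_counts(df, total_reads):
    """Renormalize each row to sum to 1, scale to `total_reads`, and round to integers.

    Assumes every row keeps at least one non-zero taxon after filtering.
    """
    renormalized = df.div(df.sum(axis=1), axis=0)
    return (renormalized * total_reads).round().astype(int)

kept = filter_by_prevalence(proportions, min_prevalence=0.5)  # drops taxon_c
counts = to_counts(kept, total_reads=100_000)
print(counts)
```

In general, rounding can make a row total differ from `total_reads` by a few reads.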

`taxa_abundance` is a `pandas.DataFrame` containing the microbial composition
of each patient, while `covariates` is a `pandas.DataFrame` with the metadata
associated with each patient.

### Training a PLN-Tree model

The `PLNTree` class lets you specify the model parameters and perform inference on the training data.
```python
from plntree import PLNTree
model = PLNTree(
    taxa_abundance,   # DataFrame with counts (rows: samples, columns: taxa)
    covariates=None,  # DataFrame with covariates (optional, default None)
    device='cpu',     # Device to use for training (default CPU, or 'cuda' for GPU)
    seed=0,           # Random seed for reproducibility (default None)
)
```
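If the same script has to run on machines with and without a GPU, the `device` value can be chosen at runtime. The sketch below is one possible way to do it; `pick_device` is a hypothetical helper, not part of the package, and it falls back to `'cpu'` when PyTorch is missing or reports no CUDA device.

```python
import importlib.util

def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed: fall back to CPU
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()  # can then be passed as PLNTree(..., device=device)
print(device)
```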
By default, the latent dynamics follow a Markov linear model, which suits most metagenomics use cases.
The variational approximation defaults to a residual amortized backward method, which is more efficient than
the mean-field approximation for PLN-Tree but requires more parameters. If you use covariates,
the default implementation relies on FiLM (feature-wise linear modulation).
See the [documentation](https://github.com/AlexandreChaussard/PLNTree-package/wiki) to understand how to customize these parameters.
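
As background for the latent layer discussed above, the following NumPy sketch simulates a plain, non-hierarchical PLN model: a Gaussian latent vector is drawn per sample and counts are emitted from a Poisson distribution with a log link. The values of `mu` and `Sigma` are made up for illustration, and this is not the tree-structured model implemented by the package.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 5
mu = np.array([1.0, 2.0, 0.5])        # latent mean (illustrative values)
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])   # latent covariance (illustrative values)

# Latent Gaussian layer: Z_i ~ N(mu, Sigma)
Z = rng.multivariate_normal(mu, Sigma, size=n_samples)

# Observed counts: X_ij | Z_ij ~ Poisson(exp(Z_ij))
X = rng.poisson(np.exp(Z))
print(X)
```

The covariance `Sigma` is what lets the model capture dependencies between taxa, while the Poisson layer keeps the observations integer-valued.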

The package includes visualization functions to help interpret the data.
For example, the `tree.plot` method displays the tree structure.
```python
# Uses the `model` instantiated above
model.tree.plot()
```
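To illustrate what hierarchically organized counts look like, the pandas sketch below aggregates species-level counts up to their parent level, assuming the count of a parent node is the sum of its children's counts, as is usual for taxonomic abundance tables. The taxon names and the `parent_of` mapping are hypothetical and do not reflect how the package stores its tree.

```python
import pandas as pd

# Toy species-level counts (rows: samples); taxon names are hypothetical
species_counts = pd.DataFrame(
    {"species_1": [10, 0], "species_2": [5, 7], "species_3": [0, 3]},
    index=["s1", "s2"],
)

# Hypothetical taxonomy: each species is mapped to its parent class
parent_of = {"species_1": "class_A", "species_2": "class_A", "species_3": "class_B"}

# Parent counts are the sums of their children's counts
class_counts = species_counts.T.groupby(parent_of).sum().T
print(class_counts)
```

Each sample keeps the same total at both levels, which is the constraint a tree-structured count model has to respect.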

To train a PLN-Tree model, call its `fit` method.
Additional parameters control early stopping and convergence monitoring.
```python
loss = model.fit(max_epoch=1000, batch_size=512, learning_rate=1e-3, verbose=50)  # Output ELBO loss upon fitting
```
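For readers unfamiliar with early stopping, here is a generic patience-based rule applied to a list of loss values. It is a sketch of the general idea only: `should_stop`, `patience`, and `min_delta` are hypothetical names, and the criterion used inside `fit` may differ.

```python
def should_stop(losses, patience=3, min_delta=1e-4):
    """Return True when the last `patience` losses did not improve on the
    earlier best loss by at least `min_delta`."""
    if len(losses) <= patience:
        return False  # not enough history to decide
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_recent > best_before - min_delta

plateau = [10.0, 8.0, 7.5, 7.49995, 7.49996, 7.49997]
improving = [10.0, 8.0, 7.5, 7.0, 6.5, 6.0]
print(should_stop(plateau), should_stop(improving))
```

A larger `patience` tolerates longer plateaus before stopping, which can help with noisy mini-batch losses.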

### Applications

#### Data Augmentation
PLN-Tree can be used to generate synthetic samples to augment training sets and
improve downstream task performance.

For microbiome data, an effective way to perform data augmentation relies on the [TaxaPLN](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/taxapln/README.md) strategy,
which is described in detail in [this paper](https://arxiv.org/abs/2507.03588). In a nutshell, TaxaPLN uses the PLN-Tree model to generate synthetic samples
through a post-hoc VAMP sampler that is instantiated from the trained model.
```python
X_aug, Z_aug = model.vamp_sample(n_samples=1000, seed=0)
```
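To give an intuition for this kind of sampler, here is a minimal NumPy sketch, assuming VAMP refers to a VampPrior-style uniform mixture of per-sample variational posteriors. It uses a flat Poisson emission rather than the tree model, the posterior parameters are made up, and `vamp_like_sample` is a hypothetical helper, not the package's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variational posteriors q(z | x_i) = N(m_i, diag(s_i^2))
# for 4 training samples and 3 taxa (values are made up for illustration)
post_mean = rng.normal(loc=2.0, scale=0.5, size=(4, 3))
post_std = np.full((4, 3), 0.1)

def vamp_like_sample(n_samples):
    """Draw from the uniform mixture of per-sample posteriors, then emit counts."""
    idx = rng.integers(0, post_mean.shape[0], size=n_samples)  # pick a training sample
    z = rng.normal(post_mean[idx], post_std[idx])              # z ~ q(z | x_idx)
    x = rng.poisson(np.exp(z))                                 # Poisson emission
    return x, z

X_new, Z_new = vamp_like_sample(10)
print(X_new.shape, Z_new.shape)
```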
Covariate-aware sampling is also available if the model was trained with covariates using the `covariates` parameter.

#### Count Preprocessing with LTP-CLR
PLN-Tree can also preprocess count data with the LTP-CLR transform defined in the [PLN-Tree paper](https://doi.org/10.1007/s11222-025-10668-w),
a log-ratio transformation that addresses the compositionality
and integer constraints of count data by leveraging the latent space.

Once a PLN-Tree model is trained, the preprocessing is applied through the `latent_tree_proportions` method,
which defines proportions in the latent space before applying the CLR transform.
```python
Z = model.encode(taxa_abundance)                                     # First, encode the counts to the latent space
X_preprocessed = model.latent_tree_proportions(Z, clr=True, seed=0)  # Then, apply the LTP-CLR transform
```
This preprocessing is also compatible with covariates.
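
For reference, the centered log-ratio (CLR) step on its own can be sketched in a few lines of NumPy. This is the standard CLR definition applied to strictly positive proportions, not the package's LTP-CLR implementation; the `clr` helper and the input values are illustrative.

```python
import numpy as np

def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    log_p = np.log(proportions)
    return log_p - log_p.mean(axis=-1, keepdims=True)

p = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])
clr_p = clr(p)
print(clr_p)
```

Each transformed row sums to zero and is unchanged when the input is rescaled. Because the logarithm is undefined at zero, applying CLR directly to sparse raw counts is problematic, which is one motivation for working with proportions derived from the latent space.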

## 👐 Contributing

Want to contribute? Check the guidelines in [CONTRIBUTING.md](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/CONTRIBUTING.md).

## 📜 Citations

Please cite our work using the following references:

- Chaussard, A., Bonnet, A., Gassiat, E., Le Corff, S. Tree-based variational inference for Poisson log-normal models. Statistics and Computing 35, 135 (2025). [SpringerLink](https://doi.org/10.1007/s11222-025-10668-w).

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "plntree",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Alexandre Chaussard <alexandre.chaussard@sorbonne-universite.fr>",
    "keywords": "python, count, multivariate, tree, metagenomics, taxonomy, PLN",
    "author": null,
    "author_email": "Alexandre Chaussard <alexandre.chaussard@sorbonne-universite.fr>",
    "download_url": "https://files.pythonhosted.org/packages/35/3a/9b64a024c77bbf776db9a1c97f6a2f42830c6025509735cb6f2d1f44124b/plntree-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Package implementing PLN-Tree models and TaxaPLN augmentation.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/AlexandreChaussard/PLNTree-package",
        "Issues": "https://github.com/AlexandreChaussard/PLNTree-package/issues"
    },
    "split_keywords": [
        "python",
        " count",
        " multivariate",
        " tree",
        " metagenomics",
        " taxonomy",
        " pln"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fb5f984dedfbead42cc1052ab59c52ec28ec8369a98d3e3eb31f82e12340e497",
                "md5": "a2e7ccc1b51f55048c9b79c885b43c0f",
                "sha256": "4634c27f832aaf40ff4b2383c0750c4e4be58bd5ccd2c992a3d70ca90a4eb5b7"
            },
            "downloads": -1,
            "filename": "plntree-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a2e7ccc1b51f55048c9b79c885b43c0f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 8059186,
            "upload_time": "2025-07-08T12:18:42",
            "upload_time_iso_8601": "2025-07-08T12:18:42.229382Z",
            "url": "https://files.pythonhosted.org/packages/fb/5f/984dedfbead42cc1052ab59c52ec28ec8369a98d3e3eb31f82e12340e497/plntree-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "353a9b64a024c77bbf776db9a1c97f6a2f42830c6025509735cb6f2d1f44124b",
                "md5": "173a9f2bc10c398d2e2b454343ba2102",
                "sha256": "edf12a2c1de7c329cba003b1249ec7e51b64c5e67c1091ab1edc384b0f19bafb"
            },
            "downloads": -1,
            "filename": "plntree-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "173a9f2bc10c398d2e2b454343ba2102",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 34073046,
            "upload_time": "2025-07-08T12:18:45",
            "upload_time_iso_8601": "2025-07-08T12:18:45.315389Z",
            "url": "https://files.pythonhosted.org/packages/35/3a/9b64a024c77bbf776db9a1c97f6a2f42830c6025509735cb6f2d1f44124b/plntree-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-08 12:18:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AlexandreChaussard",
    "github_project": "PLNTree-package",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "plntree"
}
