



# PLN-Tree: Hierarchical Poisson Log-Normal models
> The Poisson Log-Normal (PLN) models are used for the
> analysis of multivariate count data. PLN-Tree extends this framework to
> hierarchically organized count data by incorporating tree-like structures
> into the analysis. This package provides efficient algorithms to perform PLN-Tree inference
> by leveraging PyTorch with GPU acceleration.
>
> PLN-Tree has shown interesting applications to metagenomics, exploiting the taxonomy
> as a guide for microbiome modeling.
> Typical applications involve:
> - **Hierarchical Modeling**: investigate relationships between taxa at different levels of the taxonomy, relationships between levels, and the impact of covariates.
> - **Data Augmentation**: generate synthetic samples to enlarge training sets and improve performance. See TaxaPLN augmentation.
> - **Counts Preprocessing**: transform counts using the LTP-CLR transform to tackle the challenges of compositionality and integer constraints of count data.
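
As a rough illustration of the PLN model that PLN-Tree builds on, the sketch below samples counts from a plain (non-hierarchical) PLN: a Gaussian latent vector is drawn, then each count is Poisson with rate equal to the exponential of its latent coordinate. This is not the package's implementation. The function names are hypothetical, the latent coordinates are drawn independently for brevity (a full PLN model uses a covariance matrix), and the tree structure is left out entirely.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's method (suitable for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_pln(mu, sigma, rng):
    """Z_j ~ Normal(mu_j, sigma_j^2), then X_j | Z_j ~ Poisson(exp(Z_j)).

    Assumption for this sketch: independent latent coordinates (diagonal covariance).
    """
    latent = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    counts = [sample_poisson(math.exp(z), rng) for z in latent]
    return latent, counts


rng = random.Random(0)
latent, counts = sample_pln(mu=[1.0, 2.0, 0.5], sigma=[0.3, 0.3, 0.3], rng=rng)
print("latent:", latent)
print("counts:", counts)
```

The Gaussian layer is what lets the model capture over-dispersion and dependencies between counts, which a plain Poisson model cannot express.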
## 📖 Documentation and tutorials
Want to learn how to use the package?
Start with the Quickstart guide down below,
then explore the [documentation](https://github.com/AlexandreChaussard/PLNTree-package/wiki).
If you are specifically interested in the TaxaPLN augmentation strategy for microbiome data, check out our [TaxaPLN starting guide](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/taxapln/README.md).
## 🛠 Installation
**PLN-Tree** is available on [PyPI](https://pypi.org/project/plntree/) and can be installed with `pip`.
```sh
pip install plntree
```
## ⚡️ Quickstart
This package comes with human microbiome data from the [curatedMetagenomicData](https://waldronlab.io/curatedMetagenomicData/index.html) library.
```python
from plntree.data import cMD
taxa_abundance = cMD.get_study(
    study='ZhuF_2020',            # Study name
    taxonomic_levels=('c', 's'),  # Taxonomic levels to retrieve
    prevalence=0.15,              # Minimum prevalence of taxa to include
    total_reads=100_000           # Total abundance of each sample (proportions to counts)
)
covariates = cMD.metadata(
    study='ZhuF_2020',            # Study name
)
```
`taxa_abundance` is a `pandas.DataFrame` containing the microbial composition
of each patient, while `covariates` is a `pandas.DataFrame` with the metadata
associated with each patient.
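To make the `prevalence` and `total_reads` arguments more concrete, here is a minimal sketch of what such a filter and rescaling could look like on toy data. It assumes that prevalence means the fraction of samples in which a taxon is present, and that proportions are renormalized after filtering before being scaled and rounded. The helper names are hypothetical, and the package may handle these steps differently.

```python
def prevalence(column):
    """Fraction of samples in which a taxon has non-zero abundance."""
    return sum(1 for value in column if value > 0) / len(column)


def filter_by_prevalence(samples, threshold):
    """Keep the taxa whose prevalence is at least `threshold`."""
    taxa = list(samples[0].keys())
    kept = [t for t in taxa if prevalence([row[t] for row in samples]) >= threshold]
    return [{t: row[t] for t in kept} for row in samples]


def to_counts(row, total_reads):
    """Rescale one sample so it sums to about `total_reads`, then round to integers."""
    total = sum(row.values())
    return {t: round(total_reads * p / total) for t, p in row.items()}


# Toy proportions: taxon "c" is present in only one of four samples.
samples = [
    {"a": 0.5, "b": 0.4, "c": 0.1},
    {"a": 0.7, "b": 0.3, "c": 0.0},
    {"a": 1.0, "b": 0.0, "c": 0.0},
    {"a": 0.2, "b": 0.8, "c": 0.0},
]

# The threshold is chosen for the toy data, not taken from the quickstart.
filtered = filter_by_prevalence(samples, threshold=0.5)
counts = [to_counts(row, total_reads=100_000) for row in filtered]
for row in counts:
    print(row)
```

Rounding means a sample's counts may differ from `total_reads` by a small amount.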
### Training a PLN-Tree model
The `PLNTree` class lets you specify the model parameters and run inference on the training data.
```python
from plntree import PLNTree
model = PLNTree(
    taxa_abundance,   # DataFrame with counts (rows: samples, columns: taxa)
    covariates=None,  # DataFrame with covariates (optional, default None)
    device='cpu',     # Device to use for training (default CPU, or 'cuda' for GPU)
    seed=0,           # Random seed for reproducibility (default None)
)
```
By default, the latent dynamic is set to a Markov Linear model, which is suitable for most metagenomics cases.
The variational approximation defaults to a residual amortized backward method, which is more efficient than
the mean-field approximation for PLN-Tree but requires more parameters. If you provide covariates,
the default implementation relies on FiLM (feature-wise linear modulation).
See the [documentation](https://github.com/AlexandreChaussard/PLNTree-package/wiki) to understand how to customize these parameters.
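For readers unfamiliar with FiLM, the generic idea is that the covariates produce a per-feature scale and shift that are applied to a hidden representation. The sketch below shows that operation on plain Python lists. It is a generic illustration with hypothetical names and hand-picked weights, and it does not reflect how the package wires FiLM into the model.

```python
def linear(weights, bias, inputs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def film(features, covariates, gamma_params, beta_params):
    """Feature-wise linear modulation: gamma(c) * h + beta(c), element by element."""
    gamma = linear(*gamma_params, covariates)  # per-feature scale computed from covariates
    beta = linear(*beta_params, covariates)    # per-feature shift computed from covariates
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


features = [1.0, 2.0]    # hidden representation h
covariates = [1.0, 0.0]  # e.g. a one-hot encoded covariate c

gamma_params = ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # (weights, bias), illustrative values
beta_params = ([[0.5, 0.0], [0.0, 0.5]], [0.1, 0.1])   # (weights, bias), illustrative values

modulated = film(features, covariates, gamma_params, beta_params)
print(modulated)
```

In a trained model the scale and shift networks are learned jointly with the rest of the parameters.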
The package comes with visualization functions to help interpret the data, notably
the `tree.plot` method, which displays the tree structure.
```python
from plntree import PLNTree
model.tree.plot()
```
Training a PLN-Tree model is done by calling the `fit` method on the model.
More parameters are available for early stopping or convergence monitoring.
```python
loss = model.fit(max_epoch=1000, batch_size=512, learning_rate=1e-3, verbose=50) # Output ELBO loss upon fitting
```
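If you want to check convergence yourself, one simple heuristic is to compare the average loss over the most recent epochs with the average over the epochs just before them. The helper below is a hypothetical sketch that operates on a plain list of per-epoch loss values. Whether the value returned by `fit` can be used this way is an assumption, so check the documentation for the actual return type and for the built-in early stopping options.

```python
def has_plateaued(losses, window=50, rel_tol=1e-3):
    """True when the mean loss over the last `window` epochs differs from the mean
    over the preceding `window` epochs by less than `rel_tol` in relative terms."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= rel_tol * max(abs(previous), 1e-12)


# Toy loss curves standing in for a training history.
still_decreasing = [1000.0 / (epoch + 1) for epoch in range(30)]
flat = [5.0] * 20

print(has_plateaued(still_decreasing, window=5))
print(has_plateaued(flat, window=5))
```

Averaging over windows makes the check less sensitive to the noise that mini-batch training adds to individual epochs.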
### Applications
#### Data Augmentation
PLN-Tree can be used to generate synthetic samples to augment training sets and
improve performance on downstream tasks.
For microbiome data, an effective way to perform data augmentation relies on the [TaxaPLN](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/taxapln/README.md) strategy,
which is thoroughly described in [this paper](https://arxiv.org/abs/2507.03588). In a nutshell, TaxaPLN uses the PLN-Tree model to generate synthetic samples
through a post-hoc VAMP sampler that is instantiated from the trained model.
```python
X_aug, Z_aug = model.vamp_sample(n_samples=1000, seed=0)
```
Covariate-aware sampling is also available if the model was trained with covariates using the `covariates` parameter.
#### Count Preprocessing with LTP-CLR
PLN-Tree can also be used to preprocess count data using the LTP-CLR transform defined in the [PLN-Tree paper](https://doi.org/10.1007/s11222-025-10668-w),
which is a log-ratio transformation that addresses the challenges of compositionality
and integer constraints of count data by leveraging the latent space.
Once a PLN-Tree model is trained, the preprocessing is applied through the `latent_tree_proportions` method,
which computes proportions in the latent space before applying the CLR transform.
```python
Z = model.encode(taxa_abundance) # First, encode the counts to the latent space
X_preprocessed = model.latent_tree_proportions(Z, clr=True, seed=0) # Then, apply the LTP-CLR transform
```
This preprocessing is also compatible with covariates.
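For reference, the CLR (centered log-ratio) step on its own subtracts the mean log from the log of each part. The sketch below applies it to a hand-written vector of strictly positive proportions. It illustrates only the generic CLR formula, not the latent tree proportions that the package computes beforehand, and the function name is hypothetical.

```python
import math


def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log over all parts.

    Requires strictly positive inputs, since the logarithm is undefined at zero.
    """
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


composition = [0.2, 0.3, 0.5]
transformed = clr(composition)
rescaled = clr([10 * p for p in composition])  # same composition on a different scale

print(transformed)
print(sum(transformed))  # CLR coordinates sum to zero up to floating-point error
```

The transform is unchanged when the input is multiplied by a constant, which is why it suits compositional data. Raw counts often contain zeros, where the logarithm is undefined, so the plain CLR cannot be applied to them directly.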
## 👐 Contributing
Want to contribute? Check the guidelines in [CONTRIBUTING.md](https://github.com/AlexandreChaussard/PLNTree-package/blob/master/CONTRIBUTING.md).
## 📜 Citations
Please cite our work using the following references:
- Chaussard, A., Bonnet, A., Gassiat, E., Le Corff, S. Tree-based variational inference for Poisson log-normal models. Statistics and Computing 35, 135 (2025). [SpringerLink](https://doi.org/10.1007/s11222-025-10668-w).