# phylim: a phylogenetic limit evaluation library built on [cogent3](https://cogent3.org/)
[![Coverage Status](https://coveralls.io/repos/github/HuttleyLab/PhyLim/badge.svg?branch=main)](https://coveralls.io/github/HuttleyLab/PhyLim?branch=main)
phylim evaluates identifiability when estimating a phylogenetic tree under a Markov model. Identifiability is a key condition that Markov models used in phylogenetics must satisfy for estimation to be consistent.
Establishing identifiability relies on the arrangement of specific types of transition probability matrices (e.g., DLC, "diagonal largest in column", and sympathetic) while avoiding other types. A key condition is that, for each node, there must be a path to a tip along which every matrix is DLC. Trees that do not meet this condition are not identifiable 🪚🎄! For instance, in the figure below, tree *T'* contains a node surrounded by a specific type of non-DLC matrix, rendering it non-identifiable. Compare *T'* with tree *T*.
phylim provides a quick way to check the identifiability of a model fit through its main [cogent3 app](https://cogent3.org/doc/app/index.html), `phylim`. phylim is compatible with [piqtree2](https://github.com/iqtree/piqtree2), a Python library that exposes features from iqtree2.
The sections below show how to set up phylim and walk through the main identifiability check app and its associated apps.
<p align="center">
<img src="https://figshare.com/ndownloader/files/50904159" alt="tree1" width="600" height="300" />
</p>
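To make the path condition above concrete, here is a minimal pure-Python sketch. It is not phylim's implementation: the function names `is_dlc` and `nodes_without_dlc_path`, the edge-map layout, and the two toy trees are all assumptions made for illustration, and edge direction is ignored for simplicity. It only covers the DLC path rule and says nothing about the other matrix categories phylim handles.

```python
from collections import defaultdict, deque


def is_dlc(matrix):
    """Return True if each diagonal entry is strictly the largest in its column."""
    size = len(matrix)
    return all(
        matrix[col][col] > matrix[row][col]
        for col in range(size)
        for row in range(size)
        if row != col
    )


def nodes_without_dlc_path(edges):
    """Return internal nodes that cannot reach any tip using only DLC edges.

    `edges` maps an (end_a, end_b) pair of node names to True when the
    matrix on that edge is DLC. Edge direction is ignored (a simplification).
    """
    neighbours = defaultdict(list)
    for (end_a, end_b), edge_is_dlc in edges.items():
        neighbours[end_a].append((end_b, edge_is_dlc))
        neighbours[end_b].append((end_a, edge_is_dlc))
    # A tip is a node attached to exactly one edge.
    tips = {name for name, linked in neighbours.items() if len(linked) == 1}

    failing = []
    for start in neighbours:
        if start in tips:
            continue
        # Breadth-first search that only crosses DLC edges.
        seen, queue, reached_tip = {start}, deque([start]), False
        while queue and not reached_tip:
            node = queue.popleft()
            for nxt, edge_is_dlc in neighbours[node]:
                if not edge_is_dlc or nxt in seen:
                    continue
                if nxt in tips:
                    reached_tip = True
                    break
                seen.add(nxt)
                queue.append(nxt)
        if not reached_tip:
            failing.append(start)
    return sorted(failing)


# Two made-up four-tip trees with internal nodes n1 and n2
# (toy data, not the trees drawn in the figure).
tree_t = {("n1", "A"): True, ("n1", "B"): True, ("n1", "n2"): True,
          ("n2", "C"): True, ("n2", "D"): True}
# Same topology, but every edge touching n2 carries a non-DLC matrix.
tree_t_prime = {**tree_t, ("n1", "n2"): False, ("n2", "C"): False, ("n2", "D"): False}

print(nodes_without_dlc_path(tree_t))        # []
print(nodes_without_dlc_path(tree_t_prime))  # ['n2']
```

In the second toy tree, `n2` is cut off from every tip by non-DLC matrices, which is the kind of arrangement the condition rules out.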
## Installation
```pip install phylim```
To check that the installation succeeded, run the test suite from the package directory:
```pytest```
Hope all tests passed! :blush:
## Run the identifiability check
Suppose you fit a model to an alignment and obtain a model result:
```python
>>> from cogent3 import get_app, make_aligned_seqs
>>> aln = make_aligned_seqs(
... {
... "Human": "ATGCGGCTCGCGGAGGCCGCGCTCGCGGAG",
... "Gorilla": "ATGCGGCGCGCGGAGGCCGCGCTCGCGGAG",
... "Mouse": "ATGCCCGGCGCCAAGGCAGCGCTGGCGGAG",
... },
... info={"moltype": "dna", "source": "foo"},
... )
>>> app_fit = get_app("model", "GTR")
>>> result = app_fit(aln)
```
You can then check its identifiability with the `phylim` app:
```python
>>> checker = get_app("phylim")
>>> checked = checker(result)
>>> checked.is_identifiable
True
```
The object returned by the `phylim` app wraps all the information about phylogenetic limits and displays as a summary table:
```python
>>> checked
```
<div class="c3table">
<table>
<thead class="head_cell">
<tr>
<th>Source</th>
<th>Model Name</th>
<th>Identifiable</th>
<th>Has Boundary Values</th>
<th>Version</th>
</tr>
</thead>
<tbody>
<tr>
<td>foo</td>
<td>GTR</td>
<td>True</td>
<td>True</td>
<td>2024.12.3.post2</td>
</tr>
</tbody>
</table>
</div>
phylim also offers features such as classifying all the matrices in a model fit or checking whether its parameter estimates sit on a boundary.
<details>
<summary>Label all transition probability matrices in a model fit</summary>
Call `classify_model_psubs` to get the category of every matrix:
```python
>>> from phylim import classify_model_psubs
>>> labelled = classify_model_psubs(result)
>>> labelled
```
<div class="c3table">
<table>
<caption>
<span class="cell_title">Substitution Matrices Categories</span>
</caption>
<thead class="head_cell">
<th>edge name</th><th>matrix category</th>
</thead>
<tbody>
<tr><td><span class="c3col_left">Gorilla</span></td><td><span class="c3col_left">DLC</span></td></tr>
<tr><td><span class="c3col_left">Human</span></td><td><span class="c3col_left">DLC</span></td></tr>
<tr><td><span class="c3col_left">Mouse</span></td><td><span class="c3col_left">DLC</span></td></tr>
</tbody>
</table>
</div>
</details>
<details>
<summary>Check if all parameter fits are within the boundary</summary>
```python
>>> from phylim import check_fit_boundary
>>> violations = check_fit_boundary(result)
>>> violations
BoundsViolation(source='foo', vio=[{'par_name': 'C/T', 'init': np.float64(1.0000000147345554e-06), 'lower': 1e-06, 'upper': 50}, {'par_name': 'A/T', 'init': np.float64(1.0000000625906854e-06), 'lower': 1e-06, 'upper': 50}])
```
</details>
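As a rough illustration of what a boundary check involves, the sketch below flags any parameter whose fitted value coincides with its lower or upper bound. It is an assumption-laden toy, not phylim's code: the helper name `find_boundary_params` and the `rel_tol` tolerance are hypothetical, the dictionary keys simply mirror the `BoundsViolation` output shown above, and the `A/G` entry is a made-up value added for contrast.

```python
import math


def find_boundary_params(params, rel_tol=1e-6):
    """Return the names of parameters whose fitted value sits on a bound.

    Each entry of `params` is a dict with the keys 'par_name', 'init'
    (the fitted value), 'lower' and 'upper'. The tolerance is an assumed
    value chosen for this sketch.
    """
    flagged = []
    for param in params:
        on_lower = math.isclose(param["init"], param["lower"], rel_tol=rel_tol)
        on_upper = math.isclose(param["init"], param["upper"], rel_tol=rel_tol)
        if on_lower or on_upper:
            flagged.append(param["par_name"])
    return flagged


params = [
    # Value copied from the BoundsViolation output above.
    {"par_name": "C/T", "init": 1.0000000147345554e-06, "lower": 1e-06, "upper": 50},
    # Made-up interior value, included only for contrast.
    {"par_name": "A/G", "init": 2.5, "lower": 1e-06, "upper": 50},
]

flagged = find_boundary_params(params)
print(flagged)  # ['C/T']
```

An estimate pinned to a bound usually means the optimiser was stopped by the constraint rather than by the likelihood surface, so such fits deserve a closer look.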
## Check identifiability for piqtree2
phylim provides an app, `phylim_to_lf`, which allows you to build the likelihood function from a piqtree2 output tree.
```python
>>> phylo = get_app("piqtree_phylo", model="GTR")
>>> tree = phylo(aln)
>>> lf_from = get_app("phylim_to_lf")
>>> result = lf_from(tree)
>>> checker = get_app("phylim")
>>> checked = checker(result)
>>> checked.is_identifiable
True
```
## Colour the edges of a phylogenetic tree based on matrix categories
Given a model fit, phylim can visualise the tree with its edges labelled by matrix category.
phylim provides an app, `phylim_style_tree`, which takes an edge-matrix category map and colours the edges:
```python
>>> from phylim import classify_model_psubs
>>> edge_to_cat = classify_model_psubs(result)
>>> tree = result.tree
>>> tree_styler = get_app("phylim_style_tree", edge_to_cat)
>>> tree_styler(tree)
```
<img src="https://figshare.com/ndownloader/files/50903022" alt="tree1" width="400" />
You can also colour edges using a user-defined edge-matrix category map, which works with any tree object!
```python
>>> from cogent3 import make_tree
>>> from phylim import SYMPATHETIC, DLC
>>> tree = make_tree("(A, B, C);")
>>> edge_to_cat = {"A":SYMPATHETIC, "B":SYMPATHETIC, "C":DLC}
>>> tree_styler = get_app("phylim_style_tree", edge_to_cat)
>>> tree_styler(tree)
```
<img src="https://figshare.com/ndownloader/files/50903019" alt="tree1" width="400" />