edgehog

Name	edgehog
Version	0.1.6
home_page	https://github.com/DessimozLab/edgehog
Summary	Inferring ancestral synteny with hierarchical orthologous groups
upload_time	2024-10-10 08:51:41
maintainer	None
docs_url	None
author	Charles Bernard
requires_python	<3.13,>=3.9
license	MIT
keywords	bioinformatics synteny ancestral gene order

# edgeHOG

## Contents

- [Overview](#overview)
- [System Requirements](#system-requirements)
- [Installation Guide](#installation)
- [Demo](#demo)
- [Results](#results)
- [License](./LICENSE.txt)
- [Issues](https://github.com/dessimozlab/edgehog/issues)


## Overview

```edgeHOG``` is a tool to infer the gene order of each ancestor in a species phylogeny. As such, ```edgeHOG``` makes it possible both to explore ancestral microsynteny (local scale) and to reconstruct ancestral chromosomes (global scale).

```edgeHOG``` relies on objects called HOGs (Hierarchical Groups of Orthologs) to model gene lineages and ancestral gene content. Basically, genes that belong to the same HOG across extant genomes are inferred to have descended from the same common ancestral gene in the common ancestor of these genomes. Accordingly, adjacencies between extant genes can be converted to edges between HOGs, which enables parsimonious ancestral gene order inferences.  
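
The conversion from extant adjacencies to HOG edges can be illustrated with a toy sketch. This is not edgeHOG's implementation: the function name `hog_edges`, the gene and HOG identifiers, and the simplification of one contig per genome are all assumptions made for illustration.

```python
from collections import Counter


def hog_edges(genomes, gene_to_hog):
    """Count how many extant adjacencies support each pair of HOGs.

    genomes: dict mapping a genome name to its ordered list of gene ids
             (hypothetical input, one contig per genome for simplicity).
    gene_to_hog: dict mapping an extant gene id to the HOG it belongs to.
    """
    edges = Counter()
    for gene_order in genomes.values():
        # Walk over each pair of neighbouring genes on the contig.
        for left, right in zip(gene_order, gene_order[1:]):
            hog_a = gene_to_hog.get(left)
            hog_b = gene_to_hog.get(right)
            # Skip genes without a HOG and adjacencies within the same HOG.
            if hog_a is None or hog_b is None or hog_a == hog_b:
                continue
            # An undirected edge: the order of the two HOGs does not matter.
            edges[frozenset((hog_a, hog_b))] += 1
    return edges


genomes = {
    "speciesA": ["a1", "a2", "a3"],
    "speciesB": ["b1", "b2", "b3"],
}
gene_to_hog = {
    "a1": "HOG1", "a2": "HOG2", "a3": "HOG3",
    "b1": "HOG1", "b2": "HOG2", "b3": "HOG4",
}
edges = hog_edges(genomes, gene_to_hog)
for pair, weight in edges.items():
    print(sorted(pair), weight)
```

In this toy input, the HOG1/HOG2 adjacency is observed in both genomes and therefore receives the highest support, which is the kind of signal a parsimonious reconstruction can build on.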

## System Requirements


### Hardware Requirements

`edgeHOG` runs on a standard computer; the main constraint is RAM, which scales with the size of the dataset. For large datasets (thousands of genomes), more than 100 GB of RAM is needed.

### Software Requirements

#### OS Requirements

The development version of the package has been tested on *Linux* operating systems, specifically in Ubuntu 22.04 and CentOS 7 environments.

The package itself should be compatible with Windows, Mac and Linux operating systems.

edgeHOG is written in pure Python, so a working Python installation is needed before installing edgeHOG.

#### Installing Python on Ubuntu 22.04

On Ubuntu, Python can be installed with the `apt` package manager using `apt install python3` (root privileges are typically required).

## Installation

### From PyPI using pip
`edgeHOG` can be installed directly from PyPI using pip:

```bash
pip install edgehog
```

To enable HDF5 support and direct reading of genome data from OMA's HDF5 database, enable
the `oma` extra during installation:

```bash
pip install "edgehog[oma]"  # the quotes keep shells such as zsh from interpreting the brackets
```

The dependencies are version-pinned and are installed automatically.

### From sources
```edgeHOG``` was built and tested with Python 3.9 and higher (the published package declares `>=3.9,<3.13`). To set up ```edgeHOG``` on your local machine, please follow the instructions below.

```bash
pip install poetry  # Poetry is used as the build and dependency-resolution system.

git clone https://github.com/dessimozlab/edgehog.git
cd edgehog
poetry install --extras oma
```

## Usage

```
usage: edgehog [-h] [--version] [--output_directory OUTPUT_DIRECTORY] 
               --species_tree SPECIES_TREE  --hogs HOGS 
               [--gff_directory GFF_DIRECTORY] [--hdf5 HDF5] [--orient_edges] 
               [--date_edges] [--phylostratify] [--max_gaps MAX_GAPS] 
               [--include_extant_genes] [--out-format {TSV,HDF5}]

edgehog is a software tool that infers an ancestral synteny graph at each
internal node of an input species phylogenetic tree

optional arguments:
  -h, --help            show this help message and exit
  --version             print version number and exit
  --output_directory OUTPUT_DIRECTORY
                        path to output directory (default is ./edgehog_output)
  --species_tree SPECIES_TREE
                        path to species/genomes phylogenetic tree (newick format)
  --hogs HOGS           path to the HierarchicalGroups.orthoxml file in which HOGs are stored
  --gff_directory GFF_DIRECTORY
                        path to directory with the gffs of extant genomes (each gff file must be named according 
                        to the name of an extant genome / leaf on the species tree)
  --hdf5 HDF5           path to the hdf5 file (alternative to gff_directory to run edgeHOG on the entire OMA 
                        database)
  --orient_edges        whether the transcriptional orientation of edges should be predicted
  --date_edges          whether the age of edges in extant species should be predicted
  --phylostratify       whether the number of edge retention, gain and loss should be analyzed for each node
                        of the species tree
  --max_gaps MAX_GAPS   max_gaps can be seen as the theoritical maximal number of consecutive novel genes that
                        can emerge between two older genes (default = 3), e.g. if max_gaps = 2: the
                        probabilistic A-b-c-D-E-f-g-h-I-J graph will be turn into A-D-E ; I-J in the
                        ancestorwhile if max_gaps = 3: the probabilistic A-b-c-D-E-f-g-h-I-J graph will be
                        turn into A-D-E-I-J in the ancestor
  --include_extant_genes
                        include extant genes in output file for ancestral reconstructions.
  --out-format {TSV,HDF5}
                        define output format. Can be TSV (tab seperated files) or HDF5 (compatible for 
                        integration into oma hdf5)
```
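
The `--max_gaps` example from the help text above can be reproduced with a toy function. This is only a sketch of the behaviour described there, not edgeHOG's code: the name `bridge_gaps` is hypothetical, and the convention that uppercase letters are older genes and lowercase letters are novel genes is taken from the help text's example.

```python
def bridge_gaps(genes, max_gaps):
    """Split an ordered gene list into ancestral paths.

    Uppercase letters stand for older genes (present in the ancestor),
    lowercase letters for novel genes that emerged later. Two older genes
    stay connected only if at most `max_gaps` consecutive novel genes
    separate them.
    """
    paths = []    # finished ancestral paths
    current = []  # path currently being extended
    gap = 0       # consecutive novel genes seen since the last older gene
    for gene in genes:
        if gene.islower():
            gap += 1
            continue
        # An older gene: start a new path if the gap was too large.
        if current and gap > max_gaps:
            paths.append(current)
            current = []
        current.append(gene)
        gap = 0
    if current:
        paths.append(current)
    return paths


order = list("AbcDEfghIJ")
print(bridge_gaps(order, max_gaps=2))  # [['A', 'D', 'E'], ['I', 'J']]
print(bridge_gaps(order, max_gaps=3))  # [['A', 'D', 'E', 'I', 'J']]
```

With `max_gaps = 2` the three novel genes `f-g-h` are too many to bridge, so the path breaks between `E` and `I`; with `max_gaps = 3` the same stretch is bridged.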

## Input data

Three types of input data are needed for ```edgeHOG``` to run:
* a phylogenetic tree of the species/genomes of interest (in Newick format)
* the annotation of each of these genomes (e.g. in the form of a directory of gff files)
* a HierarchicalGroups.orthoxml file corresponding to the extant genomes

Since these input data are intersected, they must comply with the following requirements:
* the prefix of a gff filename must correspond to a species/genome identifier in the phylogenetic tree
* all genome identifiers in the phylogenetic tree must correspond to a genome entry in the HierarchicalGroups.orthoxml file
* for each entry of a given input genome in the HierarchicalGroups.orthoxml file, either the full ```protId``` or the part of the ```protId``` that precedes the first ```' '``` character must match the ```protein_id``` of a CDS in the gff file of this genome
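
The requirements above can be checked before a run with a few lines of Python. The sketch below is not part of edgeHOG: the helper names are hypothetical, and it assumes that the "prefix" of a gff filename is the filename without its `.gff` or `.gff3` extension.

```python
from pathlib import Path


def gff_prefix(filename):
    """Return the gff filename without its extension (assumed convention)."""
    name = Path(filename).name
    for suffix in (".gff3", ".gff"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def missing_gffs(tree_leaves, gff_filenames):
    """Return the tree leaves for which no gff file was found."""
    prefixes = {gff_prefix(name) for name in gff_filenames}
    return set(tree_leaves) - prefixes


def unmatched_prot_ids(prot_ids, gff_protein_ids):
    """Return the orthoxml protIds that match no protein_id in the gff.

    A protId matches either as a whole or through the part that precedes
    its first space character.
    """
    unmatched = []
    for prot_id in prot_ids:
        prefix = prot_id.split(" ", 1)[0]
        if prot_id not in gff_protein_ids and prefix not in gff_protein_ids:
            unmatched.append(prot_id)
    return unmatched


# Hypothetical identifiers, for illustration only.
print(missing_gffs({"HUMAN", "MOUSE"}, ["HUMAN.gff", "RATNO.gff3"]))
print(unmatched_prot_ids(["P1 some description", "P2", "P3 x"], {"P1", "P2"}))
```

Leaf names, gff filenames and identifiers would in practice be collected from the species tree, the gff directory and the orthoxml file; how they are extracted is left out of this sketch.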

### Species tree

A phylogenetic tree of the input genomes/species must be provided in Newick format. If internal nodes are not named, they will be named by concatenating the names of their descendant leaves.

* The species phylogenetic tree used in the OMA database can be downloaded [here](https://omabrowser.org/All/speciestree.nwk)
* The high-quality GTDB archaeal and bacterial species trees, along with metadata can be found [here](https://data.gtdb.ecogenomic.org/releases/latest/)
* To use the tree or a subtree of the NCBI taxonomy database, the ete3 Python package has some [useful built-in functions](http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html)

To prune a tree so that it contains only the phylogeny of your genomes of interest, please refer to the corresponding [ete3 tutorial](http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#pruning-trees).

If you don't have a species tree available for your genomes, you can follow [this tutorial](https://github.com/DessimozLab/f1000_PhylogeneticTree) on how to use OMA Browser and OMA standalone for species tree inference.


### HierarchicalGroups.orthoxml
 
A HierarchicalGroups.orthoxml file of HOGs, defined based on the proteomes and the species tree of the input genomes, is required for genomic comparisons. HOG xml files can be retrieved from several leading orthology databases such as [OrthoDB](https://www.orthodb.org/), [EggNOG](http://eggnog5.embl.de), [HieranoiDB](https://hieranoidb.sbc.su.se/) or [OMA](https://omabrowser.org/oma/home/).

If you don't have a HierarchicalGroups.orthoxml for your genomes, HOGs can be inferred from your input dataset using [OMA_standalone](https://omabrowser.org/standalone/). 

## Demo

### Small test dataset

We provide a small test dataset in the subdirectory `test_data`. `edgeHOG` can be run on this dataset with the following command:

```bash
edgehog --hogs test_data/FastOMA_HOGs.orthoxml \
        --species_tree test_data/species_tree.nwk \
        --gff_directory test_data/gff3/ \
        --date_edges \
        --output_directory test-results
```

See the [test-data specific README](test_data/README.md) for more details on how the dataset was assembled. The [Results section](#results) discusses what the result files contain and how they can be interpreted.

### Large dataset (complete OMA database with thousands of genomes)
`edgehog` can be run on the complete public OMA database using the data available on https://omabrowser.org/oma/current/. For that,
one can download the HOGs ([oma-hogs.orthoXML.gz file](https://omabrowser.org/All/oma-hogs.orthoXML.gz)), the [species tree](https://omabrowser.org/All/speciestree.nwk) and 
the [OMA HDF5 database](https://omabrowser.org/All/OmaServer.h5). 

Note that this dataset is very large (>200 GB). The analysis can be run with the following commands:

```bash
wget https://omabrowser.org/All/oma-hogs.orthoXML.gz
wget https://omabrowser.org/All/speciestree.nwk
wget https://omabrowser.org/All/OmaServer.h5

gunzip oma-hogs.orthoXML.gz
edgehog --hogs oma-hogs.orthoXML --hdf5 OmaServer.h5 --species_tree speciestree.nwk --date_edges --output_directory ./edgehog_results
```



## Results

edgehog writes its result files to the specified output directory (default: `./edgehog_output`; `edgehog_results` in the listing below). Unless `--out-format HDF5` is
specified, the result files are all TSV files, most of them gzip-compressed:

```bash
$> ls edgehog_results
0_bottom-up_synteny_graph_edges.tsv.gz   4_extant_synteny_graph_edges.tsv.gz
0_linearized_synteny_graph_edges.tsv.gz  5_extant_synteny_graph_edges.tsv.gz
0_top-down_synteny_graph_edges.tsv.gz    6_bottom-up_synteny_graph_edges.tsv.gz
1_bottom-up_synteny_graph_edges.tsv.gz   6_linearized_synteny_graph_edges.tsv.gz
1_linearized_synteny_graph_edges.tsv.gz  6_top-down_synteny_graph_edges.tsv.gz
1_top-down_synteny_graph_edges.tsv.gz    7_extant_synteny_graph_edges.tsv.gz
2_bottom-up_synteny_graph_edges.tsv.gz   8_extant_synteny_graph_edges.tsv.gz
2_linearized_synteny_graph_edges.tsv.gz  9_bottom-up_synteny_graph_edges.tsv.gz
2_top-down_synteny_graph_edges.tsv.gz    9_linearized_synteny_graph_edges.tsv.gz
3_bottom-up_synteny_graph_edges.tsv.gz   9_top-down_synteny_graph_edges.tsv.gz
3_linearized_synteny_graph_edges.tsv.gz  genome_dict.tsv
3_top-down_synteny_graph_edges.tsv.gz
```

The `genome_dict.tsv` file provides a mapping from the species tree nodes to the prefixes of the result files:

```TSV
genome_id       nb_descendant_leaves    level_from_root RED_score       name
0       85      0       0.00    Viridiplantae
1       7       1       0.26    Chlorophyta
2       4       2       0.51    Mamiellales
3       2       3       0.75    Ostreococcus
4       0       4       1.00    Ostreococcus tauri
5       0       4       1.00    Ostreococcus lucimarinus (strain CCE9901)
6       2       3       0.75    Micromonas
7       0       4       1.00    Micromonas commoda (strain RCC299 / NOUM17 / CCMP2709)
8       0       4       1.00    Micromonas pusilla (strain CCMP1545)
9       3       2       0.54    core chlorophytes
```

Files starting with `0_` therefore describe the synteny at the level of Viridiplantae (the root node of this dataset). The `nb_descendant_leaves` column shows that this taxonomic level contains 85 species in total.

For each internal taxonomic level (ancestral node), edgehog produces three TSV files: one for the bottom-up phase,
in which extant adjacencies are propagated and collected (`0_bottom-up_synteny_graph_edges.tsv.gz`); one for the
top-down phase, in which non-parsimonious edges are removed (`0_top-down_synteny_graph_edges.tsv.gz`); and one that
contains a linearized form (a subset of the top-down edges) corresponding to our proposed ancestral order
(`0_linearized_synteny_graph_edges.tsv.gz`).
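
The mapping between tree nodes and result files can be automated with a short script. The sketch below is an illustration rather than an edgeHOG API: the function name `expected_files` is hypothetical, it assumes that `genome_dict.tsv` is tab-delimited, and it assumes that rows with `nb_descendant_leaves` equal to 0 are extant genomes with a single `extant` file, as the listing above suggests.

```python
import csv
import io


def expected_files(genome_dict_text):
    """Map each node name to the result files expected for it."""
    reader = csv.DictReader(io.StringIO(genome_dict_text), delimiter="\t")
    files = {}
    for row in reader:
        prefix = row["genome_id"]
        if int(row["nb_descendant_leaves"]) > 0:
            # Internal (ancestral) node: three reconstruction stages.
            kinds = ["bottom-up", "top-down", "linearized"]
        else:
            # Leaf: a single file describing the extant genome.
            kinds = ["extant"]
        files[row["name"]] = [
            f"{prefix}_{kind}_synteny_graph_edges.tsv.gz" for kind in kinds
        ]
    return files


# Two rows copied from the example above, written with explicit tabs.
sample = (
    "genome_id\tnb_descendant_leaves\tlevel_from_root\tRED_score\tname\n"
    "2\t4\t2\t0.51\tMamiellales\n"
    "4\t0\t4\t1.00\tOstreococcus tauri\n"
)
files = expected_files(sample)
for name, paths in files.items():
    print(name, paths)
```

For a real run, the text would be read from the `genome_dict.tsv` file in the output directory instead of the inline `sample` string.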

Those files contain the following columns:
 - gene1: extant/ancestral gene-id of the first gene.
 - gene2: extant/ancestral gene-id of the second gene.    
 - weight: number of extant edges supporting this adjacency
 - contiguous_region: the number of contiguous regions
 - nb_internal_nodes_from_ancestor_with_updated_weight: 
 - supporting_children: The list of children levels that support the adjacency
 - predicted_edge_age_relative_to_root: 
 - predicted_edge_lca: the deepest level where this edge is identified.

Each line in the file corresponds to one ancestral/extant adjacency. Ancestral genes for which no adjacency could be identified are listed as single-column rows.

```TSV
gene1   gene2   weight  contiguous_region       nb_internal_nodes_from_ancestor_with_updated_weight     supporting_children     predicted_edge_age_relative_to_root     predicted_edge_lca
rootHOG_7046    HOG_167759      3.0     0.0     0.0     Micromonas;Ostreococcus 0.49    Mamiellales
rootHOG_7050    HOG_172696      2.0     1.0     0.0     Micromonas;Ostreococcus 0.49    Mamiellales
rootHOG_7081    HOG_169253      3.0     2.0     0.0     Micromonas;Ostreococcus 0.49    Mamiellales
rootHOG_7087    HOG_172341      2.0     3.0     0.0     Micromonas;Ostreococcus 0.49    Mamiellales
...
```
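
A result file of this shape can be loaded with the Python standard library. The following is a minimal sketch, assuming tab-delimited content with the header shown above; the function names and the `min_weight` filter are hypothetical conveniences, not part of edgeHOG.

```python
import csv
import gzip
import io


def read_edges(handle, min_weight=0.0):
    """Parse an edge table; return (edges, singletons).

    edges: list of dicts keyed by the header columns.
    singletons: gene ids listed in single-column rows (no adjacency).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    edges, singletons = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) == 1:
            singletons.append(row[0])
            continue
        record = dict(zip(header, row))
        if float(record["weight"]) >= min_weight:
            edges.append(record)
    return edges, singletons


def read_edges_file(path, min_weight=0.0):
    """Same as read_edges, for a gzip-compressed file on disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_edges(handle, min_weight)


# Small in-memory example built from the rows shown above,
# plus one made-up single-column row (HOG_000001).
sample = (
    "gene1\tgene2\tweight\tcontiguous_region\n"
    "rootHOG_7046\tHOG_167759\t3.0\t0.0\n"
    "rootHOG_7050\tHOG_172696\t2.0\t1.0\n"
    "HOG_000001\n"
)
edges, singletons = read_edges(io.StringIO(sample), min_weight=3.0)
print(len(edges), singletons)
```

Keeping the parser separate from the file opening makes it possible to test the logic on in-memory text, as done here, and to reuse it for uncompressed files.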

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/DessimozLab/edgehog",
    "name": "edgehog",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": null,
    "keywords": "bioinformatics, synteny, ancestral gene order",
    "author": "Charles Bernard",
    "author_email": "charles.bernard@unil.ch",
    "download_url": "https://files.pythonhosted.org/packages/69/73/f5660888e824fe99a0717fd3bd561e08bc591155e857aeb07d2a64eedb29/edgehog-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Infering ancestral synteny with hierarchical orthologous groups",
    "version": "0.1.6",
    "project_urls": {
        "Homepage": "https://github.com/DessimozLab/edgehog",
        "Repository": "https://github.com/DessimozLab/edgehog"
    },
    "split_keywords": [
        "bioinformatics",
        " synteny",
        " ancestral gene order"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bed65e4e772fce1737423a71987b1112844f7ddd7ff65917fe2c8e3cf7310b11",
                "md5": "21c11777415f280f9ccc871d318501e4",
                "sha256": "5f493cb3dc9f31739fd998429a7368ec5995daef47e926d1d383ee50fe7bd8bc"
            },
            "downloads": -1,
            "filename": "edgehog-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "21c11777415f280f9ccc871d318501e4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.9",
            "size": 29405,
            "upload_time": "2024-10-10T08:51:39",
            "upload_time_iso_8601": "2024-10-10T08:51:39.360031Z",
            "url": "https://files.pythonhosted.org/packages/be/d6/5e4e772fce1737423a71987b1112844f7ddd7ff65917fe2c8e3cf7310b11/edgehog-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6973f5660888e824fe99a0717fd3bd561e08bc591155e857aeb07d2a64eedb29",
                "md5": "f198f9c337672ed2296b9a5be2117787",
                "sha256": "f925a386484cbe4a94c459e706fa42d8c3710b3469472e83cb6df2957e44de58"
            },
            "downloads": -1,
            "filename": "edgehog-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "f198f9c337672ed2296b9a5be2117787",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.9",
            "size": 28875,
            "upload_time": "2024-10-10T08:51:41",
            "upload_time_iso_8601": "2024-10-10T08:51:41.068661Z",
            "url": "https://files.pythonhosted.org/packages/69/73/f5660888e824fe99a0717fd3bd561e08bc591155e857aeb07d2a64eedb29/edgehog-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-10 08:51:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DessimozLab",
    "github_project": "edgehog",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "edgehog"
}
