edgehog


* Name: edgehog
* Version: 0.1.4
* Home page: https://github.com/DessimozLab/edgehog
* Summary: Inferring ancestral synteny with hierarchical orthologous groups
* Upload time: 2024-05-03 11:31:11
* Author: Charles Bernard
* Requires Python: <3.13,>=3.10
* License: MIT
* Keywords: bioinformatics, synteny, ancestral gene order

# edgeHOG

```edgeHOG``` is a tool that infers the gene order of each ancestor in a species phylogeny. It can therefore be used both to explore ancestral microsynteny (local scale) and to reconstruct ancestral chromosomes (global scale).

```edgeHOG``` relies on objects called HOGs (Hierarchical Orthologous Groups) to model gene lineages and ancestral gene content. Genes that belong to the same HOG across extant genomes are inferred to have descended from a single ancestral gene in the common ancestor of these genomes. Accordingly, adjacencies between extant genes can be converted to edges between HOGs, which makes it possible to infer ancestral gene order parsimoniously.
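The conversion of gene adjacencies into HOG edges can be illustrated with a toy sketch. This is not the ```edgeHOG``` implementation: the genomes, HOG labels and the `hog_edges` helper are hypothetical, and the sketch ignores the species tree entirely. It only shows how adjacencies observed in extant genomes can be counted as support for edges between HOGs.

```python
from collections import Counter

# Toy data (hypothetical): each extant genome is an ordered list of genes,
# and each gene is represented by the label of the HOG it belongs to.
genomes = {
    "speciesA": ["HOG1", "HOG2", "HOG3", "HOG4"],
    "speciesB": ["HOG1", "HOG2", "HOG4", "HOG3"],
    "speciesC": ["HOG2", "HOG1", "HOG3", "HOG4"],
}


def hog_edges(genomes):
    """Count, for each unordered pair of HOGs, how many genomes have them adjacent."""
    support = Counter()
    for gene_order in genomes.values():
        # Record each adjacency at most once per genome.
        seen = set()
        for left, right in zip(gene_order, gene_order[1:]):
            if left != right:
                seen.add(frozenset((left, right)))
        support.update(seen)
    return support


edges = hog_edges(genomes)
for pair, count in sorted(edges.items(), key=lambda item: -item[1]):
    print(sorted(pair), count)
```

In this toy example, the edges supported by all three genomes (HOG1-HOG2 and HOG3-HOG4) would be the most plausible candidates for adjacencies in their common ancestor.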

## Installation

### From PyPI using pip
`edgeHOG` can be installed directly from PyPI using pip:

```bash
pip install edgehog
```

### From sources
According to the package metadata, ```edgeHOG``` requires Python >=3.10 and <3.13. To set up ```edgeHOG``` from sources on your local machine, make sure [Poetry](https://python-poetry.org/) is installed, then run:

```bash
git clone https://github.com/dessimozlab/edgehog.git
cd edgehog
poetry install
```

## Usage

```
usage: edgehog [-h] [--version] [--output_directory OUTPUT_DIRECTORY]
               --species_tree SPECIES_TREE --hogs HOGS
               [--gff_directory GFF_DIRECTORY] [--hdf5 HDF5] [--date_edges]
               [--max_gaps MAX_GAPS]

edgehog is a software tool that infers an ancestral synteny graph at each
internal node of an input species phylogenetic tree

optional arguments:
  -h, --help            show this help message and exit
  --version             print version number and exit
  --output_directory OUTPUT_DIRECTORY
                        path to output directory (default is current
                        directory)
  --species_tree SPECIES_TREE
                        path to species/genomes phylogenetic tree (newick
                        format)
  --hogs HOGS           path to the HierarchicalGroups.orthoxml file in which
                        HOGs are stored
  --gff_directory GFF_DIRECTORY
                        path to directory with the gffs of extant genomes
                        (each gff file must be named according to the name of
                        an extant genome / leaf on the species tree)
  --hdf5 HDF5           path to the hdf5 file (alternative to gff_directory to
                        run edgeHOG on the OMA database)
  --date_edges          whether the age of edges in extant species should be
                        predicted

```
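As an illustration of how the options above fit together, the following sketch assembles a typical command line from Python. The flags are taken from the help text above; the file and directory names (`species_tree.nwk`, `gff`, `edgehog_out`) and the `build_edgehog_command` helper are hypothetical placeholders.

```python
import shlex


def build_edgehog_command(species_tree, hogs, gff_directory, output_directory="."):
    """Assemble an edgehog command line as an argument list (nothing is executed)."""
    return [
        "edgehog",
        "--species_tree", species_tree,
        "--hogs", hogs,
        "--gff_directory", gff_directory,
        "--output_directory", output_directory,
    ]


# Hypothetical input paths, for illustration only.
cmd = build_edgehog_command(
    species_tree="species_tree.nwk",
    hogs="HierarchicalGroups.orthoxml",
    gff_directory="gff",
    output_directory="edgehog_out",
)
print(shlex.join(cmd))
```

Once ```edgeHOG``` is installed and the input files exist, the same list can be passed to `subprocess.run(cmd, check=True)`, or the printed command can be pasted into a shell.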

## Input data

Three types of input data are needed for ```edgeHOG``` to run:
* a phylogenetic tree of species/genomes of interest (in newick format)
* the annotation of each of these genomes (e.g. in the form of a directory of gff files)
* a HierarchicalGroups.orthoxml file corresponding to the extant genomes

Since these input data are intersected, they must comply with the following requirements:
* the prefix of a gff filename must correspond to a species/genome identifier in the phylogenetic tree
* all genome identifiers in the phylogenetic tree must correspond to a genome entry in the HierarchicalGroups.orthoxml file
* for each entry of a given input genome in the HierarchicalGroups.orthoxml file, the ```protId```, or the part of the ```protId``` that precedes the first ```' '``` (space) character, must match the ```protein_id``` of a CDS in the gff file of this genome
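The requirements above can be checked before running ```edgeHOG``` with a small script. The sketch below is not part of ```edgeHOG```: the `check_inputs` helper and the toy data are hypothetical, it assumes the standard orthoXML namespace `http://orthoXML.org/2011/`, and it assumes that the "prefix" of a gff filename is the filename without its final extension.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

ORTHOXML_NS = "{http://orthoXML.org/2011/}"  # assumed orthoXML namespace


def orthoxml_prot_ids(orthoxml_text):
    """Map each species name in the orthoxml to the list of its protId values."""
    root = ET.fromstring(orthoxml_text.strip())
    return {
        sp.get("name"): [g.get("protId") for g in sp.iter(ORTHOXML_NS + "gene")]
        for sp in root.iter(ORTHOXML_NS + "species")
    }


def check_inputs(tree_leaves, gff_filenames, orthoxml_text, gff_protein_ids=None):
    """Return a list of human-readable problems (empty if inputs look consistent)."""
    problems = []
    leaves = set(tree_leaves)
    species = orthoxml_prot_ids(orthoxml_text)
    # 1. Each gff filename prefix must be a leaf of the species tree.
    for name in gff_filenames:
        if Path(name).stem not in leaves:
            problems.append(f"gff file {name!r} matches no leaf of the species tree")
    # 2. Each leaf must have a species entry in the orthoxml.
    for leaf in sorted(leaves - set(species)):
        problems.append(f"tree leaf {leaf!r} has no species entry in the orthoxml")
    # 3. Each protId (or its part before the first space) must be a CDS protein_id.
    for genome, cds_ids in (gff_protein_ids or {}).items():
        for prot_id in species.get(genome, []):
            if prot_id not in cds_ids and prot_id.split(" ")[0] not in cds_ids:
                problems.append(f"protId {prot_id!r} of {genome!r} not found in its gff")
    return problems


# Toy orthoxml content (hypothetical).
EXAMPLE = """
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="toy" originVersion="1">
  <species name="speciesA" NCBITaxId="1">
    <database name="toy" version="1">
      <genes>
        <gene id="1" protId="protA1 some description"/>
        <gene id="2" protId="protA2"/>
      </genes>
    </database>
  </species>
  <species name="speciesB" NCBITaxId="2">
    <database name="toy" version="1">
      <genes><gene id="3" protId="protB1"/></genes>
    </database>
  </species>
</orthoXML>
"""

problems = check_inputs(
    tree_leaves=["speciesA", "speciesB", "speciesC"],
    gff_filenames=["speciesA.gff", "speciesB.gff", "speciesD.gff"],
    orthoxml_text=EXAMPLE,
    gff_protein_ids={"speciesA": {"protA1", "protA2"}, "speciesB": {"protB9"}},
)
for problem in problems:
    print(problem)
```

In practice, `tree_leaves` would come from the newick file and `gff_protein_ids` from the ```protein_id``` attributes of the CDS features in each gff file; both are passed in directly here to keep the sketch free of third-party dependencies.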

### Species tree

A phylogenetic tree of the input genomes/species must be provided in the newick format. If internal nodes are not named, they will be named by concatenating the names of their descendant leaves.
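The naming of unnamed internal nodes can be pictured with the following sketch. It is only an illustration of the idea: the tree is represented as nested tuples rather than newick, and the `"_"` separator and the order of the leaves are assumptions, since the exact convention used by ```edgeHOG``` is not specified here.

```python
def leaf_names(node):
    """Return the leaf names under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaf_names(child)]


def internal_node_names(node, separator="_"):
    """Name every internal node after its descendant leaves, in post-order.

    The separator is an assumption made for this illustration.
    """
    if isinstance(node, str):
        return []
    names = []
    for child in node:
        names.extend(internal_node_names(child, separator))
    names.append(separator.join(leaf_names(node)))
    return names


# Toy tree equivalent to the newick string ((speciesA,speciesB),speciesC);
tree = (("speciesA", "speciesB"), "speciesC")
names = internal_node_names(tree)
print(names)
```

Naming the internal nodes explicitly in the newick file avoids long concatenated names when the tree has many leaves.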

* The species phylogenetic tree used in the OMA database can be downloaded [here](https://omabrowser.org/All/speciestree.nwk)
* The high-quality GTDB archaeal and bacterial species trees, along with metadata can be found [here](https://data.gtdb.ecogenomic.org/releases/latest/)
* To use the tree or a subtree of the NCBI taxonomy database, the ete3 Python package has some [useful built-in functions](http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html)

To prune a tree so that it contains only the phylogeny of your genomes of interest, please refer to the corresponding [ete3 tutorial](http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#pruning-trees).

If you don't have a species tree available for your genomes, you can follow [this tutorial](https://github.com/DessimozLab/f1000_PhylogeneticTree) on how to use OMA Browser and OMA standalone for species tree inference.


### HierarchicalGroups.orthoxml
 
A HierarchicalGroups.orthoxml file containing HOGs defined from the proteomes and the species tree of the input genomes is required for genomic comparisons. HOG orthoxml files can be retrieved from several leading orthology databases such as [OrthoDB](https://www.orthodb.org/), [EggNOG](http://eggnog5.embl.de), [HieranoiDB](https://hieranoidb.sbc.su.se/) or [OMA](https://omabrowser.org/oma/home/).

If you don't have a HierarchicalGroups.orthoxml file for your genomes, HOGs can be inferred from your input dataset using [OMA_standalone](https://omabrowser.org/standalone/).


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/DessimozLab/edgehog",
    "name": "edgehog",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, synteny, ancestral gene order",
    "author": "Charles Bernard",
    "author_email": "charles.bernard@unil.ch",
    "download_url": "https://files.pythonhosted.org/packages/d5/82/9096d7376fca9037a1d26746deaabe5f65d7aab433cc4715d1427e7c24e2/edgehog-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Infering ancestral synteny with hierarchical orthologous groups",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/DessimozLab/edgehog",
        "Repository": "https://github.com/DessimozLab/edgehog"
    },
    "split_keywords": [
        "bioinformatics",
        " synteny",
        " ancestral gene order"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "19297b2df216f95db782193bb721fb1f156bc3856b263804f1aff2e4bd7f8616",
                "md5": "aefa2ed6672e68e028be636d37e7346a",
                "sha256": "6975b59b7f4e4b92b55b844e4169eeeb59866edac6d5b5dc4d000660aa7c065c"
            },
            "downloads": -1,
            "filename": "edgehog-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aefa2ed6672e68e028be636d37e7346a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 23255,
            "upload_time": "2024-05-03T11:31:10",
            "upload_time_iso_8601": "2024-05-03T11:31:10.117852Z",
            "url": "https://files.pythonhosted.org/packages/19/29/7b2df216f95db782193bb721fb1f156bc3856b263804f1aff2e4bd7f8616/edgehog-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d5829096d7376fca9037a1d26746deaabe5f65d7aab433cc4715d1427e7c24e2",
                "md5": "52d1ffc730b9ba5bc2927ff7e22f8bfe",
                "sha256": "a557ae34cadd1fb273d5a25406550413549a038602df620fa59812b3b3a95d11"
            },
            "downloads": -1,
            "filename": "edgehog-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "52d1ffc730b9ba5bc2927ff7e22f8bfe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 20900,
            "upload_time": "2024-05-03T11:31:11",
            "upload_time_iso_8601": "2024-05-03T11:31:11.777064Z",
            "url": "https://files.pythonhosted.org/packages/d5/82/9096d7376fca9037a1d26746deaabe5f65d7aab433cc4715d1427e7c24e2/edgehog-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-03 11:31:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DessimozLab",
    "github_project": "edgehog",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "edgehog"
}
        