kegg-pathways-completeness

- Version: 1.0.5
- Summary: The tool counts completeness of each KEGG pathway for protein sequences.
- Author: Ekaterina Sakharova <kates@ebi.ac.uk>
- Upload time: 2024-07-08 10:54:38
- Requires Python: >=3.10
- License: Apache Software License 2.0
- Keywords: bioinformatics, pipelines, metagenomics, kegg

# kegg-pathways-completeness tool

This tool computes the completeness of each [KEGG pathway module](https://www.genome.jp/kegg/module.html) for a given set of [KEGG orthologues (KOs)](https://www.genome.jp/kegg/ko.html) based on their presence or absence. The current version of this tool includes 482 KEGG modules (updated 02/07/2024).

Please read the **Theory** section at the bottom of this README for a detailed explanation. 

#### Input example
- a [per-contig annotation](example/example_hmmscan_annotation.txt) with KOs (ideally produced by hmmscan; see the [instructions](src/README.md)), or
- a [list](example/example_list_kos.txt) of KOs.

#### Output example

- `*.summary.kegg_pathways.tsv` ([example](example/example_hmmscan.summary.kegg_pathways.tsv)) contains the completeness of each module pathway, calculated over all KOs in the input file.
- `*.summary.kegg_contigs.tsv` ([example](example/example_hmmscan.summary.kegg_contigs.tsv)) contains the completeness of each module pathway calculated per contig (the first column holds the contig name); it is available when a per-contig annotation is provided with `-i`.

Optional:
- `pathways_plots/` ([example](example/pathways_plots)): folder containing PNG representations and graphs generated with the `--plot-pathways` argument.
- `with_weights.*.tsv` ([example](example/with_weights.summary.kegg.summary.kegg_contigs.tsv)): output generated with the `--include-weights` argument, in which each KO is followed by its weight in brackets.

More examples of the different output files can be found [here](tests/fixtures/give_pathways/output).
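The summary tables are plain tab-separated files, so they can be filtered with the Python standard library. The sketch below keeps the modules whose completeness reaches a threshold. It assumes a header row with columns named `module_accession` and `completeness`; these names are assumptions for illustration, so check them against the header of your own output (in the per-contig table the contig name is in the first column).

```python
import csv
import io
from typing import TextIO


def complete_modules(
    handle: TextIO,
    threshold: float = 100.0,
    module_col: str = "module_accession",  # assumed column name
    completeness_col: str = "completeness",  # assumed column name
) -> list[tuple[str, float]]:
    """Return (module, completeness) pairs with completeness >= threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        value = float(row[completeness_col])
        if value >= threshold:
            hits.append((row[module_col], value))
    return hits


# Illustrative in-memory table; the values are made up, not real tool output.
demo = io.StringIO(
    "module_accession\tcompleteness\n"
    "M00001\t100.0\n"
    "M00002\t66.67\n"
)
print(complete_modules(demo, threshold=50.0))
```

For a real file, open it with `open(path, newline="")` and pass the handle; different `module_col`/`completeness_col` values adapt the sketch if the real column names differ.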

## Installation
This tool is published on PyPI and Bioconda:

#### Install with pip
```commandline
pip install kegg-pathways-completeness
```

#### Install with bioconda
Follow [bioconda instructions](https://bioconda.github.io/recipes/kegg-pathways-completeness/README.html#package-package%20'kegg-pathways-completeness')


#### Install from source using venv/conda env (not the best option)
```commandline
# the package requires Python >= 3.10
conda create --name kegg-env "python>=3.10"
conda activate kegg-env

pip3 install -r requirements.txt
```


## How to run

#### Quick start
```commandline
# for a list of KOs
give_pathways -l {INPUT_LIST}

# per contig annotation with KOs
give_pathways -i {INPUT_FILE}
```
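If you drive the tool from a Python workflow, the command line can be assembled from the flags documented in this README (`-i`, `-l`, `-o`, `--include-weights`, `--plot-pathways`). This is a minimal sketch: the helper names are hypothetical, and requiring exactly one of `-i`/`-l` is an assumption of the sketch rather than documented tool behaviour.

```python
import shutil
import subprocess
from typing import Optional


def build_give_pathways_cmd(
    input_table: Optional[str] = None,
    input_list: Optional[str] = None,
    outname: Optional[str] = None,
    include_weights: bool = False,
    plot_pathways: bool = False,
) -> list[str]:
    """Assemble a give_pathways command line from the flags in this README."""
    # Assumption of this sketch: exactly one input mode per run.
    if (input_table is None) == (input_list is None):
        raise ValueError("provide exactly one of input_table or input_list")
    cmd = ["give_pathways"]
    if input_table is not None:
        cmd += ["-i", input_table]
    else:
        cmd += ["-l", input_list]
    if outname:
        cmd += ["-o", outname]
    if include_weights:
        cmd.append("--include-weights")
    if plot_pathways:
        cmd.append("--plot-pathways")
    return cmd


def run_give_pathways(cmd: list[str]) -> None:
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    subprocess.run(cmd, check=True)


# Build (without running) a command for a KO list input.
print(" ".join(build_give_pathways_cmd(input_list="kos.txt", outname="my_sample")))
```

Keeping command construction separate from execution makes it easy to log or test the exact command before running it.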

#### Run with test examples
```commandline
# hmmtable as input
python3 kegg_pathways_completeness/bin/give_pathways.py \
  -i 'tests/fixtures/give_pathways/test_pathway.txt' \
  -o test_pathway

# KOs list as input
python3 kegg_pathways_completeness/bin/give_pathways.py \
  -l 'tests/fixtures/give_pathways/test_kos.txt' \
  -o test_list_kos
```

#### Run using docker 
Results are written to the `results` folder; the final annotated pathways are generated in `results/pathways`.
```commandline
export INPUT="path to hmm-result table"
docker \
    run \
    -i \
    --workdir=/results \
    --volume=`pwd`/results:/results:rw \
    --volume=${INPUT}:/files/input_table.tsv:ro \
    quay.io/microbiome-informatics/kegg-completeness:v1.1 \
    /tools/run_pathways.sh \
    -i /files/input_table.tsv
```


## Input arguments description

**Required arguments:** 

_input file:_

One of the following input files is required:
- input table (`-i`/`--input`): an hmmsearch table ([example](tests/fixtures/give_pathways/test_pathway.txt)) produced by running the KEGG profiles database on the annotated sequences (preferred). If you don't have this table, follow these [instructions](src/README.md) to generate it.
- KO list file (`-l`/`--input-list`): a comma-separated file listing KOs ([example](tests/fixtures/give_pathways/test_kos.txt)).
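A KO list for `-l` can be produced from any collection of identifiers. This sketch assumes the file is a single comma-separated line of KO IDs, which is our reading of "comma-separated file"; compare with the linked example before relying on it. The helper names are hypothetical, and the check for `K` followed by five digits reflects the usual KO identifier format.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterable

# KO identifiers are "K" followed by five digits, e.g. K00942.
KO_PATTERN = re.compile(r"^K\d{5}$")


def write_ko_list(kos: Iterable[str], path: Path) -> list[str]:
    """Write unique, validated KOs to `path` as one comma-separated line."""
    unique = sorted({ko.strip() for ko in kos if ko.strip()})
    bad = [ko for ko in unique if not KO_PATTERN.match(ko)]
    if bad:
        raise ValueError(f"not KO identifiers: {bad}")
    path.write_text(",".join(unique) + "\n")
    return unique


def read_ko_list(path: Path) -> list[str]:
    """Read KOs back, accepting commas and newlines as separators."""
    text = path.read_text()
    return [tok.strip() for tok in re.split(r"[,\n]", text) if tok.strip()]


with tempfile.TemporaryDirectory() as tmp:
    ko_file = Path(tmp) / "kos.txt"
    write_ko_list(["K00942", "K00844", "K00942"], ko_file)
    print(ko_file.read_text().strip())  # K00844,K00942
```

Sorting and de-duplicating is a choice of this sketch that keeps the file stable across runs; it is not required by the tool as far as this README states.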

**Optional arguments:**

- output prefix (`-o`/`--outname`): prefix for the output tables (`-o test_kos` in this [example](tests/fixtures/give_pathways/output/test_kos.summary.kegg_contigs.tsv))
- include weight information in the output files (`-w`/`--include-weights`): the output table will contain the weight of each KO edge in the pathway graph; for example, K00942(0.25) means that the KO has an importance of 0.25 in the given pathway. Example [output](tests/fixtures/give_pathways/output/test_weights.summary.kegg_pathways.tsv)
- plot the present KOs on the pathways (`-p`/`--plot-pathways`): generates a PNG containing a schematic representation of the pathway. Present KOs are marked with red edges. Example: [M00002](tests/fixtures/give_pathways/output/pathways_plots/M00002.png)
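Cells of a `--include-weights` table contain entries such as `K00942(0.25)`. The sketch below turns such a cell into a `{KO: weight}` dictionary. It assumes only the `KO(weight)` form shown above; the separator between entries (a comma in the usage line) is an assumption, and the regular expression simply ignores it.

```python
import re

# An entry is a KO id immediately followed by its weight in brackets.
WEIGHTED_KO = re.compile(r"(K\d+)\(([0-9]*\.?[0-9]+)\)")


def parse_weighted_kos(cell: str) -> dict[str, float]:
    """Extract {KO: weight} pairs from one cell of a weighted output table."""
    return {ko: float(weight) for ko, weight in WEIGHTED_KO.findall(cell)}


print(parse_weighted_kos("K00942(0.25),K00844(1.0)"))
# {'K00942': 0.25, 'K00844': 1.0}
```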


_pathways data: modules information and graphs_ 

This repository contains a set of pre-generated files. Module information files can be found in **[pathways_data](kegg_pathways_completeness/pathways_data)**.
The repository also contains the module pathways pre-parsed into graphs; the graphs were generated by parsing all pathways with the NetworkX library. The graph of every module is available in .png format in the [png folder](kegg_pathways_completeness/graphs/png) and in .dot format in the [dots folder](kegg_pathways_completeness/graphs/dots). The .png images make it easy to check each pathway and the weight of each KO.

**There is no need to regenerate these files in order to run the tool.**
The [graph regeneration instructions](kegg_pathways_completeness/graphs/README.md) and [module pathway info regeneration commands](kegg_pathways_completeness/pathways_data/README.md) are provided for updating the data and for understanding the process.

_modules information:_

- list of KEGG modules in KOs notation (`-a`/`--pathways`) (latest [all_pathways.txt](kegg_pathways_completeness%2Fpathways_data%2Fall_pathways.txt))
- list of classes of KEGG modules (`-c`/`--classes`) (latest [all_pathways_class.txt](kegg_pathways_completeness%2Fpathways_data%2Fall_pathways_class.txt))
- list of names of KEGG modules (`-n`/`--names`) (latest [all_pathways_names.txt](kegg_pathways_completeness%2Fpathways_data%2Fall_pathways_names.txt))

_graphs:_

- graphs constructed from each module (`-g`/`--graphs`) (latest [graphs.pkl](kegg_pathways_completeness%2Fgraphs%2Fgraphs.pkl))


### Plot pathway completeness

**NOTE**: please make sure you have [**graphviz**](https://graphviz.org/) installed

You can also run the plotting script separately:
```commandline
plot_completeness_graphs.py -i output_with_pathways_completeness
```

#### Example

![M00050.png](tests/fixtures/give_pathways/output/pathways_plots/M00050.png)

More examples generated from the test data are available [here](tests/fixtures/give_pathways/output/pathways_plots).


## Theory
#### Pathways to graphs 
KEGG represents each pathway as an expression of KOs, for example **A ((B,C) D,E) (A+F)**, where:
- A, B, C, D, E, F are KOs
- **space** == AND
- **comma** == OR
- **plus** == essential component
- **minus** == optional component
- **minus minus** == missing optional component (replaced by K0000 with weight 0; [example](kegg_pathways_completeness/graphs/png/M00014.png))
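To make the operator semantics concrete, the sketch below checks whether a set of KOs fully satisfies a module definition, treating space as AND, comma as OR, `+` as a required complex component, and `-` components and `--` as not required. The precedence (comma loosest, then space, then `+`/`-`) is an assumption inferred from reading `(B,C) D,E` as `((B or C) and D) or E`. This is only a yes/no check, not the graded completeness computed by the tool, and all names are hypothetical.

```python
import re

# KO ids, the "--" placeholder, and single-character operators.
TOKEN = re.compile(r"K\d+|--|[()+,\- ]")


def tokenize(definition: str) -> list[str]:
    text = re.sub(r"\s+", " ", definition.strip())
    text = re.sub(r" ?, ?", ",", text)  # spaces next to commas are not ANDs
    text = text.replace("( ", "(").replace(" )", ")")
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"unexpected characters in {definition!r}")
    return tokens


class Evaluator:
    """Recursive descent: ',' (OR) < ' ' (AND) < '+'/'-' (complex)."""

    def __init__(self, definition: str, present: set[str]):
        self.tokens = tokenize(definition)
        self.pos = 0
        self.present = present

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def alternatives(self) -> bool:
        result = self.steps()
        while self.peek() == ",":
            self.take()
            right = self.steps()  # always parse, so tokens are consumed
            result = result or right
        return result

    def steps(self) -> bool:
        result = self.group()
        while self.peek() == " ":
            self.take()
            right = self.group()
            result = result and right
        return result

    def group(self) -> bool:
        result = self.unit()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.unit()
            if op == "+":  # essential component
                result = result and right
            # '-' marks an optional component: it never breaks the complex
        return result

    def unit(self) -> bool:
        tok = self.take()
        if tok == "(":
            result = self.alternatives()
            if self.take() != ")":
                raise ValueError("unbalanced parentheses")
            return result
        if tok == "--":
            return True  # missing optional component: nothing to require
        if tok is not None and tok.startswith("K"):
            return tok in self.present
        raise ValueError(f"unexpected token {tok!r}")


def is_satisfied(definition: str, present: set[str]) -> bool:
    ev = Evaluator(definition, present)
    result = ev.alternatives()
    if ev.peek() is not None:
        raise ValueError("trailing tokens")
    return result


# Placeholders: A=K00001, B=K00002, C=K00003, D=K00004, E=K00005, F=K00006
definition = "K00001 ((K00002,K00003) K00004,K00005) (K00001+K00006)"
print(is_satisfied(definition, {"K00001", "K00005", "K00006"}))  # True
print(is_satisfied(definition, {"K00001", "K00002", "K00004"}))  # False: F missing
```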

Each expression was [converted](kegg_pathways_completeness/bin/make_graphs/make_graphs.py) into a directed graph using NetworkX. The first node is node 0 and the last one is node 1. Each edge corresponds to a KO. 

![ex1.png](src%2Fimg%2Fex1.png)
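The conversion can be pictured as series-parallel composition: AND chains sub-graphs through new intermediate nodes, while OR places branches in parallel between the same two nodes. The sketch below follows that idea with a plain edge list instead of NetworkX (to stay dependency-free), keeps node 0 as the start and node 1 as the end, and handles only spaces, commas, parentheses and `+` (treated like a space). It omits weights and optional components, so it illustrates the idea rather than reproducing the logic of `make_graphs.py`.

```python
import re
from itertools import count


def parse(definition: str):
    """Parse into nested ('or', [...]) / ('and', [...]) tuples with KO leaves."""
    tokens = re.findall(r"K\d+|[(),+ ]", definition.strip())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def alternatives():  # ',' has the lowest precedence
        items = [steps()]
        while peek() == ",":
            take()
            items.append(steps())
        return items[0] if len(items) == 1 else ("or", items)

    def steps():  # ' ' and '+' both mean "all of these" in this sketch
        items = [atom()]
        while peek() in (" ", "+"):
            take()
            items.append(atom())
        return items[0] if len(items) == 1 else ("and", items)

    def atom():
        tok = take()
        if tok == "(":
            inner = alternatives()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if tok is None or not tok.startswith("K"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    tree = alternatives()
    if peek() is not None:
        raise ValueError("trailing tokens")
    return tree


def to_edges(definition: str) -> list[tuple[int, int, str]]:
    """Edges (u, v, KO) of a graph running from node 0 (start) to node 1 (end)."""
    fresh = count(2)  # 0 and 1 are reserved for start and end
    edges: list[tuple[int, int, str]] = []

    def build(tree, u, v):
        if isinstance(tree, str):
            edges.append((u, v, tree))
        elif tree[0] == "or":  # parallel branches share both endpoints
            for item in tree[1]:
                build(item, u, v)
        else:  # series: chain the items through new intermediate nodes
            items = tree[1]
            nodes = [u] + [next(fresh) for _ in items[:-1]] + [v]
            for item, a, b in zip(items, nodes, nodes[1:]):
                build(item, a, b)

    build(parse(definition), 0, 1)
    return edges


print(to_edges("K00001 (K00002,K00003)"))
# [(0, 2, 'K00001'), (2, 1, 'K00002'), (2, 1, 'K00003')]
```

Because parallel KOs in this sketch share both endpoints, a NetworkX version of it would need a `MultiDiGraph` rather than a `DiGraph` to keep every edge.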

#### Completeness
To compute pathway completeness, each edge in the graph is weighted. The default weight of each edge is 0.

Given a set of predicted KOs, if a KO is present in the pathway, the corresponding edge is assigned a weight of 1 (or 0 if the edge is optional, or another value if the edge is joined by +). The [script](kegg_pathways_completeness/bin/give_pathways.py) then searches for the most relevant path from node 0 to node 1 according to its `graph_weight`. `max_graph_weight` is then calculated under the assumption that all KOs are present.

$$
\text{completeness} = \frac{\text{graph\_weight}}{\text{max\_graph\_weight}} \times 100\%
$$

![ex2.png](src%2Fimg%2Fex2.png)
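As a worked example of the formula: for a linear pathway A B C D with equal edge weights in which A, B and C are present, completeness = 3/4 * 100% = 75%. The sketch below generalizes this by enumerating every path from node 0 to node 1 and keeping the best ratio of present weight to total weight. Taking the best per-path ratio is our assumption about what "most relevant path" means, and exhaustive enumeration is exponential in general, so the tool's actual search may differ. Edges are `(u, v, ko, weight)` tuples on an acyclic graph; the function name is hypothetical.

```python
from collections import defaultdict


def best_completeness(edges, present):
    """Highest per-path completeness (%) over all paths from node 0 to node 1.

    A path's graph_weight sums the weights of its present KOs; its
    max_graph_weight sums the weights of all its KOs.
    """
    out = defaultdict(list)
    for u, v, ko, weight in edges:
        out[u].append((v, ko, weight))

    best = 0.0

    def walk(node, graph_weight, max_graph_weight):
        nonlocal best
        if node == 1:
            if max_graph_weight > 0:
                best = max(best, graph_weight / max_graph_weight * 100)
            return
        for v, ko, weight in out[node]:
            gained = weight if ko in present else 0.0
            walk(v, graph_weight + gained, max_graph_weight + weight)

    walk(0, 0.0, 0.0)
    return best


# Linear pathway A B C D with unit weights; A, B and C are present.
linear = [(0, 2, "K00001", 1.0), (2, 3, "K00002", 1.0),
          (3, 4, "K00003", 1.0), (4, 1, "K00004", 1.0)]
print(best_completeness(linear, {"K00001", "K00002", "K00003"}))  # 75.0
```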

            
