# KEGG Pathway Profiler
`KEGG Pathway Profiler` is a pathway profiling tool that traverses metabolic pathway graphs and identifies the most complete path through each pathway, given an evaluation set of KEGG orthologs (KOs). It can be used as a Python library or through CLI executables. This package is a reimplementation of [kegg-pathways-completeness-tool](https://github.com/EBI-Metagenomics/kegg-pathways-completeness-tool), drawing on its base code and theory.
For any publications or usage, please cite the original implementation and credit the lead developer (see [Acknowledgements](#acknowledgements) below).
## Installation:
```
pip install kegg_pathway_profiler
```
## Dependencies:
```
networkx>=3.0
numpy>=1.9
scipy>=1.11
pandas>=1.0
tqdm
```
## CLI Usage:
### Fetching and building the database:
**Option 1:**
```
# Download and build the database
# Default: site-packages/kegg_pathway_profiler/data/database.pkl.gz
build-pathway-database.py \
    -d data/database.pkl.gz \
    --download
```
**Option 2:**
```
# Fetch the database
mkdir -p data/
download-kegg-pathways.sh data/
# Build the database
build-pathway-database.py \
    -d data/database.pkl.gz \
    -i data/pathway_definitions.tsv \
    -n data/pathway_names.tsv \
    -c data/pathway_classes.tsv
```
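Judging from the Python usage in this README, the built database can be indexed as `database[id]["definition"]`, `database[id]["name"]`, and `database[id]["classes"]`, and the `.pkl.gz` extension suggests a gzip-compressed pickle. The sketch below shows, under those assumptions only, how three headerless TSV files of the documented shape could be merged into such a mapping and round-tripped through gzip and pickle. The helper names (`read_two_column_tsv`, `build_database`) and the toy pathway are hypothetical, and the real `build-pathway-database.py` may store additional fields or use a different layout.

```python
import gzip
import io
import pickle


def read_two_column_tsv(handle):
    """Parse headerless [id_pathway]<tab>[value] lines into a dict."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        id_pathway, value = line.split("\t", 1)
        mapping[id_pathway] = value
    return mapping


def build_database(definitions, names, classes):
    """Merge the three per-pathway mappings into one nested dict (assumed layout)."""
    return {
        id_pathway: {
            "definition": definition,
            "name": names.get(id_pathway),
            "classes": classes.get(id_pathway),
        }
        for id_pathway, definition in definitions.items()
    }


# In-memory stand-ins for the three TSV files (made-up toy pathway, not real KEGG data)
definitions = read_two_column_tsv(io.StringIO("TOY0001\tK00001 (K00002,K00003)\n"))
names = read_two_column_tsv(io.StringIO("TOY0001\tToy pathway\n"))
classes = read_two_column_tsv(io.StringIO("TOY0001\tToy class; Toy subclass\n"))
database = build_database(definitions, names, classes)

# Round-trip through a gzip-compressed pickle, kept in memory for the example
buffer = io.BytesIO()
with gzip.open(buffer, "wb") as f:
    pickle.dump(database, f)
buffer.seek(0)
with gzip.open(buffer, "rb") as f:
    restored = pickle.load(f)

print(restored["TOY0001"]["name"])
```

Only load pickle files you built yourself or obtained from a trusted source, since unpickling can execute arbitrary code.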
### Profile pathway coverage
#### Running:
```
profile-pathway-coverage.py -i data/test/kos.genomes.tsv -o data/test/pathway-profiler_output -d data/database.pkl.gz
```
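According to the `--kos` help text under Documentation, the input is either one KO per line or a headerless two-column table of `[id_genome]<tab>[id_ko]`. Below is a minimal sketch of writing the multi-genome form from an in-memory mapping. The genome names and the temporary output location are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical genome -> KO assignments
genome_to_kos = {
    "genome_A": {"K00844", "K00850"},
    "genome_B": {"K00873"},
}

# Write [id_genome]<tab>[id_ko] rows with no header
output_path = os.path.join(tempfile.mkdtemp(), "kos.genomes.tsv")
with open(output_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    for id_genome in sorted(genome_to_kos):
        for id_ko in sorted(genome_to_kos[id_genome]):
            writer.writerow([id_genome, id_ko])

# Read the file back to confirm the layout
with open(output_path) as f:
    rows = [line.rstrip("\n").split("\t") for line in f]
print(rows)
```

The resulting file can then be passed to `profile-pathway-coverage.py` via `-i`.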

#### Output:

## Python Usage:
### Evaluate pathway coverage
```python
import kegg_pathway_profiler as kpp

# Load the database
database = kpp.utils.read_pickle("data/database.pkl.gz")

# Avoid shadowing the built-in `id` with the variable name
id_pathway = "M00001"
pathway = kpp.pathways.Pathway(
    id=id_pathway,
    definition=database[id_pathway]["definition"],
    name=database[id_pathway]["name"],
    classes=database[id_pathway]["classes"],
)
pathway
# ==================
# Pathway(id:M00001)
# ==================
# Properties:
# - name: Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
# - classes: Pathway modules; Carbohydrate metabolism; Central carbohydrate metabolism
# - number_of_kos: 32
# Definition:
# (K00844,K12407,K00845,K25026,K00886,K08074,K00918) (K01810,K06859,K13810,K15916) (K00850,K16370,K21071,K00918) (K01623,K01624,K11645,K16305,K16306) K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) (K01689,K27394) (K00873,K12406)
# Evaluate
evaluation_kos = {
    'K00134',
    'K00150',
    'K00844',
    'K00845',
    'K00850',
    'K00873',
    'K00886',
    'K00918',
    'K00927',
    'K01623',
    'K01624',
    'K01689',
    'K16370',
    'K21071',
    'K25026',
    'K27394',
}
results = pathway.evaluate(evaluation_kos)
# Get coverage only
results["coverage"]
# 0.6666666666666667
# Get most complete path KOs
results["most_complete_path"]
# ['K00844',
# 'K01810',
# 'K00850',
# 'K01623',
# 'K01803',
# 'K00134',
# 'K00927',
# 'K01834',
# 'K01689',
# 'K00873']
```
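The coverage of roughly 0.667 above can be checked by hand with a simplified reading of the definition, assuming the usual KEGG module notation in which spaces separate consecutive steps and commas separate alternatives. The sketch below hard-codes the nine top-level steps of M00001 as lists of alternative KO sets and counts a step as covered when any one alternative is fully present. This is only a back-of-the-envelope check, not the package's graph-based implementation; `step_is_covered` and the `steps` encoding are hypothetical.

```python
# Evaluation set copied from the example above
evaluation_kos = {
    "K00134", "K00150", "K00844", "K00845", "K00850", "K00873",
    "K00886", "K00918", "K00927", "K01623", "K01624", "K01689",
    "K16370", "K21071", "K25026", "K27394",
}

# Each step is a list of alternatives; each alternative is the set of KOs it requires
steps = [
    [{"K00844"}, {"K12407"}, {"K00845"}, {"K25026"}, {"K00886"}, {"K08074"}, {"K00918"}],
    [{"K01810"}, {"K06859"}, {"K13810"}, {"K15916"}],
    [{"K00850"}, {"K16370"}, {"K21071"}, {"K00918"}],
    [{"K01623"}, {"K01624"}, {"K11645"}, {"K16305"}, {"K16306"}],
    [{"K01803"}],
    # ((K00134,K00150) K00927,K11389): either dehydrogenase plus K00927, or K11389 alone
    [{"K00134", "K00927"}, {"K00150", "K00927"}, {"K11389"}],
    [{"K01834"}, {"K15633"}, {"K15634"}, {"K15635"}],
    [{"K01689"}, {"K27394"}],
    [{"K00873"}, {"K12406"}],
]


def step_is_covered(alternatives, kos):
    """A step counts as covered if every KO of at least one alternative is present."""
    return any(required <= kos for required in alternatives)


covered = [step_is_covered(alternatives, evaluation_kos) for alternatives in steps]
coverage = sum(covered) / len(steps)
print(covered)
print(round(coverage, 3))
```

Under this reading, steps 2, 5, and 7 have no matching KO in the evaluation set, which leaves 6 of 9 steps covered. Those are also the steps where the reported most complete path lists KOs (`K01810`, `K01803`, `K01834`) that are absent from the evaluation set.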
### Most complete path set enrichment (e.g., step enrichment)
```python
df_enrichment = kpp.enrichment.unweighted_pathway_enrichment_wrapper(
    evaluation_kos=evaluation_kos,
    database=database,
    background_set=None,
)
```
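This README does not describe the statistics behind the wrapper. One common choice for unweighted set enrichment is a one-sided hypergeometric test, sketched below with the standard library only. Whether `unweighted_pathway_enrichment_wrapper` uses this test, and how it treats `background_set=None`, are assumptions to verify against the source code; `hypergeom_sf` and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population_size, n_successes, n_draws):
    """Return P(X >= k) when drawing n_draws items without replacement."""
    upper = min(n_successes, n_draws)
    total = comb(population_size, n_draws)
    tail = sum(
        comb(n_successes, i) * comb(population_size - n_successes, n_draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Toy numbers: 10 KOs in the background, 4 of them belong to the pathway,
# 5 KOs were observed, and 3 of the observed KOs hit the pathway.
p_value = hypergeom_sf(k=3, population_size=10, n_successes=4, n_draws=5)
print(round(p_value, 4))
```

A small p-value would indicate that the pathway's KOs are over-represented in the evaluation set relative to the background.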

## Documentation:
### profile-pathway-coverage.py
```
usage: profile-pathway-coverage.py

    Running: profile-pathway-coverage.py v3.10.14 via Python v/Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/bin/python3.10 | profile-pathway-coverage.py

options:
  -h, --help            show this help message and exit

I/O arguments:
  -i KOS, --kos KOS     path/to/kos.list[.gz]. Can either be 1 KO per line or a tab-separated table with the following structure: [id_genome]<tab>[id_ko], No header.
  -n NAME, --name NAME  Name of genome. [Default: Filename for --kos]
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        path/to/output_directory/ (e.g., kegg_pathway_profiler_output/]
  -d DATABASE, --database DATABASE
                        path/to/database.pkl[.gz] [Default: /Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/lib/python3.10/site-packages/kegg_pathway_profiler/data/database.pkl.gz]
  --index_name INDEX_NAME
                        Index name for coverage table (e.g., id_genome, id_genome_cluster, id_contig) [Default: id_genome]

Copyright 2024 New Atlantis Labs (jolespin@newatlantis.io)
```
### build-pathway-database.py
```
usage: build-pathway-database.py

    Running: build-pathway-database.py v3.10.14 via Python v/Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/bin/python3.10 | build-pathway-database.py

options:
  -h, --help            show this help message and exit

Local arguments:
  -d DATABASE, --database DATABASE
                        path/to/database.pkl[.gz] [Default: /Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/lib/python3.10/site-packages/kegg_pathway_profiler/data/database.pkl.gz]
  -V DATABASE_VERSION, --database_version DATABASE_VERSION
                        Database version: Adds version information to the following file: path/to/database.version where .pkl extensions are removed [Default: KEGG_v2024.8.23]
  -f, --force           If file exists, then remove file and update it.

Local arguments:
  -i PATHWAY_DEFINITIONS, --pathway_definitions PATHWAY_DEFINITIONS
                        path/to/pathway_definitions.tsv. [id_pathway]<tab>[definition], No header.
  -n PATHWAY_NAMES, --pathway_names PATHWAY_NAMES
                        path/to/pathway_names.tsv [id_pathway]<tab>[name], No header.
  -c PATHWAY_CLASSES, --pathway_classes PATHWAY_CLASSES
                        path/to/pathway_classes.tsv. [id_pathway]<tab>[class], No header.

Download arguments:
  --download            Download directly from http://rest.kegg.jp/
  --intermediate_directory INTERMEDIATE_DIRECTORY
                        Write the intermediate files from http://rest.kegg.jp/ to a directory. If 'auto' then download to the directory that contains --database called `pathway_data`.
  --no_intermediate_files
                        Don't write intermediate files

Copyright 2024 New Atlantis Labs (jolespin@newatlantis.io)
```
## Acknowledgements:
[Ekaterina Sakharova](https://github.com/KateSakharova) is the lead developer of the original implementation, [kegg-pathways-completeness-tool](https://github.com/EBI-Metagenomics/kegg-pathways-completeness-tool).