gemcat


| Field | Value |
| --- | --- |
| Name | gemcat |
| Version | 1.4.0 |
| Summary | A toolbox for gene expression-based prediction of metabolic alterations |
| Upload time | 2024-10-05 21:07:38 |
| Requires Python | >=3.10 |
| Keywords | python, bioinformatics, modeling, metabolites, omics |
| Requirements | No requirements were recorded. |

# GEMCAT: Gene Expression-based Metabolite Centrality Analyses Tool
A computational toolbox associated with the manuscript entitled _GEMCAT — A new algorithm for gene expression-based prediction of metabolic alterations_. 
Cite using: https://www.biorxiv.org/content/10.1101/2024.01.15.575710v1

Note: We are still refining the tool. In particular, GEMCAT does not yet provide guidance on the significance of predicted changes or any other measure of prediction quality. We suggest filtering the predictions for consistency. We do not recommend pre-filtering the transcriptomics and proteomics data based on significance: filtering reduces the network coverage, because genes/proteins not present in the dataset are assumed to be unchanged, and this might negatively impact the prediction quality.

## Compatibility
We tested the package for compatibility with Python >= 3.10 on Ubuntu and Windows.

## Installation
Install from PyPI using pip:

```pip install gemcat```

Or clone the repository and install GEMCAT from there using:  

```pip install .```


## Usage

### Standard workflow from the Command-Line Interface (CLI)

Use a single file containing per-gene fold-changes to calculate the resulting differential centralities:
``` gemcat <./expression_file.csv> <./model_file.xml> -e <column_name> -o <result_file.csv>```
Make sure the .csv file is either comma- or tab-delimited.
`column_name` is the name of the column in the file containing the fold-change.
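
As an illustration of the expected input, the following minimal sketch builds a comma-delimited fold-change table with pandas. The gene identifiers, the values, and the column names `gene_id` and `fold_change` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical gene identifiers and expression values, for illustration only.
expression = pd.DataFrame(
    {
        "gene_id": ["geneA", "geneB", "geneC"],
        "baseline": [10.0, 4.0, 8.0],
        "condition": [20.0, 4.0, 2.0],
    }
)

# Per-gene fold-change: condition relative to baseline.
expression["fold_change"] = expression["condition"] / expression["baseline"]

# Identifiers go in the first column, followed by the fold-change column.
csv_text = expression[["gene_id", "fold_change"]].to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` instead writes the table to disk; with these column names, the file would be used with `-e fold_change`.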

Alternatively, use two files (or one file) with expression values for condition and baseline:
``` gemcat <./condition_file.csv> <./model_file.xml> -e <condition_column_name> -b <./baseline_file> -c <baseline_column_name> -o <result_file.csv>```

If you do not have a model file ready, some models can be automatically accessed using their names:
``` gemcat <./expression_file.csv> <model_name> -e <column_name> -o <result_file.csv>```

Model names currently supported are:
- ```recon3d```: [Recon3D](http://bigg.ucsd.edu/models/Recon3D)
- ```ratgem```: [Rat-GEM](https://github.com/SysBioChalmers/Rat-GEM)


Currently, GEMCAT supports models in SBML, JSON, and MAT formats.

Important points to remember:
- Your gene or protein identifiers should be in the first column of the expression file.
- Make sure the gene or protein identifiers in your expression data file exactly match those in the model.
- If every value in the results is 1.0, this is a sure sign that no identifiers were matched.
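
Identifier matching can be checked before running GEMCAT. The following is a minimal sketch using plain Python sets; the helper `identifier_overlap` and the identifiers are hypothetical and not part of GEMCAT.

```python
def identifier_overlap(expression_ids, model_gene_ids):
    """Return the fraction of model genes covered and the unmatched expression IDs."""
    expression_ids = set(expression_ids)
    model_gene_ids = set(model_gene_ids)
    matched = expression_ids & model_gene_ids
    coverage = len(matched) / len(model_gene_ids) if model_gene_ids else 0.0
    unmatched = sorted(expression_ids - model_gene_ids)
    return coverage, unmatched


# Hypothetical identifiers. With a CobraPy model, the gene IDs could be
# collected with {gene.id for gene in model.genes}.
model_ids = ["g1", "g2", "g3", "g4"]
expression_ids = ["g1", "g2", "G5"]

coverage, unmatched = identifier_overlap(expression_ids, model_ids)
print(f"coverage: {coverage:.0%}, unmatched: {unmatched}")
```

A coverage of zero means that none of the expression identifiers match the model, which is the situation described above.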

Positional arguments:
- expression file path
- model file path

All parameters:
- `-e, --expressioncolumn`: name of column containing condition expression data
- `-b BASELINE, --baseline`: file containing baseline expression data
- `-c BASELINECOLUMN, --baselinecolumn`: name of column containing baseline expression data
- `-v VERBOSE, --verbose`: enables verbose output
- `-o OUTFILE, --outfile`: write output to this file
- `-l LOGFILE, --logfile`: write logs to this file
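
When the CLI is called from a script, the argument list can be assembled programmatically. The sketch below only uses the flags listed above; the helper `build_gemcat_command` and the file and column names are hypothetical.

```python
import shlex


def build_gemcat_command(expression_file, model, expression_column, outfile,
                         baseline_file=None, baseline_column=None):
    """Assemble a gemcat call from the documented positional arguments and flags."""
    command = ["gemcat", expression_file, model, "-e", expression_column]
    if baseline_file is not None:
        command += ["-b", baseline_file]
    if baseline_column is not None:
        command += ["-c", baseline_column]
    command += ["-o", outfile]
    return command


# Hypothetical file and column names.
command = build_gemcat_command(
    "condition.csv", "recon3d", "expression", "results.csv",
    baseline_file="baseline.csv", baseline_column="expression",
)
print(shlex.join(command))
# With gemcat installed, subprocess.run(command, check=True) would execute it.
```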


### Standard workflow in Python using a CobraPy model
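```
import gemcat as gc

# Inputs you provide:
#   cobra_model: a cobra.Model
#   mapped_genes_baseline: pd.Series of baseline expression values
#   mapped_genes_comparison: pd.Series of comparison expression values
results = gc.workflows.workflow_standard(
    cobra_model=cobra_model,
    mapped_genes_baseline=mapped_genes_baseline,
    mapped_genes_comparison=mapped_genes_comparison,
    adjacency=gc.adjacency_transformation.ATPureAdjacency,
    ranking=gc.ranking.PagerankNX,
    gene_fill=1.0,
)
```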
This returns the changes in centrality relative to the baseline as a pandas Series.
When using fold-changes as the mapped expression, use a vector of all ones as the baseline to compare against.
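
A vector of all ones that matches a fold-change series can be built with pandas as sketched below. The gene identifiers and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-gene fold-changes indexed by gene identifier.
fold_changes = pd.Series({"geneA": 2.0, "geneB": 1.0, "geneC": 0.25})

# A vector of ones with the same index serves as the "unchanged" reference.
ones = pd.Series(1.0, index=fold_changes.index)
print(ones)
```

The two series are then passed to `workflow_standard` as the mapped gene values.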

## Modularity and Configuration
GEMCAT is modular: its central components (primarily the adjacency transformation, expression integration, and ranking components) can easily be replaced or supplemented by other components that adhere to the specifications laid out in the module base classes.
All classes inheriting from the abstract base classes laid out in the modules are exchangeable.
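
The general pattern of exchangeable components behind an abstract base class can be sketched as follows. The class names, method names, and signatures here are hypothetical and do not reproduce GEMCAT's actual base classes; consult the modules for the real specifications.

```python
from abc import ABC, abstractmethod


class Ranking(ABC):
    """Hypothetical base class: maps a weighted adjacency structure to node scores."""

    @abstractmethod
    def rank(self, adjacency: dict[str, dict[str, float]]) -> dict[str, float]:
        ...


class InDegreeRanking(Ranking):
    """Toy component: score each node by the summed weight of its incoming edges."""

    def rank(self, adjacency):
        scores = {node: 0.0 for node in adjacency}
        for targets in adjacency.values():
            for target, weight in targets.items():
                scores[target] = scores.get(target, 0.0) + weight
        return scores


def run_workflow(adjacency, ranking: Ranking):
    """Any Ranking subclass can be passed here without changing the workflow."""
    return ranking.rank(adjacency)


toy_network = {"a": {"b": 1.0, "c": 2.0}, "b": {"c": 0.5}, "c": {}}
scores = run_workflow(toy_network, InDegreeRanking())
print(scores)
```

Because the workflow depends only on the base-class interface, a different ranking class can be substituted without touching the rest of the code.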

## Core modules
### Model
The core of the package is the GEMCAT model structure that contains the model data, integrates the workflow, and calculates the results.
### adjacency_transformation
Different approaches can be used to calculate adjacency in the networks.
We offer alternatives and a platform to create custom algorithms for the model.
### expression
Module covering the mapping of gene values onto reactions in the model via gene product rules.
It provides different algorithms along with a platform to create alternatives.
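
To illustrate what mapping gene values through a gene product rule can look like, here is a minimal sketch. It assumes a convention chosen only for this example (AND as minimum, OR as maximum) and a nested-tuple rule format; neither is taken from GEMCAT, and the function `evaluate_rule` is hypothetical.

```python
def evaluate_rule(rule, gene_values, gene_fill=1.0):
    """Evaluate a nested gene rule.

    A rule is either a gene identifier (str) or a tuple
    ("and" | "or", operand, operand, ...).
    """
    if isinstance(rule, str):
        # Genes missing from the data get the fill value (treated as unchanged).
        return gene_values.get(rule, gene_fill)
    operator, *operands = rule
    values = [evaluate_rule(operand, gene_values, gene_fill) for operand in operands]
    if operator == "and":
        return min(values)  # assumed convention for this sketch
    if operator == "or":
        return max(values)  # assumed convention for this sketch
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical gene values; "g3" is absent and falls back to gene_fill.
gene_values = {"g1": 2.0, "g2": 3.0}
rule = ("or", ("and", "g1", "g2"), "g3")
reaction_value = evaluate_rule(rule, gene_values)
print(reaction_value)
```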
### ranking
Module providing ranking algorithms for the models along with a platform to include custom algorithms.
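
As a rough illustration of a ranking step, the sketch below computes a weighted PageRank by power iteration on a toy network and compares a condition with a baseline. It is a self-contained stand-in written for this example, not GEMCAT's implementation; the network, the weights, and the function `pagerank` are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> non-negative weight.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = {node: 0.0 for node in nodes}
    for (source, _), weight in edges.items():
        out_weight[source] += weight
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (source, target), weight in edges.items():
            if out_weight[source] > 0.0:
                new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Toy network: a -> b, b -> c, b -> d, c -> a, d -> a.
baseline_edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("b", "d"): 1.0,
    ("c", "a"): 1.0, ("d", "a"): 1.0,
}
# Hypothetical condition: the step b -> c carries four times the weight.
condition_edges = dict(baseline_edges)
condition_edges[("b", "c")] = 4.0

baseline_rank = pagerank(baseline_edges)
condition_rank = pagerank(condition_edges)
relative_change = {n: condition_rank[n] / baseline_rank[n] for n in baseline_rank}
print(relative_change)
```

In this toy case the score of `c` rises and the score of `d` falls relative to the baseline, because more of the rank leaving `b` flows to `c`.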
### workflows
The workflow module contains example workflows.
To customize a workflow to your needs, copy the provided functions and replace the desired steps.
### cli
Command-line interface for GEMCAT.
### io
Input and output functions that create GEMCAT models from different sources.
### utils
Contains common utility functions used throughout the package.
### verification
Functions to verify data integrity.
### model_manager
Functionality for automatic downloading, storing, and retrieving of common models.


## Development
You can run all local tests with `pytest .`. By default, this also runs the integration tests, which take time.
You can exclude slow-running tests by using `pytest . -m "not slow"`.
These slow-running tests are integration tests with _real world data_ and will take 10-30s each, depending on your hardware.

To run the tests, make sure you have [git lfs](https://git-lfs.com/) installed, and check that all tests pass.
Make sure to run `isort` and `black` to have properly formatted code.

The CI pipeline in GitHub will check with isort, black, and pytest.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "gemcat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "python, bioinformatics, modeling, metabolites, omics",
    "author": null,
    "author_email": "Roland Sauter <roland.sauter@uit.no>",
    "download_url": "https://files.pythonhosted.org/packages/16/3f/fac8e6978215884ffc1faaedfb32e6872bf27a4ce2fca5d97e8e541d6cc2/gemcat-1.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A toolbox for gene expression-based prediction of metabolic alterations",
    "version": "1.4.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        " bioinformatics",
        " modeling",
        " metabolites",
        " omics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "92dd8c006c8f64dbf14851f36ab8d8c0bed82b336e619bfceea32b1e4ef1d396",
                "md5": "56904c4e42f7440811d3e2f4c581d682",
                "sha256": "5993f2412a8e123c3e8eeb975b2a1b0a39f213c401bc5bdf8c128e5593d7952e"
            },
            "downloads": -1,
            "filename": "gemcat-1.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "56904c4e42f7440811d3e2f4c581d682",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 23301,
            "upload_time": "2024-10-05T21:07:36",
            "upload_time_iso_8601": "2024-10-05T21:07:36.446401Z",
            "url": "https://files.pythonhosted.org/packages/92/dd/8c006c8f64dbf14851f36ab8d8c0bed82b336e619bfceea32b1e4ef1d396/gemcat-1.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "163ffac8e6978215884ffc1faaedfb32e6872bf27a4ce2fca5d97e8e541d6cc2",
                "md5": "7f3362364edbc19ce9143477c6d3d96b",
                "sha256": "add378274e693c9fc6e9e93ae3aa5d6cdc900c0c13f2159e33bb3b0503e14cd1"
            },
            "downloads": -1,
            "filename": "gemcat-1.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7f3362364edbc19ce9143477c6d3d96b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 140224,
            "upload_time": "2024-10-05T21:07:38",
            "upload_time_iso_8601": "2024-10-05T21:07:38.255780Z",
            "url": "https://files.pythonhosted.org/packages/16/3f/fac8e6978215884ffc1faaedfb32e6872bf27a4ce2fca5d97e8e541d6cc2/gemcat-1.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-05 21:07:38",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "gemcat"
}
        