khipu-metabolomics


- Name: khipu-metabolomics
- Version: 0.7.5
- Home page: https://github.com/shuzhao-li/khipu
- Summary: Common utilities for interpreting mass spectrometry data
- Upload time: 2023-10-02 02:48:06
- Author: Shuzhao Li
- Requires Python: >=3.7
- License: BSD
- Keywords: chemistry, bioinformatics, mass spectrometry

# khipu: generalized tree structure to annotate untargeted metabolomics and stable isotope tracing data

[![Documentation Status](https://readthedocs.org/projects/khipu/badge/?version=latest)](https://khipu.readthedocs.io/en/latest/?badge=latest)
[![DOI](https://img.shields.io/badge/DOI-doi%2F10.1021%2Facs.analchem.2c05810-blue)](https://pubs.acs.org/doi/10.1021/acs.analchem.2c05810)


Khipu is a pre-annotation tool that annotates degenerate ions in relation to their original compound and infers the neutral mass of that compound.

This applies to regular LC-MS data, but also enables easy analysis of isotope tracing and chemical labeling data.

![khipugram](doc/khipugram.png)

## Implementation overview
Khipu is developed as an open source Python 3 package and can be installed from the standard PyPI repository via the pip tool. It is freely available on GitHub (https://github.com/shuzhao-li/khipu) under a BSD 3-Clause License. The graph operations are supported by the networkx library, and tree visualization is aided by the treelib library. Khipu uses our package mass2chem for search functions. The data model of “empirical compound” is described in the metDataModel package. The package is designed in a modular way to encourage reuse.

The Weavor and Khipu classes contain the main algorithms, supported by numerous utility functions. All functions are documented in the source via docstrings. Examples of reuse are given in wrapper functions and in Jupyter notebooks. Khipu can also be run as a standalone command line tool. Users can supply a feature table from any preprocessing tool as input and get annotated empirical compounds in JSON and tab-delimited formats.

## Installation and Use
Install as a package (some systems may require pip3):

    pip install khipu-metabolomics

Run as a command line tool after installation:

    khipu -i testdata/ecoli_pos.tsv -o this_test

This writes the pre-annotation to two files: this_test.json (JSON) and this_test.tsv (tab-delimited).

Run from source code:

    python3 -m khipu.main -i testdata/ecoli_pos.tsv -o this_test

Run the test (this downloads and uses test data from GitHub):

    python3 -m khipu.test

Khipu is best used as a library for software development or in a Jupyter Notebook for data analysis.

## Demo notebooks
We have provided multiple demo notebooks under

    notebooks/

They include algorithm demonstrations, data analysis examples, and the use of custom isotope and adduct patterns.

## Algorithm overview 
1. Start with an initial list of isotope patterns and adduct patterns (see khipu grid below). Search the feature list to get all pairs that match any of the patterns. The initial adduct patterns are trimmed to reduce ambiguity. 

2. Connect all pattern-matched feature pairs to an overall network, which is further partitioned into connected subnetworks.

3. Each subnetwork becomes a khipu instance. The subnetwork is inspected, redundant nodes removed, and converted to an optimal tree structure (see below). A khipu is essentially an 'empirical compound' that is used for downstream annotation and analysis.

4. This library supports tree and grid visualization in plain text. Once imported to a Jupyter Notebook, one can use enhanced visualization schemes. 

5. The library can also be used by others for extended tools. Our data processing tool, asari, uses khipu for preannotation. Additional documentation, more for developers, is provided under `doc/`.
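
Steps 1 and 2 above can be illustrated with a minimal sketch. It is not the khipu implementation (which, as noted above, relies on networkx and mass2chem); it uses plain Python, and the function names, the feature values, and the `ppm` and `rt_tol` defaults are assumptions made for illustration. The two mass differences are taken from the khipu grid shown later in this README.

```python
from collections import defaultdict

# Mass differences derived from the khipu grid in this README:
# 13C/12C = 2.010631 - 1.007276, Na/H = 22.989276 - 1.007276.
PATTERNS = [(1.003355, "13C/12C"), (21.982000, "Na/H")]


def find_pairs(features, patterns, ppm=5.0, rt_tol=2.0):
    """Return (id_low, id_high, label) for feature pairs matching a pattern.

    A pair matches when the retention times agree within rt_tol and the
    m/z difference equals a pattern shift within the ppm tolerance.
    """
    pairs = []
    ordered = sorted(features, key=lambda f: f["mz"])
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            if abs(high["rtime"] - low["rtime"]) > rt_tol:
                continue
            delta = high["mz"] - low["mz"]
            tol = high["mz"] * ppm * 1e-6
            for shift, label in patterns:
                if abs(delta - shift) <= tol:
                    pairs.append((low["id"], high["id"], label))
    return pairs


def connected_subnetworks(pairs):
    """Partition the pair network into connected sets of feature IDs."""
    neighbors = defaultdict(set)
    for a, b, _ in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical features: F1-F3 belong to one compound, F4 is unrelated.
features = [
    {"id": "F1", "mz": 181.070665, "rtime": 30.0},
    {"id": "F2", "mz": 182.074020, "rtime": 30.2},
    {"id": "F3", "mz": 203.052665, "rtime": 30.1},
    {"id": "F4", "mz": 250.000000, "rtime": 30.0},
]
pairs = find_pairs(features, PATTERNS)
subnetworks = connected_subnetworks(pairs)
```

In this toy input, each connected set of feature IDs would be the starting point for one khipu; the unrelated feature matches no pattern and therefore joins no subnetwork.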

## Assignment of ion species in a khipu to grid
1. Separate isotope edges and adduct edges.
2. The isotope edges form their own groups by shared nodes, and each group belongs to one adduct type. Each group of connected isotope edges is treated as one "branch".
3. Establish a "trunk" of adducts with a root and a path for adducts, by optimizing the number of nodes explained.
4. Assign each isotopic branch to the adduct trunk.
5. Re-align isotopes in all branches to establish optimal match to the khipu grid. 
6. Based on available ions and the theoretical "khipu grid", the neutral mass can be obtained via linear regression. 

Some ions may enter the initial network by mistake or because of unresolved signals.
They are removed from the established khipu and sent off to form a new khipu.
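
As a rough illustration of step 6, the sketch below estimates a neutral mass from ions that have already been assigned to grid positions. It assumes the simplest possible model, observed m/z = M + grid offset with the slope fixed at 1, in which case the least-squares estimate of M is the mean of the differences. The regression actually used by khipu may differ; the function name and the observed m/z values are hypothetical, while the offsets come from the khipu grid in the next section.

```python
from statistics import mean

# Offsets copied from the khipu grid in this README (positive mode).
GRID = {
    ("M+H[+]", "M0"): 1.007276,
    ("M+H[+]", "13C/12C"): 2.010631,
    ("M+Na[+]", "M0"): 22.989276,
}


def estimate_neutral_mass(assigned_ions):
    """Fit mz = M + offset with a fixed unit slope.

    Under that assumption the least-squares intercept M is simply the
    mean of (observed m/z - grid offset) over all assigned ions.
    """
    return mean(mz - GRID[position] for position, mz in assigned_ions)


# Hypothetical observed ions, each assigned to an (adduct, isotope) cell.
ions = [
    (("M+H[+]", "M0"), 181.0707),
    (("M+H[+]", "13C/12C"), 182.0740),
    (("M+Na[+]", "M0"), 203.0526),
]
neutral_mass = estimate_neutral_mass(ions)
print(f"estimated neutral mass: {neutral_mass:.4f}")
```

Using several ions at once averages out individual measurement errors, which is the practical benefit of fitting against the whole grid rather than a single adduct.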

## The khipu grid
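The initial grid may look like this, where each value is the m/z offset from the neutral mass for a given adduct (column) and isotope (row):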

                 M+H[+]   M+NH4[+]    M+Na[+]  M+HCl+H[+]     M+K[+]  M+ACN+H[+]
    M0         1.007276  18.033826  22.989276   36.983976  38.963158   42.033825
    13C/12C    2.010631  19.037181  23.992631   37.987331  39.966513   43.037180
    13C/12C*2  3.013986  20.040536  24.995986   38.990686  40.969868   44.040535
    13C/12C*3  4.017341  21.043891  25.999341   39.994041  41.973223   45.043890
    13C/12C*4  5.020696  22.047246  27.002696   40.997396  42.976578   46.047245
    13C/12C*5  6.024051  23.050601  28.006051   42.000751  43.979933   47.050600
    13C/12C*6  7.027406  24.053956  29.009406   43.004106  44.983288   48.053955

The grid can be extended by searching for additional ions, but the core construction should be done first.
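
The grid above follows a simple rule: each row adds one more 13C/12C mass difference (1.003355) to the M0 offset of each adduct. The sketch below reproduces the table from that rule using plain dictionaries. The function name `build_grid` is an assumption for illustration and is not part of the khipu API.

```python
C13_SHIFT = 1.003355  # 13C/12C mass difference, e.g. 2.010631 - 1.007276

# M0 offsets copied from the first row of the grid above.
ADDUCT_OFFSETS = {
    "M+H[+]": 1.007276,
    "M+NH4[+]": 18.033826,
    "M+Na[+]": 22.989276,
    "M+HCl+H[+]": 36.983976,
    "M+K[+]": 38.963158,
    "M+ACN+H[+]": 42.033825,
}


def isotope_label(n):
    """Row label for n 13C substitutions, matching the table above."""
    if n == 0:
        return "M0"
    return "13C/12C" if n == 1 else f"13C/12C*{n}"


def build_grid(adduct_offsets, isotope_shift, max_isotopes):
    """Return {row label: {adduct: m/z offset}} for 0..max_isotopes."""
    return {
        isotope_label(n): {
            adduct: round(offset + n * isotope_shift, 6)
            for adduct, offset in adduct_offsets.items()
        }
        for n in range(max_isotopes + 1)
    }


grid = build_grid(ADDUCT_OFFSETS, C13_SHIFT, max_isotopes=6)
for label, row in grid.items():
    print(f"{label:<10}", "  ".join(f"{v:>9.6f}" for v in row.values()))
```

Keeping the adduct offsets and the isotope shift as separate inputs means that a different tracer or adduct list only changes the data, not the grid-building logic.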

## Applicable to isotope tracing
The search pattern for isotopes is often dependent on the biochemical experiment.
Users can override the default by supplying their own search patterns (see demo notebooks).
Search patterns are separate from search functions, lending flexibility to data analysis.

The next step is to apply Khipu to chemical derivatization experiments.
In these experiments, both the original compound and the derivatized compound can be measured in the LC-MS data,
because derivatization is a reaction that occurs before LC-MS and
LC-MS measures whatever compounds are present in the samples.
We build separate khipu trees for each compound, then link them by the m/z shift from derivatization.

## Test data
Three datasets are included under testdata/. All three tables were generated by asari v1.9.2.
The automated khipu.test downloads ecoli_pos.tsv from GitHub.
- The ecoli_pos.tsv table was generated by the Li lab using the credentialed E. coli sample from Cambridge Isotopes.
- The yeast datasets are from the NetID paper by the Rabinowitz lab. The yeast_neg table contains features filtered by SNR > 100 to serve as a cleaner demo.

Input tables are tab delimited text files.
The first columns are feature ID, m/z, rtime, followed by intensities.
Users can specify the start column and end column of intensity data.
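
For readers preparing their own input, the sketch below parses a table with this layout using only the standard library. It is an illustration of the described format, not khipu's own reader. The header names, the sample values, and the convention that `start` and `end` are 0-based indices with `end` exclusive are all assumptions made here.

```python
import csv
import io


def read_feature_table(handle, start=3, end=None):
    """Parse a tab-delimited feature table.

    Columns 0-2 are read as feature ID, m/z and retention time.
    Columns start:end (0-based, end exclusive; an assumption of this
    sketch) are read as intensities.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    features = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        features.append({
            "id": row[0],
            "mz": float(row[1]),
            "rtime": float(row[2]),
            "intensities": [float(x) for x in row[start:end]],
        })
    return header, features


# Hypothetical two-sample table held in memory instead of a file.
example = io.StringIO(
    "id_number\tmz\trtime\tsample1\tsample2\n"
    "F1\t181.0707\t30.0\t1200.5\t980.0\n"
    "F2\t182.0740\t30.2\t80.1\t65.3\n"
)
header, features = read_feature_table(example)
print(header[3:], features[0])
```

To read a real file, pass an open file handle instead, for example `open("testdata/ecoli_pos.tsv", newline="")`.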

## Detailed use of command and parameters

    >>> khipu -h

    usage: main.py [-h] [-v] [-m MODE] [--ppm PPM] [--rtol RTOL] [-i INPUT]
                [-s START] [-e END] [-o OUTPUT]

    khipu, annotating metabolomics features to empCpds

    optional arguments:
    -h, --help            show this help message and exit
    -v, --version         print version and exit
    -m MODE, --mode MODE  mode of ionization, pos or neg
    --ppm PPM             mass precision in ppm (part per million), same as
                            mz_tolerance_ppm
    --rtol RTOL           tolerance of retention time match, arbitrary unit
                            dependent on preprocessing tool
    -i INPUT, --input INPUT
                            input file as feature table
    -s START, --start START
                            start column for intensity in input table
    -e END, --end END     end column for intensity in input table
    -o OUTPUT, --output OUTPUT
                            prefix of output files


## What's "khipu"?
A khipu is a recording device using knots, often with two levels of strings,
historically used by people in Andean South America, including the Inca (https://en.wikipedia.org/wiki/Quipu).
The format is similar to how we represent isotopes and adducts in our data.
We chose "khipu" over the spelling of "quipu", to pay respect to the indigenous people.

            

        