glytrait


Nameglytrait JSON
Version 0.1.7 PyPI version JSON
download
home_pagehttps://github.com/fubin1999/glytrait
SummaryA tool for calculating derived traits for N-glycome
upload_time2024-05-22 04:51:17
maintainerNone
docs_urlNone
authorfubin1999
requires_python<3.13,>=3.10
licenseMIT
keywords bioinformatics glycomics biology
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ![logo](img/logo.png)

# GlyTrait

[![PyPI - Version](https://img.shields.io/pypi/v/glytrait)](https://pypi.org/project/glytrait/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/glytrait)
[![GitHub License](https://img.shields.io/github/license/fubin1999/glytrait)](https://github.com/fubin1999/glytrait/blob/main/LICENSE)
[![Coverage Status](https://coveralls.io/repos/github/fubin1999/glytrait/badge.svg?branch=main)](https://coveralls.io/github/fubin1999/glytrait?branch=main)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/e3e6a19dccb749f786264247738fa585)](https://app.codacy.com/gh/fubin1999/glytrait/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

## Overview

*Q: What is GlyTrait?*

*A: GlyTrait is a tool to calculate derived traits for N-glycomic data.*

*Q: Sounds cool! So..., what are derived traits again?*

*A: Well, derived traits are artificial variables that summarize certain aspects of 
the glycome. 
For example, the proportion of core-fucosylated glycans, 
the average number of sialic acids per glycan, 
or the proportion of bisected glycans within bi-antennary complex glycans, etc. 
Derived traits are more biologically relavant 
and have been used a lot in the glycomics community.*

*Q: So, GlyTrait does the dirty work for me, 
saving my time and energy for more interesting analysis?*

*A: You bet!*

## Contents

- [Web app](#web-app)
- [Installation](#installation)
- [Usage](#usage)
    - [Quick start](#quick-start)
    - [Options](#options)
    - [Mode](#mode)
    - [Input file format](#input-file-format)
    - [Specify output path](#specify-the-output-path)
    - [Preprocessing](#preprocessing)
    - [Sialic-acid-linkage traits](#sialic-acid-linkage-traits)
    - [Post-filtering](#post-filtering)
    - [Custom formulas](#custom-formulas)
- [Reproduction Instructions](#reproduction-instructions)
- [License](#license)

## Web app

> Let there be no code!
> 
> -- my colleagues

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://glytrait.streamlit.app)

Click the badge above and try the GlyTrait web app!
The web app is self-documented, but in case you need help,
please refer to the [Usage](#usage) section below.

## Installation

### Requirement

```
python >= 3.10
```

If python hasn't been installed, 
download it from [its website](https://www.bing.com/search?q=python&form=APMCS1&PC=APMC),
or use [Anaconda](https://www.anaconda.com/download/) if you like.

### Option 1: Using pipx (recommended)

`pipx` is a tool to help you install and run end-user applications written in Python.
It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.

#### Install pipx

Install pipx following its [Document](https://pypa.github.io/pipx/installation/).

#### Install GlyTrait from PyPi

```shell
pipx install glytrait
```

### Option 2: Using pip

#### 1. Create a new conda environment (optional)

```shell
conda create -n glytrait python=3.10
```

and activate the environment.

```shell
conda activate glytrait
```

#### 2. Install GlyTrait from PyPi

```shell
pip install glytrait
```

## Usage

### Quick start

1. Download the [example files](https://github.com/fubin1999/glytrait/tree/main/example_data)
to a new directory: `glytrait_example`.

2. Open a terminal and navigate to the directory 
(you need to replace `path/to/glytrait_example` with the actual path):

```shell
cd path/to/glytrait_example
```

3. Run the following command:

```shell
glytrait abundance.csv structures.csv
```

That's it! 
If everything goes well, a folder named "abundance_glytrait" will be created
in the same directory with the abundnce.csv file.
Inside the directory are four files:

1. `derived_traits.csv`: all derived traits calculated by GlyTrait.
2. `derived_traits_filtered.csv`: derived traits after post-filtering 
   (see [Post-filtering](#post-filtering)).
3. `glycan_abundance_processed.csv`: the glycan abundance after preprocessing
   (see [Preprocessing](#preprocessing)).
4. `meta_properties.csv`: the meta-properties of all glycans.

The detailed format of the input file will be introduced in the
[Input file format](#input-file-format) section.

### Options

*This section is intended to give an overview of the CLI interface.
Feel free to skip it right now.*

As a glance, GlyTrait supports the following options:

|          Option           | Description                                                                                                     |
|:-------------------------:|:----------------------------------------------------------------------------------------------------------------|
|          --help           | Show the help message and exit.                                                                                 |
|        -m, --mode         | The mode. "S" or "structure" for structure mode, "C" or "composition" for composition mode. Default: structure. |
|       -o, --output        | The output path. Default: the same directory with the input file.                                               |
| -r, --filter-glycan-ratio | The proportion of missing values for a glycan to be ruled out. Default: 0.5.                                    |
|    -i, --inpute-method    | The imputation method. "min", "mean", "median", "zero", or "lod". Default: "min".                               |
|   -c, --corr-threshold    | The correlation threshold for collinearity filtering. Default to 1.0.                                           |
|     -l, --sia-linkage     | Flag to include the sialic acid linkage traits.                                                                 |
|        --no-filter        | Flag to turn off post-filtering of derived traits.                                                              |
|        -g, --group        | The group file.                                                                                                 |
|    -f, --formula-file     | The custom formula file to use.                                                                                 |
|  -b, --builtin-formulas   | The directory path to save a copy of the built-in formulas.                                                     |

The following sections will introduce these options in detail.

### Mode

GlyTrait has two modes: the **"structure"** mode and the **"composition"** mode.
In the "structure" mode,
GlyTrait will calculate derived traits based on the topology properties of glycan structures.
In the "composition" mode, GlyTrait will make educated guesses on the structure properties 
based on the glycan composition.

Note that the "composition" mode has uncertainties to some extent. Specifically:

1. Estimating the number of Gal based on composition is not possible for hybrid glycans,
   so GlyTrait will calculate the number of Gal assuming there are no hybrid glycans.
   kily, hybrid glycans are usually in low abundance,
   so the algorithm is a good approximation for most cases.
2. Estimating the number of branches is not possible based on composition,
   so GlyTrait will roughly classify glycans into two categories: 
   low-branching and high-branching.
   Glycans with N > 4 (including bisecting diantenary glycans) are considered as high-branching,
   while others as low-branching.
3. Telling hybrid glycans from mono-antenary complex glycans is not possible based on composition,
   so GlyTrait will not classify glycans into complex, hybrid and high-mannose.

Due to the ambiguities above, we recommend using the "structure" mode if possible.

You can specify the mode by the "-m" or the "--mode" option:

```shell
glytrait abundance.csv composition.csv -m composition
```

Or in short:

```shell
glytrait abundance.csv composition.csv -m C
```

The default mode is the "structure" mode, as in the quick start example. 
Thus, using `glytrait -m structure` or `glytrait -m S` is equivalent to `glytrait` alone.
If you might use both modes in a project, 
we recommend using the "-m" option to specify the mode explicitly.

### Input file format

At least two files are needed for GlyTrait to work:

#### 1. The abundance file

A csv file with samples as rows and glycan IDs as columns.
An example file would be like:

| Sample  | Glycan1 | Glycan2 | Glycan3 |
|---------|---------|---------|---------|
| Sample1 | 0.0417  | 0.0503  | 0.0354  |
| Sample2 | 0.0233  | 0.0533  | 0.0593  |
| Sample3 | 0.0123  | 0.0133  | 0.0194  |

The header of the first column should be "Sample",
and the header of the other columns should be glycan IDs.
Glycan IDs can be any string, e.g., the composition strings ("H3N4").

**Both glycan IDs and samples should be unique.**

#### 2. The structure file (or the composition file)

A csv file with two columns: "GlycanID" and "Structure" (or "Composition").
An example file would be like:

| GlycanID | Structure |
|----------|-----------|
| Glycan1  | RES...    |
| Glycan2  | RES...    |
| Glycan3  | RES...    |

The "GlycanID" column should contain all glycan IDs in the abundance file.
The "Structure" column should contain the structure strings of the glycans.
For now, only the GlycoCT format is supported.
In the "composition" mode, the second column should be "Composition" instead of "Structure",
and the composition strings should be used instead of the structure strings.
Condensed format ("H3N4F1S1") is supported.

The names of the abundance file and the structure (composition) file is not important, 
i.e., it doesn't have to be "abundance.csv" and "structures.csv". 
You could use any name you want, as long as the two files are passed in order,
e.g.: 

```shell
glytrait experiment_1_15.csv serum_structures.csv
#        ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
#        the abundance file   the structure file
```

### Specify the output path

You might have noticed before that GlyTrait saves the output file to the same directory 
as the abundance file with a "_glytrait" suffix.
You can specify the output file path by using the "-o" or "--output-file" option:

```shell
glytrait abundance.csv structure.csv -o output
```

### Preprocessing

GlyTrait will carry out a preprocessing step before calculating derived traits.
The following steps will be done:

- Remove glycans with missing values in more than a certain proportion of samples.
- Impute missing values.
- Perform Total Abundance Normalization.

In the glycan-filtering step, the proportion threshold could be specified by the "-r" or the
"--filter-glycan-ratio" option.
The default value is 1, which means no glycan will be removed.
You can change this value to 0.5 by:

```shell
glytrait abundance.csv structure.csv -r 0.5
```

The imputation method could be specified by the "-i" or the "--impute-method" option.
The default method is "zero",
which means missing values will be imputed by 0.
Other supported methods are "mean", "median", "zero", "lod".
You can change the imputation method to "min" by:

```shell
glytrait abundance.csv structure.csv -i min
```

A full list of supported imputation methods are:

- "min": impute missing values by the minimum value of a glycan within all samples.
- "mean": impute missing values by the mean value of a glycan within all samples.
- "median": impute missing values by the median value of a glycan within all samples.
- "zero": impute missing values by 0.
- "lod": impute missing values by the limit of detection (LOD) of the equipment. The LOD of a
  glycan is defined as the minimum value of the glycan within all samples divided by 5.

### Sialic-acid-linkage traits

Sialic acids can have different linkages for N-glycans (e.g., α2,3 and α2,6).
Different sialic acid linkage has different biological functions.
GlyTrait supports calculating derived traits regarding these linkages.
To use this feature, you need to have siaic acid linkage information.

In the structure mode, the "Structure" column or the structure file should contain the linkage
information.
Only linkage information about sialic acids is needed.
This can be easily done using GlycoWorkbench.

In the composition mode, the "Composition" column must contain the linkage information.
GlyTrait uses a common notation for sialic acid with different linkages:
"E" for a2,6-linked sialic acids, and "L" a2,3-linked sialic acids.
For example, "H5N4F1E1L1" contains two sialic acids, one is a2,6-linked, 
and the other is a2,3-linked.

You can use the "-l" or "--sia-linkage" option to include sialic-acid-linkage traits:

```shell
glytrait abundance.csv structure.csv -l
```

Note that if you use this option, all glycans with sialic acids should have linkage information.
That is to say, all structure strings should have structure information in the structure mode and
no "S" in composition strings in the composition mode.

### Post-Filtering

Not all derived traits are informative.
For example, some traits might have the same value for all samples.
Some traits might be highly correlated with others.

GlyTrait carries out a two-step post-filtering process to remove these uninformative traits.
First, traits with the same value for all samples will be removed.
Second, highly correlated traits will be pruned,
keeping only the traits considering more glycans.

GlyTrait filters out highly correlated traits, using a "trait family tree" filtering method.
Briefly, for two correlated traits, the "parent" trait,
which normally considers more glycans, will be kept.
For example, for the two high-correlated traits: A2FG and A2G, the latter will be kept 
because it is more general, and more robust for considering more glycans.
Thanks to the dynamic "trait family tree" generated by GlyTrait,
user-defined traits will also be considered in this filtering process.

By default, GlyTrait only filters trarits with Pearson correlation coefficient of 1,
i.e., traits with perfect collinearity.
This threshold can be changed by the "-c" or "--corr-threshold" option:

```shell
glytrait abundance.csv structure.csv -c 0.9
```

Setting the threshold to -1 will turn off the colinearity filtering. 
To turn off postfiltering all together, use the "--no-filtering" option:

```shell
glytrait abundance.csv structure.csv --no-filtering
```

### Custom Formulas

Coming soon...

(The functionallity has been implemented, but the documentation is not ready yet.
After publishing the GlyTrait paper, we will update the documentation.)

## Reproduction Instructions

To reproduce the results in our paper 
"GlyTrait: A versatile bioinformatics tool for Glycomics Analysis",
please download the Supplementary Data files.
There are three such files, corresponding to different datasets mentioned in the paper.
Each Supplementary Data file has several sheets with the prefix of "Input -".
These are the input data for GlyTrait.
Please reformat the data according to the [Input file format](#input-file-format) section,
and run GlyTrait with the same options as in the paper.

## License

[MIT License](LICENSE)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fubin1999/glytrait",
    "name": "glytrait",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, glycomics, biology",
    "author": "fubin1999",
    "author_email": "65430559+fubin1999@users.noreply.github.com",
    "download_url": "https://files.pythonhosted.org/packages/8c/fa/97f6d8097847c7b6c66eae8523604f5d07a8dd45b0738fc2dc2cf78e72a5/glytrait-0.1.7.tar.gz",
    "platform": null,
    "description": "![logo](img/logo.png)\n\n# GlyTrait\n\n[![PyPI - Version](https://img.shields.io/pypi/v/glytrait)](https://pypi.org/project/glytrait/)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/glytrait)\n[![GitHub License](https://img.shields.io/github/license/fubin1999/glytrait)](https://github.com/fubin1999/glytrait/blob/main/LICENSE)\n[![Coverage Status](https://coveralls.io/repos/github/fubin1999/glytrait/badge.svg?branch=main)](https://coveralls.io/github/fubin1999/glytrait?branch=main)\n[![Codacy Badge](https://app.codacy.com/project/badge/Grade/e3e6a19dccb749f786264247738fa585)](https://app.codacy.com/gh/fubin1999/glytrait/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)\n\n## Overview\n\n*Q: What is GlyTrait?*\n\n*A: GlyTrait is a tool to calculate derived traits for N-glycomic data.*\n\n*Q: Sounds cool! So..., what are derived traits again?*\n\n*A: Well, derived traits are artificial variables that summarize certain aspects of \nthe glycome. \nFor example, the proportion of core-fucosylated glycans, \nthe average number of sialic acids per glycan, \nor the proportion of bisected glycans within bi-antennary complex glycans, etc. \nDerived traits are more biologically relavant \nand have been used a lot in the glycomics community.*\n\n*Q: So, GlyTrait does the dirty work for me, \nsaving my time and energy for more interesting analysis?*\n\n*A: You bet!*\n\n## Contents\n\n- [Web app](#web-app)\n- [Installation](#installation)\n- [Usage](#usage)\n    - [Quick start](#quick-start)\n    - [Options](#options)\n    - [Mode](#mode)\n    - [Input file format](#input-file-format)\n    - [Specify output path](#specify-the-output-path)\n    - [Preprocessing](#preprocessing)\n    - [Sialic-acid-linkage traits](#sialic-acid-linkage-traits)\n    - [Post-filtering](#post-filtering)\n    - [Custom formulas](#custom-formulas)\n- [Reproduction Instructions](#reproduction-instructions)\n- [License](#license)\n\n## Web app\n\n> Let there be no code!\n> \n> -- my colleagues\n\n[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://glytrait.streamlit.app)\n\nClick the badge above and try the GlyTrait web app!\nThe web app is self-documented, but in case you need help,\nplease refer to the [Usage](#usage) section below.\n\n## Installation\n\n### Requirement\n\n```\npython >= 3.10\n```\n\nIf python hasn't been installed, \ndownload it from [its website](https://www.bing.com/search?q=python&form=APMCS1&PC=APMC),\nor use [Anaconda](https://www.anaconda.com/download/) if you like.\n\n### Option 1: Using pipx (recommended)\n\n`pipx` is a tool to help you install and run end-user applications written in Python.\nIt's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.\n\n#### Install pipx\n\nInstall pipx following its [Document](https://pypa.github.io/pipx/installation/).\n\n#### Install GlyTrait from PyPi\n\n```shell\npipx install glytrait\n```\n\n### Option 2: Using pip\n\n#### 1. Create a new conda environment (optional)\n\n```shell\nconda create -n glytrait python=3.10\n```\n\nand activate the environment.\n\n```shell\nconda activate glytrait\n```\n\n#### 2. Install GlyTrait from PyPi\n\n```shell\npip install glytrait\n```\n\n## Usage\n\n### Quick start\n\n1. Download the [example files](https://github.com/fubin1999/glytrait/tree/main/example_data)\nto a new directory: `glytrait_example`.\n\n2. Open a terminal and navigate to the directory \n(you need to replace `path/to/glytrait_example` with the actual path):\n\n```shell\ncd path/to/glytrait_example\n```\n\n3. Run the following command:\n\n```shell\nglytrait abundance.csv structures.csv\n```\n\nThat's it! \nIf everything goes well, a folder named \"abundance_glytrait\" will be created\nin the same directory with the abundnce.csv file.\nInside the directory are four files:\n\n1. `derived_traits.csv`: all derived traits calculated by GlyTrait.\n2. `derived_traits_filtered.csv`: derived traits after post-filtering \n   (see [Post-filtering](#post-filtering)).\n3. `glycan_abundance_processed.csv`: the glycan abundance after preprocessing\n   (see [Preprocessing](#preprocessing)).\n4. `meta_properties.csv`: the meta-properties of all glycans.\n\nThe detailed format of the input file will be introduced in the\n[Input file format](#input-file-format) section.\n\n### Options\n\n*This section is intended to give an overview of the CLI interface.\nFeel free to skip it right now.*\n\nAs a glance, GlyTrait supports the following options:\n\n|          Option           | Description                                                                                                     |\n|:-------------------------:|:----------------------------------------------------------------------------------------------------------------|\n|          --help           | Show the help message and exit.                                                                                 |\n|        -m, --mode         | The mode. \"S\" or \"structure\" for structure mode, \"C\" or \"composition\" for composition mode. Default: structure. |\n|       -o, --output        | The output path. Default: the same directory with the input file.                                               |\n| -r, --filter-glycan-ratio | The proportion of missing values for a glycan to be ruled out. Default: 0.5.                                    |\n|    -i, --inpute-method    | The imputation method. \"min\", \"mean\", \"median\", \"zero\", or \"lod\". Default: \"min\".                               |\n|   -c, --corr-threshold    | The correlation threshold for collinearity filtering. Default to 1.0.                                           |\n|     -l, --sia-linkage     | Flag to include the sialic acid linkage traits.                                                                 |\n|        --no-filter        | Flag to turn off post-filtering of derived traits.                                                              |\n|        -g, --group        | The group file.                                                                                                 |\n|    -f, --formula-file     | The custom formula file to use.                                                                                 |\n|  -b, --builtin-formulas   | The directory path to save a copy of the built-in formulas.                                                     |\n\nThe following sections will introduce these options in detail.\n\n### Mode\n\nGlyTrait has two modes: the **\"structure\"** mode and the **\"composition\"** mode.\nIn the \"structure\" mode,\nGlyTrait will calculate derived traits based on the topology properties of glycan structures.\nIn the \"composition\" mode, GlyTrait will make educated guesses on the structure properties \nbased on the glycan composition.\n\nNote that the \"composition\" mode has uncertainties to some extent. Specifically:\n\n1. Estimating the number of Gal based on composition is not possible for hybrid glycans,\n   so GlyTrait will calculate the number of Gal assuming there are no hybrid glycans.\n   kily, hybrid glycans are usually in low abundance,\n   so the algorithm is a good approximation for most cases.\n2. Estimating the number of branches is not possible based on composition,\n   so GlyTrait will roughly classify glycans into two categories: \n   low-branching and high-branching.\n   Glycans with N > 4 (including bisecting diantenary glycans) are considered as high-branching,\n   while others as low-branching.\n3. Telling hybrid glycans from mono-antenary complex glycans is not possible based on composition,\n   so GlyTrait will not classify glycans into complex, hybrid and high-mannose.\n\nDue to the ambiguities above, we recommend using the \"structure\" mode if possible.\n\nYou can specify the mode by the \"-m\" or the \"--mode\" option:\n\n```shell\nglytrait abundance.csv composition.csv -m composition\n```\n\nOr in short:\n\n```shell\nglytrait abundance.csv composition.csv -m C\n```\n\nThe default mode is the \"structure\" mode, as in the quick start example. \nThus, using `glytrait -m structure` or `glytrait -m S` is equivalent to `glytrait` alone.\nIf you might use both modes in a project, \nwe recommend using the \"-m\" option to specify the mode explicitly.\n\n### Input file format\n\nAt least two files are needed for GlyTrait to work:\n\n#### 1. The abundance file\n\nA csv file with samples as rows and glycan IDs as columns.\nAn example file would be like:\n\n| Sample  | Glycan1 | Glycan2 | Glycan3 |\n|---------|---------|---------|---------|\n| Sample1 | 0.0417  | 0.0503  | 0.0354  |\n| Sample2 | 0.0233  | 0.0533  | 0.0593  |\n| Sample3 | 0.0123  | 0.0133  | 0.0194  |\n\nThe header of the first column should be \"Sample\",\nand the header of the other columns should be glycan IDs.\nGlycan IDs can be any string, e.g., the composition strings (\"H3N4\").\n\n**Both glycan IDs and samples should be unique.**\n\n#### 2. The structure file (or the composition file)\n\nA csv file with two columns: \"GlycanID\" and \"Structure\" (or \"Composition\").\nAn example file would be like:\n\n| GlycanID | Structure |\n|----------|-----------|\n| Glycan1  | RES...    |\n| Glycan2  | RES...    |\n| Glycan3  | RES...    |\n\nThe \"GlycanID\" column should contain all glycan IDs in the abundance file.\nThe \"Structure\" column should contain the structure strings of the glycans.\nFor now, only the GlycoCT format is supported.\nIn the \"composition\" mode, the second column should be \"Composition\" instead of \"Structure\",\nand the composition strings should be used instead of the structure strings.\nCondensed format (\"H3N4F1S1\") is supported.\n\nThe names of the abundance file and the structure (composition) file is not important, \ni.e., it doesn't have to be \"abundance.csv\" and \"structures.csv\". \nYou could use any name you want, as long as the two files are passed in order,\ne.g.: \n\n```shell\nglytrait experiment_1_15.csv serum_structures.csv\n#        ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~\n#        the abundance file   the structure file\n```\n\n### Specify the output path\n\nYou might have noticed before that GlyTrait saves the output file to the same directory \nas the abundance file with a \"_glytrait\" suffix.\nYou can specify the output file path by using the \"-o\" or \"--output-file\" option:\n\n```shell\nglytrait abundance.csv structure.csv -o output\n```\n\n### Preprocessing\n\nGlyTrait will carry out a preprocessing step before calculating derived traits.\nThe following steps will be done:\n\n- Remove glycans with missing values in more than a certain proportion of samples.\n- Impute missing values.\n- Perform Total Abundance Normalization.\n\nIn the glycan-filtering step, the proportion threshold could be specified by the \"-r\" or the\n\"--filter-glycan-ratio\" option.\nThe default value is 1, which means no glycan will be removed.\nYou can change this value to 0.5 by:\n\n```shell\nglytrait abundance.csv structure.csv -r 0.5\n```\n\nThe imputation method could be specified by the \"-i\" or the \"--impute-method\" option.\nThe default method is \"zero\",\nwhich means missing values will be imputed by 0.\nOther supported methods are \"mean\", \"median\", \"zero\", \"lod\".\nYou can change the imputation method to \"min\" by:\n\n```shell\nglytrait abundance.csv structure.csv -i min\n```\n\nA full list of supported imputation methods are:\n\n- \"min\": impute missing values by the minimum value of a glycan within all samples.\n- \"mean\": impute missing values by the mean value of a glycan within all samples.\n- \"median\": impute missing values by the median value of a glycan within all samples.\n- \"zero\": impute missing values by 0.\n- \"lod\": impute missing values by the limit of detection (LOD) of the equipment. The LOD of a\n  glycan is defined as the minimum value of the glycan within all samples divided by 5.\n\n### Sialic-acid-linkage traits\n\nSialic acids can have different linkages for N-glycans (e.g., \u03b12,3 and \u03b12,6).\nDifferent sialic acid linkage has different biological functions.\nGlyTrait supports calculating derived traits regarding these linkages.\nTo use this feature, you need to have siaic acid linkage information.\n\nIn the structure mode, the \"Structure\" column or the structure file should contain the linkage\ninformation.\nOnly linkage information about sialic acids is needed.\nThis can be easily done using GlycoWorkbench.\n\nIn the composition mode, the \"Composition\" column must contain the linkage information.\nGlyTrait uses a common notation for sialic acid with different linkages:\n\"E\" for a2,6-linked sialic acids, and \"L\" a2,3-linked sialic acids.\nFor example, \"H5N4F1E1L1\" contains two sialic acids, one is a2,6-linked, \nand the other is a2,3-linked.\n\nYou can use the \"-l\" or \"--sia-linkage\" option to include sialic-acid-linkage traits:\n\n```shell\nglytrait abundance.csv structure.csv -l\n```\n\nNote that if you use this option, all glycans with sialic acids should have linkage information.\nThat is to say, all structure strings should have structure information in the structure mode and\nno \"S\" in composition strings in the composition mode.\n\n### Post-Filtering\n\nNot all derived traits are informative.\nFor example, some traits might have the same value for all samples.\nSome traits might be highly correlated with others.\n\nGlyTrait carries out a two-step post-filtering process to remove these uninformative traits.\nFirst, traits with the same value for all samples will be removed.\nSecond, highly correlated traits will be pruned,\nkeeping only the traits considering more glycans.\n\nGlyTrait filters out highly correlated traits, using a \"trait family tree\" filtering method.\nBriefly, for two correlated traits, the \"parent\" trait,\nwhich normally considers more glycans, will be kept.\nFor example, for the two high-correlated traits: A2FG and A2G, the latter will be kept \nbecause it is more general, and more robust for considering more glycans.\nThanks to the dynamic \"trait family tree\" generated by GlyTrait,\nuser-defined traits will also be considered in this filtering process.\n\nBy default, GlyTrait only filters trarits with Pearson correlation coefficient of 1,\ni.e., traits with perfect collinearity.\nThis threshold can be changed by the \"-c\" or \"--corr-threshold\" option:\n\n```shell\nglytrait abundance.csv structure.csv -c 0.9\n```\n\nSetting the threshold to -1 will turn off the colinearity filtering. \nTo turn off postfiltering all together, use the \"--no-filtering\" option:\n\n```shell\nglytrait abundance.csv structure.csv --no-filtering\n```\n\n### Custom Formulas\n\nComing soon...\n\n(The functionallity has been implemented, but the documentation is not ready yet.\nAfter publishing the GlyTrait paper, we will update the documentation.)\n\n## Reproduction Instructions\n\nTo reproduce the results in our paper \n\"GlyTrait: A versatile bioinformatics tool for Glycomics Analysis\",\nplease download the Supplementary Data files.\nThere are three such files, corresponding to different datasets mentioned in the paper.\nEach Supplementary Data file has several sheets with the prefix of \"Input -\".\nThese are the input data for GlyTrait.\nPlease reformat the data according to the [Input file format](#input-file-format) section,\nand run GlyTrait with the same options as in the paper.\n\n## License\n\n[MIT License](LICENSE)\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool for calculating derived traits for N-glycome",
    "version": "0.1.7",
    "project_urls": {
        "Homepage": "https://github.com/fubin1999/glytrait",
        "Repository": "https://github.com/fubin1999/glytrait"
    },
    "split_keywords": [
        "bioinformatics",
        " glycomics",
        " biology"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "317d8e025c53e4ea74beb6b43c7d3007aca19098b4735a74b76b597c2549c8f8",
                "md5": "ce32fca6dd45bf2d0b6dda219ffde794",
                "sha256": "13cc7b5c618d9c1693a6100b1a6f8dc287c1fce1d84213b9bd46a36a073b1913"
            },
            "downloads": -1,
            "filename": "glytrait-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ce32fca6dd45bf2d0b6dda219ffde794",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 44162,
            "upload_time": "2024-05-22T04:51:14",
            "upload_time_iso_8601": "2024-05-22T04:51:14.689369Z",
            "url": "https://files.pythonhosted.org/packages/31/7d/8e025c53e4ea74beb6b43c7d3007aca19098b4735a74b76b597c2549c8f8/glytrait-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8cfa97f6d8097847c7b6c66eae8523604f5d07a8dd45b0738fc2dc2cf78e72a5",
                "md5": "973cd9d635936a05aa607a1be2fe6079",
                "sha256": "08ccb4c9dbbd528a0fa97a42e6cd128722ab151ab128c0d87d6b203b949704ef"
            },
            "downloads": -1,
            "filename": "glytrait-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "973cd9d635936a05aa607a1be2fe6079",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 44107,
            "upload_time": "2024-05-22T04:51:17",
            "upload_time_iso_8601": "2024-05-22T04:51:17.482787Z",
            "url": "https://files.pythonhosted.org/packages/8c/fa/97f6d8097847c7b6c66eae8523604f5d07a8dd45b0738fc2dc2cf78e72a5/glytrait-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-22 04:51:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fubin1999",
    "github_project": "glytrait",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "glytrait"
}
        
Elapsed time: 0.23429s