<h1 align="center">
<img src="https://github.com/cribbslab/mclumi-dev/blob/main/img/mclumi-logo-trans.png?raw=true" width="200" height="124">
<br>
</h1>
![](https://img.shields.io/badge/mclUMI-executable-519dd9.svg)
![](https://img.shields.io/badge/last_released-Oct._2021-green.svg)
![](https://img.shields.io/github/stars/cribbslab/mclumi?logo=GitHub&color=blue)
![](https://img.shields.io/pypi/v/mclumix?logo=PyPI)
[![Documentation Status](https://readthedocs.org/projects/mclumi/badge/?version=latest)](https://mclumi.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/mclumi)](https://pepy.tech/project/mclumi)
###### tags: `UMI deduplication` `PCR deduplication` `scRNA-seq` `bulk-RNA-seq`
## Overview
This repository hosts mclUMI, a toolkit built on Markov clustering (MCL) network-based algorithms for precisely identifying unique UMIs and thereby removing PCR duplicates. mclUMI constructs sub-graphs in which the UMI nodes are relatively strongly connected.
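
To make the graph idea concrete, the sketch below links UMIs whose sequences differ by at most a given number of bases and then groups them into connected components. It is an illustration only, not mclUMI's implementation: the function names are hypothetical, Hamming distance is assumed as the edit distance, the UMIs are made up, and plain connected components stand in for the MCL step.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def build_umi_graph(umis, edit_dist=1):
    """Return adjacency sets linking UMIs within `edit_dist` mismatches."""
    graph = {umi: set() for umi in umis}
    for u, v in combinations(graph, 2):
        if hamming(u, v) <= edit_dist:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group nodes that can reach one another (depth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy UMIs, made up for illustration (hypothetical data)
umis = ["AAAA", "AAAT", "AATT", "GGGG"]
graph = build_umi_graph(umis, edit_dist=1)
print(connected_components(graph))
```

In this toy input the first three UMIs chain together through single-base differences and form one component, while `GGGG` stays on its own.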
## Documentation
The API documentation of mclUMI is available at https://mclumi.herokuapp.com and https://mclumi.readthedocs.io/en/latest.
## System requirements
Linux or macOS
## Installation
We tested the software installation on a Linux system, which has the following configuration:
* Distributor ID: Ubuntu
* Description: Ubuntu 20.04.3
* Release: 20.04
* Codename: focal
Anaconda is configured as follows:
* Conda version: 4.11.0
> You can use `conda update conda` and `conda update anaconda` to keep your Anaconda installation up to date.

We recommend `Python` **`3.9.1`** as the base interpreter for your conda environment, because `NumPy` and `Pandas` under `Python` versions later than `3.9` may require dependencies that are not included in the mclUMI installation or may conflict with existing packages.
**Step 1**: create a conda environment, e.g., mclumi
```
conda create --name mclumi python=3.9.1
conda activate mclumi
```
<h1>
<img src="https://github.com/cribbslab/mclumi/blob/main/imgs/conda-setting.png?raw=true">
<br>
</h1>
**Step 2**: install mclUMI from PyPI (https://pypi.org/project/mclumix).
```
pip install --upgrade mclumix
```
After completing these two steps, you should see the following output.
<h1>
<img src="https://github.com/cribbslab/mclumi/blob/main/imgs/install.png?raw=true">
<br>
</h1>
## Usage
To make mclUMI accessible to different groups of users, it can be used both from the command-line interface (CLI) and in inline mode.
### 1. CLI
1.1 Parameter illustration
Run `mclumi -h` to display the package usage, as shown below.
```
usage: mclumi [-h] [--read_structure read_structure] [--lens lens]
[--input input] [--output output] [--method method]
[--input_bam input_bam] [--edit_dist edit dist]
[--inflation_value inflation_value]
[--expansion_value expansion_value]
[--iteration_number iteration_number]
[--mcl_fold_thres mcl_fold_thres] [--is_sv is_sv]
[--output_bam output_bam] [--verbose verbose]
[--pos_tag pos_tag] [--gene_assigned_tag gene_assigned_tag]
[--gene_is_assigned_tag gene_is_assigned_tag]
tool
Welcome to the mclumi toolkit
positional arguments:
tool trim, dedup_basic, dedup_pos, dedup_gene, dedup_sc
optional arguments:
-h, --help show this help message and exit
--read_structure read_structure, -rs read_structure
str - the read structure with elements in conjunction
with +, e.g., primer_1+umi_1+seq_1+umi_2+primer_2
--lens lens, -l lens str - lengths of all sub-structures separated by +,
e.g., 20+10+40+10+20 if the read structure is
primer_1+umi_1+seq_1+umi_2+primer_2
--input input, -i input
str - input a fastq file in gz format for trimming
UMIs
--output output, -o output
str - output a UMI-trimmed fastq file in gz format.
--method method, -m method
str - a dedup method: unique | cluster | adjacency |
directional | mcl | mcl_ed | mcl_val
--input_bam input_bam, -ibam input_bam
str - input a bam file curated by requirements of
different dedup modules: dedup_basic, dedup_pos,
dedup_gene, dedup_sc
--edit_dist edit dist, -ed edit dist
int - an edit distance used for building graphs at a
range of [1, l) where l is the length of a UMI
--inflation_value inflation_value, -infv inflation_value
float - an inflation value for MCL, 2.0 by default
--expansion_value expansion_value, -expv expansion_value
int - an expansion value for MCL at a range of (1,
+inf), 2 by default
--iteration_number iteration_number, -itern iteration_number
int - iteration number for MCL at a range of (1,
+inf), 100 by default
--mcl_fold_thres mcl_fold_thres, -fthres mcl_fold_thres
float - a fold threshold for MCL at a range of (1, l)
where l is the length of a UMI.
--is_sv is_sv, -issv is_sv
bool - to make sure if the deduplicated reads writes
to a bam file (True by default or False)
--output_bam output_bam, -obam output_bam
str - output UMI-deduplicated summary statistics to a
txt file.
--verbose verbose, -vb verbose
bool - to enable if output logs are on console (True
by default or False)
--pos_tag pos_tag, -pt pos_tag
str - to enable deduplication on the position tags (PO
recommended when your bam is tagged)
--gene_assigned_tag gene_assigned_tag, -gt gene_assigned_tag
str - to enable deduplication on the gene tag (XT
recommended)
--gene_is_assigned_tag gene_is_assigned_tag, -gist gene_is_assigned_tag
str - to check if reads are assigned the gene tag (XS
recommended)
```
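
The `--expansion_value`, `--inflation_value` and `--iteration_number` options correspond to the standard knobs of Markov clustering. The sketch below is a minimal, pure-Python version of the generic MCL loop, written to show what those knobs do. It is not mclUMI's code: the function names, the convergence tolerance `tol`, the self-loop step and the way clusters are read out of the final matrix are all assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mcl(adjacency, inflation=2.0, expansion=2, iterations=100, tol=1e-9):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into transition probabilities.
    m = [[1.0 if i == j else float(adjacency[i][j]) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        previous = m
        # Expansion: raise the matrix to the `expansion`-th power.
        expanded = m
        for _step in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: element-wise power, then re-normalize the columns.
        m = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        # Stop early once the matrix no longer changes (assumed criterion).
        if all(abs(m[i][j] - previous[i][j]) < tol
               for i in range(n) for j in range(n)):
            break
    # Each row that kept non-zero mass lists the members of one cluster.
    clusters = {
        frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)
    }
    return sorted(sorted(c) for c in clusters if c)


# Toy graph (hypothetical): nodes 0-2 form a triangle, nodes 3-4 form a pair.
adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(adjacency, inflation=1.6, expansion=2))
```

On this toy graph the two disconnected groups come out as two clusters. In general, a higher inflation value sharpens the contrast between strong and weak connections and tends to produce more, smaller clusters.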
1.2 Example commands
* extracting UMIs and attaching them to the names of reads in FASTQ format
```
mclumi trim -i ./pcr_1.fastq.gz -o ./pcr_trimmed.fastq.gz -rs primer_1+umi_1+seq_1+umi_2+primer_2 -l 20+10+40+10+20
```
* deduplication at a single genome position
```
mclumi dedup_basic -m mcl -ed 1 -infv 1.6 -expv 2 -ibam ./example_bundle.bam -obam ./dedup.bam
```
* deduplication per genome position
```
mclumi dedup_pos -m mcl -pt PO -ed 1 -infv 1.6 -expv 2 -ibam ./example_bundle.bam -obam ./basic/dedup.bam
```
* deduplication per gene (applicable to bulk RNA-seq data)
```
mclumi dedup_gene -m directional -gt XT -gist XS -ed 1 -ibam ./hgmm_100_STAR_FC_sorted.bam -obam ./dedup.bam
```
* deduplication per cell per gene (applicable to single-cell RNA-seq data)
```
mclumi dedup_sc -m directional -gt XT -gist XS -ed 1 -ibam ./hgmm_100_STAR_FC_sorted.bam -obam ./dedup.bam
```
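
The `-rs` and `-l` arguments of the `trim` example describe a read as named segments with fixed lengths. The sketch below shows one way such a pair of strings can be interpreted to slice a read. It is a hypothetical helper, not mclUMI's parser: `split_read` is an invented name, the read is synthetic, and joining `umi_1` and `umi_2` into one UMI string is an assumption.

```python
def split_read(sequence, read_structure, lens):
    """Slice `sequence` into named parts using '+'-separated names and lengths."""
    names = read_structure.split("+")
    lengths = [int(x) for x in lens.split("+")]
    if len(names) != len(lengths):
        raise ValueError("read_structure and lens must have the same number of parts")
    if sum(lengths) > len(sequence):
        raise ValueError("read is shorter than the declared structure")
    parts, start = {}, 0
    for name, length in zip(names, lengths):
        parts[name] = sequence[start:start + length]
        start += length
    return parts


# Synthetic 100-base read matching the lengths 20+10+40+10+20 (made up)
read = "C" * 20 + "ACGTACGTAC" + "G" * 40 + "TTTTTAAAAA" + "C" * 20
parts = split_read(read, "primer_1+umi_1+seq_1+umi_2+primer_2", "20+10+40+10+20")

# Assumption: the two UMI segments are joined into a single UMI string.
umi = parts["umi_1"] + parts["umi_2"]
print(umi, len(parts["seq_1"]))
```

The length check is there so that a mismatch between the declared structure and the actual read raises an error instead of silently producing truncated segments.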
### 2. Inline
See the Jupyter notebooks in:
```
./notebooks/
```
## Output
See `./notebooks/results_spelt_out.ipynb` for the result format. More output formats will be added.
## Contact
Homepage: https://www.ndorms.ox.ac.uk/team/adam-cribbs
Raw data
{
"_id": null,
"home_page": "https://cribbslab.co.uk/",
"name": "mclumi",
"maintainer": "Jianfeng Sun",
"docs_url": null,
"requires_python": ">=3.10,<4.0",
"maintainer_email": "jianfeng.sun@ndorms.ox.ac.uk",
"keywords": "packaging,mclumi",
"author": "Jianfeng Sun",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/58/02/0bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329/mclumi-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "UMI de-duplication using mclUMI",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://cribbslab.co.uk/",
"Repository": "https://github.com/cribbslab/mclumi"
},
"split_keywords": [
"packaging",
"mclumi"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aba027fd768d8b080456f1907b6c1a89f3a8710711720e35a96a3923c9fc8810",
"md5": "8dc49fc0de09507996415d767f555c0b",
"sha256": "f34ae1c1b65d7a75d5257c5d6b5da1886bd7d8df99da326f02d02e65e6b51f00"
},
"downloads": -1,
"filename": "mclumi-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8dc49fc0de09507996415d767f555c0b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10,<4.0",
"size": 85159,
"upload_time": "2023-12-28T05:07:29",
"upload_time_iso_8601": "2023-12-28T05:07:29.947697Z",
"url": "https://files.pythonhosted.org/packages/ab/a0/27fd768d8b080456f1907b6c1a89f3a8710711720e35a96a3923c9fc8810/mclumi-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "58020bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329",
"md5": "7d67f616f8b7349fed977349cae00ea1",
"sha256": "bc3f4569cf8359b52e3c929f1e8c5e3be986c94afbc156fea44d6a529eadfa7e"
},
"downloads": -1,
"filename": "mclumi-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "7d67f616f8b7349fed977349cae00ea1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10,<4.0",
"size": 53876,
"upload_time": "2023-12-28T05:07:31",
"upload_time_iso_8601": "2023-12-28T05:07:31.134286Z",
"url": "https://files.pythonhosted.org/packages/58/02/0bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329/mclumi-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-28 05:07:31",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cribbslab",
"github_project": "mclumi",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "mclumi"
}