mclumi


* Name: mclumi
* Version: 0.0.4
* Home page: https://cribbslab.co.uk/
* Summary: UMI de-duplication using mclUMI
* Upload time: 2023-12-28 05:07:31
* Maintainer: Jianfeng Sun
* Author: Jianfeng Sun
* Requires Python: >=3.10,<4.0
* License: MIT
* Keywords: packaging, mclumi
* Requirements: none recorded

<h1 align="center">
    <img src="https://github.com/cribbslab/mclumi-dev/blob/main/img/mclumi-logo-trans.png?raw=true" width="200" height="124">
    <br>
</h1>

![](https://img.shields.io/badge/mclUMI-executable-519dd9.svg)
![](https://img.shields.io/badge/last_released-Oct._2021-green.svg)
![](https://img.shields.io/github/stars/cribbslab/mclumi?logo=GitHub&color=blue)
![](https://img.shields.io/pypi/v/mclumix?logo=PyPI)
[![Documentation Status](https://readthedocs.org/projects/mclumi/badge/?version=latest)](https://mclumi.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/mclumi)](https://pepy.tech/project/mclumi)

###### tags: `UMI deduplication` `PCR deduplication` `scRNA-seq` `bulk-RNA-seq`

## Overview
This repository hosts the mclUMI toolkit, which uses Markov clustering (MCL), a network-based algorithm, to precisely identify unique UMIs and thereby remove PCR duplicates. mclUMI constructs sub-graphs in which the UMI nodes are relatively strongly connected.
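
The idea behind MCL-based deduplication can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not mclUMI's actual implementation: it assumes UMIs of equal length, connects two UMIs when their Hamming distance is at most `edit_dist`, and alternates expansion and inflation until the matrix stops changing. The function names (`hamming`, `mcl_clusters`, and the helpers) are hypothetical; only the default values (inflation 2.0, expansion 2, 100 iterations) are taken from the CLI help.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def normalise_columns(m):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(umis, edit_dist=1, inflation=2.0, expansion=2, iterations=100):
    """Group UMIs with a plain Markov clustering loop (illustrative only)."""
    n = len(umis)
    if n == 0:
        return []
    # Adjacency matrix with self-loops; an edge joins UMIs within edit_dist.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if hamming(umis[i], umis[j]) <= edit_dist:
            m[i][j] = m[j][i] = 1.0
    m = normalise_columns(m)

    for _ in range(iterations):
        # Expansion: raise the matrix to the power `expansion`.
        expanded = m
        for _step in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: element-wise power, then re-normalise the columns.
        inflated = normalise_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < 1e-9:  # converged
            break

    # Rows with a positive diagonal act as attractors; the non-zero
    # entries of such a row are the members of its cluster.
    clusters = []
    for i in range(n):
        if m[i][i] > 1e-6:
            members = sorted(umis[j] for j in range(n) if m[i][j] > 1e-6)
            if members not in clusters:
                clusters.append(members)
    return clusters


umis = ["ACGT", "ACGA", "ACGC", "TTTT", "TTTA"]
clusters = mcl_clusters(umis)
print(clusters)       # [['ACGA', 'ACGC', 'ACGT'], ['TTTA', 'TTTT']]
print(len(clusters))  # 2 clusters, i.e. 2 deduplicated UMIs in this sketch
```

In this toy input the first three UMIs differ only at the last base and the last two differ by one base, so the graph splits into two strongly connected groups and each group is counted once.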

## Documentation
The API documentation of mclUMI is available at https://mclumi.herokuapp.com and https://mclumi.readthedocs.io/en/latest.

## System requirements
Linux or macOS

## Installation
We tested the software installation on a Linux system, which has the following configuration:
* Distributor ID: Ubuntu
* Description:    Ubuntu 20.04.3
* Release:        20.04
* Codename:       focal

Anaconda was configured as follows:
* Conda version: 4.11.0

> You can use `conda update conda` and `conda update anaconda` to keep Anaconda up to date.

We recommend **Python `3.9.1`** as the base Python for your conda environment, because `NumPy` and `Pandas` under Python versions higher than `3.9` may require dependencies that are not included in the mclUMI installation or may conflict with existing packages.

**Step 1**: Create a conda environment, e.g., `mclumi`.
  ```shell
  conda create --name mclumi python=3.9.1

  conda activate mclumi
  ```
  
  <h1>
      <img src="https://github.com/cribbslab/mclumi/blob/main/imgs/conda-setting.png?raw=true">
      <br>
  </h1>

**Step 2**: Install mclUMI from PyPI (https://pypi.org/project/mclumix).
  ```shell
  pip install --upgrade mclumix
  ```
After this two-step installation, you should see the following output.
  <h1>
      <img src="https://github.com/cribbslab/mclumi/blob/main/imgs/install.png?raw=true">
      <br>
  </h1>

## Usage
To make mclUMI accessible to different groups of users, it can be used both through a command-line interface (CLI) and in inline mode.

### 1. CLI
1.1 Parameter illustration

Type `mclumi -h` to see the package usage, as shown below.

```
usage: mclumi [-h] [--read_structure read_structure] [--lens lens]
              [--input input] [--output output] [--method method]
              [--input_bam input_bam] [--edit_dist edit dist]
              [--inflation_value inflation_value]
              [--expansion_value expansion_value]
              [--iteration_number iteration_number]
              [--mcl_fold_thres mcl_fold_thres] [--is_sv is_sv]
              [--output_bam output_bam] [--verbose verbose]
              [--pos_tag pos_tag] [--gene_assigned_tag gene_assigned_tag]
              [--gene_is_assigned_tag gene_is_assigned_tag]
              tool

Welcome to the mclumi toolkit

positional arguments:
  tool                  trim, dedup_basic, dedup_pos, dedup_gene, dedup_sc

optional arguments:
  -h, --help            show this help message and exit
  --read_structure read_structure, -rs read_structure
                        str - the read structure with elements in conjunction
                        with +, e.g., primer_1+umi_1+seq_1+umi_2+primer_2
  --lens lens, -l lens  str - lengths of all sub-structures separated by +,
                        e.g., 20+10+40+10+20 if the read structure is
                        primer_1+umi_1+seq_1+umi_2+primer_2
  --input input, -i input
                        str - input a fastq file in gz format for trimming
                        UMIs
  --output output, -o output
                        str - output a UMI-trimmed fastq file in gz format.
  --method method, -m method
                        str - a dedup method: unique | cluster | adjacency |
                        directional | mcl | mcl_ed | mcl_val
  --input_bam input_bam, -ibam input_bam
                        str - input a bam file curated by requirements of
                        different dedup modules: dedup_basic, dedup_pos,
                        dedup_gene, dedup_sc
  --edit_dist edit dist, -ed edit dist
                        int - an edit distance used for building graphs at a
                        range of [1, l) where l is the length of a UMI
  --inflation_value inflation_value, -infv inflation_value
                        float - an inflation value for MCL, 2.0 by default
  --expansion_value expansion_value, -expv expansion_value
                        int - an expansion value for MCL at a range of (1,
                        +inf), 2 by default
  --iteration_number iteration_number, -itern iteration_number
                        int - iteration number for MCL at a range of (1,
                        +inf), 100 by default
  --mcl_fold_thres mcl_fold_thres, -fthres mcl_fold_thres
                        float - a fold threshold for MCL at a range of (1, l)
                        where l is the length of a UMI.
  --is_sv is_sv, -issv is_sv
                        bool - to make sure if the deduplicated reads writes
                        to a bam file (True by default or False)
  --output_bam output_bam, -obam output_bam
                        str - output UMI-deduplicated summary statistics to a
                        txt file.
  --verbose verbose, -vb verbose
                        bool - to enable if output logs are on console (True
                        by default or False)
  --pos_tag pos_tag, -pt pos_tag
                        str - to enable deduplication on the position tags (PO
                        recommended when your bam is tagged)
  --gene_assigned_tag gene_assigned_tag, -gt gene_assigned_tag
                        str - to enable deduplication on the gene tag (XT
                        recommended)
  --gene_is_assigned_tag gene_is_assigned_tag, -gist gene_is_assigned_tag
                        str - to check if reads are assigned the gene tag (XS
                        recommended)
```
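
To make the relationship between `--read_structure` and `--lens` concrete, here is a minimal sketch of how a read could be cut into its named sub-structures. It is not taken from the mclUMI source: the helper names `split_read` and `extract_umi` are hypothetical, and concatenating `umi_1` and `umi_2` into a single UMI string is an assumption made only for illustration.

```python
def split_read(read, read_structure, lens):
    """Cut a read into named parts, given '+'-separated names and lengths."""
    names = read_structure.split("+")
    lengths = [int(x) for x in lens.split("+")]
    if len(names) != len(lengths):
        raise ValueError("read_structure and lens must have the same number of elements")
    if sum(lengths) > len(read):
        raise ValueError("read is shorter than the declared structure")
    parts = {}
    start = 0
    for name, length in zip(names, lengths):
        parts[name] = read[start:start + length]
        start += length
    return parts


def extract_umi(read, read_structure, lens):
    """Return (umi, sequence); joining several UMIs is an assumption."""
    parts = split_read(read, read_structure, lens)
    umi = "".join(seq for name, seq in parts.items() if name.startswith("umi"))
    seq = "".join(seq for name, seq in parts.items() if name.startswith("seq"))
    return umi, seq


# A synthetic 100-base read matching the example in the help text.
read = "T" * 20 + "A" * 10 + "G" * 40 + "C" * 10 + "T" * 20
umi, seq = extract_umi(read, "primer_1+umi_1+seq_1+umi_2+primer_2", "20+10+40+10+20")
print(umi)       # AAAAAAAAAACCCCCCCCCC
print(len(seq))  # 40
```

The lengths must be listed in the same order as the structure elements, which is why both options use `+` as the separator.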

1.2 Example commands

* extracting UMIs and attaching them to read names in FASTQ format
    ```
    mclumi trim -i ./pcr_1.fastq.gz -o ./pcr_trimmed.fastq.gz -rs primer_1+umi_1+seq_1+umi_2+primer_2 -l 20+10+40+10+20
    ```

* deduplication at a single genome position
    ```
    mclumi dedup_basic -m mcl -ed 1 -infv 1.6 -expv 2 -ibam ./example_bundle.bam -obam ./dedup.bam
    ```

* deduplication per genome position
    ```
    mclumi dedup_pos -m mcl -pt PO -ed 1 -infv 1.6 -expv 2 -ibam ./example_bundle.bam -obam ./basic/dedup.bam
    ```

* deduplication per gene (applicable to bulk RNA-seq data)
    ```
    mclumi dedup_gene -m directional -gt XT -gist XS -ed 1 -ibam ./hgmm_100_STAR_FC_sorted.bam -obam ./dedup.bam
    ```

* deduplication per cell per gene (applicable to single-cell RNA-seq data)
    ```
    mclumi dedup_sc -m directional -gt XT -gist XS -ed 1 -ibam ./hgmm_100_STAR_FC_sorted.bam -obam ./dedup.bam
    ```
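
When many BAM files need the same treatment, the commands above can be assembled from a script. The sketch below only builds an argument list from flags shown in the help text and hands it to `subprocess.run`; the helper names `build_dedup_command` and `run_dedup` are hypothetical and are not part of mclUMI.

```python
import subprocess

DEDUP_TOOLS = {"dedup_basic", "dedup_pos", "dedup_gene", "dedup_sc"}


def build_dedup_command(tool, input_bam, output_bam, method="mcl",
                        edit_dist=1, extra=None):
    """Assemble an mclumi argument list using flags from `mclumi -h`."""
    if tool not in DEDUP_TOOLS:
        raise ValueError(f"unknown dedup tool: {tool}")
    cmd = ["mclumi", tool, "-m", method, "-ed", str(edit_dist),
           "-ibam", str(input_bam), "-obam", str(output_bam)]
    # Optional flags such as -pt, -gt, -gist, -infv or -expv.
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    return cmd


def run_dedup(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


cmd = build_dedup_command(
    "dedup_pos", "./example_bundle.bam", "./dedup.bam",
    extra={"-pt": "PO", "-infv": 1.6, "-expv": 2},
)
print(" ".join(cmd))
```

Calling `run_dedup(cmd)` executes the assembled command and requires `mclumi` to be installed in the active environment.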

### 2. Inline

See the Jupyter notebooks in:
```
./notebooks/
```

## Output
See `./notebooks/results_spelt_out.ipynb` for the result format. More output formats will be added.

## Contact
Homepage: https://www.ndorms.ox.ac.uk/team/adam-cribbs  
            

Raw data

{
    "_id": null,
    "home_page": "https://cribbslab.co.uk/",
    "name": "mclumi",
    "maintainer": "Jianfeng Sun",
    "docs_url": null,
    "requires_python": ">=3.10,<4.0",
    "maintainer_email": "jianfeng.sun@ndorms.ox.ac.uk",
    "keywords": "packaging,mclumi",
    "author": "Jianfeng Sun",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/58/02/0bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329/mclumi-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "UMI de-duplication using mclUMI",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://cribbslab.co.uk/",
        "Repository": "https://github.com/cribbslab/mclumi"
    },
    "split_keywords": [
        "packaging",
        "mclumi"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aba027fd768d8b080456f1907b6c1a89f3a8710711720e35a96a3923c9fc8810",
                "md5": "8dc49fc0de09507996415d767f555c0b",
                "sha256": "f34ae1c1b65d7a75d5257c5d6b5da1886bd7d8df99da326f02d02e65e6b51f00"
            },
            "downloads": -1,
            "filename": "mclumi-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8dc49fc0de09507996415d767f555c0b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10,<4.0",
            "size": 85159,
            "upload_time": "2023-12-28T05:07:29",
            "upload_time_iso_8601": "2023-12-28T05:07:29.947697Z",
            "url": "https://files.pythonhosted.org/packages/ab/a0/27fd768d8b080456f1907b6c1a89f3a8710711720e35a96a3923c9fc8810/mclumi-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "58020bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329",
                "md5": "7d67f616f8b7349fed977349cae00ea1",
                "sha256": "bc3f4569cf8359b52e3c929f1e8c5e3be986c94afbc156fea44d6a529eadfa7e"
            },
            "downloads": -1,
            "filename": "mclumi-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "7d67f616f8b7349fed977349cae00ea1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10,<4.0",
            "size": 53876,
            "upload_time": "2023-12-28T05:07:31",
            "upload_time_iso_8601": "2023-12-28T05:07:31.134286Z",
            "url": "https://files.pythonhosted.org/packages/58/02/0bd072c7eed5f4cff6170695cadd411d9daa8743153562051bc0a90d5329/mclumi-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-28 05:07:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cribbslab",
    "github_project": "mclumi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "mclumi"
}
        