variant

| Field           | Value                                               |
| --------------- | --------------------------------------------------- |
| Name            | variant                                             |
| Version         | 0.0.87                                              |
| Home page       | https://github.com/yech1990/variant                 |
| Upload time     | 2024-01-10 08:22:33                                 |
| Author          | Chang Ye                                            |
| Requires Python | >=3.8,<4.0                                          |
| License         | MIT                                                 |
| Keywords        | bioinformatics, variant, mutation, rna modification |

# Python package for genomic variant analysis

[![Pypi Releases](https://img.shields.io/pypi/v/variant.svg)](https://pypi.python.org/pypi/variant)
[![Downloads](https://pepy.tech/badge/variant)](https://pepy.tech/project/variant)

# How to install?

```
pip install variant
```

# How to use?

## 🧬 The `variant motif` subcommand fetches the motif sequence around a given site

```
 Usage: variant motif [OPTIONS]

 Fetch genomic motif.

╭─ Options ─────────────────────────────────────────────────────────────────╮
│    --input        -i  TEXT  Input position file.                          │
│    --output       -o  TEXT  Output annotation file.                       │
│ *  --fasta        -f  TEXT  reference fasta file. [required]              │
│    --npad         -n  TEXT  Number of padding base to call motif. If you  │
│                             want to set different left and right pads,    │
│                             use comma to separate them. (eg. 2,3)         │
│    --with-header  -H        With header line in input file.               │
│    --columns      -c  TEXT  Sets columns for site info.                   │
│                             (Chrom,Pos,Strand)                            │
│                             [default: 1,2,3]                              │
│    --to-upper     -u        Convert motif to upper case.                  │
│    --wrap-site    -w        Wrap motif site.                              │
│    --help         -h        Show this message and exit.                   │
╰───────────────────────────────────────────────────────────────────────────╯
```

> demo:

Suppose you want the 2 bases before and the 3 bases after each given site, with the site itself wrapped in brackets and the strand information taken into account.

Use `-n 2,3 -w`.
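
To make the meaning of asymmetric padding and site wrapping concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the function name `fetch_motif`, the 1-based position convention, the `[ ]` bracket style, and the reverse-complement handling of the minus strand are all assumptions made for illustration.

```python
# Illustrative sketch only; `fetch_motif` is a hypothetical helper,
# not part of the `variant` package.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_motif(seq, pos, strand="+", left=2, right=3, wrap=False):
    """Return `left` bases before and `right` bases after a site.

    `pos` is assumed to be 1-based. On the minus strand, "before" and
    "after" are read along the reverse-complemented sequence.
    """
    i = pos - 1
    if strand == "-":
        # Upstream on the minus strand lies at higher forward coordinates.
        left, right = right, left
    start = max(i - left, 0)            # clip at the sequence start
    end = min(i + right + 1, len(seq))  # clip at the sequence end
    up, site, down = seq[start:i], seq[i], seq[i + 1:end]
    if strand == "-":
        up, site, down = revcomp(down), revcomp(site), revcomp(up)
    if wrap:
        site = f"[{site}]"
    return up + site + down


print(fetch_motif("AACCGGTTAC", 5, "+", left=2, right=3, wrap=True))  # CC[G]GTT
print(fetch_motif("AACCGGTTAC", 5, "-", left=2, right=3, wrap=True))  # AC[C]GGT
```

In this sketch the pads are swapped before slicing on the minus strand, so that "2 before, 3 after" still holds once the window has been reverse-complemented.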

## 🧫 The `variant effect` subcommand infers the effect of a mutation

```
 Usage: variant effect [OPTIONS]

 Annotation genomic variant effect.

╭─ Options ─────────────────────────────────────────────────────────────────╮
│ --input                 -i  TEXT     Input position file.                 │
│ --output                -o  TEXT     Output annotation file               │
│ --reference             -r  TEXT     reference species                    │
│ --reference-gtf             TEXT     Customized reference gtf file.       │
│ --reference-transcript      TEXT     Customized reference transcript      │
│                                      fasta file.                          │
│ --reference-protein         TEXT     Customized reference protein fasta   │
│                                      file.                                │
│ --release               -e  INTEGER  ensembl release                      │
│ --strandness            -s           Use strand infomation or not?        │
│ --pU-mode               -u           Make rRNA, tRNA, snoRNA into top     │
│                                      priority.                            │
│ --npad                  -n  INTEGER  Number of padding base to call       │
│                                      motif.                               │
│ --all-effects           -a           Output all effects.                  │
│ --with-header           -H           With header line in input file.      │
│ --columns               -c  TEXT     Sets columns for site info.          │
│                                      (Chrom,Pos,Strand,Ref,Alt)           │
│                                      [default: 1,2,3,4,5]                 │
│ --help                  -h           Show this message and exit.          │
╰───────────────────────────────────────────────────────────────────────────╯
```

> demo:

Store the following table in a file (`sites.tsv`).

| Chrom | Position  | Strand | Ref | Alt |
| ----- | --------- | ------ | --- | --- |
| chr1  | 230703034 | -      | C   | T   |
| chr12 | 69353439  | +      | A   | T   |
| chr14 | 23645352  | +      | G   | T   |
| chr2  | 215361150 | -      | A   | T   |
| chr2  | 84906537  | +      | C   | T   |
| chr22 | 39319077  | -      | T   | A   |
| chr22 | 39319095  | -      | T   | A   |
| chr22 | 39319098  | -      | T   | A   |
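
If you prefer to generate the file from a script, the sketch below writes the same rows with Python's standard `csv` module. It assumes the input is expected to be tab-separated with one header line, which the `.tsv` extension suggests but the help text does not state; `write_sites` is a hypothetical helper name.

```python
import csv

# The rows of the demo table above.
SITES = [
    ("chr1", 230703034, "-", "C", "T"),
    ("chr12", 69353439, "+", "A", "T"),
    ("chr14", 23645352, "+", "G", "T"),
    ("chr2", 215361150, "-", "A", "T"),
    ("chr2", 84906537, "+", "C", "T"),
    ("chr22", 39319077, "-", "T", "A"),
    ("chr22", 39319095, "-", "T", "A"),
    ("chr22", 39319098, "-", "T", "A"),
]


def write_sites(path, sites=SITES):
    """Write a header line plus one tab-separated row per site (assumed layout)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Chrom", "Position", "Strand", "Ref", "Alt"])
        writer.writerows(sites)
```

Calling `write_sites("sites.tsv")` produces a file with one header line and eight data rows.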

Run command:

```bash
variant effect -i sites.tsv -H -r human -e 108 -c 1,2,3
```

- `-i` specifies the input file.
- `-H` indicates that the input file has a header line, so the first row is skipped.
- `-r` selects the reference species; the default is human.
- `-e` specifies the Ensembl release version.
- `-c` selects which columns of the input file hold the site information; by default the first 5 columns are used.

The output looks like this:

| Chrom | Position  | Strand | Ref | Alt | mut_type      | gene_type      | gene_name               | gene_pos | transcript_name             | transcript_pos | transcript_motif      | coding_pos | codon_ref | aa_pos | aa_ref | distance2splice |
| :---- | :-------- | :----- | :-- | :-- | :------------ | :------------- | :---------------------- | :------- | :-------------------------- | :------------- | :-------------------- | :--------- | :-------- | :----- | :----- | --------------- |
| chr1  | 230703034 | -      | C   | T   | ThreePrimeUTR | protein_coding | ENSG00000135744(AGT)    | 42543    | ENST00000680041(AGT-208)    | 1753           | TGTGTCACCCCCAGTCTCCCA | None       | None      | None   | None   | 295             |
| chr12 | 69353439  | +      | A   | T   | ThreePrimeUTR | protein_coding | ENSG00000090382(LYZ)    | 5059     | ENST00000261267(LYZ-201)    | 695            | TAGAACTAATACTGGTGAAAA | None       | None      | None   | None   | 286             |
| chr14 | 23645352  | +      | G   | T   | ThreePrimeUTR | protein_coding | ENSG00000100867(DHRS2)  | 15238    | ENST00000344777(DHRS2-202)  | 1391           | CTGCCATTCTGCCAGACTAGC | None       | None      | None   | None   | 210             |
| chr2  | 215361150 | -      | A   | T   | ThreePrimeUTR | protein_coding | ENSG00000115414(FN1)    | 74924    | ENST00000323926(FN1-201)    | 8012           | GGCCCGCAATACTGTAGGAAC | None       | None      | None   | None   | 476             |
| chr2  | 84906537  | +      | C   | T   | ThreePrimeUTR | protein_coding | ENSG00000034510(TMSB10) | 882      | ENST00000233143(TMSB10-201) | 327            | CCTGGGCACTCCGCGCCGATG | None       | None      | None   | None   | 148             |
| chr22 | 39319077  | -      | T   | A   | Intronic      | protein_coding | ENSG00000100316(RPL3)   | 1313     | ENST00000216146(RPL3-201)   | None           | None                  | None       | None      | None   | None   | None            |
| chr22 | 39319095  | -      | T   | A   | Intronic      | protein_coding | ENSG00000100316(RPL3)   | 1295     | ENST00000216146(RPL3-201)   | None           | None                  | None       | None      | None   | None   | None            |
| chr22 | 39319098  | -      | T   | A   | Intronic      | protein_coding | ENSG00000100316(RPL3)   | 1292     | ENST00000216146(RPL3-201)   | None           | None                  | None       | None      | None   | None   | None            |
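
As a rough sketch of how such an annotation table might be summarized afterwards, the snippet below counts sites per `mut_type`. It assumes the output file is tab-separated with the header shown above; the README displays it as a rendered table, so the on-disk format is a guess here. The embedded text is a three-row, five-column excerpt of the table, and `count_mut_types` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Excerpt of the output table above, re-typed as tab-separated text
# (assumed file format, reduced to a few columns for brevity).
OUTPUT_TSV = (
    "Chrom\tPosition\tStrand\tmut_type\tgene_name\n"
    "chr1\t230703034\t-\tThreePrimeUTR\tENSG00000135744(AGT)\n"
    "chr12\t69353439\t+\tThreePrimeUTR\tENSG00000090382(LYZ)\n"
    "chr22\t39319077\t-\tIntronic\tENSG00000100316(RPL3)\n"
)


def count_mut_types(handle):
    """Count rows per `mut_type` in a tab-separated annotation table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["mut_type"] for row in reader)


counts = count_mut_types(io.StringIO(OUTPUT_TSV))
print(counts.most_common())
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.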

## 🧫 The `variant coordinate` subcommand maps chromosome names and positions between different reference coordinate systems

```
 Usage: variant coordinate [OPTIONS]

 Fetch genomic motif.

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --input              -i  TEXT  Input position file.                         │
│ --output             -o  TEXT  Output annotation file.                      │
│ --reference-mapping  -m  TEXT  Mapping file for chrom name, first column is │
│                                chrom in the input, second column is chrom   │
│                                in the reference db (sep by tab)             │
│ --buildin-mapping    -M  TEXT  Build-in mapping for chrom name: U2E (UCSC   │
│                                to Ensembl), E2U (Ensembl to UCSC)           │
│ --with-header        -H        With header line in input file.              │
│ --columns            -c  TEXT  Sets columns for site info. (Chrom)          │
│                                [default: 1]                                 │
│ --help               -h        Show this message and exit.                  │
╰─────────────────────────────────────────────────────────────────────────────╯

```
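
The `--reference-mapping` file is described above as two tab-separated columns: the chromosome name in the input, then the name in the reference database. The sketch below shows one way such a file could be parsed and applied. It is not the package's code: `load_mapping` and `rename_chrom` are hypothetical names, and passing unknown names through unchanged is a choice made for this example only.

```python
def load_mapping(lines):
    """Parse 'input_chrom<TAB>reference_chrom' lines into a dict."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        source, target = line.split("\t")[:2]
        mapping[source] = target
    return mapping


def rename_chrom(chrom, mapping):
    """Translate a chromosome name; unknown names are returned unchanged."""
    return mapping.get(chrom, chrom)


# A few UCSC-style to Ensembl-style pairs, as they might appear in a mapping file.
example_lines = ["chr1\t1\n", "chrX\tX\n", "chrM\tMT\n"]
mapping = load_mapping(example_lines)
print(rename_chrom("chr1", mapping))
print(rename_chrom("chrM", mapping))
```

For a file on disk, `load_mapping(open("mapping.tsv"))` works the same way, since an open text file yields lines.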

## ⏳⏳⏳ more functions will be supported in the future

## TODO:

- Improve speed, possibly by building on [cgranges](https://github.com/lh3/cgranges), [pyranges](https://github.com/biocore-ntnu/pyranges), or [BioCantor](https://github.com/InscriptaLabs/BioCantor).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yech1990/variant",
    "name": "variant",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "bioinformatics,variant,mutation,RNA modification",
    "author": "Chang Ye",
    "author_email": "yech1990@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e1/89/1c266fbc7c59eb0216db77a1846902c02017e78e491e04e42ed3fba001a3/variant-0.0.87.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "",
    "version": "0.0.87",
    "project_urls": {
        "Homepage": "https://github.com/yech1990/variant",
        "Repository": "https://github.com/yech1990/variant"
    },
    "split_keywords": [
        "bioinformatics",
        "variant",
        "mutation",
        "rna modification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9abe9287dc53e4bd3d2a7eee8aec6cf9cbcbfabfa71bb89e6ad3d3f2f5498229",
                "md5": "f15afc00640d80a1c045f6321023d154",
                "sha256": "7efa50d15ddbe2a3a870e4e0d7ef465cad878c0ef179c700ddf30587b244fac4"
            },
            "downloads": -1,
            "filename": "variant-0.0.87-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f15afc00640d80a1c045f6321023d154",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 15463,
            "upload_time": "2024-01-10T08:22:31",
            "upload_time_iso_8601": "2024-01-10T08:22:31.632602Z",
            "url": "https://files.pythonhosted.org/packages/9a/be/9287dc53e4bd3d2a7eee8aec6cf9cbcbfabfa71bb89e6ad3d3f2f5498229/variant-0.0.87-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e1891c266fbc7c59eb0216db77a1846902c02017e78e491e04e42ed3fba001a3",
                "md5": "93c230b0a2610c97cd9abb30be916631",
                "sha256": "96cc849bf0f781fb8c25ead9b58e81fa0ae890b0b1be4895abbb22cb46ab0373"
            },
            "downloads": -1,
            "filename": "variant-0.0.87.tar.gz",
            "has_sig": false,
            "md5_digest": "93c230b0a2610c97cd9abb30be916631",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 14267,
            "upload_time": "2024-01-10T08:22:33",
            "upload_time_iso_8601": "2024-01-10T08:22:33.116834Z",
            "url": "https://files.pythonhosted.org/packages/e1/89/1c266fbc7c59eb0216db77a1846902c02017e78e491e04e42ed3fba001a3/variant-0.0.87.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-10 08:22:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yech1990",
    "github_project": "variant",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "variant"
}
        