# Python package for genomic variant analysis
[![PyPI Releases](https://img.shields.io/pypi/v/variant.svg)](https://pypi.python.org/pypi/variant)
[![Downloads](https://pepy.tech/badge/variant)](https://pepy.tech/project/variant)
# How to install?
```
pip install variant
```
# How to use?
## 🧬 The `variant motif` subcommand fetches the motif sequence around a given site
```
Usage: variant motif [OPTIONS]
Fetch genomic motif.
╭─ Options ─────────────────────────────────────────────────────────────────╮
│ --input -i TEXT Input position file. │
│ --output -o TEXT Output annotation file. │
│ * --fasta -f TEXT reference fasta file. [required] │
│ --npad -n TEXT Number of padding base to call motif. If you │
│ want to set different left and right pads, │
│ use comma to separate them. (eg. 2,3) │
│ --with-header -H With header line in input file. │
│ --columns -c TEXT Sets columns for site info. │
│ (Chrom,Pos,Strand) │
│ [default: 1,2,3] │
│ --to-upper -u Convert motif to upper case. │
│ --wrap-site -w Wrap motif site. │
│ --help -h Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────╯
```
> demo:
Suppose you want the 2 bases before and the 3 bases after each given site, with the site itself wrapped in brackets and the strand information taken into account.
Use `-n 2,3 -w`.
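To make the intended result concrete, here is a minimal Python sketch of strand-aware motif extraction. It is not the package's implementation: `fetch_motif` and `reverse_complement` are hypothetical helpers, and the 1-based position and the square brackets used for wrapping are assumptions.

```python
# Hypothetical helpers for illustration; not part of the `variant` package.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_motif(chrom_seq: str, pos: int, strand: str = "+",
                left: int = 2, right: int = 3, wrap: bool = True) -> str:
    """Return `left` bases before and `right` bases after a site.

    `pos` is assumed to be 1-based on the forward strand. For a site on the
    minus strand, "before" and "after" are read along the minus strand.
    """
    i = pos - 1  # convert to a 0-based index
    if strand == "-":
        # Upstream on the minus strand lies to the right on the forward strand.
        before = reverse_complement(chrom_seq[i + 1:i + 1 + left])
        site = reverse_complement(chrom_seq[i])
        after = reverse_complement(chrom_seq[max(i - right, 0):i])
    else:
        before = chrom_seq[max(i - left, 0):i]
        site = chrom_seq[i]
        after = chrom_seq[i + 1:i + 1 + right]
    if wrap:
        site = f"[{site}]"  # assumed bracket style
    return before + site + after


print(fetch_motif("AACCGGTTAC", 5, "+"))  # CC[G]GTT
print(fetch_motif("AACCGGTTAC", 5, "-"))  # AC[C]GGT
```

The sketch simply truncates the motif near sequence ends; the real tool may handle that case differently.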
## 🧫 The `variant effect` subcommand infers the effect of a mutation
```
Usage: variant effect [OPTIONS]
Annotation genomic variant effect.
╭─ Options ─────────────────────────────────────────────────────────────────╮
│ --input -i TEXT Input position file. │
│ --output -o TEXT Output annotation file │
│ --reference -r TEXT reference species │
│ --reference-gtf TEXT Customized reference gtf file. │
│ --reference-transcript TEXT Customized reference transcript │
│ fasta file. │
│ --reference-protein TEXT Customized reference protein fasta │
│ file. │
│ --release -e INTEGER ensembl release │
│ --strandness -s Use strand infomation or not? │
│ --pU-mode -u Make rRNA, tRNA, snoRNA into top │
│ priority. │
│ --npad -n INTEGER Number of padding base to call │
│ motif. │
│ --all-effects -a Output all effects. │
│ --with-header -H With header line in input file. │
│ --columns -c TEXT Sets columns for site info. │
│ (Chrom,Pos,Strand,Ref,Alt) │
│ [default: 1,2,3,4,5] │
│ --help -h Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────╯
```
> demo:
Store the following table in a file (`sites.tsv`).
| Chrom | Position | Strand | Ref | Alt |
| ----- | --------- | ------ | --- | --- |
| chr1 | 230703034 | - | C | T |
| chr12 | 69353439 | + | A | T |
| chr14 | 23645352 | + | G | T |
| chr2 | 215361150 | - | A | T |
| chr2 | 84906537 | + | C | T |
| chr22 | 39319077 | - | T | A |
| chr22 | 39319095 | - | T | A |
| chr22 | 39319098 | - | T | A |
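If you prefer to create the file programmatically, the following Python sketch writes the same rows. It assumes the tool expects tab-separated columns (suggested by the `.tsv` extension, not confirmed here); `write_sites` is a hypothetical helper, not part of `variant`.

```python
import csv

HEADER = ["Chrom", "Position", "Strand", "Ref", "Alt"]
SITES = [
    ("chr1", 230703034, "-", "C", "T"),
    ("chr12", 69353439, "+", "A", "T"),
    ("chr14", 23645352, "+", "G", "T"),
    ("chr2", 215361150, "-", "A", "T"),
    ("chr2", 84906537, "+", "C", "T"),
    ("chr22", 39319077, "-", "T", "A"),
    ("chr22", 39319095, "-", "T", "A"),
    ("chr22", 39319098, "-", "T", "A"),
]


def write_sites(path, rows):
    """Write a header line plus one tab-separated line per site."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)


if __name__ == "__main__":
    write_sites("sites.tsv", SITES)
```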
Run command:
```bash
variant effect -i sites.tsv -H -r human -e 108 -c 1,2,3
```
- `-i` specifies the input file.
- `-H` indicates that the input file has a header line, so the first row is skipped.
- `-r` selects the reference genome; the default is human.
- `-e` specifies the Ensembl release version.
- `-c` selects which columns of the input file to use; by default, the first 5 columns are used.
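The combined effect of `-H` and `-c` can be sketched in Python as follows. This is an illustration only, assuming 1-based column numbers and tab-separated input; `read_sites` is a hypothetical function and does not reflect how `variant` parses its input internally.

```python
import csv
import io


def read_sites(handle, columns="1,2,3,4,5", with_header=False):
    """Yield the selected columns of each row as a tuple of strings.

    `columns` mirrors the `-c` option (assumed 1-based, comma-separated);
    `with_header` mirrors `-H` and skips the first row.
    """
    indices = [int(c) - 1 for c in columns.split(",")]
    reader = csv.reader(handle, delimiter="\t")
    if with_header:
        next(reader, None)  # drop the header line
    for row in reader:
        if row:  # ignore blank lines
            yield tuple(row[i] for i in indices)


example = "Chrom\tPosition\tStrand\tRef\tAlt\nchr1\t230703034\t-\tC\tT\n"
print(list(read_sites(io.StringIO(example), columns="1,2,3", with_header=True)))
# [('chr1', '230703034', '-')]
```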
You will get the following output:
| Chrom | Position | Strand | Ref | Alt | mut_type | gene_type | gene_name | gene_pos | transcript_name | transcript_pos | transcript_motif | coding_pos | codon_ref | aa_pos | aa_ref | distance2splice |
| :---- | :-------- | :----- | :-- | :-- | :------------ | :------------- | :---------------------- | :------- | :-------------------------- | :------------- | :-------------------- | :--------- | :-------- | :----- | :----- | --------------- |
| chr1 | 230703034 | - | C | T | ThreePrimeUTR | protein_coding | ENSG00000135744(AGT) | 42543 | ENST00000680041(AGT-208) | 1753 | TGTGTCACCCCCAGTCTCCCA | None | None | None | None | 295 |
| chr12 | 69353439 | + | A | T | ThreePrimeUTR | protein_coding | ENSG00000090382(LYZ) | 5059 | ENST00000261267(LYZ-201) | 695 | TAGAACTAATACTGGTGAAAA | None | None | None | None | 286 |
| chr14 | 23645352 | + | G | T | ThreePrimeUTR | protein_coding | ENSG00000100867(DHRS2) | 15238 | ENST00000344777(DHRS2-202) | 1391 | CTGCCATTCTGCCAGACTAGC | None | None | None | None | 210 |
| chr2 | 215361150 | - | A | T | ThreePrimeUTR | protein_coding | ENSG00000115414(FN1) | 74924 | ENST00000323926(FN1-201) | 8012 | GGCCCGCAATACTGTAGGAAC | None | None | None | None | 476 |
| chr2 | 84906537 | + | C | T | ThreePrimeUTR | protein_coding | ENSG00000034510(TMSB10) | 882 | ENST00000233143(TMSB10-201) | 327 | CCTGGGCACTCCGCGCCGATG | None | None | None | None | 148 |
| chr22 | 39319077 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1313 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
| chr22 | 39319095 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1295 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
| chr22 | 39319098 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1292 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
## 🧫 The `variant coordinate` subcommand maps chromosome names and positions between different reference coordinate systems
```
Usage: variant coordinate [OPTIONS]
Fetch genomic motif.
╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --input -i TEXT Input position file. │
│ --output -o TEXT Output annotation file. │
│ --reference-mapping -m TEXT Mapping file for chrom name, first column is │
│ chrom in the input, second column is chrom │
│ in the reference db (sep by tab) │
│ --buildin-mapping -M TEXT Build-in mapping for chrom name: U2E (UCSC │
│ to Ensembl), E2U (Ensembl to UCSC) │
│ --with-header -H With header line in input file. │
│ --columns -c TEXT Sets columns for site info. (Chrom) │
│ [default: 1] │
│ --help -h Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────╯
```
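As a rough illustration of chromosome-name mapping, the Python sketch below loads a two-column, tab-separated mapping file of the kind described for `--reference-mapping` and falls back to a simplified UCSC-to-Ensembl rule. All function names are hypothetical, and the fallback rule (strip the `chr` prefix, rename `chrM` to `MT`) is an assumption that ignores scaffolds and alternative contigs; the built-in `U2E` table may differ.

```python
import csv
import io


def load_mapping(handle):
    """Read a tab-separated file: input chrom name -> reference chrom name."""
    return {
        row[0]: row[1]
        for row in csv.reader(handle, delimiter="\t")
        if len(row) >= 2
    }


def ucsc_to_ensembl(chrom):
    """Simplified UCSC -> Ensembl rule (assumption, primary chromosomes only)."""
    if chrom == "chrM":
        return "MT"
    return chrom[3:] if chrom.startswith("chr") else chrom


def map_chrom(chrom, mapping=None):
    """Use the custom mapping when given, otherwise the simplified rule.

    Names missing from a custom mapping are returned unchanged, which is a
    design choice of this sketch.
    """
    if mapping is not None:
        return mapping.get(chrom, chrom)
    return ucsc_to_ensembl(chrom)


custom = load_mapping(io.StringIO("chr1\t1\nchrM\tMT\n"))
print(map_chrom("chr1", custom))  # 1
print(map_chrom("chr22"))         # 22
```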
## ⏳⏳⏳ More functions will be supported in the future
## TODO:
- Improve speed. Base it on [cgranges](https://github.com/lh3/cgranges), [pyranges](https://github.com/biocore-ntnu/pyranges), or [BioCantor](https://github.com/InscriptaLabs/BioCantor)?
Raw data
{
"_id": null,
"home_page": "https://github.com/yech1990/variant",
"name": "variant",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "bioinformatics, variant, mutation, RNA modification",
"author": "Chang Ye",
"author_email": "yech1990@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b3/1b/c7d5d1f495d4d511cc3faf7d25fdf593b1d097ecadf001cb1f5ae3563606/variant-0.0.94.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": null,
"version": "0.0.94",
"project_urls": {
"Homepage": "https://github.com/yech1990/variant",
"Repository": "https://github.com/yech1990/variant"
},
"split_keywords": [
"bioinformatics",
" variant",
" mutation",
" rna modification"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8f93fbe8e4e076d5c5fd0437721bd6a167c46055513710597d19f4c0475fc259",
"md5": "30d3b57875505adf05e0d64a26cbc0a8",
"sha256": "cea41121cd0155b2543ed813e96ddbbd77ca41d17df64dc26b545e138fa11fe8"
},
"downloads": -1,
"filename": "variant-0.0.94-cp310-cp310-macosx_14_0_x86_64.whl",
"has_sig": false,
"md5_digest": "30d3b57875505adf05e0d64a26cbc0a8",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": "<4.0,>=3.9",
"size": 16842,
"upload_time": "2024-11-10T19:23:18",
"upload_time_iso_8601": "2024-11-10T19:23:18.094548Z",
"url": "https://files.pythonhosted.org/packages/8f/93/fbe8e4e076d5c5fd0437721bd6a167c46055513710597d19f4c0475fc259/variant-0.0.94-cp310-cp310-macosx_14_0_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7eee3991f29a15142dac3a989288dc588d787e6f21f9f508b8b9c707bc5d1263",
"md5": "896a686173df2fca1c457a90676dcd02",
"sha256": "0b940b3ce4cbb15786d295477d675afcd558fe1797a00679d18ed95eb62659ba"
},
"downloads": -1,
"filename": "variant-0.0.94-cp312-cp312-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "896a686173df2fca1c457a90676dcd02",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": "<4.0,>=3.9",
"size": 24671,
"upload_time": "2024-11-10T19:24:47",
"upload_time_iso_8601": "2024-11-10T19:24:47.609676Z",
"url": "https://files.pythonhosted.org/packages/7e/ee/3991f29a15142dac3a989288dc588d787e6f21f9f508b8b9c707bc5d1263/variant-0.0.94-cp312-cp312-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b31bc7d5d1f495d4d511cc3faf7d25fdf593b1d097ecadf001cb1f5ae3563606",
"md5": "3b816ad706995d88c45aa1971a7ddd77",
"sha256": "d992b82d7b158896271e10209c39b4a112970ed70dd5952d7e48e76ae57f9ad3"
},
"downloads": -1,
"filename": "variant-0.0.94.tar.gz",
"has_sig": false,
"md5_digest": "3b816ad706995d88c45aa1971a7ddd77",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 16061,
"upload_time": "2024-11-10T19:23:19",
"upload_time_iso_8601": "2024-11-10T19:23:19.051442Z",
"url": "https://files.pythonhosted.org/packages/b3/1b/c7d5d1f495d4d511cc3faf7d25fdf593b1d097ecadf001cb1f5ae3563606/variant-0.0.94.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-10 19:23:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yech1990",
"github_project": "variant",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "variant"
}