## PyReference ##
[![PyPi version](https://img.shields.io/pypi/v/pyreference.svg)](https://pypi.org/project/pyreference/) [![Python versions](https://img.shields.io/pypi/pyversions/pyreference.svg)](https://pypi.org/project/pyreference/)
A Python library for working with reference gene annotations. It supports RefSeq and Ensembl annotations for GRCh37/GRCh38, as well as other species.
A GTF/GFF3 can take minutes to load. We pre-process it into JSON, so it can be loaded extremely rapidly.
PyReference makes it easy to write genomics code that can be run across different genomes or annotation versions.
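To illustrate why pre-processing helps, here is a minimal sketch of the general idea: parse the GTF once, keep only the fields you need, and store them as JSON so later runs need only a single JSON load. This is not PyReference's actual converter or JSON schema; the helper names `parse_gtf_attributes` and `gtf_to_genes`, the output layout, and the two GTF records are all made up for illustration.

```python
import io
import json

# Two made-up GTF records (tab-separated); real files contain millions of lines.
GTF_TEXT = (
    'chr1\tdemo\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tdemo\texon\t301\t450\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)


def parse_gtf_attributes(field):
    """Turn 'key "value"; key2 "value2";' into a dict (hypothetical helper)."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, value = item.split(" ", 1)
            attributes[key] = value.strip('"')
    return attributes


def gtf_to_genes(handle):
    """Group exons by gene and transcript (the slow, one-off parsing step)."""
    genes = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        chrom, _source, feature, start, end, _score, strand, _frame, attrs = (
            line.rstrip("\n").split("\t")
        )
        if feature != "exon":
            continue
        attributes = parse_gtf_attributes(attrs)
        gene = genes.setdefault(
            attributes["gene_id"],
            {"chrom": chrom, "strand": strand, "transcripts": {}},
        )
        exons = gene["transcripts"].setdefault(attributes["transcript_id"], [])
        # GTF is 1-based inclusive; store 0-based half-open coordinates.
        exons.append([int(start) - 1, int(end)])
    return genes


genes = gtf_to_genes(io.StringIO(GTF_TEXT))
serialized = json.dumps(genes)     # done once, ahead of time
reloaded = json.loads(serialized)  # the fast path on every later run
print(reloaded["G1"]["transcripts"]["T1"])
```

The saving comes from skipping the line-by-line text parsing and attribute splitting on every run; the JSON already holds the grouped structure.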
## Example ##
    import numpy as np
    from pyreference import Reference

    reference = Reference()  # uses ~/pyreference.cfg default_build

    my_gene_symbols = ["MSN", "GATA2", "ZEB1"]
    for gene in reference[my_gene_symbols]:
        average_length = np.mean([t.length for t in gene.transcripts])
        print("%s average length = %.2f" % (gene, average_length))
        print(gene.iv)
        for transcript in gene.transcripts:
            if transcript.is_coding:
                threep_utr = transcript.get_3putr_sequence()
                print("%s end of 3putr: %s" % (transcript.get_id(), threep_utr[-20:]))

Outputs:

    MSN (MSN) 1 transcripts average length = 3970.00
    chrX:[64887510,64961793)/+
    NM_002444 end of 3putr: TAAAATTTAGGAAGACTTCA

    GATA2 (GATA2) 3 transcripts average length = 3367.67
    chr3:[128198264,128212030)/-
    NM_001145662 end of 3putr: AATACTTTTTGTGAATGCCC
    NM_001145661 end of 3putr: AATACTTTTTGTGAATGCCC
    NM_032638 end of 3putr: AATACTTTTTGTGAATGCCC

    ZEB1 (ZEB1) 6 transcripts average length = 6037.83
    chr10:[31608100,31818742)/+
    NM_001174093 end of 3putr: CTTCTTTTTCTATTGCCTTA
    NM_001174094 end of 3putr: CTTCTTTTTCTATTGCCTTA
    NM_030751 end of 3putr: CTTCTTTTTCTATTGCCTTA
    NM_001174096 end of 3putr: CTTCTTTTTCTATTGCCTTA
    NM_001174095 end of 3putr: CTTCTTTTTCTATTGCCTTA
    NM_001128128 end of 3putr: CTTCTTTTTCTATTGCCTTA

This takes 4 seconds to load on my machine.
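The interval lines in the output, such as `chrX:[64887510,64961793)/+`, use half-open bracket notation: the start is included and the end is excluded, followed by the strand. The sketch below parses that printed form and derives the span length. It assumes the string layout is exactly as shown in the output; `parse_interval` is a hypothetical helper written for this illustration and is not part of PyReference.

```python
import re

# Matches the printed form 'chrom:[start,end)/strand' seen in the output above.
INTERVAL_PATTERN = re.compile(
    r"^(?P<chrom>[^:]+):\[(?P<start>\d+),(?P<end>\d+)\)/(?P<strand>[+\-.])$"
)


def parse_interval(text):
    """Parse 'chrom:[start,end)/strand' into a dict with an added length."""
    match = INTERVAL_PATTERN.match(text)
    if match is None:
        raise ValueError("Unrecognised interval: %r" % text)
    start = int(match.group("start"))
    end = int(match.group("end"))
    return {
        "chrom": match.group("chrom"),
        "start": start,
        "end": end,
        "strand": match.group("strand"),
        # Half-open interval: the end is excluded, so length is end - start.
        "length": end - start,
    }


msn = parse_interval("chrX:[64887510,64961793)/+")
print(msn["chrom"], msn["strand"], msn["length"])
```

Note that this length is the genomic span of the gene, introns included, so it is much larger than the average transcript length printed for the same gene.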
## pyreference biotype ##
Also included is a command line tool (pyreference_biotype.py) which shows which biotypes small RNA fragments map to.
![](https://i.stack.imgur.com/Tsjr3.jpg)
## Installation ##
    sudo pip install pyreference

Then you will need to:
* [Download / Create gene annotations](https://github.com/SACGF/pyreference/wiki/genes_json_file)
* Create a [pyreference config file](https://github.com/SACGF/pyreference/wiki/pyreference_config_file)
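
As a rough illustration of how a config file with a `default_build` setting could map a build name to its annotation file, here is a sketch using Python's standard `configparser`. Everything about the layout is an assumption: the INI format, the `[global]` and `[hg19]` section names, the `genes_json` key, and the path are illustrative only. The wiki page linked above is the authority on the real format.

```python
import configparser

# Hypothetical config contents; consult the wiki page for the real keys.
EXAMPLE_CONFIG = """
[global]
default_build = hg19

[hg19]
genes_json = /data/reference/hg19/genes.json.gz
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

# Look up the default build, then the (assumed) per-build section for its files.
default_build = parser["global"]["default_build"]
genes_json = parser[default_build]["genes_json"]
print(default_build, genes_json)
```

Keeping one section per build is what would let the same script switch genomes or annotation versions by changing a single setting.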
## Raw data ##
{
"_id": null,
"home_page": "https://github.com/SACGF/pyreference",
"name": "pyreference",
"maintainer": "",
"docs_url": null,
"requires_python": ">=2.7, >=3.5",
"maintainer_email": "",
"keywords": "genomics,gtf,gff,genome,genes",
"author": "David Lawrence",
"author_email": "davmlaw@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/97/55/dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159/pyreference-0.7.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Library for working with reference genomes and gene GTF/GFFs",
"version": "0.7.5",
"project_urls": {
"Homepage": "https://github.com/SACGF/pyreference"
},
"split_keywords": [
"genomics",
"gtf",
"gff",
"genome",
"genes"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "74d2ab42ffa5ccccc926b59aa6352a9e79169a9a395c38d87ec24bb686c3aa73",
"md5": "f6dd06c5608a3ba0c5ad37200f42832f",
"sha256": "6716bcf6bfdd31be36018faa2cb3c3fd3548d1ce47cc2ce2b41f948a57f40f18"
},
"downloads": -1,
"filename": "pyreference-0.7.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f6dd06c5608a3ba0c5ad37200f42832f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=2.7, >=3.5",
"size": 23821,
"upload_time": "2023-07-10T09:21:32",
"upload_time_iso_8601": "2023-07-10T09:21:32.146083Z",
"url": "https://files.pythonhosted.org/packages/74/d2/ab42ffa5ccccc926b59aa6352a9e79169a9a395c38d87ec24bb686c3aa73/pyreference-0.7.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9755dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159",
"md5": "a7fdbe69540b70eda8e372103e3830ae",
"sha256": "bd62cc25adc102284808bd524d7958764f12b8f0be222fab776e7f9257e99c6d"
},
"downloads": -1,
"filename": "pyreference-0.7.5.tar.gz",
"has_sig": false,
"md5_digest": "a7fdbe69540b70eda8e372103e3830ae",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=2.7, >=3.5",
"size": 23536,
"upload_time": "2023-07-10T09:21:35",
"upload_time_iso_8601": "2023-07-10T09:21:35.726302Z",
"url": "https://files.pythonhosted.org/packages/97/55/dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159/pyreference-0.7.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-10 09:21:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SACGF",
"github_project": "pyreference",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "biopython",
"specs": []
},
{
"name": "configargparse",
"specs": []
},
{
"name": "deprecation",
"specs": []
},
{
"name": "HTSeq",
"specs": [
[
"==",
"0.13.5"
]
]
},
{
"name": "lazy",
"specs": []
},
{
"name": "pysam",
"specs": []
}
],
"lcname": "pyreference"
}