pyreference

* Name: pyreference
* Version: 0.7.5
* Home page: https://github.com/SACGF/pyreference
* Summary: Library for working with reference genomes and gene GTF/GFFs
* Upload time: 2023-07-10 09:21:35
* Author: David Lawrence
* Requires Python: >=2.7, >=3.5
* Keywords: genomics, gtf, gff, genome, genes
* Requirements: biopython, configargparse, deprecation, HTSeq, lazy, pysam

## PyReference ##

[![PyPi version](https://img.shields.io/pypi/v/pyreference.svg)](https://pypi.org/project/pyreference/) [![Python versions](https://img.shields.io/pypi/pyversions/pyreference.svg)](https://pypi.org/project/pyreference/)

A Python library for working with reference gene annotations. It supports RefSeq and Ensembl annotations for GRCh37/GRCh38, as well as other species.

Loading a GTF/GFF3 file can take minutes. PyReference pre-processes it into JSON, which can then be loaded extremely rapidly.

PyReference makes it easy to write genomics code that can be run across different genomes or annotation versions.
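
The pre-processing idea described above can be sketched with the standard library alone. This is a minimal illustration, not PyReference's converter: the two GTF lines are made up, and the JSON layout (`gene_id`, `chrom`, `strand`, `exons`) is a hypothetical one chosen for the example, not the schema the library actually writes.

```python
import json

# Two made-up GTF lines (tab-separated, 9 columns), for illustration only.
GTF_TEXT = (
    'chr1\texample\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\texample\texon\t301\t450\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)


def parse_attributes(field):
    """Turn 'key "value"; key2 "value2";' into a dict."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, value = item.split(" ", 1)
            attributes[key] = value.strip('"')
    return attributes


def gtf_to_dict(text):
    """Group exon coordinates by transcript (hypothetical layout)."""
    transcripts = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, _source, feature, start, end, _score, strand, _frame, attrs = line.split("\t")
        if feature != "exon":
            continue
        attributes = parse_attributes(attrs)
        transcript = transcripts.setdefault(
            attributes["transcript_id"],
            {"gene_id": attributes["gene_id"], "chrom": chrom, "strand": strand, "exons": []},
        )
        # GTF coordinates are 1-based inclusive; store 0-based half-open [start, end).
        transcript["exons"].append([int(start) - 1, int(end)])
    return transcripts


parsed = gtf_to_dict(GTF_TEXT)
serialized = json.dumps(parsed)    # the slow text parsing happens once, up front
reloaded = json.loads(serialized)  # later runs only pay for json.loads
print(reloaded["T1"]["exons"])
```

The point of the design is that the line-by-line text parsing is paid once, and every later run starts from an already-structured file.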

## Example ##

    import numpy as np
    from pyreference import Reference

    # Load the default genome build named in the config file
    reference = Reference()  # uses ~/pyreference.cfg default_build

    my_gene_symbols = ["MSN", "GATA2", "ZEB1"]
    for gene in reference[my_gene_symbols]:
        # Mean length over all transcripts of this gene
        average_length = np.mean([t.length for t in gene.transcripts])
        print("%s average length = %.2f" % (gene, average_length))
        print(gene.iv)  # genomic interval of the gene
        for transcript in gene.transcripts:
            if transcript.is_coding:
                # Show the last 20 bases of the 3' UTR
                threep_utr = transcript.get_3putr_sequence()
                print("%s end of 3putr: %s" % (transcript.get_id(), threep_utr[-20:]))

Outputs:

	MSN (MSN) 1 transcripts average length = 3970.00
	chrX:[64887510,64961793)/+
	NM_002444 end of 3putr: TAAAATTTAGGAAGACTTCA

	GATA2 (GATA2) 3 transcripts average length = 3367.67
	chr3:[128198264,128212030)/-
	NM_001145662 end of 3putr: AATACTTTTTGTGAATGCCC
	NM_001145661 end of 3putr: AATACTTTTTGTGAATGCCC
	NM_032638 end of 3putr: AATACTTTTTGTGAATGCCC

	ZEB1 (ZEB1) 6 transcripts average length = 6037.83
	chr10:[31608100,31818742)/+
	NM_001174093 end of 3putr: CTTCTTTTTCTATTGCCTTA
	NM_001174094 end of 3putr: CTTCTTTTTCTATTGCCTTA
	NM_030751 end of 3putr: CTTCTTTTTCTATTGCCTTA
	NM_001174096 end of 3putr: CTTCTTTTTCTATTGCCTTA
	NM_001174095 end of 3putr: CTTCTTTTTCTATTGCCTTA
	NM_001128128 end of 3putr: CTTCTTTTTCTATTGCCTTA

This takes 4 seconds to load on my machine.
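
The `gene.iv` lines in the output above, such as `chrX:[64887510,64961793)/+`, use bracket notation for a half-open interval: the start is included and the end is excluded, so the span is simply `end - start`. The sketch below parses that printed string with a regular expression; `parse_interval` is a hypothetical helper written for this example, not part of PyReference, and it assumes the string always has the `chrom:[start,end)/strand` shape shown in the output.

```python
import re

# Matches strings shaped like "chrX:[64887510,64961793)/+"
INTERVAL_RE = re.compile(
    r"^(?P<chrom>[^:]+):\[(?P<start>\d+),(?P<end>\d+)\)/(?P<strand>[+\-.])$"
)


def parse_interval(text):
    """Split a printed interval into (chrom, start, end, strand)."""
    match = INTERVAL_RE.match(text)
    if match is None:
        raise ValueError("unrecognised interval: %r" % text)
    return (
        match.group("chrom"),
        int(match.group("start")),
        int(match.group("end")),
        match.group("strand"),
    )


chrom, start, end, strand = parse_interval("chrX:[64887510,64961793)/+")
span = end - start  # half-open interval: the end coordinate is not included
print("%s %s %d" % (chrom, strand, span))
```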

## pyreference biotype ##

Also included is a command-line tool (`pyreference_biotype.py`) that shows which biotypes small RNA fragments map to.

![](https://i.stack.imgur.com/Tsjr3.jpg)

## Installation ##

    sudo pip install pyreference

Then you will need to:

* [Download / Create gene annotations](https://github.com/SACGF/pyreference/wiki/genes_json_file)
* Create a [pyreference config file](https://github.com/SACGF/pyreference/wiki/pyreference_config_file)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SACGF/pyreference",
    "name": "pyreference",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=2.7, >=3.5",
    "maintainer_email": "",
    "keywords": "genomics,gtf,gff,genome,genes",
    "author": "David Lawrence",
    "author_email": "davmlaw@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/97/55/dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159/pyreference-0.7.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Library for working with reference genomes and gene GTF/GFFs",
    "version": "0.7.5",
    "project_urls": {
        "Homepage": "https://github.com/SACGF/pyreference"
    },
    "split_keywords": [
        "genomics",
        "gtf",
        "gff",
        "genome",
        "genes"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "74d2ab42ffa5ccccc926b59aa6352a9e79169a9a395c38d87ec24bb686c3aa73",
                "md5": "f6dd06c5608a3ba0c5ad37200f42832f",
                "sha256": "6716bcf6bfdd31be36018faa2cb3c3fd3548d1ce47cc2ce2b41f948a57f40f18"
            },
            "downloads": -1,
            "filename": "pyreference-0.7.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f6dd06c5608a3ba0c5ad37200f42832f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=2.7, >=3.5",
            "size": 23821,
            "upload_time": "2023-07-10T09:21:32",
            "upload_time_iso_8601": "2023-07-10T09:21:32.146083Z",
            "url": "https://files.pythonhosted.org/packages/74/d2/ab42ffa5ccccc926b59aa6352a9e79169a9a395c38d87ec24bb686c3aa73/pyreference-0.7.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9755dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159",
                "md5": "a7fdbe69540b70eda8e372103e3830ae",
                "sha256": "bd62cc25adc102284808bd524d7958764f12b8f0be222fab776e7f9257e99c6d"
            },
            "downloads": -1,
            "filename": "pyreference-0.7.5.tar.gz",
            "has_sig": false,
            "md5_digest": "a7fdbe69540b70eda8e372103e3830ae",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=2.7, >=3.5",
            "size": 23536,
            "upload_time": "2023-07-10T09:21:35",
            "upload_time_iso_8601": "2023-07-10T09:21:35.726302Z",
            "url": "https://files.pythonhosted.org/packages/97/55/dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159/pyreference-0.7.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-10 09:21:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SACGF",
    "github_project": "pyreference",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "biopython",
            "specs": []
        },
        {
            "name": "configargparse",
            "specs": []
        },
        {
            "name": "deprecation",
            "specs": []
        },
        {
            "name": "HTSeq",
            "specs": [
                [
                    "==",
                    "0.13.5"
                ]
            ]
        },
        {
            "name": "lazy",
            "specs": []
        },
        {
            "name": "pysam",
            "specs": []
        }
    ],
    "lcname": "pyreference"
}