* Name: cdot
* Version: 0.2.26
* Home page: https://github.com/SACGF/cdot
* Summary: Transcripts for HGVS libraries
* Upload time: 2024-08-15 04:48:21
* Author: Dave Lawrence
* Requires Python: >=3.8
# cdot

[![PyPi version](https://img.shields.io/pypi/v/cdot.svg)](https://pypi.org/project/cdot/) [![Python versions](https://img.shields.io/pypi/pyversions/cdot.svg)](https://pypi.org/project/cdot/)

cdot provides transcripts for the two most popular Python [HGVS](http://varnomen.hgvs.org/) libraries: Biocommons HGVS and PyHGVS.

It works by:

* Converting RefSeq/Ensembl GTFs to JSON 
* Providing loaders for the HGVS libraries, via JSON.gz files or a REST API (see [cdot_rest](https://github.com/SACGF/cdot_rest))
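
To make the first step concrete, here is a minimal sketch of turning GTF exon records into a JSON-friendly dict. It is an illustration only, not cdot's converter: the function names `parse_gtf_attributes` and `gtf_exons_to_dict`, the output layout, and the example records are all hypothetical, and the real format is described on the cdot wiki.

```python
import json
from collections import defaultdict


def parse_gtf_attributes(field):
    """Parse the 9th GTF column ('key "value"; key "value";') into a dict.

    Simplification: assumes values contain no semicolons.
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return attributes


def gtf_exons_to_dict(gtf_lines):
    """Group exon records by transcript_id (hypothetical output layout)."""
    transcripts = defaultdict(lambda: {"exons": []})
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "exon":
            continue  # only exon features are used in this sketch
        contig, _source, _feature, start, end, _score, strand, _frame, attrs = columns
        attributes = parse_gtf_attributes(attrs)
        record = transcripts[attributes["transcript_id"]]
        record["contig"] = contig
        record["strand"] = strand
        record["gene_id"] = attributes.get("gene_id")
        record["exons"].append([int(start), int(end)])
    for record in transcripts.values():
        record["exons"].sort()  # order exons by genomic start
    return dict(transcripts)


# Made-up records for illustration; not real annotation data.
example_gtf = [
    'chr1\texample\texon\t300\t450\t.\t+\t.\tgene_id "G1"; transcript_id "TX1.1";',
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TX1.1";',
]
print(json.dumps(gtf_exons_to_dict(example_gtf), indent=2))
```

Keying by transcript accession means a loader can fetch one transcript with a single dict lookup, which is the general idea behind serving transcripts from a JSON file.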

We currently support ~905k transcripts (vs ~141k in UTA v.20210129)

## New 

See [changelog](https://github.com/SACGF/cdot/blob/main/CHANGELOG.md)

2023-07-05:
* BioCommons HGVS DataProvider fixes
* Support for mouse transcripts (Mus musculus GRCm38 and GRCm39)

2023-04-03:
* #41 - Support for T2T CHM13v2.0 [example code](https://github.com/SACGF/cdot/wiki/Biocommons-T2T-CHM13v2.0-example-code)

## Install

```
pip install cdot
```

## Examples

[Biocommons HGVS](https://github.com/biocommons/hgvs) example:

```
import hgvs.parser  # import the submodule explicitly so hgvs.parser.Parser is available
from hgvs.assemblymapper import AssemblyMapper
from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider

hdp = RESTDataProvider()  # Uses API server at cdot.cc
# hdp = JSONDataProvider(["./cdot-0.2.14.refseq.grch37.json.gz"])  # Uses local JSON file

am = AssemblyMapper(hdp,
                    assembly_name='GRCh37',
                    alt_aln_method='splign', replace_reference=True)

hp = hgvs.parser.Parser()
var_c = hp.parse_hgvs_variant('NM_001637.3:c.1582G>A')
var_g = am.c_to_g(var_c)  # Map the transcript (c.) variant to genomic (g.) coordinates
print(var_g)
```

[More Biocommons examples](https://github.com/SACGF/cdot/wiki/Biocommons-HGVS-example-code)

[PyHGVS](https://github.com/counsyl/hgvs) example:

```
import pyhgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

genome = FastaFile("/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz")
factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.14.refseq.grch37.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name('NM_001637.3:c.1582G>A', genome, get_transcript=factory.get_transcript_grch37)
```

[More PyHGVS examples](https://github.com/SACGF/cdot/wiki/PyHGVS-example-code)

## Q. What's the performance like?

* UTA public DB: 1-1.5 seconds per transcript
* cdot REST service: 10 transcripts/second
* cdot JSON.gz: 500-1,000 transcripts/second
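
As a rough guide to what these figures mean for a batch job, the sketch below converts them into total lookup time. The `RATES` table and `estimate_seconds` helper are illustrative names, not part of cdot, and the estimate assumes throughput stays constant for the whole batch, which real network and disk conditions may not honour.

```python
# Throughput ranges in transcripts per second, taken from the list above.
# UTA is quoted as 1-1.5 seconds per transcript, i.e. 1/1.5 to 1 per second.
RATES = {
    "UTA public DB": (1 / 1.5, 1.0),
    "cdot REST service": (10.0, 10.0),
    "cdot JSON.gz": (500.0, 1000.0),
}


def estimate_seconds(n_transcripts, rates=RATES):
    """Return {source: (best_case_seconds, worst_case_seconds)}."""
    return {
        name: (n_transcripts / fast, n_transcripts / slow)
        for name, (slow, fast) in rates.items()
    }


for name, (best, worst) in estimate_seconds(10_000).items():
    print(f"{name}: {best:,.0f} to {worst:,.0f} seconds")
```

For 10,000 transcripts this works out to roughly 2.8 to 4.2 hours against the UTA public DB, about 17 minutes via the REST service, and 10 to 20 seconds from a local JSON.gz file.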

## Q. Where can I download the JSON.gz files?

[Download from GitHub releases](https://github.com/SACGF/cdot/releases): RefSeq (GRCh37/38) is 72M and Ensembl (GRCh37/38) is 61M.

Details on what the files contain [here](https://github.com/SACGF/cdot/wiki/GitHub-release-file-details)
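
If you want to look inside a downloaded file before wiring it into an HGVS library, a standard-library sketch like the following can help. The helper names `load_json_gz` and `summarise` are hypothetical, and the code assumes only that the top level of the file is a JSON object; it does not rely on any particular cdot key names. The demo writes a tiny stand-in file so the block runs without a download.

```python
import gzip
import json
import os
import tempfile


def load_json_gz(path):
    """Read a gzip-compressed JSON file into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarise(data):
    """Map each top-level key to its size (containers) or its value (scalars)."""
    summary = {}
    for key, value in data.items():
        summary[key] = len(value) if isinstance(value, (dict, list)) else value
    return summary


# Demo with a made-up stand-in file; replace demo_path with a real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        json.dump({"transcripts": {"TX1.1": {}, "TX2.1": {}}, "note": "stand-in"}, handle)
    print(summarise(load_json_gz(demo_path)))
```

Loading the whole file with `json.load` keeps every transcript in memory, so expect memory use well above the compressed file size.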

## Q. How does this compare to Universal Transcript Archive?

Both projects have similar goals of providing transcripts for loading HGVS, but they approach it in different ways:

* UTA aligns sequences, then stores coordinates in an SQL database. 
* cdot converts existing Ensembl/RefSeq GTFs into JSON.

See [wiki for more details](https://github.com/SACGF/cdot/wiki/cdot-vs-UTA)

## Q. How do you store transcripts in JSON?

See [wiki page](https://github.com/SACGF/cdot/wiki/Transcript-JSON-format) for the format.

We think a standard for JSON gene/transcript information would be a great thing, and are keen to collaborate to make it happen!

## Q. What does cdot stand for?

cdot, pronounced "see dot", stands for Complete Dict of Transcripts.

This was developed for the [Australian Genomics](https://www.australiangenomics.org.au/) [Shariant](https://shariant.org.au/) project, due to the need to load historical HGVS from lab archives.   

            
