| Name | cdot |
| Version | 0.2.26 |
| home_page | https://github.com/SACGF/cdot |
| Summary | Transcripts for HGVS libraries |
| upload_time | 2024-08-15 04:48:21 |
| maintainer | None |
| docs_url | None |
| author | Dave Lawrence |
| requires_python | >=3.8 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# cdot
cdot provides transcripts for the 2 most popular Python [HGVS](http://varnomen.hgvs.org/) libraries.
It works by:
* Converting RefSeq/Ensembl GTFs to JSON
* Providing loaders for the HGVS libraries, either from JSON.gz files or from a REST API (via [cdot_rest](https://github.com/SACGF/cdot_rest))
We currently support ~905k transcripts (vs ~141k in UTA v.20210129).
## New
See [changelog](https://github.com/SACGF/cdot/blob/main/CHANGELOG.md)
2023-07-05:
* BioCommons HGVS DataProvider fixes
* Support for mouse transcripts (Mus musculus GRCm38 and GRCm39)
2023-04-03:
* #41 - Support for T2T CHM13v2.0 [example code](https://github.com/SACGF/cdot/wiki/Biocommons-T2T-CHM13v2.0-example-code)
## Install
```
pip install cdot
```
## Examples
[Biocommons HGVS](https://github.com/biocommons/hgvs) example:
```
import hgvs.parser
from hgvs.assemblymapper import AssemblyMapper
from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider

hdp = RESTDataProvider()  # Uses API server at cdot.cc
# hdp = JSONDataProvider(["./cdot-0.2.14.refseq.grch37.json.gz"])  # Uses local JSON file

am = AssemblyMapper(hdp,
                    assembly_name='GRCh37',
                    alt_aln_method='splign', replace_reference=True)

hp = hgvs.parser.Parser()
var_c = hp.parse_hgvs_variant('NM_001637.3:c.1582G>A')
am.c_to_g(var_c)
```
[more Biocommons examples](https://github.com/SACGF/cdot/wiki/Biocommons-HGVS-example-code)
[PyHGVS](https://github.com/counsyl/hgvs) example:
```
import pyhgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory
genome = FastaFile("/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz")
factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.14.refseq.grch37.json.gz"]) # Uses local JSON file
pyhgvs.parse_hgvs_name('NM_001637.3:c.1582G>A', genome, get_transcript=factory.get_transcript_grch37)
```
[more PyHGVS examples](https://github.com/SACGF/cdot/wiki/PyHGVS-example-code)
## Q. What's the performance like?
* UTA public DB: 1-1.5 seconds/transcript
* cdot REST service: 10 transcripts/second
* cdot JSON.gz: 500-1k transcripts/second
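To put those figures in perspective, here is a minimal sketch that turns them into rough wall-clock estimates for a batch of lookups. The rates are copied from the list above; the function name `estimate_seconds` and the batch size are illustrative assumptions, and real throughput will depend on your network and hardware.

```python
# Rough time estimates from the throughput figures quoted above.
# Rates are (slowest, fastest) in transcripts per second.
RATES_PER_SECOND = {
    "UTA public DB": (1 / 1.5, 1 / 1.0),  # quoted as 1-1.5 seconds per transcript
    "cdot REST service": (10.0, 10.0),
    "cdot JSON.gz": (500.0, 1000.0),
}


def estimate_seconds(num_transcripts, rates=RATES_PER_SECOND):
    """Return {source: (best_case_seconds, worst_case_seconds)}."""
    estimates = {}
    for source, (slowest, fastest) in rates.items():
        estimates[source] = (num_transcripts / fastest, num_transcripts / slowest)
    return estimates


if __name__ == "__main__":
    # 10,000 is an arbitrary example batch size.
    for source, (best, worst) in estimate_seconds(10_000).items():
        print(f"{source}: {best:,.0f}-{worst:,.0f} s")
```

For 10,000 transcripts this works out to roughly 10-20 seconds from a local JSON.gz file, about 17 minutes via the REST service, and about 2.8-4.2 hours against the UTA public DB.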
## Q. Where can I download the JSON.gz files?
[Download from GitHub releases](https://github.com/SACGF/cdot/releases): RefSeq (37/38) is 72M, Ensembl (37/38) is 61M.
See the [GitHub release file details](https://github.com/SACGF/cdot/wiki/GitHub-release-file-details) wiki page for what the files contain.
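If you just want to peek inside a downloaded file before wiring it into an HGVS library, the following is a minimal sketch using only the Python standard library. It assumes the top level of the file is a JSON object and makes no other assumption about the schema; `summarize_json_gz` is a hypothetical helper, not part of cdot.

```python
import gzip
import json


def summarize_json_gz(path):
    """Load a gzipped JSON file and report the type and size of each top-level key.

    Assumes the top level is a JSON object (dict). The whole file is
    loaded into memory, so expect this to take a while on large files.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    summary = {}
    for key, value in data.items():
        # Only containers and strings have a meaningful length.
        size = len(value) if isinstance(value, (dict, list, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary
```

For example, `summarize_json_gz("./cdot-0.2.14.refseq.grch37.json.gz")` returns a dict mapping each top-level key to a `(type_name, length)` pair.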
## Q. How does this compare to Universal Transcript Archive?
Both projects have similar goals of providing transcripts for loading HGVS, but they approach it in different ways:
* UTA aligns sequences, then stores coordinates in an SQL database.
* cdot converts existing Ensembl/RefSeq GTFs into JSON.
See [wiki for more details](https://github.com/SACGF/cdot/wiki/cdot-vs-UTA)
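To illustrate the general idea of turning GTF records into JSON, here is a minimal sketch that groups `exon` lines by `transcript_id`. This is not cdot's actual converter, and the output layout (`contig`, `strand`, `exons`) is a hypothetical one chosen for the example; see the wiki for the real format. It relies only on the standard GTF layout of nine tab-separated columns with `key "value";` attributes.

```python
import json
import re
from collections import defaultdict

# The GTF attribute column looks like: gene_id "X"; transcript_id "Y";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons_to_dict(gtf_lines):
    """Group exon records by transcript_id (hypothetical output layout)."""
    transcripts = defaultdict(lambda: {"contig": None, "strand": None, "exons": []})
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "exon":
            continue  # only exon features are used in this sketch
        contig, _source, _feature, start, end, _score, strand, _frame, attributes = columns
        attrs = dict(ATTRIBUTE_RE.findall(attributes))
        transcript_id = attrs.get("transcript_id")
        if transcript_id is None:
            continue
        record = transcripts[transcript_id]
        record["contig"] = contig
        record["strand"] = strand
        # GTF coordinates are 1-based and inclusive; kept as-is here.
        record["exons"].append([int(start), int(end)])
    for record in transcripts.values():
        record["exons"].sort()
    return dict(transcripts)


if __name__ == "__main__":
    # Made-up records for illustration only; not real annotation data.
    example = [
        'chrTest\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";',
        'chrTest\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";',
    ]
    print(json.dumps(gtf_exons_to_dict(example), indent=2))
```

A real converter also has to handle CDS boundaries, transcript versions, and multiple genome builds, which this sketch leaves out.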
## Q. How do you store transcripts in JSON?
See [wiki page](https://github.com/SACGF/cdot/wiki/Transcript-JSON-format) for the format.
We think a standard for JSON gene/transcript information would be a great thing, and are keen to collaborate to make it happen!
## Q. What does cdot stand for?
cdot, pronounced "see dot", stands for Complete Dict of Transcripts.
This was developed for the [Australian Genomics](https://www.australiangenomics.org.au/) [Shariant](https://shariant.org.au/) project, due to the need to load historical HGVS from lab archives.
Raw data
{
"_id": null,
"home_page": "https://github.com/SACGF/cdot",
"name": "cdot",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Dave Lawrence",
"author_email": "davmlaw@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f0/62/f63febd29ad39c56d3616244d01f308f703b6901f6245fd8e1ac8c5221ed/cdot-0.2.26.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Transcripts for HGVS libraries",
"version": "0.2.26",
"project_urls": {
"Bug Tracker": "https://github.com/SACGF/cdot/issues",
"Homepage": "https://github.com/SACGF/cdot"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "66ab90cf9f3011a1018e874e86001c20a0d26f7004ac5c19697b8fbf62dec9b8",
"md5": "60cf6fa5afd2e587d892bdb5c980e6fe",
"sha256": "f9f6c3dbdb9dffda3779e77d9acef33ae3111c11a4de18fba5ff1d77cbc83c00"
},
"downloads": -1,
"filename": "cdot-0.2.26-py3-none-any.whl",
"has_sig": false,
"md5_digest": "60cf6fa5afd2e587d892bdb5c980e6fe",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 13870,
"upload_time": "2024-08-15T04:48:19",
"upload_time_iso_8601": "2024-08-15T04:48:19.801385Z",
"url": "https://files.pythonhosted.org/packages/66/ab/90cf9f3011a1018e874e86001c20a0d26f7004ac5c19697b8fbf62dec9b8/cdot-0.2.26-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f062f63febd29ad39c56d3616244d01f308f703b6901f6245fd8e1ac8c5221ed",
"md5": "835e424070d449dd4e1ebf4a80919706",
"sha256": "6f9b9fb4076722f5d92d189fa4ef5a7e2af1cdd4f790068bb7d9a5d3ba73921b"
},
"downloads": -1,
"filename": "cdot-0.2.26.tar.gz",
"has_sig": false,
"md5_digest": "835e424070d449dd4e1ebf4a80919706",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 16399,
"upload_time": "2024-08-15T04:48:21",
"upload_time_iso_8601": "2024-08-15T04:48:21.436461Z",
"url": "https://files.pythonhosted.org/packages/f0/62/f63febd29ad39c56d3616244d01f308f703b6901f6245fd8e1ac8c5221ed/cdot-0.2.26.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-15 04:48:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SACGF",
"github_project": "cdot",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "cdot"
}