| Field | Value |
| --- | --- |
| Name | fasta-checksum-utils |
| Version | 0.4.3 |
| Summary | Library and command-line utility for checksumming FASTA files and individual contigs. |
| upload_time | 2024-07-21 00:30:48 |
| author | David Lougheed |
| maintainer | None |
| home_page | None |
| docs_url | None |
| requires_python | <4.0.0,>=3.9.1 |
| license | LGPL-3.0 |
| requirements | No requirements were recorded. |

# fasta-checksum-utils
Asynchronous library and command-line utility for checksumming FASTA files and individual contigs.
Implements two checksumming algorithms, `MD5` and `GA4GH`, to fulfill the needs of the
[Refget v2](http://samtools.github.io/hts-specs/refget.html) API specification.
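
For orientation, the GA4GH sequence identifier is defined by GA4GH as the SHA-512 digest truncated to its first 24 bytes, base64url-encoded, and prefixed with `SQ.`. The sketch below illustrates both digests using only the standard library. The function names are hypothetical, this is not the library's own implementation, and the normalization step (removing whitespace and uppercasing) is an assumption about how sequence text is prepared before hashing.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """GA4GH truncated digest: first 24 bytes of SHA-512, base64url-encoded."""
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def normalize(sequence: str) -> bytes:
    # Assumption: drop all whitespace and uppercase the residues before hashing.
    return "".join(sequence.split()).upper().encode("ascii")


def ga4gh_checksum(sequence: str) -> str:
    """Hypothetical helper: 'SQ.' prefix plus the truncated digest."""
    return "SQ." + sha512t24u(normalize(sequence))


def md5_checksum(sequence: str) -> str:
    """Hypothetical helper: hex MD5 of the normalized sequence."""
    return hashlib.md5(normalize(sequence)).hexdigest()


if __name__ == "__main__":
    toy_sequence = "acgt\nACGT"  # toy input, not a real contig
    print(md5_checksum(toy_sequence))
    print(ga4gh_checksum(toy_sequence))
```

Both helpers hash the same normalized bytes, so the two checksums always describe the same sequence content.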
## Installation
To install `fasta-checksum-utils`, run the following `pip` command:
```bash
pip install fasta-checksum-utils
```
## CLI Usage
To generate a text report of checksums for a FASTA file, run the following command:
```bash
fasta-checksum-utils ./my-fasta.fa[.gz]
```
This will print output in the following tab-delimited format:
```
file [file size in bytes] md5 [file MD5 hash] ga4gh [file GA4GH hash]
chr1 [chr1 sequence length] md5 [chr1 sequence MD5 hash] ga4gh [chr1 sequence GA4GH hash]
chr2 [chr2 sequence length] md5 [chr2 sequence MD5 hash] ga4gh [chr2 sequence GA4GH hash]
...
```
The following example is the output generated for the SARS-CoV-2 genome FASTA from NCBI:
```
file 30428 md5 825ab3c54b7a67ff2db55262eb532438 ga4gh SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd
NC_045512.2 29903 md5 105c82802b67521950854a851fc6eefd ga4gh SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D
```
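
Because the report is tab-delimited, it is straightforward to consume from a script. The following is a minimal sketch of a parser for the format described above; `ReportRow` and `parse_report` are hypothetical names and are not part of `fasta-checksum-utils`. It assumes each row holds a name, a size, and then alternating algorithm labels and checksum values.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    name: str                  # "file" for the whole-file row, else the contig name
    size: int                  # file size in bytes, or sequence length for a contig
    checksums: dict[str, str]  # algorithm label -> checksum value


def parse_report(text: str) -> list[ReportRow]:
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Strip each field so stray spaces around values are tolerated.
        name, size, *rest = [field.strip() for field in line.split("\t")]
        # Remaining fields alternate: label, value, label, value, ...
        checksums = dict(zip(rest[0::2], rest[1::2]))
        rows.append(ReportRow(name, int(size), checksums))
    return rows


# The SARS-CoV-2 example rows from above, with explicit tab separators.
SAMPLE = (
    "file\t30428\tmd5\t825ab3c54b7a67ff2db55262eb532438"
    "\tga4gh\tSQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd\n"
    "NC_045512.2\t29903\tmd5\t105c82802b67521950854a851fc6eefd"
    "\tga4gh\tSQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D\n"
)

if __name__ == "__main__":
    for row in parse_report(SAMPLE):
        print(row.name, row.size, row.checksums["ga4gh"])
```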
If `--out-format bento-json` is passed, the tool instead outputs the report as JSON, in a format
designed to be compatible with the requirements of the
[Bento Reference Service](https://github.com/bento-platform/bento_reference_service). The following example
is the output generated for the SARS-CoV-2 genome:
```json
{
  "fasta": "sars_cov_2.fa",
  "fasta_size": 30428,
  "md5": "825ab3c54b7a67ff2db55262eb532438",
  "ga4gh": "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd",
  "contigs": [
    {
      "name": "NC_045512.2",
      "md5": "105c82802b67521950854a851fc6eefd",
      "ga4gh": "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D",
      "length": 29903
    }
  ]
}
```
If an argument like `--fai [path or URL]` is passed, an additional `"fai": "..."` property will be added to the JSON
object output.
If an argument like `--genome-id GRCh38` is provided, an additional `"id": "GRCh38"` property will be added to the
JSON object output.
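
As a rough illustration of consuming this JSON from Python, the sketch below loads a report and indexes its contigs by name. `index_contigs` is a hypothetical helper, not part of the package. The `"fai"` and `"id"` properties are read with `.get()` because, as described above, they are only present when the corresponding arguments are passed.

```python
import json

# The SARS-CoV-2 report shown above, embedded as a string for the example.
REPORT_JSON = """
{
  "fasta": "sars_cov_2.fa",
  "fasta_size": 30428,
  "md5": "825ab3c54b7a67ff2db55262eb532438",
  "ga4gh": "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd",
  "contigs": [
    {
      "name": "NC_045512.2",
      "md5": "105c82802b67521950854a851fc6eefd",
      "ga4gh": "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D",
      "length": 29903
    }
  ]
}
"""


def index_contigs(report: dict) -> dict[str, dict]:
    """Map each contig name to its record for quick lookup."""
    return {contig["name"]: contig for contig in report["contigs"]}


report = json.loads(REPORT_JSON)
contigs = index_contigs(report)

if __name__ == "__main__":
    print(report["fasta"], report["fasta_size"])
    print(contigs["NC_045512.2"]["ga4gh"])
    # Optional properties: None unless --fai / --genome-id were supplied.
    print(report.get("fai"), report.get("id"))
```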
## Library Usage
Below are some examples of how `fasta-checksum-utils` can be used as an asynchronous Python library:
```python
import asyncio
import fasta_checksum_utils as fc
import pysam
from pathlib import Path


async def demo():
    covid_genome: Path = Path("./sars_cov_2.fa")

    # calculate an MD5 checksum for a whole file
    file_checksum: str = await fc.algorithms.AlgorithmMD5.checksum_file(covid_genome)
    print(file_checksum)
    # prints "863ee5dba1da0ca3f87783782284d489"

    all_algorithms = (fc.algorithms.AlgorithmMD5, fc.algorithms.AlgorithmGA4GH)

    # calculate multiple checksums for a whole file
    all_checksums: tuple[str, ...] = await fc.checksum_file(file=covid_genome, algorithms=all_algorithms)
    print(all_checksums)
    # prints tuple: ("863ee5dba1da0ca3f87783782284d489", "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd")

    # calculate an MD5 and GA4GH checksum for a specific contig in a PySAM FASTA file:
    fh = pysam.FastaFile(str(covid_genome))
    try:
        contig_checksums: tuple[str, ...] = await fc.checksum_contig(
            fh=fh,
            contig_name="NC_045512.2",
            algorithms=all_algorithms,
        )
        print(contig_checksums)
        # prints tuple: ("105c82802b67521950854a851fc6eefd", "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D")
    finally:
        fh.close()  # always close the file handle


asyncio.run(demo())
```
Raw data
{
"_id": null,
"home_page": null,
"name": "fasta-checksum-utils",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0.0,>=3.9.1",
"maintainer_email": null,
"keywords": null,
"author": "David Lougheed",
"author_email": "david.lougheed@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b0/8c/ab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b/fasta_checksum_utils-0.4.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPL-3.0",
"summary": "Library and command-line utility for checksumming FASTA files and individual contigs.",
"version": "0.4.3",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ea91606a1cb8537207b3aef9eaa81814de68bce7f0c79ad722ef6e8e16a120f7",
"md5": "2a43a39eca8dae98ef8cd791625ba55f",
"sha256": "53e6b796915c7d346890fbae34d58379d71b924282a71cb419e2e0e6585a337a"
},
"downloads": -1,
"filename": "fasta_checksum_utils-0.4.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2a43a39eca8dae98ef8cd791625ba55f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0.0,>=3.9.1",
"size": 11003,
"upload_time": "2024-07-21T00:30:46",
"upload_time_iso_8601": "2024-07-21T00:30:46.811104Z",
"url": "https://files.pythonhosted.org/packages/ea/91/606a1cb8537207b3aef9eaa81814de68bce7f0c79ad722ef6e8e16a120f7/fasta_checksum_utils-0.4.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b08cab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b",
"md5": "7e91a3f711e7e02388365e30177bbf06",
"sha256": "1f1ad64cc11b14c3743391b252e24649487f5c795660582a3386f2f607d21038"
},
"downloads": -1,
"filename": "fasta_checksum_utils-0.4.3.tar.gz",
"has_sig": false,
"md5_digest": "7e91a3f711e7e02388365e30177bbf06",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0.0,>=3.9.1",
"size": 8494,
"upload_time": "2024-07-21T00:30:48",
"upload_time_iso_8601": "2024-07-21T00:30:48.127003Z",
"url": "https://files.pythonhosted.org/packages/b0/8c/ab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b/fasta_checksum_utils-0.4.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-21 00:30:48",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "fasta-checksum-utils"
}