fasta-checksum-utils


Name: fasta-checksum-utils
Version: 0.4.3
Summary: Library and command-line utility for checksumming FASTA files and individual contigs.
Upload time: 2024-07-21 00:30:48
Author: David Lougheed
Requires Python: <4.0.0,>=3.9.1
License: LGPL-3.0
# fasta-checksum-utils

Asynchronous library and command-line utility for checksumming FASTA files and individual contigs.
It implements two checksumming algorithms, `MD5` and `GA4GH`, to meet the needs of the
[Refget v2](http://samtools.github.io/hts-specs/refget.html) API specification.
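
For orientation, the two digest styles can be sketched with only the Python standard library. This is an illustration based on the general GA4GH convention (an `SQ.` prefix followed by the base64url encoding of the first 24 bytes of a SHA-512 digest) and a plain MD5 hex digest; it is not this package's actual implementation. The function names `md5_digest` and `ga4gh_digest` are hypothetical, and the sketch assumes the sequence has already been extracted and normalized (uppercase, no line breaks).

```python
import base64
import hashlib


def md5_digest(sequence: bytes) -> str:
    """Hex-encoded MD5 digest of the raw sequence bytes (hypothetical helper)."""
    return hashlib.md5(sequence).hexdigest()


def ga4gh_digest(sequence: bytes) -> str:
    """GA4GH-style identifier (hypothetical helper).

    Takes the SHA-512 digest, keeps the first 24 bytes, and encodes them
    as URL-safe base64. 24 bytes always encode to 32 characters, so no
    padding characters appear.
    """
    truncated = hashlib.sha512(sequence).digest()[:24]
    return "SQ." + base64.urlsafe_b64encode(truncated).decode("ascii")


if __name__ == "__main__":
    seq = b"ACGT"  # toy sequence, assumed already normalized
    print(md5_digest(seq))
    print(ga4gh_digest(seq))
```

Both helpers take the whole sequence in memory, which is fine for a sketch but may not suit large contigs; a streaming `hashlib` update loop would be the usual alternative.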


## Installation

To install `fasta-checksum-utils`, run the following `pip` command:

```bash
pip install fasta-checksum-utils
```


## CLI Usage

To generate a text report of checksums for a FASTA file, run the following command:

```bash
fasta-checksum-utils ./my-fasta.fa[.gz]
```

This will print output in the following tab-delimited format:

```
file  [file size in bytes]    md5 [file MD5 hash]           ga4gh  [file GA4GH hash]
chr1  [chr1 sequence length]  md5 [chr1 sequence MD5 hash]  ga4gh  [chr1 sequence GA4GH hash]
chr2  [chr2 sequence length]  md5 [chr2 sequence MD5 hash]  ga4gh  [chr2 sequence GA4GH hash]
...
```

The following example is the output generated for the SARS-CoV-2 genome FASTA from NCBI:

```
file	    30428	md5	825ab3c54b7a67ff2db55262eb532438	ga4gh	SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd
NC_045512.2	29903	md5	105c82802b67521950854a851fc6eefd	ga4gh	SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D
```
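
If the text report needs to be consumed by another script, a small parser along the following lines may be enough. It is a sketch that assumes the layout shown above (name, size or length, then alternating algorithm labels and checksums, separated by tabs); `ReportRow` and `parse_report` are hypothetical names and not part of `fasta-checksum-utils`.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    name: str                  # "file" for the whole-file row, otherwise a contig name
    size: int                  # file size in bytes, or sequence length for a contig
    checksums: dict[str, str]  # algorithm label (e.g. "md5") -> checksum


def parse_report(text: str) -> list[ReportRow]:
    """Parse the tab-delimited text report into rows (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Strip each field so stray alignment spaces do not matter.
        fields = [field.strip() for field in line.split("\t")]
        name, size, rest = fields[0], int(fields[1]), fields[2:]
        # Remaining fields alternate: label, value, label, value, ...
        checksums = dict(zip(rest[0::2], rest[1::2]))
        rows.append(ReportRow(name, size, checksums))
    return rows


# The SARS-CoV-2 example output from above, written with explicit tabs.
SAMPLE = (
    "file\t30428\tmd5\t825ab3c54b7a67ff2db55262eb532438"
    "\tga4gh\tSQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd\n"
    "NC_045512.2\t29903\tmd5\t105c82802b67521950854a851fc6eefd"
    "\tga4gh\tSQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D\n"
)

if __name__ == "__main__":
    for row in parse_report(SAMPLE):
        print(row.name, row.size, row.checksums["ga4gh"])
```

Keeping the checksums in a dictionary keyed by algorithm label means the parser does not need to change if the column order differs, as long as labels and values still alternate.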

If `--out-format bento-json` is passed, the tool instead outputs the report as JSON, in a format designed to be
compatible with the [Bento Reference Service](https://github.com/bento-platform/bento_reference_service). The
following example is the output generated for the SARS-CoV-2 genome:

```json
{
  "fasta": "sars_cov_2.fa",
  "fasta_size": 30428,
  "md5": "825ab3c54b7a67ff2db55262eb532438",
  "ga4gh": "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd",
  "contigs": [
    {
      "name": "NC_045512.2",
      "md5": "105c82802b67521950854a851fc6eefd",
      "ga4gh": "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D",
      "length": 29903
    }
  ]
}
```

If an argument like `--fai [path or URL]` is passed, an additional `"fai": "..."` property will be added to the JSON 
object output.

If an argument like `--genome-id GRCh38` is provided, an additional `"id": "GRCh38"` property will be added to the
JSON object output.
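
A downstream script can read the JSON report with the standard `json` module. The sketch below assumes the field names shown in the example above and treats `"fai"` and `"id"` as optional, since they only appear when the corresponding arguments are passed; `summarize_report` is a hypothetical helper, and in practice the input string would be whatever the CLI printed.

```python
import json
from typing import Any


def summarize_report(raw: str) -> dict[str, Any]:
    """Pull the commonly needed fields out of a bento-json report (hypothetical helper)."""
    report = json.loads(raw)
    return {
        "fasta": report["fasta"],
        "genome_id": report.get("id"),  # None unless --genome-id was passed
        "fai": report.get("fai"),       # None unless --fai was passed
        "contig_lengths": {c["name"]: c["length"] for c in report["contigs"]},
        "contig_ga4gh": {c["name"]: c["ga4gh"] for c in report["contigs"]},
    }


# The SARS-CoV-2 example from above, without the optional properties.
SAMPLE_JSON = """
{
  "fasta": "sars_cov_2.fa",
  "fasta_size": 30428,
  "md5": "825ab3c54b7a67ff2db55262eb532438",
  "ga4gh": "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd",
  "contigs": [
    {
      "name": "NC_045512.2",
      "md5": "105c82802b67521950854a851fc6eefd",
      "ga4gh": "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D",
      "length": 29903
    }
  ]
}
"""

if __name__ == "__main__":
    print(summarize_report(SAMPLE_JSON))
```

Using `dict.get` for the optional properties keeps the same code path working whether or not `--fai` and `--genome-id` were supplied.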


## Library Usage

Below are some examples of how `fasta-checksum-utils` can be used as an asynchronous Python library:

```python
import asyncio
import fasta_checksum_utils as fc
import pysam
from pathlib import Path


async def demo():
    covid_genome: Path = Path("./sars_cov_2.fa")
    
    # calculate an MD5 checksum for a whole file
    file_checksum: str = await fc.algorithms.AlgorithmMD5.checksum_file(covid_genome)
    print(file_checksum)
    # prints "863ee5dba1da0ca3f87783782284d489"
    
    all_algorithms = (fc.algorithms.AlgorithmMD5, fc.algorithms.AlgorithmGA4GH)
    
    # calculate multiple checksums for a whole file
    all_checksums: tuple[str, ...] = await fc.checksum_file(file=covid_genome, algorithms=all_algorithms)
    print(all_checksums)
    # prints tuple: ("863ee5dba1da0ca3f87783782284d489", "SQ.mMg8qNej7pU84juQQWobw9JyUy09oYdd")
    
    # calculate an MD5 and GA4GH checksum for a specific contig in a PySAM FASTA file:
    fh = pysam.FastaFile(str(covid_genome))
    try:
        contig_checksums: tuple[str, ...] = await fc.checksum_contig(
            fh=fh, 
            contig_name="NC_045512.2", 
            algorithms=all_algorithms,
        )
        print(contig_checksums)
        # prints tuple: ("105c82802b67521950854a851fc6eefd", "SQ.SyGVJg_YRedxvsjpqNdUgyyqx7lUfu_D")
    finally:
        fh.close()  # always close the file handle


asyncio.run(demo())
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "fasta-checksum-utils",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0.0,>=3.9.1",
    "maintainer_email": null,
    "keywords": null,
    "author": "David Lougheed",
    "author_email": "david.lougheed@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b0/8c/ab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b/fasta_checksum_utils-0.4.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LGPL-3.0",
    "summary": "Library and command-line utility for checksumming FASTA files and individual contigs.",
    "version": "0.4.3",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ea91606a1cb8537207b3aef9eaa81814de68bce7f0c79ad722ef6e8e16a120f7",
                "md5": "2a43a39eca8dae98ef8cd791625ba55f",
                "sha256": "53e6b796915c7d346890fbae34d58379d71b924282a71cb419e2e0e6585a337a"
            },
            "downloads": -1,
            "filename": "fasta_checksum_utils-0.4.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2a43a39eca8dae98ef8cd791625ba55f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0.0,>=3.9.1",
            "size": 11003,
            "upload_time": "2024-07-21T00:30:46",
            "upload_time_iso_8601": "2024-07-21T00:30:46.811104Z",
            "url": "https://files.pythonhosted.org/packages/ea/91/606a1cb8537207b3aef9eaa81814de68bce7f0c79ad722ef6e8e16a120f7/fasta_checksum_utils-0.4.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b08cab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b",
                "md5": "7e91a3f711e7e02388365e30177bbf06",
                "sha256": "1f1ad64cc11b14c3743391b252e24649487f5c795660582a3386f2f607d21038"
            },
            "downloads": -1,
            "filename": "fasta_checksum_utils-0.4.3.tar.gz",
            "has_sig": false,
            "md5_digest": "7e91a3f711e7e02388365e30177bbf06",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0.0,>=3.9.1",
            "size": 8494,
            "upload_time": "2024-07-21T00:30:48",
            "upload_time_iso_8601": "2024-07-21T00:30:48.127003Z",
            "url": "https://files.pythonhosted.org/packages/b0/8c/ab4e547cb4ae01bdac53dfc716a536e4eb4ac5796dcfab1a739012b0d40b/fasta_checksum_utils-0.4.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-21 00:30:48",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "fasta-checksum-utils"
}
        