seqtk-rs

Name	seqtk-rs
Version	0.2.0
home_page	https://github.com/yenyen1/seqtk-rs
Summary	This is a sequence processing tool written in Rust for manipulating FASTA/FASTQ files. Pure Rust version of seqtk.
upload_time	2025-07-22 01:24:03
maintainer	None
docs_url	None
author	Yen Yen Wang <wangyenyen.st00g@g2.nctu.edu.tw>
requires_python	>=3.8
license	MIT OR Apache-2.0
keywords	fastq fasta bio sequence ngs

# seqtk-rs
[![crate](https://img.shields.io/crates/v/seqtk-rs.svg)](https://crates.io/crates/seqtk-rs)

seqtk-rs is a sequence processing tool written in Rust for manipulating FASTA/FASTQ files. I built it out of my passion for Rust. Its functionality and subcommand names are similar to those of [`seqtk`](https://github.com/lh3/seqtk), but I’ve made some changes based on my own design choices.


## Installation
```sh
cargo install seqtk-rs
seqtk_rs -h
```

## Current Features
- [x] `seq`     Common transformations of FASTA/Q

- [x] `sample`  Random sampling with a given seed and fraction

- [x] `size`    Report sequence length statistics
  
    (**Output:** #seq, #bases, avg_size, min_size, med_size, max_size, N50)


- [x] `fqchk`   Report per-position base composition and quality statistics
  
    (**Output:** POS, #bases, %A, %C, %G, %T, %N, avgQ, errQ, ...)
    - **avgQ:** Average quality score *`(Q₁ + Q₂ + ... + Qₙ) / n`*
    - **errQ:** Estimated error rate on the Phred scale *`-10 * log₁₀((P₁ + P₂ + ... + Pₙ) / n)`*, where *`Pᵢ = 10^(-Qᵢ / 10)`* is the error probability for quality score Qᵢ

    **Notice:** Some tools raise quality scores below 3 (Q < 3) to 3 to avoid instability in downstream metrics. Q = 0 corresponds to an error probability P = 1.0, Q = 1 to P ≈ 0.794, and Q = 2 to P ≈ 0.631, so these low scores can heavily skew error rate calculations such as errQ. Flooring them, however, gives results that are inconsistent with the original definition, so this tool preserves the original quality scores as-is.
  
- [x] `comp`    Report the nucleotide composition of FASTA/Q
    
    (**Output:** #A, #C, #G, #T, #2, #3, #4, #CG, #GC)

    - `#CG` / `#GC`: Number of CG / GC dinucleotides on the template strand

- [x] `qctrim`    Trim low-quality bases from FASTQ data based on a quality threshold Q
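
To make the `fqchk` definitions of avgQ and errQ concrete, here is a minimal Python sketch of the two formulas. It is an illustration only, not the code seqtk-rs runs: the function names (`phred_to_prob`, `avg_q`, `err_q`) and the optional `floor` parameter are hypothetical, and the `floor` option exists only to show how raising Q < 3 to 3 changes the result.

```python
import math


def phred_to_prob(q):
    """Error probability for a Phred score: P = 10^(-Q / 10)."""
    return 10 ** (-q / 10)


def avg_q(quals):
    """avgQ: arithmetic mean of the quality scores at one position."""
    return sum(quals) / len(quals)


def err_q(quals, floor=None):
    """errQ: mean error probability, converted back to the Phred scale.

    `floor` is a hypothetical knob for illustration; when set, scores
    below it are raised to it before averaging, as some tools do.
    """
    if floor is not None:
        quals = [max(q, floor) for q in quals]
    mean_p = sum(phred_to_prob(q) for q in quals) / len(quals)
    return -10 * math.log10(mean_p)


# Quality scores observed at a single read position (made-up values).
column = [0, 2, 30, 40]
print(f"avgQ             = {avg_q(column):.2f}")
print(f"errQ (as-is)     = {err_q(column):.2f}")
print(f"errQ (floored 3) = {err_q(column, floor=3):.2f}")
```

Because errQ averages probabilities rather than scores, the two low-quality bases dominate it, and flooring them at 3 reports a higher errQ than the unmodified scores do.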
    

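The statistics reported by `size` can be sketched in a few lines of Python. This is a hedged illustration, not the seqtk-rs implementation: `n50` and `size_stats` are hypothetical names, and the conventions used here (N50 as the length at which the running total of lengths, sorted longest first, reaches at least half of all bases; the median of an even count as the mean of the two middle values) are assumptions that may differ from what the tool does.

```python
from statistics import median


def n50(lengths):
    """Length at which the longest sequences cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequences given")


def size_stats(lengths):
    """Summary of sequence lengths, keyed by the `size` output column names."""
    total = sum(lengths)
    return {
        "#seq": len(lengths),
        "#bases": total,
        "avg_size": total / len(lengths),
        "min_size": min(lengths),
        "med_size": median(lengths),
        "max_size": max(lengths),
        "N50": n50(lengths),
    }


# Five made-up sequence lengths, 20 bases in total.
print(size_stats([2, 3, 4, 5, 6]))
```

In this example the two longest sequences (6 and 5) together hold 11 of the 20 bases, so the N50 under the assumed convention is 5.
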
## TODO
- [ ] `trimAdapter` Trim adapters from FASTQ files


## Acknowledgements
- [`seqtk`](https://github.com/lh3/seqtk)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/yenyen1/seqtk-rs",
    "name": "seqtk-rs",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "fastq, fasta, bio, sequence, NGS",
    "author": "Yen Yen Wang <wangyenyen.st00g@g2.nctu.edu.tw>",
    "author_email": "Yen Yen Wang <wangyenyen.st00g@g2.nctu.edu.tw>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT OR Apache-2.0",
    "summary": "This is a sequence processing tool written in Rust for manipulating FASTA/FASTQ files. Pure rust version of seqtk.",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/yenyen1/seqtk-rs",
        "Source Code": "https://github.com/yenyen1/seqtk-rs"
    },
    "split_keywords": [
        "fastq",
        " fasta",
        " bio",
        " sequence",
        " ngs"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7d06d444869bcc25019e3b959112472001609d56a9821bf9f060e0f1d047c812",
                "md5": "9905cc028e679279bd612827b7de96ce",
                "sha256": "06c81231bf33f4bd4c568500773b8e6ce89e58b382cee2c8e8f93be7a2c7e13c"
            },
            "downloads": -1,
            "filename": "seqtk_rs-0.2.0-py3-none-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "9905cc028e679279bd612827b7de96ce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 613062,
            "upload_time": "2025-07-22T01:24:03",
            "upload_time_iso_8601": "2025-07-22T01:24:03.773146Z",
            "url": "https://files.pythonhosted.org/packages/7d/06/d444869bcc25019e3b959112472001609d56a9821bf9f060e0f1d047c812/seqtk_rs-0.2.0-py3-none-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-22 01:24:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yenyen1",
    "github_project": "seqtk-rs",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "seqtk-rs"
}
