<img src="https://github.com/not-a-feature/fastq/raw/main/fastq.png" width=300px alt="fastq logo"></img>
A simple FASTQ toolbox for small to medium-sized projects without strange dependencies.
[![DOI](https://zenodo.org/badge/450063403.svg)](https://zenodo.org/badge/latestdoi/450063403)
![Test Badge](https://github.com/not-a-feature/fastq/actions/workflows/tests.yml/badge.svg)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![Download Badge](https://img.shields.io/pypi/dm/fastq.svg)
![Python Version Badge](https://img.shields.io/pypi/pyversions/fastq)
FASTQ files are text-based files for storing nucleotide sequences and their corresponding quality scores.
Reading such files is not particularly difficult, yet most off-the-shelf packages are overloaded with strange dependencies.
fastq offers an alternative and provides many useful functions without relying on third-party packages.
## Installation
Using pip / pip3:
```bash
pip install fastq
```
Or from source:
```bash
git clone git@github.com:not-a-feature/fastq.git
cd fastq
pip install .
```
## How to use
fastq offers easy-to-use functions for handling FASTQ data.
The main parts are:
- read()
- write()
- fastq_object()
  - head
  - body
  - qstr
  - info
  - toFasta()
  - len() / str() / eq()
## Reading FASTQ files
`read()` is a FASTQ reader that handles both compressed and uncompressed files.
The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read.
This function returns an iterator of fastq_objects.
```python
import fastq as fq
fos = fq.read("dolphin.fastq") # Iterator of fastq entries.
fos = list(fos) # Cast to list
fos = fq.read("reads.tar.gz") # Is able to handle compressed files.
```
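For readers curious about what such a reader has to do, here is a minimal standard-library sketch of parsing FASTQ records from a plain or gzip-compressed file. It is not fastq's implementation: the names `iter_fastq` and `Record` are hypothetical, and it assumes every record occupies exactly four lines (no wrapped sequences).

```python
import gzip
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a FASTQ entry (not fastq's own class)."""

    head: str
    body: str
    qstr: str


def iter_fastq(path: str) -> Iterator[Record]:
    """Lazily yield records from a plain or gzip-compressed FASTQ file."""
    # Assumption: a ".gz" suffix means gzip; anything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            head = handle.readline().rstrip("\n")
            if not head:  # end of file (or a blank line)
                break
            body = handle.readline().rstrip("\n")
            handle.readline()  # separator line starting with "+"
            qstr = handle.readline().rstrip("\n")
            yield Record(head, body, qstr)
```

Because the function is a generator, records are only parsed as they are consumed, which mirrors the iterator behaviour described above.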
### fastq_object()
The core component of fastq is the `fastq_object()`.
This object represents a FASTQ entry and consists of a head, a body (the sequence), and a quality string.
```python
import fastq as fq
fo = fq.fastq_object("@M01967:23:0", "GATTTGGGG", "!''*((((*")
fo.getHead() or fo.head # @M01967:23:0
fo.getSeq() or fo.body # GATTTGGGG
fo.getQual() or fo.qstr # !''*((((*
```
When `fastq_object(..).info` is requested, some summary statistics are computed and returned as a dict.
This computation is "lazy": the statistics are computed on the first query, so it takes longer than subsequent ones.
If the body or qstr is changed, info is automatically reset.
```python
fo.getInfo() or fo.info
{'a_num': 1, 'g_num': 5, # Absolute counts of AGTC
't_num': 3, 'c_num': 0, #
'gc_content': 0.5555555555555556, # Relative GC content
'at_content': 0.4444444444444444, # Relative AT content
'qual': 6.444444444444445, # Mean quality (Illumina Encoding)
'qual_median': 7, # Median quality
'qual_variance': 7.027777777777778, # Variance of quality
'qual_min': 0, 'qual_max': 9} # Min / Max quality
```
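The quality statistics above can be reproduced with the standard library. The sketch below assumes Phred+33 encoding (score = `ord(char) - 33`) and sample variance, both of which match the numbers shown for this record. The class `LazyRecord` is hypothetical and only illustrates the compute-on-first-access and reset-on-change idea; it is not fastq's actual code.

```python
import statistics


class LazyRecord:
    """Hypothetical record that computes its statistics on first access."""

    def __init__(self, body: str, qstr: str):
        self._body = body
        self._qstr = qstr
        self._info = None  # cache, filled on first access to .info

    @property
    def body(self) -> str:
        return self._body

    @body.setter
    def body(self, value: str) -> None:
        self._body = value
        self._info = None  # changing the sequence invalidates the cache

    @property
    def info(self) -> dict:
        if self._info is None:
            # Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
            # Needs at least two quality characters for the variance.
            quals = [ord(c) - 33 for c in self._qstr]
            gc = self._body.count("G") + self._body.count("C")
            self._info = {
                "gc_content": gc / len(self._body),
                "qual": statistics.mean(quals),
                "qual_median": statistics.median(quals),
                "qual_variance": statistics.variance(quals),
                "qual_min": min(quals),
                "qual_max": max(quals),
            }
        return self._info


rec = LazyRecord("GATTTGGGG", "!''*((((*")
print(rec.info["qual_median"])  # 7
```

A setter for the quality string would follow the same pattern as the one for `body`.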
### The following methods are defined on a `fastq_object()`:
```python
str(fo) # will return:
# @M01967:23:0
# GATTTGGGG
# +
# !''*((((*
# Body length
len(fo) # will return 9, the length of the body
# Equality
# Checks only the body, not the header and not the quality string
print(fo == fo) # True
fo_b = fq.fastq_object("@different header", "GATTTGGGG", "!!!!!!!!!")
print(fo == fo_b) # True
fo_c = fq.fastq_object(">Different Body", "ZZZZ", "!--!")
print(fo == fo_c) # False
```
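One way such behaviour could be implemented is with Python's special methods. The sketch below uses a hypothetical class `Entry`; it mirrors the behaviour described above (length and equality based on the body only) but is not taken from fastq's source.

```python
class Entry:
    """Hypothetical FASTQ entry illustrating str(), len() and ==."""

    def __init__(self, head: str, body: str, qstr: str):
        self.head = head
        self.body = body
        self.qstr = qstr

    def __str__(self) -> str:
        # Four-line FASTQ layout: header, sequence, "+", quality string.
        return f"{self.head}\n{self.body}\n+\n{self.qstr}"

    def __len__(self) -> int:
        return len(self.body)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Entry):
            return NotImplemented
        return self.body == other.body

    def __hash__(self) -> int:
        # Keep hashing consistent with equality, which looks at the body only.
        return hash(self.body)


a = Entry("@M01967:23:0", "GATTTGGGG", "!''*((((*")
b = Entry("@different header", "GATTTGGGG", "!!!!!!!!!")
print(len(a))  # 9
print(a == b)  # True
```

Defining `__hash__` alongside `__eq__` keeps such objects usable in sets and as dict keys, where two entries with the same body would then count as duplicates.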
## Writing FASTQ files
`write()` is a basic fastq writer.
It takes a single fastq_object or a list of fastq_objects and writes them to the given path.
By default, an existing file is overwritten. Pass `mode="a"`, as in `write(fo, "path.fastq", mode="a")`, to append to the file instead.
```python
fos = fq.read("dolphin.fastq") # Iterator of fastq entries
fos = list(fos)
fq.write(fos, "new.fastq")
```
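The difference between overwriting and appending comes down to the file mode. As a rough standard-library illustration, the hypothetical helper `write_records` below takes plain `(head, body, qstr)` tuples rather than fastq_objects; it is a sketch of the idea, not fastq's writer.

```python
from typing import Iterable, Tuple


def write_records(
    records: Iterable[Tuple[str, str, str]], path: str, mode: str = "w"
) -> None:
    """Write (head, body, qstr) tuples as four-line FASTQ records.

    mode="w" replaces an existing file, mode="a" appends to it.
    """
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with open(path, mode) as handle:
        for head, body, qstr in records:
            handle.write(f"{head}\n{body}\n+\n{qstr}\n")
```

Calling it twice with `mode="a"` on the second call leaves both batches of records in the file, whereas the default mode keeps only the most recent batch.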
## License
Copyright (C) 2024 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0.
TLDR:
| Permissions | Conditions | Limitations |
| ---------------- | ---------------------------- | ----------- |
| ✓ Commercial use | Disclose source | ✕ Liability |
| ✓ Distribution | License and copyright notice | ✕ Warranty |
| ✓ Modification | Same license | |
| ✓ Patent use | State changes | |
| ✓ Private use | | |
Go to [LICENSE.md](https://github.com/not-a-feature/fastq/blob/main/LICENSE) to see the full version.