fastqwiper


Namefastqwiper JSON
Version 2023.2.82 PyPI version JSON
download
home_pagehttps://github.com/mazzalab/fastqwiper
SummaryAn ensamble method to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and fix reads interleaving
upload_time2023-10-30 19:59:28
maintainer
docs_urlNone
authorTommaso Mazza
requires_python>=3.7,<3.11
licenseMIT
keywords genomics ngs fastq bioinformatics
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage
            # FastqWiper 
[![Build](https://github.com/mazzalab/fastqwiper/actions/workflows/buildall_and_publish.yml/badge.svg)](https://github.com/mazzalab/fastqwiper/actions/workflows/buildall_and_publish.yml) [![codecov](https://codecov.io/gh/mazzalab/fastqwiper/graph/badge.svg?token=5V4AQTK619)](https://codecov.io/gh/mazzalab/fastqwiper) [![GitHub issues](https://img.shields.io/github/issues-raw/mazzalab/fastqwiper)](https://github.com/mazzalab/fastqwiper/issues) 

[![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/version.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/latest_release_date.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/platforms.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/downloads.svg)](https://anaconda.org/bfxcss/fastqwiper)

[![PyPI version](https://badge.fury.io/py/fastqwiper.svg)](https://badge.fury.io/py/fastqwiper) [![PyPI pyversions](https://img.shields.io/pypi/pyversions/fastqwiper.svg)](https://pypi.python.org/pypi/fastqwiper/) ![PyPI - Downloads](https://img.shields.io/pypi/dm/fastqwiper)

[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/mazzalab/fastqwiper) ![Docker Pulls](https://img.shields.io/docker/pulls/mazzalab/fastqwiper)

`FastqWiper` is a Snakemake-enabled application that wipes out bad reads from broken FASTQ files. Additionally, the available and pre-designed Snakemake [workflows](https://github.com/mazzalab/fastqwiper/tree/main/pipeline) allows **recovering** corrupted `fastq.gz`, **dropping** or **fixing** pesky lines, **removing** unpaired reads, and **fixing** reads interleaving.

* Compatibility: Python ≥3.7, <3.11
* OS: Windows, Linux, Mac OS (Snakemake workflows through Docker for Windows)
* Contributions: [bioinformatics@css-mendel.it](bioinformatics@css-mendel.it)
* Docker: https://hub.docker.com/r/mazzalab/fastqwiper
* Singularity: https://cloud.sylabs.io/library/mazzalab/fastqwiper/fastqwiper.sif
* Bug report: [https://github.com/mazzalab/fastqwiper/issues](https://github.com/mazzalab/fastqwiper/issues)


## USAGE
- **Case 1**. You have one or a couple (R1&R2) of **computer readable** FASTQ files which contain pesky, unformatted, uncompliant lines: Use *FastWiper* to clean them;
- **Case 2**. You have one or a couple (R1&R2) of **computer readable** FASTQ files that you want to drop unpaired reads from or fix reads interleaving: Use the FastqWiper's *Snakemake workflows*;
- **Case 3**. You have one `fastq.gz` file or a couple (R1&R2) of `fastq.gz` files which are corrupted (**unreadable**) and you want to recover healthy reads and reformat them: Use the FastqWiper's *Snakemake workflows*;


## Installation
### Case 1
This requires you to install FastqWiper and therefore does not require you to configure *workflows* also. You can do it for all OSs:

#### Use Conda

```
conda create -n fastqwiper python=3.10
conda activate fastqwiper
conda install -c bfxcss -c conda-forge fastqwiper

fastqwiper --help
```
*Hint: for an healthier experience, use* **mamba**


#### Use Pypi
```
pip install fastqwiper

fastqwiper --help
```
<br/>

`fastqwiper  <options>`
```
options:
  --fastq_in TEXT          The input FASTQ file to be cleaned  [required]
  --fastq_out TEXT         The wiped FASTQ file                [required]
  --log_frequency INTEGER  The number of reads you want to print a status message
  --log_out TEXT           The file name of the final quality report summary
  --help                   Show this message and exit.
```
It  accepts in input and outputs **readable** `*.fastq` or `*.fastq.gz` files.


### Cases 2 & 3
There are <b>QUICK</b> and a <b>SLOW</b> methods to configure `FastqWiper`'s workflows.


#### One quick way (Docker)
1. Pull the Docker image from DockerHub:

`docker pull mazzalab/fastqwiper`

2. Once downloaded the image, type:

CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper paired 8 sample 50000000`

#### Another quick way (Singularity)
1. Pull the Singularity image from the Cloud Library:

`singularity pull library://mazzalab/fastqwiper/fastqwiper.sif`

2. Once downloaded the image (e.g., fastqwiper.sif_2023.2.70.sif), type:

CMD `singularity run --bind /scratch/tom/fastqwiper_singularity/data:/fastqwiper/data --writable-tmpfs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`

If you want to bind the `.singularity` cache folder and the `logs` folder, you can omit `--writable-tmpfs`, create the folders `.singularity` and `logs` (`mkdir .singularity logs`) on the host system, and use this command instead:

CMD: `singularity run --bind YOUR_LOCAL_PATH_TO_DATA_FOLDER/:/fastqwiper/data --bind YOUR_LOCAL_PATH_TO_.singularity_FOLDER/:/fastqwiper/.snakemake --bind YOUR_LOCAL_PATH_TO_LOGS_FOLDER/:/fastqwiper/logs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`

For both **Docker** and **Singularity**:

- `YOUR_LOCAL_PATH_TO_DATA_FOLDER` is the path of the folder where the fastq.gz files to be wiped are located;
- `paired` triggers the cleaning of R1 and R2. Alternatively, `single` will trigger the wipe of individual FASTQ files;
- `8` is the number of your choice of computing cores to be spawned;
- `sample` is part of the names of the FASTQ files to be wiped. <b>Be aware</b> that: for <b>paired-end</b> files (e.g., "sample_R1.fastq.gz" and "sample_R2.fastq.gz"), your files must finish with `_R1.fastq.gz` and `_R2.fastq.gz`. Therefore, the argument to pass is everything before these texts: `sample` in this case. For <b>single end</b>/individual files (e.g., "excerpt_R1_001.fastq.gz"), your file must end with the string `.fastq.gz`; the preceding text, i.e., "excerpt_R1_001" in this case, will be the text to be passed to the command as an argument. 
- `50000000` is the number of rows-per-chunk (used when cores>1. It must be a number multiple of 4)


#### The slow way (Linux & Mac OS)
To enable the use of preconfigured [pipelines](https://github.com/mazzalab/fastqwiper/tree/main/pipeline), you need to install **Snakemake**. The recommended way to install Snakemake is via Conda, because it enables **Snakemake** to [handle software dependencies of your workflow](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management).
However, the default conda solver is slow and often hangs. Therefore, we recommend installing [Mamba](https://github.com/mamba-org/mamba) as a drop-in replacement via

`$ conda install -c conda-forge mamba`

if you have anaconda/miniconda already installed, or directly installing `Mambaforge` as described [here](https://github.com/conda-forge/miniforge#mambaforge).


Then, create and activate a clean environment as above:

```
mamba create -n fastqwiper python=3.10
mamba activate fastqwiper
```
Finally, install a few dependencies:

```
$ mamba install -c bioconda snakemake
$ mamba install colorama click
```


#### Usage
Clone the FastqWiper repository in a folder of your choice and enter it:

```
git clone https://github.com/mazzalab/fastqwiper.git
cd fastqwiper
```

It contains, in particular, a folder `data` containing the fastq files to be processed, a folder `pipeline` containing the released pipelines and a folder `fastq_wiper` with the source files of `FastqWiper`. <br/>
Input files to be processed should be copied into the **data** folder.

Currently, to run the `FastqWiper` pipelines, the following packages need to be installed manually:

### required packages:
[gzrt](https://github.com/arenn/gzrt) (Linux build fron source [instructions](https://github.com/arenn/gzrt/blob/master/README.build), Ubuntu install [instructions](https://howtoinstall.co/en/gzrt), Mac OS install [instructions](https://formulae.brew.sh/formula/gzrt))

[BBTools](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/) (install [instructions](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/installation-guide/))

If installed from source, `gzrt` scripts need to be put on PATH. `bbmap` must be installed in the root folder of FastqWiper, as the image below

![FastqWiper folder yierarchy](assets/hierarchy.png)

### Commands:
Copy the fastq files you want to fix in the `data` folder.
**N.b.**: In all commands above, you will pass to the workflow the name of the sample to be analyzed through the config argument: `sample_name`. Remember that your fastq files' names must finish with `_R1.fastq.gz` and `_R2.fastq.gz`, for paired fastq files, and with `.fastq.gz`, for individual fastq files, and, therefore, the text to be assigned to the variable `sample_name` must be everything before them. E.g., if your files are `my_sample_R1.fastq.gz` and `my_sample_R2.fastq.gz`, then `--config sample_name=my_sample`.

#### Paired-end files

- **Get a dry run** of a pipeline (e.g., `fix_wipe_pairs_reads_sequential.smk`):<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 4`

- **Generate the planned DAG**:<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /> <br />
<img src="https://github.com/mazzalab/fastqwiper/blob/main/pipeline/fix_wipe_pairs_reads.png?raw=true" width="400">

- **Run the pipeline** (n.b., during the first execution, Snakemake will download and install some required remote packages and may take longer). The number of computing cores can be tuned accordingly:<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`

Fixed files will be copied in the `data` folder and will be suffixed with the string `_fixed_wiped_paired_interleaving`.
We remind that the `fix_wipe_pairs_reads_sequential.smk` and `fix_wipe_pairs_reads_parallel.smk` pipelines perform the following actions:
- execute `gzrt` on corrupted fastq.gz files (i.e., that cannot be unzipped because of errors) and recover readable reads;
- execute `FastqWiper` on recovered reads to make them compliant with the FASTQ format (source: [Wipipedia](https://en.wikipedia.org/wiki/FASTQ_format))
- execute `Trimmomatic` on wiped reads to remove residual unpaired reads
- execute `BBmap (repair.sh)` on paired reads to fix the correct interleaving and sort fastq files.  

#### Single-end files
`fix_wipe_single_reads_parallel.smk` and `fix_wipe_single_reads_sequential.smk` will not execute `trimmomatic` and BBmap's `repair.sh`.

- **Get a dry run** of a pipeline (e.g., `fix_wipe_single_reads_sequential.smk`):<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2 -np`

- **Generate the planned DAG**:<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /><br />
<img src="https://github.com/mazzalab/fastqwiper/blob/main/pipeline/fix_wipe_single_reads.png?raw=true" width="200">

- **Run the pipeline** (n.b., The number of computing cores can be tuned accordingly):<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`

# Author
**Tommaso Mazza**  
[![Tweeting](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/irongraft)

Laboratory of Bioinformatics</br>
Fondazione IRCCS Casa Sollievo della Sofferenza</br>
Viale Regina Margherita 261 - 00198 Roma IT</br>
Tel: +39 06 44160526 - Fax: +39 06 44160548</br>
E-mail: t.mazza@css-mendel.it</br>
Web page: http://www.css-mendel.it</br>
Web page: http://bioinformatics.css-mendel.it</br>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mazzalab/fastqwiper",
    "name": "fastqwiper",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<3.11",
    "maintainer_email": "",
    "keywords": "genomics,ngs,fastq,bioinformatics",
    "author": "Tommaso Mazza",
    "author_email": "bioinformatics@css-mendel.it",
    "download_url": "",
    "platform": null,
    "description": "# FastqWiper \n[![Build](https://github.com/mazzalab/fastqwiper/actions/workflows/buildall_and_publish.yml/badge.svg)](https://github.com/mazzalab/fastqwiper/actions/workflows/buildall_and_publish.yml) [![codecov](https://codecov.io/gh/mazzalab/fastqwiper/graph/badge.svg?token=5V4AQTK619)](https://codecov.io/gh/mazzalab/fastqwiper) [![GitHub issues](https://img.shields.io/github/issues-raw/mazzalab/fastqwiper)](https://github.com/mazzalab/fastqwiper/issues) \n\n[![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/version.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/latest_release_date.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/platforms.svg)](https://anaconda.org/bfxcss/fastqwiper) [![Anaconda-Server Badge](https://anaconda.org/bfxcss/fastqwiper/badges/downloads.svg)](https://anaconda.org/bfxcss/fastqwiper)\n\n[![PyPI version](https://badge.fury.io/py/fastqwiper.svg)](https://badge.fury.io/py/fastqwiper) [![PyPI pyversions](https://img.shields.io/pypi/pyversions/fastqwiper.svg)](https://pypi.python.org/pypi/fastqwiper/) ![PyPI - Downloads](https://img.shields.io/pypi/dm/fastqwiper)\n\n[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/mazzalab/fastqwiper) ![Docker Pulls](https://img.shields.io/docker/pulls/mazzalab/fastqwiper)\n\n`FastqWiper` is a Snakemake-enabled application that wipes out bad reads from broken FASTQ files. Additionally, the available and pre-designed Snakemake [workflows](https://github.com/mazzalab/fastqwiper/tree/main/pipeline) allows **recovering** corrupted `fastq.gz`, **dropping** or **fixing** pesky lines, **removing** unpaired reads, and **fixing** reads interleaving.\n\n* Compatibility: Python \u22653.7, <3.11\n* OS: Windows, Linux, Mac OS (Snakemake workflows through Docker for Windows)\n* Contributions: [bioinformatics@css-mendel.it](bioinformatics@css-mendel.it)\n* Docker: https://hub.docker.com/r/mazzalab/fastqwiper\n* Singularity: https://cloud.sylabs.io/library/mazzalab/fastqwiper/fastqwiper.sif\n* Bug report: [https://github.com/mazzalab/fastqwiper/issues](https://github.com/mazzalab/fastqwiper/issues)\n\n\n## USAGE\n- **Case 1**. You have one or a couple (R1&R2) of **computer readable** FASTQ files which contain pesky, unformatted, uncompliant lines: Use *FastWiper* to clean them;\n- **Case 2**. You have one or a couple (R1&R2) of **computer readable** FASTQ files that you want to drop unpaired reads from or fix reads interleaving: Use the FastqWiper's *Snakemake workflows*;\n- **Case 3**. You have one `fastq.gz` file or a couple (R1&R2) of `fastq.gz` files which are corrupted (**unreadable**) and you want to recover healthy reads and reformat them: Use the FastqWiper's *Snakemake workflows*;\n\n\n## Installation\n### Case 1\nThis requires you to install FastqWiper and therefore does not require you to configure *workflows* also. You can do it for all OSs:\n\n#### Use Conda\n\n```\nconda create -n fastqwiper python=3.10\nconda activate fastqwiper\nconda install -c bfxcss -c conda-forge fastqwiper\n\nfastqwiper --help\n```\n*Hint: for an healthier experience, use* **mamba**\n\n\n#### Use Pypi\n```\npip install fastqwiper\n\nfastqwiper --help\n```\n<br/>\n\n`fastqwiper  <options>`\n```\noptions:\n  --fastq_in TEXT          The input FASTQ file to be cleaned  [required]\n  --fastq_out TEXT         The wiped FASTQ file                [required]\n  --log_frequency INTEGER  The number of reads you want to print a status message\n  --log_out TEXT           The file name of the final quality report summary\n  --help                   Show this message and exit.\n```\nIt  accepts in input and outputs **readable** `*.fastq` or `*.fastq.gz` files.\n\n\n### Cases 2 & 3\nThere are <b>QUICK</b> and a <b>SLOW</b> methods to configure `FastqWiper`'s workflows.\n\n\n#### One quick way (Docker)\n1. Pull the Docker image from DockerHub:\n\n`docker pull mazzalab/fastqwiper`\n\n2. Once downloaded the image, type:\n\nCMD: `docker run --rm -ti --name fastqwiper -v \"YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data\" mazzalab/fastqwiper paired 8 sample 50000000`\n\n#### Another quick way (Singularity)\n1. Pull the Singularity image from the Cloud Library:\n\n`singularity pull library://mazzalab/fastqwiper/fastqwiper.sif`\n\n2. Once downloaded the image (e.g., fastqwiper.sif_2023.2.70.sif), type:\n\nCMD `singularity run --bind /scratch/tom/fastqwiper_singularity/data:/fastqwiper/data --writable-tmpfs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`\n\nIf you want to bind the `.singularity` cache folder and the `logs` folder, you can omit `--writable-tmpfs`, create the folders `.singularity` and `logs` (`mkdir .singularity logs`) on the host system, and use this command instead:\n\nCMD: `singularity run --bind YOUR_LOCAL_PATH_TO_DATA_FOLDER/:/fastqwiper/data --bind YOUR_LOCAL_PATH_TO_.singularity_FOLDER/:/fastqwiper/.snakemake --bind YOUR_LOCAL_PATH_TO_LOGS_FOLDER/:/fastqwiper/logs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`\n\nFor both **Docker** and **Singularity**:\n\n- `YOUR_LOCAL_PATH_TO_DATA_FOLDER` is the path of the folder where the fastq.gz files to be wiped are located;\n- `paired` triggers the cleaning of R1 and R2. Alternatively, `single` will trigger the wipe of individual FASTQ files;\n- `8` is the number of your choice of computing cores to be spawned;\n- `sample` is part of the names of the FASTQ files to be wiped. <b>Be aware</b> that: for <b>paired-end</b> files (e.g., \"sample_R1.fastq.gz\" and \"sample_R2.fastq.gz\"), your files must finish with `_R1.fastq.gz` and `_R2.fastq.gz`. Therefore, the argument to pass is everything before these texts: `sample` in this case. For <b>single end</b>/individual files (e.g., \"excerpt_R1_001.fastq.gz\"), your file must end with the string `.fastq.gz`; the preceding text, i.e., \"excerpt_R1_001\" in this case, will be the text to be passed to the command as an argument. \n- `50000000` is the number of rows-per-chunk (used when cores>1. It must be a number multiple of 4)\n\n\n#### The slow way (Linux & Mac OS)\nTo enable the use of preconfigured [pipelines](https://github.com/mazzalab/fastqwiper/tree/main/pipeline), you need to install **Snakemake**. The recommended way to install Snakemake is via Conda, because it enables **Snakemake** to [handle software dependencies of your workflow](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management).\nHowever, the default conda solver is slow and often hangs. Therefore, we recommend installing [Mamba](https://github.com/mamba-org/mamba) as a drop-in replacement via\n\n`$ conda install -c conda-forge mamba`\n\nif you have anaconda/miniconda already installed, or directly installing `Mambaforge` as described [here](https://github.com/conda-forge/miniforge#mambaforge).\n\n\nThen, create and activate a clean environment as above:\n\n```\nmamba create -n fastqwiper python=3.10\nmamba activate fastqwiper\n```\nFinally, install a few dependencies:\n\n```\n$ mamba install -c bioconda snakemake\n$ mamba install colorama click\n```\n\n\n#### Usage\nClone the FastqWiper repository in a folder of your choice and enter it:\n\n```\ngit clone https://github.com/mazzalab/fastqwiper.git\ncd fastqwiper\n```\n\nIt contains, in particular, a folder `data` containing the fastq files to be processed, a folder `pipeline` containing the released pipelines and a folder `fastq_wiper` with the source files of `FastqWiper`. <br/>\nInput files to be processed should be copied into the **data** folder.\n\nCurrently, to run the `FastqWiper` pipelines, the following packages need to be installed manually:\n\n### required packages:\n[gzrt](https://github.com/arenn/gzrt) (Linux build fron source [instructions](https://github.com/arenn/gzrt/blob/master/README.build), Ubuntu install [instructions](https://howtoinstall.co/en/gzrt), Mac OS install [instructions](https://formulae.brew.sh/formula/gzrt))\n\n[BBTools](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/) (install [instructions](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/installation-guide/))\n\nIf installed from source, `gzrt` scripts need to be put on PATH. `bbmap` must be installed in the root folder of FastqWiper, as the image below\n\n![FastqWiper folder yierarchy](assets/hierarchy.png)\n\n### Commands:\nCopy the fastq files you want to fix in the `data` folder.\n**N.b.**: In all commands above, you will pass to the workflow the name of the sample to be analyzed through the config argument: `sample_name`. Remember that your fastq files' names must finish with `_R1.fastq.gz` and `_R2.fastq.gz`, for paired fastq files, and with `.fastq.gz`, for individual fastq files, and, therefore, the text to be assigned to the variable `sample_name` must be everything before them. E.g., if your files are `my_sample_R1.fastq.gz` and `my_sample_R2.fastq.gz`, then `--config sample_name=my_sample`.\n\n#### Paired-end files\n\n- **Get a dry run** of a pipeline (e.g., `fix_wipe_pairs_reads_sequential.smk`):<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 4`\n\n- **Generate the planned DAG**:<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /> <br />\n<img src=\"https://github.com/mazzalab/fastqwiper/blob/main/pipeline/fix_wipe_pairs_reads.png?raw=true\" width=\"400\">\n\n- **Run the pipeline** (n.b., during the first execution, Snakemake will download and install some required remote packages and may take longer). The number of computing cores can be tuned accordingly:<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`\n\nFixed files will be copied in the `data` folder and will be suffixed with the string `_fixed_wiped_paired_interleaving`.\nWe remind that the `fix_wipe_pairs_reads_sequential.smk` and `fix_wipe_pairs_reads_parallel.smk` pipelines perform the following actions:\n- execute `gzrt` on corrupted fastq.gz files (i.e., that cannot be unzipped because of errors) and recover readable reads;\n- execute `FastqWiper` on recovered reads to make them compliant with the FASTQ format (source: [Wipipedia](https://en.wikipedia.org/wiki/FASTQ_format))\n- execute `Trimmomatic` on wiped reads to remove residual unpaired reads\n- execute `BBmap (repair.sh)` on paired reads to fix the correct interleaving and sort fastq files.  \n\n#### Single-end files\n`fix_wipe_single_reads_parallel.smk` and `fix_wipe_single_reads_sequential.smk` will not execute `trimmomatic` and BBmap's `repair.sh`.\n\n- **Get a dry run** of a pipeline (e.g., `fix_wipe_single_reads_sequential.smk`):<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2 -np`\n\n- **Generate the planned DAG**:<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /><br />\n<img src=\"https://github.com/mazzalab/fastqwiper/blob/main/pipeline/fix_wipe_single_reads.png?raw=true\" width=\"200\">\n\n- **Run the pipeline** (n.b., The number of computing cores can be tuned accordingly):<br />\n`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`\n\n# Author\n**Tommaso Mazza**  \n[![Tweeting](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/irongraft)\n\nLaboratory of Bioinformatics</br>\nFondazione IRCCS Casa Sollievo della Sofferenza</br>\nViale Regina Margherita 261 - 00198 Roma IT</br>\nTel: +39 06 44160526 - Fax: +39 06 44160548</br>\nE-mail: t.mazza@css-mendel.it</br>\nWeb page: http://www.css-mendel.it</br>\nWeb page: http://bioinformatics.css-mendel.it</br>\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "An ensamble method to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and fix reads interleaving",
    "version": "2023.2.82",
    "project_urls": {
        "Developmental plan": "https://github.com/mazzalab/fastqwiper/projects",
        "Homepage": "https://github.com/mazzalab/fastqwiper",
        "Source": "https://github.com/mazzalab/fastqwiper",
        "Tracker": "https://github.com/mazzalab/fastqwiper/issues"
    },
    "split_keywords": [
        "genomics",
        "ngs",
        "fastq",
        "bioinformatics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b9a09829f12bbf95fc6c1d55f8f12dbece24b7311967d6f346769b0f53e5f35b",
                "md5": "ed6497a6e60657bb17970dd9aac0c490",
                "sha256": "503904735566a1261696d7fa0dbffb55776d5e2f4ce34489b32843c6d90915cf"
            },
            "downloads": -1,
            "filename": "fastqwiper-2023.2.82-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ed6497a6e60657bb17970dd9aac0c490",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<3.11",
            "size": 14838,
            "upload_time": "2023-10-30T19:59:28",
            "upload_time_iso_8601": "2023-10-30T19:59:28.878117Z",
            "url": "https://files.pythonhosted.org/packages/b9/a0/9829f12bbf95fc6c1d55f8f12dbece24b7311967d6f346769b0f53e5f35b/fastqwiper-2023.2.82-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-30 19:59:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mazzalab",
    "github_project": "fastqwiper",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "fastqwiper"
}
        
Elapsed time: 0.13865s