sequana-fastqc


:Name: sequana-fastqc
:Version: 1.9.0
:Home page: https://github.com/sequana/fastqc
:Summary: A multi-sample fastqc pipeline from the Sequana project
:Upload time: 2024-10-31 21:50:31
:Author: Sequana Team
:Requires Python: <4.0,>=3.8
:License: BSD-3
:Keywords: snakemake, ngs, sequana, pipelines, fastqc

.. image:: https://badge.fury.io/py/sequana-fastqc.svg
     :target: https://pypi.python.org/pypi/sequana_fastqc

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/fastqc/actions/workflows/main.yml/badge.svg
   :target: https://github.com/sequana/fastqc/actions/workflows/main.yml


.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **fastqc** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

:Overview: Runs fastqc and multiqc on a set of sequencing data to produce quality control reports
:Input: A set of FastQ files (paired or single-end) compressed or not
:Output: An HTML file summary.html (individual fastqc reports, multi-sample report)
:Status: Production
:Wiki: https://github.com/sequana/fastqc/wiki
:Documentation: This README file, the Wiki from the github repository (link above) and https://sequana.readthedocs.io
:Citation: Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352


Installation
~~~~~~~~~~~~

sequana_fastqc is based on Python 3. Install the package as follows::

    pip install sequana_fastqc --upgrade

You will need third-party software such as fastqc. Please see below for details.

Usage
~~~~~

If you have a set of FastQ files in a data/ directory, type::

    sequana_fastqc --input-directory data

To learn more about the options (e.g., adding a different pattern to restrict
the execution to a subset of the input files, or changing the output/working
directory)::

    sequana_fastqc --help

The call to sequana_fastqc creates a directory **fastqc**. Go to this
working directory and execute the pipeline as follows::

    cd fastqc
    sh fastqc.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the fastqc.rules and config.yaml files and then execute the pipeline yourself with specific parameters::

    snakemake -s fastqc.rules --cores 4 --stats stats.txt

Or use the `sequanix <https://sequana.readthedocs.io/en/master/sequanix.html>`_ interface.

Please see the `Wiki <https://github.com/sequana/fastqc/wiki>`_ for more examples and features.
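
The two steps above can be chained in a small wrapper. The following is only a sketch: the helper names ``run_in`` and ``run_fastqc_pipeline`` and the ``DRY_RUN`` variable are hypothetical and not part of sequana_fastqc; only the two commands themselves come from this README. It assumes the default working directory name **fastqc**.

```shell
# Hypothetical helper: print a command to be run inside a directory and
# execute it there, unless DRY_RUN=1 (in which case it is only printed).
run_in() {
    dir=$1
    shift
    printf '%s\n' "+ (cd $dir && $*)"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        (cd "$dir" && "$@")
    fi
}

# Step 1 prepares the working directory from the input FastQ files.
# Step 2 runs the generated script inside that directory (local run).
run_fastqc_pipeline() {
    input_dir=$1
    run_in . sequana_fastqc --input-directory "$input_dir" || return 1
    run_in fastqc sh fastqc.sh
}
```

Setting ``DRY_RUN=1`` before calling ``run_fastqc_pipeline data`` prints the two commands without executing them, which can be handy to check what would be run.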

Tutorial
~~~~~~~~

You can retrieve test data from sequana_fastqc (https://github.com/sequana/fastqc) or type::

    wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R1_001.fastq.gz
    wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R2_001.fastq.gz

Then prepare and run the pipeline::

    sequana_fastqc --input-directory .
    cd fastqc
    sh fastqc.sh

    # once done, remove temporary files (snakemake and others)
    make clean

Just open the HTML entry called summary.html. A multiqc report is also available.
The summary contains images such as the following one:

.. image:: https://github.com/sequana/fastqc/blob/main/doc/summary.png?raw=true

Please see the `Wiki <https://github.com/sequana/fastqc/wiki>`_ for more examples and features.
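
The two test files above are the two mates of a single paired-end sample. As an illustration of how paired files can be grouped into samples, here is a minimal sketch; it assumes that mates are distinguished by a ``_R1_`` / ``_R2_`` tag in the file name, as in the test data, and the function names are hypothetical. The pipeline's own logic may differ.

```shell
# List unique sample names in a directory by cutting each file name at the
# read tag (_R1_ or _R2_), so both mates of a pair map to the same sample.
# Files without a read tag (single-end) keep their full base name.
sample_names() {
    for path in "$1"/*.fastq.gz; do
        [ -e "$path" ] || continue   # skip the literal pattern if nothing matches
        name=$(basename "$path" .fastq.gz)
        printf '%s\n' "$name" | sed 's/_R[12]_.*$//'
    done | sort -u
}

# Count samples rather than files: a pair of mates counts once.
count_samples() {
    sample_names "$1" | wc -l | tr -d ' '
}
```

For a directory holding only the two tutorial files, ``sample_names`` would list a single sample called ``data``.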

Requirements
~~~~~~~~~~~~

This pipeline requires the following executable(s):

- fastqc
- falco (optional)
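
Before running the pipeline without containers, it can be useful to check that the required executable is on your ``PATH``. The sketch below is not part of sequana_fastqc; the function names are hypothetical, and it relies only on the standard ``command -v`` shell builtin.

```shell
# Print the name of each requested executable that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# fastqc is required; falco is optional, so it is not checked here.
report_requirements() {
    missing=$(missing_tools fastqc)
    if [ -n "$missing" ]; then
        echo "Missing required executable(s): $missing" >&2
        return 1
    fi
    echo "All required executables found"
}
```

Calling ``report_requirements`` returns a non-zero status if fastqc cannot be found.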


For Linux users, we provide apptainer/singularity images available through the **damona** project (https://damona.readthedocs.io). 

To make use of them, initialise the pipeline with the --use-apptainer option and everything should be downloaded
automatically for you, which also helps with reproducibility::

    sequana_fastqc --input-directory data --use-apptainer --apptainer-prefix ~/images


.. image:: https://raw.githubusercontent.com/sequana/fastqc/main/sequana_pipelines/fastqc/dag.png


Details
~~~~~~~~~

This pipeline runs fastqc in parallel on the input fastq files (paired or not)
and then executes multiqc. A brief sequana summary report is also produced.

You may use falco instead of fastqc. This is experimental but seems to work for
Illumina/FastQ files.

This pipeline has been tested on several hundred MiSeq, NextSeq, MiniSeq,
ISeq100 and PacBio runs.

It computes an md5sum of your input data, copes with empty samples, and
produces ready-to-use HTML reports.
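
To illustrate what a checksum file of the input data gives you, here is a minimal sketch using the standard ``md5sum`` tool. The function names are hypothetical, and the exact layout of the file written by the pipeline may differ; this only shows the general idea of recording checksums and re-checking them later.

```shell
# Write one checksum line per FastQ file of a directory to an output file.
write_md5() {
    data_dir=$1
    out_file=$2
    (cd "$data_dir" && md5sum *.fastq.gz) > "$out_file"
}

# Re-check the files later. md5sum -c exits with a non-zero status if any
# file has changed. Pass md5_file as an absolute path, since the check
# runs from inside data_dir.
verify_md5() {
    data_dir=$1
    md5_file=$2
    (cd "$data_dir" && md5sum -c "$md5_file" >/dev/null)
}
```

For example, ``write_md5 data "$PWD/md5.txt"`` followed later by ``verify_md5 data "$PWD/md5.txt"`` tells you whether the input files are still identical.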


Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/fastqc/main/sequana_pipelines/fastqc/config.yaml>`_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file. 

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.8.2     * Fix the onerror typo in the pipeline, fix CI.
1.8.1     * update __init__ (version)
1.8.0     * uses pyproject instead of setuptools
          * uses click instead of argparse and newest sequana_pipetools 
            (0.16.0)
1.7.1     * Set wrapper version in the config based on new sequana_pipetools
            feature
1.7.0     * Use new rulegraph wrapper and new graphviz apptainer
1.6.2     * slight refactorisation to use rulegraph wrapper
1.6.1     * pin sequana version to 1.4.4 to force usage of new fastqc module
            to fix falco. Updated config documentation.
1.6.0     * Fixed falco output error and use singularity containers
1.5.0     * removed modules completely.
1.4.2     * simplified pipeline (suppress setup and use existing wrapper)
1.4.1     * simplified pipeline with wrappers/rules
1.4.0     * This version uses sequana 0.12.0 and new sequana-wrappers 
            mechanism. Functionality is unchanged. Also based on
            sequana_pipetools 0.6.X
1.3.0     * add option --skip-multiqc (in case of memory trouble)
          * Fix typo in the link towards fastqc reports in the summary.html
            table
          * Fix number of samples in the paired case (divide by 2)
1.2.0     * compatibility with Sequanix
          * Fix pipeline to cope with new snakemake API
1.1.0     * add new rule to allow users to choose falco software instead of
            fastqc. Note that falco is 4 times faster but still a work in
            progress (version 0.1 as of Nov 2020).
          * allows the pipeline to process pacbio files (in fact any files
            accepted by fastqc, i.e. SAM and BAM files)
          * More doc, test and info on the wiki
1.0.1     * add md5sum of input files as md5.txt file
1.0.0     * a stable version. Added a wiki on github as well and a 
            singularity recipe
0.9.15    * For the HTML reports, takes into account samples with zero reads
0.9.14    * round up some statistics in the main table 
0.9.13    * improve the summary HTML report
0.9.12    * implemented new --from-project option
0.9.11    * now depends on sequana_pipetools instead of sequana.pipelines to 
            speed up --help calls
          * new summary.html report created with pipeline summary
          * new rule (plotting)
0.9.10    * simplify the onsuccess section
0.9.9     * add missing png and pipeline (regression bug)
0.9.8     * add missing multi_config file
0.9.7     * check existence of input directory in main.py
          * add a logo 
          * fix schema
          * add multiqc_config
          * add sequana + sequana_fastqc version
0.9.6     * add the readtag option
========= ====================================================================


Contribute & Code of Conduct
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To contribute to this project, please take a look at the 
`Contributing Guidelines <https://github.com/sequana/sequana/blob/master/CONTRIBUTING.rst>`_ first. Please note that this project is released with a 
`Code of Conduct <https://github.com/sequana/sequana/blob/master/CONDUCT.md>`_. By contributing to this project, you agree to abide by its terms.



            
