sequana-nanomerge


:Name: sequana-nanomerge
:Version: 1.5.0
:Home page: https://github.com/sequana/nanomerge
:Summary: Merge barcoded or non barcoded fastq files generated by Nanopore runs
:Upload time: 2023-12-03 21:04:18
:Author: Sequana Team
:Requires Python: >=3.8,<4.0
:License: BSD-3
:Keywords: snakemake, nanopore, sequana, merge, barcode

.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
     :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
   :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
   :target: https://coveralls.io/github/sequana/nanomerge?branch=main


.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10




This is the **nanomerge** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

:Overview: merges fastq files generated by a Nanopore run and produces raw data QC.
:Input: individual fastq files generated by nanopore demultiplexing
:Output: merged fastq files for each barcode (or unique sample)
:Status: production
:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352


Installation
~~~~~~~~~~~~

You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda/mamba, e.g.::

    conda install pycoQC

You will also need graphviz installed (it provides the ``dot`` executable listed under Requirements).

Usage
~~~~~

::

    sequana_nanomerge --help

If your data is barcoded, the fastq files are usually stored in sub-directories named ``barcoded/barcodeXY``, so you will
need an input pattern (``--input-pattern``) such as ``*/*.gz``::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in ``DATAPATH/``, so the input pattern can simply be ``*.fastq.gz``::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The ``--summary`` argument is optional. It takes the summary file written by albacore/guppy demultiplexing, usually called ``sequencing_summary.txt``.

Note that the difference between the two patterns is the extra ``*/`` before ``*.fastq.gz``, since barcoded files are in individual sub-directories.
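
To see which files each pattern selects, here is a minimal sketch using standard glob semantics from Python's ``pathlib``. The helper ``match_fastq`` and the toy file names are hypothetical, and the assumption that nanomerge resolves ``--input-pattern`` in the same way is not confirmed here.

.. code-block:: python

    import tempfile
    from pathlib import Path


    def match_fastq(input_directory, pattern):
        """Return the sorted relative paths under input_directory matching pattern."""
        root = Path(input_directory)
        return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


    # Toy layout (hypothetical names): two barcode sub-directories, one top-level file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("barcode01/a.fastq.gz", "barcode02/b.fastq.gz", "c.fastq.gz"):
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()

        barcoded = match_fastq(root, "*/*fastq.gz")  # one directory level down
        flat = match_fastq(root, "*fastq.gz")        # top level only

    print(barcoded)  # ['barcode01/a.fastq.gz', 'barcode02/b.fastq.gz']
    print(flat)      # ['c.fastq.gz']

A single ``*`` never crosses a directory separator, which is why the barcoded layout needs the extra ``*/`` component.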

In both cases, the command creates a directory containing the pipeline and its configuration file. You then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the `sequanix <https://sequana.readthedocs.io/en/master/sequanix.html>`_ interface.

Whether your data is barcoded or not, the sample sheet must be a CSV file with the following columns::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, you must still provide a sample sheet; the barcode column can be left empty::

    barcode,project,sample
    ,main,A

or removed altogether::

    project,sample
    main,A
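
The three layouts above can be checked before launching the pipeline. The following is a minimal sketch, not the parser nanomerge itself uses: the function ``read_samplesheet`` is hypothetical, and treating ``project`` as a required column is an assumption drawn from the examples. The ``samplename`` alternative follows the 1.0.1 changelog entry.

.. code-block:: python

    import csv
    import io


    def read_samplesheet(handle):
        """Map each barcode (None for a non-barcoded run) to its sample name."""
        reader = csv.DictReader(handle)
        fields = reader.fieldnames or []
        # The changelog mentions that 'sample' or 'samplename' are both accepted.
        sample_col = "sample" if "sample" in fields else "samplename"
        if "project" not in fields or sample_col not in fields:
            raise ValueError("sample sheet needs 'project' and 'sample' columns")
        mapping = {}
        for row in reader:
            # A missing or empty barcode column both mean "no barcode".
            barcode = (row.get("barcode") or "").strip() or None
            mapping[barcode] = row[sample_col].strip()
        return mapping


    barcoded_sheet = io.StringIO("barcode,project,sample\nbarcode01,main,A\nbarcode02,main,B\n")
    empty_barcode_sheet = io.StringIO("barcode,project,sample\n,main,A\n")
    no_barcode_sheet = io.StringIO("project,sample\nmain,A\n")

    print(read_samplesheet(barcoded_sheet))       # {'barcode01': 'A', 'barcode02': 'B'}
    print(read_samplesheet(empty_barcode_sheet))  # {None: 'A'}
    print(read_samplesheet(no_barcode_sheet))     # {None: 'A'}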

Usage with apptainer:
~~~~~~~~~~~~~~~~~~~~~~~~~

With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded into the working directory by default, but you can store them in a shared directory instead, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to run snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt \
        --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"


Requirements
~~~~~~~~~~~~

This pipeline uses the following executables, which are optional:

- pycoQC
- dot

.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png


Details
~~~~~~~~~

This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not). 
A brief sequana summary report is also produced.


Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_nanomerge/master/sequana_pipelines/nanomerge/config.yaml>`_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file. 

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.5.0     * refactoring to use Click
1.4.0     * sub-sampling was biased in v1.3.0. Now uses stratified sampling to
            correctly sample large files. Also adds a --promethion option that
            automatically sub-samples 10% of the data
          * add summary table
1.3.0     * handle large promethion runs by using a sub-sample of the
            sequencing summary file (--sample of pycoQC still loads the entire
            file in memory)
1.2.0     * handle large promethion runs by using find+cat instead of just
            cat to cope with a very large number of input files.
1.1.0     * add subsample option and set it to 1,000,000 reads to handle large
            runs such as promethion
1.0.1     * CSV can now handle sample or samplename column name in samplesheet.
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================
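
The 1.2.0 entry concerns merging a very large number of input files. As an illustration only, here is a minimal Python sketch of the same idea: the files are streamed one at a time instead of being passed as one long argument list. It relies on the fact that concatenated gzip members form a valid gzip stream. The helper ``merge_gzipped`` and the file names are hypothetical, and this is not the pipeline's actual rule, which uses find+cat.

.. code-block:: python

    import gzip
    import shutil
    import tempfile
    from pathlib import Path


    def merge_gzipped(files, destination):
        """Append each gzip file to destination byte for byte; return the file count."""
        count = 0
        with open(destination, "wb") as out:
            for path in files:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)  # streamed copy, constant memory
                count += 1
        return count


    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Three toy fastq chunks of one read (four lines) each.
        for i in range(3):
            with gzip.open(root / f"chunk{i}.fastq.gz", "wt") as fh:
                fh.write(f"@read{i}\nACGT\n+\nIIII\n")

        merged = root / "merged_out.gz"
        n_files = merge_gzipped(sorted(root.glob("*.fastq.gz")), merged)

        # The gzip module reads multi-member files transparently.
        with gzip.open(merged, "rt") as fh:
            n_lines = sum(1 for _ in fh)

    print(n_files, n_lines)  # 3 12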




            
