.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
    :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
    :target: https://coveralls.io/github/sequana/nanomerge?branch=main
.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **nanomerge** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

:Overview: merges fastq files generated by a Nanopore run and generates raw data QC.
:Input: individual fastq files generated by nanopore demultiplexing
:Output: merged fastq files for each barcode (or unique sample)
:Status: production
:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
Installation
~~~~~~~~~~~~
You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda/mamba, e.g.::

    conda install pycoQC

You will also need graphviz installed.
Usage
~~~~~
::

    sequana_nanomerge --help
If your data is barcoded, the fastq files are usually in sub-directories (barcoded/barcodeXY), so you will need to use a pattern
(--input-pattern) such as `*/*.gz`::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in DATAPATH/, so the input pattern can simply be `*.fastq.gz`::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The --summary option is optional and takes as input the output of albacore/guppy demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra `*/` before the `*.fastq.gz` pattern, since barcoded files are in individual subdirectories.
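To see why the extra `*/` matters, here is a minimal Python sketch that mimics the two patterns with pathlib globbing on a throwaway directory. It assumes that --input-pattern behaves like a shell glob resolved relative to --input-directory; the helper names (match_fastq, make_demo_tree) and the file names are hypothetical and are not part of sequana_nanomerge.

```python
import tempfile
from pathlib import Path


def match_fastq(input_directory, input_pattern):
    """Return the sorted paths (relative to input_directory) matching the pattern."""
    root = Path(input_directory)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(input_pattern))


def make_demo_tree(root):
    """Create a hypothetical layout: one flat file plus two barcode sub-directories."""
    root = Path(root)
    (root / "run.fastq.gz").touch()
    for barcode in ("barcode01", "barcode02"):
        (root / barcode).mkdir()
        (root / barcode / "reads_0.fastq.gz").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(tmp)
        # Flat pattern: only files directly inside the input directory
        print(match_fastq(tmp, "*fastq.gz"))
        # Barcoded pattern: files exactly one sub-directory down
        print(match_fastq(tmp, "*/*fastq.gz"))
```

The flat pattern does not descend into sub-directories, which is why a barcoded run needs the `*/` prefix.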
In both cases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh  # for a local run
This launches a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the `sequanix <https://sequana.readthedocs.io/en/master/sequanix.html>`_ interface.
Concerning the sample sheet, whether your data is barcoded or not, it should be a CSV file::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, you must still provide a sample sheet, where the barcode column can be left empty::

    barcode,project,sample
    ,main,A

or simply removed::

    project,sample
    main,A
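For runs with many barcodes, the sample sheet can be generated rather than typed by hand. The sketch below uses only the Python standard library and writes the three-column layout shown above; the function name build_samplesheet and the barcode-to-sample mapping are hypothetical, and the column names are simply those from the examples in this section.

```python
import csv
import io


def build_samplesheet(samples, project="main"):
    """Return CSV text with the columns barcode,project,sample.

    `samples` maps a barcode name to a sample name. For a non-barcoded
    run, use an empty string as the barcode.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["barcode", "project", "sample"])
    for barcode, sample in samples.items():
        writer.writerow([barcode, project, sample])
    return buffer.getvalue()


if __name__ == "__main__":
    text = build_samplesheet({"barcode01": "A", "barcode02": "B", "barcode03": "C"})
    with open("samplesheet.csv", "w", newline="") as handle:
        handle.write(text)
```

The resulting samplesheet.csv can then be passed to the --samplesheet option.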
Usage with apptainer:
~~~~~~~~~~~~~~~~~~~~~~~~~
With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded in the working directory, but you can store them in a global directory, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to use snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"
Requirements
~~~~~~~~~~~~
This pipeline uses the following executables, which are optional:

- pycoQC
- dot
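Before launching the pipeline, it can be handy to check whether these optional executables are reachable. The following sketch relies only on shutil.which from the standard library; the function name find_optional_tools is hypothetical, and the sketch merely reports what is on the PATH without saying anything about how the pipeline itself locates the tools.

```python
import shutil


def find_optional_tools(names=("pycoQC", "dot")):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found'}")
```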
.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png
Details
~~~~~~~~~
This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not).
A brief sequana summary report is also produced.
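As an illustration of what merging means here, gzip files can be concatenated byte for byte and the result is still a valid gzip file, which is why a plain cat is enough (the changelog mentions find+cat). The sketch below reproduces that idea in Python on two tiny, made-up fastq records; it is not the pipeline's own code, and the names merge_gzip_files and demo are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_gzip_files(sources, destination):
    """Concatenate gzip files byte for byte into one multi-member gzip file."""
    with open(destination, "wb") as out:
        for source in sorted(sources):
            with open(source, "rb") as chunk:
                shutil.copyfileobj(chunk, out)


def demo():
    """Write two small gzipped fastq files, merge them, and return the merged content."""
    records = [b"@read1\nACGT\n+\nIIII\n", b"@read2\nTTGA\n+\nIIII\n"]
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        parts = []
        for index, record in enumerate(records):
            part = tmp / f"part_{index}.fastq.gz"
            with gzip.open(part, "wb") as handle:
                handle.write(record)
            parts.append(part)
        merged = tmp / "merged.fastq.gz"
        merge_gzip_files(parts, merged)
        # The gzip module reads every member of a concatenated file
        with gzip.open(merged, "rb") as handle:
            return handle.read()


if __name__ == "__main__":
    print(demo().decode())
```

No decompression is needed during the merge, so the cost is essentially that of copying the files.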
Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_nanomerge/master/sequana_pipelines/nanomerge/config.yaml>`_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
Changelog
~~~~~~~~~
========= ====================================================================
Version   Description
========= ====================================================================
1.5.0     * refactoring to use Click
1.4.0     * sub sampling was biased in v1.3.0. Using stratified sampling to
            correctly sample large files. Also set a --promethion option that
            automatically sub samples 10% of the data
          * add summary table
1.3.0     * handle large promethion runs by using a sub sample of the
            sequencing summary file (--sample of pycoQC still loads the entire
            file in memory)
1.2.0     * handle large promethion runs by using find+cat instead of just
            cat to cope with a very large number of input files.
1.1.0     * add subsample option and set to 1,000,000 reads to handle large
            runs such as promethion
1.0.1     * CSV can now handle sample or samplename column name in samplesheet.
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================
Raw data
{
"_id": null,
"home_page": "https://github.com/sequana/nanomerge",
"name": "sequana-nanomerge",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "snakemake,Nanopore,sequana,merge,barcode",
"author": "Sequana Team",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/8e/ab/5b27b6ddce8712cad53142a814471795542883b1b01a26cf0b9a82136b8f/sequana_nanomerge-1.5.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD-3",
"summary": "Merge barcoded or non barcoded fastq files generated by Nanopore runs",
"version": "1.5.0",
"project_urls": {
"Homepage": "https://github.com/sequana/nanomerge",
"Repository": "https://github.com/sequana/nanomerge"
},
"split_keywords": [
"snakemake",
"nanopore",
"sequana",
"merge",
"barcode"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2d05cabc081f6fed6bc1db27273e572759b38f61f530c3c3922fbab56049b5a2",
"md5": "49581d2d4e425ff21734374215945c5a",
"sha256": "b4bd1f7a4cd480baf0a0af8ab5401b9eb3759e97880f8041f16e7f79db8f830a"
},
"downloads": -1,
"filename": "sequana_nanomerge-1.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "49581d2d4e425ff21734374215945c5a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 28007,
"upload_time": "2023-12-03T21:04:17",
"upload_time_iso_8601": "2023-12-03T21:04:17.015966Z",
"url": "https://files.pythonhosted.org/packages/2d/05/cabc081f6fed6bc1db27273e572759b38f61f530c3c3922fbab56049b5a2/sequana_nanomerge-1.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8eab5b27b6ddce8712cad53142a814471795542883b1b01a26cf0b9a82136b8f",
"md5": "dba89585124411aef11ba7aa3e7d404a",
"sha256": "c7bc32696ca359dcbac2095d02245b4b4ad52ff1d48d4e5a16fe2bdc28cfe809"
},
"downloads": -1,
"filename": "sequana_nanomerge-1.5.0.tar.gz",
"has_sig": false,
"md5_digest": "dba89585124411aef11ba7aa3e7d404a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 28182,
"upload_time": "2023-12-03T21:04:18",
"upload_time_iso_8601": "2023-12-03T21:04:18.605650Z",
"url": "https://files.pythonhosted.org/packages/8e/ab/5b27b6ddce8712cad53142a814471795542883b1b01a26cf0b9a82136b8f/sequana_nanomerge-1.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-03 21:04:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sequana",
"github_project": "nanomerge",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "sequana-nanomerge"
}