labxpipe

Name	labxpipe
Version	0.7.1
Summary	Genomics pipelines
upload_time	2023-08-01 22:43:31
author	Charles E. Vejnar
requires_python	>=3.9
license	Mozilla Public License 2.0 (MPL 2.0)

# <img src="https://raw.githubusercontent.com/vejnar/LabxPipe/main/img/logo.svg" alt="LabxPipe" width="45%" />

[![MPLv2](https://img.shields.io/aur/license/python-labxpipe?color=1793d1&style=for-the-badge)](https://mozilla.org/MPL/2.0/)

* Integrated with [LabxDB](https://labxdb.vejnar.org): all required annotations (labels, strand, paired, etc.) are retrieved from LabxDB. This integration is optional.
* Based on existing robust technologies. No new language.
    * LabxPipe pipelines are defined in JSON text files.
    * LabxPipe is written in Python. Using conventions, such as standard input and output filenames, ensures compatibility between tasks.
* Simple and complex pipelines.
    * By default, pipelines are linear (one step after the other).
    * Branching is achieved by defining a previous step (using the `step_input` parameter), allowing users to create any dependency between tasks.
* Parallelized using robust asynchronous threads from the Python standard library.
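
The linear default and `step_input` branching described above can be pictured as a small dependency graph. The following is a minimal sketch and not LabxPipe's scheduler: it assumes that `step_input` holds the name of an earlier step, and the `plotting` step name is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python >= 3.9


def step_order(steps):
    """Return step names in an order that respects their dependencies.

    A step depends on the step named in `step_input` if present,
    otherwise on the step listed just before it (linear default).
    """
    graph = {}
    previous = None
    for step in steps:
        name = step["step_name"]
        parent = step.get("step_input", previous)
        graph[name] = {parent} if parent is not None else set()
        previous = name
    return list(TopologicalSorter(graph).static_order())


steps = [
    {"step_name": "preparing"},
    {"step_name": "aligning"},
    {"step_name": "counting"},
    # Branch: takes its input from "aligning", not from "counting"
    {"step_name": "plotting", "step_input": "aligning"},
]
order = step_order(steps)
print(order)
```

In this sketch, `counting` and `plotting` both depend only on `aligning`, so they could run concurrently once `aligning` has finished.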

## Examples

See JSON files in `config/pipelines` of this repository.

| Pipeline JSON file            | Description                                                                                                               |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `mrna_seq.json`               | mRNA-seq.                                                                                                                 |
| `mrna_seq_profiling_bam.json` | mRNA-seq. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). BAM and SAM outputs.            |
| `mrna_seq_no_db.json`         | mRNA-seq. No [LabxDB](https://labxdb.vejnar.org).                                                                         |
| `mrna_seq_with_plotting.json` | mRNA-seq. Plotting non-mapped reads. Demonstrates `step_input`.                                                           |
| `mrna_seq_cufflinks.json`     | mRNA-seq. Replaces GeneAbacus with Cufflinks.                                                                             |
| `chip_seq.json`               | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads.      |
| `chip_seq_user_function.json` | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). Peak-calling using [MACS3](https://github.com/macs3-project/MACS) employing a *user-defined step/function*. |

The following demonstrates how to apply the `mrna_seq.json` pipeline. It requires:
* [LabxDB](https://labxdb.vejnar.org)
* FASTQ files for the samples named `AGR000850` and `AGR000912`
    ```
    /plus/data/seq/by_run/AGR000850
    ├── 23_009_R1.fastq.zst
    └── 23_009_R2.fastq.zst
    /plus/data/seq/by_run/AGR000912
    ├── 65_009_R1.fastq.zst
    └── 65_009_R2.fastq.zst
    ```

Note: `mrna_seq_no_db.json` demonstrates how to use LabxPipe *without* LabxDB: it only requires FASTQ files (in the `path_seq_run` directory, laid out as shown above).
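
To check that a run directory follows the layout shown above before starting a pipeline, a short script can help. This is a sketch under assumptions: the helper `find_fastq_pairs` is hypothetical (it is not part of LabxPipe), and it assumes files are named `<prefix>_R1<ext>` and `<prefix>_R2<ext>` as in the listing.

```python
import tempfile
from pathlib import Path


def find_fastq_pairs(path_seq_run, run_ref, fastq_exts=(".fastq.zst",)):
    """Group the FASTQ files of one run by prefix, keyed by read (R1/R2)."""
    pairs = {}
    for path in sorted((Path(path_seq_run) / run_ref).iterdir()):
        ext = next((e for e in fastq_exts if path.name.endswith(e)), None)
        if ext is None:
            continue  # not a FASTQ file
        stem = path.name[: -len(ext)]
        for read in ("R1", "R2"):
            if stem.endswith("_" + read):
                prefix = stem[: -len(read) - 1]
                pairs.setdefault(prefix, {})[read] = path.name
    return pairs


# Demo on a throwaway directory mimicking the layout shown above
with tempfile.TemporaryDirectory() as root:
    run_dir = Path(root) / "AGR000850"
    run_dir.mkdir()
    for name in ("23_009_R1.fastq.zst", "23_009_R2.fastq.zst", "notes.txt"):
        (run_dir / name).touch()
    pairs = find_fastq_pairs(root, "AGR000850")
    print(pairs)
```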

Requirements:
* [LabxDB](https://labxdb.vejnar.org). Alternatively, `mrna_seq_no_db.json` doesn't require LabxDB.
* [ReadKnead](https://sr.ht/~vejnar/ReadKnead) to trim reads.
* [STAR](https://github.com/alexdobin/STAR) and a genome index in the directory defined by `path_star_index`.
* [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) to count reads and generate genomic profiles for tracks.

1. Start pipeline:
    ```bash
    lxpipe run --pipeline mrna_seq.json \
               --worker 2 \
               --processor 16
    ```
    Output is written to the `path_output` directory.
2. Create report:
    ```bash
    lxpipe report --pipeline mrna_seq.json
    ```
    The report file `mrna_seq.xlsx` should be created in the same directory as `mrna_seq.json`.
3. Extract output file(s) to use them directly, for instance to load them in IGV. For example:
    * To extract alignment files (here zstd-compressed SAM files) and rename them using the sample label:
        ```bash
        lxpipe extract --pipeline mrna_seq.json \
                       --files aligning,accepted_hits.sam.zst \
                       --label
        ```
    * To extract BigWig profile files and rename them using the sample label and reference, keeping the original filename as a suffix:
        ```bash
        lxpipe extract --pipeline mrna_seq.json \
                       --files profiling,genome_plus.bw \
                       --label \
                       --reference \
                       --suffix
        ```
    Use `-d`/`--dry_run` to test the extract command before applying it.
4. Merge gene/mRNA counts generated by [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) in `counting` directory:
    ```bash
    lxpipe merge-count --pipeline mrna_seq.json \
                       --step counting
    ```
5. Create a trackhub. Requirements:
    * [ChromosomeMappings](https://github.com/dpryan79/ChromosomeMappings) file (to map chromosome names from Ensembl/NCBI to UCSC)
    * Tabulated file (with chromosome name and length)

    Execute in a separate directory:
    ```bash
    lxpipe trackhub --runs AGR000850,AGR000912 \
                    --species_ucsc danRer11 \
                    --path_genome /plus/scratch/sai/annots/danrer_genome_all_ensembl_grcz11_ucsc_chroms_chrom_length.tab \
                    --path_mapping /plus/scratch/sai/annots/ChromosomeMappings/GRCz11_ensembl2UCSC.txt \
                    --input_sam \
                    --bam_names accepted_hits.sam.zst \
                    --make_config \
                    --make_trackhub \
                    --make_bigwig \
                    --processor 16
    ```
    The directory is then ready to be served by a web server for display in the [UCSC genome browser](https://genome.ucsc.edu/cgi-bin/hgHubConnect).
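
When the same sequence of commands is repeated for several pipelines, it can be convenient to script it. The sketch below only assembles the commands shown in steps 1 to 4 above and prints them; the helper name `workflow_commands` is hypothetical and the flags are copied from the examples in this section.

```python
import shlex


def workflow_commands(pipeline, workers=2, processors=16):
    """Return the lxpipe commands of steps 1 to 4 as argument lists."""
    return [
        ["lxpipe", "run", "--pipeline", pipeline,
         "--worker", str(workers), "--processor", str(processors)],
        ["lxpipe", "report", "--pipeline", pipeline],
        # --dry_run only tests the extraction, as advised above
        ["lxpipe", "extract", "--pipeline", pipeline,
         "--files", "aligning,accepted_hits.sam.zst", "--label", "--dry_run"],
        ["lxpipe", "merge-count", "--pipeline", pipeline, "--step", "counting"],
    ]


commands = workflow_commands("mrna_seq.json")
for cmd in commands:
    print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing step.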

## Configuration

Parameters can be defined [globally](https://labxdb.vejnar.org/doc/install/python/#configuration). See the `config` directory of this repository for examples.

## Writing pipelines

Parameters are defined first globally (see above), then per pipeline, then per replicate/run, and then per step/function. The latest, most specific definition takes precedence. For example, `path_seq_run` defined in `/etc/hts/labxpipe.json` is used by default, but if `path_seq_run` is also defined in the pipeline file, the pipeline value is used instead.
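
The precedence rule can be illustrated with layered dictionaries. This is a minimal sketch of the lookup order only, not LabxPipe's actual implementation; the function name `resolve` and the example values are hypothetical.

```python
from collections import ChainMap


def resolve(global_cfg, pipeline_cfg, run_cfg, step_cfg):
    """Combine the four levels; the most specific level wins."""
    # ChainMap searches its mappings from left to right
    return ChainMap(step_cfg, run_cfg, pipeline_cfg, global_cfg)


# Hypothetical values for illustration
global_cfg = {"path_seq_run": "/plus/data/seq/by_run", "logging_level": "info"}
pipeline_cfg = {"path_seq_run": "/data/project/fastq"}
run_cfg = {}
step_cfg = {"force": True}

params = resolve(global_cfg, pipeline_cfg, run_cfg, step_cfg)
print(params["path_seq_run"])   # pipeline value overrides the global one
print(params["logging_level"])  # falls back to the global value
```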

Main parameters

| Parameter           | Type          |
| ------------------- | ------------- |
| name                | string        |
| path_output         | string        |
| path_seq_run        | string        |
| path_local_steps    | string        |
| path_annots         | string        |
| path_bowtie2_index  | string        |
| path_bwa-mem2_index | string        |
| path_minimap2_index | string        |
| path_star_index     | string        |
| fastq_exts          | []strings     |
| adaptors            | {}            |
| logging_level       | string        |
| run_refs            | []strings     |
| replicate_refs      | []strings     |
| ref_info_source     | []strings     |
| ref_infos           | {}            |
| analysis            | [{}, {}, ...] |

Parameters for all steps

| Parameter     | Type    |
| ------------- | ------- |
| step_name     | string  |
| step_function | string  |
| step_desc     | string  |
| force         | boolean |

Step-specific parameters

| Step               | Synonym          | Parameter             | Type          |
| ------------------ | ---------------- | --------------------- | ------------- |
| readknead          | preparing        | options               | []strings     |
|                    |                  | ops_r1                | [{}, {}, ...] |
|                    |                  | ops_r2                | [{}, {}, ...] |
|                    |                  | plot_fastq_in         | boolean       |
|                    |                  | plot_fastq            | boolean       |
|                    |                  | fastq_out             | boolean       |
|                    |                  | zip_fastq_out         | string        |
| bowtie2            | genomic_aligning | options               | []strings     |
|                    |                  | index                 | string        |
|                    |                  | output                | string        |
|                    |                  | output_unfiltered     | string        |
|                    |                  | compress_sam          | boolean       |
|                    |                  | compress_sam_cmd      | string        |
|                    |                  | create_bam◆           | boolean       |
|                    |                  | index_bam◆            | boolean       |
| bwa-mem2           |                  | options               | []strings     |
|                    |                  | index                 | string        |
|                    |                  | output                | string        |
|                    |                  | compress_output       | boolean       |
|                    |                  | compress_output_cmd   | string        |
|                    |                  | create_bam◆           | boolean       |
|                    |                  | index_bam◆            | boolean       |
| minimap2           |                  | options               | []strings     |
|                    |                  | index                 | string        |
|                    |                  | output                | string        |
|                    |                  | compress_output       | boolean       |
|                    |                  | compress_output_cmd   | string        |
|                    |                  | create_bam◆           | boolean       |
|                    |                  | index_bam◆            | boolean       |
| star               | aligning         | options               | []strings     |
|                    |                  | index                 | string        |
|                    |                  | output_type           | []strings     |
|                    |                  | compress_sam          | boolean       |
|                    |                  | compress_sam_cmd      | string        |
|                    |                  | compress_unmapped     | boolean       |
|                    |                  | compress_unmapped_cmd | string        |
| cufflinks          |                  | options               | []strings     |
|                    |                  | inputs                | [{}, {}, ...] |
|                    |                  | features              | [{}, {}, ...] |
| geneabacus         | counting         | options               | []strings     |
|                    |                  | inputs                | [{}, {}, ...] |
|                    |                  | path_annots           | string        |
|                    |                  | features              | [{}, {}, ...] |
| samtools_sort      |                  | options               | []strings     |
|                    |                  | sort_by_name_bam      | boolean       |
| samtools_uniquify  |                  | options               | []strings     |
|                    |                  | sort_by_name_bam      | boolean       |
|                    |                  | index_bam             | boolean       |
| cleaning           |                  | steps                 | [{}, {}, ...] |

◆ indicates exclusive options. For example, either `create_bam` or `index_bam` can be used, but not both.
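
A pipeline file can be checked for this constraint before it is run. The following is a sketch only: how LabxPipe itself reacts when both options are set is not described here, and the helper `check_exclusive` is hypothetical.

```python
# Options marked with ◆ in the table above
EXCLUSIVE = ("create_bam", "index_bam")


def check_exclusive(step):
    """Raise ValueError if more than one exclusive option is enabled."""
    enabled = [key for key in EXCLUSIVE if step.get(key)]
    if len(enabled) > 1:
        name = step.get("step_name", "?")
        raise ValueError(f"step {name!r}: {' and '.join(enabled)} are exclusive")
    return enabled


ok = check_exclusive({"step_name": "genomic_aligning", "create_bam": True})
try:
    check_exclusive({"step_name": "genomic_aligning",
                     "create_bam": True, "index_bam": True})
    conflict = False
except ValueError as error:
    conflict = True
    print(error)
```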

Sample-specific parameters. These are automatically populated when using LabxDB, or sourced from `ref_infos`. They can be changed manually in any step (for example, setting `paired` to `false` will ignore second reads in that step).

| Parameter      | Type    |
| -------------- | ------- |
| label_short    | string  |
| paired         | boolean |
| directional    | boolean |
| r1_strand      | string  |
| quality_scores | string  |

## User-defined step

In addition to the provided steps/functions, e.g. `bowtie2`, `star` or `geneabacus`, users can define their own steps, usable in LabxPipe pipelines. LabxPipe will import user-defined steps:
* Written in Python
* One step per file with the `.py` extension located in the directory defined by `path_local_steps`
* Each step, defined in its own file, requires:
    1. A `functions` variable listing the step name(s)
    2. A function named `run` with the 3 parameters `path_in`, `path_out` and `params`

    For example:
    ```python
    functions = ['macs3']
    def run(path_in, path_out, params):
        ...
    ```

An example of a user-defined function providing peak-calling using [MACS3](https://github.com/macs3-project/MACS) is available in `config/user_steps/macs3.py` in this repository.

An example of a pipeline using the [MACS3](https://github.com/macs3-project/MACS) step is available in `config/pipelines/chip_seq_user_function.json` in this repository.
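
For orientation, a slightly fuller skeleton of a step file could look like the sketch below. It rests on assumptions that should be checked against `config/user_steps/macs3.py`: that `path_in` and `path_out` are directories, that `params` behaves like a dictionary, and that the return value is not required by LabxPipe. The step name `count_lines` and the `output` parameter are hypothetical.

```python
import json
from pathlib import Path

# Step name(s) provided by this file
functions = ["count_lines"]


def run(path_in, path_out, params):
    """Count the lines of every file in path_in; write a JSON summary in path_out."""
    counts = {}
    for path in sorted(Path(path_in).iterdir()):
        if path.is_file():
            with open(path, "rb") as handle:
                counts[path.name] = sum(1 for _ in handle)
    out_dir = Path(path_out)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "output" is a hypothetical step parameter with a default filename
    out_file = out_dir / params.get("output", "line_counts.json")
    out_file.write_text(json.dumps(counts, indent=2))
    return counts
```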

## Demultiplexing sequencing reads: `lxpipe demultiplex`

* Demultiplex reads based on barcode sequences from the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org)
* Demultiplexing is performed by [ReadKnead](https://sr.ht/~vejnar/ReadKnead). The most important input for demultiplexing is the ReadKnead pipeline. Pipelines are identified using the `Adapter 3'` field in LabxDB.

* Example for simple demultiplexing. The first nucleotides at the 5' end of read 1 are used as barcodes (the `Adapter 3'` field is set to `sRNA 1.5` in LabxDB for these samples) with the following pipeline:
    ```json
    {
        "sRNA 1.5": {
            "R1": [
                {
                    "name": "demultiplex",
                    "end": 5,
                    "max_mismatch": 1
                }
            ],
            "R2": null
        }
    }
    ```
    The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org).

* Example for iCLIP demultiplexing. In [Vejnar et al.](https://pubmed.ncbi.nlm.nih.gov/31227602), iCLIP is demultiplexed (the `Adapter 3'` field is set to `TruSeq-DMS+A Index` in LabxDB for these samples) using the following pipeline:
    ```json
    {
        "TruSeq-DMS+A Index": {
            "R1": [
                {
                    "name": "clip",
                    "end": 5,
                    "length": 4,
                    "add_clipped": true
                },
                {
                    "name": "trim",
                    "end": 3,
                    "algo": "bktrim",
                    "min_sequence": 5,
                    "keep": ["trim_exact", "trim_align"]
                },
                {
                    "name": "length",
                    "min_length": 6
                },
                {
                    "name": "demultiplex",
                    "end": 3,
                    "max_mismatch": 1,
                    "length_ligand": 2
                },
                {
                    "name": "length",
                    "min_length": 15
                }
            ],
            "R2": null
        }
    }
    ```
    The pipeline is stored in `demux_truseq_dms_a.json`. The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org). (NB: the published demultiplexed data were generated using `"algo": "align"` with a minimum score of 80 instead of `"algo": "bktrim"`.)

    The pipeline was then tested by running:
    ```bash
    lxpipe demultiplex --bulk HHYLKADXX \
                       --path_demux_ops demux_truseq_dms_a.json \
                       --path_seq_prepared prepared \
                       --demux_nozip \
                       --processor 1 \
                       --demux_verbose_level 20 \
                       --no_readonly
    ```
    This output is **very verbose**: for every read, the output of every step of the demultiplexing pipeline is reported. To get consistent output, `--processor` must be set to `1`. Output is written to the local directory `prepared`.

    Finally, once the pipeline is validated, run it without the test options (data is written to the `path_seq_prepared` directory, see [here](https://labxdb.vejnar.org/doc/install/python/#configuration)):
    ```bash
    lxpipe demultiplex --bulk HHYLKADXX \
                       --path_demux_ops demux_truseq_dms_a.json \
                       --processor 10
    ```
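
To make the role of `max_mismatch` in the `demultiplex` operations above more concrete, the sketch below assigns a read to a sample by comparing its 5' end with each barcode. It is a conceptual illustration and not ReadKnead's algorithm: the treatment of ambiguous matches, the removal of the barcode, and the sample names and barcode sequences are all assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read, barcodes, max_mismatch=1):
    """Return (sample, trimmed read) for the unique barcode matching the
    5' end of the read, or (None, read) if no or several barcodes match."""
    hits = []
    for sample, barcode in barcodes.items():
        prefix = read[: len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatch:
            hits.append((sample, barcode))
    if len(hits) != 1:
        return None, read
    sample, barcode = hits[0]
    return sample, read[len(barcode):]


# Hypothetical barcodes; LabxPipe takes them from the `Second barcode` field
barcodes = {"sample_A": "ACGT", "sample_B": "TTAG"}

exact = demultiplex("ACGTGGGG", barcodes)
one_mismatch = demultiplex("ACCTGGGG", barcodes)
no_match = demultiplex("GGGGGGGG", barcodes)
print(exact, one_mismatch, no_match)
```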

## License

*LabxPipe* is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).

Copyright © 2013-2023 Charles E. Vejnar

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "labxpipe",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "",
    "author": "Charles E. Vejnar",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/0c/66/b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9/labxpipe-0.7.1.tar.gz",
    "platform": null,
    "description": "# <img src=\"https://raw.githubusercontent.com/vejnar/LabxPipe/main/img/logo.svg\" alt=\"LabxPipe\" width=\"45%\" />\n\n[![MPLv2](https://img.shields.io/aur/license/python-labxpipe?color=1793d1&style=for-the-badge)](https://mozilla.org/MPL/2.0/)\n\n* Integrated with [LabxDB](https://labxdb.vejnar.org): all required annotations (labels, strand, paired etc) are retrieved from LabxDB. This is optional.\n* Based on existing robust technologies. No new language.\n    * LabxPipe pipelines are defined in JSON text files.\n    * LabxPipe is written in Python. Using norms, such as input and output filenames, insures compatibility between tasks.\n* Simple and complex pipelines.\n    * By default, pipelines are linear (one step after the other).\n    * Branching is easily achieved be defining a previous step (using `step_input` parameter) allowing users to create any dependency between tasks.\n* Parallelized using robust asynchronous threads from the Python standard library.\n\n## Examples\n\nSee JSON files in `config/pipelines` of this repository.\n\n| Pipeline JSON file            |                                                                                                                           |\n| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------- |\n| `mrna_seq.json`               | mRNA-seq.                                                                                                                 |\n| `mrna_seq_profiling_bam.json` | mRNA-seq. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). BAM and SAM outputs.            |\n| `mrna_seq_no_db.json`         | mRNA-seq. No [LabxDB](https://labxdb.vejnar.org).                                                                         |\n| `mrna_seq_with_plotting.json` | mRNA-seq. Plotting non-mapped reads. Demonstrate `step_input`.                                
                            |\n| `mrna_seq_cufflinks.json`     | mRNA-seq. Replaces GeneAbacus by Cufflinks.                                                                               |\n| `chip_seq.json`               | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads.      |\n| `chip_seq_user_function.json` | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). Peak-calling using [MACS3](https://github.com/macs3-project/MACS) employing a *user-defined step/function*. |\n\nFollowing demonstrates how to apply `mrna_seq.json` pipeline. It requires:\n* [LabxDB](https://labxdb.vejnar.org)\n* FASTQ files for sample named `AGR000850` and `AGR000912`\n    ```\n    /plus/data/seq/by_run/AGR000850\n    \u251c\u2500\u2500 23_009_R1.fastq.zst\n    \u2514\u2500\u2500 23_009_R2.fastq.zst\n    /plus/data/seq/by_run/AGR000912\n    \u251c\u2500\u2500 65_009_R1.fastq.zst\n    \u2514\u2500\u2500 65_009_R2.fastq.zst\n    ```\n\nNote: `mrna_seq_no_db.json` demonstrates how to use LabxPipe *without* LabxDB: it only requires FASTQ files (in `path_seq_run` directory, see above).\n\nRequirements:\n* [LabxDB](https://labxdb.vejnar.org). Alternatively, `mrna_seq_no_db.json` doesn't require LabxDB.\n* [ReadKnead](https://sr.ht/~vejnar/ReadKnead) to trim reads.\n* [STAR](https://github.com/alexdobin/STAR) and genome index in directory defined `path_star_index`.\n* [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) to count reads and generate genomic profile for tracks.\n\n1. Start pipeline:\n    ```bash\n    lxpipe run --pipeline mrna_seq.json \\\n               --worker 2 \\\n               --processor 16\n    ```\n    Output is written in `path_output` directory.\n2. 
Create report:\n    ```bash\n    lxpipe report --pipeline mrna_seq.json\n    ```\n    Report file `mrna_seq.xlsx` should be created in same directory as `mrna_seq.json`.\n3. Extract output file(s) to use them directly, for instance to load them in IGV. For example:\n    * To extract BAM files and rename them using the sample label:\n        ```bash\n        lxpipe extract --pipeline mrna_seq.json \\\n                       --files aligning,accepted_hits.sam.zst \\\n                       --label\n        ```\n    * To extract BigWig profile files and rename them using the sample label and reference in addition to the original filename used as filename suffix:\n        ```bash\n        lxpipe extract --pipeline mrna_seq.json \\\n                       --files profiling,genome_plus.bw \\\n                       --label \\\n                       --reference \\\n                       --suffix\n        ```\n    Use `-d`/`--dry_run` to test the extract command before applying it.\n4. Merge gene/mRNA counts generated by [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) in `counting` directory:\n    ```bash\n    lxpipe merge-count --pipeline mrna_seq.json \\\n                       --step counting\n    ```\n5. Create a trackhub. 
Requirements:\n    * [ChromosomeMappings](https://github.com/dpryan79/ChromosomeMappings) file (to map chromosome names from Ensembl/NCBI to UCSC)\n    * Tabulated file (with chromosome name and length)\n\n    Execute in a separate directory:\n    ```bash\n    lxpipe trackhub --runs AGR000850,AGR000912 \\\n                    --species_ucsc danRer11 \\\n                    --path_genome /plus/scratch/sai/annots/danrer_genome_all_ensembl_grcz11_ucsc_chroms_chrom_length.tab \\\n                    --path_mapping /plus/scratch/sai/annots/ChromosomeMappings/GRCz11_ensembl2UCSC.txt \\\n                    --input_sam \\\n                    --bam_names accepted_hits.sam.zst \\\n                    --make_config \\\n                    --make_trackhub \\\n                    --make_bigwig \\\n                    --processor 16\n    ```\n    Directory is ready to be shared by a web server for display in the [UCSC genome browser](https://genome.ucsc.edu/cgi-bin/hgHubConnect).\n\n## Configuration\n\nParameters can be defined [globally](https://labxdb.vejnar.org/doc/install/python/#configuration). See in `config` directory of this repository for examples.\n\n## Writing pipelines\n\nParameters are defined first globally (see above), then per pipeline, then per replicate/run, and then per step/function. 
The latest definition takes precedence: `path_seq_run` defined in `/etc/hts/labxpipe.json` is used by default, but if `path_seq_run` is defined in the pipeline file, it will be used instead.\n\nMain parameters\n\n| Parameter           | Type          |\n| ------------------- | ------------- |\n| name                | string        |\n| path_output         | string        |\n| path_seq_run        | string        |\n| path_local_steps    | string        |\n| path_annots         | string        |\n| path_bowtie2_index  | string        |\n| path_bwa-mem2_index | string        |\n| path_minimap2_index | string        |\n| path_star_index     | string        |\n| fastq_exts          | []strings     |\n| adaptors            | {}            |\n| logging_level       | string        |\n| run_refs            | []strings     |\n| replicate_refs      | []strings     |\n| ref_info_source     | []strings     |\n| ref_infos           | {}            |\n| analysis            | [{}, {}, ...] |\n\nParameters for all steps\n\n| Parameter     | Type    |\n| ------------- | ------- |\n| step_name     | string  |\n| step_function | string  |\n| step_desc     | string  |\n| force         | boolean |\n\nStep-specific parameters\n\n| Step               | Synonym          | Parameter             | Type          |\n| ------------------ | ---------------- | --------------------- | ------------- |\n| readknead          | preparing        | options               | []strings     |\n|                    |                  | ops_r1                | [{}, {}, ...] |\n|                    |                  | ops_r2                | [{}, {}, ...] 
|\n|                    |                  | plot_fastq_in         | boolean       |\n|                    |                  | plot_fastq            | boolean       |\n|                    |                  | fastq_out             | boolean       |\n|                    |                  | zip_fastq_out         | string        |\n| bowtie2            | genomic_aligning | options               | []strings     |\n|                    |                  | index                 | string        |\n|                    |                  | output                | string        |\n|                    |                  | output_unfiltered     | string        |\n|                    |                  | compress_sam          | boolean       |\n|                    |                  | compress_sam_cmd      | string        |\n|                    |                  | create_bam\u25c6           | boolean       |\n|                    |                  | index_bam\u25c6            | boolean       |\n| bwa-mem2           |                  | options               | []strings     |\n|                    |                  | index                 | string        |\n|                    |                  | output                | string        |\n|                    |                  | compress_output       | boolean       |\n|                    |                  | compress_output_cmd   | string        |\n|                    |                  | create_bam\u25c6           | boolean       |\n|                    |                  | index_bam\u25c6            | boolean       |\n| minimap2           |                  | options               | []strings     |\n|                    |                  | index                 | string        |\n|                    |                  | output                | string        |\n|                    |                  | compress_output       | boolean       |\n|                    |                  | compress_output_cmd   | 
string        |\n|                    |                  | create_bam\u25c6           | boolean       |\n|                    |                  | index_bam\u25c6            | boolean       |\n| star               | aligning         | options               | []strings     |\n|                    |                  | index                 | string        |\n|                    |                  | output_type           | []strings     |\n|                    |                  | compress_sam          | boolean       |\n|                    |                  | compress_sam_cmd      | string        |\n|                    |                  | compress_unmapped     | boolean       |\n|                    |                  | compress_unmapped_cmd | string        |\n| cufflinks          |                  | options               | []strings     |\n|                    |                  | inputs                | [{}, {}, ...] |\n|                    |                  | features              | [{}, {}, ...] |\n| geneabacus         | counting         | options               | []strings     |\n|                    |                  | inputs                | [{}, {}, ...] |\n|                    |                  | path_annots           | string        |\n|                    |                  | features              | [{}, {}, ...] |\n| samtools_sort      |                  | options               | []strings     |\n|                    |                  | sort_by_name_bam      | boolean       |\n| samtools_uniquify  |                  | options               | []strings     |\n|                    |                  | sort_by_name_bam      | boolean       |\n|                    |                  | index_bam             | boolean       |\n| cleaning           |                  | steps                 | [{}, {}, ...] |\n\n\u25c6 indicates exclusive options. For example, either `create_bam` or `index_bam` can be used, but not both.\n\nSample-specific parameters. 
Automatically populated if using LabxDB or sourced from `ref_infos`. These parameters can be changed manually in any step (for example setting `paired` to `false` will ignore second reads in that step).\n\n| Parameter      | Type    |\n| -------------- | ------- |\n| label_short    | string  |\n| paired         | boolean |\n| directional    | boolean |\n| r1_strand      | string  |\n| quality_scores | string  |\n\n## User-defined step\n\nIn addition to the provided steps/functions, i.e. `bowtie2`, `star` or `geneabacus`, users can defined their own step, usable in the LabxPipe pipelines. LabxPipe will import user-defined steps:\n* Written in Python\n* One step per file with the `.py` extension located in the directory defined by `path_local_steps`\n* Each step defined in individual file requires:\n    1. A `functions` variable listing the step name(s)\n    2. A function named `run` with the 3 parameters `path_in`, `path_out` and `params`\n\n    For example:\n    ```python\n    functions = ['macs3']\n    def run(path_in, path_out, params):\n        ...\n    ```\n\nExample of a user-defined function providing peak-calling using [MACS3](https://github.com/macs3-project/MACS) is available in `config/user_steps/macs3.py` in this repository.\n\nExample of a pipeline using the [MACS3](https://github.com/macs3-project/MACS) step is available in `config/pipelines/chip_seq_user_function.json` in this repository.\n\n## Demultiplexing sequencing reads: `lxpipe demultiplex`\n\n* Demultiplex reads based on barcode sequences from the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org)\n* Demultiplexing using [ReadKnead](https://sr.ht/~vejnar/ReadKnead). The most important for demultiplexing is the ReadKnead pipeline. Pipelines are identified using the `Adapter 3'` field in LabxDB.\n\n* Example for simple demultiplexing. 
The first nucleotides at the 5' end of read 1 are used as barcodes (the `Adapter 3'` field is set to `sRNA 1.5` in LabxDB for these samples) with the following pipeline:\n    ```json\n    {\n        \"sRNA 1.5\": {\n            \"R1\": [\n                {\n                    \"name\": \"demultiplex\",\n                    \"end\": 5,\n                    \"max_mismatch\": 1\n                }\n            ],\n            \"R2\": null\n        }\n    }\n    ```\n    The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org).\n\n* Example for iCLIP demultiplexing. In [Vejnar et al.](https://pubmed.ncbi.nlm.nih.gov/31227602), iCLIP is demultiplexed (the `Adapter 3'` field is set to `TruSeq-DMS+A Index` in LabxDB for these samples) using the following pipeline:\n    ```json\n    {\n        \"TruSeq-DMS+A Index\": {\n            \"R1\": [\n                {\n                    \"name\": \"clip\",\n                    \"end\": 5,\n                    \"length\": 4,\n                    \"add_clipped\": true\n                },\n                {\n                    \"name\": \"trim\",\n                    \"end\": 3,\n                    \"algo\": \"bktrim\",\n                    \"min_sequence\": 5,\n                    \"keep\": [\"trim_exact\", \"trim_align\"]\n                },\n                {\n                    \"name\": \"length\",\n                    \"min_length\": 6\n                },\n                {\n                    \"name\": \"demultiplex\",\n                    \"end\": 3,\n                    \"max_mismatch\": 1,\n                    \"length_ligand\": 2\n                },\n                {\n                    \"name\": \"length\",\n                    \"min_length\": 15\n                }\n            ],\n            \"R2\": null\n        }\n    }\n    ```\n    Pipeline is stored in `demux_truseq_dms_a.json`. 
The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org). (NB: published demultiplexed data were generated using `\"algo\": \"align\"` with a minimum score of 80 instead of `\"algo\": \"bktrim\"`)\n\n    The pipeline was then tested by running:\n    ```bash\n    lxpipe demultiplex --bulk HHYLKADXX \\\n                       --path_demux_ops demux_truseq_dms_a.json \\\n                       --path_seq_prepared prepared \\\n                       --demux_nozip \\\n                       --processor 1 \\\n                       --demux_verbose_level 20 \\\n                       --no_readonly\n    ```\n    This output is **very verbose**: for every read, output from every step of the demultiplexing pipeline is reported. To get consistent output, `--processor` must be set to `1`. Output is written to the local directory `prepared`.\n\n    Finally, once the pipeline is validated (data is written to the `path_seq_prepared` directory, see [here](https://labxdb.vejnar.org/doc/install/python/#configuration)):\n    ```bash\n    lxpipe demultiplex --bulk HHYLKADXX \\\n                       --path_demux_ops demux_truseq_dms_a.json \\\n                       --processor 10\n    ```\n\n## License\n\n*LabxPipe* is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).\n\nCopyright \u00a9 2013-2023 Charles E. Vejnar\n",
    "bugtrack_url": null,
    "license": "Mozilla Public License 2.0 (MPL 2.0)",
    "summary": "Genomics pipelines",
    "version": "0.7.1",
    "project_urls": {
        "homepage": "https://git.sr.ht/~vejnar/LabxPipe"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c66b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9",
                "md5": "2581f5aa748effcdb60870e21fc8d621",
                "sha256": "74348cd7a0e8c6d349d4423734cc4162798b7734e925fa3e9b1977e2705bab5c"
            },
            "downloads": -1,
            "filename": "labxpipe-0.7.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2581f5aa748effcdb60870e21fc8d621",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 58199,
            "upload_time": "2023-08-01T22:43:31",
            "upload_time_iso_8601": "2023-08-01T22:43:31.699224Z",
            "url": "https://files.pythonhosted.org/packages/0c/66/b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9/labxpipe-0.7.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-01 22:43:31",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "labxpipe"
}

Charles E. Vejnar