# <img src="https://raw.githubusercontent.com/vejnar/LabxPipe/main/img/logo.svg" alt="LabxPipe" width="45%" />
[License: MPL 2.0](https://mozilla.org/MPL/2.0/)
* Integrated with [LabxDB](https://labxdb.vejnar.org): all required annotations (labels, strand, paired, etc.) are retrieved from LabxDB. This integration is optional.
* Based on existing robust technologies. No new language.
    * LabxPipe pipelines are defined in JSON text files.
    * LabxPipe is written in Python. Using conventions, such as input and output filenames, ensures compatibility between tasks.
* Simple and complex pipelines.
    * By default, pipelines are linear (one step after the other).
    * Branching is easily achieved by defining a previous step (using the `step_input` parameter), allowing users to create any dependency between tasks.
* Parallelized using robust asynchronous threads from the Python standard library.
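
As an illustration of how linear and branching pipelines relate, here is a minimal Python sketch (not LabxPipe's actual implementation). It assumes that `step_input` holds the `step_name` of an earlier step and that a step without `step_input` reads from the step listed just before it. The step dictionaries and the `resolve_inputs` helper are hypothetical.

```python
def resolve_inputs(steps):
    """Map each step name to the name of the step it reads from (None for the first)."""
    names = [step["step_name"] for step in steps]
    inputs = {}
    previous = None
    for step in steps:
        # Default to the previous step unless step_input names another one.
        source = step.get("step_input", previous)
        if source is not None and source not in names:
            raise ValueError(f"unknown step_input: {source}")
        inputs[step["step_name"]] = source
        previous = step["step_name"]
    return inputs


# Hypothetical pipeline: "plotting" branches off "preparing" instead of "aligning".
steps = [
    {"step_name": "preparing"},
    {"step_name": "aligning"},
    {"step_name": "plotting", "step_input": "preparing"},
    {"step_name": "counting", "step_input": "aligning"},
]
dependencies = resolve_inputs(steps)
print(dependencies)
```

In this sketch a purely linear pipeline needs no `step_input` at all; adding it to a single step is enough to create a branch.
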
## Examples
See JSON files in `config/pipelines` of this repository.
| Pipeline JSON file | Description |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `mrna_seq.json` | mRNA-seq. |
| `mrna_seq_profiling_bam.json` | mRNA-seq. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). BAM and SAM outputs. |
| `mrna_seq_no_db.json` | mRNA-seq. No [LabxDB](https://labxdb.vejnar.org). |
| `mrna_seq_with_plotting.json` | mRNA-seq. Plotting non-mapped reads. Demonstrate `step_input`. |
| `mrna_seq_cufflinks.json` | mRNA-seq. Replaces GeneAbacus by Cufflinks. |
| `chip_seq.json` | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads. |
| `chip_seq_user_function.json` | ChIP-seq. [Bowtie2](https://github.com/BenLangmead/bowtie2) and [Samtools](http://www.htslib.org) to uniquify reads. Genomic coverage profiles using [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus). Peak-calling using [MACS3](https://github.com/macs3-project/MACS) employing a *user-defined step/function*. |
The following demonstrates how to apply the `mrna_seq.json` pipeline. It requires:
* [LabxDB](https://labxdb.vejnar.org)
* FASTQ files for the samples named `AGR000850` and `AGR000912`
```
/plus/data/seq/by_run/AGR000850
├── 23_009_R1.fastq.zst
└── 23_009_R2.fastq.zst
/plus/data/seq/by_run/AGR000912
├── 65_009_R1.fastq.zst
└── 65_009_R2.fastq.zst
```
Note: `mrna_seq_no_db.json` demonstrates how to use LabxPipe *without* LabxDB: it only requires FASTQ files (in `path_seq_run` directory, see above).
Requirements:
* [LabxDB](https://labxdb.vejnar.org). Alternatively, `mrna_seq_no_db.json` doesn't require LabxDB.
* [ReadKnead](https://sr.ht/~vejnar/ReadKnead) to trim reads.
* [STAR](https://github.com/alexdobin/STAR) and a genome index in the directory defined by `path_star_index`.
* [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) to count reads and generate genomic profile for tracks.
1. Start pipeline:
```bash
lxpipe run --pipeline mrna_seq.json \
--worker 2 \
--processor 16
```
Output is written to the `path_output` directory.
2. Create report:
```bash
lxpipe report --pipeline mrna_seq.json
```
The report file `mrna_seq.xlsx` should be created in the same directory as `mrna_seq.json`.
3. Extract output file(s) to use them directly, for instance to load them in IGV. For example:
* To extract alignment files (here Zstandard-compressed SAM files) and rename them using the sample label:
```bash
lxpipe extract --pipeline mrna_seq.json \
--files aligning,accepted_hits.sam.zst \
--label
```
* To extract BigWig profile files and rename them using the sample label and reference, keeping the original filename as a suffix:
```bash
lxpipe extract --pipeline mrna_seq.json \
--files profiling,genome_plus.bw \
--label \
--reference \
--suffix
```
Use `-d`/`--dry_run` to test the extract command before applying it.
4. Merge gene/mRNA counts generated by [GeneAbacus](https://sr.ht/~vejnar/GeneAbacus) in `counting` directory:
```bash
lxpipe merge-count --pipeline mrna_seq.json \
--step counting
```
5. Create a trackhub. Requirements:
* [ChromosomeMappings](https://github.com/dpryan79/ChromosomeMappings) file (to map chromosome names from Ensembl/NCBI to UCSC)
* Tabulated file (with chromosome name and length)
Execute in a separate directory:
```bash
lxpipe trackhub --runs AGR000850,AGR000912 \
--species_ucsc danRer11 \
--path_genome /plus/scratch/sai/annots/danrer_genome_all_ensembl_grcz11_ucsc_chroms_chrom_length.tab \
--path_mapping /plus/scratch/sai/annots/ChromosomeMappings/GRCz11_ensembl2UCSC.txt \
--input_sam \
--bam_names accepted_hits.sam.zst \
--make_config \
--make_trackhub \
--make_bigwig \
--processor 16
```
The directory is then ready to be served by a web server for display in the [UCSC genome browser](https://genome.ucsc.edu/cgi-bin/hgHubConnect).
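
The chromosome renaming that the two requirement files make possible can be sketched in Python. This is only an illustration, not how `lxpipe trackhub` is implemented. It assumes that the mapping file has two tab-separated columns (Ensembl/NCBI name, UCSC name) and that the length file has two columns (chromosome name, length). The file excerpts, the lengths, and the `read_two_columns` helper are made up.

```python
import csv
import io

# Made-up excerpts; real files would be read from disk.
mapping_text = "1\tchr1\n2\tchr2\nMT\tchrM\nKN149696.2\t\n"
lengths_text = "1\t1000\n2\t2000\nMT\t500\nKN149696.2\t100\n"


def read_two_columns(text):
    """Parse tab-separated text into (first, second) tuples, skipping incomplete rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2 and row[1]]


mapping = dict(read_two_columns(mapping_text))

# Rename chromosomes; those without a UCSC name are skipped.
ucsc_lengths = {
    mapping[name]: int(length)
    for name, length in read_two_columns(lengths_text)
    if name in mapping
}
print(ucsc_lengths)
```
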
## Configuration
Parameters can be defined [globally](https://labxdb.vejnar.org/doc/install/python/#configuration). See the `config` directory of this repository for examples.
## Writing pipelines
Parameters are defined first globally (see above), then per pipeline, then per replicate/run, and then per step/function. The latest (most specific) definition takes precedence: for example, `path_seq_run` defined in `/etc/hts/labxpipe.json` is used by default, but if `path_seq_run` is also defined in the pipeline file, the pipeline value is used instead.
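
The precedence rule can be pictured with a short Python sketch. It is not taken from LabxPipe's code. It uses `collections.ChainMap` from the standard library as one possible way to layer the four levels, and all values are hypothetical.

```python
from collections import ChainMap

# Hypothetical values for each level, from least to most specific.
global_params = {"path_seq_run": "/data/seq/by_run", "logging_level": "info"}
pipeline_params = {"path_seq_run": "/plus/data/seq/by_run"}
run_params = {}
step_params = {"logging_level": "debug"}

# ChainMap looks keys up left to right, so the most specific level goes first.
params = ChainMap(step_params, run_params, pipeline_params, global_params)

print(params["path_seq_run"])
print(params["logging_level"])
```

Here `path_seq_run` resolves to the pipeline-level value and `logging_level` to the step-level value, while any key defined only globally would still be found.
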
Main parameters
| Parameter | Type |
| ------------------- | ------------- |
| name | string |
| path_output | string |
| path_seq_run | string |
| path_local_steps | string |
| path_annots | string |
| path_bowtie2_index | string |
| path_bwa-mem2_index | string |
| path_minimap2_index | string |
| path_star_index | string |
| fastq_exts | []strings |
| adaptors | {} |
| logging_level | string |
| run_refs | []strings |
| replicate_refs | []strings |
| ref_info_source | []strings |
| ref_infos | {} |
| analysis | [{}, {}, ...] |
Parameters for all steps
| Parameter | Type |
| ------------- | ------- |
| step_name | string |
| step_function | string |
| step_desc | string |
| force | boolean |
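
To show how the parameters in the two tables above fit together, the following Python sketch builds a small pipeline definition and serializes it to JSON. Only the parameter names come from the tables. The values (pipeline name, output path, choice of steps) are hypothetical, and the pipelines in `config/pipelines` remain the authoritative examples.

```python
import json

pipeline = {
    "name": "mrna_seq_example",          # hypothetical pipeline name
    "path_output": "/tmp/labxpipe_out",  # hypothetical output directory
    "run_refs": ["AGR000850", "AGR000912"],
    "analysis": [
        {"step_name": "preparing", "step_function": "readknead"},
        {"step_name": "aligning", "step_function": "star"},
        {"step_name": "counting", "step_function": "geneabacus", "force": True},
    ],
}

# Serialize the definition as it could be stored in a pipeline JSON file.
pipeline_json = json.dumps(pipeline, indent=4)
print(pipeline_json)
```
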
Step-specific parameters
| Step | Synonym | Parameter | Type |
| ------------------ | ---------------- | --------------------- | ------------- |
| readknead | preparing | options | []strings |
| | | ops_r1 | [{}, {}, ...] |
| | | ops_r2 | [{}, {}, ...] |
| | | plot_fastq_in | boolean |
| | | plot_fastq | boolean |
| | | fastq_out | boolean |
| | | zip_fastq_out | string |
| bowtie2 | genomic_aligning | options | []strings |
| | | index | string |
| | | output | string |
| | | output_unfiltered | string |
| | | compress_sam | boolean |
| | | compress_sam_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| bwa-mem2 | | options | []strings |
| | | index | string |
| | | output | string |
| | | compress_output | boolean |
| | | compress_output_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| minimap2 | | options | []strings |
| | | index | string |
| | | output | string |
| | | compress_output | boolean |
| | | compress_output_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| star | aligning | options | []strings |
| | | index | string |
| | | output_type | []strings |
| | | compress_sam | boolean |
| | | compress_sam_cmd | string |
| | | compress_unmapped | boolean |
| | | compress_unmapped_cmd | string |
| cufflinks | | options | []strings |
| | | inputs | [{}, {}, ...] |
| | | features | [{}, {}, ...] |
| geneabacus | counting | options | []strings |
| | | inputs | [{}, {}, ...] |
| | | path_annots | string |
| | | features | [{}, {}, ...] |
| samtools_sort | | options | []strings |
| | | sort_by_name_bam | boolean |
| samtools_uniquify | | options | []strings |
| | | sort_by_name_bam | boolean |
| | | index_bam | boolean |
| cleaning | | steps | [{}, {}, ...] |
◆ indicates mutually exclusive options. For example, either `create_bam` or `index_bam` can be used, but not both.
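
A check for such exclusive options could look like the Python sketch below. The `check_exclusive` helper and the step dictionaries are hypothetical, and whether LabxPipe itself validates this combination is not stated here.

```python
def check_exclusive(step, options=("create_bam", "index_bam")):
    """Return the enabled options, raising ValueError if more than one is set."""
    enabled = [name for name in options if step.get(name)]
    if len(enabled) > 1:
        raise ValueError(f"options {enabled} cannot be combined")
    return enabled


ok_step = {"step_function": "bowtie2", "create_bam": True}
bad_step = {"step_function": "bowtie2", "create_bam": True, "index_bam": True}

print(check_exclusive(ok_step))
try:
    check_exclusive(bad_step)
except ValueError as error:
    print(error)
```
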
Sample-specific parameters. Automatically populated if using LabxDB or sourced from `ref_infos`. These parameters can be changed manually in any step (for example setting `paired` to `false` will ignore second reads in that step).
| Parameter | Type |
| -------------- | ------- |
| label_short | string |
| paired | boolean |
| directional | boolean |
| r1_strand | string |
| quality_scores | string |
## User-defined step
In addition to the provided steps/functions, e.g. `bowtie2`, `star` or `geneabacus`, users can define their own steps, usable in LabxPipe pipelines. LabxPipe will import user-defined steps:
* Written in Python
* One step per file with the `.py` extension located in the directory defined by `path_local_steps`
* Each step file requires:
    1. A `functions` variable listing the step name(s)
    2. A function named `run` with the 3 parameters `path_in`, `path_out` and `params`
For example:
```python
# Step name(s) provided by this file
functions = ['macs3']

def run(path_in, path_out, params):
    # Step implementation goes here
    ...
```
An example of a user-defined function providing peak-calling using [MACS3](https://github.com/macs3-project/MACS) is available in `config/user_steps/macs3.py` in this repository.
An example of a pipeline using the [MACS3](https://github.com/macs3-project/MACS) step is available in `config/pipelines/chip_seq_user_function.json` in this repository.
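
The discovery convention described above (one `.py` file per step, a `functions` list, and a `run` function) can be mimicked with the standard `importlib` machinery. The sketch below is an assumption about how such loading could work, not LabxPipe's own loader. `load_user_steps` is a hypothetical helper, and the step file it loads is a throwaway stand-in written to a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_user_steps(path_local_steps):
    """Return {step name: run function} for every .py file in the directory."""
    steps = {}
    for path in sorted(Path(path_local_steps).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Register the file's run() under each name listed in `functions`.
        for name in getattr(module, "functions", []):
            steps[name] = module.run
    return steps


# Demonstration with a throwaway step file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "macs3.py").write_text(
        "functions = ['macs3']\n"
        "def run(path_in, path_out, params):\n"
        "    return (path_in, path_out, params)\n"
    )
    user_steps = load_user_steps(tmp)
    result = user_steps["macs3"]("in", "out", {"paired": True})

print(sorted(user_steps), result)
```
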
## Demultiplexing sequencing reads: `lxpipe demultiplex`
* Demultiplex reads based on barcode sequences from the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org)
* Demultiplexing is performed by [ReadKnead](https://sr.ht/~vejnar/ReadKnead). The key element for demultiplexing is the ReadKnead pipeline. Pipelines are identified using the `Adapter 3'` field in LabxDB.
* Example for simple demultiplexing. The first nucleotides at the 5' end of read 1 are used as barcodes (the `Adapter 3'` field is set to `sRNA 1.5` in LabxDB for these samples) with the following pipeline:
```json
{
    "sRNA 1.5": {
        "R1": [
            {
                "name": "demultiplex",
                "end": 5,
                "max_mismatch": 1
            }
        ],
        "R2": null
    }
}
```
The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org).
* Example for iCLIP demultiplexing. In [Vejnar et al.](https://pubmed.ncbi.nlm.nih.gov/31227602), iCLIP is demultiplexed (the `Adapter 3'` field is set to `TruSeq-DMS+A Index` in LabxDB for these samples) using the following pipeline:
```json
{
    "TruSeq-DMS+A Index": {
        "R1": [
            {
                "name": "clip",
                "end": 5,
                "length": 4,
                "add_clipped": true
            },
            {
                "name": "trim",
                "end": 3,
                "algo": "bktrim",
                "min_sequence": 5,
                "keep": ["trim_exact", "trim_align"]
            },
            {
                "name": "length",
                "min_length": 6
            },
            {
                "name": "demultiplex",
                "end": 3,
                "max_mismatch": 1,
                "length_ligand": 2
            },
            {
                "name": "length",
                "min_length": 15
            }
        ],
        "R2": null
    }
}
```
The pipeline is stored in `demux_truseq_dms_a.json`. The barcode sequences are added by LabxPipe using the `Second barcode` field in [LabxDB](https://labxdb.vejnar.org). (NB: the published demultiplexed data were generated using `"algo": "align"` with a minimum score of 80 instead of `"algo": "bktrim"`.)
The pipeline was then tested by running:
```bash
lxpipe demultiplex --bulk HHYLKADXX \
--path_demux_ops demux_truseq_dms_a.json \
--path_seq_prepared prepared \
--demux_nozip \
--processor 1 \
--demux_verbose_level 20 \
--no_readonly
```
The output is **very verbose**: for every read, the output of every step of the demultiplexing pipeline is reported. To get consistent output, `--processor` must be set to `1`. Output is written to the local directory `prepared`.
Finally, once the pipeline is validated, run it without the test options (data is written to the `path_seq_prepared` directory, see [here](https://labxdb.vejnar.org/doc/install/python/#configuration)):
```bash
lxpipe demultiplex --bulk HHYLKADXX \
--path_demux_ops demux_truseq_dms_a.json \
--processor 10
```
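
The idea behind the simple demultiplexing example above (barcode at the 5' end of read 1, at most one mismatch) can be sketched in Python as follows. This is not ReadKnead's algorithm. The barcodes, sample names, and helper functions are hypothetical, and the sketch leaves a read unassigned when zero or several barcodes match, which is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read, barcodes, max_mismatch=1):
    """Return (sample, trimmed read) if exactly one barcode matches, else (None, read)."""
    hits = []
    for sample, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatch:
            hits.append((sample, barcode))
    if len(hits) != 1:
        return None, read
    sample, barcode = hits[0]
    # Remove the barcode from the 5' end of the assigned read.
    return sample, read[len(barcode):]


# Hypothetical barcodes, as they could come from the `Second barcode` field.
barcodes = {"sample_a": "ACGT", "sample_b": "TGCA"}

exact = demultiplex("ACGTTTGGCC", barcodes)
one_mismatch = demultiplex("ACCTTTGGCC", barcodes)
unassigned = demultiplex("GGGGTTGGCC", barcodes)
print(exact, one_mismatch, unassigned)
```
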
## License
*LabxPipe* is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).
Copyright © 2013-2023 Charles E. Vejnar
Raw data
{
"_id": null,
"home_page": "",
"name": "labxpipe",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "",
"author": "Charles E. Vejnar",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/0c/66/b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9/labxpipe-0.7.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Mozilla Public License 2.0 (MPL 2.0)",
"summary": "Genomics pipelines",
"version": "0.7.1",
"project_urls": {
"homepage": "https://git.sr.ht/~vejnar/LabxPipe"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0c66b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9",
"md5": "2581f5aa748effcdb60870e21fc8d621",
"sha256": "74348cd7a0e8c6d349d4423734cc4162798b7734e925fa3e9b1977e2705bab5c"
},
"downloads": -1,
"filename": "labxpipe-0.7.1.tar.gz",
"has_sig": false,
"md5_digest": "2581f5aa748effcdb60870e21fc8d621",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 58199,
"upload_time": "2023-08-01T22:43:31",
"upload_time_iso_8601": "2023-08-01T22:43:31.699224Z",
"url": "https://files.pythonhosted.org/packages/0c/66/b80e5f1650286cdc8b1f3fe15cf773742cd1df84756a9bb15d41a0045ca9/labxpipe-0.7.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-01 22:43:31",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "labxpipe"
}