Curare

Name	Curare
Version	0.6.0
home_page	https://github.com/pblumenkamp/Curare
Summary	Curare: A Customizable and Reproducible Analysis Pipeline for RNA-Seq Experiments
upload_time	2024-02-27 19:05:00
maintainer
docs_url	None
author	Patrick Blumenkamp
requires_python	>=3.10
license	GPLv3
keywords	bioinformatics rna-seq differential-gene-expression transcriptomics

[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/pblumenkamp/curare/blob/master/LICENSE.txt)
![Don't judge me](https://img.shields.io/badge/Language-Python-blue.svg)
![Don't judge me](https://img.shields.io/badge/Language-Snakemake-blue.svg)
![Release](https://img.shields.io/github/release/pblumenkamp/curare.svg)
[![PyPI](https://img.shields.io/pypi/v/Curare.svg)](https://pypi.org/project/Curare)
[![Conda](https://img.shields.io/conda/v/bioconda/curare.svg)](https://bioconda.github.io/recipes/curare/README.html)

# Curare - A Customizable and Reproducible Analysis Pipeline for RNA-Seq Experiments

## Contents
- [Description](#description)
- [Features](#features)
- [Usage](#usage)
  - [Installation](#installation)
  - [Creating a pipeline](#creating-a-pipeline)
- [Availability](#availability)
- [Citation](#citation)
- [FAQ](#faq)

## Description

Curare is a freely available analysis pipeline for reproducible, high-throughput RNA-Seq experiments. It lets you define standardized pipelines customized for your specific workflow without having to install every tool yourself.

Curare is implemented in Python and uses the power of [Snakemake](https://snakemake.readthedocs.io/) and [Conda](https://docs.conda.io/projects/conda/en/latest/index.html) to build and execute the defined workflows. Its modular structure and the simplicity of Snakemake enable developers to create new and advanced workflow steps.

## Features
Curare was developed to simplify the automated execution of RNA-Seq workflows on large datasets. Each workflow can be divided into four steps: Preprocessing, Premapping, Mapping, and Analysis.

### Available modules
+ Preprocessing
  + Trim-Galore
  + Fastp
+ Premapping
  + FastQC
  + MultiQC
+ Mapping
  + Bowtie
  + Bowtie2
  + BWA-Backtrack
  + BWA-MEM
  + BWA-MEM2
  + BWA-SW
  + Minimap2
  + Segemehl
  + Star
+ Analysis
  + Count Table (FeatureCounts)
  + DGE Analysis (DESeq2)
  + Normalized Coverage

### Results Report
At the end of a Curare run, you will also get an HTML report containing the most important results and an overview of all used settings. The start page will include Curare statistics, the runtime of this analysis, sample descriptions, and all dependencies of the tools used in this pipeline. From the navigation bar at the top, you can then navigate to the specific reports of each module with detailed charts and many statistics. (Images created with Curare using the data from: Banerjee R et al., "Tailoring a Global Iron Regulon to a Uropathogen.", mBio, 2020 Mar 24;11(2))
![start_page](https://user-images.githubusercontent.com/9703726/144844060-5acc6f29-fcaf-446d-9dd8-27c92bf33269.png)

![Curare_Statistics](https://user-images.githubusercontent.com/9703726/145035029-5dd70610-10d2-45e1-8c1f-0334522c2625.png)


## Usage
### Installation
We recommend using [mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html) to install Curare with all its dependencies. Please note that the channels bioconda **and** conda-forge are both required to install Curare correctly.
```bash
mamba create -n curare -c conda-forge -c bioconda curare
conda activate curare
curare --help
```

Alternatively, you can download a Curare release from GitHub and install all dependencies listed in `conda_environment.yaml` manually.

```bash
mamba env create -n curare -f /path/to/Curare/conda_environment.yaml
conda activate curare
bin/curare --help
```

If you want to run Curare completely without Conda, you also need to install the dependencies of every Curare module you use. You can find the dependencies of each module at `curare/snakefiles/<module_category>/<module>/lib/conda_env.yaml`. To skip conda/mamba entirely, use the `--no-conda` option when starting Curare.
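
When installing module dependencies by hand, it can help to first collect every per-module environment file. The following is a minimal sketch, assuming a source checkout that follows the path layout quoted above; the function name `find_module_envs` is hypothetical and not part of Curare.

```python
from pathlib import Path


def find_module_envs(curare_root):
    """Map (module_category, module) to the path of its conda_env.yaml.

    Assumes the layout described in this README:
    curare/snakefiles/<module_category>/<module>/lib/conda_env.yaml
    """
    snakefiles = Path(curare_root) / "curare" / "snakefiles"
    envs = {}
    for env_file in sorted(snakefiles.glob("*/*/lib/conda_env.yaml")):
        # parents[0] is "lib", parents[1] the module, parents[2] the category.
        module = env_file.parents[1].name
        category = env_file.parents[2].name
        envs[(category, module)] = env_file
    return envs
```

Calling `find_module_envs("/path/to/Curare")` returns a dictionary you can loop over to inspect each file and install its listed packages with the package manager of your choice.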

If you want to test Curare, you can download our [example dataset](https://curare-dataset.s3.computational.bio.uni-giessen.de/curare-user-testcase.zip) with a pre-built configuration.

### Creating a pipeline
The easiest way to create a new pipeline is to use the Curare wizard. It guides you through all steps, asks which modules you wish to use, and creates the two necessary files (`samples.tsv` and `pipeline.yml`). You can then edit these files with a standard text editor to enter your data and customize the analysis.
```bash
# Current working directory is inside of tool directory
curare_wizard --samples target_directory/samples.tsv --pipeline target_directory/pipeline.yml
```

### Samples File
The samples file (`samples.tsv`) is a tab-separated file collecting all necessary information about the biological samples used. This includes a unique identifier (`name`), a file path to the sequencing data (`forward_reads`/`reverse_reads` for paired-end data, `reads` for single-end data), and, depending on the modules used, further information such as the condition. Every line starting with a # is a comment line and will be ignored by Curare. These lines only provide helpful information for filling in the file correctly.

**Name column**: A unique identifier used throughout the whole pipeline for this sample. To prevent any side effects on the file system or in the scripts used, only alphanumeric characters and '_' are allowed in sample names.

**Reads columns**: Here, you define the file path to the corresponding sequencing data. For paired-end datasets, two columns must be set (`forward_reads` and `reverse_reads`); for single-end data, only one column (`reads`). The file path can be either a relative path (relative to this file) or an absolute path (starting with '/').

**Additional columns**: Every selected module can define additional columns. The Curare wizard automatically creates a `samples.tsv` containing all required columns. So just fill out all open fields in the generated file, and everything will work. You can find a description of all additional columns in the header of the created file.

**Example**         
```tsv
# name: Unique sample name. Only use alphanumeric characters and '_'. [Value Type: String]
# forward_reads: File path to fastq file containing forward reads. Either as an absolute path or relative to this file. [Value Type: Path]
# reverse_reads: File path to fastq file containing reverse reads. Either as an absolute path or relative to this file. [Value Type: Path]
# condition: Condition name of the sequencing run. May contain [A-Z, a-z, 0-9, _;!@^(),.[]-, Whitespace]. [Value Type: String]

name    forward_reads   reverse_reads   condition
wt_1    data/wt_1_R1.fastq  data/wt_1_R2.fastq  WT
wt_2    data/wt_2_R1.fastq  data/wt_2_R2.fastq  WT
wt_3    data/wt_3_R1.fastq  data/wt_3_R2.fastq  WT
heat_1  data/heat_1_R1.fastq  data/heat_1_R2.fastq  Heat
heat_2  data/heat_2_R1.fastq  data/heat_2_R2.fastq  Heat
heat_3  data/heat_3_R1.fastq  data/heat_3_R2.fastq  Heat
starvation_1    data/starvation_1_R1.fastq  data/starvation_1_R2.fastq  Starvation
starvation_2    data/starvation_2_R1.fastq  data/starvation_2_R2.fastq  Starvation
starvation_3    data/starvation_3_R1.fastq  data/starvation_3_R2.fastq  Starvation
```
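
The rules above (comment lines, allowed characters in names, relative paths resolved against the samples file) can be checked before a run. The following is a minimal sketch and not Curare's own validation: the function `read_samples` is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import csv
import re
from pathlib import Path

# Assumption: "alphanumeric" is read as ASCII letters and digits, plus '_'.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def read_samples(samples_path, paired_end=True):
    """Parse a samples.tsv and return (rows, problems)."""
    samples_path = Path(samples_path)
    read_columns = ["forward_reads", "reverse_reads"] if paired_end else ["reads"]
    with samples_path.open(newline="") as handle:
        # Drop blank lines and comment lines (starting with '#').
        lines = [ln for ln in handle if ln.strip() and not ln.startswith("#")]
    rows = list(csv.DictReader(lines, delimiter="\t"))

    problems = []
    seen = set()
    for row in rows:
        name = row.get("name") or ""
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid sample name: {name!r}")
        if name in seen:
            problems.append(f"duplicate sample name: {name!r}")
        seen.add(name)
        for column in read_columns:
            value = row.get(column)
            if not value:
                problems.append(f"{name}: missing column {column!r}")
                continue
            # Relative paths are resolved against the samples file's directory.
            path = Path(value)
            if not path.is_absolute():
                path = samples_path.parent / path
            row[column] = str(path)
            if not path.is_file():
                problems.append(f"{name}: file not found: {path}")
    return rows, problems
```

An empty `problems` list means the file passed these basic checks; module-specific columns such as `condition` are returned in `rows` but not validated here.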

### Pipeline File
The pipeline file (`pipeline.yml`) defines the used modules and their parameters in the newly created workflow. As a typical YAML file, everything is structured in categories. There are categories for each workflow step (`preprocessing`, `premapping`, `mapping`, and `analysis`) and the main category for the whole pipeline (`pipeline`). Each of the four workflow categories has a parameter `modules` defining the used modules in this step. Since many modules need additional information, like a file path to the reference genome or a quality threshold, modules have their own block in their category for specifying these values. 

Settings are either mandatory or optional. Mandatory settings follow the structure `gff_feature_type: <Insert Config Here>`; you must replace `<Insert Config Here>` with a real value, e.g. "CDS". For file paths, the same rule applies as in `samples.tsv`: the path can be either absolute or relative to this file. Optional settings are commented out with a single #. To use a value other than the default, remove the # and write your value after the colon.

**Example**  
```yaml
## Curare Pipeline File
## This is an automatically created pipeline file for Curare.
## All required parameters must be set (replace <Insert Config Here> with real value).
## All optional parameters are commented out with a single '#'. For including these parameters, just remove the '#'.

pipeline:
  paired_end: true

preprocessing:
  modules: ["trimgalore"]

  trimgalore:
    ## Choose phred33 (Sanger/Illumina 1.9+ encoding) or phred64 (Illumina 1.5 encoding). [Value Type: Enum]
    #phred_score_type: "--phred33"

    ## Trim low-quality ends from reads in addition to adapter removal. (Default: 20). [Value Type: Number]
    #quality_threshold: "20"

    ## Discard reads that became shorter than length INT because of either quality or adapter trimming. (Default: 20). [Value Type: Number]
    #min_length: "20"

    ## Additional options to use in shell command. [Value Type: String]
    #additional_parameter: ""

    ## Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will try to auto-detect whether the Illumina universal, Nextera transposase, or Illumina small RNA adapter sequence was used. [Value Type: String]
    #adapter_forward: ""

    ## adapter sequence to be trimmed off read 2 of paired-end files. [Value Type: String]
    #adapter_reverse: ""


premapping:
  modules: ["multiqc"]

mapping:
  modules: ["bowtie2"]

  bowtie2:
    ## Path to reference genome fasta file. [Value Type: File_input]
    genome_fasta: <Insert Config Here>

    ## Choose local or end-to-end alignment. [Value Type: Enum]
    ## Enum choices: "local", "end-to-end"
    alignment_type: <Insert Config Here>

    ## Additional options to use in shell command. [Value Type: String]
    #additional_bowtie2_options: ""


analysis:
  modules: ["dge_analysis"]

  dge_analysis:
    ## Used feature type, e.g. gene or exon. [Value Type: String]
    gff_feature_type: <Insert Config Here>

    ## Descriptor for gene name, e.g. ID or gene_id. [Value Type: String]
    gff_feature_name: <Insert Config Here>

    ## File path to gff file. [Value Type: File_input]
    gff_path: <Insert Config Here>

    ## Additional options to use in shell command. [Value Type: String]
    #additional_featcounts_options: ""

    ## Strand specificity of reads. Specifies if reads must lie on the same strand as the feature, the opposite strand, or can be on both. Options: "unstranded, stranded, reversely_stranded"
    #strand_specificity: "unstranded"

    ## GFF attributes to show in the beginning of the xlsx summary (Comma-separated list, e.g. "experiment, product, Dbxref"). [Value Type: String]
    #attribute_columns: ""
```
<details>
  <summary>Filled pipeline file</summary>

```yaml
## Curare Pipeline File
## This is an automatically created pipeline file for Curare.
## All required parameters must be set (replace <Insert Config Here> with real value).
## All optional parameters are commented out with a single '#'. For including these parameters, just remove the '#'.

pipeline:
  paired_end: true

preprocessing:
  modules: ["trimgalore"]

  trimgalore:
    ## Choose phred33 (Sanger/Illumina 1.9+ encoding) or phred64 (Illumina 1.5 encoding). [Value Type: Enum]
    phred_score_type: "--phred64"

    ## Trim low-quality ends from reads in addition to adapter removal. (Default: 20). [Value Type: Number]
    quality_threshold: "35"

    ## Discard reads that became shorter than length INT because of either quality or adapter trimming. (Default: 20). [Value Type: Number]
    #min_length: "20"

    ## Additional options to use in shell command. [Value Type: String]
    #additional_parameter: ""

    ## Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will try to auto-detect whether the Illumina universal, Nextera transposase, or Illumina small RNA adapter sequence was used. [Value Type: String]
    #adapter_forward: ""

    ## adapter sequence to be trimmed off read 2 of paired-end files. [Value Type: String]
    #adapter_reverse: ""


premapping:
  modules: ["multiqc"]

mapping:
  modules: ["bowtie2"]

  bowtie2:
    ## Path to reference genome fasta file. [Value Type: File_input]
    genome_fasta: "reference/my_genome.fasta"

    ## Choose local or end-to-end alignment. [Value Type: Enum]
    ## Enum choices: "local", "end-to-end"
    alignment_type: "end-to-end"

    ## Additional options to use in shell command. [Value Type: String]
    #additional_bowtie2_options: ""


analysis:
  modules: ["dge_analysis"]

  dge_analysis:
    ## Used feature type, e.g. gene or exon. [Value Type: String]
    gff_feature_type: "CDS"

    ## Descriptor for gene name, e.g. ID or gene_id. [Value Type: String]
    gff_feature_name: "ID"

    ## File path to gff file. [Value Type: File_input]
    gff_path: "reference/my_genome.gff"

    ## Additional options to use in shell command. [Value Type: String]
    #additional_featcounts_options: ""

    ## Strand specificity of reads. Specifies if reads must lie on the same strand as the feature, the opposite strand, or can be on both. Options: "unstranded, stranded, reversely_stranded"
    strand_specificity: "reversely_stranded"

    ## GFF attributes to show in the beginning of the xlsx summary (Comma-separated list, e.g. "experiment, product, Dbxref"). [Value Type: String]
    #attribute_columns: ""
```
</details>
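
Before starting a run, it can be useful to confirm that no mandatory setting still carries the `<Insert Config Here>` placeholder. The following is a minimal sketch that scans the file as plain text rather than parsing YAML; the function `unset_parameters` is hypothetical and not part of Curare.

```python
PLACEHOLDER = "<Insert Config Here>"


def unset_parameters(pipeline_text):
    """Return (line_number, key) for every setting that still holds the placeholder."""
    unset = []
    for number, line in enumerate(pipeline_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments; this also skips commented-out optional settings.
        if not stripped or stripped.startswith("#"):
            continue
        key, separator, value = stripped.partition(":")
        if separator and value.strip() == PLACEHOLDER:
            unset.append((number, key.strip()))
    return unset
```

For example, `unset_parameters(open("pipeline.yml").read())` returns an empty list for the filled pipeline file above, and one entry per open setting for the unfilled template.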

### Starting Curare
Curare can be started with this command: 
```bash
# Current working directory inside of root tool directory
cd curare
conda activate curare
curare --samples <target_directory>/samples.tsv --pipeline <target_directory>/pipeline.yml --output <results_directory>
```

All results, including the conda environments and a final report, will be written to `results_directory`.
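
If you start Curare from a script, for example to process several projects in a row, the command above can be assembled programmatically. The following is a minimal sketch, assuming `curare` is on the `PATH` of the active environment; it uses only the options shown in this README, and the helper names `curare_command` and `run_curare` are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def curare_command(samples, pipeline, output, no_conda=False):
    """Build the argument list for a Curare run."""
    command = [
        "curare",
        "--samples", str(samples),
        "--pipeline", str(pipeline),
        "--output", str(output),
    ]
    if no_conda:
        # Only sensible when all module dependencies are installed manually.
        command.append("--no-conda")
    return command


def run_curare(samples, pipeline, output, no_conda=False):
    """Check that both input files exist, then run Curare and wait for it."""
    for required in (samples, pipeline):
        if not Path(required).is_file():
            raise FileNotFoundError(required)
    command = curare_command(samples, pipeline, output, no_conda)
    print("Running:", shlex.join(command))
    # check=True raises CalledProcessError if Curare exits with a non-zero status.
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.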

### Results
Curare structures all results by category and module. This way, each module can create its own structure and is independent of all other modules. For example, the mapping modules generate multiple BAM files with various flag filters, such as unmapped or concordant reads, and the differential gene expression module builds large Excel files with the most important values and an R object for continuing the analysis on your own. (Images: Bowtie2 mapping chart and DESeq2 summary table)

<div style="display: inline-flex; flex-direction: row; justify-content: space-evenly; align-items: center">
  <img src="https://user-images.githubusercontent.com/9703726/145060940-d8dda4b1-7ad5-4f3c-947d-f1381aa9614c.png" width=450>
  <img src="https://user-images.githubusercontent.com/9703726/145061206-5b01b8b7-c81f-4dc4-b893-cf8f7b2a850d.png" width=450>
</div>


## Availability
Available at Bioconda: https://bioconda.github.io/recipes/curare/README.html

Available at PyPI: https://pypi.org/project/Curare/

Example dataset: [curare-user-testcase.zip](https://curare-dataset.s3.computational.bio.uni-giessen.de/curare-user-testcase.zip)

## Citation
In preparation.

## FAQ
1. How can I use Curare? [Usage](#usage)
2. Contact and support: curare@computational.bio.uni-giessen.de
3. Issues: Bugs and issues can be filed [here](https://github.com/pblumenkamp/curare/issues)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/pblumenkamp/Curare",
    "name": "Curare",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "bioinformatics,rna-seq,differential-gene-expression,transcriptomics",
    "author": "Patrick Blumenkamp",
    "author_email": "patrick.blumenkamp@computational.bio.uni-giessen.de",
    "download_url": "https://files.pythonhosted.org/packages/2f/17/e48e29d5654e7e7f59cce3ca9cca2338306a9bf456eb7655aea4cd5ec093/Curare-0.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "Curare: A Customizable and Reproducible Analysis Pipeline for RNA-Seq Experiments",
    "version": "0.6.0",
    "project_urls": {
        "Bug Reports": "https://github.com/pblumenkamp/Curare/issues",
        "Homepage": "https://github.com/pblumenkamp/Curare",
        "Source": "https://github.com/pblumenkamp/Curare"
    },
    "split_keywords": [
        "bioinformatics",
        "rna-seq",
        "differential-gene-expression",
        "transcriptomics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "53ff094a7dc15e0c1983c77573a870cf75f0d83b9e814438c056e08c656b4b04",
                "md5": "7d9a76e0500ace36b4311d57166b0e62",
                "sha256": "df501ef7f2cf6402c46efe048a397e3ff1066c5ff09198730b93f9f8769c7183"
            },
            "downloads": -1,
            "filename": "Curare-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7d9a76e0500ace36b4311d57166b0e62",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 936355,
            "upload_time": "2024-02-27T19:04:58",
            "upload_time_iso_8601": "2024-02-27T19:04:58.191374Z",
            "url": "https://files.pythonhosted.org/packages/53/ff/094a7dc15e0c1983c77573a870cf75f0d83b9e814438c056e08c656b4b04/Curare-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f17e48e29d5654e7e7f59cce3ca9cca2338306a9bf456eb7655aea4cd5ec093",
                "md5": "aa242db130d8615ab458e9f0360e3db9",
                "sha256": "602178bccdd6a4243604f84be5169519a82ff099a8bfa5ff8a1a6550bcf89b34"
            },
            "downloads": -1,
            "filename": "Curare-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "aa242db130d8615ab458e9f0360e3db9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 43050,
            "upload_time": "2024-02-27T19:05:00",
            "upload_time_iso_8601": "2024-02-27T19:05:00.155140Z",
            "url": "https://files.pythonhosted.org/packages/2f/17/e48e29d5654e7e7f59cce3ca9cca2338306a9bf456eb7655aea4cd5ec093/Curare-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-27 19:05:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pblumenkamp",
    "github_project": "Curare",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "curare"
}
