probe-design

| Field | Value |
|---|---|
| Name | probe-design |
| Version | 0.2.45 |
| home_page | https://github.com/qverron/probe_design |
| Summary | Probe design for FISH by Quentin Verron |
| upload_time | 2024-06-04 14:51:54 |
| maintainer | Quentin Verron |
| docs_url | None |
| author | Quentin Verron |
| requires_python | <4.0,>=3.10 |
| license | CC NC BY 4.0 |
| keywords | fish probe design bioinformatics |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Instructions for probe design

> [!CAUTION]
> You may want to add `pip` as an alias for `pip3`.

In your terminal:

```shell
echo "alias pip='pip3'" >> ~/.bashrc
source ~/.bashrc
```

Or simply use `pip3` instead of `pip`.

## Installation

- Install **probe_design**  (also installs **ifpd2q**)

```shell
pip install probe_design -U
```

This adds `prb` (short for probe design) as a shell command.

- Install **oligo-melting**

In your terminal:

```shell
pip install git+https://github.com/ggirelli/oligo-melting.git
```



> [!NOTE]
> nHUSH, HUSH and escafish are private repositories

- Install the **dev branch** of [nHUSH](https://github.com/elgw/nHUSH/tree/dev)

- Install [HUSH](https://github.com/elgw/hush)

- Install [escafish](https://github.com/elgw/escafish)

- Install [OligoArrayAux](http://www.unafold.org/Dinamelt/software/oligoarrayaux.php)


## Preparation

### DNA:

- Get the genomic coordinates of the regions of interest

- Get the reference genome

### RNA:

- Get the transcripts of interest

- Get the reference transcriptome

### Notes:

- For DNA probes, the reference genome is used both to extract
  the sequences of interest and to test probe candidates for
  homology. If different genomes need to be used, follow the RNA steps and
  provide the regions of interest directly.

- For combined DNA-RNA FISH, the probe sets should be designed with a
  homology check against both the genome and the transcriptome.

- All the commands below assume that you start from your project directory.

- To create a project directory and move into it:

```shell
mkdir <project_name>
cd <project_name>
```

# Probe design pipeline:

## Alternative 1: Normally repetitive regions.

1. Preparation

Inside the project directory:

```shell
prb makedirs
```  

This creates the `data` directory and its subdirectories `data/rois` and `data/ref`.

- Upon starting the pipeline, the `data/` folder should only contain
  `data/rois/` and `data/ref/` (and possibly `data/blacklist/`, see the blacklist step below). If it contains more folders, consider backing them up or simply removing them.
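
The folder rule above can be checked before starting a run. The following is a minimal sketch, not part of `probe_design`: the helper names are hypothetical, and only the folder names `rois`, `ref` and `blacklist` come from the text above.

```python
from pathlib import Path

# Folder names taken from the text above; "blacklist" is optional.
EXPECTED = {"rois", "ref"}
OPTIONAL = {"blacklist"}


def unexpected_folders(folder_names):
    """Return folder names that should not be in data/ at the start of a run."""
    return sorted(set(folder_names) - EXPECTED - OPTIONAL)


def missing_folders(folder_names):
    """Return required folder names that are absent from data/."""
    return sorted(EXPECTED - set(folder_names))


def list_data_folders(data_dir="data"):
    """List the subfolders of data_dir (empty list if it does not exist)."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return [p.name for p in root.iterdir() if p.is_dir()]


if __name__ == "__main__":
    found = list_data_folders()
    print("missing:", missing_folders(found))
    print("unexpected:", unexpected_folders(found))
```

Run from the project directory, it lists anything that may need to be backed up or removed first.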

2. Input file for regions of interest (ROIs)

> [!CAUTION]
> 1. Your regions-of-interest file MUST be named `all_regions.tsv`.
> 2. `all_regions.tsv` MUST follow the [EXAMPLE](probe_design/data/rois/all_regions.tsv) format.
> 3. `all_regions.tsv` MUST be placed in the `data/rois` folder.


- List your regions of interest and their coordinates in the input file:
  `data/rois/all_regions.tsv`


3. Download the reference genome

For CHM13 T2T 

```shell
prb get_T2T
```
Options:
> `-p`: prefix for the chromosome files. Default: `CHM13.T2T`.
> Files are named `prefix.chromosome.ID.fa`, where `ID` is the chromosome ID (1-22, X, Y, M).
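
To check that the download is complete, the expected file names can be listed. This is a sketch under one assumption: that the word `chromosome` appears literally in each name, as the pattern above suggests. The function name is hypothetical.

```python
def expected_chromosome_files(prefix="CHM13.T2T"):
    """Build the expected FASTA file names for chromosomes 1-22, X, Y and M."""
    ids = [str(i) for i in range(1, 23)] + ["X", "Y", "M"]
    # ASSUMPTION: the literal word "chromosome" is part of each file name.
    return [f"{prefix}.chromosome.{cid}.fa" for cid in ids]


if __name__ == "__main__":
    for name in expected_chromosome_files():
        print(name)
```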

For GRCh38

```shell
prb get_GRC -split
```
> usage: `prb get_GRC [-h] [-s {homo_sapiens,mus_musculus}] [-b BUILD] [-r RELEASE] [-d DIR] [-f FILENAME] [-k] [-split]`
>
> Downloads an Ensembl genome.
>
> options:<br>
> `-h, --help`: show this help message and exit<br>
> `-s {homo_sapiens,mus_musculus}, --species {homo_sapiens,mus_musculus}`<br>
> `-b BUILD, --build BUILD`: the build number of the genome<br>
> `-r RELEASE, --release RELEASE`: release number of the build<br>
> `-d DIR, --dir DIR`: destination directory<br>
> `-f FILENAME, --filename FILENAME`: give a specific name to the downloaded file<br>
> `-k, --keep`: whether to keep gzip files<br>
> `-split`: whether to split into chromosomes

4. Retrieve your region sequences and extract all k-mers of the correct length:

```shell
# Format: prb get_oligos DNA|RNA [optional: applyGCfilter 0|1]
# Example:
prb get_oligos DNA 1
```

> [!NOTE]
> If `RNA` is specified, the module assumes that the transcript / region
> sequences are already present in the `data/regions` folder. Default: `DNA`.
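
For intuition, k-mer extraction and a GC filter can be sketched in a few lines. This is an illustration only, not the code behind `prb get_oligos`; the function names and the GC thresholds are hypothetical.

```python
def extract_kmers(sequence, k):
    """Return every overlapping k-mer of the sequence, left to right."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gc_fraction(oligo):
    """Fraction of G and C bases in the oligo."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def gc_filter(oligos, low=0.35, high=0.85):
    """Keep oligos whose GC fraction lies in [low, high].

    ASSUMPTION: the thresholds are placeholders, not the pipeline's values.
    """
    return [o for o in oligos if low <= gc_fraction(o) <= high]


if __name__ == "__main__":
    kmers = extract_kmers("ACGTACGGCC", 4)
    print(kmers)
    print(gc_filter(kmers, low=0.5, high=1.0))
```

A region of length `n` yields `n - k + 1` k-mers, which is why the later steps work on large oligo sets.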


5. Test all k-mers for their homology to other regions in the genome,
   using nHUSH. Instead of testing the full-length k-mers (of length `L`) at
   once, the run can be sped up by testing shorter sublength oligos (of length
   `l`). Options: `-m`, number of mismatches to test for (use at least 1 when
   running sublength); `-t`, number of threads; `-i`, comb size.

> [!CAUTION]
> Make sure the length (`-L`) used here matches the length in your `all_regions.tsv` file.

- Full length:

``` shell
prb run_nHUSH -d RNA -L 35 -m 5 -t 40 -i 14
```

- Sublength:

``` shell
prb run_nHUSH -d DNA -L 40 -l 21 -m 3 -t 40 -i 14
```

> prb run_nHUSH -d {DNA|RNA} -L {length} -l (optional){sublength} -m {number of mismatches} -t {threads} -i {comb size}
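
The relation between length, sublength and mismatches can be illustrated with a naive scan. This brute-force sketch is not how nHUSH works internally; it only shows what "a sublength oligo matching elsewhere with at most `m` mismatches" means. All function names are hypothetical.

```python
def sub_oligos(oligo, sublength):
    """Return all windows of the given sublength contained in the oligo."""
    return [oligo[i:i + sublength] for i in range(len(oligo) - sublength + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def has_off_target(query, reference, max_mismatches):
    """Naive scan: True if any reference window is within the mismatch budget."""
    n = len(query)
    return any(
        hamming(query, reference[i:i + n]) <= max_mismatches
        for i in range(len(reference) - n + 1)
    )


if __name__ == "__main__":
    # A 40-mer contains 40 - 21 + 1 = 20 sub-oligos of length 21.
    print(len(sub_oligos("A" * 40, 21)))
    print(has_off_target("ACGT", "TTACGATT", max_mismatches=1))
```

With `-L 40 -l 21`, each 40-mer is represented by 20 overlapping 21-mers.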


 
  
- If nHUSH is interrupted before completion, run the following before continuing:

``` shell
prb unfinished_HUSH
```
  
6. Recapitulate nHUSH results as a score 

``` shell
prb reform_hush_combined DNA|RNA|-RNA length sublength until
```

> e.g., prb reform_hush_combined DNA 40 21 3

(`until` is the number that was passed to `-m` when running nHUSH.)

7. Calculate the melting temperature of k-mers and the free energy of
   secondary structure formation:

``` shell
prb melt_secs_parallel (optional DNA(ref) | RNA(rev. compl))   
```

> e.g., prb melt_secs_parallel DNA
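
The `RNA(rev. compl)` option suggests that, for RNA targets, the oligo sequence is the reverse complement of the transcript sequence. A minimal reverse-complement helper, given as an illustration rather than the pipeline's own code, could look like this:

```python
# Translation table for DNA bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```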

7. Generate a blacklist of abundantly repeated oligos in the reference genome.

> [!NOTE]
> If you are not using any exclusion regions, this only needs to be run once
> per reference genome. Just keep the blacklist folder between runs.

``` shell
prb generate_blacklist -L 40 -c 100
```

> `-L`: oligo length <br>
> `-c`: minimum abundance for an oligo to be included in the blacklist
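
Conceptually, the blacklist collects every oligo of length `L` that occurs at least `c` times in the reference. The toy sketch below shows this with an in-memory counter; the function name is hypothetical, and a whole genome would need a far more memory-efficient approach than this.

```python
from collections import Counter


def abundant_kmers(sequences, length, min_count):
    """Return the set of k-mers occurring at least min_count times overall."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return {kmer for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # "AC" occurs three times, "CA" twice.
    print(abundant_kmers(["ACACAC"], length=2, min_count=3))
```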

8. Create k-mer database, convert to TSV for querying and attribute
   score to each oligo (based on nHUSH score, GC content, melting
   temperature, homopolymer stretches, secondary structures).

``` shell
prb build-db_BL -f q_bl -m 32 -i 6 -L 40 -c 100 -d 8 -T 72
```

> `-m`: maximum length of a consecutive match. Default: 24 <br>
> `-i`: maximum length of a consecutive homopolymer. Default: 6 <br>
> All oligos with a longer consecutive match or homopolymer are strictly excluded. <br>
> `-L`: oligo length <br>
> `-c`: minimum number of occurrences for an oligo to be counted in the blacklist
> (should match the value used with `prb generate_blacklist`) <br>
> `-d`: Hamming-distance threshold to the blacklist used for exclusion (oligos too close to a blacklisted oligo are excluded) <br>
> `-T`: target melting temperature. Default: 72 °C
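
Two of the hard filters above (homopolymer length and distance to the blacklist) can be sketched as follows. The function names are hypothetical, and the boundary of the distance test is an assumption: an oligo is rejected here if it lies within `blacklist_distance` mismatches, inclusive, of a blacklisted oligo.

```python
from itertools import groupby


def longest_homopolymer(oligo):
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(oligo)), default=0)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def passes_hard_filters(oligo, blacklist, max_homopolymer=6, blacklist_distance=8):
    """Apply the homopolymer and blacklist-distance filters to one oligo."""
    if longest_homopolymer(oligo) > max_homopolymer:
        return False
    # ASSUMPTION: inclusive threshold; blacklisted oligos of another length are ignored.
    return all(
        len(b) != len(oligo) or hamming(oligo, b) > blacklist_distance
        for b in blacklist
    )


if __name__ == "__main__":
    print(longest_homopolymer("ACGTTTTTA"))
    print(passes_hard_filters("ACGTACGT", ["TTTTACGA"], blacklist_distance=1))
```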



9. Query the database to get candidate probes:

``` shell
prb cycling_query -s DNA -L 40 -m 8 -c 100 -t 40 -g 500 -greedy
```

Options:

- `-greedy` (optional): favors speed over quality.
- `-start 20 -end 100 -step 5` (optional): sweeps different oligo numbers. Otherwise, the oligo counts provided in `data/rois/all_regions.tsv` are used.
- `-stepdown 10` (optional): number of oligos by which the probe size is decreased at every iteration that does not find enough oligos. Default: 1.

The cycling query generates probe candidates, checks the resulting oligos using HUSH, removes unacceptable oligos, and generates probes again.
If enough oligos cannot be found, it designs probes with fewer oligos, decreasing by `stepdown` at each step.
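
The generate, check, remove, retry loop described above can be sketched as follows. It is a simplified illustration, not the `cycling_query` implementation: taking the first `target` oligos stands in for the real probe query, and `is_acceptable` stands in for the HUSH check. All names are hypothetical.

```python
def cycling_query_sketch(candidates, n_oligos, is_acceptable, stepdown=1):
    """Pick n_oligos acceptable oligos, shrinking the probe when too few remain."""
    pool = list(candidates)
    target = n_oligos
    while target > 0:
        if len(pool) < target:
            # Not enough oligos left: try a smaller probe.
            target -= stepdown
            continue
        probe = pool[:target]  # stand-in for the real probe query
        rejected = [o for o in probe if not is_acceptable(o)]
        if not rejected:
            return probe
        # Drop the rejected oligos and generate the probe again.
        pool = [o for o in pool if o not in rejected]
    return []


if __name__ == "__main__":
    oligos = [f"o{i}" for i in range(10)]
    bad = {"o1", "o3"}
    print(cycling_query_sketch(oligos, 9, lambda o: o not in bad, stepdown=2))
```

A larger `stepdown` reaches a feasible probe size in fewer iterations, but may settle on a smaller probe than strictly necessary.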

10. Summarize the final probes:

```shell
prb summarize_probes_final
```

Some visual elements can be obtained using the following notebooks (TODO!):

``` shell
prb plot_probe_candidates
prb plot_oligos
```

## Alternative 2: Repetitive or repeated regions.

In this alternative, the region (along with any user-indicated repeats)
is masked out from the reference genome used by nHUSH. This way, repeated
oligos that are specific for the ROI can be included in the final probe.

### Warning: This approach occupies a lot more hard drive space!

1. Generate all required subfolders:

``` shell
prb makedirs
```

2. Input file for regions of interest (ROIs)

> [!CAUTION]
> 1. Your regions-of-interest file MUST be named `all_regions.tsv`.
> 2. `all_regions.tsv` MUST follow the [EXAMPLE](probe_design/data/rois/all_regions.tsv) format.
> 3. `all_regions.tsv` MUST be placed in the `data/rois` folder.

3. Additional Preparation
- Besides `data/rois/` and `data/ref/`, the pipeline requires an additional
  `data/exclude/` folder containing BED files with the coordinates of sections
  to mask out when running HUSH for each ROI. 

4. Download the reference genome

For CHM13 T2T (advised for repetitive regions)

```shell
prb get_T2T
```
Options:
> `-p`: prefix for the chromosome files. Default: `CHM13.T2T`.
> Files are named `prefix.chromosome.ID.fa`, where `ID` is the chromosome ID (1-22, X, Y, M).


5. Exclude the regions of interest from the HUSH scan
(skip this step if you provide the exclusion regions manually).

``` shell
prb generate_exclude
```
- The same sheet template can be used to manually add further regions to exclude.


6. Retrieve your region sequences and extract all k-mers of correct length:

``` shell
# (from Pipeline/)
# Format: prb get_oligos DNA|RNA [optional: applyGCfilter 0|1]
# Example:
prb get_oligos DNA
```

   If `RNA` is specified, the module assumes that the transcript / region
   sequences are already present in the `data/regions` folder. Default: `DNA`.
   
7. Apply the region exclusion mask on the reference genome.

``` shell
prb exclude_region
```
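
As an illustration of what applying an exclusion mask means, the sketch below replaces BED intervals in a sequence with `N`. BED coordinates are 0-based and half-open. Masking with `N` is an assumption about how the mask is applied, and the function names are hypothetical.

```python
def parse_bed_line(line):
    """Parse the first three BED columns: chromosome, start, end."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def mask_regions(sequence, intervals, mask_char="N"):
    """Replace each 0-based, half-open [start, end) interval with mask_char."""
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    chrom, start, end = parse_bed_line("chr1\t2\t5\troi1")
    print(chrom, mask_regions("ACGTACGT", [(start, end)]))
```

Once masked, copies of an oligo inside the ROI no longer count as off-target hits, which is the purpose of this alternative.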

8. Generate a blacklist of abundantly repeated oligos in the reference genome.

```shell
prb generate_blacklist -L 40 -c 100
```

This step needs to be re-run every time exclusion masks are used.

`-L`: oligo length; `-c`: minimum abundance for an oligo to be included in the blacklist


9. Test all k-mers for their homology to other regions in the genome,
using nHUSH. Instead of testing the full-length k-mers (of length `L`) at
once, the run can be sped up by testing shorter sublength oligos (of length
`l`). Options: `-m`, number of mismatches to test for (minimum 1 for sublength;
more gives better information but takes longer);
`-t`, number of threads; `-i`, comb size.

Sublength:

```shell
prb run_nHUSH_excl -d DNA -L 40 -l 21 -m 3 -t 40 -i 14
```

> prb run_nHUSH_excl -d {DNA|RNA} -L {length} -l (optional){sublength} -m {number of mismatches} -t {threads} -i {comb size}
  
Note the `_excl` specific to the exclusion mode.  
  
If nHUSH is interrupted before completion, run the following before continuing:

```shell
prb unfinished_HUSH
```
  
10. Recapitulate nHUSH results as a score

```shell
# Format:
prb reform_hush_combined DNA|RNA|-RNA length sublength until
# Example:
prb reform_hush_combined DNA 40 21 3
```


(`until` is the number that was passed to `-m` when running nHUSH.)

11. Calculate the melting temperature of k-mers and the free energy of
   secondary structure formation:

```shell
prb melt_secs_parallel (optional DNA(ref) / RNA(rev. compl))
```
> e.g., prb melt_secs_parallel DNA

12. Create k-mer database, convert to TSV for querying and attribute
   score to each oligo (based on nHUSH score, GC content, melting
   temperature, homopolymer stretches, secondary structures).
   
   Recommended:
   
``` shell
prb build-db_BL -f q_bl -m 32 -i 6 -L 40 -c 100 -d 8 -T 72
```

> `-f`: score function <br>
> `-d`: max Hamming distance to the blacklist that is excluded <br>
> `-L`: oligo length <br>
> `-c`: min abundance to be included in the oligo blacklist <br>
> `-i`: max number of identical consecutive bases <br>
> `-T`: target temperature <br>
> `-m`: max length of a consecutive off-target match <br>
  
13. Query the database to get candidate probes:

``` shell
prb cycling_query -s DNA -L 40 -m 8 -c 100 -t 40 -g 500 -stepdown 50 -greedy -excl
```

Options:

- `-greedy` (optional): favors speed over quality.
- `-start 20 -end 100 -step 5` (optional): sweeps different oligo numbers. Otherwise, the oligo counts provided in `data/rois/all_regions.tsv` are used.
- `-stepdown 10` (optional): number of oligos by which the probe size is decreased at every iteration that does not find enough oligos. Default: 1.

The cycling query generates probe candidates, checks the resulting oligos using HUSH, removes unacceptable oligos, and generates probes again.
If enough oligos cannot be found, it designs probes with fewer oligos, decreasing by `stepdown` at each step.

14. Summarize the final probes:

``` shell
prb summarize_probes_final
```

## Generate probes for ordering

- Select the forward and reverse primers and the color flaps.
- Add the forward and reverse primer sequences to the probe oligos.
- The forward primer to order consists of the color flap + the forward sequence.
- The reverse primer to order consists of the T7 promoter sequence + the
  reverse complement of the reverse sequence in the oligo.
- The complete oligos can be uploaded as an Excel file containing the
  oligo names (arbitrary but unique) and the sequences.
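
The assembly rules above can be written down as a short sketch. All sequences used here are made-up placeholders, the function names are hypothetical, and the layout of the complete oligo (forward sequence, then target, then reverse sequence) is an assumption.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def complete_oligo(forward, target, reverse):
    """ASSUMPTION: the oligo is laid out as forward + target + reverse."""
    return forward + target + reverse


def forward_primer_to_order(color_flap, forward):
    """Color flap followed by the forward sequence."""
    return color_flap + forward


def reverse_primer_to_order(t7_promoter, reverse):
    """T7 promoter followed by the reverse complement of the reverse sequence."""
    return t7_promoter + reverse_complement(reverse)


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    fwd, rev, flap, t7 = "ACGTAC", "AACGGT", "GGGAAA", "TTTCCC"
    print(complete_oligo(fwd, "TTGACCA", rev))
    print(forward_primer_to_order(flap, fwd))
    print(reverse_primer_to_order(t7, rev))
```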

## TO DO:

- Adapt the code for more flexibility in input/output folders.
- Add a visual report of the probes at the end of the pipeline.
- One-button process!
- Find a way to automate the selection of primer sequences.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/qverron/probe_design",
    "name": "probe-design",
    "maintainer": "Quentin Verron",
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": "quentin.verron@ki.se",
    "keywords": "FISH, probe, design, bioinformatics",
    "author": "Quentin Verron",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/d7/69/fc683844cfbcdc9b13dc5f6c54978ebef42153769ffca6848fca60ecc363/probe_design-0.2.45.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "CC NC BY 4.0",
    "summary": "Probe design for FISH by Quentin Verron",
    "version": "0.2.45",
    "project_urls": {
        "Homepage": "https://github.com/qverron/probe_design",
        "Repository": "https://github.com/qverron/probe_design"
    },
    "split_keywords": [
        "fish",
        " probe",
        " design",
        " bioinformatics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e9a14a8840b3bb4fe3d43e0adf10bf7b72f7cf3ccbfbcc1e1b0859d3dc29ed6a",
                "md5": "a8a9e7c3c4df5c38357cb8fc36bd63c5",
                "sha256": "366ec2455d39b5ca5d6c3eb41165ff1ff92eb27fb148e879d4a0c8a4f9c7a325"
            },
            "downloads": -1,
            "filename": "probe_design-0.2.45-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a8a9e7c3c4df5c38357cb8fc36bd63c5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 154394,
            "upload_time": "2024-06-04T14:51:52",
            "upload_time_iso_8601": "2024-06-04T14:51:52.932068Z",
            "url": "https://files.pythonhosted.org/packages/e9/a1/4a8840b3bb4fe3d43e0adf10bf7b72f7cf3ccbfbcc1e1b0859d3dc29ed6a/probe_design-0.2.45-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d769fc683844cfbcdc9b13dc5f6c54978ebef42153769ffca6848fca60ecc363",
                "md5": "033a22712901612410f1045ae06cbe85",
                "sha256": "60c486bcdef53b4a7520c661cb3df6a0c64bb36df951c45701efad9266e25b15"
            },
            "downloads": -1,
            "filename": "probe_design-0.2.45.tar.gz",
            "has_sig": false,
            "md5_digest": "033a22712901612410f1045ae06cbe85",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 160511,
            "upload_time": "2024-06-04T14:51:54",
            "upload_time_iso_8601": "2024-06-04T14:51:54.712482Z",
            "url": "https://files.pythonhosted.org/packages/d7/69/fc683844cfbcdc9b13dc5f6c54978ebef42153769ffca6848fca60ecc363/probe_design-0.2.45.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-04 14:51:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "qverron",
    "github_project": "probe_design",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "probe-design"
}
