monte-barcode


Name: monte-barcode
Version: 0.0.2
Summary: Generating sets of random DNA sequences optimized for use in high-throughput sequencing.
Upload time: 2023-06-02 14:26:47
Requires Python: >=3.8
License: MIT License
Keywords: barcodes, sequencing, science, assay
# 🔴🟢🔵⚫️ monte barcode

![GitHub Workflow Status (with branch)](https://img.shields.io/github/actions/workflow/status/scbirlab/monte-barcode/python-publish.yml)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/monte-barcode)
![PyPI](https://img.shields.io/pypi/v/monte-barcode)

Generating sets of random DNA sequences optimized for use in high-throughput sequencing.

## Installation

### The easy way

Install the pre-compiled version from PyPI:

```bash
pip install monte-barcode
```

### From source

Clone the repository, then `cd` into it. Then run:

```bash
pip install -e .
```

## Usage

**monte barcode** provides command-line utilities to generate completely random or 
peptide-encoding barcodes conforming to custom constraints, such as minimum edit
distance among the set, GC content, and color balance for Illumina chemistry.

Barcode sets and individual barcodes are deterministically given an adjective-noun mnemonic 
(generated by [nemony](https://github.com/scbirlab/nemony)) for easy reference.

Each utility writes progress commentary to `stderr`, while the barcodes go to
`stdout` by default so they can be piped into other tools.

### Command line

Generate random barcodes of a particular length.

```bash
$ monte barcode --length 6 -n 5
Generating barcodes with the following parameters:
       ...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 16 barcodes, rejected 11, accepted 5; rejection rate is 0.69

Rejection reasons:
        gc_content: 0.62
        homopolymer: 0.25
        restriction_sites: 0.06
mighty_orchid:l6-n5-d3:x0:fresh_prague  TGAGGT
mighty_orchid:l6-n5-d3:x1:flexible_forest       AGTTCG
mighty_orchid:l6-n5-d3:x2:fun_baby      GACATC
mighty_orchid:l6-n5-d3:x3:woolly_podium TGTCCT
mighty_orchid:l6-n5-d3:x4:strong_factor GAACCA
Wrote barcode set called mighty_orchid, with minimum Hamming distance 3 and maximum Hamming distance 6.

```
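
The rejection reasons above include `gc_content` and `homopolymer`. As a rough illustration of what such filters do, here is a minimal sketch that assumes the documented defaults (GC content between 0.4 and 0.6, homopolymer runs of at most 3). The function names are hypothetical and are not part of the monte-barcode API. It also shows where the 4096 possible combinations come from.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (seq must be non-empty)."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes(seq: str, gc_min: float = 0.4, gc_max: float = 0.6, max_run: int = 3) -> bool:
    """Hypothetical filter combining both checks with the documented defaults."""
    return gc_min <= gc_content(seq) <= gc_max and longest_homopolymer(seq) <= max_run


print(4 ** 6)            # 4096 possible barcodes of length 6
print(passes("TGAGGT"))  # True: GC content 0.5, longest run 2
print(passes("AAAAGC"))  # False: GC content 0.33 and a run of 4
```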

Or generate barcodes encoding a peptide.

```bash
$ monte barcode --amino-acid HELP -n 5
Generating barcodes with the following parameters:
        ...
Using amino acid sequence HELP with length 12 and 96 possible combinations.
> Tried 7 barcodes, rejected 2, accepted 5; rejection rate is 0.29

Rejection reasons:
        gc_content: 0.14
        homopolymer: 0.14
basic_hamlet:l12-n5-d2:x0:volatile_lesson       CATGAGCTGCCT
basic_hamlet:l12-n5-d2:x1:pricy_scuba   CACGAACTGCCT
basic_hamlet:l12-n5-d2:x2:good_race     CACGAATTGCCA
basic_hamlet:l12-n5-d2:x3:demanding_bruno       CATGAATTACCG
basic_hamlet:l12-n5-d2:x4:pawky_plaster CATGAGTTACCT
Wrote barcode set called basic_hamlet, with minimum Hamming distance 2 and maximum Hamming distance 4.
```
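
The 96 possible combinations reported for `HELP` follow from codon degeneracy in the standard genetic code. The sketch below reproduces that count; the codon table is deliberately partial (only the four residues needed here) and the helper names are hypothetical.

```python
from itertools import product
from math import prod

# Partial standard genetic code: only the residues in "HELP".
CODONS = {
    "H": ["CAT", "CAC"],
    "E": ["GAA", "GAG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
}


def n_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences encoding the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def encodings(peptide: str):
    """Yield every DNA sequence encoding the peptide."""
    for codons in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(codons)


all_help = list(encodings("HELP"))
print(n_encodings("HELP"))          # 96 = 2 * 2 * 6 * 4
print(len(all_help[0]))             # 12 bases, three per residue
print("CATGAGCTGCCT" in all_help)   # True: the first barcode in the output above
```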

Insist on a minimum edit distance.

```bash
$ monte barcode --length 6 -n 10 -d 3
Generating barcodes with the following parameters:
       ...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 39 barcodes, rejected 29, accepted 10; rejection rate is 0.74

Rejection reasons:
        gc_content: 0.67
        distance: 0.13
        homopolymer: 0.05
scenic_blast:l6-n10-d3:x0:acidic_turtle TGTGTG
scenic_blast:l6-n10-d3:x1:rowdy_grace   ACCATC
scenic_blast:l6-n10-d3:x2:rich_export   CGTTAG
scenic_blast:l6-n10-d3:x3:unique_break  GGAATC
scenic_blast:l6-n10-d3:x4:careful_fuji  GCAAGT
scenic_blast:l6-n10-d3:x5:whimsical_derby       CGGAAT
scenic_blast:l6-n10-d3:x6:pricy_aloha   TTCTCC
scenic_blast:l6-n10-d3:x7:zestful_ricardo       AGAGCT
scenic_blast:l6-n10-d3:x8:terse_cobra   AAGTCC
scenic_blast:l6-n10-d3:x9:zany_chamber  TTACGG
Wrote barcode set called scenic_blast, with minimum Hamming distance 3 and maximum Hamming distance 6.
```
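
To make the distance constraint concrete, here is a small sketch of the pairwise Hamming computation, applied to the first three barcodes of the set above. The helper names are hypothetical; the package's own `minmax_distance` is shown in the Python API section.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def minmax_hamming(barcodes):
    """Smallest and largest pairwise Hamming distance in a set."""
    distances = [hamming(a, b) for a, b in combinations(barcodes, 2)]
    return min(distances), max(distances)


subset = ["TGTGTG", "ACCATC", "CGTTAG"]
print(minmax_hamming(subset))  # (3, 6)
```

A set with minimum Hamming distance 3 can detect up to two substitution errors per barcode, or correct a single one, which is why the `-d` setting matters for demultiplexing.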

Or insist on ideal color balance for Illumina chemistry.

```bash
$ monte barcode --length 6 -n 10 -d 3 --color
Generating barcodes with the following parameters:
        ...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 151 barcodes, rejected 141, accepted 10; rejection rate is 0.93

Rejection reasons:
        gc_content: 0.65
        homopolymer: 0.21
        color_balance: 0.72
        distance: 0.17
        palindrome: 0.02
bright_cliff:l6-n10-d3:x0:ultimate_spray        AGCGAT
bright_cliff:l6-n10-d3:x1:bulky_drama   AGTTGC
bright_cliff:l6-n10-d3:x2:tropical_pinball      TTCACG
bright_cliff:l6-n10-d3:x3:unique_info   GTACGT
bright_cliff:l6-n10-d3:x4:chilly_sahara CCTCTT
bright_cliff:l6-n10-d3:x5:novel_wisdom  GACCTA
bright_cliff:l6-n10-d3:x6:oceanic_plume AGACTG
bright_cliff:l6-n10-d3:x7:wanted_jessica        TCTCGA
bright_cliff:l6-n10-d3:x8:incise_radical        TCTGTC
bright_cliff:l6-n10-d3:x9:rebel_option  TAGGAC
Wrote barcode set called bright_cliff, with minimum Hamming distance 3 and maximum Hamming distance 6.
```

You can bias sampling based on a set of other sequences. This sampling conditions the choice of each base
on the previous base.

```bash
$ monte barcode -n 5 --amino-acid HELP | monte sample --field 2 --distance 2 -n 5
Generating barcodes with the following parameters:
        ...
Requested barcodes with length 12, and 16777216 possible combinations.
> Tried 12 barcodes, rejected 7, accepted 5; rejection rate is 0.58

Rejection reasons:
        distance: 0.58
        gc_content: 0.08
ritzy_parker:l12-n5-d2:x0:good_race     CACGAATTGCCA
ritzy_parker:l12-n5-d2:x1:wiry_cairo    CATGAACTACCA
ritzy_parker:l12-n5-d2:x2:pricy_scuba   CACGAACTGCCT
ritzy_parker:l12-n5-d2:x3:brisk_neptune CATGAATTGCCG
ritzy_parker:l12-n5-d2:x4:dextrous_frame        CACGAATTACCG
Wrote barcode set called ritzy_parker, with minimum Hamming distance 2 and maximum Hamming distance 3.
```
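
Conditioning each base on the previous one amounts to a first-order Markov chain. The sketch below shows the idea under the assumption that transition probabilities are pooled across all positions; whether `monte sample` does this or uses position-specific probabilities is not stated here. All names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first bases and base-to-next-base transitions."""
    starts = Counter(seq[0] for seq in sequences)
    transitions = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1
    return starts, transitions


def weighted_choice(counter, rng):
    """Pick a base with probability proportional to its count."""
    bases = sorted(counter)
    return rng.choices(bases, weights=[counter[b] for b in bases])[0]


def sample_sequence(starts, transitions, length, rng):
    """Sample one sequence, choosing each base given the previous base."""
    seq = [weighted_choice(starts, rng)]
    while len(seq) < length:
        # Fall back to the start distribution if a base was never followed.
        options = transitions.get(seq[-1]) or starts
        seq.append(weighted_choice(options, rng))
    return "".join(seq)


training = ["CATGAGCTGCCT", "CACGAACTGCCT", "CACGAATTGCCA"]
starts, transitions = fit_transitions(training)
rng = random.Random(0)  # seeded only to make the sketch reproducible
sampled = sample_sequence(starts, transitions, 12, rng)
print(sampled)
```

Because every training sequence starts with `C`, every sampled sequence does too, and only base pairs seen in the training set can appear.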

You can also check and filter previously generated sets.

```bash
$ monte barcode --length 6 -n 10 -d 3 2> /dev/null | monte check --color --field 2
Checking barcodes with the following parameters:
        ...
> Tried 10 barcodes, rejected 6, accepted 4; rejection rate is 0.60
Rejection reasons:
        color_balance: 0.60
Could only generate 4 barcodes, but 10 were requested. You might need to try different settings.
thorough_adam:l6-n4-d4:x0:savvy_ruby    TCCTGA
thorough_adam:l6-n4-d4:x1:elfin_rufus   AGCTTC
thorough_adam:l6-n4-d4:x2:damaged_atlas AAGGCA
thorough_adam:l6-n4-d4:x3:faded_elite   GCACTA
Wrote barcode set called thorough_adam, with minimum Hamming distance 4 and maximum Hamming distance 5.

```
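
The `--field 2` option is needed because each output line has a label in the first column and the sequence in the second. If you want to consume such lines in your own scripts, a parser might look like the sketch below. The meaning of the label parts (set name, `l` length, `n` set size, `d` minimum distance, `x` index, barcode name) is inferred from the example output, not from a documented format, and the record type is hypothetical.

```python
from typing import NamedTuple


class BarcodeRecord(NamedTuple):
    set_name: str
    length: int
    set_size: int
    min_distance: int
    index: int
    name: str
    sequence: str


def parse_line(line: str) -> BarcodeRecord:
    """Split one output line into label parts and sequence (inferred layout)."""
    label, sequence = line.split()
    set_name, params, index, name = label.split(":")
    # params looks like "l6-n4-d4"; drop the one-letter prefix of each part.
    length, set_size, min_distance = (int(part[1:]) for part in params.split("-"))
    return BarcodeRecord(set_name, length, set_size, min_distance,
                         int(index[1:]), name, sequence)


record = parse_line("thorough_adam:l6-n4-d4:x0:savvy_ruby\tTCCTGA")
print(record.sequence)      # TCCTGA
print(record.min_distance)  # 4
```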

Or use a previous set as a starting point for generating more, possibly with different parameters.

```bash
$ monte barcode -n10 --distance 4 --length 10  --append <(monte barcode -n 5 -a HELP) --append_field 2
Generating barcodes with the following parameters:
...
> Tried 32 barcodes, rejected 22, accepted 10; rejection rate is 0.69

Rejection reasons:
        gc_content: 0.44
        homopolymer: 0.47
        distance: 0.03
        palindrome: 0.03
elegant_triton:l12-n15-d1:x0:vocal_stand        CACGAACTTCCT
elegant_triton:l12-n15-d1:x1:real_clinic        CATGAATTGCCT
elegant_triton:l12-n15-d1:x2:dextrous_frame     CACGAATTACCG
elegant_triton:l12-n15-d1:x3:dizzy_record       CACGAATTACCT
elegant_triton:l12-n15-d1:x4:prudent_jester     CACGAGCTACCA
elegant_triton:l10-n15-d1:x5:useful_cabinet     ACGCGACACT
elegant_triton:l10-n15-d1:x6:deafening_sphere   TAATACGCGC
elegant_triton:l10-n15-d1:x7:old_program        ATCCTAAGCC
elegant_triton:l10-n15-d1:x8:eager_doctor       TTGGCCACTG
elegant_triton:l10-n15-d1:x9:dopey_limbo        ATCCGTCGTA
elegant_triton:l10-n15-d1:x10:plain_lunar       ACGAGAATTC
elegant_triton:l10-n15-d1:x11:discreet_ford     CTAACGTAGC
elegant_triton:l10-n15-d1:x12:proud_jet CTTCAGTGTC
elegant_triton:l10-n15-d1:x13:wry_insect        CAGACTGGAG
elegant_triton:l10-n15-d1:x14:lofty_shave       TTCGTAACTC
Wrote barcode set called elegant_triton, with minimum Hamming distance 1 and maximum Hamming distance 10.
```
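
Extending a set is a form of rejection sampling: draw a random candidate, keep it only if it is far enough from everything already accepted, and repeat. The sketch below illustrates the loop with a distance check only. It is not the package's implementation; `extend_set` and `max_tries` are hypothetical, and the real tool also applies its other filters and aborts based on the rejection rate.

```python
import random


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def extend_set(existing, n_new, length, min_distance, rng, max_tries=10_000):
    """Add random barcodes that keep min_distance to all earlier ones."""
    accepted = []
    tries = 0
    while len(accepted) < n_new and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, other) >= min_distance
               for other in existing + accepted):
            accepted.append(candidate)
    return accepted, tries


existing = ["TGAGGT", "AGTTCG", "GACATC"]
rng = random.Random(1)  # seeded only to make the sketch reproducible
new, tries = extend_set(existing, n_new=3, length=6, min_distance=3, rng=rng)
print(new, f"rejection rate {1 - len(new) / tries:.2f}")
```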

You can also sort a set by ideal color balance for Illumina chemistries, which helps if you plan to use subsets.

```bash
$ monte barcode --length 6 -n 15 -d 1 2> /dev/null | monte sort --field 2
Sorting barcodes with the following parameters:
        ...
round_mono:l6-n15-d2:x0:shady_soda      AGTCCT
round_mono:l6-n15-d2:x1:vogue_cosmos    TGAGTC
round_mono:l6-n15-d2:x2:upbeat_baboon   AACGGA
round_mono:l6-n15-d2:x3:sweet_octavia   CATCCT
round_mono:l6-n15-d2:x4:clean_copper    CCTTAG
round_mono:l6-n15-d2:x5:fabulous_partner        TCCTAG
round_mono:l6-n15-d2:x6:defiant_charlie GAACGA
round_mono:l6-n15-d2:x7:misty_miguel    GCATGA
round_mono:l6-n15-d2:x8:urgent_rodeo    ACTGTG
round_mono:l6-n15-d2:x9:injured_news    GAAGGT
round_mono:l6-n15-d2:x10:clear_public   TGAGAG
round_mono:l6-n15-d2:x11:seemly_satire  GATTGG
round_mono:l6-n15-d2:x12:exemplary_robert       TTCAGC
round_mono:l6-n15-d2:x13:nuclear_choice CATCAC
round_mono:l6-n15-d2:x14:discreet_shake GCATTG
Wrote barcode set called round_mono, with minimum Hamming distance 2 and maximum Hamming distance 6.
```

#### Details

```bash
usage: monte [-h] {barcode,check,sort,sample} ...

Generate random DNA barcodes conforming to contraints, or check sets of barcodes for their conformance.

optional arguments:
  -h, --help            show this help message and exit

Sub-commands:
  {barcode,check,sort,sample}
                        Use these commands to specify the action you want.
    barcode             Generate random barcodes.
    check               Check barcode list.
    sort                Sort barcode list for optimal color balance.
    sample              Generate barcode list by sampling nucleotides from an existing list of sequences.
```

```bash
usage: monte barcode [-h] [--length LENGTH] [--amino-acid AMINO_ACID] --number NUMBER [--rejection-rate REJECTION_RATE] [--append APPEND] [--append_field APPEND_FIELD] [--distance DISTANCE]
                     [--homopolymer HOMOPOLYMER] [--levenshtein] [--color] [--gc_min GC_MIN] [--gc_max GC_MAX] [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --length LENGTH, -l LENGTH
                        Barcode length. Default: 12
  --amino-acid AMINO_ACID, -a AMINO_ACID
                        Generate barcodes encoding this amino acid sequence. Default: do not use.
  --number NUMBER, -n NUMBER
                        Number of barcodes to generate. Required.
  --rejection-rate REJECTION_RATE, -r REJECTION_RATE
                        Rate of rejection before aborting. Default: 0.85
  --append APPEND       File to take a list of barcodes to extend. Default: do not use
  --append_field APPEND_FIELD
                        Column name or number to take barcodes from for appending. Default: 1
  --distance DISTANCE, -d DISTANCE
                        Minimum distance between barcodes. Default: 1
  --homopolymer HOMOPOLYMER, -p HOMOPOLYMER
                        Maximum homopolymer length. Default: 3
  --levenshtein, -e     Use Levenshtein distance. Otherwise using Hamming diatnce. Default: False
  --color, -c           Check optimal Illumina color balance. Default: False
  --gc_min GC_MIN, -g GC_MIN
                        Minimum GC content. Default: 0.4
  --gc_max GC_MAX, -j GC_MAX
                        Maximum GC content. Default: 0.6
  --output OUTPUT, -o OUTPUT
                        Output file. Default: STDOUT
```

```bash
usage: monte sample [-h] --number NUMBER [--rejection-rate REJECTION_RATE] [--append APPEND] [--append_field APPEND_FIELD] [--distance DISTANCE] [--homopolymer HOMOPOLYMER] [--levenshtein] [--color]
                    [--gc_min GC_MIN] [--gc_max GC_MAX] [--field FIELD] [--output OUTPUT]
                    [input]

positional arguments:
  input                 Input file. Default: STDIN.

optional arguments:
  -h, --help            show this help message and exit
  --number NUMBER, -n NUMBER
                        Number of barcodes to generate. Required.
  --rejection-rate REJECTION_RATE, -r REJECTION_RATE
                        Rate of rejection before aborting. Default: 0.85
  --append APPEND       File to take a list of barcodes to extend. Default: do not use
  --append_field APPEND_FIELD
                        Column name or number to take barcodes from for appending. Default: 1
  --distance DISTANCE, -d DISTANCE
                        Minimum distance between barcodes. Default: 1
  --homopolymer HOMOPOLYMER, -p HOMOPOLYMER
                        Maximum homopolymer length. Default: 3
  --levenshtein, -e     Use Levenshtein distance. Otherwise using Hamming diatnce. Default: False
  --color, -c           Check optimal Illumina color balance. Default: False
  --gc_min GC_MIN, -g GC_MIN
                        Minimum GC content. Default: 0.4
  --gc_max GC_MAX, -j GC_MAX
                        Maximum GC content. Default: 0.6
  --field FIELD, -f FIELD
                        Column name or number for barcode sequences. Default: 1
  --output OUTPUT, -o OUTPUT
                        Output file. Default: STDOUT
```

```bash
usage: monte check [-h] [--distance DISTANCE] [--homopolymer HOMOPOLYMER] [--levenshtein]
                   [--color] [--gc_min GC_MIN] [--gc_max GC_MAX] [--field FIELD] [--output OUTPUT]
                   [input]

positional arguments:
  input                 Input file. Default: STDIN.

options:
  -h, --help            show this help message and exit
  --distance DISTANCE, -d DISTANCE
                        Minimum distance between barcodes. Default: 1
  --homopolymer HOMOPOLYMER, -p HOMOPOLYMER
                        Maximum homopolymer length. Default: 3
  --levenshtein, -e     Use Levenshtein distance. Otherwise using Hamming diatnce. Default: False
  --color, -c           Check optimal Illumina color balance. Default: False
  --gc_min GC_MIN, -g GC_MIN
                        Minimum GC content. Default: 0.4
  --gc_max GC_MAX, -j GC_MAX
                        Maximum GC content. Default: 0.6
  --field FIELD, -f FIELD
                        Column number for barcode sequences. Default: 1
  --output OUTPUT, -o OUTPUT
                        Output file. Default: STDOUT
```

```bash
usage: monte sort [-h] [--field FIELD] [--output OUTPUT] [input]

positional arguments:
  input                 Input file. Default: STDIN.

options:
  -h, --help            show this help message and exit
  --field FIELD, -f FIELD
                        Column number for barcode sequences. Default: 1
  --output OUTPUT, -o OUTPUT
                        Output file. Default: STDOUT
```

### Python API

**monte-barcode** can be imported into Python to generate and check barcodes in your own programs.

```python
import montebarcode as mb
```

Generate random DNA sequences.

```python
>>> for bc in mb.infinite_barcodes(length=20, check_used=False): 
...     print(bc)
...     break
... 
ATCAGTCGTCACACTAGTTA
```

Or peptide-encoding sequences.

```python
>>> list(mb.codon_barcodes("L", ordered=True)) 
['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']
```

You can check the minimum and maximum distances among a set.

```python
>>> mb.minmax_distance(['AAA', 'AAA'])
(0, 0)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAT'])
(1, 3)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAAT'], use_levenshtein=False)
(0, 3)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAAT'])
(1, 4)
```
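
The two results for `['AAA', 'TCG', 'AAAT']` differ because the sequences have unequal lengths. The standalone sketch below reproduces both pairs of numbers. It assumes that the Hamming variant simply ignores trailing bases of the longer sequence, which is consistent with the output above but not confirmed; the function names are hypothetical.

```python
from itertools import combinations


def hamming_truncated(a: str, b: str) -> int:
    """Mismatches over the shared length; extra bases are ignored."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def minmax(seqs, distance):
    """Smallest and largest pairwise distance under the given metric."""
    d = [distance(a, b) for a, b in combinations(seqs, 2)]
    return min(d), max(d)


seqs = ["AAA", "TCG", "AAAT"]
print(minmax(seqs, hamming_truncated))  # (0, 3)
print(minmax(seqs, levenshtein))        # (1, 4)
```

Levenshtein distance is the safer choice when sequencing errors may include insertions or deletions, at the cost of more computation per pair.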

And get usage of each base at each position.

```python
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[0]['A']
0.25
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[1]['G']
0
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[2]['A']
0.5
```
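
For intuition, per-position base usage can be computed by transposing the barcode list and counting each column. This is a sketch of the idea, not the package's code, and `position_base_usage` is a hypothetical name.

```python
from collections import Counter


def position_base_usage(barcodes):
    """For each position, the fraction of barcodes carrying each base."""
    usage = []
    for column in zip(*barcodes):  # one column per barcode position
        counts = Counter(column)
        usage.append({base: counts[base] / len(column) for base in "ACGT"})
    return usage


usage = position_base_usage(["AAA", "TTT", "GCT", "CCA"])
print(usage[0]["A"])  # 0.25
print(usage[1]["G"])  # 0.0
print(usage[2]["A"])  # 0.5
```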

You can see whether adding a barcode to a set would throw off
the Illumina color balance.

```python
>>> mb.IlluminaColorBalance()('AAAT', ['TCGC', 'ACAG', 'TGGC', 'ATCG'])
True
>>> mb.IlluminaColorBalance()('AAAT', ['TCGC', 'CCAG', 'TGGC', 'ATCG'])
False
```

You can also run a suite of checks against a set of barcodes (or an infinite stream),
retrieving the failure reasons, the number of tries, and the conforming barcode set.

```python
>>> checks = [mb.Homopolymer(), mb.Palindrome()]
>>> mb.make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], n=4, checks=checks, quiet=True)
(Counter({'homopolymer': 1, 'palindrome': 1}), 4, ['ATCGCG', 'GCCGAT'])
>>> mb.make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], n=1, checks=checks, quiet=True)
(Counter({'homopolymer': 1, 'palindrome': 1}), 3, ['ATCGCG'])
```
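
The behaviour shown above can be mimicked with a short standalone loop, sketched below. It assumes that "palindrome" means a sequence equal to its own reverse complement and that the homopolymer limit is the documented default of 3. `run_checks` and the check functions are hypothetical stand-ins, not the package's classes.

```python
from collections import Counter
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_long_homopolymer(seq: str, max_run: int = 3) -> bool:
    """True if any single-base run is longer than max_run."""
    return any(len(list(run)) > max_run for _, run in groupby(seq))


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its reverse complement (assumed meaning)."""
    return seq == seq.translate(COMPLEMENT)[::-1]


CHECKS = {"homopolymer": has_long_homopolymer, "palindrome": is_palindrome}


def run_checks(candidates, n, checks=CHECKS):
    """Consume candidates until n pass; return (failures, tries, accepted)."""
    failures, accepted, tries = Counter(), [], 0
    for seq in candidates:
        if len(accepted) >= n:
            break
        tries += 1
        failed = [name for name, fails in checks.items() if fails(seq)]
        failures.update(failed)
        if not failed:
            accepted.append(seq)
    return failures, tries, accepted


seqs = ["AAAAT", "CCCGGG", "ATCGCG", "GCCGAT"]
print(run_checks(seqs, n=4))
print(run_checks(seqs, n=1))
```

Because the loop stops as soon as `n` barcodes have been accepted, it also works on a generator that never ends.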

### Documentation

Full API documentation is at [ReadTheDocs](https://monte-barcode.readthedocs.org).

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "monte-barcode",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "barcodes,sequencing,science,assay",
    "author": "",
    "author_email": "Eachan Johnson <eachan.johnson@crick.ac.uk>",
    "download_url": "https://files.pythonhosted.org/packages/13/99/7acfa4eed93645013840eb7c92d44c7d9f47aff9da6f81c98ee497bdd6c0/monte-barcode-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) [year] [fullname]  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "Generating sets of random DNA sequences optimized for use in high-throughput sequencing.",
    "version": "0.0.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/scbirlab/monte-barcode/issues",
        "Homepage": "https://github.com/scbirlab/monte-barcode"
    },
    "split_keywords": [
        "barcodes",
        "sequencing",
        "science",
        "assay"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c2530277f39d050ebcd392a06aa1355afe105b95e173cc890a83b8d80dcb3fa",
                "md5": "cde25eb1438172a414b5c16af539cef2",
                "sha256": "c8e0a43eb33d747ef8c0b7d1fbccd8db9ebce7fc34dd91df0cf95abbb703b2ba"
            },
            "downloads": -1,
            "filename": "monte_barcode-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cde25eb1438172a414b5c16af539cef2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 18305,
            "upload_time": "2023-06-02T14:26:45",
            "upload_time_iso_8601": "2023-06-02T14:26:45.132984Z",
            "url": "https://files.pythonhosted.org/packages/0c/25/30277f39d050ebcd392a06aa1355afe105b95e173cc890a83b8d80dcb3fa/monte_barcode-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "13997acfa4eed93645013840eb7c92d44c7d9f47aff9da6f81c98ee497bdd6c0",
                "md5": "4998495539b48616eed764830c0d7956",
                "sha256": "c01fb882459188ddd5a31e8863867848e17f6f9f1a1cdb20fe086f9d24416253"
            },
            "downloads": -1,
            "filename": "monte-barcode-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "4998495539b48616eed764830c0d7956",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17601,
            "upload_time": "2023-06-02T14:26:47",
            "upload_time_iso_8601": "2023-06-02T14:26:47.043389Z",
            "url": "https://files.pythonhosted.org/packages/13/99/7acfa4eed93645013840eb7c92d44c7d9f47aff9da6f81c98ee497bdd6c0/monte-barcode-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-02 14:26:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "scbirlab",
    "github_project": "monte-barcode",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "monte-barcode"
}
        