savemoney


- Name: savemoney
- Version: 0.2.16
- Home page: https://github.com/MasaakiU/MultiplexNanopore
- Summary: Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!
- Upload time: 2024-03-20 21:57:02
- Maintainer: Masaaki Uematsu
- Author: Masaaki Uematsu
- Requires Python: >=3.10
- License: CC BY-NC-SA 4.0
- Keywords: plasmid, whole-plasmid, sequencing, alignment, multiplexing, barcode-free, bayesian analysis, prior information

<p align="center"><img src="https://github.com/MasaakiU/MultiplexNanopore/raw/master/resources/logo/SAVEMONEY_logo_with_letter.png"/></p>

*Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!*

# Overview

SAVEMONEY guides researchers in mixing multiple plasmids for submission as a single sample to a commercial long-read sequencing service (e.g., Oxford Nanopore Technology), reducing overall sequencing costs while maintaining the fidelity of the sequencing results. The procedure is outlined below:

- **Step 1. pre-survey** takes plasmid maps as input and tells users which grouping of plasmids is optimal.
- **Step 2. submit samples** according to the output of the pre-survey.
- **Step 3. post-analysis** performs computational deconvolution of the obtained results and generates a consensus sequence for each plasmid in the sample mixture. This step must be run separately for each sample mixture.
- **Step 4. visualization of results (optional)** provides a platform for detailed examination of the alignments and consensus sequences generated in the post-analysis.

<p align="center"><img src="https://github.com/MasaakiU/MultiplexNanopore/raw/master/resources/figures/Fig1_20230313_margin.png" width="500"/></p>

The algorithm permits mixing six (or potentially even more) plasmids for sequencing with Oxford Nanopore Technology (e.g., Plasmidsaurus services), including plasmids that differ by as few as two bases. For more information, please check out our publication (coming soon).

# SAVEMONEY via Google Colab!

- [SAVEMONEY](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb)
- SAVEMONEY_batch (coming soon!)

# SAVEMONEY for local environment

## Requirements

Verified on macOS, Linux, and Windows 10.

- Python 3.10 or later
- One of the following C++ compilers (the minimum required versions have not been determined)
  - [Clang 14.0.0](https://clang.llvm.org)
  - [GCC 12.2.0](https://gcc.gnu.org)
  - [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) (for Windows)
- biopython>=1.83
- pandas>=1.5.3
- parasail>=1.3.4
- Pillow>=9.4.0
- PuLP>=2.7.0
- scipy>=1.11.4
- snapgene_reader>=0.1.20
- tqdm>=4.66.1
- Cython>=3.0.7
- matplotlib>=3.7.1
- numpy>=1.23.5
- pyspoa>=0.2.1
- pysam>=0.22.0 (optional)

## Installation

SAVEMONEY is available via pip.

```shell
pip install savemoney
```

If installation via pip fails, please check the requirements above. If any of the packages conflict with those already present in your environment, I recommend creating a new virtual environment.

If no C++ compiler is installed, install the Xcode Command Line Tools using the following command (for macOS):

```shell
xcode-select --install
```

or download [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) (for Windows).
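
Before installing, it may help to check whether a C++ compiler is already reachable. The following is a minimal sketch using only the Python standard library. The helper name and the list of executable names are assumptions for illustration; they are not part of SAVEMONEY, and a compiler may be installed without being on `PATH` (for example, MSVC's `cl` outside a developer command prompt).

```python
import shutil
from typing import Callable, Optional

# Executable names are assumptions about typical installs,
# not an official SAVEMONEY requirement list.
CANDIDATE_COMPILERS = ("clang++", "g++", "c++", "cl")


def find_cpp_compilers(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return {name: path} for each candidate compiler found on PATH."""
    found = {}
    for name in CANDIDATE_COMPILERS:
        path = which(name)
        if path is not None:
            found[name] = path
    return found


compilers = find_cpp_compilers()
if compilers:
    for name, path in compilers.items():
        print(f"{name}: {path}")
else:
    print("No C++ compiler found on PATH")
```

An empty result only means that none of the listed names were found on `PATH`; it does not prove that no compiler is installed.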

## Quick usage

SAVEMONEY can be run either from a Python script or from the command line.

### Execute SAVEMONEY in a Python script

To import and run SAVEMONEY in a Python script, follow the example below:

```python
import savemoney
savemoney.pre_survey("path_to_sequence_directory", "save_directory", **kwargs)
savemoney.post_analysis("path_to_sequence_directory", "save_directory", **kwargs)
```

All plasmid map files with the `*.dna` or `*.fasta` extension (plus `*.fastq` files for post-analysis) in `path_to_sequence_directory` are used for the analysis. Results are written to `save_directory`. `kwargs` are optional parameters for tuning the analysis:

```python
# pre-survey
kwargs = {
    'distance_threshold':   5,  # main parameter to be changed
    'number_of_groups':     1,  # main parameter to be changed
    'gap_open_penalty':     3,  # alignment parameter
    'gap_extend_penalty':   1,  # alignment parameter
    'match_score':          1,  # alignment parameter
    'mismatch_score':      -2,  # alignment parameter
}

# post-analysis
kwargs = {
    'score_threshold':    0.3,   # main parameter to be changed 
    'gap_open_penalty':     3,   # alignment parameter
    'gap_extend_penalty':   1,   # alignment parameter
    'match_score':          1,   # alignment parameter
    'mismatch_score':      -2,   # alignment parameter
    'error_rate':     0.00001,   # prior probability for Bayesian analysis
    'del_mut_rate':  0.0001/4,   # prior probability for Bayesian analysis # e.g. "A -> T, C, G, del"
    'ins_rate':       0.00001,   # prior probability for Bayesian analysis
    'window':             160,   # maximum detectable length of repetitive sequences when wrong plasmid maps are provided: if a region of 80 nt is repeated adjacently two times, set the value to 160
}
```

For the meaning of these parameters, please refer to the [SAVEMONEY Google Colab page](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb) or the reference below.
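
Two of the post-analysis values above are simple arithmetic, and the two dictionaries share the name `kwargs`, so the second would overwrite the first in a single script. The sketch below keeps them under separate names and spells out the arithmetic. The helper names are hypothetical, and the reading of `del_mut_rate` as a total rate split evenly over four outcomes is an assumption based on the comment `"A -> T, C, G, del"`; only the parameter keys and values come from the example above.

```python
# Hypothetical helpers; only the parameter keys and values come from the README.

def del_mut_rate_from(total_rate: float, outcomes: int = 4) -> float:
    """Split a total per-base rate evenly over the possible outcomes.

    For a base A the outcomes are T, C, G, or a deletion, hence four.
    """
    return total_rate / outcomes


def window_for_repeat(unit_length: int, copies: int) -> int:
    """Length of a tandem repeat: unit length times number of adjacent copies."""
    return unit_length * copies


# Separate names so that neither dictionary overwrites the other.
pre_survey_kwargs = {
    "distance_threshold": 5,
    "number_of_groups": 1,
}

post_analysis_kwargs = {
    "score_threshold": 0.3,
    "del_mut_rate": del_mut_rate_from(0.0001),  # 0.0001 / 4
    "window": window_for_repeat(80, 2),         # 80 nt repeated twice
}

print(post_analysis_kwargs["del_mut_rate"])
print(post_analysis_kwargs["window"])
```

With SAVEMONEY installed, these dictionaries can be passed as in the example above, e.g. `savemoney.post_analysis("path_to_sequence_directory", "save_directory", **post_analysis_kwargs)`.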

### Execute SAVEMONEY via command line

SAVEMONEY can also be executed via command line:

```shell
python -m savemoney.pre_survey path_to_sequence_directory save_directory
python -m savemoney.post_analysis path_to_sequence_directory save_directory
```

Parameters can be specified as follows:

```shell
# pre-survey
python -m savemoney.pre_survey -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-dt DT] [-nog NOG] plasmid_map_dir_paths save_dir_base
positional arguments:
  plasmid_map_dir_paths path to plasmid map_directory
  save_dir_base         save directory path
options:
  -h, --help            show this help message and exit
  -gop GOP              gap_open_penalty, optional, default_value = 3
  -gep GEP              gap_extend_penalty, optional, default_value = 1
  -ms MS                match_score, optional, default_value = 1
  -mms MMS              mismatch_score, optional, default_value = -2
  -dt DT                distance_threshold, optional, default_value = 5
  -nog NOG              number_of_groups, optional, default_value = 1

# post-analysis
python -m savemoney.post_analysis -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-st ST] [-er ER] [-dmr DMR] [-ir IR] sequence_dir_paths save_dir_base
positional arguments:
  sequence_dir_paths  sequence_dir_paths
  save_dir_base       save directory path
options:
  -h, --help          show this help message and exit
  -gop GOP            gap_open_penalty, optional, default_value = 3
  -gep GEP            gap_extend_penalty, optional, default_value = 1
  -ms MS              match_score, optional, default_value = 1
  -mms MMS            mismatch_score, optional, default_value = -2
  -st ST              score_threshold, optional, default_value = 0.3
  -er ER              error_rate, optional, default_value = 0.0001
  -dmr DMR            del_mut_rate, optional, default_value = 2.5e-05
  -ir IR              ins_rate, optional, default_value = 0.0001
  -w W                window, optional, default_value = 160
```
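
When the command-line interface is driven from another script, the flags listed above can be assembled programmatically. This is a minimal sketch: the flag names are copied from the pre-survey `-h` output, while the helper `build_pre_survey_command` and the directory names are hypothetical. It only builds the argument list and does not run SAVEMONEY.

```python
import sys

# Flag names copied from the pre-survey help output; the helper is hypothetical.
PRE_SURVEY_FLAGS = {
    "gap_open_penalty": "-gop",
    "gap_extend_penalty": "-gep",
    "match_score": "-ms",
    "mismatch_score": "-mms",
    "distance_threshold": "-dt",
    "number_of_groups": "-nog",
}


def build_pre_survey_command(sequence_dir, save_dir, **params):
    """Return an argv list for `python -m savemoney.pre_survey`."""
    command = [sys.executable, "-m", "savemoney.pre_survey"]
    for name, value in params.items():
        if name not in PRE_SURVEY_FLAGS:
            raise ValueError(f"unknown pre-survey parameter: {name}")
        command += [PRE_SURVEY_FLAGS[name], str(value)]
    # Positional arguments go last: input directory, then save directory.
    command += [str(sequence_dir), str(save_dir)]
    return command


command = build_pre_survey_command(
    "plasmid_maps", "results", distance_threshold=10, number_of_groups=2
)
print(" ".join(command[1:]))
```

Once SAVEMONEY is installed, the resulting list can be passed to `subprocess.run(command, check=True)`. Rejecting unknown parameter names up front catches typos before a long run starts.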

### Output

The interpretation of the output files is described in detail on the [SAVEMONEY Google Colab page](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#InterpretationOfResults). In addition, you can visualize the consensus alignment results using the `your_plasmid_name.ca` file generated by SAVEMONEY.

From a Python script:

```python
import savemoney
savemoney.show_consensus(consensus_alignment_path, center=2000, seq_range=50, offset=0)
```

From the command line:

```shell
python -m savemoney.show_consensus path_to_consensus_alignment_file
```

Parameters can be specified as follows:

```shell
python -m savemoney.show_consensus -h
usage: __main__.py [-h] [--center CENTER] [--seq_range SEQ_RANGE] [--offset OFFSET] consensus_alignment_path
positional arguments:
  consensus_alignment_path  path to consensus_alignment (*.ca) file
options:
  -h, --help            show this help message and exit
  --center CENTER       center, optional, default_value = 2000
  --seq_range SEQ_RANGE seq_range, optional, default_value = 50
  --offset OFFSET       offset, optional, default_value = 0
```

Conversion of consensus alignment results (`*.ca`) to `*.bam` and `*.fastq` format is also supported. The conversion requires [pysam>=0.22.0](https://pypi.org/project/pysam/) to be installed in your environment. To convert a file, run the following code in a Python script:

```python
import savemoney
savemoney.ca2bam(consensus_alignment_path)
```

If you want to convert it via the command line, type the following command:

```shell
python -m savemoney.ca2bam path_to_consensus_alignment_file
```
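
Because post-analysis produces one `*.ca` file per plasmid, converting them one at a time can become tedious. The sketch below collects every `*.ca` file under a directory and applies a converter to each. The helper names are hypothetical, and it is an assumption that the `*.ca` files may sit in subdirectories of the save directory and that `savemoney.ca2bam` accepts a path string, as in the example above.

```python
from pathlib import Path
from typing import Callable, Iterable


def find_ca_files(save_dir) -> list[Path]:
    """Recursively collect *.ca files under save_dir, sorted for a stable order."""
    return sorted(Path(save_dir).rglob("*.ca"))


def convert_all(ca_files: Iterable[Path], convert: Callable[[str], object]) -> int:
    """Call convert(path) for every file and return how many were processed."""
    count = 0
    for path in ca_files:
        convert(str(path))
        count += 1
    return count


def main(save_dir: str) -> None:
    """Convert every *.ca file under save_dir (needs savemoney and pysam installed)."""
    import savemoney  # imported here so the helpers above work without it

    n = convert_all(find_ca_files(save_dir), savemoney.ca2bam)
    print(f"converted {n} file(s)")
```

Calling `main("save_directory")` runs the conversion. Passing the converter as an argument keeps the file discovery testable without SAVEMONEY installed.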

# References

[Uematsu M., Baskin J. M., "Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing." *eLife*. **2023**; 12: RP88794](https://doi.org/10.7554/eLife.88794.1)

[Slide from Weill Institute Science Workshop, May 22, 2023](https://github.com/MasaakiU/MultiplexNanopore/blob/master/resources/slides/20230522_Weill-Institute-Science-Workshop.pdf)


            

        