<p align="center"><img src="https://github.com/MasaakiU/MultiplexNanopore/raw/master/resources/logo/SAVEMONEY_logo_with_letter.png"/></p>
*Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!*
# Overview
SAVEMONEY guides researchers in mixing multiple plasmids for submission as a single sample to a commercial long-read sequencing service (e.g., Oxford Nanopore Technology), reducing overall sequencing costs while maintaining the fidelity of the sequencing results. The procedure is outlined below:
- **Step 1. pre-survey** takes plasmid maps as input and suggests the optimal grouping of plasmids.
- **Step 2. submit samples** according to the output of the pre-survey.
- **Step 3. post-analysis** executes computational deconvolution of the obtained results and generates a consensus sequence for each plasmid constituent within the sample mixture. This step must be run separately for each sample mixture.
- **Step 4. visualization of results (optional)** provides a platform for detailed examination of the alignments and consensus sequences generated in the post-analysis.
<p align="center"><img src="https://github.com/MasaakiU/MultiplexNanopore/raw/master/resources/figures/Fig1_20230313_margin.png" width="500"/></p>
The algorithm permits mixing of six (or potentially even more) plasmids for sequencing with Oxford Nanopore Technology (e.g., Plasmidsaurus services), including plasmids that differ by as few as two bases. For more information, please see our publication listed under References.
# SAVEMONEY via Google Colab!
- [SAVEMONEY](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb) (supports both circular and linear alignment)
- [SAVEMONEY BATCH](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanoporeBatch.ipynb) (runs multiple rounds of post_analysis at once)
# SAVEMONEY for local environment
## Requirements
Verified on macOS, Linux, and Windows 10.
- Python 3.10 or later
- One of the following C++ compilers (the versions below are the ones tested; the minimum required versions are not known)
  - [Clang 14.0.0](https://clang.llvm.org)
  - [GCC 12.2.0](https://gcc.gnu.org)
  - [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) (for Windows)
- biopython>=1.83
- pandas>=1.5.3
- parasail>=1.3.4
- Pillow>=9.4.0
- PuLP>=2.7.0
- scipy>=1.11.4
- snapgene_reader>=0.1.20
- tqdm>=4.66.1
- Cython>=3.0.7
- matplotlib>=3.7.1
- numpy>=1.23.5
- pyspoa>=0.2.1
- pysam>=0.22.0 (optional)
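
If installation problems are suspected, it can help to see which of the packages above are already present. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are hypothetical, the distribution names are taken as written in the list above, and the sketch only reports what is installed without comparing against the minimum versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the requirements list above.
REQUIRED = [
    "biopython", "pandas", "parasail", "Pillow", "PuLP", "scipy",
    "snapgene_reader", "tqdm", "Cython", "matplotlib", "numpy", "pyspoa",
]
OPTIONAL = ["pysam"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, found in report(REQUIRED + OPTIONAL).items():
        print(f"{name}: {found or 'not installed'}")
```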
## Installation
SAVEMONEY is available via pip.
```shell
pip install savemoney
```
If installation via pip fails, please check the requirements above. If any of the packages conflict with those already present in your environment, I recommend creating a new virtual environment.
If no C++ compiler is installed, install the Xcode Command Line Tools using the following command (for macOS):
```shell
xcode-select --install
```
or download [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) (for Windows).
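
To check whether a compiler is already reachable before installing one, a small sketch along these lines may help. The candidate executable names and the helper `find_cxx_compiler` are assumptions made for illustration, not part of SAVEMONEY. A `None` result does not prove that no compiler exists; for example, MSVC's `cl` is usually only on `PATH` inside a developer command prompt.

```python
import shutil

# Candidate executable names; an assumption, adjust for your toolchain.
CANDIDATES = ("clang++", "g++", "c++", "cl")


def find_cxx_compiler(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(find_cxx_compiler() or "no C++ compiler found on PATH")
```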
## Quick usage
SAVEMONEY can be executed either from a Python script or from the command line.
### Execute SAVEMONEY in python script
To import and execute SAVEMONEY in a Python script, follow the example below:
```python
import savemoney
savemoney.pre_survey("path_to_sequence_directory", "save_directory", **kwargs)
savemoney.post_analysis("path_to_sequence_directory", "save_directory", **kwargs)
```
All plasmid map files with the `*.dna` or `*.fasta` extension (plus `*.fastq` files for post-analysis) in `path_to_sequence_directory` are used for the analysis. Results are written to `save_directory`. `kwargs` are optional parameters for tuning the analysis:
```python
# pre-survey
kwargs = {
    'distance_threshold': 5,    # main parameter to be changed
    'number_of_groups': 1,      # main parameter to be changed
    'gap_open_penalty': 3,      # alignment parameter
    'gap_extend_penalty': 1,    # alignment parameter
    'match_score': 1,           # alignment parameter
    'mismatch_score': -2,       # alignment parameter
    'topology_of_dna': 0,       # 0: circular, 1: linear
    'n_cpu': 2,                 # number of cpu cores to be used
    'export_image_results': 1,  # 0: skip export of svg figure files, 1: export svg figure files
}

# post-analysis
kwargs = {
    'score_threshold': 0.3,     # main parameter to be changed
    'gap_open_penalty': 3,      # alignment parameter
    'gap_extend_penalty': 1,    # alignment parameter
    'match_score': 1,           # alignment parameter
    'mismatch_score': -2,       # alignment parameter
    'error_rate': 0.00001,      # prior probability for Bayesian analysis
    'ins_rate': 0.00001,        # prior probability for Bayesian analysis
    'window': 160,              # maximum detectable length of repetitive sequences when wrong plasmid maps are provided: if a region of 80 nt is repeated adjacently two times, set this to 160
    'topology_of_dna': 0,       # 0: circular, 1: linear
    'n_cpu': 2,                 # number of cpu cores to be used
    'export_image_results': 1,  # 0: skip export of svg figure files, 1: export svg figure files
}
```
For the meaning of these parameters, please refer to the [SAVEMONEY Google Colab page](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb) or the reference below.
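
Because post-analysis must be run separately for each sample mixture, a local run over several mixtures can be sketched as a loop. The example below assumes a hypothetical layout with one sub-directory per mixture, each holding its plasmid maps and `*.fastq` files. The helpers `has_inputs` and `run_per_mixture` are illustrative names, not SAVEMONEY functions, and writing each mixture's results to its own sub-directory is a choice made here. The analysis function is passed in as an argument, so the sketch itself does not import SAVEMONEY.

```python
from pathlib import Path

MAP_SUFFIXES = {".dna", ".fasta"}   # plasmid map extensions named above
READ_SUFFIXES = {".fastq"}          # sequencing reads needed for post-analysis


def has_inputs(directory):
    """True if the directory holds at least one plasmid map and one FASTQ file."""
    suffixes = {p.suffix for p in Path(directory).iterdir() if p.is_file()}
    return bool(suffixes & MAP_SUFFIXES) and bool(suffixes & READ_SUFFIXES)


def run_per_mixture(parent_dir, save_root, analyze, **kwargs):
    """Call analyze(mixture_dir, save_dir, **kwargs) once per mixture sub-directory.

    Returns (processed, skipped) lists of sub-directory names.
    """
    processed, skipped = [], []
    for mixture_dir in sorted(p for p in Path(parent_dir).iterdir() if p.is_dir()):
        if not has_inputs(mixture_dir):
            # Missing maps or reads: leave this directory untouched.
            skipped.append(mixture_dir.name)
            continue
        save_dir = Path(save_root) / mixture_dir.name
        save_dir.mkdir(parents=True, exist_ok=True)
        analyze(str(mixture_dir), str(save_dir), **kwargs)
        processed.append(mixture_dir.name)
    return processed, skipped
```

With SAVEMONEY installed, this could be called as `run_per_mixture("mixtures", "results", savemoney.post_analysis, score_threshold=0.3)`, where `"mixtures"` and `"results"` are placeholder paths.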
### Execute SAVEMONEY via command line
SAVEMONEY can also be executed via command line:
```shell
python -m savemoney.pre_survey path_to_sequence_directory save_directory
python -m savemoney.post_analysis path_to_sequence_directory save_directory
```
Parameters can be specified as follows:
```shell
# pre-survey
python -m savemoney.pre_survey -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-dt DT] [-nog NOG] plasmid_map_dir_paths save_dir_base
positional arguments:
plasmid_map_dir_paths path to plasmid map_directory
save_dir_base save directory path
options:
-h, --help show this help message and exit
-gop GOP gap_open_penalty, optional, default_value = 3
-gep GEP gap_extend_penalty, optional, default_value = 1
-ms MS match_score, optional, default_value = 1
-mms MMS mismatch_score, optional, default_value = -2
-dt DT distance_threshold, optional, default_value = 5
-nog NOG number_of_groups, optional, default_value = 1
-tod TOD topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
-nc NC n_cpu, optional, default_value = 2
-eir EIR export_image_results, optional, default_value = 1
# post-analysis
python -m savemoney.post_analysis -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-st ST] [-er ER] [-dmr DMR] [-ir IR] sequence_dir_paths save_dir_base
positional arguments:
sequence_dir_paths sequence_dir_paths
save_dir_base save directory path
options:
-h, --help show this help message and exit
-gop GOP gap_open_penalty, optional, default_value = 3
-gep GEP gap_extend_penalty, optional, default_value = 1
-ms MS match_score, optional, default_value = 1
-mms MMS mismatch_score, optional, default_value = -2
-st ST score_threshold, optional, default_value = 0.3
-er ER error_rate, optional, default_value = 1e-07
-ir IR ins_rate, optional, default_value = 1e-07
-w W window, optional, default_value = 160
-tod TOD topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
-nc NC n_cpu, optional, default_value = 2
-eir EIR export_image_results, optional, default_value = 1
```
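
When the command line is driven from another script, the keyword names used in Python can be translated into the short flags listed in the help output. The following is a minimal sketch for the pre-survey only; `build_pre_survey_command` and the mapping table are hypothetical helpers written for this example, with flag names copied from the `-h` output above.

```python
import sys

# Flag names copied from the `-h` output above.
PRE_SURVEY_FLAGS = {
    "gap_open_penalty": "-gop",
    "gap_extend_penalty": "-gep",
    "match_score": "-ms",
    "mismatch_score": "-mms",
    "distance_threshold": "-dt",
    "number_of_groups": "-nog",
    "topology_of_dna": "-tod",
    "n_cpu": "-nc",
    "export_image_results": "-eir",
}


def build_pre_survey_command(plasmid_map_dir, save_dir, **kwargs):
    """Build an argv list for `python -m savemoney.pre_survey`."""
    unknown = set(kwargs) - set(PRE_SURVEY_FLAGS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    command = [sys.executable, "-m", "savemoney.pre_survey"]
    for name, value in kwargs.items():
        # Each parameter becomes a "<flag> <value>" pair.
        command += [PRE_SURVEY_FLAGS[name], str(value)]
    return command + [plasmid_map_dir, save_dir]
```

The returned list can be handed to `subprocess.run(command, check=True)`. Rejecting unknown names early avoids passing a misspelled parameter to the command line.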
### Output
The interpretation of the output files is described in detail on the [SAVEMONEY Google Colab page](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#InterpretationOfResults). In addition, you can visualize the consensus alignment results using the `your_plasmid_name.ca` file generated by SAVEMONEY.
From python script:
```python
import savemoney
savemoney.show_consensus(consensus_alignment_path, center=2000, seq_range=50, offset=0)
```
From command line:
```shell
python -m savemoney.show_consensus path_to_consensus_alignment_file
```
Parameters can be specified as follows:
```shell
python -m savemoney.show_consensus -h
usage: __main__.py [-h] [--center CENTER] [--seq_range SEQ_RANGE] [--offset OFFSET] consensus_alignment_path
positional arguments:
consensus_alignment_path path to consensus_alignment (*.ca) file
options:
-h, --help show this help message and exit
--center CENTER center, optional, default_value = 2000
--seq_range SEQ_RANGE seq_range, optional, default_value = 50
--offset OFFSET offset, optional, default_value = 0
```
Conversion of consensus alignment results (`*.ca`) to the `*.bam` and `*.fastq` formats is also supported. The conversion requires [pysam>=0.22.0](https://pypi.org/project/pysam/) to be installed in your environment. To convert the file, use the following code in a Python script:
```python
import savemoney
savemoney.ca2bam(consensus_alignment_path)
```
To convert it from the command line, type the following command:
```shell
python -m savemoney.ca2bam path_to_consensus_alignment_file
```
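
Since pysam is an optional dependency, a script that converts many files may want to check for it first rather than fail partway through. The sketch below is one way to do that with the standard library; `module_available` and `convert_if_possible` are hypothetical helper names, and the conversion function is passed in as an argument so the sketch does not depend on SAVEMONEY being importable.

```python
from importlib.util import find_spec


def module_available(module_name):
    """True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def convert_if_possible(consensus_alignment_path, convert, required="pysam"):
    """Run convert(path) only when the required module is importable.

    Returns True if the conversion was attempted, False if it was skipped.
    """
    if not module_available(required):
        print(f"{required} is not installed; skipping {consensus_alignment_path}")
        return False
    convert(consensus_alignment_path)
    return True
```

With SAVEMONEY installed, a call might look like `convert_if_possible("your_plasmid_name.ca", savemoney.ca2bam)`.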
# References
[Uematsu M., Baskin J. M., "Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing." *eLife*. **2023**; 12: RP88794](https://doi.org/10.7554/eLife.88794.1)
[Slide from Weill Institute Science Workshop, May 22, 2023](https://github.com/MasaakiU/MultiplexNanopore/blob/master/resources/slides/20230522_Weill-Institute-Science-Workshop.pdf)
Raw data
{
"_id": null,
"home_page": "https://github.com/MasaakiU/MultiplexNanopore",
"name": "savemoney",
"maintainer": "Masaaki Uematsu",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "mu84@cornell.edu",
"keywords": "plasmid, whole-plasmid, sequencing, alignment, multiplexing, barcode-free, bayesian analysis, prior information, ",
"author": "Masaaki Uematsu",
"author_email": "mu84@cornell.edu",
"download_url": "https://files.pythonhosted.org/packages/5c/bd/060e7fd792bc2ed97aad5a33fb7ee7a65aeba36fec333c2ef9318e739b5c/savemoney-0.3.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "CC BY-NC-SA 4.0",
"summary": "Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!",
"version": "0.3.4",
"project_urls": {
"Homepage": "https://github.com/MasaakiU/MultiplexNanopore"
},
"split_keywords": [
"plasmid",
" whole-plasmid",
" sequencing",
" alignment",
" multiplexing",
" barcode-free",
" bayesian analysis",
" prior information",
" "
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "99435314d35b5f8307f01da74da2b91d7f80094f9549a5178da9880b472b83fd",
"md5": "0e3cab715d98e123a980449571d4f440",
"sha256": "d32f678fd71da0acbfa9b6c215a5d5ddb58aa2d23d6a02778798171f0f7caa8f"
},
"downloads": -1,
"filename": "savemoney-0.3.4-cp310-cp310-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "0e3cab715d98e123a980449571d4f440",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.10",
"size": 362379,
"upload_time": "2024-08-26T22:37:59",
"upload_time_iso_8601": "2024-08-26T22:37:59.763921Z",
"url": "https://files.pythonhosted.org/packages/99/43/5314d35b5f8307f01da74da2b91d7f80094f9549a5178da9880b472b83fd/savemoney-0.3.4-cp310-cp310-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5cbd060e7fd792bc2ed97aad5a33fb7ee7a65aeba36fec333c2ef9318e739b5c",
"md5": "d54b8f3fcfca4f0a4df1cef8d147697d",
"sha256": "caf3b1d6b110bf65928eddcf1cb31e45aebf0d80fb20afafca6e7422d8e0d7c7"
},
"downloads": -1,
"filename": "savemoney-0.3.4.tar.gz",
"has_sig": false,
"md5_digest": "d54b8f3fcfca4f0a4df1cef8d147697d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 250359,
"upload_time": "2024-08-26T22:38:01",
"upload_time_iso_8601": "2024-08-26T22:38:01.597040Z",
"url": "https://files.pythonhosted.org/packages/5c/bd/060e7fd792bc2ed97aad5a33fb7ee7a65aeba36fec333c2ef9318e739b5c/savemoney-0.3.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-26 22:38:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MasaakiU",
"github_project": "MultiplexNanopore",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "savemoney"
}