<h1 align="center">
<a href="https://github.com/ayaanhossain/oligopool/">
<img src="https://raw.githubusercontent.com/ayaanhossain/repfmt/main/oligopool/img/logo.svg" alt="Oligopool Calculator" width="460" class="center"/>
</a>
</h1>
<h4><p align="center">Version: 2024.11.03</p></h4>
<p align="center">
<a style="text-decoration: none" href="#Installation">Installation</a> •
<a style="text-decoration: none" href="#Getting-Started">Getting Started</a> •
<a style="text-decoration: none" href="#License">License</a> •
<a style="text-decoration: none" href="#Citation">Citation</a>
</p>
`Oligopool Calculator` is a suite of algorithms for automated design and analysis of [oligopool libraries](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9300125/).
It enables the scalable design of universal primer sets and error-correctable barcodes, the splitting of long constructs into multiple oligos, and the rapid packing and counting of barcoded reads -- all on a regular 8-core desktop computer.
We have used `Oligopool Calculator` in multiple projects to build libraries of tens of thousands of promoters, ribozymes, and mRNA stability elements, illustrating the use of a flexible grammar to add multiple barcodes and cut sites, avoid excluded sequences, and optimize experimental constraints. These libraries were later characterized using the highly efficient barcode counting provided by `Oligopool Calculator`.
`Oligopool Calculator` facilitates the creative design and application of massively parallel reporter assays by automating and simplifying the whole process. It has been benchmarked on simulated libraries containing millions of defined variants and on the analysis of billions of reads.
<h1 align="center">
<a href="https://github.com/ayaanhossain/oligopool/">
<img src="https://raw.githubusercontent.com/ayaanhossain/repfmt/refs/heads/main/oligopool/img/workflow.svg" alt="Oligopool Calculator Workflow" width="3840" class="center"/>
</a>
</h1>
**Design and analysis of oligopool variants using `Oligopool Calculator`.** **(a)** In `Design Mode`, `Oligopool Calculator` can be used to generate optimized `barcode`s, `primer`s, `spacer`s, and `motif`s, and to `split` longer oligos into shorter `pad`ded fragments for downstream synthesis and assembly. **(b)** Once the library is assembled and cloned, barcoded amplicon sequencing data can be processed via `Analysis Mode` for characterization. `Analysis Mode` proceeds by first `index`ing one or more sets of barcodes, then `pack`ing the reads, and finally producing count matrices using either `acount` (association counting) or `xcount` (combinatorial counting).
## Installation
`Oligopool Calculator` is a `Python3.10+`-exclusive library.
On `Linux`, `MacOS` and `Windows Subsystem for Linux` you can install `Oligopool Calculator` from [PyPI](https://pypi.org/project/oligopool/), where it is published as the `oligopool` package
```bash
$ pip install oligopool
```
or install it directly from GitHub.
```bash
$ pip install git+https://github.com/ayaanhossain/oligopool.git
```
Both approaches should install all dependencies automatically.
> **Note** The GitHub version always carries the most recent fixes. The PyPI version should be more stable.
If you are on `Windows` or simply prefer to, `Oligopool Calculator` can also be used via `docker` (see [our notes](https://github.com/ayaanhossain/oligopool/blob/master/docker-notes.md)).
**Verifying Installation**
A successful installation will look like this.
```python
$ python
Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import oligopool as op
>>> op.__version__
'2024.10.24'
>>>
```
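The same check can be scripted, for example as a setup step in a pipeline. The sketch below is a minimal example that relies only on the standard library's `importlib.metadata` and looks up the distribution by its PyPI name, `oligopool`. The helper name `installed_version` is ours for illustration and is not part of `Oligopool Calculator`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# 'oligopool' is the PyPI package name given in the install command above.
found = installed_version("oligopool")
if found is None:
    print("oligopool is not installed in this environment")
else:
    print(f"oligopool {found} is installed")
```

Unlike `import oligopool`, this does not load the library itself, so it only confirms that the package metadata is present.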
## Getting Started
`Oligopool Calculator` is carefully designed, easy to use, and stupid fast.
You can import the library and use its various functions either in a script or interactively inside a `jupyter` environment. Use `help(...)` to read the docs as necessary and follow along.
There are examples of a [design parser](https://github.com/ayaanhossain/oligopool/blob/master/examples/design-parser/design_parser.py) and an [analysis pipeline](https://github.com/ayaanhossain/oligopool/blob/master/examples/analysis-pipeline/analysis_pipeline.py) inside the [`examples`](https://github.com/ayaanhossain/oligopool/tree/master/examples) directory.
A notebook demonstrating [`Oligopool Calculator` in action](https://github.com/ayaanhossain/oligopool/blob/master/examples/OligopoolCalculatorInAction.ipynb) is provided there as well.
```python
$ python
Python 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import oligopool as op
>>> help(op)
...
oligopool v2024.10.24
by ah
Automated design and analysis of oligopool libraries.
The various modules in Oligopool Calculator can be used
interactively in a jupyter notebook, or be used to define
scripts for design and analysis pipelines on the cloud.
Oligopool Calculator offers two modes of operation
- Design Mode for designing oligopool libraries, and
- Analysis Mode for analyzing oligopool datasets.
Design Mode workflow
1. Initialize a pandas DataFrame with core library elements.
a. The DataFrame must contain a unique 'ID' column serving as primary key.
b. All other columns in the DataFrame must be DNA sequences.
2. Define any optional background sequences via the background module.
3. Add necessary oligopool elements with constraints via element modules.
4. Optionally, split long oligos and pad them via assembly modules.
5. Perform additional maneuvers and finalize library via auxiliary modules.
Background module available
- background
Element modules available
- primer
- barcode
- motif
- spacer
Assembly modules available
- split
- pad
Auxiliary modules available
- merge
- revcomp
- lenstat
- final
Design Mode example sketch
>>> import pandas as pd
>>> import oligopool as op
>>>
>>> # Read initial library
>>> init_df = pd.read_csv('initial_library.csv')
>>>
>>> # Add oligo elements one by one
>>> primer_df, stats = op.primer(input_data=init_df, ...)
>>> barcode_df, stats = op.barcode(input_data=primer_df, ...)
...
>>> # Check length statistics as needed
>>> length_stats = op.lenstat(input_data=further_along_df)
...
>>>
>>> # Split and pad longer oligos if needed
>>> split_df, stats = op.split(input_data=even_further_along_df, ...)
>>> first_pad_df, stats = op.pad(input_data=split_df, ...)
>>> second_pad_df, stats = op.pad(input_data=split_df, ...)
...
>>>
>>> # Finalize the library
>>> final_df, stats = op.final(input_data=ready_to_go_df, ...)
...
Analysis Mode workflow
1. Index one or more CSVs containing the barcode information.
2. Pack all NGS FastQ files, optionally merging them if required.
3. Use acount for association counting of variants and barcodes.
4. If multiple barcode combinations are to be counted use xcount.
5. Combine count DataFrames and perform stats and ML as necessary.
Indexing module available
- index
Packing module available
- pack
Counting modules available
- acount
- xcount
Analysis Mode example sketch
>>> import pandas as pd
>>> import oligopool as op
>>>
>>> # Read annotated library
>>> bc1_df = pd.read_csv('barcode_1.csv')
>>> bc2_df = pd.read_csv('barcode_2.csv')
>>> av1_df = pd.read_csv('associate_1.csv')
...
>>>
>>> # Index barcodes and any associates
>>> bc1_index_stats = op.index(barcode_data=bc1_df, barcode_column='BC1', ...)
>>> bc2_index_stats = op.index(barcode_data=bc2_df, barcode_column='BC2', ...)
...
>>>
>>> # Pack experiment FastQ files
>>> sam1_pack_stats = op.pack(r1_file='sample_1_R1.fq.gz', ...)
>>> sam2_pack_stats = op.pack(r1_file='sample_2_R1.fq.gz', ...)
...
>>>
>>> # Compute and write barcode combination count matrix
>>> xcount_df, stats = op.xcount(index_files=['bc1_index', 'bc2_index'],
... pack_file='sample_1_pack', ...)
...
You can learn more about each module using help.
>>> import oligopool as op
>>> help(op)
>>> help(op.primer)
>>> help(op.barcode)
...
>>> help(op.xcount)
For advanced uses, the following classes are also available.
- vectorDB
- Scry
...
```
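The two Design Mode input rules listed in the help text above (a unique `ID` column, and DNA sequences in every other column) can be checked before any element module is called. The following is a minimal sketch using only the standard library. The function `check_initial_library`, the column name `Promoter`, the example rows, and the restriction to the letters `A`, `T`, `G`, `C` are all assumptions made for illustration; they are not part of `Oligopool Calculator`, which may accept a different alphabet and performs its own validation.

```python
import csv
import io

DNA_LETTERS = set("ATGC")  # assumed alphabet; adjust if your library uses others


def check_initial_library(csv_text):
    """Return a list of problems found in an initial-library CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "ID" not in (reader.fieldnames or []):
        return ["missing 'ID' column"]
    problems = []
    seen_ids = set()
    for row_num, row in enumerate(reader, start=1):
        # Rule (a): the 'ID' column must be unique.
        if row["ID"] in seen_ids:
            problems.append(f"row {row_num}: duplicate ID {row['ID']!r}")
        seen_ids.add(row["ID"])
        # Rule (b): every other column must hold a DNA sequence.
        for column, value in row.items():
            if column == "ID":
                continue
            # Missing or surplus cells show up as None or a list; flag them too.
            if not isinstance(value, str) or not value:
                problems.append(f"row {row_num}: column {column!r} is not DNA")
            elif set(value.upper()) - DNA_LETTERS:
                problems.append(f"row {row_num}: column {column!r} is not DNA")
    return problems


# Hypothetical library: the second row has a non-DNA letter, the third reuses an ID.
example = (
    "ID,Promoter\n"
    "v1,TTGACAATTAATCATCGGCT\n"
    "v2,TTGACAXTTAATCATCGGCT\n"
    "v2,TTTACAATTAATCATCGGCT\n"
)
for problem in check_initial_library(example):
    print(problem)
```

A file that passes this check can then be loaded with `pd.read_csv`, as in the Design Mode example sketch.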
## License
`Oligopool Calculator` (c) 2024 Ayaan Hossain.
`Oligopool Calculator` is **open-source software** under the [GPL-3.0](https://opensource.org/license/gpl-3-0) License.
See the [LICENSE](https://github.com/ayaanhossain/oligopool/blob/master/LICENSE) file for more details.
Raw data
{
"_id": null,
"home_page": "https://github.com/ayaanhossain/oligopool",
"name": "oligopool",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.10",
"maintainer_email": null,
"keywords": "synthetic computational biology nucleotide oligo pool calculator design analysis barcode primer spacer motif split pad assembly index pack scry classifier count acount xcount",
"author": "Ayaan Hossain and Howard Salis",
"author_email": "auh57@psu.edu, salis@psu.edu",
"download_url": "https://files.pythonhosted.org/packages/de/2a/6a8070da144efbc358f18b6d518101e97f43d425bdd6e9ed0b168ba50f8d/oligopool-2024.11.3.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">\n <a href=\"https://github.com/ayaanhossain/oligopool/\">\n <img src=\"https://raw.githubusercontent.com/ayaanhossain/repfmt/main/oligopool/img/logo.svg\" alt=\"Oligopool Calculator\" width=\"460\" class=\"center\"/>\n </a>\n</h1>\n\n<h4><p align=\"center\">Version: 2024.11.03</p></h4>\n\n<p align=\"center\">\n <a style=\"text-decoration: none\" href=\"#Installation\">Installation</a> \u2022\n <a style=\"text-decoration: none\" href=\"#Getting-Started\">Getting Started</a> \u2022\n <a style=\"text-decoration: none\" href=\"#License\">License</a> \u2022\n <a style=\"text-decoration: none\" href=\"#Citation\">Citation</a>\n</p>\n\n`Oligopool Calculator` is a suite of algorithms for automated design and analysis of [oligopool libraries](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9300125/).\n\nIt enables the scalable design of universal primer sets, error-correctable barcodes, the splitting of long constructs into multiple oligos, and the rapid packing and counting of barcoded reads -- all on a regular 8-core desktop computer.\n\nWe have used `Oligopool Calculator` in multiple projects to build libraries of tens of thousands of promoters, ribozymes, and mRNA stability elements, illustrating the use of a flexible grammar to add multiple barcodes, cut sites, avoid excluded sequences, and optimize experimental constraints. These libraries were later characterized using highly efficient barcode counting provided by `Oligopool Calculator`.\n\n`Oligopool Calculator` facilitates the creative design and application of massively parallel reporter assays by automating and simplifying the whole process. 
It has been benchmarked on simulated libraries containing millions of defined variants and to analyze billions of reads.\n\n<h1 align=\"center\">\n <a href=\"https://github.com/ayaanhossain/oligopool/\">\n <img src=\"https://raw.githubusercontent.com/ayaanhossain/repfmt/refs/heads/main/oligopool/img/workflow.svg\" alt=\"Oligopool Calculator Workflow\" width=\"3840\" class=\"center\"/>\n </a>\n</h1>\n\n**Design and analysis of oligopool variants using `Oligopool Calculator`.** **(a)** In `Design Mode`, `Oligopool Calculator` can be used to generate optimized `barcode`s, `primer`s, `spacer`s, `motif`s and `split` longer oligos into shorter `pad`ded fragments for downstream synthesis and assembly. **(b)** Once the library is assembled and cloned, barcoded amplicon sequencing data can be processed via `Analysis Mode` for characterization. `Analysis Mode` proceeds by first `index`ing one or more sets of barcodes, `pack`ing the reads, and then producing count matrices either using `acount` (association counting) or `xcount` (combinatorial counting).\n\n\n## Installation\n\n`Oligopool Calculator` is a `Python3.10+`-exclusive library.\n\nOn `Linux`, `MacOS` and `Windows Subsystem for Linux` you can install `Oligopool Calculator` from [PyPI](https://pypi.org/project/oligopool/), where it is published as the `oligopool` package\n```bash\n$ pip install oligopool\n```\nor install it directly from GitHub.\n```bash\n$ pip install git+https://github.com/ayaanhossain/oligopool.git\n```\nBoth approaches should install all dependencies automatically.\n> **Note** This GitHub version will always be updated with all recent fixes. 
The PyPI version should be more stable.\n\nIf you are on `Windows` or simply prefer to, `Oligopool Calculator` can also be used via `docker` (see [our notes](https://github.com/ayaanhossain/oligopool/blob/master/docker-notes.md)).\n\n**Verifying Installation**\n\nSuccessful installation will look like this.\n```python\n$ python\nPython 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import oligopool as op\n>>> op.__version__\n'2024.10.24'\n>>>\n```\n\n## Getting Started\n\n`Oligopool Calculator` is carefully designed, easy to use, and stupid fast.\n\nYou can import the library and use its various functions either in a script or interactively inside a `jupyter` environment. Use `help(...)` to read the docs as necessary and follow along.\n\nThere are examples of a [design parser](https://github.com/ayaanhossain/oligopool/blob/master/examples/design-parser/design_parser.py) and an [analysis pipleine](https://github.com/ayaanhossain/oligopool/blob/master/examples/analysis-pipeline/analysis_pipeline.py) inside the [`examples`](https://github.com/ayaanhossain/oligopool/tree/master/examples) directory.\n\nA notebook demonstrating [`Oligopool Calculator` in action](https://github.com/ayaanhossain/oligopool/blob/master/examples/OligopoolCalculatorInAction.ipynb) is provided there as well.\n\n```python\n$ python\nPython 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0] on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>>\n>>> import oligopool as op\n>>> help(op)\n...\n oligopool v2024.10.24\n by ah\n\n Automated design and analysis of oligopool libraries.\n\n The various modules in Oligopool Calculator can be used\n interactively in a jupyter notebook, or be used to define\n scripts for design and analysis pipelines on the cloud.\n\n Oligopool Calculator offers two modes of operation\n 
- Design Mode for designing oligopool libraries, and\n - Analysis Mode for analyzing oligopool datasets.\n\n Design Mode workflow\n\n 1. Initialize a pandas DataFrame with core library elements.\n a. The DataFrame must contain a unique 'ID' column serving as primary key.\n b. All other columns in the DataFrame must be DNA sequences.\n 2. Define any optional background sequences via the background module.\n 3. Add necessary oligopool elements with constraints via element modules.\n 4. Optionally, split long oligos and pad them via assembly modules.\n 5. Perform additional maneuvers and finalize library via auxiliary modules.\n\n Background module available\n - background\n\n Element modules available\n - primer\n - barcode\n - motif\n - spacer\n\n Assembly modules available\n - split\n - pad\n\n Auxiliary modules available\n - merge\n - revcomp\n - lenstat\n - final\n\n Design Mode example sketch\n\n >>> import pandas as pd\n >>> import oligopool as op\n >>>\n >>> # Read initial library\n >>> init_df = pd.read_csv('initial_library.csv')\n >>>\n >>> # Add oligo elements one by one\n >>> primer_df, stats = op.primer(input_data=init_df, ...)\n >>> barcode_df, stats = op.barcode(input_data=primer_df, ...)\n ...\n >>> # Check length statistics as needed\n >>> length_stats = op.lenstat(input_data=further_along_df)\n ...\n >>>\n >>> # Split and pad longer oligos if needed\n >>> split_df, stats = op.split(input_data=even_further_along_df, ...)\n >>> first_pad_df, stats = op.pad(input_data=split_df, ...)\n >>> second_pad_df, stats = op.pad(input_data=split_df, ...)\n ...\n >>>\n >>> # Finalize the library\n >>> final_df, stats = op.final(input_data=ready_to_go_df, ...)\n ...\n\n Analysis Mode workflow\n\n 1. Index one or more CSVs containing the barcode information.\n 2. Pack all NGS FastQ files, optionally merging them if required.\n 3. Use acount for association counting of variants and barcodes.\n 4. If multiple barcode combinations are to be counted use xcount.\n 5. 
Combine count DataFrames and perform stats and ML as necessary.\n\n Indexing module available\n - index\n\n Packing module available\n - pack\n\n Counting modules available\n - acount\n - xcount\n\n Analysis Mode example sketch\n\n >>> import pandas as pd\n >>> import oligopool as op\n >>>\n >>> # Read annotated library\n >>> bc1_df = pd.read_csv('barcode_1.csv')\n >>> bc2_df = pd.read_csv('barcode_2.csv')\n >>> av1_df = pd.read_csv('associate_1.csv')\n ...\n >>>\n >>> # Index barcodes and any associates\n >>> bc1_index_stats = op.index(barcode_data=bc1_df, barcode_column='BC1', ...)\n >>> bc2_index_stats = op.index(barcode_data=bc2_df, barcode_column='BC2', ...)\n ...\n >>>\n >>> # Pack experiment FastQ files\n >>> sam1_pack_stats = op.pack(r1_file='sample_1_R1.fq.gz', ...)\n >>> sam2_pack_stats = op.pack(r1_file='sample_2_R1.fq.gz', ...)\n ...\n >>>\n >>> # Compute and write barcode combination count matrix\n >>> xcount_df, stats = op.xcount(index_files=['bc1_index', 'bc2_index'],\n ... pack_file='sample_1_pack', ...)\n ...\n\n You can learn more about each module using help.\n >>> import oligopool as op\n >>> help(op)\n >>> help(op.primer)\n >>> help(op.barcode)\n ...\n >>> help(op.xcount)\n\n For advanced uses, the following classes are also available.\n - vectorDB\n - Scry\n...\n```\n\n## License\n\n`Oligpool Calculator` (c) 2024 Ayaan Hossain.\n\n`Oligpool Calculator` is an **open-source software** under [GPL-3.0](https://opensource.org/license/gpl-3-0) License.\n\nSee [LICENSE](https://github.com/ayaanhossain/oligopool/blob/master/LICENSE) file for more details.\n",
"bugtrack_url": null,
"license": null,
"summary": "Oligopool Calculator - Automated design and analysis of oligopool libraries",
"version": "2024.11.3",
"project_urls": {
"Bug Reports": "https://github.com/ayaanhossain/oligopool/issues",
"Homepage": "https://github.com/ayaanhossain/oligopool",
"Source": "https://github.com/ayaanhossain/oligopool/tree/master/oligopool"
},
"split_keywords": [
"synthetic",
"computational",
"biology",
"nucleotide",
"oligo",
"pool",
"calculator",
"design",
"analysis",
"barcode",
"primer",
"spacer",
"motif",
"split",
"pad",
"assembly",
"index",
"pack",
"scry",
"classifier",
"count",
"acount",
"xcount"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d9073500f47120d442e369841f41afa3b9db04cf183f81c2a486c9451f0c5de7",
"md5": "2db3ed8b3bf39f025c9bdabdb742c75e",
"sha256": "6d0c7ec06a6f801d60b7e2899b0801819365afc1d92fabb8923cde23f636406f"
},
"downloads": -1,
"filename": "oligopool-2024.11.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2db3ed8b3bf39f025c9bdabdb742c75e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.10",
"size": 170854,
"upload_time": "2024-11-04T00:32:52",
"upload_time_iso_8601": "2024-11-04T00:32:52.647132Z",
"url": "https://files.pythonhosted.org/packages/d9/07/3500f47120d442e369841f41afa3b9db04cf183f81c2a486c9451f0c5de7/oligopool-2024.11.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "de2a6a8070da144efbc358f18b6d518101e97f43d425bdd6e9ed0b168ba50f8d",
"md5": "9748901e72d589252c3b14dfc470dc5e",
"sha256": "00569482d6527a1f5f28ac3d770cb2654e34978777f0127d65349cec3f28a20b"
},
"downloads": -1,
"filename": "oligopool-2024.11.3.tar.gz",
"has_sig": false,
"md5_digest": "9748901e72d589252c3b14dfc470dc5e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.10",
"size": 147578,
"upload_time": "2024-11-04T00:32:54",
"upload_time_iso_8601": "2024-11-04T00:32:54.185095Z",
"url": "https://files.pythonhosted.org/packages/de/2a/6a8070da144efbc358f18b6d518101e97f43d425bdd6e9ed0b168ba50f8d/oligopool-2024.11.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-04 00:32:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ayaanhossain",
"github_project": "oligopool",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "oligopool"
}