# ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization
[Docs workflow](https://github.com/nolan-h-hamilton/ROCCO/actions/workflows/docs.yml)
[Tests workflow](https://github.com/nolan-h-hamilton/ROCCO/actions/workflows/tests.yml)

<p align="center">
<img width="400" alt="logo" src="docs/logo.png">
</p>
## What
ROCCO is an efficient algorithm for detecting "consensus peaks" in large datasets with multiple high-throughput sequencing (HTS) samples (namely, ATAC-seq), where an enrichment in read counts/densities is observed in a nontrivial subset of samples.
### Input/Output
* *Input*: Samples' BAM alignments or BigWig tracks
* *Output*: BED file of consensus peak regions (Default format is BED3: `chrom,start,end`)
* Note, if BigWig input is used, no preprocessing options can be applied at the alignment level and narrowPeak output cannot be generated.
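As a small illustration of the default BED3 output, the sketch below reads `chrom,start,end` records and totals the base pairs they cover. The helper name `read_bed3` and the three example records are hypothetical and are not produced by ROCCO; the only fact relied on is that BED intervals are 0-based and half-open, so a peak's width is `end - start`.

```python
import io


def read_bed3(handle):
    """Parse BED3 records (chrom, start, end) from an open text handle."""
    peaks = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


# Hypothetical records standing in for a ROCCO output file.
example = io.StringIO("chr1\t100\t250\nchr1\t900\t1000\nchr2\t50\t75\n")
peaks = read_bed3(example)

# BED is 0-based, half-open, so the width of a peak is end - start.
total_bp = sum(end - start for _, start, end in peaks)
print(len(peaks), total_bp)
```

To use this on a real result, pass `open("rocco_output.bed")` (or whatever path was given to `-o`) instead of the in-memory example.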
## How
ROCCO models consensus peak calling as a constrained optimization problem with an upper bound on the total proportion of the genome selected as open/accessible and a fragmentation penalty that promotes spatial consistency in active regions and sparsity elsewhere.
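To make the two ingredients above concrete (a budget on how much is selected, and a penalty on fragmentation), here is a toy sketch. It is not ROCCO's solver: ROCCO is described as using convex optimization, whereas this example swaps in an exact dynamic program over binary labels for a handful of loci. The function name `select_loci`, the per-locus `scores`, and the parameters `budget` and `gamma` are all assumptions made for illustration.

```python
def select_loci(scores, budget, gamma):
    """Toy stand-in: label each locus 0/1 to maximize
    sum(score * label) - gamma * (number of label changes between neighbours),
    with at most `budget` loci labelled 1."""
    # State: (number selected so far, label of previous locus) -> (value, labels)
    states = {(0, None): (0.0, [])}
    for score in scores:
        next_states = {}
        for (count, last), (value, labels) in states.items():
            for label in (0, 1):
                if count + label > budget:
                    continue  # budget constraint on the selected fraction
                # Fragmentation penalty: pay gamma whenever the label flips.
                penalty = gamma if last is not None and label != last else 0.0
                candidate = value + score * label - penalty
                key = (count + label, label)
                if key not in next_states or candidate > next_states[key][0]:
                    next_states[key] = (candidate, labels + [label])
        states = next_states
    return max(states.values(), key=lambda item: item[0])


# Hypothetical per-locus scores (higher means more enriched).
scores = [-1.0, 4.0, -0.5, 3.0, -1.0, 1.0, -1.0]

free_value, free_labels = select_loci(scores, budget=3, gamma=0.0)
smooth_value, smooth_labels = select_loci(scores, budget=3, gamma=1.0)
print(free_labels, free_value)
print(smooth_labels, smooth_value)
```

With no penalty the three highest-scoring loci are picked individually; with the penalty, the selection collapses into one contiguous run that bridges a weak locus, which is the kind of spatial consistency the fragmentation term is meant to encourage.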
## Why
ROCCO offers several attractive features:
1. **Consideration of enrichment and spatial characteristics** of open chromatin signals
2. **Scaling to large sample sizes (100+)** with an asymptotic time complexity independent of sample size
3. **No required training data** or heuristically determined set of initial candidate peak regions
4. **No rigid thresholds** on the minimum number/width of supporting samples/replicates
5. **Mathematically tractable model** permitting worst-case analysis of runtime and performance
## Example Behavior
### Input
* ENCODE lymphoblastoid data (BEST5, WORST5): 10 real ATAC-seq alignments of varying TSS enrichment (SNR-like quality measure for ATAC-seq)
* Synthetic noisy data (NOISY5)
We run ROCCO twice, once *without* the noisy samples and once *with* them, for comparison (shown in blue in the figure below):
```shell
rocco -i *.BEST5.bam *.WORST5.bam -g hg38 -o rocco_output_without_noise.bed
rocco -i *.BEST5.bam *.WORST5.bam *.NOISY5.bam -g hg38 -o rocco_output_with_noise.bed
```
Note that users may run ROCCO with the [`--narrowPeak`](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) flag to generate 10-column output with various statistics for comparing peaks.
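For readers unfamiliar with the 10-column layout, the following sketch parses one narrowPeak line into named fields, following the column order in the UCSC format description linked above. The class `NarrowPeak`, the function `parse_narrowpeak_line`, and the sample line are hypothetical; which statistics ROCCO writes into each column is not specified here.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based start
    end: int             # half-open end
    name: str
    score: int           # 0-1000 display score
    strand: str          # "+", "-", or "."
    signal_value: float
    p_value: float       # -log10 scale, -1 if not assigned
    q_value: float       # -log10 scale, -1 if not assigned
    peak: int            # summit offset from start, -1 if not called


def parse_narrowpeak_line(line):
    """Split one tab-delimited narrowPeak record into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    return NarrowPeak(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], float(fields[6]), float(fields[7]), float(fields[8]),
        int(fields[9]),
    )


# Hypothetical record, not taken from real ROCCO output.
record = parse_narrowpeak_line("chr1\t100\t250\tpeak_1\t500\t.\t7.5\t3.2\t-1\t60")
# When a summit offset is present, its genomic position is start + peak.
summit = record.start + record.peak if record.peak >= 0 else None
print(record.name, record.signal_value, summit)
```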
### Output
Comparing each output file:
* *ROCCO is unaffected by the NOISY5 samples and effectively identifies true signal across multiple samples*
* *ROCCO simultaneously detects both wide and narrow consensus peaks*
<p align="center">
<img width="800" height="400" alt="example" src="docs/example_behavior.png">
</p>
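One way to put a number on how similar the two output files are is the base-pair Jaccard index between their peak sets. The sketch below is a minimal, dependency-free version; the function names and the example intervals are hypothetical and do not reproduce the results in the figure.

```python
from collections import defaultdict


def merged_by_chrom(peaks):
    """Group (chrom, start, end) peaks by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in sorted(peaks):
        runs = by_chrom[chrom]
        if runs and start <= runs[-1][1]:
            runs[-1][1] = max(runs[-1][1], end)  # extend the current run
        else:
            runs.append([start, end])
    return by_chrom


def covered_bp(by_chrom):
    """Total base pairs covered by merged, non-overlapping runs."""
    return sum(end - start for runs in by_chrom.values() for start, end in runs)


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    size_a = covered_bp(merged_by_chrom(peaks_a))
    size_b = covered_bp(merged_by_chrom(peaks_b))
    union = covered_bp(merged_by_chrom(list(peaks_a) + list(peaks_b)))
    intersection = size_a + size_b - union  # inclusion-exclusion
    return intersection / union if union else 1.0


# Hypothetical peak sets standing in for the two output BED files.
without_noise = [("chr1", 100, 200), ("chr1", 500, 600)]
with_noise = [("chr1", 100, 200), ("chr1", 550, 650)]
similarity = jaccard(without_noise, with_noise)
print(similarity)
```

A value near 1 would indicate that adding the noisy samples left the called regions largely unchanged.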
## Paper/Citation
If you use ROCCO in your research, please cite the [original paper](https://doi.org/10.1093/bioinformatics/btad725) in *Bioinformatics* (DOI: `10.1093/bioinformatics/btad725`):
```plaintext
Nolan H Hamilton, Terrence S Furey, ROCCO: a robust method for detection of open chromatin via convex optimization,
Bioinformatics, Volume 39, Issue 12, December 2023
```
## Documentation
For additional details, usage examples, etc. please see ROCCO's documentation: <https://nolan-h-hamilton.github.io/ROCCO/>
## Installation
### PyPI (`pip`)
```shell
python -m pip install rocco --upgrade
```
If you lack administrative privileges, you may need to append `--user` to the command above.
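After installing, a quick way to confirm which version (if any) is visible to the current interpreter is to query the installed distribution metadata. This sketch uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("rocco"))
```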
### Build from Source
If preferred, ROCCO can easily be built from source:
* Clone or download this repository
```shell
git clone https://github.com/nolan-h-hamilton/ROCCO.git
cd ROCCO
python setup.py sdist bdist_wheel
pip install -e .
```
## Package Metadata
* *Name*: `rocco`
* *Version*: `1.6.0.post1`
* *Summary*: Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization
* *Author*: Nolan Holt Hamilton (<nolan.hamilton@unc.edu>)
* *License*: MIT
* *Requires Python*: `<4,>=3.8`
* *Homepage*: <https://github.com/nolan-h-hamilton/rocco>
* *Keywords*: genomics, functional genomics, epigenomics, epigenetics, peak calling, ATAC-seq
* *Uploaded*: 2025-02-11 01:49:32
* *Files*:
  * `rocco-1.6.0.post1-py3-none-any.whl` (wheel, 40683 bytes, sha256 `85373a19b334de9e3df99d377961b74f06ae2aca9df63b0ed6c51a62f7354946`)
  * `rocco-1.6.0.post1.tar.gz` (sdist, 706001 bytes, sha256 `1b57950f1467c74dd427098c5e6cca5cd4ec3dea25f47855bfb3700d1906d63b`)