omnigenbench

Name	omnigenbench
Version	0.3.13a0
home_page	https://github.com/yangheng95/OmniGenBench
Summary	OmniGenBench: A comprehensive toolkit for genome analysis benchmarking.
upload_time	2025-09-11 18:41:30
author	Yang, Heng
requires_python	>=3.10
license	Apache-2.0

![favicon.png](asset/favicon.png)

<h3 align="center">OmniGenBench provides an all-in-one solution for genomic foundation model finetuning, inference, deployment and automated benchmarking, designed for research and applications in genomics.</h3>

<div align="center">

  <a href="https://omnigenbenchdoc.readthedocs.io/en/latest/">
    <img src="https://img.shields.io/readthedocs/omnigenbench?logo=readthedocs&logoColor=white" alt="Documentation Status" />
  </a>

  <a href="https://pypi.org/project/omnigenome/">
    <img src="https://img.shields.io/pypi/v/omnigenome?color=blue&label=PyPI" alt="PyPI" />
  </a>

  <a href="https://pepy.tech/project/omnigenome">
    <img src="https://static.pepy.tech/badge/omnigenome" alt="PyPI Downloads" />
  </a>

  <a href="https://pypi.org/project/omnigenbench/">
    <img src="https://img.shields.io/pypi/pyversions/omnigenome" alt="Python Version" />
  </a>

  <a href="https://github.com/yangheng95/omnigenome/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/yangheng95/omnigenome" alt="License" />
  </a>

</div>
<h3 align="center">
  <a href="#installation">📦 Installation</a>
  <span> · </span>
  <a href="#quick-start">🚀 Getting Started</a>
  <span> · </span>
  <a href="#supported-models">🧬 Model Support</a>
  <span> · </span>
  <a href="#benchmarks">📊 Benchmarks</a>
  <span> · </span>
  <a href="#tutorials">🧪 Application Tutorials</a>
  <span> · </span>
  <a href="https://arxiv.org/pdf/2505.14402">📚 Paper</a>
</h3>


## 🔍 What Can You Do with OmniGenBench?

- 🧬 **Benchmark effortlessly** — Run automated and reproducible evaluations for genomic foundation models  
- 🧠 **Understand your models** — Explore interpretability across diverse tasks and species  
- ⚙️ **Run tutorials instantly** — Use click-to-run guides for genomic sequence modeling  
- 🚀 **Fine-tune and infer efficiently** — Accelerated workflows for fine-tuning and inference on GFMs on downstream tasks

## Installation

### Requirements
Before installing OmniGenBench, make sure the following dependencies are available:
- Python 3.10+
- PyTorch 2.5+
- Transformers 4.46.0+

### PyPI Installation
To install OmniGenBench from PyPI, use pip:
```bash
pip install omnigenbench -U
```

### Source Installation
Alternatively, clone the repository and install it from source:
```bash
git clone https://github.com/yangheng95/OmniGenBench.git
cd OmniGenBench
pip install -e .
```

## Quick Start
OmniGenBench works with a wide range of models and benchmark suites. See the following sections for details.
### Auto-benchmark via CLI
The following command downloads the model from the Hugging Face model hub and evaluates it on the RGB benchmark:
```bash
autobench --model_name_or_path "yangheng/OmniGenome-186M" --benchmark "RGB" --trainer accelerate
```
You can find a visualization of AutoBench [here](asset/AutoBench.gif).
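
When the CLI needs to be launched from a script, for example to queue several runs, the command can be assembled in Python. This is a minimal sketch that uses only the three flags shown above; `build_autobench_command` and `run_autobench` are hypothetical helper names, and the `dry_run` switch is an addition for illustration.

```python
import shlex
import subprocess


def build_autobench_command(model: str, benchmark: str, trainer: str = "accelerate") -> list:
    """Assemble the autobench invocation with the flags shown in this README."""
    return [
        "autobench",
        "--model_name_or_path", model,
        "--benchmark", benchmark,
        "--trainer", trainer,
    ]


def run_autobench(model: str, benchmark: str, trainer: str = "accelerate", dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_autobench_command(model, benchmark, trainer)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    run_autobench("yangheng/OmniGenome-186M", "RGB")
```

Passing the arguments as a list avoids shell quoting issues with model IDs that contain slashes or other special characters.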


### Auto-benchmark via Python API
Alternatively, run the auto-benchmark from Python:
```python
from omnigenbench import AutoBench

# Hugging Face model ID of the genomic foundation model (GFM) to evaluate
gfm = "LongSafari/hyenadna-medium-160k-seqlen-hf"
# One of "RGB", "GB", "PGB", "GUE"; the benchmark is downloaded from the Hugging Face hub
benchmark = "RGB"
batch_size = 8
# Random seeds used for the benchmark runs
seeds = [0, 1, 2, 3, 4]
bench = AutoBench(benchmark=benchmark, model_name_or_path=gfm, overwrite=False)
bench.run(autocast=False, batch_size=batch_size, seeds=seeds)
```
You can find an example of AutoBench via Python API [here](examples/autobench/AutoBench_Tutorial.ipynb).
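
To evaluate several models on several suites, the same two calls can be wrapped in a loop. The sketch below only reuses the `AutoBench(...)` and `bench.run(...)` arguments shown above; `plan_runs` and `run_plan` are hypothetical helpers, and it is an assumption that every model and benchmark pair in the plan is supported.

```python
from itertools import product


def plan_runs(models, benchmarks):
    """Build one AutoBench keyword-argument dict per (model, benchmark) pair."""
    return [
        {"model_name_or_path": model, "benchmark": benchmark, "overwrite": False}
        for model, benchmark in product(models, benchmarks)
    ]


def run_plan(plan, batch_size=8, seeds=(0, 1, 2)):
    """Run each planned benchmark in turn; needs omnigenbench to be installed."""
    from omnigenbench import AutoBench  # imported lazily so planning works without it

    for kwargs in plan:
        bench = AutoBench(**kwargs)
        bench.run(autocast=False, batch_size=batch_size, seeds=list(seeds))


if __name__ == "__main__":
    plan = plan_runs(
        ["yangheng/OmniGenome-186M", "LongSafari/hyenadna-medium-160k-seqlen-hf"],
        ["RGB", "GUE"],
    )
    for item in plan:
        print(item)
    # run_plan(plan)  # uncomment to start the actual benchmark runs
```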

## Supported Models


OmniGenBench provides plug-and-play evaluation for over **30 genomic foundation models**, covering both **RNA** and **DNA** modalities. The following are highlights:

| Model          | Params | Pre-training Corpus                        | Highlights                                          |
|----------------|--------|--------------------------------------------|-----------------------------------------------------|
| **OmniGenome** | 186M   | 54B plant RNA+DNA tokens                   | Multi-modal, structure-aware encoder                |
| **Agro-NT-1B** | 985M   | 48 edible-plant genomes                    | Billion-scale DNA LM w/ NT-V2 k-mer vocab           |
| **RiNALMo**    | 651M   | 36M ncRNA sequences                        | Largest public RNA LM; FlashAttention-2             |
| **DNABERT-2**  | 117M   | 32B DNA tokens, 136 species (BPE)          | Byte-pair encoding; 2nd-gen DNA BERT                |
| **RNA-FM**     | 96M    | 23M ncRNA sequences                        | High performance on RNA structure tasks             |
| **RNA-MSM**    | 96M    | Multi-sequence alignments                  | MSA-based evolutionary RNA LM                       |
| **NT-V2**      | 96M    | 300B DNA tokens (850 species)              | Hybrid k-mer vocabulary                             |
| **HyenaDNA**   | 47M    | Human chromosomes                          | Long-context autoregressive model (1Mb)             |
| **SpliceBERT** | 19M    | 2M pre-mRNA sequences                      | Fine-grained splice-site recognition                |
| **Caduceus**   | 1.9M   | Human chromosomes                          | Ultra-compact DNA LM (RC-equivariant)               |
| **RNA-BERT**   | 0.5M   | 4,000+ ncRNA families                      | Small BERT with nucleotide masking                  |
| *...and more*  | —      | See Appendix E of the paper                | Includes PlantRNA-FM, UTR-LM, MP-RNA, CALM, etc.    |
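
The parameter counts in the table can help narrow down candidates for a given compute budget. The snippet below is an illustrative sketch that copies the "Params" column into a dictionary; `parse_params` and `models_under` are hypothetical helpers, not part of OmniGenBench.

```python
# Parameter counts copied from the "Params" column of the table above.
MODEL_PARAMS = {
    "OmniGenome": "186M",
    "Agro-NT-1B": "985M",
    "RiNALMo": "651M",
    "DNABERT-2": "117M",
    "RNA-FM": "96M",
    "RNA-MSM": "96M",
    "NT-V2": "96M",
    "HyenaDNA": "47M",
    "SpliceBERT": "19M",
    "Caduceus": "1.9M",
    "RNA-BERT": "0.5M",
}


def parse_params(text: str) -> int:
    """Convert a string such as '1.9M' into a parameter count."""
    if not text.endswith("M"):
        raise ValueError(f"expected a value in millions, got {text!r}")
    return int(round(float(text[:-1]) * 1_000_000))


def models_under(budget: str) -> list:
    """Return model names whose size is at most the budget, smallest first."""
    limit = parse_params(budget)
    sized = [(parse_params(size), name) for name, size in MODEL_PARAMS.items()]
    return [name for size, name in sorted(sized) if size <= limit]


if __name__ == "__main__":
    print(models_under("100M"))
```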

## Benchmarks

OmniGenBench supports five curated benchmark suites covering both **sequence-level** and **structure-level** genomics tasks across species.

| Suite        | Focus                       | #Tasks / Datasets        | Sample Tasks                                         |
|--------------|-----------------------------|--------------------------|------------------------------------------------------|
| **RGB**      | RNA structure + function    | 12 tasks (SN-level)      | RNA secondary structure, SNMR, degradation prediction |
| **BEACON**   | RNA (multi-domain)          | 13 tasks                 | Base pairing, mRNA design, RNA contact maps         |
| **PGB**      | Plant long-range DNA        | 7 categories             | PolyA, enhancer, chromatin access, splice site      |
| **GUE**      | DNA general tasks           | 36 datasets (9 tasks)    | TF binding, core promoter, enhancer detection       |
| **GB**       | Classic DNA classification  | 9 datasets               | Human/mouse enhancer, promoter variant classification|


## Tutorials

### RNA Design
RNA design is a fundamental problem in synthetic biology:
the goal is to find RNA sequences that fold into a given target structure.
This demo shows how to use OmniGenBench with a pre-trained model
to design such sequences.
The tutorial is available in [RNA_Design_Tutorial.ipynb](examples/rna_design/RNA_Design_Tutorial.ipynb).

You can find a visual example of RNA Design [here](asset/RNA_Design.gif).
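
A designed sequence can only fold into the target structure if every paired position holds a base pair that can actually form. The sketch below checks this necessary condition. It assumes the target is written in dot-bracket notation (a common convention, not confirmed here as the tutorial's format) and accepts Watson-Crick and G-U wobble pairs. `is_compatible` is a hypothetical helper and does not predict folding.

```python
# Base pairs accepted at paired positions: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def is_compatible(sequence: str, structure: str) -> bool:
    """Check that every '(' ... ')' pair in the structure is a pairable base pair."""
    if len(sequence) != len(structure):
        return False
    open_positions = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                return False  # closing bracket without a partner
            j = open_positions.pop()
            if (sequence[j], sequence[i]) not in ALLOWED_PAIRS:
                return False
        elif symbol != ".":
            return False  # unsupported symbol (for example pseudoknot brackets)
    return not open_positions  # leftover '(' means the structure is unbalanced


if __name__ == "__main__":
    print(is_compatible("GGGAAACCC", "(((...)))"))
    print(is_compatible("GGGAAAAAA", "(((...)))"))
```

Compatibility is only a filter: a compatible sequence may still prefer a different fold, which is why a structure predictor is needed to confirm a design.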

### RNA Secondary Structure Prediction

RNA secondary structure prediction is a fundamental problem in computational biology:
the goal is to predict which bases of an RNA sequence pair with each other.
This demo shows how to use OmniGenBench with a pre-trained model to predict the secondary structure of RNA sequences.
The tutorial is available in
[Secondary_Structure_Prediction_Tutorial.ipynb](examples/rna_secondary_structure_prediction/Secondary_Structure_Prediction_Tutorial.ipynb).

You can find a visual example of RNA Secondary Structure Prediction [here](asset/RNA_Structure_Prediction.gif).
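
One common way to compare a predicted structure with a reference is the F1 score over base pairs. The sketch below assumes both structures are given in dot-bracket notation without pseudoknots. `base_pairs` and `pair_f1` are hypothetical helpers, and this is not necessarily the metric OmniGenBench reports.

```python
def base_pairs(structure: str) -> set:
    """Extract the set of (i, j) base pairs from a dot-bracket string."""
    open_positions = []
    pairs = set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((open_positions.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def pair_f1(predicted: str, reference: str) -> float:
    """F1 score between the base pairs of a predicted and a reference structure."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    if not pred and not ref:
        return 1.0  # both structures are fully unpaired
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(pair_f1("(....)", "((..))"))
```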

### More Tutorials
Please find more usage tutorials in [examples/tutorials](examples/tutorials).

## Citation
```bibtex
@article{yang2024omnigenbench,
      title={OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking},
      author={Heng Yang and Jack Cole and Yuan Li and Renzhi Chen and Geyong Min and Ke Li},
      year={2025},
      eprint={2505.14402},
      archivePrefix={arXiv},
      primaryClass={q-bio.GN},
      url={https://arxiv.org/abs/2505.14402},
}
```
## License
OmniGenBench is licensed under the Apache License 2.0. See the LICENSE file for more information.


## Contribution
We welcome contributions to OmniGenBench! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request on GitHub.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yangheng95/OmniGenBench",
    "name": "omnigenbench",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Yang, Heng",
    "author_email": "hy345@exeter.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/cf/55/60b52c8b091d6700275b9235ff755589995e0ef8eb31ce165b7c0ff940a5/omnigenbench-0.3.13a0.tar.gz",
    "platform": "Windows",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "OmniGenBench: A comprehensive toolkit for genome analysis benchmarking.",
    "version": "0.3.13a0",
    "project_urls": {
        "Homepage": "https://github.com/yangheng95/OmniGenBench"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fe1c30dc4abed82d40481ffac50ab0e75da2d7a31677f3c25ae06691d71ac431",
                "md5": "72e5dec82d3023b35b830de17aa92325",
                "sha256": "32bd36bbc5cf6d57d551db20f9191f436db0a4332a2f9f0ce35c0370229c6a7a"
            },
            "downloads": -1,
            "filename": "omnigenbench-0.3.13a0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "72e5dec82d3023b35b830de17aa92325",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 216728,
            "upload_time": "2025-09-11T18:41:28",
            "upload_time_iso_8601": "2025-09-11T18:41:28.027053Z",
            "url": "https://files.pythonhosted.org/packages/fe/1c/30dc4abed82d40481ffac50ab0e75da2d7a31677f3c25ae06691d71ac431/omnigenbench-0.3.13a0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cf5560b52c8b091d6700275b9235ff755589995e0ef8eb31ce165b7c0ff940a5",
                "md5": "c471c125775bbc5bc3469d526c131c7f",
                "sha256": "c870f130bad1b836e9720175ad2495f2145e793cd347beafd22ab24748f24f3b"
            },
            "downloads": -1,
            "filename": "omnigenbench-0.3.13a0.tar.gz",
            "has_sig": false,
            "md5_digest": "c471c125775bbc5bc3469d526c131c7f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 166747,
            "upload_time": "2025-09-11T18:41:30",
            "upload_time_iso_8601": "2025-09-11T18:41:30.904718Z",
            "url": "https://files.pythonhosted.org/packages/cf/55/60b52c8b091d6700275b9235ff755589995e0ef8eb31ce165b7c0ff940a5/omnigenbench-0.3.13a0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-11 18:41:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yangheng95",
    "github_project": "OmniGenBench",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "omnigenbench"
}
