<!-- -------------------------------------------------------- -->
<!-- Banner / Logo (optional) -->
<!-- Replace the link below with your own logo or SVG banner -->
<p align="center">
<img src="https://raw.githubusercontent.com/your-org/GenAIRR/main/docs/_static/banner.svg" alt="GenAIRR" width="60%" />
</p>
<h1 align="center">GenAIRR</h1>
<p align="center">
<b>Adaptive Immune Receptor Repertoire sequence simulator</b><br/>
Generate realistic BCR & TCR repertoires in a single line of Python.
</p>
<p align="center">
<a href="https://pypi.org/project/GenAIRR/"><img src="https://img.shields.io/pypi/v/GenAIRR.svg?logo=pypi&logoColor=white" alt="PyPI version"></a>
<a href="https://genairr.readthedocs.io/en/latest/"><img src="https://img.shields.io/readthedocs/genairr?logo=readthedocs" alt="Documentation status"></a>
</p>
---
## 📑 Table of Contents
1. [Why GenAIRR?](#-why-genairr)
2. [Key Features](#-key-features)
3. [Installation](#-installation)
4. [Quick Start](#-quick-start)
5. [Examples](#-examples)
6. [Mutation Models](#-mutation-models)
7. [Roadmap](#-roadmap)
8. [Contributing](#-contributing)
9. [Citing GenAIRR](#-citing-genairr)
10. [License](#-license)
11. [Acknowledgements](#-acknowledgements)
---
## 🧐 Why GenAIRR?
<details>
<summary>Click to expand</summary>
*Benchmarking modern aligners, exploring somatic hypermutation, or stress-testing novel ML pipelines requires large, perfectly annotated repertoires, not snippets of real data peppered with sequencing errors.*
GenAIRR fills that gap with a **plug-and-play, fully extensible simulation engine** that produces sequences together with full ground-truth labels.
</details>
---
## ✨ Key Features
| Category | Highlights |
| -------- |------------------------------------------------------------------------------------|
| **Realistic Simulation** | Context-aware S5F mutations, indels, allele-specific trimming, NP-region modelling |
| **Composable Pipelines** | Chain together built-in & custom `AugmentationStep`s into simulation pipelines |
| **Multi-Chain Support** | Heavy & light BCRs plus TCR-β out of the box |
| **Research-ready Output** | JSON / pandas export, built-in plotting stubs, deterministic seeds |
| **Docs & Tutorials** | Rich API docs, Jupyter notebooks, step-by-step guides |
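
To make the "composable pipelines" idea concrete, here is a library-free sketch of the pattern: each step exposes an `apply` method that edits a shared record, and a pipeline runs the steps in order. `ToyStep`, `UppercaseStep`, `TagLengthStep`, and `ToyPipeline` are hypothetical stand-ins, not GenAIRR classes; check the API docs for the real `AugmentationStep` interface.

```python
from abc import ABC, abstractmethod


class ToyStep(ABC):
    """Hypothetical stand-in for a pipeline step (not GenAIRR's AugmentationStep)."""

    @abstractmethod
    def apply(self, record: dict) -> None:
        """Modify the shared record in place."""


class UppercaseStep(ToyStep):
    def apply(self, record: dict) -> None:
        record["sequence"] = record["sequence"].upper()


class TagLengthStep(ToyStep):
    def apply(self, record: dict) -> None:
        record["length"] = len(record["sequence"])


class ToyPipeline:
    def __init__(self, steps):
        self.steps = list(steps)

    def execute(self, sequence: str) -> dict:
        record = {"sequence": sequence}
        for step in self.steps:  # order matters: later steps see earlier edits
            step.apply(record)
        return record


toy_result = ToyPipeline([UppercaseStep(), TagLengthStep()]).execute("acgtn")
print(toy_result)  # {'sequence': 'ACGTN', 'length': 5}
```

Because every step shares one small interface, reordering or swapping steps is a one-line change to the list.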
---
## ⚡ Installation
```bash
# Python ≥ 3.9
pip install GenAIRR
# or the bleeding edge
pip install git+https://github.com/MuteJester/GenAIRR.git
```
---
## 🚀 Quick Start
Below is a 60-second tour. See [`/examples`](examples/) for notebooks and CLI usage.
```python
from GenAIRR.pipeline import AugmentationPipeline
from GenAIRR.steps import SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity
from GenAIRR.mutation import S5F
from GenAIRR.data import HUMAN_IGH_OGRDB
from GenAIRR.steps.StepBase import AugmentationStep
# 1️⃣ Configure built-in germline data
AugmentationStep.set_dataconfig(HUMAN_IGH_OGRDB)
# 2️⃣ Build a minimal pipeline
pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity()
])
# 3️⃣ Simulate!
sim = pipeline.execute()
print(sim.get_dict())
```
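
Each call returns one simulated sequence, so building a repertoire means calling `execute()` repeatedly. The helper below sketches that loop. `collect_records` and `write_jsonl` are hypothetical names, and the only thing assumed about the pipeline is what the Quick Start shows: `execute()` returns an object with `get_dict()`. A stub stands in for the pipeline so the snippet runs without GenAIRR installed.

```python
import io
import json


def collect_records(execute, n):
    """Call `execute()` n times and gather each result's get_dict() output."""
    return [execute().get_dict() for _ in range(n)]


def write_jsonl(records, handle):
    """Write one JSON object per line to an open text handle."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


class _StubResult:
    """Stand-in for a simulation result; real field names may differ."""

    def __init__(self, index):
        self._index = index

    def get_dict(self):
        return {"id": self._index, "sequence": "ACGT"}


class _StubPipeline:
    """Stand-in for AugmentationPipeline, used only to make the sketch runnable."""

    def __init__(self):
        self._count = 0

    def execute(self):
        self._count += 1
        return _StubResult(self._count)


stub = _StubPipeline()
records = collect_records(stub.execute, 3)  # with GenAIRR: pipeline.execute
buffer = io.StringIO()
write_jsonl(records, buffer)
print(buffer.getvalue())
```

If `get_dict()` returns values that `json` cannot serialize directly, `json.dumps` accepts a `default=` function to convert them.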
---
## 🧑‍💻 Examples
### 1. Full Heavy-Chain Pipeline
```python
# Reuses AugmentationPipeline, SimulateSequence, S5F and
# FixVPositionAfterTrimmingIndexAmbiguity from the Quick Start.
from GenAIRR.steps import (
    FixDPositionAfterTrimmingIndexAmbiguity, FixJPositionAfterTrimmingIndexAmbiguity,
    CorrectForVEndCut, CorrectForDTrims, CorruptSequenceBeginning,
    InsertNs, InsertIndels, ShortDValidation, DistillMutationRate
)
pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    CorrectForVEndCut(),
    CorrectForDTrims(),
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),
    InsertNs(0.02, 0.5),
    ShortDValidation(),
    InsertIndels(0.5, 5, 0.5, 0.5),
    DistillMutationRate()
])
result = pipeline.execute()
```
### 2. Naïve Sequence (no SHM)
```python
from GenAIRR.mutation import Uniform
naive_step = SimulateSequence(Uniform(0, 0), True)
pipeline = AugmentationPipeline([naive_step])
naive_seq = pipeline.execute()
print(naive_seq.sequence)
```
### 3. Custom Allele Combination
```python
custom_step = SimulateSequence(
    S5F(0.003, 0.25),
    True,
    specific_v=HUMAN_IGH_OGRDB.v_alleles['IGHV1-2*02'][0],   # specific V allele
    specific_d=HUMAN_IGH_OGRDB.d_alleles['IGHD3-10*01'][0],  # specific D allele
    specific_j=HUMAN_IGH_OGRDB.j_alleles['IGHJ4*02'][0]      # specific J allele
)
pipeline = AugmentationPipeline([custom_step])
print(pipeline.execute().get_dict())
```
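
The strings used above, such as `IGHV1-2*02`, follow the usual allele naming pattern: a gene name, an asterisk, then an allele number. Splitting them can be handy when grouping results by gene. `split_allele_name` is a hypothetical convenience helper, not part of GenAIRR.

```python
def split_allele_name(name: str) -> tuple[str, str]:
    """Split 'IGHV1-2*02' into its gene name and allele number."""
    gene, sep, allele = name.partition("*")
    if not sep or not gene or not allele:
        raise ValueError(f"not an allele name: {name!r}")
    return gene, allele


print(split_allele_name("IGHV1-2*02"))  # ('IGHV1-2', '02')
```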
---
## 🔬 Mutation Models
| Model | Description | When to use |
| ---------------- | --------------------------------------- | ----------------------------- |
| `S5F` | Context-specific somatic hypermutation | Antibody maturation studies |
| `Uniform` | Mutations placed uniformly at random | Baselines / ablation |
| **Your Model ➕** | Implement `BaseMutationModel` | Custom evolutionary scenarios |
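
"Context-specific" means that how likely a base is to mutate depends on its neighbours; S5F scores each position by the 5-mer centred on it. The sketch below shows only that mechanism. The weights in `toy_mutability` are made up for illustration (a boosted C after a W-R pair), whereas the real model uses empirically estimated tables, and all function names here are hypothetical.

```python
import random


def toy_mutability(fivemer: str) -> float:
    """Hypothetical weights: boost a central C preceded by W (A/T) then R (A/G)."""
    w, r, center = fivemer[0], fivemer[1], fivemer[2]
    if center == "C" and w in "AT" and r in "AG":
        return 5.0
    return 1.0


def position_weights(sequence: str) -> list[float]:
    """Weight each position by its 5-mer context; the two edge bases get 0."""
    weights = [0.0] * len(sequence)
    for i in range(2, len(sequence) - 2):
        weights[i] = toy_mutability(sequence[i - 2 : i + 3])
    return weights


def pick_position(sequence: str, rng: random.Random) -> int:
    """Sample one position to mutate, proportional to its weight."""
    weights = position_weights(sequence)
    return rng.choices(range(len(sequence)), weights=weights, k=1)[0]


seq = "GGTACTTGGAGCAAT"
weights = position_weights(seq)
print(weights)
print(pick_position(seq, random.Random(0)))
```

In this toy sequence the C at index 4 (context `TACTT`) and the C at index 11 (context `AGCAA`) are five times more likely to be picked than any other interior base.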
```python
from GenAIRR.mutation import S5F
s5f = S5F(min_mutation_rate=0.01, max_mutation_rate=0.05)
mut_seq, muts, rate = s5f.apply_mutation(naive_seq)
```
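
For intuition about the `Uniform` row, uniform mutation can be sketched without the library: every position mutates independently with the same probability, and the new base is drawn from the three alternatives. `toy_uniform_mutate` is a hypothetical function. It mirrors the three-value return shape of the `apply_mutation` call above, but the exact types GenAIRR returns are not specified here.

```python
import random

BASES = "ACGT"


def toy_uniform_mutate(sequence: str, rate: float, rng: random.Random):
    """Mutate each base independently with probability `rate`."""
    mutated = list(sequence)
    mutations = {}  # position -> "old>new"
    for pos, base in enumerate(sequence):
        if base in BASES and rng.random() < rate:
            new_base = rng.choice([b for b in BASES if b != base])
            mutated[pos] = new_base
            mutations[pos] = f"{base}>{new_base}"
    realized_rate = len(mutations) / len(sequence) if sequence else 0.0
    return "".join(mutated), mutations, realized_rate


toy_seq, toy_muts, toy_rate = toy_uniform_mutate(
    "ACGTACGTACGT", 0.25, random.Random(42)
)
print(toy_seq, toy_muts, toy_rate)
```

Passing a seeded `random.Random` keeps runs reproducible, which matters when a benchmark has to be regenerated exactly.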
---
## 🗺️ Roadmap
* [ ] 🚧 **More Complex Mutation Model (With Selection)**
* [ ] 🚧 **More Built-in Data Configs** (e.g., TCR, custom germlines)
* [ ] 🚧 **More Built-in Steps** (e.g., more mutation models, more data augmentation)
* [ ] 🚧 **Deeper Docs** (e.g., more examples, more tutorials)
*See the [open issues](https://github.com/MuteJester/GenAIRR/issues).*
Feel something’s missing? [Open a feature request](https://github.com/MuteJester/GenAIRR/issues/new).
---
## 🤝 Contributing
Contributions are welcome! 💙
Please read our [contributing guide](CONTRIBUTING.md) and check the **good first issue** label.
---
## ✏️ Citing GenAIRR
If GenAIRR helps your research, please cite:
```
Konstantinovsky T, Peres A, Polak P, Yaari G.
An unbiased comparison of immunoglobulin sequence aligners.
Briefings in Bioinformatics. 2024 Sep 23; 25(6): bbae556.
https://doi.org/10.1093/bib/bbae556
PMID: 39489605 | PMCID: PMC11531861
```
---
## 📜 License
Distributed under the GPLv3 license. See **[LICENSE](LICENSE)** for details.
---
## 🙏 Acknowledgements
GenAIRR is inspired by and builds upon amazing work from the immunoinformatics community—especially [AIRRship](https://github.com/Cowanlab/airrship).
<!-- End of README -->
Raw data
{
"_id": null,
"home_page": "https://github.com/MuteJester/GenAIRR",
"name": "GenAIRR",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "immunogenetics, sequence simulation, bioinformatics, alignment benchmarking",
"author": "Thomas Konstantinovsky & Ayelet Peres",
"author_email": "thomaskon90@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/21/ad/f942a93e981f83397e57a16504da29acfa5e6b6b496a4eab6d99adaf3c32/genairr-0.5.2.tar.gz",
"platform": null,
"description": "<!-- -------------------------------------------------------- -->\n<!-- Banner / Logo (optional) -->\n<!-- Replace the link below with your own logo or SVG banner -->\n<p align=\"center\">\n <img src=\"https://raw.githubusercontent.com/your-org/GenAIRR/main/docs/_static/banner.svg\" alt=\"GenAIRR\" width=\"60%\" />\n</p>\n\n<h1 align=\"center\">GenAIRR</h1>\n\n<p align=\"center\">\n <b>Adaptive Immune Receptor Repertoire sequence simulator</b><br/>\n Generate realistic BCR & TCR repertoires in a single line of Python.\n</p>\n\n<p align=\"center\">\n <a href=\"https://pypi.org/project/GenAIRR/\"><img src=\"https://img.shields.io/pypi/v/GenAIRR.svg?logo=pypi&logoColor=white\" alt=\"PyPI version\"></a>\n <a href=\"https://genairr.readthedocs.io/en/latest/\"><img src=\"https://img.shields.io/readthedocs/genairr?logo=readthedocs\"></a>\n</a>\n</p>\n\n---\n\n## \ud83d\udcd1 Table of Contents\n1. [Why GenAIRR?](#-why-genairr)\n2. [Key Features](#-key-features)\n3. [Installation](#-installation)\n4. [Quick Start](#-quick-start)\n5. [Examples](#-examples)\n6. [Mutation Models](#-mutation-models)\n7. [Roadmap](#-roadmap)\n8. [Contributing](#-contributing)\n9. [Citing GenAIRR](#-citing-genairr)\n10. [License](#-license)\n11. 
[Acknowledgements](#-acknowledgements)\n\n---\n\n## \ud83e\uddd0 Why GenAIRR?\n<details>\n<summary>Click to expand</summary>\n\n*Benchmarking modern aligners, exploring somatic-hypermutation, or stress-testing novel ML pipelines requires large, perfectly-annotated repertoires\u2014not snippets of real data peppered with sequencing error.* \nGenAIRR fills that gap with a **plug-and-play, fully-extensible simulation engine** that produces sequences while giving you full ground-truth labels.\n\n</details>\n\n---\n\n## \u2728 Key Features\n| Category | Highlights |\n| -------- |------------------------------------------------------------------------------------|\n| **Realistic Simulation** | Context-aware S5F mutations, indels, allele-specific trimming, NP-region modelling |\n| **Composable Pipelines** | Chain together built-in & custom `AugmentationStep`s into simulation pipelines |\n| **Multi-Chain Support** | Heavy & light BCRs plus TCR-\u03b2 out of the box |\n| **Research-ready Output** | JSON / pandas export, built-in plotting stubs, deterministic seeds |\n| **Docs & Tutorials** | Rich API docs, Jupyter notebooks, step-by-step guides |\n\n---\n\n## \u26a1 Installation\n```bash\n# Python \u2265 3.9\npip install GenAIRR\n# or the bleeding edge\npip install git+https://github.com/MuteJester/GenAIRR.git\n````\n\n---\n\n## \ud83d\ude80 Quick Start\n\nBelow is a 60-second tour. 
See [`/examples`](examples/) for notebooks and CLI usages.\n\n```python\nfrom GenAIRR.pipeline import AugmentationPipeline\nfrom GenAIRR.steps import SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity\nfrom GenAIRR.mutation import S5F\nfrom GenAIRR.data import HUMAN_IGH_OGRDB\nfrom GenAIRR.steps.StepBase import AugmentationStep\n\n# 1\ufe0f\u20e3 Configure built-in germline data\nAugmentationStep.set_dataconfig(HUMAN_IGH_OGRDB)\n\n# 2\ufe0f\u20e3 Build a minimal pipeline\npipeline = AugmentationPipeline([\n SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),\n FixVPositionAfterTrimmingIndexAmbiguity()\n])\n\n# 3\ufe0f\u20e3 Simulate!\nsim = pipeline.execute()\nprint(sim.get_dict())\n```\n\n---\n\n## \ud83e\uddd1\u200d\ud83d\udcbb Examples\n\n### 1. Full Heavy-Chain Pipeline\n\n```python\nfrom GenAIRR.steps import (\n FixDPositionAfterTrimmingIndexAmbiguity, FixJPositionAfterTrimmingIndexAmbiguity,\n CorrectForVEndCut, CorrectForDTrims, CorruptSequenceBeginning,\n InsertNs, InsertIndels, ShortDValidation, DistillMutationRate\n)\n\npipeline = AugmentationPipeline([\n SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),\n FixVPositionAfterTrimmingIndexAmbiguity(),\n FixDPositionAfterTrimmingIndexAmbiguity(),\n FixJPositionAfterTrimmingIndexAmbiguity(),\n CorrectForVEndCut(),\n CorrectForDTrims(),\n CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),\n InsertNs(0.02, 0.5),\n ShortDValidation(),\n InsertIndels(0.5, 5, 0.5, 0.5),\n DistillMutationRate()\n])\nresult = pipeline.execute()\n```\n\n### 2. Na\u00efve Sequence (no SHM)\n\n```python\nfrom GenAIRR.mutation import Uniform\nnaive_step = SimulateSequence(Uniform(0, 0), True)\npipeline = AugmentationPipeline([naive_step])\nnaive_seq = pipeline.execute()\nprint(naive_seq.sequence)\n```\n\n### 3. 
Custom Allele Combination\n\n```python\ncustom_step = SimulateSequence(\n S5F(0.003, 0.25),\n True,\n specific_v=HUMAN_IGH_OGRDB.v_alleles['IGHV1-2*02'][0], # specific V allele\n specific_d=HUMAN_IGH_OGRDB.d_alleles['IGHD3-10*01'][0], # specific D allele \n specific_j=HUMAN_IGH_OGRDB.j_alleles['IGHJ4*02'][0] # specific J allele\n)\npipeline = AugmentationPipeline([custom_step])\nprint(pipeline.execute().get_dict())\n```\n---\n\n## \ud83d\udd2c Mutation Models\n\n| Model | Description | When to use |\n| ---------------- | --------------------------------------- | ----------------------------- |\n| `S5F` | Context-specific somatic hyper-mutation | Antibody maturation studies |\n| `Uniform` | Evenly random mutations | Baselines / ablation |\n| **Your Model \u2795** | Implement `BaseMutationModel` | Custom evolutionary scenarios |\n\n```python\nfrom GenAIRR.mutation import S5F\ns5f = S5F(min_mutation_rate=0.01, max_mutation_rate=0.05)\nmut_seq, muts, rate = s5f.apply_mutation(naive_seq)\n```\n\n---\n\n## \ud83d\uddfa\ufe0f Roadmap\n\n* [ ] \ud83d\udea7 **More Complex Mutation Model (With Selection)**\n* [ ] \ud83d\udea7 **More Built-in Data Configs** (e.g., TCR, custom germlines)\n* [ ] \ud83d\udea7 **More Built-in Steps** (e.g., more mutation models, more data augmentation)\n* [ ] \ud83d\udea7 **Deeper Docs** (e.g., more examples, more tutorials)\n\n*See the [open issues](https://github.com/your-org/GenAIRR/issues).*\n Feel something\u2019s missing? [Open a feature request](https://github.com/your-org/GenAIRR/issues/new).\n\n---\n\n## \ud83e\udd1d Contributing\n\nContributions are welcome! \ud83d\udc99\nPlease read our [contributing guide](CONTRIBUTING.md) and check the **good first issue** label.\n\n---\n\n## \u270f\ufe0f Citing GenAIRR\n\nIf GenAIRR helps your research, please cite:\n\n```\nKonstantinovsky T, Peres A, Polak P, Yaari G. \nAn unbiased comparison of immunoglobulin sequence aligners.\nBriefings in Bioinformatics. 2024 Sep 23; 25(6): bbae556. 
\nhttps://doi.org/10.1093/bib/bbae556 \nPMID: 39489605\u2003|\u2003PMCID: PMC11531861\n```\n\n---\n\n## \ud83d\udcdc License\n\nDistributed under the GPL3 License. See **[LICENSE](LICENSE)** for details.\n\n---\n\n## \ud83d\ude4f Acknowledgements\n\nGenAIRR is inspired by and builds upon amazing work from the immunoinformatics community\u2014especially [AIRRship](https://github.com/Cowanlab/airrship).\n\n<!-- End of README -->\n\n\n",
"bugtrack_url": null,
"license": null,
"summary": "An advanced immunoglobulin sequence simulation suite for benchmarking alignment models and sequence analysis.",
"version": "0.5.2",
"project_urls": {
"Bug Tracker": "https://github.com/MuteJester/GenAIRR/issues",
"Download": "https://github.com/MuteJester/GenAIRR/archive/refs/tags/0.5.1.tar.gz",
"Homepage": "https://github.com/MuteJester/GenAIRR"
},
"split_keywords": [
"immunogenetics",
" sequence simulation",
" bioinformatics",
" alignment benchmarking"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f05406e1b83b2b023bc3989ecbe593b032954c2390524254356e36218fbeb8a8",
"md5": "58d89247d53337768dc0055a601c1148",
"sha256": "9b4076173db5d0dc8d3b38dc30deb35de17da84812b71bcee2f344fec312b507"
},
"downloads": -1,
"filename": "genairr-0.5.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "58d89247d53337768dc0055a601c1148",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 2353445,
"upload_time": "2025-08-06T09:29:56",
"upload_time_iso_8601": "2025-08-06T09:29:56.690795Z",
"url": "https://files.pythonhosted.org/packages/f0/54/06e1b83b2b023bc3989ecbe593b032954c2390524254356e36218fbeb8a8/genairr-0.5.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "21adf942a93e981f83397e57a16504da29acfa5e6b6b496a4eab6d99adaf3c32",
"md5": "00387e5030fc614b99cbcd3063825348",
"sha256": "60f63dbdea7e26e73b75cbdfeba4cf8668471881c7568ac40358635e08c54472"
},
"downloads": -1,
"filename": "genairr-0.5.2.tar.gz",
"has_sig": false,
"md5_digest": "00387e5030fc614b99cbcd3063825348",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 2470982,
"upload_time": "2025-08-06T09:29:58",
"upload_time_iso_8601": "2025-08-06T09:29:58.287615Z",
"url": "https://files.pythonhosted.org/packages/21/ad/f942a93e981f83397e57a16504da29acfa5e6b6b496a4eab6d99adaf3c32/genairr-0.5.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-06 09:29:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MuteJester",
"github_project": "GenAIRR",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": [
[
"~=",
"1.5.3"
]
]
},
{
"name": "numpy",
"specs": [
[
"~=",
"1.24.3"
]
]
},
{
"name": "scipy",
"specs": [
[
"~=",
"1.11.1"
]
]
},
{
"name": "setuptools",
"specs": [
[
"~=",
"68.0.0"
]
]
},
{
"name": "graphviz",
"specs": [
[
"~=",
"0.20.3"
]
]
},
{
"name": "tqdm",
"specs": [
[
"~=",
"4.67.1"
]
]
}
],
"lcname": "genairr"
}
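
The `~=` entries under `requirements` are compatible-release pins: `~=1.5.3` accepts 1.5.3 and later patch releases but not 1.6.0. The sketch below spells that rule out for plain dotted numeric versions only. `is_compatible_release` is a hypothetical helper; real tooling should rely on the `packaging` library, which also handles pre-releases and other suffixes.

```python
def is_compatible_release(pin: str, candidate: str) -> bool:
    """Check `candidate` against `~=pin` for dotted numeric versions."""
    pin_parts = [int(p) for p in pin.split(".")]
    cand_parts = [int(p) for p in candidate.split(".")]
    if len(pin_parts) < 2:
        raise ValueError("~= needs at least two version components")
    # Pad the candidate so '1.5' compares like '1.5.0'.
    cand_parts += [0] * (len(pin_parts) - len(cand_parts))
    prefix_len = len(pin_parts) - 1
    same_prefix = cand_parts[:prefix_len] == pin_parts[:prefix_len]
    return same_prefix and cand_parts >= pin_parts


print(is_compatible_release("1.5.3", "1.5.9"))  # True
print(is_compatible_release("1.5.3", "1.6.0"))  # False
```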