GenAIRR


| Field | Value |
| ----- | ----- |
| Name | GenAIRR |
| Version | 0.5.2 |
| Home page | https://github.com/MuteJester/GenAIRR |
| Summary | An advanced immunoglobulin sequence simulation suite for benchmarking alignment models and sequence analysis. |
| Upload time | 2025-08-06 09:29:58 |
| Maintainer | None |
| Docs URL | None |
| Author | Thomas Konstantinovsky & Ayelet Peres |
| Requires Python | >=3.9 |
| License | None |
| Keywords | immunogenetics, sequence simulation, bioinformatics, alignment benchmarking |
| Requirements | pandas ~=1.5.3, numpy ~=1.24.3, scipy ~=1.11.1, setuptools ~=68.0.0, graphviz ~=0.20.3, tqdm ~=4.67.1 |

<p align="center">
  <img src="https://raw.githubusercontent.com/your-org/GenAIRR/main/docs/_static/banner.svg" alt="GenAIRR" width="60%" />
</p>

<h1 align="center">GenAIRR</h1>

<p align="center">
  <b>Adaptive Immune Receptor Repertoire sequence simulator</b><br/>
  Generate realistic BCR & TCR repertoires in a single line of Python.
</p>

<p align="center">
  <a href="https://pypi.org/project/GenAIRR/"><img src="https://img.shields.io/pypi/v/GenAIRR.svg?logo=pypi&logoColor=white" alt="PyPI version"></a>
  <a href="https://genairr.readthedocs.io/en/latest/"><img src="https://img.shields.io/readthedocs/genairr?logo=readthedocs" alt="Documentation status"></a>
</p>

---

## 📑 Table of Contents
1. [Why GenAIRR?](#-why-genairr)
2. [Key Features](#-key-features)
3. [Installation](#-installation)
4. [Quick Start](#-quick-start)
5. [Examples](#-examples)
6. [Mutation Models](#-mutation-models)
7. [Roadmap](#-roadmap)
8. [Contributing](#-contributing)
9. [Citing GenAIRR](#-citing-genairr)
10. [License](#-license)
11. [Acknowledgements](#-acknowledgements)

---

## 🧐 Why GenAIRR?
<details>
<summary>Click to expand</summary>

*Benchmarking modern aligners, exploring somatic hypermutation, or stress-testing novel ML pipelines requires large, perfectly annotated repertoires, not snippets of real data peppered with sequencing errors.*  
GenAIRR fills that gap with a **plug-and-play, fully extensible simulation engine** that produces sequences together with complete ground-truth labels.

</details>

---

## ✨ Key Features
| Category | Highlights                                                                         |
| -------- |------------------------------------------------------------------------------------|
| **Realistic Simulation** | Context-aware S5F mutations, indels, allele-specific trimming, NP-region modelling |
| **Composable Pipelines** | Chain together built-in & custom `AugmentationStep`s into simulation pipelines     |
| **Multi-Chain Support** | Heavy & light BCRs plus TCR-β out of the box                                       |
| **Research-ready Output** | JSON / pandas export, built-in plotting stubs, deterministic seeds                 |
| **Docs & Tutorials** | Rich API docs, Jupyter notebooks, step-by-step guides                              |

---

## ⚡ Installation
```bash
# Python ≥ 3.9
pip install GenAIRR
# or the bleeding edge
pip install git+https://github.com/MuteJester/GenAIRR.git
```
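
To confirm which release ended up in your environment, you can query the installed distribution metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of GenAIRR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("GenAIRR")
    print(f"GenAIRR version: {found}" if found else "GenAIRR is not installed")
```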

---

## 🚀 Quick Start

Below is a 60-second tour. See [`/examples`](examples/) for notebooks and CLI usage.

```python
from GenAIRR.pipeline import AugmentationPipeline
from GenAIRR.steps import SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity
from GenAIRR.mutation import S5F
from GenAIRR.data import HUMAN_IGH_OGRDB
from GenAIRR.steps.StepBase import AugmentationStep

# 1️⃣  Configure built-in germline data
AugmentationStep.set_dataconfig(HUMAN_IGH_OGRDB)

# 2️⃣  Build a minimal pipeline
pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity()
])

# 3️⃣  Simulate!
sim = pipeline.execute()
print(sim.get_dict())
```
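
The snippet above prints a single record, while benchmarking usually needs many. Below is a minimal sketch for collecting a batch, assuming only what the Quick Start shows: a pipeline object with `execute()` whose result exposes `get_dict()`. The helpers `collect_records` and `to_json_lines` are hypothetical names, not part of GenAIRR.

```python
import json
from typing import Any, Dict, List


def collect_records(pipeline: Any, n: int) -> List[Dict[str, Any]]:
    """Call pipeline.execute() n times and gather each result's get_dict()."""
    return [pipeline.execute().get_dict() for _ in range(n)]


def to_json_lines(records: List[Dict[str, Any]]) -> str:
    """Serialize records as one JSON object per line.

    default=str is a fallback for values json cannot encode natively.
    """
    return "\n".join(json.dumps(record, default=str) for record in records)
```

With the Quick Start pipeline, `records = collect_records(pipeline, 1000)` should yield a list of dicts that can also be handed to `pandas.DataFrame(records)`. Whether every field serializes cleanly depends on the value types GenAIRR returns, which is why the sketch falls back to `str`.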

---

## 🧑‍💻 Examples

### 1. Full Heavy-Chain Pipeline

```python
# Continues from the Quick Start: reuses its imports and data config
from GenAIRR.steps import (
    FixDPositionAfterTrimmingIndexAmbiguity, FixJPositionAfterTrimmingIndexAmbiguity,
    CorrectForVEndCut, CorrectForDTrims, CorruptSequenceBeginning,
    InsertNs, InsertIndels, ShortDValidation, DistillMutationRate
)

pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    CorrectForVEndCut(),
    CorrectForDTrims(),
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),
    InsertNs(0.02, 0.5),
    ShortDValidation(),
    InsertIndels(0.5, 5, 0.5, 0.5),
    DistillMutationRate()
])
result = pipeline.execute()
```
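
The pipeline above applies its steps in list order to one simulated record. As a library-independent illustration of that pattern, here is a toy sketch in plain Python. Everything in it (`Record`, `mask_with_n`, `uppercase`, `run_steps`) is hypothetical and does not mirror GenAIRR's `AugmentationStep` interface; check the API docs for the real base class before writing a custom step.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record, random.Random], None]


def mask_with_n(prob: float) -> Step:
    """Build a step that replaces each base with 'N' with probability prob."""
    def step(record: Record, rng: random.Random) -> None:
        seq = str(record["sequence"])
        masked = [i for i in range(len(seq)) if rng.random() < prob]
        chars = list(seq)
        for i in masked:
            chars[i] = "N"
        record["sequence"] = "".join(chars)
        # Keep the ground-truth label next to the corrupted sequence.
        record["n_positions"] = masked
    return step


def uppercase(record: Record, rng: random.Random) -> None:
    """Normalize the sequence to upper case (rng is unused here)."""
    record["sequence"] = str(record["sequence"]).upper()


def run_steps(steps: List[Step], record: Record, seed: int) -> Record:
    """Apply each step in order, sharing one seeded RNG for reproducibility."""
    rng = random.Random(seed)
    for step in steps:
        step(record, rng)
    return record
```

Each step mutates the shared record and stores what it changed, so the final record carries both the corrupted sequence and the labels needed to score an aligner against it.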

### 2. Naïve Sequence (no SHM)

```python
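# Reuses AugmentationPipeline, SimulateSequence, and the data config from the Quick Start
from GenAIRR.mutation import Uniform

# A 0 to 0 mutation-rate range means no SHM is applied
naive_step = SimulateSequence(Uniform(0, 0), True)
pipeline = AugmentationPipeline([naive_step])
naive_seq = pipeline.execute()
print(naive_seq.sequence)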
```

### 3. Custom Allele Combination

```python
# Reuses the imports and data config from the Quick Start
custom_step = SimulateSequence(
    S5F(0.003, 0.25),
    True,
    specific_v=HUMAN_IGH_OGRDB.v_alleles['IGHV1-2*02'][0],   # specific V allele
    specific_d=HUMAN_IGH_OGRDB.d_alleles['IGHD3-10*01'][0],  # specific D allele
    specific_j=HUMAN_IGH_OGRDB.j_alleles['IGHJ4*02'][0]      # specific J allele
)
pipeline = AugmentationPipeline([custom_step])
print(pipeline.execute().get_dict())
```

---

## 🔬 Mutation Models

| Model            | Description                             | When to use                   |
| ---------------- | --------------------------------------- | ----------------------------- |
| `S5F`            | Context-specific somatic hyper-mutation | Antibody maturation studies   |
| `Uniform`        | Evenly random mutations                 | Baselines / ablation          |
| **Your Model ➕** | Implement `BaseMutationModel`           | Custom evolutionary scenarios |

```python
from GenAIRR.mutation import S5F
s5f = S5F(min_mutation_rate=0.01, max_mutation_rate=0.05)
mut_seq, muts, rate = s5f.apply_mutation(naive_seq)
```
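
To make the "evenly random mutations" row concrete, here is a standalone sketch of what a uniform substitution model does conceptually. It is not GenAIRR's `Uniform` implementation: the function name `uniform_mutate`, the `{position: "old>new"}` mutation map, and the rounding rule are all assumptions made for illustration. The three-part return value only mirrors the tuple unpacked in the snippet above.

```python
import random
from typing import Dict, Tuple


def uniform_mutate(seq: str, min_rate: float, max_rate: float,
                   rng: random.Random) -> Tuple[str, Dict[int, str], float]:
    """Substitute bases at positions chosen uniformly at random.

    Returns the mutated sequence, a {position: "old>new"} map,
    and the mutation rate that was sampled from [min_rate, max_rate].
    """
    rate = rng.uniform(min_rate, max_rate)
    n_mut = round(rate * len(seq))
    positions = rng.sample(range(len(seq)), n_mut)  # distinct positions
    bases = list(seq)
    mutations: Dict[int, str] = {}
    for pos in sorted(positions):
        old = bases[pos]
        # Always pick a different base so every chosen position really changes.
        new = rng.choice([b for b in "ACGT" if b != old])
        bases[pos] = new
        mutations[pos] = f"{old}>{new}"
    return "".join(bases), mutations, rate
```

Unlike this sketch, a context-specific model such as S5F weights positions by their surrounding nucleotides instead of treating every position as equally likely.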

---

## 🗺️ Roadmap

* [ ] 🚧 **More Complex Mutation Model (With Selection)**
* [ ] 🚧 **More Built-in Data Configs** (e.g., TCR, custom germlines)
* [ ] 🚧 **More Built-in Steps** (e.g., more mutation models, more data augmentation)
* [ ] 🚧 **Deeper Docs** (e.g., more examples, more tutorials)

*See the [open issues](https://github.com/MuteJester/GenAIRR/issues).*
  Feel something’s missing? [Open a feature request](https://github.com/MuteJester/GenAIRR/issues/new).

---

## 🤝 Contributing

Contributions are welcome! 💙
Please read our [contributing guide](CONTRIBUTING.md) and check the **good first issue** label.

---

## ✏️ Citing GenAIRR

If GenAIRR helps your research, please cite:

```
Konstantinovsky T, Peres A, Polak P, Yaari G.  
An unbiased comparison of immunoglobulin sequence aligners.
Briefings in Bioinformatics. 2024 Sep 23; 25(6): bbae556.  
https://doi.org/10.1093/bib/bbae556  
PMID: 39489605 | PMCID: PMC11531861
```

---

## 📜 License

Distributed under the GPLv3 License. See **[LICENSE](LICENSE)** for details.

---

## 🙏 Acknowledgements

GenAIRR is inspired by and builds upon amazing work from the immunoinformatics community—especially [AIRRship](https://github.com/Cowanlab/airrship).

<!-- End of README -->



            
