# Pelops
Dedicated caller for *DUX4* rearrangements from whole genome sequencing data.
## Citing
Pelops is based on a method first described in:
Ryan, S.L., Peden, J.F., Kingsbury, Z. et al.
Whole genome sequencing provides comprehensive genetic testing in childhood B-cell acute lymphoblastic leukaemia.
Leukemia 37, 518–528 (2023). https://doi.org/10.1038/s41375-022-01806-8
Pelops itself is described and validated in:
Grobecker, P., Mijuskovic, M., et al.
Pelops: A dedicated caller for *DUX4* rearrangements from short-read whole genome sequencing data.
In preparation (2024)
## Prerequisites
- Python 3.7 or above.
## Installation
You can install the latest stable release of Pelops using pip:
```shell
pip install ilmn-pelops --upgrade
```
Note: pip/PyPI will prefer stable versions (e.g. 0.7.0) over subsequent beta releases (e.g.
0.8.0b1). If you need to install a beta version for testing, please first
uninstall your current version of Pelops:
```shell
pip uninstall ilmn-pelops
```
## Usage
Pelops is a tool with a command line interface (CLI). Discover its usage with:
```shell
pelops --help
```
### Calling _DUX4_-rearrangements
To call _DUX4_-rearrangements from a BAM/CRAM file, use the `dux4r` subcommand. To
see all available options, run:
```shell
pelops dux4r --help
```
## Inputs
The input to Pelops is a short-read whole-genome sequencing BAM or CRAM file from a tumour sample,
aligned to the GRCh38 reference genome. The BAM/CRAM file needs to be indexed.
Pelops was tested on alignments by DRAGEN (version 4.0.3), bwa (version 0.7.17), and Isaac (version SAAC01325.18.01.29).
### Systematic noise BED file
To increase specificity when calling non-_IGH_ _DUX4_-rearrangements, we recommend using a systematic noise BED file.
This file contains genomic regions that will be ignored by Pelops when identifying candidate regions involved in a
_DUX4_-rearrangement. Since such regions can be specific to the read alignment tool, reference genome,
sequencing protocol, and cancer type analysed, we recommend creating a separate systematic noise BED file
for each project. One way to obtain these genomic regions would be to run Pelops on a panel of normal samples, which are
guaranteed to have no _DUX4_-rearrangements, and generate a list of false-positive calls.
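
The panel-of-normals approach could be sketched as follows. This is only an illustration, not part of Pelops: the
function name `noise_bed_lines` is hypothetical, the coordinates in the example are invented, and the sketch assumes
that the `UNNAMED` region sets in the JSON output (described under "Outputs") can be written to BED unchanged.
BED uses 0-based, half-open intervals, so check the coordinate convention before using such a file.

```python
from typing import Iterable, List, Set, Tuple


def noise_bed_lines(results: Iterable[dict]) -> List[str]:
    """Collect the regions of all UNNAMED partners found in normal samples.

    `results` holds parsed Pelops JSON outputs, one per normal sample.
    Every non-IGH call in a normal sample is treated as a false positive.
    """
    seen: Set[Tuple[str, int, int]] = set()
    for result in results:
        for rearrangement in result.get("rearrangements", []):
            partner = rearrangement["B"]
            if partner["name"] != "UNNAMED":
                continue  # IGH is pre-defined and is not systematic noise
            for region in partner["regions"]:
                seen.add((region["chrom"], region["start"], region["end"]))
    # One tab-separated BED line per unique region, in sorted order
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in sorted(seen)]


# Two invented normal-sample results sharing one false-positive region
normals = [
    {"rearrangements": [
        {"id": "03", "A": {"name": "CoreDUX4"},
         "B": {"name": "UNNAMED", "regions": [{"chrom": "chr2", "start": 50, "end": 80}]}},
    ]},
    {"rearrangements": [
        {"id": "01", "A": {"name": "CoreDUX4"},
         "B": {"name": "IGH", "regions": [{"chrom": "chr14", "start": 1, "end": 2}]}},
        {"id": "03", "A": {"name": "CoreDUX4"},
         "B": {"name": "UNNAMED", "regions": [{"chrom": "chr2", "start": 50, "end": 80},
                                              {"chrom": "chr1", "start": 100, "end": 200}]}},
    ]},
]

bed_lines = noise_bed_lines(normals)
print("\n".join(bed_lines))
```

In practice, overlapping regions from different samples would probably need to be merged as well; that step is left out here.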
## Outputs
Pelops outputs results in a JSON file, and optionally exports supporting reads in SAM files.
### Description of JSON output
The **top level** of the JSON contains information about Pelops (assumed genome reference, version, name, and CLI command).
It also records the number of unique mapped reads in the input file, which can be supplied by the user.
Finally, it contains a list of rearrangements investigated by Pelops.
```json
{
"reference": "GRCh38",
"unique_mapped_reads": 1000000000,
"rearrangements": [...],
"program_name": "pelops",
"version": "0.5.0",
"cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
```
The **rearrangements** consist of a unique ID, genomic region sets "A" and "B", and the evidence for the rearrangement
between these two regions.
For the command `pelops dux4r`, ID `01` always corresponds to rearrangements between the core _DUX4_ regions and _IGH_,
while ID `02` corresponds to rearrangements of the extended _DUX4_ regions with _IGH_.
IDs `03` and beyond are potential rearrangements of the core _DUX4_ region with other genomic regions (marked as `UNNAMED`);
there can be a variable number of them.
```json
{
"rearrangements": [
{
"id": "01",
"A": {"name": "CoreDUX4"...},
"B": {"name": "IGH"...},
"evidence": {...}
},
{
"id": "02",
"A": {"name": "ExtendedDUX4"...},
"B": {"name": "IGH"...},
"evidence": {...}
},
{
"id": "03",
"A": {"name": "CoreDUX4"...},
"B": {"name": "UNNAMED"...},
"evidence": {...}
}
]
}
```
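
As a minimal sketch of how this structure might be consumed downstream, the snippet below parses a trimmed example and
lists one row per rearrangement. The helper name `summarise` is hypothetical, the inline JSON is illustrative rather
than real output, and the `evidence` fields it reads are the ones described later in this README.

```python
import json

# Trimmed, illustrative example following the documented structure
example = """
{
  "rearrangements": [
    {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"},
     "evidence": {"paired_reads": 15, "split_reads": 4, "SRPB": 19.0}},
    {"id": "03", "A": {"name": "CoreDUX4"}, "B": {"name": "UNNAMED"},
     "evidence": {"paired_reads": 0, "split_reads": 0, "SRPB": 0.0}}
  ]
}
"""


def summarise(result):
    """Return one (id, name_A, name_B, SRPB) tuple per rearrangement."""
    return [
        (r["id"], r["A"]["name"], r["B"]["name"], r["evidence"]["SRPB"])
        for r in result["rearrangements"]
    ]


rows = summarise(json.loads(example))
for row in rows:
    print(*row, sep="\t")
```

For a real run, `json.loads(example)` would be replaced by loading the JSON file written by Pelops.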
**A** and **B** document the exact set of genomic regions used for each rearrangement.
For example, the core _DUX4_ region is shown below.
While `IGH`, `CoreDUX4` and `ExtendedDUX4` are pre-defined, each `UNNAMED` region will be different.
```json
{
"name": "CoreDUX4",
"regions": [
{
"chrom": "chr4",
"start": 190020407,
"end": 190023665
},
{
"chrom": "chr4",
"start": 190066935,
"end": 190093279
},
{
"chrom": "chr4",
"start": 190172774,
"end": 190176845
},
{
"chrom": "chr10",
"start": 133663429,
"end": 133685936
},
{
"chrom": "chr10",
"start": 133739606,
"end": 133762125
}
]
}
```
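
A region set like the one above maps naturally onto a small typed record. The sketch below is one possible way to
load it in Python; the `Region` class is a hypothetical helper, not something Pelops provides.

```python
from collections import Counter
from typing import NamedTuple


class Region(NamedTuple):
    """One genomic interval, with the same keys as the JSON entries."""
    chrom: str
    start: int
    end: int


# The CoreDUX4 region set exactly as documented above
core_dux4 = {
    "name": "CoreDUX4",
    "regions": [
        {"chrom": "chr4", "start": 190020407, "end": 190023665},
        {"chrom": "chr4", "start": 190066935, "end": 190093279},
        {"chrom": "chr4", "start": 190172774, "end": 190176845},
        {"chrom": "chr10", "start": 133663429, "end": 133685936},
        {"chrom": "chr10", "start": 133739606, "end": 133762125},
    ],
}

regions = [Region(**entry) for entry in core_dux4["regions"]]

# Count how many intervals of the set lie on each chromosome
per_chrom = Counter(region.chrom for region in regions)
print(dict(per_chrom))
```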
The **evidence** for each rearrangement consists of the number of split and paired reads between region sets A and B,
and the spanning read pairs per billion (SRPB), which is calculated as
$$\text{SRPB} = 10^9 \frac{\text{paired reads} + \text{split reads}}{\text{total unique and mapped reads}}.$$
```json
{
"paired_reads": 15,
"split_reads": 4,
"SRPB": 19.0
}
```
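
The formula can be checked against the example values with a few lines of Python. This is a sketch of the arithmetic
only; the function name `srpb` is hypothetical and is not taken from the Pelops code base.

```python
def srpb(paired_reads: int, split_reads: int, unique_mapped_reads: int) -> float:
    """Spanning read pairs per billion unique mapped reads."""
    if unique_mapped_reads <= 0:
        raise ValueError("unique_mapped_reads must be positive")
    return 1_000_000_000 * (paired_reads + split_reads) / unique_mapped_reads


# 15 paired and 4 split reads among 1,000,000,000 unique mapped reads
value = srpb(15, 4, 1_000_000_000)
print(value)  # 19.0
```

With one billion unique mapped reads, as in the top-level example above, SRPB equals the raw number of supporting
reads; with half as many reads, the same evidence would give an SRPB of 38.0.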
### SAM files
Optionally, Pelops can export one SAM file per rearrangement, containing all paired and split reads together with their mates.
The naming convention is `<id>_<name_A>-<name_B>.sam`, where `<id>`, `<name_A>`, `<name_B>` correspond to the ID and
names of genomic region sets A and B, respectively, as documented in the JSON.
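
To locate the SAM file that belongs to a given JSON entry, the naming convention can be reproduced as below. The
helper `sam_file_name` is hypothetical and only applies the pattern stated above.

```python
def sam_file_name(rearrangement: dict) -> str:
    """Build `<id>_<name_A>-<name_B>.sam` from one rearrangement entry."""
    return "{}_{}-{}.sam".format(
        rearrangement["id"],
        rearrangement["A"]["name"],
        rearrangement["B"]["name"],
    )


entry = {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}}
name = sam_file_name(entry)
print(name)  # 01_CoreDUX4-IGH.sam
```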
## Contributing
We are not accepting pull requests into this repository at this time, as the
licence currently does not allow modifications by third parties.
For bug reports, recommendations, or feature requests, please open an issue.
## Credits
See [Authors](AUTHORS.md).