ilmn-pelops


Name: ilmn-pelops
Version: 0.8.0
Summary: Dedicated caller for DUX4 rearrangements from whole genome sequencing data.
Upload time: 2024-03-18 11:30:30
Requires Python: >=3.7
License: PolyForm Strict License 1.0.0
Keywords: dux4, acute lymphoblastic leukaemia, whole-genome sequencing, igh::dux4 rearrangements
Requirements: No requirements were recorded.
# Pelops

Dedicated caller for *DUX4* rearrangements from whole genome sequencing data.

## Citing
Pelops is based on a method first described in:

Ryan, S.L., Peden, J.F., Kingsbury, Z. et al.
Whole genome sequencing provides comprehensive genetic testing in childhood B-cell acute lymphoblastic leukaemia.
Leukemia 37, 518–528 (2023). https://doi.org/10.1038/s41375-022-01806-8

Pelops itself is described and validated in:

Grobecker, P., Mijuskovic, M., et al.
Pelops: A dedicated caller for *DUX4* rearrangements from short-read whole genome sequencing data.
In preparation (2024)


## Prerequisites

- Python 3.7 or later.

## Installation

You can install the latest stable release of Pelops using pip:

```shell
pip install ilmn-pelops --upgrade
```

Note: pip prefers stable versions (e.g. 0.7.0) over subsequent beta releases (e.g.
0.8.0b1). If you need to install a beta version for testing, please first
uninstall your current version of Pelops:

```shell
pip uninstall ilmn-pelops
```

## Usage

Pelops is a tool with a command line interface (CLI). Discover its usage with:
```shell
pelops --help
```

### Calling _DUX4_-rearrangements
To call _DUX4_-rearrangements from a BAM/CRAM file, use the `dux4r` subcommand. To
see all available options, run:
```shell
pelops dux4r --help
```
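When Pelops is driven from a Python pipeline, the command can be assembled as an argument list. The sketch below is only an illustration: the helper name `build_dux4r_command` is hypothetical, and the two flags it uses (`--total-number-reads` and `--export`) are taken from the example `cli_command` shown in the JSON output description of this README. Check `pelops dux4r --help` for the authoritative option list.

```python
import shlex
from typing import List, Optional


def build_dux4r_command(
    alignment_path: str,
    total_reads: Optional[int] = None,
    export_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `pelops dux4r` (hypothetical helper).

    Only flags seen in the README's example `cli_command` are used.
    """
    command = ["pelops", "dux4r"]
    if total_reads is not None:
        command += ["--total-number-reads", str(total_reads)]
    if export_dir is not None:
        command += ["--export", export_dir]
    # The indexed BAM/CRAM file goes last, as in the README example.
    command.append(alignment_path)
    return command


command = build_dux4r_command("test.bam", total_reads=1_000_000_000, export_dir=".")
command_line = " ".join(shlex.quote(part) for part in command)
print(command_line)
# To execute it: subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with unusual file names.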

## Inputs

The input to Pelops is a short-read whole-genome sequencing BAM or CRAM file from a tumour sample,
aligned to the GRCh38 reference genome. The BAM/CRAM file needs to be indexed.
Pelops was tested on alignments by DRAGEN (version 4.0.3), bwa (version 0.7.17), and Isaac (version SAAC01325.18.01.29).

### Systematic noise BED file

To increase specificity when calling non-_IGH_ _DUX4_-rearrangements, we recommend using a systematic noise BED file.
This file contains genomic regions that will be ignored by Pelops when identifying candidate regions involved in a
_DUX4_-rearrangement. Since such regions can be specific to the read alignment tool, reference genome,
sequencing protocol, and cancer type analysed, we recommend creating a separate systematic noise BED file
for each project. One way to obtain these genomic regions would be to run Pelops on a panel of normal samples, which are
guaranteed to have no _DUX4_-rearrangements, and generate a list of false-positive calls.
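The panel-of-normals approach could be sketched as follows. This is a minimal illustration, not part of Pelops: the helper names are hypothetical, it assumes that each `UNNAMED` region set carries a `regions` list shaped like the `CoreDUX4` example in the Outputs section, and it copies `start` and `end` into the BED file unchanged, because the coordinate convention is not stated here. It also treats every `UNNAMED` region reported for a normal sample as noise; a real project would probably apply an evidence threshold first.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def collect_unnamed_regions(results: Iterable[Dict]) -> List[Region]:
    """Gather the regions of all UNNAMED partners from parsed Pelops JSON outputs."""
    regions = []
    for result in results:
        for rearrangement in result.get("rearrangements", []):
            partner = rearrangement["B"]
            if partner["name"] != "UNNAMED":
                continue  # IGH rearrangements are not candidate noise regions
            for region in partner["regions"]:
                regions.append((region["chrom"], region["start"], region["end"]))
    return regions


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Merge overlapping regions on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(regions: Iterable[Region]) -> str:
    """Render regions as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)


# Made-up outputs from two normal samples, for illustration only.
normal_results = [
    {"rearrangements": [
        {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH", "regions": []}},
        {"id": "03", "A": {"name": "CoreDUX4"},
         "B": {"name": "UNNAMED", "regions": [{"chrom": "chr2", "start": 100, "end": 200}]}},
    ]},
    {"rearrangements": [
        {"id": "03", "A": {"name": "CoreDUX4"},
         "B": {"name": "UNNAMED", "regions": [{"chrom": "chr2", "start": 150, "end": 300},
                                              {"chrom": "chr7", "start": 10, "end": 20}]}},
    ]},
]

noise_regions = merge_regions(collect_unnamed_regions(normal_results))
bed_text = to_bed(noise_regions)
print(bed_text, end="")
```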

## Outputs

Pelops outputs results in a JSON file, and optionally exports supporting reads in SAM files.

### Description of JSON output
The **top level** of the JSON contains information about Pelops (assumed genome reference, version, name, and CLI command).
It also contains information about the input file (the number of unique and mapped reads, which can also be provided by the user).
Finally, it contains a list of the rearrangements investigated by Pelops.

```json
{
    "reference": "GRCh38",
    "unique_mapped_reads": 1000000000,
    "rearrangements": [...],
    "program_name": "pelops",
    "version": "0.5.0",
    "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
```
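As a rough illustration, the top-level fields can be read with Python's standard `json` module. The helper `summarise_run` is hypothetical, and the example string replaces the elided `[...]` with an empty list so that it parses.

```python
import json

# Top-level example from above, with the elided rearrangements replaced by an empty list.
example_output = """
{
    "reference": "GRCh38",
    "unique_mapped_reads": 1000000000,
    "rearrangements": [],
    "program_name": "pelops",
    "version": "0.5.0",
    "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
"""


def summarise_run(result: dict) -> str:
    """Build a one-line summary from the top-level fields of a Pelops result."""
    return (
        f"{result['program_name']} {result['version']} on {result['reference']}: "
        f"{len(result['rearrangements'])} rearrangement(s), "
        f"{result['unique_mapped_reads']:,} unique mapped reads"
    )


result = json.loads(example_output)
summary = summarise_run(result)
print(summary)
```

For a real run, `json.load` on the opened output file would replace `json.loads` on the string.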

The **rearrangements** consist of a unique ID, genomic region sets "A" and "B", and the evidence for the rearrangement
between these two regions.
For the command `pelops dux4r`, ID `01` always corresponds to rearrangements between the core _DUX4_ regions and _IGH_,
while ID `02` corresponds to rearrangements of the extended _DUX4_ regions with _IGH_.
IDs `03` and beyond are potential rearrangements of the core _DUX4_ region with other genomic regions (marked as `UNNAMED`);
there can be a variable number of them.

```json
{
  "rearrangements": [
    {
      "id": "01",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "02",
      "A": {"name": "ExtendedDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "03",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "UNNAMED"...},
      "evidence": {...}
    }
  ]
}
```
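Downstream code will often want to separate the _IGH_ rearrangements (IDs `01` and `02`) from the variable number of `UNNAMED` candidates. One possible sketch, using a hypothetical helper `split_by_partner` and made-up evidence values, selects on the name of region set B rather than on the ID:

```python
from typing import Dict, List, Tuple


def split_by_partner(result: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Split rearrangements into IGH calls and calls with any other partner."""
    igh_calls, other_calls = [], []
    for rearrangement in result["rearrangements"]:
        if rearrangement["B"]["name"] == "IGH":
            igh_calls.append(rearrangement)
        else:
            other_calls.append(rearrangement)
    return igh_calls, other_calls


# Illustrative values only, not real results.
example = {
    "rearrangements": [
        {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"},
         "evidence": {"paired_reads": 15, "split_reads": 4, "SRPB": 19.0}},
        {"id": "02", "A": {"name": "ExtendedDUX4"}, "B": {"name": "IGH"},
         "evidence": {"paired_reads": 0, "split_reads": 0, "SRPB": 0.0}},
        {"id": "03", "A": {"name": "CoreDUX4"}, "B": {"name": "UNNAMED"},
         "evidence": {"paired_reads": 2, "split_reads": 1, "SRPB": 3.0}},
    ]
}

igh_calls, other_calls = split_by_partner(example)
igh_ids = [call["id"] for call in igh_calls]
other_ids = [call["id"] for call in other_calls]
print(igh_ids, other_ids)
```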

**A** and **B** document the exact set of genomic regions used for each rearrangement.
For example, the core _DUX4_ region set is shown below.
While `IGH`, `CoreDUX4` and `ExtendedDUX4` are pre-defined, each `UNNAMED` region will be different.

```json
{
  "name": "CoreDUX4",
  "regions": [
    {
      "chrom": "chr4",
      "start": 190020407,
      "end": 190023665
    },
    {
      "chrom": "chr4",
      "start": 190066935,
      "end": 190093279
    },
    {
      "chrom": "chr4",
      "start": 190172774,
      "end": 190176845
    },
    {
      "chrom": "chr10",
      "start": 133663429,
      "end": 133685936
    },
    {
      "chrom": "chr10",
      "start": 133739606,
      "end": 133762125
    }
  ]
}
```
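A region set like this can be summarised per chromosome. The sketch below uses a hypothetical helper, `span_per_chromosome`, and takes the span of each region as `end - start`; whether the coordinates are 0-based or inclusive is not stated in this README, so the totals may be off by one per region.

```python
from collections import defaultdict
from typing import Dict

core_dux4 = {
    "name": "CoreDUX4",
    "regions": [
        {"chrom": "chr4", "start": 190020407, "end": 190023665},
        {"chrom": "chr4", "start": 190066935, "end": 190093279},
        {"chrom": "chr4", "start": 190172774, "end": 190176845},
        {"chrom": "chr10", "start": 133663429, "end": 133685936},
        {"chrom": "chr10", "start": 133739606, "end": 133762125},
    ],
}


def span_per_chromosome(region_set: Dict) -> Dict[str, int]:
    """Sum `end - start` over all regions, grouped by chromosome."""
    spans: Dict[str, int] = defaultdict(int)
    for region in region_set["regions"]:
        # Assumption: end - start approximates the region length.
        spans[region["chrom"]] += region["end"] - region["start"]
    return dict(spans)


spans = span_per_chromosome(core_dux4)
print(spans)
```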

The **evidence** for each rearrangement consists of the number of split and paired reads between region sets A and B,
and the spanning read pairs per billion (SRPB). SRPB is calculated as
$$\text{SRPB} = 10^9 \frac{\text{paired reads} + \text{split reads}}{\text{total unique and mapped reads}}.$$

```json
{
  "paired_reads": 15,
  "split_reads": 4,
  "SRPB": 19.0
}
```
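The formula can be checked against the example values with a few lines of Python. The function name `srpb` is only illustrative; the total of one billion unique and mapped reads comes from the top-level example above.

```python
def srpb(paired_reads: int, split_reads: int, unique_mapped_reads: int) -> float:
    """Spanning read pairs per billion unique and mapped reads."""
    if unique_mapped_reads <= 0:
        raise ValueError("unique_mapped_reads must be positive")
    return 1e9 * (paired_reads + split_reads) / unique_mapped_reads


# 15 paired reads and 4 split reads in a sample with 1e9 unique mapped reads.
value = srpb(paired_reads=15, split_reads=4, unique_mapped_reads=1_000_000_000)
print(value)
```

Because SRPB is normalised by sequencing depth, the same 19 supporting reads in a sample with half as many unique and mapped reads would give twice the SRPB.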

### SAM files
Optionally, Pelops can export one SAM file per rearrangement, containing all paired and split reads together with their mates.
The naming convention is `<id>_<name_A>-<name_B>.sam`, where `<id>`, `<name_A>`, `<name_B>` correspond to the ID and
names of genomic region sets A and B, respectively, as documented in the JSON.
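
To locate the exported file for a given rearrangement, the expected name can be derived from its JSON entry. This is a sketch of the naming convention as described above, with a hypothetical helper name; it has not been checked against actual Pelops output.

```python
def sam_filename(rearrangement: dict) -> str:
    """Build `<id>_<name_A>-<name_B>.sam` from a rearrangement entry."""
    return (
        f"{rearrangement['id']}_"
        f"{rearrangement['A']['name']}-{rearrangement['B']['name']}.sam"
    )


igh_call = {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}}
filename = sam_filename(igh_call)
print(filename)
```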

## Contributing
We are not accepting pull requests into this repository at this time, as the
licence currently does not allow modifications by third parties.
For any bug report / recommendation / feature request, please open an issue.

## Credits

See [Authors](AUTHORS.md).

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "ilmn-pelops",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Stefano Berri <sberri@illumina.com>, Kai Jie Chow <kchow@illumina.com>",
    "keywords": "DUX4,acute lymphoblastic leukaemia,whole-genome sequencing,IGH::DUX4 rearrangements",
    "author": "",
    "author_email": "Stefano Berri <sberri@illumina.com>, Pascal Grobecker <pgrobecker@illumina.com>, Kai Jie Chow <kchow@illumina.com>, Martina Mijuskovic <mmijuskovic@illumina.com>",
    "download_url": "https://files.pythonhosted.org/packages/f4/33/d279c3180f01b4142e1167eb19ccbddf7ac90b7056a0e7e15d766f531c19/ilmn-pelops-0.8.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "PolyForm Strict License 1.0.0",
    "summary": "Dedicated caller for DUX4 rearrangements from whole genome sequencing data.",
    "version": "0.8.0",
    "project_urls": {
        "Repository": "https://github.com/Illumina/Pelops"
    },
    "split_keywords": [
        "dux4",
        "acute lymphoblastic leukaemia",
        "whole-genome sequencing",
        "igh::dux4 rearrangements"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ca30f13eb8a98a2fcf8ac10cf18dcf882a91c0cb0b5b096c276dcad676b69479",
                "md5": "47e7273f728963f7edc2a526a0c19f38",
                "sha256": "5ac0caa77d9909415e6d10fac6b226a202390212d8444c8afe3934c39107b78f"
            },
            "downloads": -1,
            "filename": "ilmn_pelops-0.8.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "47e7273f728963f7edc2a526a0c19f38",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 40887,
            "upload_time": "2024-03-18T11:30:29",
            "upload_time_iso_8601": "2024-03-18T11:30:29.087116Z",
            "url": "https://files.pythonhosted.org/packages/ca/30/f13eb8a98a2fcf8ac10cf18dcf882a91c0cb0b5b096c276dcad676b69479/ilmn_pelops-0.8.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f433d279c3180f01b4142e1167eb19ccbddf7ac90b7056a0e7e15d766f531c19",
                "md5": "a31bd2c20c67709019d1065cfb263790",
                "sha256": "39942601dc15c7b4081ca45c39fe36b133eca4989d2e990be730d4d00d9788a9"
            },
            "downloads": -1,
            "filename": "ilmn-pelops-0.8.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a31bd2c20c67709019d1065cfb263790",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 31090,
            "upload_time": "2024-03-18T11:30:30",
            "upload_time_iso_8601": "2024-03-18T11:30:30.547750Z",
            "url": "https://files.pythonhosted.org/packages/f4/33/d279c3180f01b4142e1167eb19ccbddf7ac90b7056a0e7e15d766f531c19/ilmn-pelops-0.8.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-18 11:30:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Illumina",
    "github_project": "Pelops",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "ilmn-pelops"
}
        