PaReBrick


| Field | Value |
| --- | --- |
| Name | PaReBrick |
| Version | 0.5.7 |
| Home page | https://github.com/ctlab/parallel-rearrangements |
| Summary | A bioinf tool for finding genome rearrangements in bacterial genomes |
| Upload time | 2024-10-06 19:25:37 |
| Maintainer | None |
| Docs URL | None |
| Author | Alexey Zabelkin |
| Requires Python | `<3.9,>=3.6` |
| License | None |
| Keywords | genome rearrangements, phylogenetic trees, non-convex characters, synteny blocks, phylogenetic tree, pattern consistency |
| Requirements | No requirements were recorded. |

# PaReBrick: PArallel REarrangements and BReakpoints Identification Toolkit

---
## Motivation
The high plasticity of bacterial genomes is facilitated by numerous mechanisms, including horizontal gene transfer and recombination via flanking repeats.  
Genome rearrangements such as inversions, deletions, insertions, and duplications may occur independently in different strains, leading to parallel adaptation or phenotypic diversity.  
Such rearrangements may be responsible for virulence, antibiotic resistance, and antigenic variation.  
However, identifying these events often requires laborious manual inspection and verification of phyletic pattern consistency.

## Methods and Results

![Pipeline of tool](figs/pipeline.svg)

We present **PaReBrick** — a tool implementing an algorithm for identifying parallel rearrangements in bacterial populations.  
We define "parallel rearrangements" as events that occur independently in phylogenetically distant bacterial strains and provide a formalization for calling these events.

The tool takes a collection of strains represented as sequences of oriented synteny blocks and a phylogenetic tree as input.  
It identifies rearrangements, tests them for consistency with the tree, and ranks events by their parallelism score.  
The tool also generates diagrams for each block of interest, facilitating the detection of horizontally transferred blocks, their extra copies, and any inversions involving duplicated blocks.
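
The output description further down refers to "non-convex characters". As a rough illustration of that idea (not PaReBrick's own implementation), the sketch below tests whether a character, i.e. an assignment of states to strains, is convex on a rooted binary tree. It relies on a standard result: a character with r states is convex exactly when its parsimony score equals r - 1. The nested-tuple tree format and the function names are assumptions made for this example only.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree format: a rooted binary tree as nested 2-tuples,
# with strain names (strings) as leaves.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes on a binary tree (Fitch's algorithm)."""
    changes = 0

    def visit(node: Tree) -> Set[str]:
        nonlocal changes
        if isinstance(node, str):      # leaf: its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                     # children can agree: no change counted
            return common
        changes += 1                   # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def is_convex(tree: Tree, states: Dict[str, str]) -> bool:
    """Convex iff the parsimony score equals (number of states - 1).

    Assumes `states` has exactly one entry per leaf of `tree`.
    """
    return fitch_score(tree, states) == len(set(states.values())) - 1


tree = (("A", "B"), ("C", "D"))
print(is_convex(tree, {"A": "x", "B": "x", "C": "y", "D": "y"}))  # True
print(is_convex(tree, {"A": "x", "B": "y", "C": "x", "D": "y"}))  # False
```

In the second call the same state appears on both sides of the root, so at least two changes are needed for two states; a pattern like this is a candidate for a parallel event.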

We [demonstrated](https://doi.org/10.1093/bioinformatics/btab691) PaReBrick’s efficiency and accuracy, showing its potential for detecting genome rearrangements responsible for pathogenicity and adaptation in bacterial genomes.

## Installation

PaReBrick can be installed using `pip`.  
PaReBrick requires Python 3.6 to 3.8 (a constraint imposed by its dependencies). To create and activate a Python 3.8 environment using Conda, run the following commands:

```bash
conda create -n py38 python=3.8
conda activate py38
```

Then, install PaReBrick:

```bash
pip install PaReBrick
```

Now you can run the tool from any directory using `PaReBrick` (or `parebrick`).
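
If `pip` refuses to install the package, one thing worth checking is whether the active interpreter falls inside the range declared in the package metadata (`<3.9,>=3.6`). The helper below is a small sketch for that check; the function name is made up for this example and is not part of PaReBrick.

```python
import sys
from typing import Tuple


def python_is_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) satisfies the declared range >=3.6,<3.9."""
    return (3, 6) <= version < (3, 9)


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    verdict = "supported" if python_is_supported() else "outside the declared range"
    print(f"Python {current} is {verdict}")
```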

## Script Parameters
The main script of the project, which includes all modules, can be run from any location as a console tool.

### Required Input

**Important:** Identifiers in the tree and blocks must match.

#### `--tree/-t`
Path to a phylogenetic tree in Newick format, parsable by the `ete3` library.  
For more information about supported formats, see the [ete3 documentation](http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#reading-and-writing-newick-trees).

#### `--blocks_folder/-b`
Path to a folder containing synteny blocks, generated by tools such as Sibelia or `maf2synteny`.  
Refer to [BLOCKS-OBTAIN.md](BLOCKS-OBTAIN.md) for instructions on obtaining synteny blocks using SibeliaZ.
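
Because identifiers in the tree and the blocks must match, a quick comparison before running the tool can save a failed run. The sketch below is an illustration under assumptions: it pulls leaf names out of a simple Newick string with a regular expression (a stand-in for parsing with `ete3`; it does not handle quoted names), and it expects the block identifiers to be supplied by the caller, since the layout of the blocks folder is not described here. Both function names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple


def newick_leaf_names(newick: str) -> Set[str]:
    """Extract leaf names from a simple, unquoted Newick string.

    A leaf name directly follows "(" or ","; internal-node labels follow ")"
    and are therefore skipped.
    """
    names = re.findall(r"[(,]\s*([^(),:;\s][^(),:;]*)", newick)
    return {name.strip() for name in names}


def compare_identifiers(newick: str, block_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (in tree but not in blocks, in blocks but not in tree)."""
    leaves = newick_leaf_names(newick)
    blocks = set(block_ids)
    return leaves - blocks, blocks - leaves


only_in_tree, only_in_blocks = compare_identifiers(
    "((A:0.1,B:0.2):0.3,C:0.4);", ["A", "B", "D"]
)
print(only_in_tree, only_in_blocks)  # {'C'} {'D'}
```

Two empty sets mean the identifiers agree.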

### Optional Input

#### `--labels/-l`
Path to a CSV file with tree labels for visualization.  
The file must contain two columns: `strain` and `label`.
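
A labels file with the two required columns can be produced with Python's `csv` module. This is a minimal sketch: the strain identifiers and labels are invented, the function name is hypothetical, and a comma delimiter is assumed.

```python
import csv
from typing import Mapping


def write_labels(path: str, labels: Mapping[str, str]) -> None:
    """Write a two-column CSV with the header `strain,label`."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)         # comma delimiter is an assumption
        writer.writerow(["strain", "label"])
        writer.writerows(labels.items())    # one (strain, label) row per entry


# Hypothetical identifiers; they must match those used in the tree and blocks.
write_labels("labels.csv", {"strain_1": "Isolate one", "strain_2": "Isolate two"})
```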

#### `--output/-o`
Path to the output folder.  
Default is `./parebrick_output`.

### Output
The output consists of three main folders:

1. **`preprocessed_data`**: Contains all synteny blocks in `infercars`, `GRIMM`, and `CSV` formats, as well as `genomes_lengths.csv`, which lists the lengths of the provided genomes.
2. **`balanced_rearrangements_output`**: Contains a `stats.csv` file with statistics of non-convex characters from balanced rearrangements, as well as folders (`characters`, `tree_colorings`) containing the characters as `.csv` files and as trees rendered to `.pdf`.
3. **`unbalanced_rearrangements_output`**: Similar to the above, but for unbalanced rearrangements. Contains a `stats.csv` file and subfolders with `.csv` files and trees rendered to `.pdf`.
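
After a run, the presence of the three folders listed above can be checked with a few lines of Python. The sketch assumes the default output location `./parebrick_output`; the helper name is hypothetical.

```python
from pathlib import Path
from typing import List

# Folder names as listed in the Output section.
EXPECTED_FOLDERS = (
    "preprocessed_data",
    "balanced_rearrangements_output",
    "unbalanced_rearrangements_output",
)


def missing_output_folders(output_dir: str = "./parebrick_output") -> List[str]:
    """Return the expected output folders that are absent from `output_dir`."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_output_folders()
    print("All output folders present" if not missing else f"Missing: {missing}")
```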

## Example Run and Data
Example data is available in the `example-data` folder.

### How to Run the Example
1. Clone the repository:
```bash
git clone https://github.com/ctlab/parallel-rearrangements
```

2. Navigate to the example data folder:
```bash
cd parallel-rearrangements/example-data/streptococcus_pyogenes
```

3. Run PaReBrick using the example input:
```bash
PaReBrick -t input/tree.nwk -b input/maf2synteny-output -l input/labels.csv
```

Or, run with minimal required arguments (without labels):
```bash
PaReBrick -t input/tree.nwk -b input/maf2synteny-output
```
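
When PaReBrick is one step in a larger Python workflow, the same invocation can be assembled and launched with `subprocess`. The sketch below uses only the flags documented above (`-t`, `-b`, `-l`, `-o`); the helper names are hypothetical, and it assumes the `PaReBrick` executable is on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_command(
    tree: str,
    blocks_folder: str,
    labels: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble the PaReBrick command line from the documented options."""
    command = ["PaReBrick", "-t", tree, "-b", blocks_folder]
    if labels is not None:
        command += ["-l", labels]
    if output is not None:
        command += ["-o", output]
    return command


def run_parebrick(tree: str, blocks_folder: str, **options: Optional[str]) -> None:
    """Run PaReBrick and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(tree, blocks_folder, **options), check=True)


print(build_command("input/tree.nwk", "input/maf2synteny-output", labels="input/labels.csv"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.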

## Citation
If you use `PaReBrick` in your research, please cite:

Alexey Zabelkin, Yulia Yakovleva, Olga Bochkareva, Nikita Alexeev, PaReBrick: PArallel REarrangements and BReakpoints identification toolkit, Bioinformatics, 2021; btab691, https://doi.org/10.1093/bioinformatics/btab691

            
