| Field | Value |
| --- | --- |
| Name | cblaster |
| Version | 1.3.17 |
| Home page | https://github.com/gamcil/cblaster |
| Upload time | 2023-01-05 10:03:44 |
| Author | Cameron Gilchrist |
| Requires Python | >=3.6 |
| License | MIT |

# cblaster
[![Python package](https://github.com/gamcil/cblaster/actions/workflows/pythonapp.yml/badge.svg)](https://github.com/gamcil/cblaster/actions/workflows/pythonapp.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/cblaster.svg)](https://badge.fury.io/py/cblaster)
[![Documentation Status](https://readthedocs.org/projects/cblaster/badge/?version=latest)](https://cblaster.readthedocs.io/en/latest/?badge=latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3660769.svg)](https://doi.org/10.5281/zenodo.3660769)

>**(Temporarily down)**
Both cblaster and clinker can now be used without installation on the [CAGECAT webserver](http://cagecat.bioinformatics.nl/).

## Outline

`cblaster` is a tool for finding clusters of co-located homologous sequences
in BLAST searches.

<img src="docs/source/_static/workflow.png" alt="cblaster search workflow" width=600>

Given a collection of protein sequences, `cblaster` can search sequence databases
remotely (via NCBI BLAST API) or locally (via `DIAMOND`). Search results are parsed
and filtered based on user thresholds for identity, coverage and e-value. The genomic
coordinates of remaining hits are obtained from the NCBI's Identical Protein
Group (IPG) database (or a local database in local searches). Finally,
`cblaster` scans for instances of co-location and generates visualisations:

<img src="docs/source/_static/results.png" alt="cblaster search results" width=700>
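The filtering and co-location steps described above can be sketched in a few lines of Python. This is an illustration only, not `cblaster`'s actual implementation: the `Hit` class, the `filter_hits` and `cluster_hits` helpers, and all threshold values are hypothetical, and the sketch assumes every hit lies on the same scaffold.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent query coverage
    evalue: float
    start: int       # genomic start coordinate
    end: int         # genomic end coordinate


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep hits that pass all three thresholds (illustrative values)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def cluster_hits(hits, max_gap=20000, min_hits=2):
    """Group hits whose neighbours lie within max_gap bp of each other."""
    clusters, current = [], []
    for hit in sorted(hits, key=lambda h: h.start):
        # A gap larger than max_gap closes the current group.
        if current and hit.start - current[-1].end > max_gap:
            if len(current) >= min_hits:
                clusters.append(current)
            current = []
        current.append(hit)
    if len(current) >= min_hits:
        clusters.append(current)
    return clusters


# Three hits taken from the example output in the Usage section.
hits = [
    Hit("QBE85641.1", "XP_026607259.1", 75.56, 99.5918, 0.0, 1717881, 1719409),
    Hit("QBE85642.1", "XP_026607260.1", 89.916, 100.0, 0.0, 1719650, 1720797),
    Hit("QBE85648.1", "XP_026607266.1", 56.098, 98.324, 4.24e-64, 1731701, 1732402),
]

for cluster in cluster_hits(filter_hits(hits)):
    print([h.query for h in cluster])
```

Sorting by start coordinate first means a single pass is enough to find every group of neighbouring hits.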

## Installation
`cblaster` can be installed via pip:

```bash
$ pip3 install cblaster --user
```

or by cloning the repository and installing:

```bash
$ git clone https://github.com/gamcil/cblaster.git
...
$ cd cblaster/
$ pip3 install .
```

Additionally, we provide executables for Windows and Mac which can be downloaded [from here](https://github.com/gamcil/cblaster/releases/latest).

Once installed, make sure you configure cblaster with your email address:

```bash
$ cblaster config --email name@domain.com
```

You can find example search files, along with generated output, in the [examples folder
of the repository](https://github.com/gamcil/cblaster/tree/master/example).

## Dependencies
`cblaster` is tested on Python 3.6, and its only external Python dependency is
the `requests` module (used for interaction with NCBI APIs).
If you want to perform local searches, you should have `diamond` installed and available
on your system `$PATH`.
`cblaster` will throw an error if a local search is started but it cannot find
`diamond` or `diamond-aligner` (alias when installed via apt) on the system.
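If you want to check for DIAMOND yourself before starting a local search, the standard library's `shutil.which` is enough. The sketch below is not taken from `cblaster`; `find_diamond` is a hypothetical helper that simply looks for the two executable names mentioned above.

```python
import shutil


def find_diamond(candidates=("diamond", "diamond-aligner")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_diamond()
    print(path or "DIAMOND not found on PATH; local searches will fail")
```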

## Usage
`cblaster` accepts FASTA files and collections of valid NCBI sequence identifiers
(GIs, accession numbers) as input.
A remote search can be performed as simply as:

```bash
$ cblaster search --query_file query.fasta
```

For example, to remotely search the
[burnettramic acids gene cluster, *bua*](https://pubs.acs.org/doi/10.1021/acs.orglett.8b04042),
against the NCBI's nr database:

```bash
$ cblaster search -qf bua.fasta

[12:14:17] INFO - Starting cblaster in remote mode
[12:14:17] INFO - Launching new search
[12:14:19] INFO - Request Identifier (RID): WHS0UGYJ015
[12:14:19] INFO - Request Time Of Execution (RTOE): 25s
[12:14:44] INFO - Polling NCBI for completion status
[12:14:44] INFO - Checking search status...
[12:15:44] INFO - Checking search status...
[12:16:44] INFO - Checking search status...
[12:16:46] INFO - Search has completed successfully!
[12:16:46] INFO - Retrieving results for search WHS0UGYJ015
[12:16:51] INFO - Parsing results...
[12:16:51] INFO - Found 3944 hits meeting score thresholds
[12:16:51] INFO - Fetching genomic context of hits
[12:17:14] INFO - Searching for clustered hits across 705 organisms
[12:17:14] INFO - Writing summary to <stdout>

Aspergillus mulundensis DSM 5745
================================
NW_020797889.1
--------------
Query       Subject         Identity  Coverage  E-value    Bitscore  Start    End      Strand
QBE85641.1  XP_026607259.1  75.56     99.5918   0          742       1717881  1719409  -
QBE85642.1  XP_026607260.1  89.916    100       0          667       1719650  1720797  +
QBE85643.1  XP_026607261.1  89.532    83.1169   0          832       1721494  1722934  +
QBE85644.1  XP_026607262.1  64.829    98.9218   6.51e-157  455       1723252  1724467  -
QBE85645.1  XP_026607263.1  69.97     100       6.93e-157  449       1725113  1726277  -
QBE85646.1  XP_026607264.1  82.759    96.8447   0          670       1726892  1728302  +
QBE85647.1  XP_026607265.1  72.674    99.2048   0          764       1729735  1731338  +
QBE85648.1  XP_026607266.1  56.098    98.324    4.24e-64   205       1731701  1732402  -
QBE85649.1  XP_026607267.1  79.623    99.8746   0          6573      1732820  1745289  +

...
```
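The summary is plain whitespace-delimited text, so it can be post-processed with a short script. The following is a minimal sketch that assumes the nine-column layout shown above; `parse_hits` is a hypothetical helper and not part of `cblaster`.

```python
COLUMNS = ["query", "subject", "identity", "coverage", "evalue",
           "bitscore", "start", "end", "strand"]


def parse_hits(text):
    """Parse hit rows from a summary into dictionaries.

    Lines that do not look like a hit row (log messages, organism and
    scaffold headings, the column header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields[-1] not in ("+", "-"):
            continue
        row = dict(zip(COLUMNS, fields))
        try:
            for key in ("identity", "coverage", "evalue", "bitscore"):
                row[key] = float(row[key])
            for key in ("start", "end"):
                row[key] = int(row[key])
        except ValueError:
            continue  # nine fields, but not numeric where expected
        rows.append(row)
    return rows


sample = """\
Query       Subject         Identity  Coverage  E-value    Bitscore  Start    End      Strand
QBE85641.1  XP_026607259.1  75.56     99.5918   0          742       1717881  1719409  -
QBE85644.1  XP_026607262.1  64.829    98.9218   6.51e-157  455       1723252  1724467  -
"""

for hit in parse_hits(sample):
    print(hit["query"], hit["subject"], hit["identity"])
```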

A query sequence absence/presence matrix can be generated using the `--binary` argument:

```
Organism                                   Scaffold        Start    End      QBE85641.1  QBE85642.1  QBE85643.1  QBE85644.1  QBE85645.1  QBE85646.1  QBE85647.1  QBE85648.1  QBE85649.1
Aspergillus mulundensis DSM 5745           NW_020797889.1  1717881  1745289  1           1           1           1           1           1           1           1           1         
Aspergillus versicolor CBS 583.65          KV878126.1      3162095  3187090  1           1           1           0           1           1           1           1           1         
Pseudomassariella vexata CBS 129021        MCFJ01000004.1  1606356  1628483  1           1           1           0           0           1           0           1           1         
Hypoxylon sp. CO27-5                       KZ112517.1      92119    112957   1           1           1           0           0           0           1           0           1         
Hypoxylon sp. EC38                         KZ111255.1      514739   535366   1           1           1           0           0           0           1           0           1         
Epicoccum nigrum ICMP 19927                KZ107839.1      2116719  2142558  1           1           0           0           0           1           1           0           1         
Aureobasidium subglaciale EXF-2481         NW_013566983.1  700476   718693   1           1           0           0           0           1           1           0           0         
Aureobasidium pullulans EXF-6514           QZBF01000009.1  18721    34295    1           1           0           0           0           1           1           0           0         
Aureobasidium pullulans EXF-5628           QZBI01000512.1  329      13401    1           0           0           0           0           1           1           0           0         
```
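The binary table can be read back into Python in a similar way. The sketch below assumes that columns are separated by two or more spaces, as in the output above, so that organism names containing single spaces stay intact; `parse_binary` is a hypothetical helper, and the sample is abridged to the first three query columns.

```python
import re


def parse_binary(text):
    """Parse a binary table into (organism, scaffold, start, end, presence)."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0])
    queries = header[4:]  # the first four columns describe the cluster
    records = []
    for line in lines[1:]:
        fields = re.split(r"\s{2,}", line)
        organism, scaffold, start, end = fields[:4]
        # Treat any value other than "0" as "present".
        presence = {q: flag != "0" for q, flag in zip(queries, fields[4:])}
        records.append((organism, scaffold, int(start), int(end), presence))
    return records


sample = """\
Organism                          Scaffold        Start    End      QBE85641.1  QBE85642.1  QBE85643.1
Aspergillus mulundensis DSM 5745  NW_020797889.1  1717881  1745289  1           1           1
Epicoccum nigrum ICMP 19927       KZ107839.1      2116719  2142558  1           1           0
"""

for organism, scaffold, start, end, presence in parse_binary(sample):
    missing = [q for q, found in presence.items() if not found]
    print(organism, scaffold, "missing:", missing)
```

Splitting on runs of whitespace is fragile if an organism name is long enough to leave only one space before the next column, so treat this as a starting point rather than a robust parser.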

`cblaster` can also generate fully interactive visualisations of the binary
table. To view an example, click [here](https://cblaster.readthedocs.io/en/latest/_static/example.html).

For further usage examples and API documentation, please refer to the
[documentation](https://cblaster.readthedocs.io/en/latest/).

## Citation
If you found this tool useful, please cite:

```text
Cameron L M Gilchrist, Thomas J Booth, Bram van Wersch, Liana van Grieken, Marnix H Medema, Yit-Heng Chooi, cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters, Bioinformatics Advances, 2021, vbab016, https://doi.org/10.1093/bioadv/vbab016
```

`cblaster` makes use of the following tools:
```
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, D7–D17 (2014).
```

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/gamcil/cblaster",
    "name": "cblaster",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Cameron Gilchrist",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/0c/2e/26f45495e4fbba233a7b84437b0c15efb1505cd89e3f3d593235c7bba871/cblaster-1.3.17.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "",
    "version": "1.3.17",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "80f36c20a6f81ebfe54d7bce95fa3c1e46b6cfdd3bc91b9b7c1a3845a7aba19f",
                "md5": "ceb525052d0279056355fc375f9c8a9f",
                "sha256": "8783456e1e03d5f300188727ab8e70c8a95b16d62df9232c4099c99028632361"
            },
            "downloads": -1,
            "filename": "cblaster-1.3.17-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ceb525052d0279056355fc375f9c8a9f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 188784,
            "upload_time": "2023-01-05T10:03:41",
            "upload_time_iso_8601": "2023-01-05T10:03:41.880922Z",
            "url": "https://files.pythonhosted.org/packages/80/f3/6c20a6f81ebfe54d7bce95fa3c1e46b6cfdd3bc91b9b7c1a3845a7aba19f/cblaster-1.3.17-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c2e26f45495e4fbba233a7b84437b0c15efb1505cd89e3f3d593235c7bba871",
                "md5": "09678d9adf599c05b4daefca4c691502",
                "sha256": "40480ba5316837c9460f0c867972be205ca2e2bc60072051d9552692bb2e27a0"
            },
            "downloads": -1,
            "filename": "cblaster-1.3.17.tar.gz",
            "has_sig": false,
            "md5_digest": "09678d9adf599c05b4daefca4c691502",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 173395,
            "upload_time": "2023-01-05T10:03:44",
            "upload_time_iso_8601": "2023-01-05T10:03:44.091471Z",
            "url": "https://files.pythonhosted.org/packages/0c/2e/26f45495e4fbba233a7b84437b0c15efb1505cd89e3f3d593235c7bba871/cblaster-1.3.17.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-05 10:03:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "gamcil",
    "github_project": "cblaster",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "cblaster"
}
        