[![Tests](https://github.com/oxfordmmm/gnomonicus/actions/workflows/tests.yaml/badge.svg)](https://github.com/oxfordmmm/gnomonicus/actions/workflows/tests.yaml)
[![Build and release Docker](https://github.com/oxfordmmm/gnomonicus/actions/workflows/build.yaml/badge.svg)](https://github.com/oxfordmmm/gnomonicus/actions/workflows/build.yaml)
[![PyPI version](https://badge.fury.io/py/gnomonicus.svg)](https://badge.fury.io/py/gnomonicus)
[![Docs](https://github.com/oxfordmmm/gnomonicus/actions/workflows/docs.yaml/badge.svg)](https://oxfordmmm.github.io/gnomonicus/)
# gnomonicus
Python code to integrate the results of tb-pipeline and provide an antibiogram, mutations and variations.
Provides a library of functions for use within scripts, as well as a CLI tool that links the functions together to produce output.
## Documentation
The API reference for developers and the CLI instructions can be found here: https://oxfordmmm.github.io/gnomonicus/
## Usage
```
usage: gnomonicus [-h] --vcf_file VCF_FILE --genome_object GENOME_OBJECT [--catalogue_file CATALOGUE_FILE] [--ignore_vcf_filter] [--progress] [--output_dir OUTPUT_DIR] [--json] [--fasta FASTA] [--minor_populations MINOR_POPULATIONS]

options:
  -h, --help            show this help message and exit
  --vcf_file VCF_FILE   the path to a single VCF file
  --genome_object GENOME_OBJECT
                        the path to a compressed gumpy Genome object or a genbank file
  --catalogue_file CATALOGUE_FILE
                        the path to the resistance catalogue
  --ignore_vcf_filter   whether to ignore the FILTER field in the vcf (e.g. necessary for some versions of Clockwork VCFs)
  --progress            whether to show progress using tqdm
  --output_dir OUTPUT_DIR
                        Directory to save output files to. Defaults to wherever the script is run from.
  --json                Flag to create a single JSON output as well as the CSVs
  --fasta FASTA         Use to output a FASTA file of the resultant genome. Specify either 'fixed' or 'variable' for fixed length and variable length FASTA respectively.
  --minor_populations MINOR_POPULATIONS
                        Path to a line separated file containing genome indices of minor populations.
```
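The flags above can also be assembled from a script. The following is a minimal sketch that builds an argument list using only the options documented in the help text; the helper names `build_command` and `run`, and every file path, are hypothetical and used purely for illustration.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(vcf, genome, catalogue=None, output_dir=None, json_output=False):
    """Assemble a gnomonicus argument list from the documented flags."""
    cmd = ["gnomonicus", "--vcf_file", str(vcf), "--genome_object", str(genome)]
    if catalogue is not None:
        cmd += ["--catalogue_file", str(catalogue)]
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    if json_output:
        cmd.append("--json")
    return cmd


def run(cmd):
    """Run the assembled command; requires gnomonicus to be installed."""
    return subprocess.run(cmd, check=True)


# Hypothetical input paths, for illustration only.
cmd = build_command(
    vcf=Path("sample.vcf"),
    genome=Path("reference.gbk"),
    catalogue=Path("catalogue.csv"),
    output_dir=Path("results"),
    json_output=True,
)

# Print the shell-quoted command rather than executing it here.
print(shlex.join(cmd))
```

Calling `run(cmd)` would execute the tool, provided the input files exist. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.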
## Helper usage
As the main script can utilise pickled `gumpy.Genome` objects, a helper script is supplied. It converts a GenBank file into a pickled `gumpy.Genome`, which saves significant time on later runs.
Due to the security implications of the pickle module, **DO NOT SEND/RECEIVE PICKLES**. This script should be used on a host VM before running Nextflow to avoid re-instantiation.
Supports gzip compression to reduce file size significantly (using the `--compress` flag).
```
usage: gbkToPkl FILENAME [--compress]
```
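To illustrate the general idea behind pickling with optional gzip compression, here is a minimal sketch using only the standard library. It is not the helper's actual implementation: the functions `save_pickle` and `load_pickle` are hypothetical, and a plain dictionary stands in for a real `gumpy.Genome`.

```python
import gzip
import os
import pickle
import tempfile


def save_pickle(obj, path, compress=False):
    """Pickle obj to path, optionally through gzip."""
    opener = gzip.open if compress else open
    with opener(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path, compress=False):
    """Load a pickle. Only unpickle files you created yourself on this machine."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a parsed genome (assumption: a real gumpy.Genome is far larger).
genome_like = {"name": "demo", "sequence": "acgt" * 10_000}

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "genome.pkl")
    packed = os.path.join(tmp, "genome.pkl.gz")
    save_pickle(genome_like, plain)
    save_pickle(genome_like, packed, compress=True)
    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)
    restored = load_pickle(packed, compress=True)

print(f"plain: {plain_size} bytes, gzip: {packed_size} bytes")
```

Repetitive sequence data compresses well, which is why the compressed file is much smaller in this toy case. The warning above still applies: a pickle can execute arbitrary code when loaded, so it should be generated locally rather than shared.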
## Install
Install the latest release with pip:
```
pip install gnomonicus
```
Install from source:
```
git clone https://github.com/oxfordmmm/gnomonicus.git
cd gnomonicus
pip install -e .
```
## Docker
A Docker image should be built on releases. To open a shell with gnomonicus installed:
```
docker run -it oxfordmmm/gnomonicus:latest
```
## Updating
If a `gnomonicus` update changes the version of `gumpy` used, an `OutdatedGumpyException` will be thrown when using a pickled `Genome` object that was instantiated with the old version. This can be fixed by re-instantiating the `Genome` object by passing a GenBank file once.
## Notes
When generating mutations, in cases of a synonymous amino acid mutation, the nucleotides changed are also included. This can lead to a mix of nucleotides and amino acids for coding genes, but the nucleotide changes are excluded from generating effects unless specified in the catalogue. This means that the default rule of `gene@*= --> S` is still in place regardless of the introduced `gene@*?`, which would otherwise take precedence. For example:
```
    'MUTATIONS': [
        {
            'MUTATION': 'F2F',
            'GENE': 'S',
            'GENE_POSITION': 2
        },
        {
            'MUTATION': 't6c',
            'GENE': 'S',
            'GENE_POSITION': 6
        },
    ],
    'EFFECTS': {
        'AAA': [
            {
                'GENE': 'S',
                'MUTATION': 'F2F',
                'PREDICTION': 'S'
            },
            {
                'PHENOTYPE': 'S'
            }
        ],
```
The nucleotide variation is included in the `MUTATIONS`, but explicitly removed from the `EFFECTS` unless it is specified within the catalogue.
For this variation to be included, the catalogue would have to contain a line of `S@F2F&S@t6c`.
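The filtering described above can be modelled with a small sketch. This is a toy illustration rather than the logic `gnomonicus` actually uses: the function `mutations_for_effects` is hypothetical, it treats every lowercase nucleotide change as one that accompanies a synonymous amino acid change, and it assumes catalogue rules are plain strings joined with `&`.

```python
import re

# Lowercase nucleotide change such as 't6c' (toy pattern, an assumption).
NUCLEOTIDE = re.compile(r"^[acgt]-?\d+[acgt]$")


def mutations_for_effects(mutations, catalogue_rules):
    """Return the gene@mutation strings kept for effect prediction.

    Nucleotide changes are dropped unless a catalogue rule names them
    and every part of that rule is present in the sample.
    """
    present = {f"{m['GENE']}@{m['MUTATION']}" for m in mutations}

    # Collect mutations named by a catalogue rule that is fully satisfied.
    rescued = set()
    for rule in catalogue_rules:
        parts = set(rule.split("&"))
        if parts <= present:
            rescued |= parts

    kept = []
    for m in mutations:
        key = f"{m['GENE']}@{m['MUTATION']}"
        if NUCLEOTIDE.match(m["MUTATION"]) and key not in rescued:
            continue  # nucleotide change not named in the catalogue
        kept.append(key)
    return kept


mutations = [
    {"MUTATION": "F2F", "GENE": "S", "GENE_POSITION": 2},
    {"MUTATION": "t6c", "GENE": "S", "GENE_POSITION": 6},
]

without_rule = mutations_for_effects(mutations, [])
with_rule = mutations_for_effects(mutations, ["S@F2F&S@t6c"])
print(without_rule)
print(with_rule)
```

With an empty catalogue only `S@F2F` reaches the effects, matching the example output above. Once the rule `S@F2F&S@t6c` is present, both mutations are kept.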
## User stories
1. As a bioinformatician, I want to be able to run `gnomonicus` on the command line, passing it (i) a GenBank file (or pickled `gumpy.Genome` object), (ii) a resistance catalogue and (iii) a VCF file, and get back `pandas.DataFrames` of the genetic variants, mutations, effects and predictions/antibiogram. The latter is for all the drugs described in the passed resistance catalogue.
2. As a GPAS developer, I want to be able to embed `gnomonicus` in a Docker image/Nextflow pipeline that consumes the outputs of [tb-pipeline](https://github.com/Pathogen-Genomics-Cymru/tb-pipeline) and emits a structured, well-designed `JSON` object describing the genetic variants, mutations, effects and predictions/antibiogram.
3. In general, I would also like the option to output fixed- and variable-length FASTA files (the latter takes into account insertions and deletions described in any input VCF file).
## Unit testing
For speed, rather than use NC_000962.3 (i.e. H37Rv *M. tuberculosis*), we use SARS-CoV-2 and have created a fictitious drug resistance catalogue, along with some `vcf` files and the expected outputs in `tests/`.
These can be run with `pytest -vv`.