[![codecov](https://codecov.io/gh/fowler-lab/catomatic/branch/ecoff/graph/badge.svg?token=8fnOy6rHCd)](https://codecov.io/gh/fowler-lab/catomatic)
# catomatic
Python code that algorithmically builds antimicrobial resistance catalogues of mutations.
## Introduction
The method rests on two assumptions: mutations that do not cause resistance can co-occur with those that do, and a mutation that does not cause resistance in isolation (solo) will not contribute to the phenotype when it co-occurs with other mutations.
Mutations that occur in isolation across specified genes are traversed in sequence, and if their proportion of drug-susceptibility (vs resistance) passes the specified statistical test, they are characterized as benign and removed from the dataset. This step repeats while there are susceptible mutations in isolation. Once the dataset has been 'cleaned' of benign mutations, resistant mutations are classified via their proportions by the specified test, failing which they are added to the catalogue as 'Unclassified'.
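The iterative 'cleaning' loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not catomatic's implementation: the function name `build_toy_catalogue`, the input dictionaries, and the `geneA@...` mutations are all hypothetical, and the statistical test is replaced by the simplest possible rule (a solo mutation is benign only if every solo sample carrying it is susceptible).

```python
from collections import defaultdict


def build_toy_catalogue(phenotypes, mutations):
    """Toy solo-mutation classifier (hypothetical helper, not catomatic's API).

    phenotypes: {sample_id: 'R' or 'S'}
    mutations:  {sample_id: set of mutation strings}
    """
    # Work on a copy so the caller's data is left untouched.
    remaining = {sid: set(muts) for sid, muts in mutations.items()}
    catalogue = {}

    def solo_counts():
        # Count R/S phenotypes for mutations that are alone in their sample.
        counts = defaultdict(lambda: {"R": 0, "S": 0})
        for sid, muts in remaining.items():
            if len(muts) == 1:
                (mut,) = muts
                counts[mut][phenotypes[sid]] += 1
        return counts

    # Repeat while there are susceptible mutations in isolation.
    while True:
        benign = [m for m, c in solo_counts().items() if c["R"] == 0]
        if not benign:
            break
        for mut in benign:
            catalogue[mut] = "S"
        # Removing benign mutations can expose new solo mutations.
        for muts in remaining.values():
            muts.difference_update(benign)

    # Whatever is still solo is resistant or, if mixed, unclassified.
    for mut, c in solo_counts().items():
        catalogue[mut] = "R" if c["S"] == 0 else "U"
    return catalogue


phenotypes = {"s1": "S", "s2": "R", "s3": "R", "s4": "S", "s5": "R", "s6": "S"}
mutations = {
    "s1": {"geneA@A10T"},
    "s2": {"geneA@A10T", "geneA@G20D"},
    "s3": {"geneA@G20D"},
    "s4": {"geneA@L5P"},
    "s5": {"geneA@V30I"},
    "s6": {"geneA@V30I"},
}
result = build_toy_catalogue(phenotypes, mutations)
print(result)
```

In this toy data, `geneA@G20D` only becomes solo in sample `s2` once the benign `geneA@A10T` has been removed, which is the behaviour the repeated pass is meant to capture. Mutations that never occur in isolation are simply left out of the toy catalogue.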
Construction can follow one of three approaches: rely on homogeneous susceptibility for the particular mutation (no explicit phenotyping is carried out, other than to unlock susceptible variants); use a Binomial test, where the proportion of resistance is tested against a specified background rate; or use a Fisher's test, where the proportion of resistance is tested against a calculated background rate.
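To make the two statistical options concrete, here is a minimal standard-library sketch of a two-sided exact binomial test and a two-sided Fisher's exact test on a 2x2 table. The function names, the two-sided convention, and the example counts are assumptions for illustration; how catomatic builds its contingency tables and which sidedness it uses are not specified here.

```python
from math import comb


def binomial_two_sided_p(k, n, p0):
    """Exact two-sided binomial p-value for k resistant samples out of n,
    tested against a background resistance rate p0."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def fisher_two_sided_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. rows = with/without the mutation, columns = R/S."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    cutoff = prob(a) * (1 + 1e-7)
    return min(
        1.0,
        sum(prob(x) for x in range(lowest, highest + 1) if prob(x) <= cutoff),
    )


# Hypothetical counts: 9 of 10 solo samples resistant, background rate 0.1.
p_binom = binomial_two_sided_p(9, 10, 0.1)

# Hypothetical table: 3 R / 0 S with the mutation, 0 R / 3 S without it.
p_fisher = fisher_two_sided_p(3, 0, 0, 3)

print(p_binom, p_fisher)
```

The practical difference mirrors the text: the binomial version needs a background rate supplied by the user, while the Fisher version derives its comparison from the second row of the table.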
Although the method is entirely algorithmic, there are two entry points for intervention. First, one can 'seed' the method with neutral mutations (such as those gathered in a literature search, which is often helpful if a gene contains highly prevalent phylogenetic mutations that add noise). Second, one can add or overwrite classifications and entries in the catalogue, although this is not recommended unless aggregating.
Because the method uses and understands GARC1 grammar, one can supply 'rules' to the catalogue post-hoc, such as `{rpoB@*_fs:R}` for frameshifts in rpoB. A rule can either simply be added (in which case it has lower prediction priority than finer-grained mutations, such as `rpoB@44_ins`) or replace any mutations that fall under it, effectively aggregating the relevant variants.
The generated catalogue can be returned either as a dictionary or as a Pandas dataframe, which can be exported in a Piezo-compatible format for rapid parsing and resistance predictions.
Contingency tables, proportions, p-values, and Wilson confidence intervals are logged under the 'EVIDENCE' column of the catalogue.
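For readers unfamiliar with the Wilson interval, the following standard-library sketch computes it for a proportion of resistant samples. The function name, the default 95% z-value, and the example counts are assumptions for illustration; the exact layout of the 'EVIDENCE' entry is not reproduced here.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    successes: e.g. number of resistant samples carrying the mutation
    n:         total number of samples carrying the mutation
    z:         standard normal quantile (default corresponds to 95%)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: 8 resistant out of 10 solo samples.
low, high = wilson_interval(8, 10)
print(f"proportion=0.80, 95% Wilson interval=({low:.3f}, {high:.3f})")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly for the small counts that are typical of rare mutations.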
A workflow with example parameters:
![Catalogue Diagram](docs/workflow.png)
## Installation
### Using Conda
It is recommended to manage the Python environment and dependencies through Conda. You can install catomatic within a Conda environment by following these steps:
#### Create and Activate Environment
First, ensure that you have Conda installed. Then, create and activate a new environment, and install catomatic:
```bash
conda env create -f env.yml
conda activate catomatic
pip install .
```
## Running catomatic
At the most basic level, the method takes 2 input dataframes: a `samples dataframe` which contains 1 row per sample with 'R' vs 'S' binary phenotypes, and a `mutations dataframe` which contains 1 row per mutation. They have to be joinable on their `UNIQUEID` columns.
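As a rough illustration of the two inputs, the sketch below writes a pair of tiny CSV files using only the standard library and checks that every mutation row can be joined to a sample. The `UNIQUEID` and `MUTATION` column names come from this README; the `PHENOTYPE` column name, the sample identifiers, and the `geneA@...` mutations are assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path

# One row per sample, with a binary 'R' / 'S' phenotype.
# NOTE: the 'PHENOTYPE' column name is an assumption for this sketch.
samples = [
    {"UNIQUEID": "sample.1", "PHENOTYPE": "R"},
    {"UNIQUEID": "sample.2", "PHENOTYPE": "S"},
    {"UNIQUEID": "sample.3", "PHENOTYPE": "S"},
]

# One row per mutation; a sample may appear several times or not at all.
mutations = [
    {"UNIQUEID": "sample.1", "MUTATION": "geneA@G20D"},
    {"UNIQUEID": "sample.1", "MUTATION": "geneA@A10T"},
    {"UNIQUEID": "sample.2", "MUTATION": "geneA@A10T"},
]


def write_csv(path, rows):
    """Write a list of dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())
write_csv(out_dir / "samples.csv", samples)
write_csv(out_dir / "mutations.csv", mutations)

# Sanity check: every mutation row must join to a sample on UNIQUEID.
sample_ids = {row["UNIQUEID"] for row in samples}
orphans = [row for row in mutations if row["UNIQUEID"] not in sample_ids]
print(f"wrote inputs to {out_dir}; orphan mutation rows: {len(orphans)}")
```

Files like these could then be loaded into dataframes or passed as the `--samples` and `--mutations` paths.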
If exporting to Piezo format, the `MUTATION` column must contain GARC1 grammar (i.e. `gene@mutation`). One must also supply a path to the `wildcards.json` file, which should contain Piezo wildcards in a JSON object/dictionary (example file in `/data/bdq_wildcards.json`).
If seeding or updating the catalogue, the mutation grammar must match that of the `MUTATION` column.
### CLI
After installation, the simplest way to run the catomatic catalogue builder is via the command line interface. The `--to_piezo` or `--to_json` flag must be specified to save the catalogue (with additional arguments if using `--to_piezo`):
`BuildCatalogue --samples path/to/samples.csv --mutations path/to/mutations.csv --to_json --outfile path/to/out/catalogue.json`
or
`BuildCatalogue --samples path/to/samples.csv --mutations path/to/mutations.csv --to_piezo --outfile path/to/out/catalogue.csv --genbank_ref '...' --catalogue_name '...' --version '...' --drug '...' --wildcards path/to/wildcards.json`
### Python/Jupyter notebook
Should you wish to run catomatic in a notebook, for example, you can do so simply by calling BuildCatalogue after import.
```python
import pandas as pd
from catomatic.CatalogueBuilder import BuildCatalogue

# Load the input tables (placeholder paths)
samples_df = pd.read_csv('path/to/samples.csv')
mutations_df = pd.read_csv('path/to/mutations.csv')
# Instantiate a catalogue object - this will build the catalogue
catalogue = BuildCatalogue(samples=samples_df, mutations=mutations_df)
# Return the catalogue as a dictionary in order of variant addition
catalogue.return_catalogue()
# Return the catalogue as a Piezo-structured dataframe
catalogue.build_piezo(genbank_ref='...', catalogue_name='...', version='...', drug='...', wildcards='path/to/wildcards.json')
```
More detailed examples on running catomatic can be found in `examples/demo.ipynb`
### CLI Parameters
| Parameter | Type | Description |
| ------------------ | ------- | ------------------------------------------------------------------------------------------------- |
| `--samples` | `str` | Path to the samples file. Required. |
| `--mutations` | `str` | Path to the mutations file. Required. |
| `--FRS` | `float` | Fraction Read Support threshold. Optional. |
| `--seed` | `list` | List of seed mutations using GARC grammar. Optional. |
| `--test` | `str` | Type of statistical test to run: `None`, `Binomial`, or `Fisher`. Optional. |
| `--background` | `float` | Background mutation rate for the binomial test. Required if `--test` is `Binomial`; otherwise optional. |
| `--p` | `float` | Significance level for statistical testing. Optional. Defaults to `0.95`. |
| `--strict_unlock` | `bool` | Enforce strict unlocking for classifications, which requires p < 0.05. Optional. |
| `--to_json` | `bool` | Export the catalogue to JSON format. Optional. |
| `--outfile` | `str` | Path to output file for exporting the catalogue. Used with `--to_json` or `--to_piezo`. Optional. |
| `--to_piezo` | `bool` | Export catalogue to Piezo format. Optional. |
| `--genbank_ref` | `str` | GenBank reference for the catalogue. Required with `--to_piezo`; otherwise optional. |
| `--catalogue_name` | `str` | Name of the catalogue. Required with `--to_piezo`; otherwise optional. |
| `--version` | `str` | Version of the catalogue. Required with `--to_piezo`; otherwise optional. |
| `--drug` | `str` | Drug associated with the mutations. Required with `--to_piezo`; otherwise optional. |
| `--wildcards` | `str` | JSON file with wildcard rules. Required with `--to_piezo`; otherwise optional. |
| `--grammar` | `str` | Grammar used in the catalogue. Optional. Defaults to `GARC1`. |
| `--values` | `str` | Values used for predictions in the catalogue. Optional. Defaults to `RUS`. |
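The parameters above can be combined into a single invocation. The sketch below assembles one such command in Python so it can be inspected before running. The helper `build_command` is hypothetical, and the way values are passed to `--test`, `--background`, and `--p` (flag followed by value) is an assumption based on the table rather than on verified CLI behaviour; the paths and the background rate of `0.1` are placeholders.

```python
import shlex


def build_command(samples, mutations, outfile, background, p=0.95):
    """Assemble a BuildCatalogue call that uses the binomial test and
    saves the catalogue as JSON (hypothetical helper for illustration)."""
    return [
        "BuildCatalogue",
        "--samples", samples,
        "--mutations", mutations,
        "--test", "Binomial",
        "--background", str(background),  # required for the binomial test
        "--p", str(p),
        "--to_json",
        "--outfile", outfile,
    ]


cmd = build_command(
    samples="path/to/samples.csv",
    mutations="path/to/mutations.csv",
    outfile="path/to/out/catalogue.json",
    background=0.1,  # placeholder value, not a recommendation
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))

# To execute it from Python once catomatic is installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list first keeps conditional requirements, such as `--background` accompanying the binomial test, visible in one place.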
Raw data
{
"_id": null,
"home_page": "https://github.com/fowler-lab/catomatic",
"name": "catomatic",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "resistance catalogue, tuberculosis, clinical microbiology",
"author": "Dylan Adlard, Philip W Fowler",
"author_email": "philip.fowler@ndm.ox.ac.uk",
"download_url": "https://files.pythonhosted.org/packages/2c/13/8b38bc9edb60e71ea1f3d56aebfa603c41e1f127f06f53604da0fcf8ae64/catomatic-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A tool for automatically building catalogues of antibiotic resistance-associated variants",
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/fowler-lab/catomatic"
},
"split_keywords": [
"resistance catalogue",
" tuberculosis",
" clinical microbiology"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d9232557f6fc0925a7f0289338e83c18b2219f002d4a1f38b32bb3a919939120",
"md5": "d27f7ca54ec4766486080f32d3c815e3",
"sha256": "d8fc44e1b4c9a0f7864f4a756fa030f6829b8be8df3e3b4c12239b74ec792533"
},
"downloads": -1,
"filename": "catomatic-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d27f7ca54ec4766486080f32d3c815e3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 16225,
"upload_time": "2024-12-16T14:02:47",
"upload_time_iso_8601": "2024-12-16T14:02:47.486982Z",
"url": "https://files.pythonhosted.org/packages/d9/23/2557f6fc0925a7f0289338e83c18b2219f002d4a1f38b32bb3a919939120/catomatic-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2c138b38bc9edb60e71ea1f3d56aebfa603c41e1f127f06f53604da0fcf8ae64",
"md5": "e434c1ffe4700b5fdb8540fd70397d5d",
"sha256": "66bd4a32b343249e3372750763f553de49df6570c0d219373d0f97c29c5c0514"
},
"downloads": -1,
"filename": "catomatic-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "e434c1ffe4700b5fdb8540fd70397d5d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 17617,
"upload_time": "2024-12-16T14:02:48",
"upload_time_iso_8601": "2024-12-16T14:02:48.579024Z",
"url": "https://files.pythonhosted.org/packages/2c/13/8b38bc9edb60e71ea1f3d56aebfa603c41e1f127f06f53604da0fcf8ae64/catomatic-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-16 14:02:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fowler-lab",
"github_project": "catomatic",
"github_not_found": true,
"lcname": "catomatic"
}