🐪 camlhmp 🐪 - Classification through yAML Heuristic Mapping Protocol (__yeah, it's a stretch to
make sure 🐪 is in the name!__)
# camlhmp
`camlhmp` is a tool for generating organism typing tools from YAML schemas. The idea came
up from discussions with Tim Read about the need for a tool that would allow researchers
to more easily define typing schemas for their organisms of interest. YAML seemed like
a nice format for this due to its simplicity and readability.
_`camlhmp` is under active development, and any feedback is appreciated._
## Purpose
The primary purpose of `camlhmp` is to provide a framework that enables researchers to
independently define typing schemas for their organisms of interest using YAML. This
facilitates the management and analysis of biological data, regardless of the researcher's
experience level.
`camlhmp` does not supply any pre-defined typing schemas. Instead, it provides researchers
with the tools necessary to create and maintain their own schemas. I believe this will help
keep each schema up to date with the latest developments in its respective field.
Additionally, this really arose from a practical need to streamline my maintenance of
multiple organism typing tools. Long-term maintenance of these tools is a challenge, and
I think `camlhmp` will help me keep them up to date and consistent.
## Installation
`camlhmp` will be made available through PyPI and Bioconda. The following commands create a
conda environment with `camlhmp` installed from Bioconda:
```bash
conda create -n camlhmp -c conda-forge -c bioconda camlhmp
conda activate camlhmp
camlhmp
```
## YAML Schema Structure
The schema structure is designed to be simple and intuitive. Here is a basic skeleton of the
expected schema structure:
```yaml
%YAML 1.2
---
# metadata: general information about the schema
metadata:
  id: "" # unique identifier for the schema
  name: "" # name of the schema
  description: "" # description of the schema
  version: "" # version of the schema
  curators: [] # A list of curators of the schema

# engine: specifies the computational tools and additional parameters used for sequence
# analysis.
engine:
  tool: "" # The tool used to generate the data

# targets: Lists the specific sequence targets such as genes, proteins, or markers that the
# schema will analyze. These should be included in the associated sequence query data
targets: []

# aliases: groups multiple targets under a common name for easier reference
aliases:
  - name: "" # name of the alias
    targets: [] # list of targets that are part of the alias

# types: define specific combinations of targets and aliases to form distinct types
types:
  - name: "" # name of the type
    targets: [] # list of targets (can use aliases) that are part of the type
    excludes: [] # list of targets (or aliases) that will automatically fail the type
```
This schema has five sections:

- `metadata`: general information about the schema
- `engine`: computational requirements for sequence analysis
- `targets`: lists the sequence targets such as genes, proteins, or markers
- `aliases`: groups multiple targets under a common name for easier reference
- `types`: defines combinations of targets and aliases to form distinct types

The fields within each section are described below.
### metadata
The `metadata` section provides general information about the schema. This includes:
| Field | Type | Description |
|--------------|--------|--------------------------------------------------|
| id | string | A unique identifier for the schema |
| name | string | The name of the schema |
| description | string | A brief description of the schema |
| version | string | The version of the schema |
| curators | list | A list of curators of the schema |
### engine
The `engine` section specifies the computational tool used for sequence analysis. Currently,
only one tool can be specified, and only `blastn` is supported.
| Field | Type | Description |
|-------|--------|--------------------------------------------------|
| tool | string | The tool used to generate the data |
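Once loaded (for example with PyYAML's `yaml.safe_load`), a schema is just nested dicts and lists. As a minimal sketch, here is a hypothetical pre-flight check, not part of `camlhmp`, that assumes all five top-level sections are required and rejects any engine `tool` other than `blastn`. The schema dict is written inline so the example needs only the Python standard library.

```python
# Hypothetical schema check (not part of camlhmp). Assumes every top-level
# section is required and that blastn is the only supported engine tool.
REQUIRED_SECTIONS = ("metadata", "engine", "targets", "aliases", "types")
SUPPORTED_TOOLS = {"blastn"}


def validate_schema(schema: dict) -> list[str]:
    """Return a list of human-readable problems found in a loaded schema."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in schema]
    # An empty `engine:` key loads as None, so fall back to an empty dict.
    tool = (schema.get("engine") or {}).get("tool")
    if tool not in SUPPORTED_TOOLS:
        problems.append(f"unsupported engine tool: {tool!r}")
    return problems


# The dict a YAML loader would return for a tiny schema, written inline.
schema = {
    "metadata": {"id": "example", "name": "Example", "description": "",
                 "version": "0.0.1", "curators": []},
    "engine": {"tool": "blastn"},
    "targets": ["mecA"],
    "aliases": [],
    "types": [{"name": "T1", "targets": ["mecA"], "excludes": []}],
}

print(validate_schema(schema))  # → []
print(validate_schema({"engine": {"tool": "blastp"}})[-1])  # → unsupported engine tool: 'blastp'
```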
### targets
The `targets` section lists the specific sequence targets such as genes, proteins, or markers
that the schema will analyze. These should be included in the associated sequence query data.
| Field | Type | Description |
|---------|--------|------------------------------------------------|
| targets | list | A list of targets to be analyzed |
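Because every target is expected to have a sequence in the query data, a curator might want to catch typos before running anything. The sketch below assumes target names must match the first word of each FASTA header in the file passed to `--targets`; `targets_without_sequence` is a hypothetical helper and the sequences are placeholders.

```python
import io

# Hypothetical helpers for a curator-side sanity check (not part of camlhmp).


def fasta_ids(lines) -> set[str]:
    """Collect sequence IDs, i.e. the first word after '>' on each FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def targets_without_sequence(targets: list[str], fasta_lines) -> list[str]:
    """Return schema targets with no matching FASTA record (assumes ID == target name)."""
    present = fasta_ids(fasta_lines)
    return [target for target in targets if target not in present]


# Placeholder query FASTA: mecI is listed in the schema but has no record.
QUERY_FASTA = ">ccrA1\nACGT\n>mecA some description\nTTGA\n"
print(targets_without_sequence(["ccrA1", "mecA", "mecI"], io.StringIO(QUERY_FASTA)))  # → ['mecI']
```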
### aliases
`aliases` are a convenient way to group multiple targets under a common name for easier
reference.
| Field | Type | Description |
|---------|--------|------------------------------------------------|
| name | string | The name of the alias |
| targets | list | A list of targets that are part of the alias |
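To show how an alias stands in for its members, here is a minimal sketch of alias expansion using two aliases from the SCCmec example schema. `expand_aliases` is a hypothetical helper; it assumes aliases are not nested and treats any name that is not an alias as a plain target.

```python
def expand_aliases(names: list[str], aliases: list[dict]) -> list[str]:
    """Replace alias names with their member targets (hypothetical helper).

    Names that are not aliases are kept as plain targets. Order is preserved and
    duplicate targets are dropped. Nested aliases are not handled (assumption).
    """
    lookup = {alias["name"]: alias["targets"] for alias in aliases}
    expanded: list[str] = []
    for name in names:
        for target in lookup.get(name, [name]):
            if target not in expanded:
                expanded.append(target)
    return expanded


aliases = [
    {"name": "ccr Type 1", "targets": ["ccrA1", "ccrB1"]},
    {"name": "mec Class B", "targets": ["IS431", "mecA", "mecR1", "IS1272"]},
]
print(expand_aliases(["ccr Type 1", "mec Class B"], aliases))
# → ['ccrA1', 'ccrB1', 'IS431', 'mecA', 'mecR1', 'IS1272']
```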
### types
The `types` section defines specific combinations of targets and aliases to form distinct
types.
| Field | Type | Description |
|---------|--------|----------------------------------------------------------------------|
| name | string | The name of the type |
| targets | list | A list of targets (or aliases) that are part of the type |
| excludes | list | A list of targets (or aliases) that will automatically fail the type |
### Example Schema: Partial SCCmec Typing
Here is an example of a partial schema for SCCmec typing:
```yaml
%YAML 1.2
---
# metadata: general information about the schema
metadata:
  id: "sccmec_partial" # unique identifier for the schema
  name: "SCCmec Typing" # name of the schema
  description: "A partial schema for SCCmec typing" # description of the schema
  version: "0.0.1" # version of the schema
  curators: # A list of curators of the schema
    - "Robert Petit"

# engine: specifies the computational tools and additional parameters used for sequence
# analysis.
engine:
  tool: blastn # The tool used to generate the data

# targets: Lists the specific sequence targets such as genes, proteins, or markers that the
# schema will analyze. These should be included in the associated sequence query data
targets:
  - "ccrA1"
  - "ccrA2"
  - "ccrA3"
  - "ccrB1"
  - "ccrB2"
  - "ccrB3"
  - "IS431"
  - "IS1272"
  - "mecA"
  - "mecI"
  - "mecR1"

# aliases: groups multiple targets under a common name for easier reference
aliases:
  - name: "ccr Type 1" # name of the alias
    targets: ["ccrA1", "ccrB1"] # list of targets that are part of the alias
  - name: "ccr Type 2"
    targets: ["ccrA2", "ccrB2"]
  - name: "ccr Type 3"
    targets: ["ccrA3", "ccrB3"]
  - name: "mec Class A"
    targets: ["IS431", "mecA", "mecR1", "mecI"]
  - name: "mec Class B"
    targets: ["IS431", "mecA", "mecR1", "IS1272"]

# types: define specific combinations of targets and aliases to form distinct types
types:
  - name: "I" # name of the type
    targets: # list of targets (can use aliases) that are part of the type
      - "ccr Type 1"
      - "mec Class B"
  - name: "II"
    targets:
      - "ccr Type 2"
      - "mec Class A"
  - name: "III"
    targets:
      - "ccr Type 3"
      - "mec Class A"
  - name: "IV"
    targets:
      - "ccr Type 2"
      - "mec Class B"
```
From this schema, `camlhmp` can generate a typing tool that can be used to analyze input
assemblies. This is only a partial schema, as there are many more SCCmec types and subtypes,
but it should be straightforward to extend it with additional targets and types.
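The schema itself does not spell out how a type is decided, so the following sketch assumes the rule suggested by the `{PREFIX}.details.tsv` example: a type matches when all of its alias-expanded targets have a hit and none of its `excludes` do. It illustrates that rule on the partial SCCmec schema; it is not `camlhmp`'s implementation, and the hit set is hypothetical.

```python
# Aliases and types copied from the partial SCCmec schema above.
ALIASES = {
    "ccr Type 1": ["ccrA1", "ccrB1"],
    "ccr Type 2": ["ccrA2", "ccrB2"],
    "ccr Type 3": ["ccrA3", "ccrB3"],
    "mec Class A": ["IS431", "mecA", "mecR1", "mecI"],
    "mec Class B": ["IS431", "mecA", "mecR1", "IS1272"],
}
TYPES = [
    {"name": "I", "targets": ["ccr Type 1", "mec Class B"]},
    {"name": "II", "targets": ["ccr Type 2", "mec Class A"]},
    {"name": "III", "targets": ["ccr Type 3", "mec Class A"]},
    {"name": "IV", "targets": ["ccr Type 2", "mec Class B"]},
]


def expand(names: list[str]) -> list[str]:
    """Expand alias names into targets, keeping order and dropping duplicates."""
    targets: list[str] = []
    for name in names:
        for target in ALIASES.get(name, [name]):
            if target not in targets:
                targets.append(target)
    return targets


def evaluate(found: set[str]) -> list[dict]:
    """Check every type against the targets that had a passing hit (assumed rule)."""
    results = []
    for type_def in TYPES:
        missing = [t for t in expand(type_def["targets"]) if t not in found]
        # Any excluded target that was found fails the type outright.
        excluded = [t for t in expand(type_def.get("excludes", [])) if t in found]
        results.append({
            "type": type_def["name"],
            "status": not missing and not excluded,
            "missing": missing,
            "excluded": excluded,
        })
    return results


# Hypothetical hit set: the ccr type 2 genes plus the mec class B genes.
hits = {"ccrA2", "ccrB2", "IS431", "mecA", "mecR1", "IS1272"}
for row in evaluate(hits):
    print(row["type"], row["status"], ",".join(row["missing"]))
# I False ccrA1,ccrB1
# II False mecI
# III False ccrA3,ccrB3,mecI
# IV True
```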
## `camlhmp-blast`
`camlhmp-blast` is a command that allows users to type their samples using a provided schema
with BLAST algorithms.
### Usage
```bash
 🐪 camlhmp-blast 🐪 - Classify assemblies with a camlhmp schema using BLAST

╭─ Options ─────────────────────────────────────────────────────────────────────────────╮
│   --version       -V           Show the version and exit.                             │
│ * --input         -i  TEXT     Input file in FASTA format to classify [required]      │
│ * --yaml          -y  TEXT     YAML file documenting the targets and types [required] │
│ * --targets       -t  TEXT     Query targets in FASTA format [required]               │
│   --outdir        -o  PATH     Directory to write output [default: ./]                │
│   --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]      │
│   --min-pident        INTEGER  Minimum percent identity to count a hit [default: 95]  │
│   --min-coverage      INTEGER  Minimum percent coverage to count a hit [default: 95]  │
│   --force                      Overwrite existing reports                             │
│   --verbose                    Increase the verbosity of output                       │
│   --silent                     Only critical errors will be printed                   │
│   --help                       Show this message and exit.                            │
╰───────────────────────────────────────────────────────────────────────────────────────╯
```
### Output Files
`camlhmp-blast` will generate three output files:
| File Name | Description |
|------------------------|-------------------------------------------------|
| `{PREFIX}.tsv` | A tab-delimited file with the predicted type |
| `{PREFIX}.blast.tsv` | A tab-delimited file of all blast hits |
| `{PREFIX}.details.tsv` | A tab-delimited file with details for each type |
#### Example {PREFIX}.tsv
```tsv
sample	type	targets	schema	version	comment
saureus	V	ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1	sccmec	1.0.0	
```
| Column | Description |
|---------|--------------------------------------------------|
| sample | The sample name as determined by `--prefix` |
| type | The predicted type |
| targets | The targets for the given type that had a hit |
| schema | The schema used to determine the type |
| version | The version of the schema used |
| comment | A small comment about the result |
#### Example {PREFIX}.blast.tsv
```tsv
qseqid	sseqid	pident	qcovs	qlen	slen	length	nident	mismatch	gapopen	qstart	qend	sstart	send	evalue	bitscore
ccrC1	AB121219.1	100.000	100	1623	28612	1623	1623	0	0	1	1623	16132	17754	0.0	2998
IS431_1	AB121219.1	100.000	100	791	28612	791	791	0	0	1	791	8221	9011	0.0	1461
IS431_1	AB121219.1	99.704	100	675	28612	675	673	2	0	1	675	2693	3367	0.0	1236
IS431_1	AB121219.1	98.519	100	675	28612	675	665	10	0	1	675	8951	8277	0.0	1192
...
```
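This is BLAST tabular output (`-outfmt 6`) with the columns shown in the header row, which
extend the 12 default columns with `qcovs`, `qlen`, `slen`, and `nident`.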
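As a rough illustration of how `--min-pident` and `--min-coverage` could be applied to this file, the sketch below keeps the query targets that have at least one hit at or above both thresholds (95 by default, as in the usage text). Treating `qcovs` as the coverage column and using inclusive comparisons are assumptions; `passing_targets` is a hypothetical helper, and the two rows are copied from the example above.

```python
import csv
import io

# Header plus two hits copied from the example {PREFIX}.blast.tsv above.
BLAST_TSV = (
    "qseqid\tsseqid\tpident\tqcovs\tqlen\tslen\tlength\tnident\tmismatch\tgapopen\t"
    "qstart\tqend\tsstart\tsend\tevalue\tbitscore\n"
    "ccrC1\tAB121219.1\t100.000\t100\t1623\t28612\t1623\t1623\t0\t0\t1\t1623\t16132\t17754\t0.0\t2998\n"
    "IS431_1\tAB121219.1\t98.519\t100\t675\t28612\t675\t665\t10\t0\t1\t675\t8951\t8277\t0.0\t1192\n"
)


def passing_targets(handle, min_pident: float = 95, min_coverage: float = 95) -> set[str]:
    """Return query IDs with at least one hit at or above both thresholds (assumed rule)."""
    found = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: coverage is taken from the qcovs column.
        if float(row["pident"]) >= min_pident and float(row["qcovs"]) >= min_coverage:
            found.add(row["qseqid"])
    return found


print(sorted(passing_targets(io.StringIO(BLAST_TSV))))                 # → ['IS431_1', 'ccrC1']
print(sorted(passing_targets(io.StringIO(BLAST_TSV), min_pident=99)))  # → ['ccrC1']
```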
#### Example {PREFIX}.details.tsv
```tsv
sample	type	status	targets	missing	schema	version	comment
type-v	I	False	IS431,mecA,mecR1	ccrA1,ccrB1,IS1272	sccmec	1.0.0	
type-v	II	False	IS431,mecA,mecR1	ccrA2,ccrB2,mecI	sccmec	1.0.0	
type-v	III	False	IS431,mecA,mecR1	ccrA3,ccrB3,mecI	sccmec	1.0.0	
type-v	IV	False	IS431,mecA,mecR1	ccrA2,ccrB2,IS1272	sccmec	1.0.0	
type-v	V	True	ccrC1,IS431_1,mecA,mecR1,IS431_2		sccmec	1.0.0	
type-v	VI	False	IS431,mecA,mecR1	ccrA4,ccrB4,IS1272	sccmec	1.0.0	
type-v	VII	False	ccrC1,IS431_1,mecA,mecR1,IS431_2	IS12960D	sccmec	1.0.0	
type-v	VIII	False	IS431,mecA,mecR1	ccrA4,ccrB4,mecI	sccmec	1.0.0	Excluded target ccrC1 found, failing type VIII
type-v	IX	False	IS431_1,mecA,mecR1,IS431_2	ccrA1,ccrB1	sccmec	1.0.0	
type-v	X	False	IS431_1,mecA,mecR1,IS431_2	ccrA1,ccrB6	sccmec	1.0.0	
type-v	XI	False	mecA,mecR1	ccrA1,ccrB3,blaZ,mecI	sccmec	1.0.0	
type-v	XII	False	IS431_1,mecA,mecR1,IS431_2	ccrC2	sccmec	1.0.0	
type-v	XIII	False	IS431,mecA,mecR1	ccrC2,mecI	sccmec	1.0.0	
type-v	XIV	False	ccrC1,IS431,mecA,mecR1	mecI	sccmec	1.0.0	
type-v	XV	False	IS431,mecA,mecR1	ccrA1,ccrB6,mecI	sccmec	1.0.0	
```
This file provides a detailed view of the results. The columns are:
| Column | Description |
|---------|----------------------------------------------------|
| sample | The sample name as determined by `--prefix` |
| type | The predicted type |
| status | Whether the sample matched the type (`True`) or not (`False`) |
| targets | The targets for the given type that had a match |
| missing | The targets for the given type that were not found |
| schema | The schema used to determine the type |
| version | The version of the schema used |
| comment | A small comment about the result |
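A short sketch of reading this file with Python's `csv` module, for example to list the matching types and what each one is missing. It assumes `status` holds the strings `True`/`False` and that `missing` is comma-separated, as in the example; `summarize` is a hypothetical helper and the rows are copied from the example above.

```python
import csv
import io

# Three rows copied from the example {PREFIX}.details.tsv above.
DETAILS_TSV = (
    "sample\ttype\tstatus\ttargets\tmissing\tschema\tversion\tcomment\n"
    "type-v\tIV\tFalse\tIS431,mecA,mecR1\tccrA2,ccrB2,IS1272\tsccmec\t1.0.0\t\n"
    "type-v\tV\tTrue\tccrC1,IS431_1,mecA,mecR1,IS431_2\t\tsccmec\t1.0.0\t\n"
    "type-v\tVIII\tFalse\tIS431,mecA,mecR1\tccrA4,ccrB4,mecI\tsccmec\t1.0.0\t"
    "Excluded target ccrC1 found, failing type VIII\n"
)


def summarize(handle) -> dict[str, tuple[bool, list[str]]]:
    """Map each type to (matched?, missing targets); hypothetical helper."""
    summary = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # An empty `missing` field splits to [''], so drop empty names.
        missing = [target for target in row["missing"].split(",") if target]
        summary[row["type"]] = (row["status"] == "True", missing)
    return summary


summary = summarize(io.StringIO(DETAILS_TSV))
print([name for name, (matched, _) in summary.items() if matched])  # → ['V']
print(summary["VIII"][1])  # → ['ccrA4', 'ccrB4', 'mecI']
```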
## `camlhmp-extract`
`camlhmp-extract` is a command that allows users to extract targets from a set of references.
You should think of this script as a "helper" script for curators. It allows you to maintain
a TSV file with the targets and their positions in the reference sequences. `camlhmp-extract`
will then extract the targets from the reference sequences and write them to a FASTA file.
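The layout of that TSV is not described here, so the sketch below invents one purely for illustration: hypothetical columns `target`, `reference`, `start`, and `stop`, with 1-based inclusive coordinates and `start > stop` meaning the target lies on the reverse strand. It shows the general idea of slicing targets out of reference sequences and writing them as FASTA; it is not how `camlhmp-extract` is implemented.

```python
import csv
import io

# Complement table for DNA; IUPAC ambiguity codes other than N are not handled.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle) -> dict[str, str]:
    """Parse FASTA text into {sequence ID: sequence}."""
    sequences: dict[str, str] = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_targets(tsv_handle, references: dict[str, str]):
    """Yield (target, sequence) pairs; the TSV columns used here are hypothetical."""
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        sequence = references[row["reference"]]
        start, stop = int(row["start"]), int(row["stop"])
        if start <= stop:
            yield row["target"], sequence[start - 1:stop]
        else:
            # Reverse strand: slice forward, then reverse-complement.
            yield row["target"], sequence[stop - 1:start].translate(COMPLEMENT)[::-1]


references = read_fasta(io.StringIO(">ref1\nAACCGG\nTTAA\n"))
targets_tsv = io.StringIO("target\treference\tstart\tstop\ngeneX\tref1\t3\t6\ngeneY\tref1\t4\t1\n")
for target, seq in extract_targets(targets_tsv, references):
    print(f">{target}\n{seq}")
```

With the toy inputs above, `geneX` comes out as `CCGG` and `geneY` (reverse strand) as `GGTT`.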
### Usage
```bash
 🐪 camlhmp-extract 🐪 - Extract typing targets from a set of reference sequences

╭─ Required Options ────────────────────────────────────────────────────────────────────╮
│ * --path          -i  TEXT     The path where input files are located [required]      │
│ * --targets       -t  TEXT     A TSV of targets to extract in FASTA format [required] │
╰───────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ──────────────────────────────────────────────────────────────────╮
│   --outdir        -o  TEXT     The path to save the extracted targets                 │
│   --verbose                    Increase the verbosity of output                       │
│   --silent                     Only critical errors will be printed                   │
│   --version       -V           Show the version and exit.                             │
│   --help                       Show this message and exit.                            │
╰───────────────────────────────────────────────────────────────────────────────────────╯
```
## Citations
If you make use of this tool, please cite the following:
* **[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi)**
Basic Local Alignment Search Tool
*Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. [BLAST+: architecture and applications](http://dx.doi.org/10.1186/1471-2105-10-421). BMC Bioinformatics 10, 421 (2009)*
## Naming
If I'm being honest, I really wanted to name a tool with "camel" in it because they are my
wife's favorite animal 🐪 and they also remind me of my friends in Oman!

Once it was decided YAML was going to be the format for defining schemas, I quickly stumbled
on "Classification through YAML" and soon found out I wasn't the only one who had thought
of "CAML". But no matter, it was decided it would be something with "CAML", and then Tim Read
came in with the save and suggested "Heuristic Mapping Protocol". So, here we are - _camlhmp_!
## License
I'm not a lawyer and MIT has always been my go-to license. So, MIT it is!
Raw data
{
"_id": null,
"home_page": "https://github.com/rpetit3/camlhmp",
"name": "camlhmp",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": null,
"keywords": "bioinformatics, bacteria, serotype, genotype",
"author": "Robert A. Petit III",
"author_email": "robbie.petit@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/4d/f0/90bcd69982dcc0de0b3269b7f3fe1227ccef6e55c447cecd4eee3ad36826/camlhmp-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Classification through yAML Heuristic Mapping Protocol",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/rpetit3/camlhmp",
"Repository": "https://github.com/rpetit3/camlhmp"
},
"split_keywords": [
"bioinformatics",
" bacteria",
" serotype",
" genotype"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "98987e7a47142679b1a2a5b88e520e0b001db755e83811bf15aeab05c10f53bb",
"md5": "21f4dcf9faae75b36be9cd1e7745e1a7",
"sha256": "1d501f675c3a987b6ba6a5db3d7fb769f18541f95f1b2e3276ad856cbd7ad819"
},
"downloads": -1,
"filename": "camlhmp-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "21f4dcf9faae75b36be9cd1e7745e1a7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 17499,
"upload_time": "2024-05-01T00:01:53",
"upload_time_iso_8601": "2024-05-01T00:01:53.910758Z",
"url": "https://files.pythonhosted.org/packages/98/98/7e7a47142679b1a2a5b88e520e0b001db755e83811bf15aeab05c10f53bb/camlhmp-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4df090bcd69982dcc0de0b3269b7f3fe1227ccef6e55c447cecd4eee3ad36826",
"md5": "d1ed37cc4898ac458eb60f3e1f08967f",
"sha256": "c2b19d85be14c5764f3ef4c977f958d11e6612d2c4b6c790c1c4e38f1c8b9f79"
},
"downloads": -1,
"filename": "camlhmp-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "d1ed37cc4898ac458eb60f3e1f08967f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.11",
"size": 17161,
"upload_time": "2024-05-01T00:01:55",
"upload_time_iso_8601": "2024-05-01T00:01:55.318960Z",
"url": "https://files.pythonhosted.org/packages/4d/f0/90bcd69982dcc0de0b3269b7f3fe1227ccef6e55c447cecd4eee3ad36826/camlhmp-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-01 00:01:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rpetit3",
"github_project": "camlhmp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "camlhmp"
}