howard-ann

Name	howard-ann
Version	0.12.1.0
home_page	https://github.com/bioinfo-chru-strasbourg/howard
Summary	HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
upload_time	2024-12-20 12:16:57
maintainer	None
docs_url	None
author	Antony Le Bechec
requires_python	<3.11,>=3.10
license	None
keywords	vcf variant annotation ranking
requirements	pandas duckdb pyarrow bio pyvcf3 dask pytest coverage fastparquet polars genomepy flake8 lazy pynose pyfaidx multiprocesspandas bgzip pgzip mgzip pysam jproperties psutil markdown tabulate md_toc numpy beautifulsoup4 pyBigWig cyvcf2 pypandoc_binary

# HOWARD

<figure>
<img src="images/icon.png" title="HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant
Discovery"
alt="HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant
Discovery" />
<figcaption aria-hidden="true">
HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic
variant Discovery
</figcaption>
</figure>

Highly Open Workflow for Annotation & Ranking toward genomic variant
Discovery

HOWARD annotates and prioritizes genetic variations, calculates and
normalizes annotations, translates files in multiple formats (e.g. vcf,
tsv, parquet) and generates variants statistics.

HOWARD annotation is mainly based on a built-in Parquet annotation
method, and external tools such as BCFTOOLS, ANNOVAR, snpEff, Exomiser
and Splice (see docs, automatically downloaded if needed). Parquet
annotation uses annotation databases structured like VCF or BED files,
stored in multiple file formats: Parquet/duckdb, VCF, BED, TSV, CSV,
TBL, JSON.

HOWARD calculation processes variant information to derive new
information: it harmonizes allele frequency (VAF), extracts Nomen
(transcript, cNomen, pNomen...) from HGVS fields with an optional list
of personalized transcripts, and generates VaRank format barcodes.

HOWARD prioritization algorithm uses profiles to flag variants (as
passed or filtered), calculate a prioritization score, and automatically
generate a comment for each variant (example: 'polymorphism identified
in dbSNP. associated to Lung Cancer. Found in ClinVar
database'). Prioritization profiles are defined in a configuration file.
A profile is defined as a list of annotation/value pairs, using wildcards
and comparison options (contains, lower than, greater than, equal...).
Annotation fields may be quality values (usually from callers, such as
'GQ', 'DP') or other annotation fields provided by annotation tools,
such as HOWARD itself (example: COSMIC, Clinvar, 1000genomes, PolyPhen,
SIFT). Multiple profiles can be used simultaneously, which is useful to
define multiple validation/prioritization levels (example: 'standard',
'stringent', 'rare variants', 'low allele frequency').

HOWARD translates VCF files into multiple formats (e.g. VCF, TSV,
Parquet), with options for sorting variants on specific fields
(example: 'prioritization score', 'allele frequency', 'gene symbol'),
including/excluding annotations/fields, including/excluding variants,
and adding fixed columns.

HOWARD generates statistics files with a specific algorithm, snpEff and
BCFTOOLS.

HOWARD is multithreaded through the number of variants and by database
(data-scaling).

HOWARD can be extended with plugins for further analyses.

## Table of contents

- [Installation](#installation)
  - [Download](#download)
  - [Python](#python)
  - [Docker](#docker)
  - [Databases](#databases)
  - [Configuration](#configuration)
- [Tools](#tools)
  - [Parameters](#parameters)
  - [Stats](#stats)
  - [Convert](#convert)
  - [Query](#query)
  - [Annotation](#annotation)
  - [Calculation](#calculation)
  - [Prioritization](#prioritization)
  - [HGVS annotation](#hgvs-annotation)
  - [Process](#process)
- [Documentation](#documentation)
- [Contact](#contact)

# Installation

HOWARD can be installed using [Python](#python), and a [Docker](#docker)
installation provides a CLI (Command Line Interface) with all external
tools and useful databases. [Databases](#databases) can be downloaded
automatically, or built in-house (created or downloaded).

## Download

Download the sources from GitHub:

```bash
mkdir -p ~/howard/src
cd ~/howard/src
git clone https://github.com/bioinfo-chru-strasbourg/howard.git .
```

## Python

Install HOWARD with pip (here inside a conda environment), and run
HOWARD to display the help options:

``` bash
conda create --name=howard python=3.10
conda activate howard
python -m pip install -e .
howard --help
```

``` text
usage: howard [-h] {query,stats,convert,hgvs,annotation,calculation,prioritization,process,databases,gui} ...

HOWARD:0.12.1
Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations,
convert on multiple formats, query variations and generates statistics

Shared arguments:
  -h, --help            show this help message and exit

Tools:
  {query,stats,convert,hgvs,annotation,calculation,prioritization,process,databases,gui}
    query               Query genetic variations file in SQL format.
    stats               Statistics on genetic variations file.
    convert             Convert genetic variations file to another format.
    hgvs                HGVS annotation (HUGO internation nomenclature) using refGene,
                        genome and transcripts list.
    annotation          Annotation of genetic variations file using databases/files and tools.
    calculation         Calculation operations on genetic variations file and genotype information.
    prioritization      Prioritization of genetic variations based on annotations criteria (profiles).
    process             Full genetic variations process: annotation, calculation, prioritization, 
                        format, query, filter...
    databases           Download databases and needed files for howard and associated tools
    gui                 Graphical User Interface tools
```

Install the HOWARD Graphical User Interface with pip and its
supplementary packages, then run it as a tool:

``` bash
python -m pip install -r requirements-gui.txt
howard gui
```

<figure>
<img src="images/howard-gui.png" title="HOWARD Graphical User Interface"
alt="HOWARD Graphical User Interface" />
<figcaption aria-hidden="true">
HOWARD Graphical User Interface
</figcaption>
</figure>

## Docker

To build, set up and create a persistent CLI (a running container
with all useful external tools such as
[BCFTools](https://samtools.github.io/bcftools/),
[snpEff](https://pcingola.github.io/SnpEff/),
[Annovar](https://annovar.openbioinformatics.org/),
[Exomiser](https://www.sanger.ac.uk/tool/exomiser/)), the docker-compose
command builds images and launches services as containers.

``` bash
docker-compose up -d
```

A setup container (HOWARD-setup) will download useful databases (this
can take a while). To skip the database download (see [Databases
section](#databases) to download manually), start only the CLI service:

``` bash
docker-compose up -d HOWARD-CLI
```

A Command Line Interface container (HOWARD-CLI) is started with host
data and databases folders mounted (by default in ~/howard folder, i.e.
`~/howard/data:/data` and `~/howard/databases:/databases`). Let's play
within Docker HOWARD-CLI service!

``` bash
docker exec -ti HOWARD-CLI bash
howard --help
```

<details>
<summary>
More details
</summary>

Docker HOWARD-CLI container (Command Line Interface) can be used to
execute commands.

> Example: Query of an existing VCF
>
> ``` bash
> docker exec HOWARD-CLI \
>    howard query \
>    --input='/tool/tests/data/example.vcf.gz' \
>    --query='SELECT * FROM variants'
> ```

> Example: VCF annotation using HOWARD-CLI (snpEff and ANNOVAR databases
> will be automatically downloaded), and query list of genes with HGVS
>
> ``` bash
> docker exec --workdir=/tool HOWARD-CLI \
>    howard process \
>       --config='config/config.json' \
>       --param='config/param.json' \
>       --input='tests/data/example.vcf.gz' \
>       --output='/tmp/example.process.tsv' \
>       --explode_infos \
>       --query="SELECT NOMEN, PZFlag, PZScore, PZComment \
>                FROM variants \
>                ORDER BY PZScore DESC"
> ```

</details>

## Databases

Multiple databases can be automatically downloaded with the databases
tool, such as:

| database | description |
|----|----------------|
| [Genome](https://genome.ucsc.edu/cgi-bin/hgGateway) | Genome Reference Consortium Human |
| [Annovar](https://annovar.openbioinformatics.org/en/latest/) | ANNOVAR is an efficient software tool to utilize update-to-date information to functionally annotate genetic variants detected from diverse genomes |
| [snpEff](https://pcingola.github.io/SnpEff/) | Genetic variant annotation, and functional effect prediction toolbox |
| [refSeq](https://www.ncbi.nlm.nih.gov/refseq/) | A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein |
| [dbSNP](https://www.ncbi.nlm.nih.gov/snp/) | dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations |
| [dbNSFP](https://sites.google.com/site/jpopgen/dbNSFP) | dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome |
| [AlphaMissense](https://github.com/google-deepmind/alphamissense) | AlphaMissense model implementation |
| [Exomiser](https://www.sanger.ac.uk/tool/exomiser/) | The Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data |

<details>
<summary>
More details
</summary>

> Example: Download multiple databases at the same time for assembly
> 'hg19' (can take a while)
>
> ``` bash
> howard databases \
>    --assembly=hg19 \
>    --download-genomes='~/howard/databases/genomes/current' \
>    --download-genomes-provider='UCSC' \
>    --download-genomes-contig-regex='chr[0-9XYM]+$' \
>    --download-annovar='~/howard/databases/annovar/current' \
>    --download-annovar-files='refGene,cosmic70,nci60' \
>    --download-snpeff='~/howard/databases/snpeff/current' \
>    --download-refseq='~/howard/databases/refseq/current' \
>    --download-refseq-format-file='ncbiRefSeq.txt' \
>    --download-dbnsfp='~/howard/databases/dbnsfp/current' \
>    --download-dbnsfp-release='4.4a' \
>    --download-dbnsfp-subdatabases \
>    --download-alphamissense='~/howard/databases/alphamissense/current' \
>    --download-exomiser='~/howard/databases/exomiser/current' \
>    --download-dbsnp='~/howard/databases/dbsnp/current' \
>    --download-dbsnp-vcf \
>    --threads=8
> ```

See [HOWARD Help Databases tool](docs/help.md#databases-tool) for more
information.

Databases can also be built in-house, starting from an existing
annotation file, especially using the [HOWARD convert](#convert) tool.
These files need to contain specific fields (depending on the annotation
type):

- variant annotation: '#CHROM', 'POS', 'ALT', 'REF'
- region annotation: '#CHROM', 'START', 'STOP'

Each database annotation file is associated with a 'header' file
('.hdr'), in VCF header format, to describe annotations within the
database.
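As a rough illustration of this layout, the following Python sketch writes a tiny variant annotation table with the required columns, plus a companion header in VCF header format. The file names, the `.tsv.hdr` naming and the `MY_SCORE` annotation are assumptions made for this example only; the exact conventions expected by HOWARD should be checked in its documentation.

``` python
import csv
import tempfile
from pathlib import Path

# Hypothetical annotation rows: required variant columns plus one made-up annotation.
ROWS = [
    {"#CHROM": "chr1", "POS": 28736, "REF": "A", "ALT": "C", "MY_SCORE": 0.91},
    {"#CHROM": "chr7", "POS": 55249063, "REF": "G", "ALT": "A", "MY_SCORE": 0.12},
]

# VCF-style header lines describing the annotation (MY_SCORE is invented).
HEADER_LINES = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=MY_SCORE,Number=1,Type=Float,Description="Hypothetical example score">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
]


def write_database(folder: Path) -> tuple[Path, Path]:
    """Write the TSV table and its header file, and return both paths."""
    table = folder / "my_database.tsv"
    header = folder / "my_database.tsv.hdr"  # assumed naming: table name + '.hdr'
    with table.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(ROWS[0]), delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(ROWS)
    header.write_text("\n".join(HEADER_LINES) + "\n")
    return table, header


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table_path, header_path = write_database(Path(tmp))
        print(table_path.read_text())
        print(header_path.read_text())
```

A region annotation file would follow the same pattern with '#CHROM', 'START' and 'STOP' columns instead.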

</details>

## Configuration

The HOWARD Configuration JSON file defines the default configuration
regarding resources (e.g. threads, memory), settings (e.g. verbosity,
temporary files), default folders (e.g. for databases) and paths to
external tools.

See [HOWARD Configuration JSON](docs/help.config.md) for more
information.

# Tools

HOWARD annotates and prioritizes genetic variations, calculates and
normalizes annotations, converts between multiple formats, queries
variations and generates statistics. These tools require options or a
[Parameters JSON](docs/help.param.md) file.

## Parameters

The HOWARD Parameters JSON file defines parameters to process
annotations, prioritization, calculations, conversions and queries. Use
this parameters file to configure tools, instead of options or as a main
configuration (options will replace parameters in the JSON file).

See [HOWARD Parameters JSON](docs/help.param.md) for more information.
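The precedence rule (options replace parameters from the JSON file) can be pictured with a minimal Python sketch. This is only an illustration of the idea, not HOWARD's implementation: the flat parameter keys used here are hypothetical, and real parameter files are structured as described in the linked documentation.

``` python
import json


def merge_parameters(json_text: str, cli_options: dict) -> dict:
    """Return parameters from JSON, overridden by any option explicitly set."""
    params = json.loads(json_text)
    # Only options that were actually provided (not None) take precedence.
    overrides = {key: value for key, value in cli_options.items() if value is not None}
    return {**params, **overrides}


if __name__ == "__main__":
    # Hypothetical flat parameters, for illustration only.
    param_json = '{"calculations": "vartype", "explode_infos": false}'
    options = {"calculations": "vartype,NOMEN", "explode_infos": None}
    print(merge_parameters(param_json, options))
```

Here the `calculations` value comes from the option, while `explode_infos` keeps the value from the JSON text because no option was given for it.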

## Stats

Statistics on genetic variations, such as: number of variants, number of
samples, statistics by chromosome, genotypes by sample, annotations.
These statistics can be applied to VCF files and all database
annotation files.
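To make these kinds of counts concrete, here is a small standard-library Python sketch that computes a few of them on an in-memory VCF. It is a conceptual illustration only and does not reproduce the output or the implementation of the stats tool; the VCF content is made up, and the sketch assumes every record has a `GT` entry in its FORMAT column.

``` python
from collections import Counter

# Tiny in-memory VCF used only for this illustration (values are made up).
COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "\t".join(COLUMNS + ["sample1", "sample2"]),
    "\t".join(["chr1", "28736", ".", "A", "C", "100", "PASS", "DP=50", "GT", "0/1", "1/1"]),
    "\t".join(["chr1", "69101", ".", "A", "G", "100", "PASS", "DP=50", "GT", "0/1", "0/0"]),
    "\t".join(["chr7", "55249063", ".", "G", "A", "100", "PASS", "DP=125", "GT", "1/1", "0/1"]),
]


def basic_stats(lines):
    """Count variants, samples, variants per chromosome and genotypes per sample."""
    samples, n_variants = [], 0
    by_chrom, genotypes = Counter(), {}
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and empty lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            genotypes = {sample: Counter() for sample in samples}
            continue
        n_variants += 1
        by_chrom[fields[0]] += 1
        gt_index = fields[8].split(":").index("GT")  # assumes GT is present
        for sample, call in zip(samples, fields[9:]):
            genotypes[sample][call.split(":")[gt_index]] += 1
    return {
        "variants": n_variants,
        "samples": len(samples),
        "by_chromosome": dict(by_chrom),
        "genotypes_by_sample": {name: dict(count) for name, count in genotypes.items()},
    }


if __name__ == "__main__":
    print(basic_stats(VCF_LINES))
```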

<details>
<summary>
More details
</summary>

> Example: Show example VCF statistics and brief overview
>
> ``` bash
> howard stats \
>    --input='tests/data/example.vcf.gz'
> ```

See [HOWARD Help Stats tool](docs/help.md#stats-tool) for more
information.

</details>

## Convert

Convert a genetic variations file to another format. Multiple formats
are available, such as the usual and official VCF format, but also other
formats such as TSV, CSV, TBL, JSON and Parquet/duckDB. These formats
need a header '.hdr' file to take advantage of the power of howard
(especially through INFO/tag definition), and the howard convert tool
automatically generates this header file for further use (otherwise, a
default '.hdr' file is generated).

<details>
<summary>
More details
</summary>

> Example: Translate VCF into TSV, export INFO/tags into columns, and
> show output file
>
> ``` bash
> howard convert \
>    --input='tests/data/example.vcf.gz' \
>    --explode_infos \
>    --output='/tmp/example.tsv'
> cat '/tmp/example.tsv'
> ```

See [HOWARD Help Convert tool](docs/help.md#convert-tool) for more
options.

</details>

## Query

Query genetic variations in SQL format. Data can be loaded into the
'variants' table from various formats (e.g. VCF, TSV, Parquet...). Using
the 'explode' option allows querying on INFO/tag annotations. An SQL
query can also use external data within the request, such as Parquet
files.
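Conceptually, exploding INFO means turning the semicolon-separated `TAG=value` entries of the VCF INFO column into separate columns that a query can filter on. The Python sketch below illustrates that idea with the standard library; it is not HOWARD's implementation, and the sample records and the filter are invented for the example.

``` python
def explode_info(info: str) -> dict:
    """Split a VCF INFO string into a {tag: value} mapping."""
    exploded = {}
    if info in ("", "."):
        return exploded  # '.' marks an empty INFO column in VCF
    for entry in info.split(";"):
        if not entry:
            continue
        tag, separator, value = entry.partition("=")
        exploded[tag] = value if separator else True  # flags carry no value
    return exploded


# Hypothetical records, reduced to a position and an INFO string.
RECORDS = [
    {"POS": 28736, "INFO": "CLNSIG=pathogenic;DB"},
    {"POS": 69101, "INFO": "DP=50"},
    {"POS": 35144, "INFO": "DP=12"},
]


def select_positions(records, min_depth=50):
    """Keep records with enough depth or with a clinical significance tag."""
    selected = []
    for record in records:
        tags = explode_info(record["INFO"])
        if int(tags.get("DP", 0)) >= min_depth or "CLNSIG" in tags:
            selected.append(record["POS"])
    return selected


if __name__ == "__main__":
    print(explode_info("DP=50;CLNSIG=pathogenic;DB"))
    print(select_positions(RECORDS))
```

Once tags are available as individual fields, a condition on depth or clinical significance becomes a plain column filter, which is what an SQL `WHERE` clause expresses.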

<details>
<summary>
More details
</summary>

> Example: Select variants in VCF with INFO tag criteria
>
> ``` bash
> howard query \
>    --input='tests/data/example.vcf.gz' \
>    --explode_infos \
>    --query='SELECT "#CHROM", POS, REF, ALT, DP, CLNSIG, sample2, sample3 
>             FROM variants 
>             WHERE DP >= 50 OR CLNSIG IS NOT NULL 
>             ORDER BY CLNSIG DESC, DP DESC'
> ```

See [HOWARD Help Query tool](docs/help.md#query-tool) for more options.

</details>

## Annotation

Annotation is mainly based on a built-in Parquet annotation method,
using database formats such as Parquet, duckdb, VCF, BED, TSV, JSON.
External annotation tools are also available, such as BCFTOOLS, Annovar,
snpEff, Exomiser and Splice. It uses available databases and homemade
databases. Annovar and snpEff databases are automatically downloaded
(see [HOWARD Help Databases tool](docs/help.md#databases-tool)). All
annotation parameters are defined in [HOWARD Parameters
JSON](docs/help.param.md) file.

Quick annotation annotates variants by simply listing annotation
databases, or by giving external tool keywords. These annotations can
be combined.
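At its core, variant annotation from a database is a lookup on the variant key ('#CHROM', 'POS', 'REF', 'ALT'). The Python sketch below shows that matching step with plain dictionaries. It is a conceptual illustration under stated assumptions, not the Parquet-based method HOWARD uses, and the `COSMIC_ID` field and its value are placeholders.

``` python
KEY_FIELDS = ("#CHROM", "POS", "REF", "ALT")


def annotate(variants, database):
    """Add database annotations to variants sharing the same variant key."""
    # Index the database rows by their (chromosome, position, ref, alt) key.
    index = {tuple(row[field] for field in KEY_FIELDS): row for row in database}
    annotated = []
    for variant in variants:
        key = tuple(variant[field] for field in KEY_FIELDS)
        match = index.get(key, {})
        # Copy only the annotation columns, not the key columns themselves.
        extra = {name: value for name, value in match.items() if name not in KEY_FIELDS}
        annotated.append({**variant, **extra})
    return annotated


if __name__ == "__main__":
    variants = [
        {"#CHROM": "chr7", "POS": 55249063, "REF": "G", "ALT": "A"},
        {"#CHROM": "chr1", "POS": 69101, "REF": "A", "ALT": "G"},
    ]
    # Hypothetical database row; the annotation value is a placeholder.
    database = [
        {"#CHROM": "chr7", "POS": 55249063, "REF": "G", "ALT": "A", "COSMIC_ID": "EXAMPLE1"},
    ]
    for row in annotate(variants, database):
        print(row)
```

Variants without a match are returned unchanged, so several databases can be applied one after another.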

<details>
<summary>
More details
</summary>

> Example: VCF annotation with Parquet and VCF databases, output as VCF
> format
>
> ``` bash
> howard annotation \
>    --input='tests/data/example.vcf.gz' \
>    --annotations='tests/databases/annotations/current/hg19/dbnsfp42a.parquet,
>       tests/databases/annotations/current/hg19/cosmic70.vcf.gz' \
>    --output='/tmp/example.howard.vcf.gz'
> ```

> Example: VCF annotation with external tools (Annovar refGene and
> snpEff databases), output as TSV format
>
> ``` bash
> howard annotation \
>    --input='tests/data/example.vcf.gz' \
>    --annotations='annovar:refGene,snpeff' \
>    --output='/tmp/example.howard.tsv'
> ```

See [HOWARD Help Annotation tool](docs/help.md#annotation-tool) for more
options.

</details>

## Calculation

Calculation processes variant information to generate new information:
it identifies the variation type (VarType), harmonizes allele frequency
(VAF) and calculates statistics (VAF_stats), extracts Nomen (transcript,
cNomen, pNomen...) from an HGVS field (e.g. snpEff, Annovar) with an
optional list of personalized transcripts, generates VaRank format
barcodes, and identifies trio inheritance.
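As an example of what such a calculation involves, the following Python sketch derives a variant type from the REF and ALT alleles. The rules are a deliberate simplification chosen for this illustration; HOWARD's own 'vartype' calculation distinguishes more categories and may apply different rules.

``` python
def variant_type(ref: str, alt: str) -> str:
    """Classify a variant from its REF and ALT alleles (simplified rules)."""
    if alt.startswith("<") and alt.endswith(">"):
        return alt[1:-1]  # symbolic allele such as <DEL>, <DUP> or <INV>
    if "[" in alt or "]" in alt:
        return "BND"  # breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    if alt.startswith(ref):
        return "INS"  # ALT extends REF
    if ref.startswith(alt):
        return "DEL"  # ALT is a prefix of REF
    return "INDEL"


if __name__ == "__main__":
    examples = [("A", "G"), ("AT", "GC"), ("A", "AT"), ("AT", "A"), ("AT", "GCC"), ("N", "<DEL>")]
    for ref, alt in examples:
        print(ref, alt, variant_type(ref, alt))
```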

<details>
<summary>
More details
</summary>

> Example: Identify variant types and generate a table of variant type
> count
>
> ``` bash
> howard calculation \
>    --input='tests/data/example.full.vcf' \
>    --calculations='vartype' \
>    --output='/tmp/example.calculation.tsv'
>
> howard query \
>    --input='/tmp/example.calculation.tsv' \
>    --explode_infos \
>    --query='SELECT
>                "VARTYPE" AS "VariantType",
>                count(*) AS "Count"
>             FROM variants
>             GROUP BY "VARTYPE"
>             ORDER BY "Count" DESC'
> ```
>
> ``` text
>   VariantType  Count
> 0         BND      7
> 1         DUP      6
> 2         INS      5
> 3         SNV      4
> 4         CNV      3
> 5         DEL      3
> 6         INV      3
> 7      MOSAIC      2
> 8       INDEL      2
> 9         MNV      1
> ```

See [HOWARD Help Calculation tool](docs/help.md#calculation-tool) for
more options.

</details>

## Prioritization

The prioritization algorithm uses profiles to flag variants (as passed
or filtered), calculate a prioritization score, and automatically
generate a comment for each variant (example: 'polymorphism identified
in dbSNP. associated to Lung Cancer. Found in ClinVar database').
Prioritization profiles are defined in a configuration file in JSON
format. A profile is defined as a list of annotation/value pairs, using
wildcards and comparison options (contains, lower than, greater than,
equal...). Annotation fields may be quality values (usually from
callers, such as 'DP') or other annotation fields provided by annotation
tools, such as HOWARD itself (example: COSMIC, Clinvar, 1000genomes,
PolyPhen, SIFT).

Multiple profiles can be used simultaneously, which is useful to define
multiple validation/prioritization levels (e.g. 'standard', 'stringent',
'rare variants'). The prioritization score can be calculated in one of
two modes, either 'HOWARD' (incremental) or 'VaRank' (maximum). The
prioritization fields to output can be selected (PZScore, PZFlag,
PZComment, PZTags, PZInfos).
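The Python sketch below illustrates how a profile of criteria could produce a flag, a score and a comment, and how the two score modes differ. It rests on several assumptions: the profile structure and its key names are invented for this example (the real JSON schema is described in the HOWARD documentation), and 'incremental' is read here as summing the scores of matching criteria while 'maximum' keeps only the highest one.

``` python
import operator

# Comparison options named in the text, mapped to Python callables.
COMPARATORS = {
    "contains": lambda value, target: str(target) in str(value),
    "lower": operator.lt,
    "greater": operator.gt,
    "equal": operator.eq,
}

# Hypothetical profile: structure, field names and scores are made up.
PROFILE = [
    {"field": "DP", "type": "lower", "value": 30, "score": 0,
     "flag": "FILTERED", "comment": "Low depth"},
    {"field": "CLNSIG", "type": "contains", "value": "pathogenic", "score": 15,
     "flag": "PASS", "comment": "Found in ClinVar database"},
    {"field": "SIFT", "type": "equal", "value": "D", "score": 5,
     "flag": "PASS", "comment": "Predicted deleterious by SIFT"},
]


def prioritize(annotations: dict, profile: list, mode: str = "HOWARD") -> dict:
    """Apply each criterion and combine the matching scores according to mode."""
    scores, comments, flag = [], [], "PASS"
    for criterion in profile:
        value = annotations.get(criterion["field"])
        if value is None:
            continue  # annotation absent: criterion cannot match
        if COMPARATORS[criterion["type"]](value, criterion["value"]):
            scores.append(criterion["score"])
            comments.append(criterion["comment"])
            if criterion["flag"] == "FILTERED":
                flag = "FILTERED"  # any filtering criterion wins
    # Assumed reading: 'VaRank' keeps the maximum, 'HOWARD' adds scores up.
    score = max(scores, default=0) if mode == "VaRank" else sum(scores)
    return {"PZFlag": flag, "PZScore": score, "PZComment": ". ".join(comments)}


if __name__ == "__main__":
    variant = {"DP": 50, "CLNSIG": "pathogenic", "SIFT": "D"}
    print(prioritize(variant, PROFILE, mode="HOWARD"))
    print(prioritize(variant, PROFILE, mode="VaRank"))
```

With these made-up scores, the same variant gets 20 in the summing mode and 15 in the maximum mode, which shows why the mode matters when profiles contain many small criteria.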

<details>
<summary>
More details
</summary>

> Example: Prioritize variants from criteria on INFO annotations for
> profiles 'default' and 'GERMLINE' (from 'prioritization_profiles.json'
> profiles configuration), export prioritization tags, and query
> variants passing filters
>
> ``` bash
> howard prioritization \
>    --input='tests/data/example.vcf.gz' \
>    --prioritization_config='config/prioritization_profiles.json' \
>    --prioritizations='default,GERMLINE' \
>    --default_profile='default' \
>    --pzfields='PZFlag,PZScore,PZComment,PZTags,PZInfos' \
>    --prioritization_score_mode='HOWARD' \
>    --output='/tmp/example.prioritized.vcf.gz'
> ```
>
> ``` bash
> howard query \
>    --input='/tmp/example.prioritized.vcf.gz' \
>    --explode_infos \
>    --query="SELECT \"#CHROM\", POS, ALT, REF, PZFlag, PZScore, PZTags, DP, CLNSIG \
>             FROM variants \
>             WHERE PZScore > 0 \
>               AND PZFlag == 'PASS' \
>             ORDER BY PZScore DESC"
> ```
>
> ``` text
>   #CHROM       POS ALT REF PZFlag  PZScore                     PZTags     DP      CLNSIG
> 0   chr1     28736   C   A   PASS       15  PZFlag#PASS|PZScore#15...    NaN  pathogenic
> 1   chr1     69101   G   A   PASS        5  PZFlag#PASS|PZScore#5|...   50.0        None
> 2   chr7  55249063   A   G   PASS        5  PZFlag#PASS|PZScore#5|...  125.0        None
> ```

See [HOWARD Help Prioritization tool](docs/help.md#prioritization-tool)
for more options.

</details>

## HGVS Annotation

HOWARD annotates variants with HGVS annotation using the HUGO HGVS
international Sequence Variant Nomenclature (http://varnomen.hgvs.org/).
Annotation refers to refGene and the genome to generate HGVS nomenclature
for all available transcripts. This annotation adds an 'hgvs' field to
the INFO column of a VCF file. Several options are available: adding
gene, exon and protein information, generating a "full format" detailed
annotation, and choosing the codon format.

See [HOWARD Help HGVS tool](docs/help.md#hgvs-tool) for more options.
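The colon-separated HGVS strings produced in full format can be split back into their components. The Python sketch below does this by looking at the prefix of each token; the component names (`gene`, `transcript`, `cnomen`...) and the prefix rules are assumptions made for this illustration, not the logic HOWARD uses to extract Nomen fields.

``` python
def split_hgvs(hgvs: str) -> dict:
    """Split a colon-separated HGVS string into named components."""
    parts = {}
    for token in hgvs.split(":"):
        if token.startswith(("NM_", "NR_", "XM_", "XR_")):
            parts["transcript"] = token
        elif token.startswith(("NP_", "XP_")):
            parts["protein"] = token
        elif token.startswith("exon"):
            parts["exon"] = token
        elif token.startswith(("c.", "n.")):
            parts["cnomen"] = token
        elif token.startswith("p."):
            parts["pnomen"] = token
        else:
            parts.setdefault("gene", token)  # first unrecognized token
    return parts


if __name__ == "__main__":
    print(split_hgvs("FAM138A:NR_026818.1:exon3:n.597T>G:p.Tyr199Asp"))
    print(split_hgvs("WASH7P:NR_024540.1:n.50+585T>G"))
```

When a field holds several comma-separated annotations (one per transcript), each one would be split on the comma first and passed to the function separately.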

<details>
<summary>
More details
</summary>

> Example: HGVS annotation with quick options
>
> ``` bash
> howard hgvs \
>    --input='tests/data/example.vcf.gz' \
>    --output='/tmp/example.process.tsv' \
>    --hgvs=full_format,use_exon
> ```
>
> ``` bash
> howard query \
>    --input='/tmp/example.process.tsv' \
>    --explode_infos \
>    --query="SELECT hgvs \
>             FROM variants "
> ```
>
> ``` text
>                                                 hgvs
> 0                     WASH7P:NR_024540.1:n.50+585T>G
> 1     FAM138A:NR_026818.1:exon3:n.597T>G:p.Tyr199Asp
> 2  OR4F5:NM_001005484.2:NP_001005484.2:exon3:c.74...
> 3  LINC01128:NR_047526.1:n.287+3767A>G,LINC01128:...
> 4  LINC01128:NR_047526.1:n.287+3768A>G,LINC01128:...
> 5  LINC01128:NR_047526.1:n.287+3769A>G,LINC01128:...
> 6  EGFR:NM_001346897.2:NP_001333826.1:exon19:c.22...
> ```

</details>

## Process

The HOWARD process tool manages genetic variations in order to:

- annotate genetic variants with multiple annotation databases/files
  and tools
- calculate and normalize annotations
- prioritize variants with profiles (lists of criteria) to calculate
  scores and flags
- annotate genetic variants with HGVS nomenclature
- translate into various formats
- query genetic variants and annotations
- generate variant statistics

This process tool combines all other tools and chains them in a single
command, through available options or a parameters file in JSON format
(see [HOWARD Parameters JSON](docs/help.param.md) file).

See [HOWARD Help Process tool](docs/help.md#process-tool) for more
information.

<details>
<summary>
More details
</summary>

> Example: Full process command with options (HGVS, annotation,
> calculation and prioritization)
>
> ``` bash
> howard process \
>    --input='tests/data/example.vcf.gz' \
>    --output='/tmp/example.process.tsv' \
>    --hgvs='full_format,use_exon' \
>    --annotations='tests/databases/annotations/current/hg19/avsnp150.parquet,
>       tests/databases/annotations/current/hg19/dbnsfp42a.parquet,
>       tests/databases/annotations/current/hg19/gnomad211_genome.parquet,
>       bcftools:tests/databases/annotations/current/hg19/cosmic70.vcf.gz,
>       snpeff,
>       annovar:refGene' \
>    --calculations='vartype,snpeff_hgvs,VAF,NOMEN' \
>    --prioritization_config='config/prioritization_profiles.json' \
>    --prioritizations='default' \
>    --explode_infos \
>    --query="SELECT NOMEN, PZFlag, PZScore \
>             FROM variants \
>             ORDER BY PZScore DESC"
> ```
>
> ``` text
>                                             NOMEN    PZFlag  PZScore
> 0                    WASH7P:NR_024540:n.50+585T>G      PASS       15
> 1     OR4F5:NP_001005484:exon3:c.74A>G:p.Glu25Gly      PASS        5
> 2  EGFR:NM_001346897:exon19:c.2226G>A:p.Gln742Gln      PASS        5
> 3               LINC01128:NR_047526:n.287+3767A>G      PASS        0
> 4               LINC01128:NR_047526:n.287+3768A>G      PASS        0
> 5               LINC01128:NR_047526:n.287+3769A>G      PASS        0
> 6    FAM138A:NR_026818:exon3:n.597T>G:p.Tyr199Asp  FILTERED     -100
> ```

</details>

# Documentation

[HOWARD User Guide](docs/user_guide.md) assists users with particular
commands, such as software installation, databases download, annotation
command, and so on.

[HOWARD Tips](docs/tips.md) offers additional advice on using HOWARD
for particular use cases.

[HOWARD Help](docs/help.md) describes options of all HOWARD tools.
All information is also available for each tool using the `--help` option.

[HOWARD Configuration JSON](docs/help.configuration.md) describes configuration
JSON file structure and options.

[HOWARD Parameters JSON](docs/help.parameters.md) describes parameters JSON
file structure and options.

[HOWARD Parameters Databases JSON](docs/help.parameters.databases.md)
describes the JSON configuration file for database download and
conversion.

[HOWARD Plugins](plugins/README.md) describes how to create HOWARD
plugins.

[HOWARD Package](docs/pdoc/index.html) describes HOWARD Package, Classes
and Functions.

# Contact

[Medical Bioinformatics applied to Diagnosis
Lab](https://www.chru-strasbourg.fr/service/bioinformatique-medicale-appliquee-au-diagnostic-unite-de/)
@ Strasbourg University Hospital

[bioinfo@chru-strasbourg.fr](mailto:bioinfo@chru-strasbourg.fr)

[GitHub](https://github.com/bioinfo-chru-strasbourg)

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/bioinfo-chru-strasbourg/howard",
    "name": "howard-ann",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.11,>=3.10",
    "maintainer_email": null,
    "keywords": "VCF, variant, annotation, ranking",
    "author": "Antony Le Bechec",
    "author_email": "bioinfo@chru-strasbourg.fr",
    "download_url": "https://files.pythonhosted.org/packages/65/2d/c923bef61683a1f86019d5e73e8c332391466e63b42bf0e741a0574cf085/howard_ann-0.12.1.0.tar.gz",
    "platform": null,
    "description": "# HOWARD\n\n<figure>\n<img src=\"images/icon.png\" title=\"HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant\nDiscovery\"\nalt=\"HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant\nDiscovery\" />\n<figcaption aria-hidden=\"true\">\nHOWARD - Highly Open Workflow for Annotation & Ranking toward genomic\nvariant Discovery\n</figcaption>\n</figure>\n\nHighly Open Workflow for Annotation & Ranking toward genomic variant\nDiscovery\n\nHOWARD annotates and prioritizes genetic variations, calculates and\nnormalizes annotations, translates files in multiple formats (e.g. vcf,\ntsv, parquet) and generates variants statistics.\n\nHOWARD annotation is mainly based on a build-in Parquet annotation\nmethod, and external tools such as BCFTOOLS, ANNOVAR, snpEff, Exomiser\nand Splice (see docs, automatically downloaded if needed). Parquet\nannotation uses annotation database in VCF or BED format, in mutliple\nfile format: Parquet/duckdb, VCF, BED, TSV, CSV, TBL, JSON.\n\nHOWARD calculation processes variants information to calculate new\ninformation, such as: harmonizes allele frequency (VAF), extracts Nomen\n(transcript, cNomen, pNomen...) from HGVS fields with an optional list\nof personalized transcripts, generates VaRank format barcode.\n\nHOWARD prioritization algorithm uses profiles to flag variants (as\npassed or filtered), calculate a prioritization score, and automatically\ngenerate a comment for each variants (example: 'polymorphism identified\nin dbSNP. associated to Lung Cancer. 
Found in ClinVar\ndatabase').Prioritization profiles are defined in a configuration file.\nA profile is defined as a list of annotation/value, using wildcards and\ncomparison options (contains, lower than, greater than, equal...).\nAnnotations fields may be quality values (usually from callers, such as\n'GQ', 'DP') or other annotations fields provided by annotations tools,\nsuch as HOWARD itself (example: COSMIC, Clinvar, 1000genomes, PolyPhen,\nSIFT). Multiple profiles can be used simultaneously, which is useful to\ndefine multiple validation/prioritization levels (example: 'standard',\n'stringent', 'rare variants', 'low allele frequency').\n\nHOWARD translates VCF format into multiple formats (e.g. VCF, TSV,\nParquet), by sorting variants using specific fields (example :\n'prioritization score', 'allele frequency', 'gene symbol'),\nincluding/excluding annotations/fields, including/excluding variants,\nadding fixed columns.\n\nHOWARD generates statistics files with a specific algorithm, snpEff and\nBCFTOOLS.\n\nHOWARD is multithreaded through the number of variants and by database\n(data-scaling).\n\nHOWARD is able to add plugins for further analyses.\n\n## Table of contents\n\n- [Installation](#installation)\n  - [Download](#download)\n  - [Python](#python)\n  - [Docker](#docker)\n  - [Databases](#databases)\n  - [Configuration](#configuration)\n- [Tools](#tools)\n  - [Parameters](#parameters)\n  - [Stats](#stats)\n  - [Convert](#convert)\n  - [Query](#query)\n  - [Annotation](#annotation)\n  - [Calculation](#calculation)\n  - [Prioritization](#prioritization)\n  - [HGVS annotation](#hgvs-annotation)\n  - [Process](#process)\n- [Documentation](#documentation)\n- [Contact](#contact)\n\n# Installation\n\nHOWARD can be installed using [Python](#python), and a [Docker](#docker)\ninstallation provides a CLI (Command Line Interface) with all external\ntools and useful databases. 
[Databases](#databases) can be automatically\ndownloaded, or home-made generated (created or downloaded).\n\n## Download\n\nDownload sources from gitHub\n\n```bash\nmkdir -p ~/howard/src\ncd ~/howard/src\ngit clone https://github.com/bioinfo-chru-strasbourg/howard.git .\n```\n\n## Python\n\nInstall HOWARD using Python Pip tool, and run HOWARD for help options:\n\n``` bash\nconda create --name=howard python=3.10\nconda activate howard\npython -m pip install -e .\nhoward --help\n```\n\n``` text\nusage: howard [-h] {query,stats,convert,hgvs,annotation,calculation,prioritization,process,databases,gui} ...\n\nHOWARD:0.12.1\nHighly Open Workflow for Annotation & Ranking toward genomic variant Discovery\nHOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations,\nconvert on multiple formats, query variations and generates statistics\n\nShared arguments:\n  -h, --help            show this help message and exit\n\nTools:\n  {query,stats,convert,hgvs,annotation,calculation,prioritization,process,databases,gui}\n    query               Query genetic variations file in SQL format.\n    stats               Statistics on genetic variations file.\n    convert             Convert genetic variations file to another format.\n    hgvs                HGVS annotation (HUGO internation nomenclature) using refGene,\n                        genome and transcripts list.\n    annotation          Annotation of genetic variations file using databases/files and tools.\n    calculation         Calculation operations on genetic variations file and genotype information.\n    prioritization      Prioritization of genetic variations based on annotations criteria (profiles).\n    process             Full genetic variations process: annotation, calculation, prioritization, \n                        format, query, filter...\n    databases           Download databases and needed files for howard and associated tools\n    gui                 Graphical User Interface 
tools\n```\n\nInstall HOWARD Graphical User Interface using Python Pip tool with\nsupplementary packages, and run as a tool:\n\n``` bash\npython -m pip install -r requirements-gui.txt\nhoward gui\n```\n\n<figure>\n<img src=\"images/howard-gui.png\" title=\"HOWARD Graphical User Interface\"\nalt=\"HOWARD Graphical User Interface\" />\n<figcaption aria-hidden=\"true\">\nHOWARD Graphical User Interface\n</figcaption>\n</figure>\n\n## Docker\n\nIn order to build, setup and create a persitent CLI (running container\nwith all useful external tools such as\n[BCFTools](https://samtools.github.io/bcftools/),\n[snpEff](https://pcingola.github.io/SnpEff/),\n[Annovar](https://annovar.openbioinformatics.org/),\n[Exomiser](https://www.sanger.ac.uk/tool/exomiser/)), docker-compose\ncommand build images and launch services as containers.\n\n``` bash\ndocker-compose up -d\n```\n\nA setup container (HOWARD-setup) will download useful databases (take a\nwhile). To avoid databases download (see [Databases section](#databases)\nto download manually), just start:\n\n``` bash\ndocker-compose up -d HOWARD-CLI\n```\n\nA Command Line Interface container (HOWARD-CLI) is started with host\ndata and databases folders mounted (by default in ~/howard folder, i.e.\n`~/howard/data:/data` and `~/howard/databases:/databases`). 
Let's play\nwithin the Docker HOWARD-CLI service!\n\n``` bash\ndocker exec -ti HOWARD-CLI bash\nhoward --help\n```\n\n<details>\n<summary>\nMore details\n</summary>\n\nThe Docker HOWARD-CLI container (Command Line Interface) can be used to\nexecute commands.\n\n> Example: Query of an existing VCF\n>\n> ``` bash\n> docker exec HOWARD-CLI \\\n>    howard query \\\n>    --input='/tool/tests/data/example.vcf.gz' \\\n>    --query='SELECT * FROM variants'\n> ```\n\n> Example: VCF annotation using HOWARD-CLI (snpEff and ANNOVAR databases\n> will be automatically downloaded), and query list of genes with HGVS\n>\n> ``` bash\n> docker exec --workdir=/tool HOWARD-CLI \\\n>    howard process \\\n>       --config='config/config.json' \\\n>       --param='config/param.json' \\\n>       --input='tests/data/example.vcf.gz' \\\n>       --output='/tmp/example.process.tsv' \\\n>       --explode_infos \\\n>       --query=\"SELECT NOMEN, PZFlag, PZScore, PZComment \\\n>                FROM variants \\\n>                ORDER BY PZScore DESC\"\n> ```\n\n</details>\n\n## Databases\n\nMultiple databases can be automatically downloaded with the databases tool,\nsuch as:\n\n| database | description |\n|----|----------------|\n| [Genome](https://genome.ucsc.edu/cgi-bin/hgGateway) | Genome Reference Consortium Human |\n| [Annovar](https://annovar.openbioinformatics.org/en/latest/) | ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes |\n| [snpEff](https://pcingola.github.io/SnpEff/) | Genetic variant annotation, and functional effect prediction toolbox |\n| [refSeq](https://www.ncbi.nlm.nih.gov/refseq/) | A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein |\n| [dbSNP](https://www.ncbi.nlm.nih.gov/snp/) | dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with 
publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations |\n| [dbNSFP](https://sites.google.com/site/jpopgen/dbNSFP) | dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome |\n| [AlphaMissense](https://github.com/google-deepmind/alphamissense) | AlphaMissense model implementation |\n| [Exomiser](https://www.sanger.ac.uk/tool/exomiser/) | The Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data |\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Download multiple databases at the same time for assembly\n> 'hg19' (can take a while)\n>\n> ``` bash\n> howard databases \\\n>    --assembly=hg19 \\\n>    --download-genomes='~/howard/databases/genomes/current' \\\n>    --download-genomes-provider='UCSC' \\\n>    --download-genomes-contig-regex='chr[0-9XYM]+$' \\\n>    --download-annovar='~/howard/databases/annovar/current' \\\n>    --download-annovar-files='refGene,cosmic70,nci60' \\\n>    --download-snpeff='~/howard/databases/snpeff/current' \\\n>    --download-refseq='~/howard/databases/refseq/current' \\\n>    --download-refseq-format-file='ncbiRefSeq.txt' \\\n>    --download-dbnsfp='~/howard/databases/dbnsfp/current' \\\n>    --download-dbnsfp-release='4.4a' \\\n>    --download-dbnsfp-subdatabases \\\n>    --download-alphamissense='~/howard/databases/alphamissense/current' \\\n>    --download-exomiser='~/howard/databases/exomiser/current' \\\n>    --download-dbsnp='~/howard/databases/dbsnp/current' \\\n>    --download-dbsnp-vcf \\\n>    --threads=8\n> ```\n\nSee [HOWARD Help Databases tool](docs/help.md#databases-tool) for more\ninformation.\n\nHome-made databases can be generated starting from an existing\nannotation file, especially using the [HOWARD convert](#convert) tool. 
These\nfiles need to contain specific fields (depending on the annotation\ntype):\n\n- variant annotation: '#CHROM', 'POS', 'ALT', 'REF'\n- region annotation: '#CHROM', 'START', 'STOP'\n\nEach database annotation file is associated with a 'header' file\n('.hdr'), in VCF header format, to describe annotations within the\ndatabase.\n\n</details>\n\n## Configuration\n\nThe HOWARD Configuration JSON file defines the default configuration regarding\nresources (e.g. threads, memory), settings (e.g. verbosity, temporary\nfiles), default folders (e.g. for databases) and paths to external\ntools.\n\nSee [HOWARD Configuration JSON](docs/help.config.md) for more\ninformation.\n\n# Tools\n\nHOWARD annotates and prioritizes genetic variations, calculates and\nnormalizes annotations, converts between multiple formats, queries variations\nand generates statistics. These tools require options or a [Parameters\nJSON](help.param.md) file.\n\n## Parameters\n\nThe HOWARD Parameters JSON file defines parameters to process annotations,\nprioritization, calculations, conversions and queries. Use this\nparameters file to configure tools, instead of options or as a main\nconfiguration (options override parameters in the JSON file).\n\nSee [HOWARD Parameters JSON](docs/help.param.md) for more information.\n\n## Stats\n\nStatistics on genetic variations, such as: number of variants, number of\nsamples, statistics by chromosome, genotypes by samples, annotations.\nThese statistics can be applied to VCF files and all database\nannotation files.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Show example VCF statistics and brief overview\n>\n> ``` bash\n> howard stats \\\n>    --input='tests/data/example.vcf.gz'\n> ```\n\nSee [HOWARD Help Stats tool](docs/help.md#stats-tool) for more\ninformation.\n\n</details>\n\n## Convert\n\nConvert genetic variations file to another format. 
Multiple formats are\navailable, such as the usual and official VCF format, but also other formats\nsuch as TSV, CSV, TBL, JSON and Parquet/duckDB. These formats need a\nheader '.hdr' file to take advantage of the power of howard (especially\nthrough INFO/tag definition), and the howard convert tool\nautomatically generates a header file for further use (otherwise, a default\n'.hdr' file is generated).\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Translate VCF into TSV, export INFO/tags into columns, and\n> show output file\n>\n> ``` bash\n> howard convert \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --explode_infos \\\n>    --output='/tmp/example.tsv'\n> cat '/tmp/example.tsv'\n> ```\n\nSee [HOWARD Help Convert tool](docs/help.md#convert-tool) for more\noptions.\n\n</details>\n\n## Query\n\nQuery genetic variations in SQL format. Data can be loaded into the\n'variants' table from various formats (e.g. VCF, TSV, Parquet...). Using the\n'explode' option allows querying on INFO/tag annotations. An SQL query can\nalso use external data within the request, such as one or more Parquet files.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Select variants in VCF with INFO Tags criteria\n>\n> ``` bash\n> howard query \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --explode_infos \\\n>    --query='SELECT \"#CHROM\", POS, REF, ALT, DP, CLNSIG, sample2, sample3 \n>             FROM variants \n>             WHERE DP >= 50 OR CLNSIG NOT NULL \n>             ORDER BY CLNSIG DESC, DP DESC'\n> ```\n\nSee [HOWARD Help Query tool](docs/help.md#query-tool) for more options.\n\n</details>\n\n## Annotation\n\nAnnotation is mainly based on a built-in Parquet annotation method,\nusing database formats such as Parquet, duckdb, VCF, BED, TSV, JSON.\nExternal annotation tools are also available, such as BCFTOOLS, Annovar,\nsnpEff, Exomiser and Splice. It uses available databases and homemade\ndatabases. 
Annovar and snpEff databases are automatically downloaded\n(see [HOWARD Help Databases tool](docs/help.md#databases-tool)). All\nannotation parameters are defined in the [HOWARD Parameters\nJSON](docs/help.param.md) file.\n\nQuick annotation annotates variants by simply listing annotation\ndatabases, or by specifying external tool keywords. These annotations can be\ncombined.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: VCF annotation with Parquet and VCF databases, output as VCF\n> format\n>\n> ``` bash\n> howard annotation \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --annotations='tests/databases/annotations/current/hg19/dbnsfp42a.parquet,\n>       tests/databases/annotations/current/hg19/cosmic70.vcf.gz' \\\n>    --output='/tmp/example.howard.vcf.gz'\n> ```\n\n> Example: VCF annotation with external tools (Annovar refGene and\n> snpEff databases), output as TSV format\n>\n> ``` bash\n> howard annotation \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --annotations='annovar:refGene,snpeff' \\\n>    --output='/tmp/example.howard.tsv'\n> ```\n\nSee [HOWARD Help Annotation tool](docs/help.md#annotation-tool) for more\noptions.\n\n</details>\n\n## Calculation\n\nCalculation processes variant information to generate new information,\nsuch as: identifies variation type (VarType), harmonizes allele frequency\n(VAF) and calculates statistics (VAF_stats), extracts Nomen (transcript,\ncNomen, pNomen...) from an HGVS field (e.g. 
snpEff, Annovar) with an\noptional list of personalized transcripts, generates VaRank format\nbarcode, identifies trio inheritance.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Identify variant types and generate a table of variant type\n> count\n>\n> ``` bash\n> howard calculation \\\n>    --input='tests/data/example.full.vcf' \\\n>    --calculations='vartype' \\\n>    --output='/tmp/example.calculation.tsv'\n>\n> howard query \\\n>    --input='/tmp/example.calculation.tsv' \\\n>    --explode_infos \\\n>    --query='SELECT\n>                \"VARTYPE\" AS VariantType,\n>                count(*) AS Count\n>             FROM variants\n>             GROUP BY \"VARTYPE\"\n>             ORDER BY Count DESC'\n> ```\n>\n> ``` ts\n>   VariantType  Count\n> 0         BND      7\n> 1         DUP      6\n> 2         INS      5\n> 3         SNV      4\n> 4         CNV      3\n> 5         DEL      3\n> 6         INV      3\n> 7      MOSAIC      2\n> 8       INDEL      2\n> 9         MNV      1\n> ```\n\nSee [HOWARD Help Calculation tool](docs/help.md#calculation-tool) for\nmore options.\n\n</details>\n\n## Prioritization\n\nThe prioritization algorithm uses profiles to flag variants (as passed or\nfiltered), calculate a prioritization score, and automatically generate\na comment for each variant (example: 'polymorphism identified in dbSNP.\nassociated to Lung Cancer. Found in ClinVar database'). Prioritization\nprofiles are defined in a configuration file in JSON format. A profile\nis defined as a list of annotation/value, using wildcards and comparison\noptions (contains, lower than, greater than, equal...). Annotation\nfields may be quality values (usually from callers, such as 'DP') or\nother annotation fields provided by annotation tools, such as HOWARD\nitself (example: COSMIC, Clinvar, 1000genomes, PolyPhen, SIFT).\n\nMultiple profiles can be used simultaneously, which is useful to define\nmultiple validation/prioritization levels (e.g. 
'standard', 'stringent',\n'rare variants'). The prioritization score can be calculated in one of two\nmodes, either 'HOWARD' (incremental) or 'VaRank' (maximum).\nPrioritization fields can be selected (PZScore, PZFlag, PZComment,\nPZTags, PZInfos).\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Prioritize variants from criteria on INFO annotations for\n> profiles 'default' and 'GERMLINE' (from 'prioritization_profiles.json'\n> profiles configuration), export prioritization tags, and query\n> variants passing filters\n>\n> ``` bash\n> howard prioritization \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --prioritization_config='config/prioritization_profiles.json' \\\n>    --prioritizations='default,GERMLINE' \\\n>    --default_profile='default' \\\n>    --pzfields='PZFlag,PZScore,PZComment,PZTags,PZInfos' \\\n>    --prioritization_score_mode='HOWARD' \\\n>    --output='/tmp/example.prioritized.vcf.gz'\n> ```\n>\n> ``` bash\n> howard query \\\n>    --input='/tmp/example.prioritized.vcf.gz' \\\n>    --explode_infos \\\n>    --query=\"SELECT \\\"#CHROM\\\", POS, ALT, REF, PZFlag, PZScore, PZTags, DP, CLNSIG \\\n>             FROM variants \\\n>             WHERE PZScore > 0 \\\n>               AND PZFlag == 'PASS' \\\n>             ORDER BY PZScore DESC\"\n> ```\n>\n> ``` ts\n>   #CHROM       POS ALT REF PZFlag  PZScore                     PZTags     DP      CLNSIG\n> 0   chr1     28736   C   A   PASS       15  PZFlag#PASS|PZScore#15...    NaN  pathogenic\n> 1   chr1     69101   G   A   PASS        5  PZFlag#PASS|PZScore#5|...   50.0        None\n> 2   chr7  55249063   A   G   PASS        5  PZFlag#PASS|PZScore#5|...  
125.0        None\n> ```\n\nSee [HOWARD Help Prioritization tool](docs/help.md#prioritization-tool)\nfor more options.\n\n</details>\n\n## HGVS Annotation\n\nHOWARD annotates variants with HGVS annotation using the HUGO HGVS\ninternational Sequence Variant Nomenclature (http://varnomen.hgvs.org/).\nAnnotation refers to refGene and the genome to generate HGVS nomenclature\nfor all available transcripts. This annotation adds an 'hgvs' field to the\nINFO column of a VCF file. Several options are available: to add gene,\nexon and protein information, to generate a \"full format\" detailed\nannotation, or to choose the codon format.\n\nSee [HOWARD Help HGVS tool](docs/help.md#hgvs-tool) for more options.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: HGVS annotation with quick options\n>\n> ``` bash\n> howard hgvs \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --output='/tmp/example.process.tsv' \\\n>    --hgvs=full_format,use_exon\n> ```\n>\n> ``` bash\n> howard query \\\n>    --input='/tmp/example.process.tsv' \\\n>    --explode_infos \\\n>    --query=\"SELECT hgvs \\\n>             FROM variants \"\n> ```\n>\n> ``` ts\n>                                                 hgvs\n> 0                     WASH7P:NR_024540.1:n.50+585T>G\n> 1     FAM138A:NR_026818.1:exon3:n.597T>G:p.Tyr199Asp\n> 2  OR4F5:NM_001005484.2:NP_001005484.2:exon3:c.74...\n> 3  LINC01128:NR_047526.1:n.287+3767A>G,LINC01128:...\n> 4  LINC01128:NR_047526.1:n.287+3768A>G,LINC01128:...\n> 5  LINC01128:NR_047526.1:n.287+3769A>G,LINC01128:...\n> 6  EGFR:NM_001346897.2:NP_001333826.1:exon19:c.22...\n> ```\n\n</details>\n\n## Process\n\nThe HOWARD process tool manages genetic variations; it:\n\n- annotates genetic variants with multiple annotation databases/files\n  and tools\n- calculates and normalizes annotations\n- prioritizes variants with profiles (list of criteria) to calculate\n  scores and flags\n- annotates genetic variants with HGVS nomenclature\n- translates into various formats\n- queries genetic 
variants and annotations\n- generates variant statistics\n\nThis process tool combines all other tools to pipe them in a single\ncommand, through available options or a parameters file in JSON format\n(see [HOWARD Parameters JSON](docs/help.param.md) file).\n\nSee [HOWARD Help Process tool](docs/help.md#process-tool) for more\ninformation.\n\n<details>\n<summary>\nMore details\n</summary>\n\n> Example: Full process command with options (HGVS, annotation,\n> calculation and prioritization)\n>\n> ``` bash\n> howard process \\\n>    --input='tests/data/example.vcf.gz' \\\n>    --output='/tmp/example.process.tsv' \\\n>    --hgvs='full_format,use_exon' \\\n>    --annotations='tests/databases/annotations/current/hg19/avsnp150.parquet,\n>       tests/databases/annotations/current/hg19/dbnsfp42a.parquet,\n>       tests/databases/annotations/current/hg19/gnomad211_genome.parquet,\n>       bcftools:tests/databases/annotations/current/hg19/cosmic70.vcf.gz,\n>       snpeff,\n>       annovar:refGene' \\\n>    --calculations='vartype,snpeff_hgvs,VAF,NOMEN' \\\n>    --prioritization_config='config/prioritization_profiles.json' \\\n>    --prioritizations='default' \\\n>    --explode_infos \\\n>    --query=\"SELECT NOMEN, PZFlag, PZScore \\\n>             FROM variants \\\n>             ORDER BY PZScore DESC\"\n> ```\n>\n> ``` ts\n>                                             NOMEN    PZFlag  PZScore\n> 0                    WASH7P:NR_024540:n.50+585T>G      PASS       15\n> 1     OR4F5:NP_001005484:exon3:c.74A>G:p.Glu25Gly      PASS        5\n> 2  EGFR:NM_001346897:exon19:c.2226G>A:p.Gln742Gln      PASS        5\n> 3               LINC01128:NR_047526:n.287+3767A>G      PASS        0\n> 4               LINC01128:NR_047526:n.287+3768A>G      PASS        0\n> 5               LINC01128:NR_047526:n.287+3769A>G      PASS        0\n> 6    FAM138A:NR_026818:exon3:n.597T>G:p.Tyr199Asp  FILTERED     -100\n> ```\n\n</details>\n\n# Documentation\n\n[HOWARD User Guide](docs/user_guide.md) 
is available to assist users with\nparticular tasks, such as software installation, database downloads,\nannotation commands, and so on.\n\n[HOWARD Tips](docs/tips.md) offers additional advice on using\nHOWARD for particular use cases.\n\n[HOWARD Help](docs/help.md) describes options of all HOWARD tools.\nAll information is also available for each tool using the `--help` option.\n\n[HOWARD Configuration JSON](docs/help.configuration.md) describes configuration\nJSON file structure and options.\n\n[HOWARD Parameters JSON](docs/help.parameters.md) describes parameters JSON\nfile structure and options.\n\n[HOWARD Parameters Databases JSON](docs/help.parameters.databases.md)\ndescribes the configuration JSON file for database download and conversion.\n\n[HOWARD Plugins](plugins/README.md) describes how to create HOWARD\nplugins.\n\n[HOWARD Package](docs/pdoc/index.html) describes HOWARD Package, Classes\nand Functions.\n\n# Contact\n\n[Medical Bioinformatics applied to Diagnosis\nLab](https://www.chru-strasbourg.fr/service/bioinformatique-medicale-appliquee-au-diagnostic-unite-de/)\n@ Strasbourg University Hospital\n\n[bioinfo@chru-strasbourg.fr](mailto:bioinfo@chru-strasbourg.fr)\n\n[GitHub](https://github.com/bioinfo-chru-strasbourg)\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "HOWARD - Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery",
    "version": "0.12.1.0",
    "project_urls": {
        "Homepage": "https://github.com/bioinfo-chru-strasbourg/howard"
    },
    "split_keywords": [
        "vcf",
        " variant",
        " annotation",
        " ranking"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8f41e7ea7cdb26e23599348cc292d4178f61c3724b87dcfb78d0d7459bacfe41",
                "md5": "a0a4ac02961dd8dcedbe82ea0fde8604",
                "sha256": "bd6c2bfb33e3ca60cd9ab5d89b7161b6e59af28952605651bee557852a4c5f2e"
            },
            "downloads": -1,
            "filename": "howard_ann-0.12.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a0a4ac02961dd8dcedbe82ea0fde8604",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.11,>=3.10",
            "size": 415727,
            "upload_time": "2024-12-20T12:16:55",
            "upload_time_iso_8601": "2024-12-20T12:16:55.571631Z",
            "url": "https://files.pythonhosted.org/packages/8f/41/e7ea7cdb26e23599348cc292d4178f61c3724b87dcfb78d0d7459bacfe41/howard_ann-0.12.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "652dc923bef61683a1f86019d5e73e8c332391466e63b42bf0e741a0574cf085",
                "md5": "14ecc9e8951b5a839d6b9f68fd3d8569",
                "sha256": "4c3ce0937e49d3b6a5c9b89c2971f98fd3bfe3f7893edb4f1b849a20fd910dd6"
            },
            "downloads": -1,
            "filename": "howard_ann-0.12.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "14ecc9e8951b5a839d6b9f68fd3d8569",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.11,>=3.10",
            "size": 465604,
            "upload_time": "2024-12-20T12:16:57",
            "upload_time_iso_8601": "2024-12-20T12:16:57.298553Z",
            "url": "https://files.pythonhosted.org/packages/65/2d/c923bef61683a1f86019d5e73e8c332391466e63b42bf0e741a0574cf085/howard_ann-0.12.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-20 12:16:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bioinfo-chru-strasbourg",
    "github_project": "howard",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    "~=",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "duckdb",
            "specs": [
                [
                    "~=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": [
                [
                    "~=",
                    "16.1.0"
                ]
            ]
        },
        {
            "name": "bio",
            "specs": [
                [
                    "~=",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "pyvcf3",
            "specs": [
                [
                    "~=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "dask",
            "specs": [
                [
                    "~=",
                    "2023.12.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "~=",
                    "8.2.0"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": [
                [
                    "~=",
                    "7.5.0"
                ]
            ]
        },
        {
            "name": "fastparquet",
            "specs": [
                [
                    "~=",
                    "2024.5.0"
                ]
            ]
        },
        {
            "name": "polars",
            "specs": [
                [
                    "~=",
                    "0.20.0"
                ]
            ]
        },
        {
            "name": "genomepy",
            "specs": [
                [
                    "~=",
                    "0.16.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    "~=",
                    "7.1.0"
                ]
            ]
        },
        {
            "name": "lazy",
            "specs": [
                [
                    "~=",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "pynose",
            "specs": [
                [
                    "~=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "pyfaidx",
            "specs": [
                [
                    "~=",
                    "0.8.0"
                ]
            ]
        },
        {
            "name": "multiprocesspandas",
            "specs": [
                [
                    "~=",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "bgzip",
            "specs": [
                [
                    "~=",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "pgzip",
            "specs": [
                [
                    "~=",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "mgzip",
            "specs": [
                [
                    "~=",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "pysam",
            "specs": [
                [
                    "~=",
                    "0.22.0"
                ]
            ]
        },
        {
            "name": "jproperties",
            "specs": [
                [
                    "~=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    "~=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "markdown",
            "specs": [
                [
                    "~=",
                    "3.6.0"
                ]
            ]
        },
        {
            "name": "tabulate",
            "specs": [
                [
                    "~=",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "md_toc",
            "specs": [
                [
                    "~=",
                    "9.0.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "~=",
                    "1.26.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "~=",
                    "4.12.0"
                ]
            ]
        },
        {
            "name": "pyBigWig",
            "specs": [
                [
                    "~=",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "cyvcf2",
            "specs": [
                [
                    "~=",
                    "0.31.0"
                ]
            ]
        },
        {
            "name": "pypandoc_binary",
            "specs": [
                [
                    "~=",
                    "1.14.0"
                ]
            ]
        }
    ],
    "lcname": "howard-ann"
}

Antony Le Bechec