viralQC


- Name: viralQC
- Version: 0.4.0
- Summary: A tool and package for quality control of consensus virus genomes.
- Upload time: 2025-10-11 13:08:19
- Requires Python: >=3.8, <3.12
- License: MIT
- Keywords: python, quality-control, snakemake, virus

## Install

ViralQC is a tool and package for quality control of consensus virus genomes. It uses [Nextclade](https://docs.nextstrain.org/projects/nextclade/en/stable/), [BLAST](https://www.ncbi.nlm.nih.gov/books/NBK279690/) and internal logic to classify viral sequences and to perform quality control of complete genomes, genomic regions or target genes.

### From pip

First, install the dependencies:

- [Snakemake 7.32](https://snakemake.readthedocs.io/en/v7.32.0/getting_started/installation.html)
- [Ncbi-blast 2.16](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.16.0/)
- [Nextclade 3.15](https://docs.nextstrain.org/projects/nextclade/en/3.15.3/user/nextclade-cli/installation/)
- [Seqtk 1.5](https://github.com/lh3/seqtk/releases/tag/v1.5)
- [Python >= 3.8, < 3.12](https://www.python.org/downloads/)

Alternatively, install all of them with micromamba:

```bash
micromamba install \
  -c conda-forge \
  -c bioconda \
  "python>=3.8.0,<3.12.0" \
  "snakemake-minimal>=7.32.0,<7.33.0" \
  "blast>=2.16.0,<2.17.0" \
  "nextclade>=3.15.0,<3.16.0" \
  "seqtk>=1.5.0,<1.6.0"
```

Then install viralQC:

```bash
pip install viralQC
```

### From Source

```bash
git clone https://github.com/InstitutoTodosPelaSaude/viralQC.git
cd viralQC
```

#### Dependencies

```bash
micromamba env create -f env.yml
micromamba activate viralQC
```

#### viralQC

```bash
pip install .
```

### Check installation (CLI)

```bash
vqc --help
```
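
If `vqc --help` fails, checking that each external dependency is on the `PATH` can narrow down the cause. The sketch below is not part of viralQC: `check_tools` is a hypothetical helper, and the executable names (`snakemake`, `blastn`, `nextclade`, `seqtk`) are assumed to be the ones those packages normally install.

```bash
# Hypothetical helper (not shipped with viralQC): print every tool that is
# missing from PATH and return non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions based on the dependency list.
check_tools snakemake blastn nextclade seqtk vqc ||
  echo "Install the missing tools and try again."
```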

## Usage (CLI)

### get-nextclade-datasets

This command sets up local datasets using Nextclade. Run it at least once, before the first `run-from-fasta` call, to create a local copy of the Nextclade datasets.

```bash
vqc get-nextclade-datasets --cores 2
```

A directory name can be specified; the default is `datasets`.

```bash
vqc get-nextclade-datasets --cores 2 --datasets-dir <directory_name>
```

### get-blast-database

This command sets up a local BLAST database containing all NCBI RefSeq viral genomes. Run it at least once, before the first `run-from-fasta` call, to create the local BLAST database.

```bash
vqc get-blast-database --cores 2
```

An output directory name can be specified; the default is `datasets`.

```bash
vqc get-blast-database --cores 2 --output-dir <directory_name>
```
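
Because both setup commands only need to run once, they can be wrapped in a small script. The following is a sketch rather than an official workflow: `run`, `setup_viralqc` and the `DRY_RUN` switch are hypothetical names, and it assumes that both commands can share one directory (as their common default `datasets` suggests) and that `--output-dir` belongs to `get-blast-database`.

```bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: download the Nextclade datasets and build the BLAST
# database in the same directory (first argument, default: datasets).
setup_viralqc() {
  dir="${1:-datasets}"
  cores="${2:-2}"
  run vqc get-nextclade-datasets --cores "$cores" --datasets-dir "$dir" &&
    run vqc get-blast-database --cores "$cores" --output-dir "$dir"
}

# Preview the commands without running them.
DRY_RUN=1 setup_viralqc datasets 2
```

Calling `setup_viralqc datasets 2` without `DRY_RUN=1` executes the two commands in order and stops if the first one fails.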

### run-from-fasta

This command runs several steps to identify viruses represented in the input FASTA file and executes Nextclade for each identified virus/dataset.

```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta
```

The following optional parameters are available:

- `--output-dir` - Output directory name. **Default:** `output`
- `--output-file` - File to write the final results to. Valid extensions: `.csv`, `.tsv` or `.json`. **Default:** `results.tsv`
- `--datasets-dir` - Path to the local Nextclade datasets directory. **Default:** `datasets`
- `--ns-min-score` - Minimum score used by the Nextclade `sort` command. **Default:** `0.1`
- `--ns-min-hits` - Minimum number of hits for Nextclade to consider a dataset. **Default:** `10`
- `--blast-database` - Path to the local BLAST database. **Default:** `datasets/blast.fasta`
- `--identity-threshold` - Percent identity threshold for the BLAST analysis. **Default:** `0.9`
- `--cores` - Number of threads used in `nextclade sort` and `nextclade run`. **Default:** `1`
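
For reproducible runs it can help to spell out every option instead of relying on defaults. The sketch below assembles a `run-from-fasta` call using the default values from the list above. `build_run_cmd` is a hypothetical helper; it only prints the command so that it can be reviewed before running, and it does not quote paths that contain spaces.

```bash
# Hypothetical helper: print a run-from-fasta command with every documented
# option set explicitly. The values are the documented defaults.
build_run_cmd() {
  fasta="$1"
  outdir="${2:-output}"
  echo vqc run-from-fasta \
    --sequences-fasta "$fasta" \
    --output-dir "$outdir" \
    --output-file results.tsv \
    --datasets-dir datasets \
    --ns-min-score 0.1 \
    --ns-min-hits 10 \
    --blast-database datasets/blast.fasta \
    --identity-threshold 0.9 \
    --cores 1
}

build_run_cmd test_data/sequences.fasta
```

Once the printed command looks right, it can be copied into the terminal or piped to `sh`.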

The output directory has the following structure:

```
├── <datasets>                    # Output from nextclade sort; sequences for each dataset split into sequences.fa files.
├── datasets_selected.tsv         # Formatted nextclade sort output showing the mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── unmapped_sequences.txt        # Names of input sequences that were not mapped to any virus on nextclade sort.
├── unmapped_sequences.blast.tsv  # BLAST results for unmapped sequences.
└── viruses.tsv                   # Nextclade sort output showing the mapping between input sequences and remote
```
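
Once a run finishes, a short script can give a first overview of the results. This is a sketch under two assumptions that the structure above does not state: `unmapped_sequences.txt` holds one sequence name per line, and each `*.nextclade.tsv` file starts with a single header row. `summarize_output` is a hypothetical helper.

```bash
# Hypothetical helper: summarize a viralQC output directory (default: output).
summarize_output() {
  dir="${1:-output}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi

  # Assumption: one unmapped sequence name per non-empty line.
  if [ -f "$dir/unmapped_sequences.txt" ]; then
    n=$(awk 'NF' "$dir/unmapped_sequences.txt" | wc -l | tr -d ' ')
    echo "unmapped sequences: $n"
  fi

  # Assumption: the first line of each Nextclade TSV is a header row.
  find "$dir" -name '*.nextclade.tsv' | sort | while read -r tsv; do
    rows=$(awk 'NR > 1 && NF' "$tsv" | wc -l | tr -d ' ')
    echo "${tsv#"$dir"/}: $rows sequences"
  done
}

# Usage: summarize_output output
```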

## Usage (API)

```bash
vqc-server
```

Once the server is running, open `http://127.0.0.1:8000/docs` in a browser.
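
In scripts, it can be useful to wait until the server answers before sending requests. The sketch below polls the documentation URL with `curl`; `wait_for_docs` is a hypothetical helper, and it assumes `vqc-server` was started separately (for example in another terminal) on the address shown above.

```bash
# Hypothetical helper: poll a URL until it responds or the attempts run out.
wait_for_docs() {
  url="${1:-http://127.0.0.1:8000/docs}"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "server is up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no response from $url" >&2
  return 1
}

# Usage: wait_for_docs http://127.0.0.1:8000/docs 10
```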

## Development

Install the development dependencies and run `black` on the `viralqc` directory.

```bash
pip install -e ".[dev]"
black viralqc
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "viralQC",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.8",
    "maintainer_email": null,
    "keywords": "python, quality-control, snakemake, virus",
    "author": null,
    "author_email": "Filipe Zimmer Dezordi <zimmer.filipe@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/6d/e8/71191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039/viralqc-0.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool and package for quality control of consensus virus genomes.",
    "version": "0.4.0",
    "project_urls": {
        "Homepage": "https://github.com/InstitutoTodosPelaSaude/viralQC"
    },
    "split_keywords": [
        "python",
        " quality-control",
        " snakemake",
        " virus"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e71ef2dfd6ee34ea0507d591d42d3b45f6d57fd3ddee748177cad5ae98255865",
                "md5": "3d7517f27c94a07d81ddfdc84e3f7bde",
                "sha256": "122029a2226d17b41fea647e460adcf0771a84c49c48a42cac111289ec65aace"
            },
            "downloads": -1,
            "filename": "viralqc-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d7517f27c94a07d81ddfdc84e3f7bde",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.8",
            "size": 22999,
            "upload_time": "2025-10-11T13:08:17",
            "upload_time_iso_8601": "2025-10-11T13:08:17.645689Z",
            "url": "https://files.pythonhosted.org/packages/e7/1e/f2dfd6ee34ea0507d591d42d3b45f6d57fd3ddee748177cad5ae98255865/viralqc-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6de871191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039",
                "md5": "1e592c6af9d5f8c10e6a8970038e9b02",
                "sha256": "d8392e9e42f142168f7f380b966f43f5830f2ce769b6b612516a692311365306"
            },
            "downloads": -1,
            "filename": "viralqc-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1e592c6af9d5f8c10e6a8970038e9b02",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.8",
            "size": 49342,
            "upload_time": "2025-10-11T13:08:19",
            "upload_time_iso_8601": "2025-10-11T13:08:19.423364Z",
            "url": "https://files.pythonhosted.org/packages/6d/e8/71191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039/viralqc-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-11 13:08:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "InstitutoTodosPelaSaude",
    "github_project": "viralQC",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "viralqc"
}
        