| Field | Value |
| --- | --- |
| Name | viralQC |
| Version | 0.1.0 |
| home_page | None |
| Summary | A tool and package for quality control of consensus virus genomes. |
| upload_time | 2025-09-05 12:48:40 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.12,>=3.8 |
| license | MIT |
| keywords | python, quality-control, snakemake, virus |
| requirements | No requirements were recorded. |
## Install
### From pip
First, install the dependencies:
- [Snakemake 7.32](https://snakemake.readthedocs.io/en/v7.32.0/getting_started/installation.html)
- [Ncbi-blast 2.16](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.16.0/)
- [Nextclade 3.15](https://docs.nextstrain.org/projects/nextclade/en/3.15.3/user/nextclade-cli/installation/)
- [Seqtk 1.5](https://github.com/lh3/seqtk/releases/tag/v1.5)
- [Python < 3.12](https://www.python.org/downloads/)
Alternatively, install them all with micromamba:
```bash
micromamba install \
-c conda-forge \
-c bioconda \
"python>=3.8.0,<3.12.0" \
"snakemake-minimal>=7.32.0,<7.33.0" \
"blast>=2.16.0,<2.17.0" \
"nextclade>=3.15.0,<3.16.0" \
"seqtk>=1.5.0,<1.6.0"
```
Then install viralQC:
```bash
pip install viralQC
```
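Before the first run, it can help to confirm that the external tools are visible on `PATH`. The sketch below is not part of viralQC: `missing_tools` is a hypothetical helper, and the executable names (`snakemake`, `blastn`, `nextclade`, `seqtk`) are assumed to be the ones installed by the packages listed above.

```python
import shutil

# Executables assumed to be provided by the dependencies listed above;
# adjust the names if your installation differs.
REQUIRED_TOOLS = ["snakemake", "blastn", "nextclade", "seqtk"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external dependencies were found.")
```

An empty result only shows that the executables exist; it does not check their versions.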
### From Source
```bash
git clone https://github.com/InstitutoTodosPelaSaude/viralQC.git
cd viralQC
```
#### Dependencies
```bash
micromamba env create -f env.yml
micromamba activate viralQC
```
#### viralQC
```bash
pip install .
```
### Check installation (CLI)
```bash
vqc --help
```
## Usage (CLI)
### get-nextclade-datasets
This command sets up local datasets using Nextclade. Run it at least once before the first `run-from-fasta` call, so that a local copy of the Nextclade datasets exists.
```bash
vqc get-nextclade-datasets --cores 2
```
The datasets are stored in `datasets` by default. Use `--datasets-dir` to choose another directory:
```bash
vqc get-nextclade-datasets --cores 2 --datasets-dir <directory_name>
```
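Because `run-from-fasta` depends on the local datasets, a script that wraps the CLI may want to check for them first. This is a minimal sketch, assuming only that a populated datasets directory is non-empty; `datasets_ready` is a hypothetical helper and does not validate the directory contents.

```python
from pathlib import Path


def datasets_ready(datasets_dir="datasets"):
    """Return True if `datasets_dir` exists and contains at least one entry."""
    path = Path(datasets_dir)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if not datasets_ready("datasets"):
        print("No local datasets found; run `vqc get-nextclade-datasets` first.")
```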
### run-from-fasta
This command runs several steps to identify viruses represented in the input FASTA file and executes Nextclade for each identified virus/dataset. There are two modes for viral identification:
- nextclade (default): Run `nextclade sort`
- blast: Run `blastn`.
#### run-from-fasta (nextclade)
```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta
```
Some parameters can be specified:
- `--output-dir` — Output directory name. **Default:** `output`
- `--datasets-dir` — Path to the local Nextclade datasets directory. **Default:** `datasets`
- `--ns-min-score` — Minimum score used by the Nextclade `sort` command. **Default:** `0.1`
- `--ns-min-hits` — Minimum number of hits for Nextclade to consider a dataset. **Default:** `10`
- `--cores` — Number of threads used in `nextclade sort` and `nextclade run`. **Default:** `1`
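When the command is launched from a Python script, the options above can be assembled into an argument list. The sketch below uses only the flags and defaults documented in this section; `build_run_from_fasta_cmd` is a hypothetical helper, not a viralQC function.

```python
import shlex


def build_run_from_fasta_cmd(
    sequences_fasta,
    output_dir="output",
    datasets_dir="datasets",
    ns_min_score=0.1,
    ns_min_hits=10,
    cores=1,
):
    """Build the argument list for `vqc run-from-fasta` in nextclade mode."""
    return [
        "vqc", "run-from-fasta",
        "--sequences-fasta", str(sequences_fasta),
        "--output-dir", str(output_dir),
        "--datasets-dir", str(datasets_dir),
        "--ns-min-score", str(ns_min_score),
        "--ns-min-hits", str(ns_min_hits),
        "--cores", str(cores),
    ]


if __name__ == "__main__":
    cmd = build_run_from_fasta_cmd("test_data/sequences.fasta", cores=4)
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.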
The output directory has the following structure:
```
├── <datasets> # Output from nextclade sort; sequences for each dataset split into sequences.fa files.
├── datasets_selected.tsv # Formatted nextclade sort output showing the mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── unmapped_sequences.txt # Names of input sequences that were not mapped to any virus.
└── viruses.tsv # Nextclade sort output showing the mapping between input sequences and remote datasets.
```
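The per-virus TSV files can be summarized with the standard library. This is a minimal sketch, assuming that viralQC keeps Nextclade's own `qc.overallStatus` column and that `unmapped_sequences.txt` holds one sequence name per line; both helper names are hypothetical.

```python
import csv
from collections import Counter


def qc_status_counts(nextclade_tsv):
    """Count sequences per `qc.overallStatus` value in a Nextclade TSV."""
    with open(nextclade_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["qc.overallStatus"] for row in reader)


def read_unmapped(unmapped_txt):
    """Return the non-empty lines of `unmapped_sequences.txt` (assumed one name per line)."""
    with open(unmapped_txt) as handle:
        return [line.strip() for line in handle if line.strip()]
```

A call such as `qc_status_counts("output/<virus>.nextclade.tsv")` returns a `Counter` keyed by status value; the placeholder path has to be replaced with a real file name from the output directory.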
#### run-from-fasta (blast)
For a run-from-fasta analysis using BLAST to identify the corresponding virus, a BLAST database must first be created:
```bash
vqc get-blast-database
```
Some parameters can be specified:
- `--output-dir` — Path to store the BLAST database. **Default:** `datasets`
- `--datasets-dir` — Path of local nextclade datasets. **Default:** `datasets`
- `--cores` — Number of threads used in `nextclade sort` and `nextclade run`. **Default:** `1`
Then:
```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta --sort-mode blast
```
Some parameters can be specified:
- `--output-dir` — Output directory name. **Default:** `output`
- `--datasets-dir` — Path to the local BLAST database directory. **Default:** `datasets`
- `--cores` — Number of threads used in `nextclade sort` and `nextclade run`. **Default:** `1`
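The two BLAST-mode steps share a directory: reading the parameter lists above, the `--output-dir` of `get-blast-database` appears to be the same location that `run-from-fasta` later receives as `--datasets-dir`. The sketch below makes that link explicit under this assumption; `blast_mode_commands` is a hypothetical helper that only builds the argument lists.

```python
def blast_mode_commands(
    sequences_fasta,
    nextclade_datasets_dir="datasets",
    blast_db_dir="datasets",
    output_dir="output",
    cores=1,
):
    """Return the two BLAST-mode commands in the order they should run."""
    build_db = [
        "vqc", "get-blast-database",
        "--datasets-dir", str(nextclade_datasets_dir),
        "--output-dir", str(blast_db_dir),  # where the database is written
        "--cores", str(cores),
    ]
    run = [
        "vqc", "run-from-fasta",
        "--sequences-fasta", str(sequences_fasta),
        "--sort-mode", "blast",
        "--datasets-dir", str(blast_db_dir),  # assumed: same directory as above
        "--output-dir", str(output_dir),
        "--cores", str(cores),
    ]
    return [build_db, run]


if __name__ == "__main__":
    for cmd in blast_mode_commands("test_data/sequences.fasta"):
        print(" ".join(cmd))
        # To execute: subprocess.run(cmd, check=True)
```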
The output directory has the following structure:
```
├── datasets_selected.tsv # Mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── sequences.<virus/dataset>.fasta # Input sequences split by virus/dataset.
├── unmapped_sequences.txt # Names of input sequences that were not mapped to any virus.
└── viruses.tsv # BLAST output.
```
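A quick way to see how the input was distributed is to count the records in each split FASTA file. This is a minimal sketch, assuming the `sequences.*.fasta` files sit directly in the output directory; `count_fasta_records` and `records_per_split` are hypothetical helpers.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def records_per_split(output_dir="output"):
    """Map each `sequences.*.fasta` file in `output_dir` to its record count."""
    return {
        path.name: count_fasta_records(path)
        for path in sorted(Path(output_dir).glob("sequences.*.fasta"))
    }


if __name__ == "__main__":
    for name, n_records in records_per_split("output").items():
        print(f"{name}\t{n_records}")
```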
## Usage (API)
```bash
vqc-server
```
Once the server is running, open `http://127.0.0.1:8000/docs` in a browser.
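To check from a script that the server is reachable, a plain HTTP request against the documentation URL is enough. The sketch below uses only the standard library and the address given in this section; `server_is_up` is a hypothetical helper and makes no assumption about the API routes themselves.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:8000/docs", timeout=2.0):
    """Return True if `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```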
## Development
Install the development dependencies and run `black` on the `viralqc` directory.
```bash
pip install -e ".[dev]"
black viralqc
```
Raw data
{
"_id": null,
"home_page": null,
"name": "viralQC",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.8",
"maintainer_email": null,
"keywords": "python, quality-control, snakemake, virus",
"author": null,
"author_email": "Filipe Zimmer Dezordi <zimmer.filipe@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/3a/26/e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd/viralqc-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A tool and package for quality control of consensus virus genomes.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/InstitutoTodosPelaSaude/viralQC"
},
"split_keywords": [
"python",
" quality-control",
" snakemake",
" virus"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "283ee80c52b6f91f8cac387b8237327278f81b748a43b24a24eba474606947cf",
"md5": "81ab20c25ff241debdac661331b213e5",
"sha256": "ab5ea7556ebe5c0dc3f60d50c52b6f628f4edfe4470c3ee4d74640194af19490"
},
"downloads": -1,
"filename": "viralqc-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "81ab20c25ff241debdac661331b213e5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.8",
"size": 19187,
"upload_time": "2025-09-05T12:48:38",
"upload_time_iso_8601": "2025-09-05T12:48:38.951030Z",
"url": "https://files.pythonhosted.org/packages/28/3e/e80c52b6f91f8cac387b8237327278f81b748a43b24a24eba474606947cf/viralqc-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3a26e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd",
"md5": "9ac8a8796f84a40ac3c7aad14202b470",
"sha256": "88d37889a2cf1dba37e57a7437f5a11fa77cf6ce42e1640def3c6b42aae2d12d"
},
"downloads": -1,
"filename": "viralqc-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "9ac8a8796f84a40ac3c7aad14202b470",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.8",
"size": 44636,
"upload_time": "2025-09-05T12:48:40",
"upload_time_iso_8601": "2025-09-05T12:48:40.564621Z",
"url": "https://files.pythonhosted.org/packages/3a/26/e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd/viralqc-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-05 12:48:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "InstitutoTodosPelaSaude",
"github_project": "viralQC",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "viralqc"
}