| Field | Value |
| --- | --- |
| Name | viralQC |
| Version | 0.4.0 |
| Summary | A tool and package for quality control of consensus virus genomes. |
| upload_time | 2025-10-11 13:08:19 |
| requires_python | <3.12,>=3.8 |
| license | MIT |
| keywords | python, quality-control, snakemake, virus |

## Install
ViralQC is a tool and package for quality control of consensus virus genomes. It uses [Nextclade](https://docs.nextstrain.org/projects/nextclade/en/stable/), [BLAST](https://www.ncbi.nlm.nih.gov/books/NBK279690/), and a set of internal rules to classify viral sequences and perform quality control of complete genomes, genomic regions, or target genes.
### From pip
First, install the dependencies:
- [Snakemake 7.32](https://snakemake.readthedocs.io/en/v7.32.0/getting_started/installation.html)
- [NCBI BLAST+ 2.16](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.16.0/)
- [Nextclade 3.15](https://docs.nextstrain.org/projects/nextclade/en/3.15.3/user/nextclade-cli/installation/)
- [Seqtk 1.5](https://github.com/lh3/seqtk/releases/tag/v1.5)
- [Python >= 3.8, < 3.12](https://www.python.org/downloads/)

Alternatively, install them all with micromamba:
```bash
micromamba install \
  -c conda-forge \
  -c bioconda \
  "python>=3.8.0,<3.12.0" \
  "snakemake-minimal>=7.32.0,<7.33.0" \
  "blast>=2.16.0,<2.17.0" \
  "nextclade>=3.15.0,<3.16.0" \
  "seqtk>=1.5.0,<1.6.0"
```
Then install viralQC:
```bash
pip install viralQC
```
### From Source
```bash
git clone https://github.com/InstitutoTodosPelaSaude/viralQC.git
cd viralQC
```
#### Dependencies
```bash
micromamba env create -f env.yml
micromamba activate viralQC
```
#### viralQC
```bash
pip install .
```
### Check installation (CLI)
```bash
vqc --help
```
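A pre-flight check can confirm that the interpreter and the external tools listed under "From pip" are available before any `vqc` command is run. The following is a minimal sketch and not part of viralQC: the helper names are hypothetical, and the executable names (`snakemake`, `blastn`, `nextclade`, `seqtk`) are assumed to be the binaries those dependencies place on `PATH`.

```python
import shutil
import sys

# Executable names are assumptions: the usual binaries shipped by
# Snakemake, NCBI BLAST+, Nextclade, and Seqtk.
REQUIRED_TOOLS = ("snakemake", "blastn", "nextclade", "seqtk")


def python_supported(version=None):
    """Return True if the version satisfies requires_python (>=3.8,<3.12)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 8) <= major_minor < (3, 12)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_supported():
        print("Unsupported Python version:", sys.version.split()[0])
    for tool in missing_tools():
        print("Missing executable:", tool)
```

The `which` argument is injectable only so the lookup can be exercised without the tools installed.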
## Usage (CLI)
### get-nextclade-datasets
This command configures local datasets using Nextclade. Run it at least once to create a local copy of the Nextclade datasets before running the `run-from-fasta` command.
```bash
vqc get-nextclade-datasets --cores 2
```
A directory name can be specified with `--datasets-dir`; the default is `datasets`.
```bash
vqc get-nextclade-datasets --cores 2 --datasets-dir <directory_name>
```
### get-blast-database
This command configures a local BLAST database containing all NCBI RefSeq viral genomes. Run it at least once to create the local BLAST database before running the `run-from-fasta` command.
```bash
vqc get-blast-database --cores 2
```
An output directory name can be specified with `--output-dir`; the default is `datasets`.
```bash
vqc get-blast-database --cores 2 --output-dir <directory_name>
```
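Both setup commands need to run once before `run-from-fasta`. As an illustration, the sketch below builds the two command lines from the flags described in this section and runs them in order. The `setup_commands` and `run_setup` helpers are hypothetical names, `vqc` is assumed to be on `PATH`, and `--output-dir` is assumed to be the `get-blast-database` option that the text describes.

```python
import subprocess


def setup_commands(cores=2, datasets_dir=None, blast_dir=None):
    """Build the argv lists for the two one-time setup commands."""
    nextclade_cmd = ["vqc", "get-nextclade-datasets", "--cores", str(cores)]
    if datasets_dir is not None:
        nextclade_cmd += ["--datasets-dir", datasets_dir]

    blast_cmd = ["vqc", "get-blast-database", "--cores", str(cores)]
    if blast_dir is not None:
        # Assumption: --output-dir is the get-blast-database option
        # described in the text above.
        blast_cmd += ["--output-dir", blast_dir]

    return [nextclade_cmd, blast_cmd]


def run_setup(**kwargs):
    """Run both setup commands, stopping at the first failure."""
    for cmd in setup_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Passing argument lists to `subprocess.run` avoids shell quoting issues when directory names contain spaces.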
### run-from-fasta
This command runs several steps to identify viruses represented in the input FASTA file and executes Nextclade for each identified virus/dataset.
#### run-from-fasta
```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta
```
Some parameters can be specified:
- `--output-dir` - Output directory name. **Default:** `output`
- `--output-file` - File to write the final results to. Valid extensions: `.csv`, `.tsv`, or `.json`. **Default:** `results.tsv`
- `--datasets-dir` - Path to the local Nextclade datasets directory. **Default:** `datasets`
- `--ns-min-score` - Minimum score used by the `nextclade sort` command. **Default:** `0.1`
- `--ns-min-hits` - Minimum number of hits for Nextclade to consider a dataset. **Default:** `10`
- `--blast-database` - Path to the local BLAST database. **Default:** `datasets/blast.fasta`
- `--identity-threshold` - Percent identity threshold for the BLAST analysis. **Default:** `0.9`
- `--cores` - Number of threads used by `nextclade sort` and `nextclade run`. **Default:** `1`
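When the command is launched from a script, it can help to spell out every option so that a run is reproducible. The sketch below assembles a `vqc run-from-fasta` command line from the defaults listed above. `build_run_command` and `DEFAULTS` are hypothetical names introduced here and are not part of the viralQC package.

```python
import shlex

# Defaults copied from the parameter list above.
DEFAULTS = {
    "output-dir": "output",
    "output-file": "results.tsv",
    "datasets-dir": "datasets",
    "ns-min-score": 0.1,
    "ns-min-hits": 10,
    "blast-database": "datasets/blast.fasta",
    "identity-threshold": 0.9,
    "cores": 1,
}


def build_run_command(sequences_fasta, **overrides):
    """Return the argv list for `vqc run-from-fasta` with explicit options."""
    options = dict(DEFAULTS)
    for key, value in overrides.items():
        name = key.replace("_", "-")
        if name not in DEFAULTS:
            raise ValueError(f"Unknown option: --{name}")
        options[name] = value

    argv = ["vqc", "run-from-fasta", "--sequences-fasta", sequences_fasta]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_run_command("test_data/sequences.fasta", cores=4)
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; unknown option names raise an error instead of being forwarded silently.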
The output directory has the following structure:
```
├── <datasets>                    # Output from nextclade sort; sequences for each dataset split into sequences.fa files.
├── datasets_selected.tsv         # Formatted nextclade sort output showing the mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── unmapped_sequences.txt        # Names of input sequences that were not mapped to any virus on nextclade sort.
├── unmapped_sequences.blast.tsv  # BLAST results for unmapped sequences.
└── viruses.tsv                   # Nextclade sort output showing the mapping between input sequences and remote
```
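The result files can be loaded with the standard library for downstream checks. The sketch below picks a parser from the file extension (`.csv`, `.tsv`, or `.json`, the extensions accepted by `--output-file`) and reads `unmapped_sequences.txt`. It makes no assumption about column names, it does assume one sequence name per line in the unmapped list, and `load_results` and `load_unmapped` are hypothetical helper names.

```python
import csv
import json
from pathlib import Path


def load_results(path):
    """Load a results file, choosing the parser from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    delimiters = {".tsv": "\t", ".csv": ","}
    if suffix not in delimiters:
        raise ValueError(f"Unsupported extension: {suffix}")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiters[suffix]))


def load_unmapped(path):
    """Return sequence names, assuming one name per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For tabular files each row comes back as a dictionary keyed by the header line, so the code keeps working if columns are added or reordered.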
## Usage (API)
```bash
vqc-server
```
Once the server is running, open `http://127.0.0.1:8000/docs` in a browser.
## Development
Install the development dependencies and run `black` on the `viralqc` directory.
```bash
pip install -e ".[dev]"
black viralqc
```
            
         
Raw data

{
    "_id": null,
    "home_page": null,
    "name": "viralQC",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.8",
    "maintainer_email": null,
    "keywords": "python, quality-control, snakemake, virus",
    "author": null,
    "author_email": "Filipe Zimmer Dezordi <zimmer.filipe@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/6d/e8/71191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039/viralqc-0.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool and package for quality control of consensus virus genomes.",
    "version": "0.4.0",
    "project_urls": {
        "Homepage": "https://github.com/InstitutoTodosPelaSaude/viralQC"
    },
    "split_keywords": [
        "python",
        " quality-control",
        " snakemake",
        " virus"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e71ef2dfd6ee34ea0507d591d42d3b45f6d57fd3ddee748177cad5ae98255865",
                "md5": "3d7517f27c94a07d81ddfdc84e3f7bde",
                "sha256": "122029a2226d17b41fea647e460adcf0771a84c49c48a42cac111289ec65aace"
            },
            "downloads": -1,
            "filename": "viralqc-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d7517f27c94a07d81ddfdc84e3f7bde",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.8",
            "size": 22999,
            "upload_time": "2025-10-11T13:08:17",
            "upload_time_iso_8601": "2025-10-11T13:08:17.645689Z",
            "url": "https://files.pythonhosted.org/packages/e7/1e/f2dfd6ee34ea0507d591d42d3b45f6d57fd3ddee748177cad5ae98255865/viralqc-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6de871191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039",
                "md5": "1e592c6af9d5f8c10e6a8970038e9b02",
                "sha256": "d8392e9e42f142168f7f380b966f43f5830f2ce769b6b612516a692311365306"
            },
            "downloads": -1,
            "filename": "viralqc-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1e592c6af9d5f8c10e6a8970038e9b02",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.8",
            "size": 49342,
            "upload_time": "2025-10-11T13:08:19",
            "upload_time_iso_8601": "2025-10-11T13:08:19.423364Z",
            "url": "https://files.pythonhosted.org/packages/6d/e8/71191533b433838e3fafa428384e6f2612a39c319e5ef15f815cdbb2a039/viralqc-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-11 13:08:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "InstitutoTodosPelaSaude",
    "github_project": "viralQC",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "viralqc"
}