# viralQC


- **Name:** viralQC
- **Version:** 0.1.0
- **Summary:** A tool and package for quality control of consensus virus genomes.
- **Upload time:** 2025-09-05 12:48:40
- **Requires Python:** <3.12,>=3.8
- **License:** MIT
- **Keywords:** python, quality-control, snakemake, virus
## Install

### From pip

First, install the dependencies:

- [Snakemake 7.32](https://snakemake.readthedocs.io/en/v7.32.0/getting_started/installation.html)
- [NCBI BLAST+ 2.16](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.16.0/)
- [Nextclade 3.15](https://docs.nextstrain.org/projects/nextclade/en/3.15.3/user/nextclade-cli/installation/)
- [Seqtk 1.5](https://github.com/lh3/seqtk/releases/tag/v1.5)
- [Python < 3.12](https://www.python.org/downloads/)

Alternatively, install them with micromamba:

```bash
micromamba install \
  -c conda-forge \
  -c bioconda \
  "python>=3.8.0,<3.12.0" \
  "snakemake-minimal>=7.32.0,<7.33.0" \
  "blast>=2.16.0,<2.17.0" \
  "nextclade>=3.15.0,<3.16.0" \
  "seqtk>=1.5.0,<1.6.0"
```
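pip enforces the package's Python constraint, but it can help to check the active interpreter before installing anything. A minimal sketch: `python_version_supported` is a hypothetical helper (not part of viralQC) that tests a `major.minor` string against the declared `>=3.8,<3.12` range, and the script assumes the interpreter is reachable as `python3`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a "major.minor[.patch]" string
# falls inside viralQC's declared range (>=3.8, <3.12).
python_version_supported() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  # Reject empty or non-numeric components before comparing numbers.
  case "$major" in '' | *[!0-9]*) return 1 ;; esac
  case "$minor" in '' | *[!0-9]*) return 1 ;; esac
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -lt 12 ]
}

# Ask the active interpreter for its version; fall back to "unknown".
current="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)" || current="unknown"

if python_version_supported "$current"; then
  echo "Python $current is supported"
else
  echo "Python $current is outside the supported range (>=3.8, <3.12)"
fi
```

With micromamba, pinning `"python>=3.8.0,<3.12.0"` as above should already make this check pass inside the environment.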

Then install viralQC:

```bash
pip install viralQC
```

### From Source

```bash
git clone https://github.com/InstitutoTodosPelaSaude/viralQC.git
cd viralQC
```

#### Dependencies

```bash
micromamba env create -f env.yml
micromamba activate viralQC
```

#### viralQC

```bash
pip install .
```

### Check installation (CLI)

```bash
vqc --help
```
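`vqc --help` confirms that the CLI itself is installed, but the external tools listed under Install must also be on `PATH`. A minimal sketch, assuming the executables are named `snakemake`, `blastn`, `nextclade` and `seqtk` (the names provided by the packages above); `check_tools` is a hypothetical helper, not part of viralQC.

```bash
#!/usr/bin/env bash
# Print "found" or "missing" for each command name and return non-zero
# if any of them cannot be located on PATH.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed executable names for the dependencies, plus the vqc CLI itself.
check_tools snakemake blastn nextclade seqtk vqc ||
  echo "Install the missing tools before running vqc."
```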

## Usage (CLI)

### get-nextclade-datasets

This command uses Nextclade to create a local copy of the Nextclade datasets. It must be run at least once before the `run-from-fasta` command.

```bash
vqc get-nextclade-datasets --cores 2
```

The datasets directory can be changed with `--datasets-dir`; the default is `datasets`.

```bash
vqc get-nextclade-datasets --cores 2 --datasets-dir <directory_name>
```
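Because the datasets only need to be downloaded once, scripts can skip this step when a local copy already exists. A minimal sketch: `ensure_datasets` is a hypothetical wrapper, and it assumes that a non-empty datasets directory holds a usable copy from an earlier run (delete the directory to force a fresh download).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run get-nextclade-datasets only when the target
# directory is missing or empty. A non-empty directory is assumed to
# already contain datasets from a previous run.
ensure_datasets() {
  local dir="${1:-datasets}" cores="${2:-1}"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "Using existing datasets in $dir"
  else
    vqc get-nextclade-datasets --cores "$cores" --datasets-dir "$dir"
  fi
}
```

For example, `ensure_datasets datasets 2` downloads into `datasets` with two cores on the first call and reuses that directory afterwards.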

### run-from-fasta

This command identifies the viruses represented in the input FASTA file and runs Nextclade for each identified virus/dataset. Virus identification has two modes, selected with `--sort-mode`:

- `nextclade` (default): runs `nextclade sort`.
- `blast`: runs `blastn`.

#### run-from-fasta (nextclade)

```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta
```

Some parameters can be specified:

- `--output-dir`: Output directory name. **Default:** `output`
- `--datasets-dir`: Path to the local Nextclade datasets directory. **Default:** `datasets`
- `--ns-min-score`: Minimum score used by the Nextclade `sort` command. **Default:** `0.1`
- `--ns-min-hits`: Minimum number of hits for Nextclade to consider a dataset. **Default:** `10`
- `--cores`: Number of threads used in `nextclade sort` and `nextclade run`. **Default:** `1`
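To process several FASTA files, each run can be given its own `--output-dir` so results from one file are kept apart from the next. A minimal sketch: `run_batch` is a hypothetical wrapper, the `output/<file name>` layout is an illustrative choice, and the `VQC` variable is an added hook (not a viralQC setting) that lets you substitute `echo` for a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper: run viralQC once per FASTA file, giving each
# run its own output directory named after the file (minus its extension).
# VQC can point at another executable, e.g. VQC=echo for a dry run.
run_batch() {
  local datasets_dir="$1" cores="$2" fasta name
  shift 2
  for fasta in "$@"; do
    name="$(basename "$fasta")"
    name="${name%.*}"
    "${VQC:-vqc}" run-from-fasta \
      --sequences-fasta "$fasta" \
      --output-dir "output/$name" \
      --datasets-dir "$datasets_dir" \
      --ns-min-score 0.1 \
      --ns-min-hits 10 \
      --cores "$cores"
  done
}
```

For example, `run_batch datasets 4 run1.fasta run2.fasta` writes to `output/run1` and `output/run2`; `VQC=echo run_batch datasets 4 run1.fasta` only prints the command. The `--ns-min-score` and `--ns-min-hits` values shown are the documented defaults, spelled out so they are easy to tune.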

The output directory has the following structure:

```
├── <datasets>                    # Output from nextclade sort; sequences for each dataset split into sequences.fa files.
├── datasets_selected.tsv         # Formatted nextclade sort output showing the mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── unmapped_sequences.txt        # Names of input sequences that were not mapped to any virus.
└── viruses.tsv                   # Nextclade sort output showing the mapping between input sequences and remote datasets.
```
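A quick way to inspect a finished run is to count the unmapped sequences and the result rows per dataset. A minimal sketch, assuming `unmapped_sequences.txt` lists one name per line and each `<virus/dataset>.nextclade.tsv` sits directly in the output directory with a single header row; `summarize_output` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical summary of a run-from-fasta output directory (nextclade mode).
# Assumes one sequence name per line in unmapped_sequences.txt and one
# header row in each <virus/dataset>.nextclade.tsv.
summarize_output() {
  local out="${1:-output}" unmapped=0 tsv rows
  if [ -f "$out/unmapped_sequences.txt" ]; then
    # grep -c . counts non-empty lines.
    unmapped="$(grep -c . "$out/unmapped_sequences.txt")"
  fi
  echo "unmapped sequences: $unmapped"
  for tsv in "$out"/*.nextclade.tsv; do
    [ -e "$tsv" ] || continue # the glob matched nothing
    rows=$(( $(wc -l < "$tsv") - 1 )) # subtract the header row
    echo "$(basename "$tsv" .nextclade.tsv): $rows sequences"
  done
}
```

Run it as `summarize_output output` after `vqc run-from-fasta` completes.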

#### run-from-fasta (blast)

To identify viruses with BLAST, first create a BLAST database:

```bash
vqc get-blast-database
```

Some parameters can be specified:

- `--output-dir`: Path to store the BLAST database. **Default:** `datasets`
- `--datasets-dir`: Path to the local Nextclade datasets directory. **Default:** `datasets`
- `--cores`: Number of threads to use. **Default:** `1`

Then:

```bash
vqc run-from-fasta --sequences-fasta test_data/sequences.fasta --sort-mode blast
```

Some parameters can be specified:

- `--output-dir`: Output directory name. **Default:** `output`
- `--datasets-dir`: Path to the local BLAST database directory. **Default:** `datasets`
- `--cores`: Number of threads used by the identification step and `nextclade run`. **Default:** `1`
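The two BLAST-mode steps can be chained in one script. A minimal sketch, assuming `vqc get-nextclade-datasets` has already been run and that a single directory serves as the Nextclade datasets directory, the BLAST database location and the `--datasets-dir` of `run-from-fasta` (all three default to `datasets`). `run_blast_mode` and the `VQC` dry-run hook are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end BLAST-mode run. One directory holds the Nextclade
# datasets and the BLAST database, matching the shared default "datasets".
# VQC=echo turns the wrapper into a dry run that only prints the commands.
run_blast_mode() {
  local fasta="$1" db_dir="${2:-datasets}" out="${3:-output}" cores="${4:-1}"
  local vqc="${VQC:-vqc}"
  # Build the BLAST database first; stop if that step fails.
  "$vqc" get-blast-database \
    --output-dir "$db_dir" --datasets-dir "$db_dir" --cores "$cores" || return
  "$vqc" run-from-fasta \
    --sequences-fasta "$fasta" --sort-mode blast \
    --output-dir "$out" --datasets-dir "$db_dir" --cores "$cores"
}
```

For example, `run_blast_mode test_data/sequences.fasta datasets output 2`. The database only needs to be built once, so for repeated runs it is cheaper to call `vqc run-from-fasta --sort-mode blast` directly.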

The output directory has the following structure:

```
├── datasets_selected.tsv            # Mapping between input sequences and local datasets.
├── <virus/dataset>.nextclade.tsv    # Nextclade run output for each identified virus, including clade assignments and QC metrics.
├── sequences.<virus/dataset>.fasta  # Input sequences split by virus/dataset.
├── unmapped_sequences.txt           # Names of input sequences that were not mapped to any virus.
└── viruses.tsv                      # BLAST output.
```
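The split FASTA files make it easy to compare how many input sequences were assigned to each virus/dataset. A minimal sketch, assuming `unmapped_sequences.txt` lists one name per line; `count_fasta` and `compare_counts` are hypothetical helpers. If each sequence lands in exactly one split file or in the unmapped list (an assumption about viralQC's behaviour), `split + unmapped` should equal `input`, so treat a mismatch as something to investigate rather than as an error.

```bash
#!/usr/bin/env bash
# Count FASTA records: each record starts with a ">" header line.
count_fasta() {
  grep -c '^>' "$1"
}

# Hypothetical check for a BLAST-mode output directory: per-dataset counts
# from sequences.<virus/dataset>.fasta, then input/split/unmapped totals.
compare_counts() {
  local input="$1" out="${2:-output}" split=0 unmapped=0 fa name n
  for fa in "$out"/sequences.*.fasta; do
    [ -e "$fa" ] || continue # the glob matched nothing
    name="$(basename "$fa" .fasta)"
    n="$(count_fasta "$fa")"
    echo "${name#sequences.}: $n"
    split=$((split + n))
  done
  if [ -f "$out/unmapped_sequences.txt" ]; then
    unmapped="$(grep -c . "$out/unmapped_sequences.txt")"
  fi
  echo "input: $(count_fasta "$input") split: $split unmapped: $unmapped"
}
```

For example, `compare_counts test_data/sequences.fasta output`.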

## Usage (API)

```bash
vqc-server
```

Once the server is running, open `http://127.0.0.1:8000/docs` in a browser.
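In scripts it can be useful to start the server in the background and wait until it responds. A minimal sketch, assuming `vqc-server` listens on `127.0.0.1:8000` by default (the address given above) and that `curl` is installed; `wait_for_url` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Poll a URL once per second until it answers with a success status
# or the number of attempts runs out. Requires curl.
wait_for_url() {
  local url="$1" tries="${2:-30}" i
  for ((i = 0; i < tries; i++)); do
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Start the server in the background only if the command is available.
if command -v vqc-server >/dev/null 2>&1; then
  vqc-server &
  server_pid=$!
  if wait_for_url http://127.0.0.1:8000/docs 30; then
    echo "API docs: http://127.0.0.1:8000/docs (stop with: kill $server_pid)"
  else
    echo "vqc-server did not respond on port 8000" >&2
    kill "$server_pid"
  fi
fi
```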

## Development

Install the development dependencies and run `black` on the `viralqc` directory:

```bash
pip install -e ".[dev]"
black viralqc
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "viralQC",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.8",
    "maintainer_email": null,
    "keywords": "python, quality-control, snakemake, virus",
    "author": null,
    "author_email": "Filipe Zimmer Dezordi <zimmer.filipe@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3a/26/e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd/viralqc-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool and package for quality control of consensus virus genomes.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/InstitutoTodosPelaSaude/viralQC"
    },
    "split_keywords": [
        "python",
        " quality-control",
        " snakemake",
        " virus"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "283ee80c52b6f91f8cac387b8237327278f81b748a43b24a24eba474606947cf",
                "md5": "81ab20c25ff241debdac661331b213e5",
                "sha256": "ab5ea7556ebe5c0dc3f60d50c52b6f628f4edfe4470c3ee4d74640194af19490"
            },
            "downloads": -1,
            "filename": "viralqc-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "81ab20c25ff241debdac661331b213e5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.8",
            "size": 19187,
            "upload_time": "2025-09-05T12:48:38",
            "upload_time_iso_8601": "2025-09-05T12:48:38.951030Z",
            "url": "https://files.pythonhosted.org/packages/28/3e/e80c52b6f91f8cac387b8237327278f81b748a43b24a24eba474606947cf/viralqc-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a26e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd",
                "md5": "9ac8a8796f84a40ac3c7aad14202b470",
                "sha256": "88d37889a2cf1dba37e57a7437f5a11fa77cf6ce42e1640def3c6b42aae2d12d"
            },
            "downloads": -1,
            "filename": "viralqc-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9ac8a8796f84a40ac3c7aad14202b470",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.8",
            "size": 44636,
            "upload_time": "2025-09-05T12:48:40",
            "upload_time_iso_8601": "2025-09-05T12:48:40.564621Z",
            "url": "https://files.pythonhosted.org/packages/3a/26/e3dd4b2ccbef9ce18d5b3eea0b19c109a3f71ca0638dd5ed0efc1cc3dfcd/viralqc-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-05 12:48:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "InstitutoTodosPelaSaude",
    "github_project": "viralQC",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "viralqc"
}
        