Name | varseek |
Version | 0.1.1 |
home_page | None |
Summary | Efficient variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants. |
upload_time | 2025-08-12 08:17:59 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | BSD 2-Clause License
Copyright (c) 2024, Pachter Lab
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
keywords |
varseek
bioinformatics
variant-analysis
k-mer
rna-seq
dna-seq
|
VCS |
 |
bugtrack_url |
|
requirements |
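numpy >=1.26.4, <2.0.0
pandas >=1.5.3, <2.0.0
matplotlib >=3.9.0
tqdm >=4.66.4
anndata >=0.8.0
kb-python >=0.29.3
gget >=0.29.1
scipy >=1.13.1
pyfastx >=2.1.0
pysam >=0.22.1
pyarrow >=19.0.1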
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# varseek

`varseek` is a free, open-source command-line tool and Python package that provides variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants. The name comes from "seeking variants" or, alternatively, "seeing k-variants" (where a "k-variant" is defined as a k-mer containing a variant).
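
To make the "k-variant" definition concrete, here is a minimal Python sketch that lists every k-mer of a mutated sequence that contains a single-nucleotide substitution. It illustrates the definition only and is not varseek's implementation; the function name `k_variants` and the 0-based `pos` convention are assumptions made for this example.

```python
def k_variants(ref: str, pos: int, alt: str, k: int) -> list[str]:
    """Return every k-mer of the mutated sequence that contains the substituted base.

    Assumes a single-nucleotide substitution at 0-based position `pos`.
    """
    if not 0 <= pos < len(ref):
        raise ValueError("pos must index a base in ref")
    if len(alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    mutated = ref[:pos] + alt + ref[pos + 1:]
    # A k-mer starting at s covers positions s..s+k-1, so it contains `pos`
    # when pos-k+1 <= s <= pos; clip that range to valid start positions.
    first = max(0, pos - k + 1)
    last = min(pos, len(mutated) - k)
    return [mutated[s:s + k] for s in range(first, last + 1)]


print(k_variants("ACGTACGTAC", 4, "T", 3))  # ['GTT', 'TTC', 'TCG']
```

For a substitution at least k-1 bases away from both ends there are exactly k such k-mers; closer to an end there are fewer.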

The two commands used in a standard workflow are `varseek ref` and `varseek count`. `varseek ref` generates a variant-containing reference sequence (VCRS) index that serves as the basis for variant calling. `varseek count` pseudoaligns RNA-seq or DNA-seq reads against the VCRS index and generates a variant count matrix, which can be used for downstream analysis. Each of these commands wraps other steps from the varseek package and the kb-python package, as described below.
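
The ref-then-count flow can be pictured with a toy, pure-Python stand-in. This sketch replaces pseudoalignment with plain exact k-mer lookup: k-mers shared by more than one VCRS are dropped from the index, and a read is assigned to a VCRS only when every indexed k-mer it contains points to that one VCRS. It is not how kb-python/kallisto pseudoalign reads, and all function names and sequences here are hypothetical.

```python
from collections import Counter


def build_kmer_index(vcrs: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer to the one VCRS it occurs in; drop k-mers shared by several VCRSs."""
    owners: dict[str, set[str]] = {}
    for name, seq in vcrs.items():
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(name)
    return {kmer: next(iter(names)) for kmer, names in owners.items() if len(names) == 1}


def count_reads(reads: list[str], index: dict[str, str], k: int) -> Counter:
    """Assign a read to a VCRS only if all of its indexed k-mers point to that VCRS."""
    counts: Counter = Counter()
    for read in reads:
        hits = {index[read[i:i + k]] for i in range(len(read) - k + 1) if read[i:i + k] in index}
        if len(hits) == 1:  # unambiguous; reads with zero or several hits are dropped
            counts[hits.pop()] += 1
    return counts


def count_matrix(samples: dict[str, list[str]], vcrs: dict[str, str], k: int) -> list[list[int]]:
    """Rows are samples (in insertion order), columns are VCRSs in sorted-name order."""
    index = build_kmer_index(vcrs, k)
    names = sorted(vcrs)
    matrix = []
    for reads in samples.values():
        counts = count_reads(reads, index, k)
        matrix.append([counts[name] for name in names])
    return matrix


vcrs = {"v1": "AAGTTCC", "v2": "GGCATGG"}
samples = {
    "s1": ["AGTTC", "CATGG", "TTTTT"],
    "s2": ["AAGTT", "GTTCA", "AAGGCA"],  # the last read hits both VCRSs and is dropped
}
print(count_matrix(samples, vcrs, k=3))  # [[1, 1], [2, 0]]
```

Dropping shared k-mers and ambiguous reads is one simple way to keep each count attributable to a single variant; real pseudoalignment handles multi-mapping reads through equivalence classes instead.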

The functions of `varseek` are described in the table below.

| Description | Bash | Python (with `import varseek as vk`) |
|-------------------------------------------------------------------|-------------------|--------------------------------------|
| Build a variant-containing reference sequence (VCRS) FASTA file   | `vk build ...`    | `vk.build(...)`                      |
| Describe the VCRS reference in a dataframe for filtering | `vk info ...` | `vk.info(...)` |
| Filter the VCRS file based on the CSV generated from varseek info | `vk filter ...` | `vk.filter(...)` |
| Preprocess the FASTQ files before pseudoalignment | `vk fastqpp ...` | `vk.fastqpp(...)` |
| Process the variant count matrix | `vk clean ...` | `vk.clean(...)` |
| Analyze the variant count matrix results | `vk summarize ...`| `vk.summarize(...)` |
| Wrap vk build, vk info, vk filter, and kb ref | `vk ref ...` | `vk.ref(...)` |
| Wrap vk fastqpp, kb count, vk clean, and vk summarize | `vk count ...` | `vk.count(...)` |
| Create synthetic RNA-seq dataset with variant-containing reads | `vk sim ...` | `vk.sim(...)` |
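
The split between `vk info` and `vk filter` (annotate each VCRS first, then filter on the resulting CSV) can be sketched as follows. The columns (`vcrs_id`, `length`, `gc_content`) and the GC-content threshold are hypothetical stand-ins chosen for illustration; they are not necessarily the columns or criteria that `vk info` and `vk filter` actually use.

```python
import csv
import io


def write_info_csv(vcrs: dict[str, str]) -> str:
    """Annotate each VCRS with illustrative properties and return them as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["vcrs_id", "length", "gc_content"])
    writer.writeheader()
    for name, seq in vcrs.items():
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        writer.writerow({"vcrs_id": name, "length": len(seq), "gc_content": f"{gc:.3f}"})
    return buf.getvalue()


def filter_by_info(vcrs: dict[str, str], info_csv: str, max_gc: float) -> dict[str, str]:
    """Keep only the VCRSs whose row in the info CSV passes the threshold."""
    rows = csv.DictReader(io.StringIO(info_csv))
    keep = {row["vcrs_id"] for row in rows if float(row["gc_content"]) <= max_gc}
    return {name: seq for name, seq in vcrs.items() if name in keep}


vcrs = {"v1": "AAGTTCC", "v2": "GGCATGG"}
info = write_info_csv(vcrs)
print(filter_by_info(vcrs, info, max_gc=0.5))  # {'v1': 'AAGTTCC'}
```

Keeping the annotation in a separate table means filtering thresholds can be tuned and re-applied without recomputing the annotations.
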

After pseudoaligning reads and generating a variant count matrix with `varseek`, you can explore the data with our pre-built notebooks, described in the table below.

| Description | Notebook |
|-----------------------------------------------|--------------------------------------------------------------------------------------|
| Preprocessing the variant count matrix | [3_matrix_preprocessing.ipynb](./3_matrix_preprocessing.ipynb) |
| Sequence visualization of variants | [4_1_variant_analysis_sequence_visualization.ipynb](./4_1_variant_analysis_sequence_visualization.ipynb) |
| Heatmap visualization of variant patterns | [4_2_variant_analysis_heatmaps.ipynb](./4_2_variant_analysis_heatmaps.ipynb) |
| Protein-level variant analysis | [4_3_variant_analysis_protein_variant.ipynb](./4_3_variant_analysis_protein_variant.ipynb) |
| Heatmap analysis of gene expression | [5_1_gene_analysis_heatmaps.ipynb](./5_1_gene_analysis_heatmaps.ipynb) |
| Drug-target analysis for genes | [5_2_gene_analysis_drugs.ipynb](./5_2_gene_analysis_drugs.ipynb) |
| Pathway analysis using Enrichr | [6_1_pathway_analysis_enrichr.ipynb](./6_1_pathway_analysis_enrichr.ipynb) |
| Gene Ontology enrichment analysis (GOEA) | [6_2_pathway_analysis_goea.ipynb](./6_2_pathway_analysis_goea.ipynb) |

More examples of how to use varseek are in the GitHub repository for our preprint: [GitHub - pachterlab/RLSRWP_2025](https://github.com/pachterlab/RLSRWP_2025.git).

If you use `varseek` in a publication, please cite the following study:
```
PAPER CITATION
```
Read the article here: PAPER DOI
# Installation
```bash
pip install varseek
```
# 🪄 Quick start guide
## 1. Acquire a Reference
Follow one of the options below:
### a. Download a Pre-built Reference
- (optional) View all downloadable references: `vk ref --list_downloadable_references`
- `vk ref --download --variants VARIANTS --sequences SEQUENCES`
### b. Make a custom reference: screen for user-defined variants
- `vk ref --variants VARIANTS --sequences SEQUENCES ...`
### c. Customize the reference-building process
Use this route to customize VCRS filtering, e.g., to add extra information to filter on, add custom filtering logic, or tune filtering parameters based on the results of intermediate steps.
- `vk build --variants VARIANTS --sequences SEQUENCES ...`
- (optional) `vk info --input_dir INPUT_DIR ...`
- (optional) `vk filter --input_dir INPUT_DIR ...`
- `kb ref --workflow custom --index INDEX ...`
## 2. Screen for variants
Follow one of the options below:
### a. Standard workflow
- (optional) FASTQ quality control
- `vk count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...`
### b. Customize the variant screening process: additional FASTQ preprocessing, custom count matrix processing
- (optional) FASTQ quality control
- (optional) `vk fastqpp ... --fastqs FASTQ1 FASTQ2...`
- `kb count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...`
- (optional) `kb count --index REFERENCE_INDEX --t2g REFERENCE_T2G ... --fastqs FASTQ1 FASTQ2...`
- (optional) `vk clean --adata ADATA ...`
- (optional) `vk summarize --adata ADATA ...`
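
As a rough picture of the kind of per-sample summary a variant count matrix supports, this sketch lists which VCRSs reach a read threshold in each sample and counts how many samples each VCRS is detected in. The matrix layout (samples as rows, VCRSs as columns), the threshold, and the function names are assumptions for illustration; this is not the output format of `vk summarize`.

```python
def detected_per_sample(
    matrix: list[list[int]], sample_ids: list[str], vcrs_ids: list[str], min_count: int = 1
) -> dict[str, list[str]]:
    """Map each sample to the VCRSs with at least `min_count` reads."""
    return {
        sample: [v for v, c in zip(vcrs_ids, row) if c >= min_count]
        for sample, row in zip(sample_ids, matrix)
    }


def prevalence(matrix: list[list[int]], vcrs_ids: list[str], min_count: int = 1) -> dict[str, int]:
    """Number of samples in which each VCRS reaches `min_count` reads."""
    return {v: sum(row[j] >= min_count for row in matrix) for j, v in enumerate(vcrs_ids)}


matrix = [[1, 1], [2, 0]]  # rows: samples s1, s2; columns: VCRSs v1, v2
print(detected_per_sample(matrix, ["s1", "s2"], ["v1", "v2"]))  # {'s1': ['v1', 'v2'], 's2': ['v1']}
print(prevalence(matrix, ["v1", "v2"]))  # {'v1': 2, 'v2': 1}
```
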

**Examples for getting started:** [GitHub - pachterlab/varseek-examples](https://github.com/pachterlab/varseek-examples.git)

**Manuscript**: ...

**Repository for manuscript figures**: [GitHub - pachterlab/RLSRP_2025](https://github.com/pachterlab/RLSRP_2025.git)
Raw data
{
"_id": null,
"home_page": null,
"name": "varseek",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "Joseph Rich <josephrich98@gmail.com>",
"keywords": "varseek, bioinformatics, variant-analysis, k-mer, RNA-seq, DNA-seq",
"author": null,
"author_email": "Joseph Rich <josephrich98@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/7d/51/8aacda62660f88b7283865a76ac330574d64a2f83d6aeab3bc62d858248f/varseek-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 2-Clause License\n \n Copyright (c) 2024, Pachter Lab\n \n Redistribution and use in source and binary forms, with or without\n modification, are permitted provided that the following conditions are met:\n \n 1. Redistributions of source code must retain the above copyright notice, this\n list of conditions and the following disclaimer.\n \n 2. Redistributions in binary form must reproduce the above copyright notice,\n this list of conditions and the following disclaimer in the documentation\n and/or other materials provided with the distribution.\n \n THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\n FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\n OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n ",
"summary": "Efficient variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants.",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/pachterlab/varseek"
},
"split_keywords": [
"varseek",
" bioinformatics",
" variant-analysis",
" k-mer",
" rna-seq",
" dna-seq"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "963fda5133a35fc41ab16a7ae9e8f7dd6c4606cfdd71864a618e30a4cd050ec8",
"md5": "9aff5806f4f96e4b36be66c401e7c3a1",
"sha256": "85be51268e0fd2359d8adb9431f31dac41e47c83e44b1e9e10212820469c02d8"
},
"downloads": -1,
"filename": "varseek-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9aff5806f4f96e4b36be66c401e7c3a1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 234323,
"upload_time": "2025-08-12T08:17:58",
"upload_time_iso_8601": "2025-08-12T08:17:58.076228Z",
"url": "https://files.pythonhosted.org/packages/96/3f/da5133a35fc41ab16a7ae9e8f7dd6c4606cfdd71864a618e30a4cd050ec8/varseek-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7d518aacda62660f88b7283865a76ac330574d64a2f83d6aeab3bc62d858248f",
"md5": "79bbfa0214253557c805b2299e2aa686",
"sha256": "d6765f9fda272be7d1855106d5a52acf43500795b0002480d2b0f7600dbf011a"
},
"downloads": -1,
"filename": "varseek-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "79bbfa0214253557c805b2299e2aa686",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 237522,
"upload_time": "2025-08-12T08:17:59",
"upload_time_iso_8601": "2025-08-12T08:17:59.312817Z",
"url": "https://files.pythonhosted.org/packages/7d/51/8aacda62660f88b7283865a76ac330574d64a2f83d6aeab3bc62d858248f/varseek-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-12 08:17:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "pachterlab",
"github_project": "varseek",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.26.4"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "pandas",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.5.3"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.9.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.66.4"
]
]
},
{
"name": "anndata",
"specs": [
[
">=",
"0.8.0"
]
]
},
{
"name": "kb-python",
"specs": [
[
">=",
"0.29.3"
]
]
},
{
"name": "gget",
"specs": [
[
">=",
"0.29.1"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.13.1"
]
]
},
{
"name": "pyfastx",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "pysam",
"specs": [
[
">=",
"0.22.1"
]
]
},
{
"name": "pyarrow",
"specs": [
[
">=",
"19.0.1"
]
]
}
],
"lcname": "varseek"
}
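
The `urls` entries above record a SHA-256 digest for each distribution file. A minimal sketch for checking a downloaded file against those digests with Python's standard `hashlib` follows; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA-256 digests copied from the "urls" entries of the raw metadata above.
EXPECTED_SHA256 = {
    "varseek-0.1.1-py3-none-any.whl": "85be51268e0fd2359d8adb9431f31dac41e47c83e44b1e9e10212820469c02d8",
    "varseek-0.1.1.tar.gz": "d6765f9fda272be7d1855106d5a52acf43500795b0002480d2b0f7600dbf011a",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path) -> bool:
    """Compare a downloaded distribution file against its recorded digest."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no recorded digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `matches_published_digest(Path("varseek-0.1.1.tar.gz"))` returns `True` only for an unmodified sdist. pip can enforce the same check at install time through its hash-checking mode (a requirements line such as `varseek==0.1.1 --hash=sha256:<digest>`, installed with `pip install --require-hashes -r requirements.txt`), though that mode then requires every dependency to be pinned with a hash as well.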