# HiScanner (HIgh-resolution Single-Cell Allelic copy Number callER)
HiScanner is a Python package for high-resolution single-cell copy number analysis. It supports two modes of operation:
1. **Standard Pipeline** (RDR + BAF): Full analysis using both read depth ratios (RDR) and B-allele frequencies (BAF)
2. **RDR-only Pipeline**: Simplified analysis using only read depth ratios
## Table of Contents
- [Installation](#installation)
- [Required External Files](#required-external-files)
- [Pipeline Options](#pipeline-options)
- [Running the Pipeline](#running-the-pipeline)
- [Output Structure](#output-structure)
- [Cleaning Up](#cleaning-up)
- [Troubleshooting](#troubleshooting)
- [Support](#support)
- [Citation](#citation)
## Installation
```bash
# Create new conda environment with all dependencies
conda create -n hiscanner_test python=3.8
conda activate hiscanner_test
pip install hiscanner --no-cache-dir
```
Install R and required packages:
```bash
conda install -c conda-forge r-base
# Quote the version spec so the shell does not treat ">" as a redirect
conda install -c bioconda "r-mgcv>=1.8"
```
Install other dependencies:
```bash
conda install -c bioconda snakemake samtools bcftools
```
We tested with snakemake==7.32.4, samtools==1.15.1, bcftools==1.13.
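
Before running the pipeline, it can help to confirm that the external tools are visible on `PATH`. The following is a minimal sketch and not part of HiScanner: the helper name `missing_tools` is hypothetical, and treating `Rscript` as the R executable to look for is an assumption.

```python
import shutil

# Executables installed in the steps above; `Rscript` is assumed to be the R entry point.
REQUIRED_TOOLS = ["snakemake", "samtools", "bcftools", "Rscript"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that the executables exist; it does not compare versions against the ones listed above.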
## Required External Files
### 1) Mappability Track
<details>
<summary>hg19/GRCh37</summary>
```bash
# 100mer:
wget https://www.math.pku.edu.cn/teachers/xirb/downloads/software/BICseq2/Mappability/hg19CRG.100bp.tar.gz --no-check-certificate
tar -xvzf hg19CRG.100bp.tar.gz
```
</details>
<details>
<summary>hg38/GRCh38 (150bp)</summary>
150bp track: coming soon.
</details>
<details>
<summary>Other reference genomes: instructions on custom track generation</summary>
For other genomes/read length configurations, follow instructions at [CGAP Annotations](https://cgap-annotations.readthedocs.io/en/latest/bic-seq2_mappability.html) to generate mappability tracks.
</details>
<details>
<summary>IMPORTANT: Guidelines when choosing mappability tracks</summary>
- **Shorter mappability track (e.g., 100bp) with longer reads (e.g., 150bp)**: Valid but conservative (some uniquely mappable regions may be missed)
- **Longer mappability track (e.g., 150bp) with shorter reads (e.g., 100bp)**: Not valid, will cause false positives
</details>
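
The guideline above reduces to a single comparison, sketched here. The function name is hypothetical, and treating equal track and read lengths as valid is an assumption (it is the exact-match case the guideline implies but does not state).

```python
def mappability_track_is_valid(track_length_bp: int, read_length_bp: int) -> bool:
    """A track is usable when its k-mer length does not exceed the read length."""
    if track_length_bp <= 0 or read_length_bp <= 0:
        raise ValueError("lengths must be positive")
    return track_length_bp <= read_length_bp


print(mappability_track_is_valid(100, 150))  # 100bp track with 150bp reads -> True
print(mappability_track_is_valid(150, 100))  # 150bp track with 100bp reads -> False
```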
### 2) Reference Genome
HiScanner requires the reference genome FASTA to be split by chromosome to allow parallel processing. The following commands index the reference and write one FASTA file per chromosome:
```bash
samtools faidx /path/to/reference.fasta
mkdir -p /path/to/reference/split
# Write each chromosome to its own FASTA file
awk '{print $1}' /path/to/reference.fasta.fai | while read -r chrom; do
    samtools faidx /path/to/reference.fasta "$chrom" > "/path/to/reference/split/${chrom}.fasta"
done
```
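
If `samtools` is not available at this point, the same split can be sketched in plain Python. This is an illustrative alternative, not something HiScanner provides: `split_fasta` is a hypothetical helper, and it copies each header line unchanged, whereas `samtools faidx` writes only the sequence name in the header.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each sequence in `fasta_path` to `<out_dir>/<name>.fasta`.

    The sequence name is the first whitespace-delimited token of the header
    line, the same name that `samtools faidx` stores in column 1 of the index.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A new record starts: close the previous file, open the next.
                    if out is not None:
                        out.close()
                    name = line[1:].split()[0]
                    target = out_dir / f"{name}.fasta"
                    out = open(target, "w")
                    written.append(target)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_fasta("/path/to/reference.fasta", "/path/to/reference/split")` returns the list of files it wrote.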
## Pipeline Options
HiScanner offers two pipeline options:
### 1. Standard Pipeline (RDR + BAF)
Uses both read depth ratios and B-allele frequencies for comprehensive CNV analysis.
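
As a rough illustration of the two signals, the sketch below computes a read depth ratio for one bin and a B-allele frequency for one heterozygous SNP using generic textbook definitions. It is not HiScanner's implementation; the function names and the idea of a single per-bin expected depth are assumptions made for the example.

```python
def read_depth_ratio(observed_reads: float, expected_reads: float) -> float:
    """Observed bin depth divided by the depth expected for that bin."""
    if expected_reads <= 0:
        raise ValueError("expected_reads must be positive")
    return observed_reads / expected_reads


def b_allele_frequency(a_reads: int, b_reads: int) -> float:
    """Fraction of reads at a heterozygous SNP that support the B allele."""
    total = a_reads + b_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return b_reads / total


# A balanced two-copy region: depth matches expectation, alleles are even.
print(read_depth_ratio(200, 200), b_allele_frequency(10, 10))
# A one-copy gain of the B haplotype (3 copies, 2 of them B).
print(read_depth_ratio(300, 200), b_allele_frequency(10, 20))
```

Read depth alone cannot tell which haplotype was gained or lost; the allele fraction adds that information, which is why the standard pipeline uses both.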
Steps:
1. SNP Calling (via SCAN2)
2. Phasing & BAF Computation
3. ADO Analysis
4. Segmentation
5. CNV Calling
Requirements:
- SCAN2 output files
- Bulk and single cell BAM files
- Reference genome
- Mappability tracks
### 2. RDR-only Pipeline
Uses only read depth ratios for simplified CNV analysis.
Steps:
1. Segmentation
2. CNV Calling
Requirements:
- Single cell BAM files
- Reference genome
- Mappability tracks
## Running the Pipeline
### Option 1: Standard Pipeline (RDR + BAF)
#### Step 1: SCAN2 (Prerequisites)
[SCAN2](https://github.com/parklab/SCAN2) needs to be run separately before using HiScanner.
> **Note**: If you only need SCAN2 output for HiScanner (and are not interested in SNV calls), you can save time by running SCAN2 with:
> ```bash
> scan2 run --joblimit 5000 --snakemake-args ' --until shapeit/phased_hets.vcf.gz --latency-wait 120'
> ```
> This will stop at the phasing step, which is sufficient for HiScanner's requirements.
If you have already run SCAN2, ensure you have:
- VCF file with raw variants (`gatk/hc_raw.mmq60.vcf.gz`)
- Phased heterozygous variants (`shapeit/phased_hets.vcf`)
- The phased genotype field in `phased_hets.vcf` must be named `phasedgt`. This is what the SCAN2 version we tested with produces. If your VCF uses a different field name, rename it to `phasedgt` manually.
The expected location is `scan2_out/` in your project directory.
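
A quick pre-flight check for these files might look like the sketch below. It is not part of HiScanner: both helper names are hypothetical, and the `phasedgt` check is a crude text search over an uncompressed VCF rather than a proper VCF parse.

```python
from pathlib import Path

# Relative paths named in this README.
EXPECTED_SCAN2_FILES = [
    "gatk/hc_raw.mmq60.vcf.gz",
    "shapeit/phased_hets.vcf",
]


def missing_scan2_files(scan2_dir="scan2_out"):
    """Return the expected SCAN2 files that are absent under `scan2_dir`."""
    root = Path(scan2_dir)
    return [rel for rel in EXPECTED_SCAN2_FILES if not (root / rel).is_file()]


def mentions_phasedgt(vcf_path):
    """Return True if any line of an uncompressed VCF contains `phasedgt`."""
    with open(vcf_path) as vcf:
        return any("phasedgt" in line for line in vcf)
```

An empty list from `missing_scan2_files()` means both files were found under `scan2_out/`.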
#### Steps 2-5: HiScanner Pipeline
<details>
<summary>1. Initialize HiScanner project</summary>
```bash
hiscanner init --output ./my_project
cd my_project
```
</details>
<details>
<summary>2. Edit config.yaml</summary>
Edit `config.yaml` to set your paths and parameters.
</details>
<details>
<summary>3. Prepare metadata file</summary>
Must contain the following columns:
```
bamID bam singlecell
bulk1 /path/to/bulk.bam N
cell1 /path/to/cell1.bam Y
cell2 /path/to/cell2.bam Y
```
</details>
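
The metadata layout can be sanity-checked before running `validate`. The sketch below is illustrative only and is not HiScanner's own validation: `check_metadata` is a hypothetical helper, splitting fields on whitespace is an assumption about the delimiter (so paths containing spaces would not work), and rejecting duplicate `bamID` values is also an assumption.

```python
REQUIRED_COLUMNS = ["bamID", "bam", "singlecell"]


def check_metadata(lines, require_bulk=True):
    """Return a list of problems found in the metadata rows.

    `lines` is an iterable of text lines; fields are split on whitespace,
    which is an assumption about the delimiter.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows or rows[0] != REQUIRED_COLUMNS:
        return [f"header must be: {' '.join(REQUIRED_COLUMNS)}"]
    problems = []
    seen_ids = set()
    flags = []
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            problems.append(f"row {number}: expected 3 fields, found {len(row)}")
            continue
        bam_id, _bam_path, flag = row
        if bam_id in seen_ids:
            problems.append(f"row {number}: duplicate bamID {bam_id}")
        seen_ids.add(bam_id)
        if flag not in ("Y", "N"):
            problems.append(f"row {number}: singlecell must be Y or N")
        flags.append(flag)
    if "Y" not in flags:
        problems.append("no single-cell sample (singlecell = Y)")
    if require_bulk and "N" not in flags:
        problems.append("no bulk sample (singlecell = N)")
    return problems
```

Pass an open file handle as `lines`; an empty list means no problems were found. For the RDR-only pipeline, where bulk samples are not required, call it with `require_bulk=False`.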
<details>
<summary>4. Validate configuration</summary>
```bash
hiscanner --config config.yaml validate
```
</details>
<details>
<summary>5. Run the pipeline</summary>
```bash
hiscanner --config config.yaml run --step snp # Check SCAN2 results
hiscanner --config config.yaml run --step phase # Process SCAN2 results
hiscanner --config config.yaml run --step ado # ADO analysis to identify optimal bin size
hiscanner --config config.yaml run --step normalize # Normalize read depth ratios
hiscanner --config config.yaml run --step segment # Segmentation
hiscanner --config config.yaml run --step cnv # CNV calling
# Or run all steps at once:
hiscanner --config config.yaml run --step all
```
</details>
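
When scripting the pipeline, the individual steps can be driven from Python so that a failure stops the run at the step that broke. This is a minimal sketch built only from the commands shown above; the helper names `build_commands` and `run_steps` are hypothetical.

```python
import subprocess

# Step names as used with `--step` in this README, in execution order.
STEPS = ["snp", "phase", "ado", "normalize", "segment", "cnv"]


def build_commands(config="config.yaml", steps=STEPS):
    """Return one `hiscanner ... run --step <name>` argument list per step."""
    return [["hiscanner", "--config", config, "run", "--step", step] for step in steps]


def run_steps(config="config.yaml", steps=STEPS):
    """Run the steps in order; `check=True` raises on the first failure."""
    for command in build_commands(config, steps):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

For the RDR-only pipeline, the same helper can be called with `steps=["normalize", "segment", "cnv"]`.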
The `normalize` step is the most time-consuming. It can be run on a cluster with the `--use-cluster` option:
```bash
hiscanner --config config.yaml run --step normalize --use-cluster
```
### Option 2: RDR-only Pipeline
<details>
<summary>1. Initialize project</summary>
```bash
hiscanner init --output ./my_project
cd my_project
```
</details>
<details>
<summary>2. Edit config.yaml</summary>
- Set `rdr_only: true`
- Configure paths and parameters
</details>
<details>
<summary>3. Prepare metadata file</summary>
Must contain the following columns (bulk samples are not required):
```
bamID bam singlecell
cell1 /path/to/cell1.bam Y
cell2 /path/to/cell2.bam Y
```
</details>
<details>
<summary>4. Validate configuration</summary>
```bash
hiscanner --config config.yaml validate
```
</details>
<details>
<summary>5. Run the pipeline</summary>
```bash
hiscanner --config config.yaml run --step normalize # Normalize read depth ratios
hiscanner --config config.yaml run --step segment # Segmentation
hiscanner --config config.yaml run --step cnv # CNV calling
```
</details>
## Output Structure
```
hiscanner_output/
├── phased_hets/ # Processed heterozygous SNPs (Standard pipeline only)
├── ado/ # ADO analysis results (Standard pipeline only)
├── bins/ # Binned read depth
├── segs/ # Segmentation results
└── final_calls/ # Final CNV calls
```
## Cleaning Up
HiScanner creates several temporary directories during analysis. Remove them with the `clean` command:
```bash
hiscanner --config config.yaml clean
```
## Troubleshooting
Common issues:
1. Missing SCAN2 results: Ensure the scan2_output directory is correctly specified.
2. File permissions: Check access to BAM files and reference data.
3. Memory issues: Adjust `batch_size` in `config.yaml`.
For more detailed information, check the log files in `hiscanner_output/logs/`.
## Support
HiScanner is under active development. For support or questions, please open an issue on our [GitHub repository](https://github.com/parklab/hiscanner).
## Citation
If you use HiScanner in your research, please cite:
Zhao, Y., Luquette, L. J., Veit, A. D., Wang, X., Xi, R., Viswanadham, V. V., ... & Park, P. J. (2024). High-resolution detection of copy number alterations in single cells with HiScanner. bioRxiv, 2024-04.