|Field | Value |
|:----|:------|
|Name | searcHPV |
|Version | 1.1.0 |
|Home page | https://github.com/WenjinGudaisy/SearcHPV |
|Summary | An HPV integration sites detection tool for targeted capture sequencing data |
|Upload time | 2023-06-28 18:48:05 |
|Author | Wenjin Gu |
|Requires Python | >=3.5, <4 |
|Keywords | hpv, integration, targeted capture sequencing |

[![Documentation Status](https://readthedocs.org/projects/searchpv/badge/?version=stable)](https://searchpv.readthedocs.io/en/stable/?badge=stable)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/WenjinGudaisy/SearcHPV/blob/main/LICENSE)
[![PyPI version](https://badge.fury.io/py/searcHPV.svg)](https://badge.fury.io/py/searcHPV)
<br>

|Host | Downloads |
|:----|:---------:|
|PyPI | [![Downloads](https://pepy.tech/badge/searchpv)](https://pepy.tech/project/searchpv)

# SearcHPV
An HPV integration point detection tool for targeted capture sequencing data

## Introduction
* SearcHPV detects HPV fusion sites on both the human genome and the HPV genome.
* SearcHPV provides locally assembled contigs for each integration event. It reports at least one and at most two contigs for each integration site. The two contigs capture the left and right sides of the event.

## Getting started
1. Required resources
* Unix-like environment


2. Download and install
First, download and install the required resources.
    1) Download Anaconda >=4.11.0: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html#install-linux-silent

    2) Download the "environment.yaml" file from this repository

    3) Create a conda environment for SearcHPV:
        ```
        conda env create -f [your_path]/environment.yaml

        ```
        This command automatically sets up all the third-party tools and packages required by SearcHPV and installs the latest version of SearcHPV. The environment is named "searcHPV".

        You can check the packages and tools in this environment by:

        ```
        conda list -n searcHPV

        ```

        You can update the environment by:
        ```
        conda env update -f [your_path]/environment.yaml

        ```
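The create and update commands above can be folded into one helper that picks the right subcommand. This is a minimal sketch, assuming the environment is named "searcHPV" as stated above. The function names `env_exists` and `conda_env_action` are hypothetical, and the environment listing is read from standard input so the logic can be checked without conda installed.

```shell
# Succeed if an environment with the given name appears in `conda env list`
# output read from stdin. The first column holds the environment name;
# header lines start with '#', so they never match.
env_exists() (
    name="$1"
    awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'
)

# Print the `conda env` subcommand to use for the named environment:
# "update" if it already exists in the listing on stdin, otherwise "create".
conda_env_action() (
    if env_exists "$1"; then
        echo update
    else
        echo create
    fi
)
```

A possible use is `action=$(conda env list | conda_env_action searcHPV)` followed by `conda env "$action" -f [your_path]/environment.yaml`.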



3. Usage

SearcHPV has four main steps. You can either run it start-to-finish or run it step-by-step.

* Before running SearcHPV, activate the conda environment:

```
conda activate searcHPV

```

If you are running commands in a bash script, start with:

```
#!/bin/bash
source ~/anaconda3/etc/profile.d/conda.sh;
conda activate searcHPV; 
#[searcHPV commands...]
```
Note: Adjust the path to "conda.sh" if Anaconda is not installed in your home directory.
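
When Anaconda lives somewhere other than `~/anaconda3`, a script can search for "conda.sh" instead of hard-coding the path. The following is a minimal sketch: `find_conda_sh` is a hypothetical helper, and the candidate directories in the usage note are assumptions, not locations that SearcHPV requires.

```shell
# Print the path of the first conda.sh found under the given installation
# directories. Returns non-zero if no candidate contains
# etc/profile.d/conda.sh.
find_conda_sh() (
    for base in "$@"; do
        candidate="$base/etc/profile.d/conda.sh"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    exit 1
)
```

For example, `conda_sh=$(find_conda_sh "$HOME/anaconda3" "$HOME/miniconda3" /opt/conda)` followed by `source "$conda_sh"` and `conda activate searcHPV`. If `conda` is already on the PATH, `conda info --base` prints the installation directory and can be passed as the first candidate.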

* Usage of searcHPV:

```
searcHPV <options> ...
```
* Standard options:
```
 -fastq1 <str>  sequencing data: fastq/fq.gz file
 -fastq2 <str>  sequencing data: fastq/fq.gz file
 -humRef <str>  human reference genome: fasta file
 -virRef <str>  HPV reference genome: fasta file
```
* Optional options:
```
-h, --help      show this help message and exit
-window <int>   the length of the region searched for informative reads, default=300
-output <str>   output directory, default "./"
-alignment      run the alignment step, step1
-genomeFusion   call the genome fusion points, step2
-assemble       local assembly for each integration event, step3
-hpvFusion      call the HPV fusion points, step4
-clusterWindow <int>  the window length for clustering integration sites, default=100
-gz             if fastq files are in gz format
-poly(dn) N     poly(n), n*d(A/T/C/G), will report low confidence if contig contains poly(n), default=20
-index          index the original human and virus reference files, default=False
```

Note: If you have already indexed the virus and human reference files for BWA, Samtools, and Picard, you do not need the "-index" option. Omitting it is especially useful when you run a batch of samples that share the same reference files, since the references are then not re-indexed for every sample. The commands for indexing the virus and human reference files are:

```
# Activate the SearcHPV conda environment first so that the correct tool versions are used
ref='[path_of_your_reference_file]'
bwa index "$ref"
samtools faidx "$ref"
# Write the sequence dictionary next to the reference (assumes the file name ends in ".fa")
picard CreateSequenceDictionary R="$ref" O="${ref%.fa}.dict"
```
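
For batch runs it can help to decide automatically whether "-index" is needed. The sketch below checks for the files that the three indexing commands normally write next to the reference: the BWA index files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`), the Samtools `.fai` file, and the Picard `.dict` file. `reference_is_indexed` is a hypothetical helper, and the `.dict` name assumes the reference file ends in `.fa`, as in the commands above. Whether SearcHPV itself looks for exactly these files is not confirmed here.

```shell
# Succeed only if every expected index file for the given FASTA reference
# is present next to it.
reference_is_indexed() (
    ref="$1"
    # BWA index files and the Samtools FASTA index
    for suffix in amb ann bwt pac sa fai; do
        [ -f "$ref.$suffix" ] || exit 1
    done
    # Sequence dictionary written by Picard (reference name with .fa -> .dict)
    [ -f "${ref%.fa}.dict" ]
)
```

One way to use it is to start with `index_flag=""`, set `index_flag="-index"` when `reference_is_indexed hs37d5.fa` or `reference_is_indexed HPV.fa` fails, and then append `$index_flag` to the searcHPV command line.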


4. Examples:

    1) Run it start-to-finish by submitting an SBATCH job:
        ```
        #!/bin/bash
        #SBATCH --job-name=searcHPV
        #SBATCH --mail-user=wenjingu@umich.edu
        #SBATCH --mail-type=BEGIN,END
        #SBATCH --cpus-per-task=1
        #SBATCH --nodes=1
        #SBATCH --ntasks-per-node=8
        #SBATCH --mem=40gb
        #SBATCH --time=100:00:00
        #SBATCH --account=XXXXX
        #SBATCH --partition=standard
        #SBATCH --output=searcHPV.log
        #SBATCH --error=searcHPV.err
        source ~/anaconda3/etc/profile.d/conda.sh;
        conda activate searcHPV;      
        searcHPV -fastq1 Sample_81279.R1.fastq.gz -fastq2 Sample_81279.R2.fastq.gz -humRef hs37d5.fa -virRef HPV.fa -output /home/scratch/HPV_fusion/Sample_81279 -gz -index;
        ```


    2) Run it step-by-step:


        ```
        searcHPV -alignment -fastq1 Sample_81279.R1.fastq.gz -fastq2 Sample_81279.R2.fastq.gz -humRef hs37d5.fa -virRef HPV.fa -output /home/scratch/HPV_fusion/Sample_81279 -gz -index
        searcHPV -genomeFusion -fastq1 Sample_81279.R1.fastq.gz -fastq2 Sample_81279.R2.fastq.gz -humRef hs37d5.fa -virRef HPV.fa -output /home/scratch/HPV_fusion/Sample_81279 -gz
        searcHPV -assemble -fastq1 Sample_81279.R1.fastq.gz -fastq2 Sample_81279.R2.fastq.gz -humRef hs37d5.fa -virRef HPV.fa -output /home/scratch/HPV_fusion/Sample_81279 -gz
        searcHPV -hpvFusion -fastq1 Sample_81279.R1.fastq.gz -fastq2 Sample_81279.R2.fastq.gz -humRef hs37d5.fa -virRef HPV.fa -output /home/scratch/HPV_fusion/Sample_81279 -gz

        ```
        Note: When running step-by-step, make sure the output directory is the same for all steps.
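
The four step-by-step commands differ only in the step flag, so they can be generated in a loop. This is a minimal sketch, assuming the file naming of the example above (`<sample>.R1.fastq.gz` and `<sample>.R2.fastq.gz`) and paths without spaces. `searchpv_step_commands` is a hypothetical helper that only prints the commands; it adds "-index" to the alignment step only, as in the example.

```shell
# Print the four SearcHPV step commands for one sample, in pipeline order.
# Arguments: sample name, human reference, HPV reference, output directory.
# The same output directory is used for every step.
searchpv_step_commands() (
    sample="$1"; hum_ref="$2"; vir_ref="$3"; out_dir="$4"
    for step in alignment genomeFusion assemble hpvFusion; do
        extra=""
        # Reference indexing is requested only in the first step
        if [ "$step" = "alignment" ]; then
            extra=" -index"
        fi
        printf 'searcHPV -%s -fastq1 %s.R1.fastq.gz -fastq2 %s.R2.fastq.gz -humRef %s -virRef %s -output %s -gz%s\n' \
            "$step" "$sample" "$sample" "$hum_ref" "$vir_ref" "$out_dir" "$extra"
    done
)
```

Redirecting the output to a file, for example `searchpv_step_commands Sample_81279 hs37d5.fa HPV.fa /home/scratch/HPV_fusion/Sample_81279 > run_steps.sh`, and running it with `bash -e run_steps.sh` stops at the first step that fails. Calling the helper once per sample name extends this to a batch that shares the same references.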

## Output
1. Alignment: the duplicate-marked alignment BAM file and the customized reference genome.
2. Genome Fusion Point Calling: original callset, filtered callset, filtered clustered callset.
3. Assemble: supporting reads and contigs for each integration event (unfiltered).
4. HPV Fusion Point Calling: alignment BAM file for contigs against the human and HPV genomes.

Final outputs are under the folder "call_fusion_virus":
* Summary of all the integration events: "HPVfusionPointContig.txt"
* Contig sequences for all the integration events: "ContigsSequence.fa"
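
After a run, a quick sanity check is to confirm that both final files exist and to count the assembled contigs. This is a minimal sketch, assuming the "call_fusion_virus" folder sits directly inside the directory passed to "-output". `count_contigs` is a hypothetical helper, and it relies only on the FASTA convention that each record starts with a `>` header line; the layout of "HPVfusionPointContig.txt" is not parsed.

```shell
# Print the number of contigs in the final FASTA file of a SearcHPV output
# directory. Fails without printing if either final output file is missing.
count_contigs() (
    final_dir="$1/call_fusion_virus"
    [ -f "$final_dir/HPVfusionPointContig.txt" ] || exit 1
    [ -f "$final_dir/ContigsSequence.fa" ] || exit 1
    # Count FASTA header lines; "n + 0" prints 0 for a file with no records
    awk '/^>/ { n++ } END { print n + 0 }' "$final_dir/ContigsSequence.fa"
)
```

For the example above this would be called as `count_contigs /home/scratch/HPV_fusion/Sample_81279`.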

## Citation
SearcHPV: a novel approach to identify and assemble human papillomavirus-host genomic integration events in cancer. Accepted by Cancer.

## Contact
wenjingu@umich.edu





            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/WenjinGudaisy/SearcHPV",
    "name": "searcHPV",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.5, <4",
    "maintainer_email": "",
    "keywords": "HPV,integration,targeted capture sequencing",
    "author": "Wenjin Gu",
    "author_email": "wenjingu@umich.edu",
    "download_url": "https://files.pythonhosted.org/packages/0a/87/892c1cdb3efc2a241c57573efde865bd8504c2fc1106faa59f3a7b8738d9/searcHPV-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "An HPV integration sites detection tool for targeted capture sequencing data",
    "version": "1.1.0",
    "project_urls": {
        "Homepage": "https://github.com/WenjinGudaisy/SearcHPV"
    },
    "split_keywords": [
        "hpv",
        "integration",
        "targeted capture sequencing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f9ed2ebfcc26d962a6268bc60166e62e4dee464c045f56d13140cf305677ea95",
                "md5": "42166e3493ff53f77cf43ea7b65aa275",
                "sha256": "83f87a472ded2026e4a5a35a470a5f7ec00d7b2acea448933794a5e3a3f694d0"
            },
            "downloads": -1,
            "filename": "searcHPV-1.1.0-py3.8.egg",
            "has_sig": false,
            "md5_digest": "42166e3493ff53f77cf43ea7b65aa275",
            "packagetype": "bdist_egg",
            "python_version": "1.1.0",
            "requires_python": ">=3.5, <4",
            "size": 44365,
            "upload_time": "2023-06-28T18:48:04",
            "upload_time_iso_8601": "2023-06-28T18:48:04.150501Z",
            "url": "https://files.pythonhosted.org/packages/f9/ed/2ebfcc26d962a6268bc60166e62e4dee464c045f56d13140cf305677ea95/searcHPV-1.1.0-py3.8.egg",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "891c74ea0299769f76223d0145ccf4279040caa654f92a83a42ba111a33ecc93",
                "md5": "b781e6e54d509783406d52a734009abb",
                "sha256": "c87fdd29baed5b8883f7c3c983a95fbe34434bedb8c31ad406d62b03e0d0e2e6"
            },
            "downloads": -1,
            "filename": "searcHPV-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b781e6e54d509783406d52a734009abb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.5, <4",
            "size": 24440,
            "upload_time": "2023-06-28T18:48:02",
            "upload_time_iso_8601": "2023-06-28T18:48:02.678849Z",
            "url": "https://files.pythonhosted.org/packages/89/1c/74ea0299769f76223d0145ccf4279040caa654f92a83a42ba111a33ecc93/searcHPV-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0a87892c1cdb3efc2a241c57573efde865bd8504c2fc1106faa59f3a7b8738d9",
                "md5": "a1f4bf98123762de2bf2df16e8cd7a40",
                "sha256": "67a0c96d8031d1693e4b55b2a3d62f91498c02b071ab90f678414e28db0ba2e6"
            },
            "downloads": -1,
            "filename": "searcHPV-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a1f4bf98123762de2bf2df16e8cd7a40",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.5, <4",
            "size": 24188,
            "upload_time": "2023-06-28T18:48:05",
            "upload_time_iso_8601": "2023-06-28T18:48:05.376771Z",
            "url": "https://files.pythonhosted.org/packages/0a/87/892c1cdb3efc2a241c57573efde865bd8504c2fc1106faa59f3a7b8738d9/searcHPV-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-28 18:48:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "WenjinGudaisy",
    "github_project": "SearcHPV",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "searchpv"
}
        