PGAP2


+ Name: PGAP2
+ Version: 1.0.6
+ Summary: PGAP2: a comprehensive pan-genome analysis pipeline for prokaryotic genomes
+ Upload time: 2025-08-07 04:54:58
+ Requires Python: >=3.10
+ License: MIT License
+ Keywords: pan-genome, prokaryotic, genomes, pipeline, cluster

![image](https://github.com/bucongfan/PGAP2/blob/main/pgap2.logo.png)

## In Brief
PGAP2 (Pan-Genome Analysis Pipeline 2) is an ultra-fast and comprehensive toolkit for prokaryotic pan-genome analysis. Powered by a Fine-Grained Feature Network, PGAP2 can construct a pan-genome map from 1,000 genomes within 20 minutes while ensuring high accuracy. In addition, it offers a rich set of upstream quality control modules and downstream analysis tools to support common pan-genome analyses.

## Quick start
### Basic usage
The input directory contains all the genome and annotation files.

PGAP2 supports multiple input formats: GFF files in the same format as those output by Prokka, GFF files with their corresponding genome FASTA files in separate files, GenBank flat files (GBFF), or just genome FASTA files (with `--reannot` required).

Different formats of input files can be mixed in one input directory. PGAP2 will recognize and process them based on their prefixes and suffixes.

```bash
pgap2 main -i inputdir/ -o outputdir/
```
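
Before running `main` on a mixed directory, it can help to see which kinds of files it actually contains. The following is a minimal sketch, not part of PGAP2: the helper names `count_by_suffix` and `summarize_inputs` are hypothetical, and the suffix list (`gff`, `gbff`, `fna`, `fa`, `fasta`) is an assumption for illustration rather than the set of suffixes PGAP2 recognizes.

```bash
#!/usr/bin/env bash
# Count files directly inside a directory that end with a given suffix.
# NOTE: the suffixes used in this sketch are illustrative assumptions,
# not PGAP2's official list of recognized file endings.
count_by_suffix() {
    local dir="$1" suffix="$2"
    find "$dir" -maxdepth 1 -type f -name "*.${suffix}" | wc -l | tr -d ' '
}

# Print a tab-separated "suffix<TAB>count" line for each assumed suffix.
summarize_inputs() {
    local dir="$1" suffix
    for suffix in gff gbff fna fa fasta; do
        printf '%s\t%s\n' "$suffix" "$(count_by_suffix "$dir" "$suffix")"
    done
}

# Example call (uncomment and point at your own directory):
# summarize_inputs inputdir/
```

If the summary shows only genome FASTA files, the format description above implies that `--reannot` has to be added to the `pgap2 main` command.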

### Preprocessing
The preprocessing step runs quality checks and visualizes the input data. PGAP2 generates an interactive HTML file and corresponding vector figures to help users understand their input data. The input data and pre-alignment results are stored as a pickle file so that the same calculation step can be restarted quickly.

```bash
pgap2 prep -i inputdir/ -o outputdir/
```

### Postprocessing
The postprocessing module integrates various submodules, such as statistical analysis, single-copy tree building, population clustering, and Tajima's D test. Regardless of which submodule you want to use, you can always run it as follows:

```bash
pgap2 post [submodule] [options] -i inputdir/ -o outputdir/
```

Here, `inputdir` is the output directory of the `main` module.

PGAP2 also supports statistical analysis using a PAV file independently:

```bash
pgap2 post profile --pav your_pav_file -o outputdir/
```

## Installation
The best way to install the full version of the PGAP2 package is with conda:

```bash
conda create -n pgap2 -c bioconda pgap2
```

Alternatively, it is often faster to use the [mamba](https://github.com/mamba-org/mamba) solver (recommended):

```bash
conda create -n pgap2 mamba
conda activate pgap2
mamba install -c bioconda pgap2
```

If you only need a specific function, such as partitioning, and do not want to install all the extra software required by the full version of PGAP2, you can install just the PGAP2 package:

```bash
pip install pgap2
```

Or install from source to get the latest version:

```bash
git clone https://github.com/bucongfan/PGAP2
pip install -e PGAP2/
```
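
The package metadata declares `requires_python >=3.10`, so both pip-based installs need a recent interpreter. A minimal sketch of a pre-install check follows; the helper name `python_is_supported` is hypothetical, and the interpreter name `python3` is an assumption about your system.

```bash
#!/usr/bin/env bash
# Succeed only if the given Python interpreter exists and is >= 3.10.
# The default interpreter name "python3" is an assumption about the system.
python_is_supported() {
    local py="${1:-python3}"
    "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

if python_is_supported python3 2>/dev/null; then
    echo "python3 satisfies the >=3.10 requirement"
else
    echo "python3 is missing or older than 3.10" >&2
fi
```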

With a pip-based installation, you then need to install the extra software required by the specific functions you use yourself.

The dependencies of PGAP2 are listed below. PGAP2 looks for each of them on your `PATH` or in the `pgap2/dependencies` folder.

### Preprocessing
+ One of the following clustering tools
    - [cd-hit](https://github.com/weizhongli/cdhit)
    - [MMseqs2](https://github.com/soedinglab/MMseqs2)
+ One of the following alignment tools
    - [diamond](https://github.com/bbuchfink/diamond)
    - [blast+](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)

### Main
+ One of the following clustering tools
    - [cd-hit](https://github.com/weizhongli/cdhit)
    - [MMseqs2](https://github.com/soedinglab/MMseqs2)
+ [mcl](https://github.com/micans/mcl)
+ One of the following alignment tools
    - [diamond](https://github.com/bbuchfink/diamond)
    - [blast+](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
+ When using `--retrieve` to retrieve missing gene loci
    - [miniprot](https://github.com/lh3/miniprot)
    - [seqtk](https://github.com/lh3/seqtk)
+ When using `--reannot` to re-annotate your genome
    - [prodigal](https://github.com/hyattpd/Prodigal)

### Postprocessing
+ One of the following multiple sequence alignment (MSA) tools
    - [muscle](https://github.com/rcedgar/muscle)
    - [mafft](https://github.com/GSLBiotech/mafft)
    - [tcoffee](https://github.com/cbcrg/tcoffee)
+ [ClipKIT](https://github.com/JLSteenwyk/ClipKIT)
+ One of the following phylogenetic tree construction tools
    - [IQ-TREE](http://www.iqtree.org/)
    - [FastTree](https://morgannprice.github.io/fasttree/)
    - [RAxML-ng](https://github.com/amkozlov/raxml-ng)
+ [ClonalFrameML](https://github.com/xavierdidelot/ClonalFrameML)
+ [maskrc-svg](https://github.com/kwongj/maskrc-svg)
+ [fastbaps](https://github.com/gtonkinhill/fastbaps)
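
To see which of the tools listed above are still missing before a run, a small shell check can help. This is a minimal sketch and not PGAP2's own dependency check: the helper name `missing_tools` is hypothetical, the executable names are assumed to match each project's usual binary name, and only `PATH` is inspected (the `pgap2/dependencies` folder is not).

```bash
#!/usr/bin/env bash
# Print every given executable that cannot be found on PATH, one per line.
# Returns non-zero if at least one executable is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Executable names below are assumptions; adjust them to your installation
# and to the modules and options you actually plan to use.
missing_tools mcl diamond miniprot seqtk prodigal \
    || echo "The tools printed above were not found on PATH" >&2
```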



### Visualization in the Preprocessing and Postprocessing modules
PGAP2 calls `Rscript`, which must be available on your `PATH`. The following R packages must be installed:

+ ggpubr
+ ggrepel
+ dplyr
+ tidyr
+ patchwork
+ optparse



## Detailed documentation
For detailed documentation, please refer to the [wiki](https://github.com/bucongfan/PGAP2/wiki).




            
