gfanno

Name: gfanno
Version: 1.4
Home page: https://github.com/qunjie-zhang/gfanno
Summary: Gene Family Annotation
Upload time: 2024-03-26 12:43:15
Maintainer: None
Docs URL: None
Author: ['wangzt', 'duliuxu']
Requires Python: >=3.5
License: None
Keywords: gene family, bioinformatics, pipline
Requirements: No requirements were recorded.

# Gfanno
Gene Family Annotation Pipeline
```
   ______  ________    _     Gene Family Annotation Workflow                           
 .' ___  ||_   __  |  / \     Bioinformatics Lab of SCAU.            
/ .'   \_|  | |_ \_| / _ \     _ .--.   _ .--.   .--.   
| |   ____  |  _|   / ___ \   [ `.-. | [ `.-. |/ .'`\ \ 
\ `.___]  |_| |_  _/ /   \ \_  | | | |  | | | || \__. | 
 `._____.'|_____||____| |____|[___||__][___||__]'.__.'  
```
## Introduction
gfanno is a workflow for identifying candidate genes that belong to a gene family.
Candidates are filtered against parameter thresholds defined in a configuration file.
As an example, we demonstrate the identification of the SCPLⅠA gene in Tieguanyin.


## Install
### Preparation before use
Make sure that BLAST and HMMER are installed correctly before using gfanno.
You can run the following two commands to check the installation.

```shell
# Test blastp
blastp -version
# Test hmmsearch
hmmsearch -h
```
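
The same check can be scripted. The sketch below is not part of gfanno; it only uses Python's standard `shutil.which` to report whether the `blastp` and `hmmsearch` executables can be found on `PATH`. The helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables that this README asks you to install beforehand.
REQUIRED_TOOLS = ("blastp", "hmmsearch")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("blastp and hmmsearch are both available.")
```

Finding an executable on `PATH` does not prove that it runs correctly, so the two shell commands above remain the more direct test.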
### Environment
Python >= 3.5

### Install Gfanno
There are several ways to install gfanno. Choose one of the following methods.
- Install from PyPI
```shell
pip install gfanno
```
- Install from source
```shell
git clone https://github.com/qunjie-zhang/gfanno.git
cd gfanno
python setup.py install
```

## Quick Start
```shell
# Generate basic configuration file.
gfanno -g
# Release sample data, including HMM models and seed files.
gfanno --data
# Pass the path of the FASTA file you want to annotate after the -f parameter.
# This example uses a seed file as input. In real use, pass your own FASTA file here instead of a seed file.
gfanno -f test.fasta
```

## Usage
### Get Help Information
gfanno is run as a terminal command. To list all parameters, run `gfanno -h`.
```shell

Program:    gfanno (Gene Family Annoation Workflow)
Version:    1.4

    Useage: gfanno  <command> [options]

    Commands:
        -f / --fasta        Input fasta file path. This option is required.
        -- / --deredundant  To de-redundant the annotation results of different subfamilies.
                            Enter at least two gfanno output stat files path,Suppy ort - o parameter to specifoutput name.
        -o / --output       Output file path.
        -c / --config       Use the specified configuration file. This parameter is optional. 
                            If you do not set this parameter, the program will use 'gfanno_config.ini' by default.
        -t / --target       Specifies the parameter category used in the configuration file. This option is required.
        -g / --generate     Generate the default configuration file (gfanno_config.ini) under the current path.
                            Used to initialize the software operating environment or reset damaged configuration files.
        -- / --data         Release built-in data sets in the current directory.
        -h / --help         Display this help message.
        -v / --version      Detailed version information.
``` 
* `-f` The path of the input FASTA file. This parameter is required.
* `--deredundant` Removes redundancy among the annotation results of different subfamilies that share the same domain within a gene family or superfamily, so that each gene ends up annotated in the most appropriate subfamily and incorrect annotations are avoided. Pass the paths of at least two gfanno output stat files; `-o` can be used to name the output.
* `-o` The output path. If the path does not exist, the program creates it automatically. The default is `output`.
* `-c` The path of the configuration file. By default, the program uses `gfanno_config.ini` in the current directory, so you can omit this parameter in that case. Use it when your configuration file has a different name or location.
* `-t` An optional parameter that selects which parameter schemes to run. A configuration file can define multiple schemes; pass one or more scheme names separated by commas. If this parameter is not specified, all schemes in the configuration file are used.
* `-g` Generates the default configuration file, `gfanno_config.ini`, in the current directory. Use it to create a fresh configuration file to modify when the template is missing or the file is corrupt.
* `--data` Extracts the built-in default dataset, which includes seed files and HMM models, into the current directory.
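
When gfanno is driven from a script, the options above can be assembled programmatically. The following is a minimal sketch, assuming that `-f`, `-c`, `-t`, and `-o` can be combined in a single call as the help text suggests. The helper names `build_gfanno_command` and `run_gfanno` are hypothetical and not part of gfanno.

```python
import subprocess


def build_gfanno_command(fasta, config=None, targets=None, output=None):
    """Assemble a gfanno argument list from the options documented above."""
    command = ["gfanno", "-f", fasta]
    if config is not None:
        command += ["-c", config]
    if targets:
        # -t takes one or more scheme names separated by commas.
        command += ["-t", ",".join(targets)]
    if output is not None:
        command += ["-o", output]
    return command


def run_gfanno(*args, **kwargs):
    """Run gfanno; subprocess raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_gfanno_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without gfanno installed.
    print(" ".join(build_gfanno_command("test.fasta", targets=["PPO"], output="output")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when file paths contain spaces.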


### Configuration file
Before starting, prepare a configuration file; gfanno reads its runtime parameters from it.
You can run `gfanno -g` to create the default configuration file, `gfanno_config.ini`, in the current directory.
If needed, you can keep several configuration files. Be careful not to overwrite a file accidentally and lose its content.

You can keep several configuration files and choose one with the `-c` parameter, or maintain one configuration file with several schemes and use the `-t` parameter to specify which schemes to run. Note that gfanno validates the configuration: when it starts, it first checks that the configuration in use is valid. Entries that do not meet the requirements are flagged and the program terminates.
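
The scheme selection described for `-t` can be pictured with the sketch below. It is an illustration only, not gfanno's own code: it assumes each scheme is one INI section readable by Python's standard `configparser`, and the second section name `SCHEME_B` and the helper `select_schemes` are hypothetical.

```python
import configparser

# Two schemes; SCHEME_B is a made-up name used only for illustration.
EXAMPLE_CONFIG = """
[PPO]
blastp_identity = 50

[SCHEME_B]
blastp_identity = 60
"""


def select_schemes(config, target=None):
    """Return the section names to run: all of them, or only the named ones."""
    available = config.sections()
    if target is None:
        # No -t value given: fall back to every scheme in the file.
        return available
    requested = [name.strip() for name in target.split(",") if name.strip()]
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError("Unknown scheme(s): " + ", ".join(unknown))
    return requested


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
print(select_schemes(config))
print(select_schemes(config, "PPO"))
```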

The following example scheme is taken from the default configuration file:
```ini
# PPO
[PPO]
blastp_seed = seed/PPO.seed.fasta
hmm = hmm/PPO1_DWL.hmm,hmm/PPO1_KFDV.hmm
hmm_coverage = 70,70
domain = PPO1_KFDV,PPO1_DWL
blastp_identity = 50
blastp_qcovs = 50
```
In the configuration file, the `hmm`, `hmm_coverage`, and `domain` parameters can each take multiple values separated by commas. All three must carry the same number of values.

The configuration file supports comment lines starting with `;` or `#`.
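
To check a configuration file against these two rules before running gfanno, a small script can help. The sketch below is an assumption-laden illustration, not gfanno's built-in validation: it assumes the file follows the standard INI syntax that Python's `configparser` reads, and the helpers `split_values` and `check_scheme` are hypothetical names.

```python
import configparser

# The PPO scheme from the default configuration, with both comment styles.
EXAMPLE_CONFIG = """
# PPO
; a semicolon comment is also accepted
[PPO]
blastp_seed = seed/PPO.seed.fasta
hmm = hmm/PPO1_DWL.hmm,hmm/PPO1_KFDV.hmm
hmm_coverage = 70,70
domain = PPO1_KFDV,PPO1_DWL
blastp_identity = 50
blastp_qcovs = 50
"""

# Options that take comma-separated lists and must match in length.
LIST_KEYS = ("hmm", "hmm_coverage", "domain")


def split_values(raw):
    """Split a comma-separated option into a list of stripped values."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def check_scheme(section):
    """Return an error message if the list options differ in length, else None."""
    counts = {key: len(split_values(section.get(key, ""))) for key in LIST_KEYS}
    if len(set(counts.values())) != 1:
        return "value counts differ: " + str(counts)
    return None


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
for name in config.sections():
    problem = check_scheme(config[name])
    print(name, "OK" if problem is None else problem)
```

The check only compares the number of values; it does not verify that the HMM files exist or that the coverage thresholds are sensible.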

## LICENSE
Copyright [2023] [Bioinformatics Laboratory of South China Agricultural University]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## Contact
If you have any questions, please create an issue on this repo, and we will deal with it as soon as possible.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/qunjie-zhang/gfanno",
    "name": "gfanno",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.5",
    "maintainer_email": null,
    "keywords": "gene family, bioinformatics, pipline",
    "author": "['wangzt', 'duliuxu']",
    "author_email": "interestingcn01@gmail.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Gene Family Annotation",
    "version": "1.4",
    "project_urls": {
        "Homepage": "https://github.com/qunjie-zhang/gfanno"
    },
    "split_keywords": [
        "gene family",
        " bioinformatics",
        " pipline"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d6a02a197712781f9d055290f226495bd2956724c53621d1440eac979a7e49f4",
                "md5": "4fcfcc05063500159309f8ece60349c4",
                "sha256": "afad59e9a5d375eb190cea4423fb55bf6f828d9d486f2c2ff4ae134054d7b6dd"
            },
            "downloads": -1,
            "filename": "gfanno-1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4fcfcc05063500159309f8ece60349c4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.5",
            "size": 421346,
            "upload_time": "2024-03-26T12:43:15",
            "upload_time_iso_8601": "2024-03-26T12:43:15.892013Z",
            "url": "https://files.pythonhosted.org/packages/d6/a0/2a197712781f9d055290f226495bd2956724c53621d1440eac979a7e49f4/gfanno-1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-26 12:43:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "qunjie-zhang",
    "github_project": "gfanno",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "gfanno"
}
        