genebank-file-generater


- Name: genebank-file-generater
- Version: 0.1.2
- Summary: A comprehensive tool for converting DNA FASTA files to annotated GenBank format with automated gene prediction using Augustus
- Upload time: 2025-07-29 03:54:07
- Maintainer: GbkGen Development Team
- Author: GbkGen Development Team
- Requires Python: >=3.13
- License: AGPL-3.0-or-later
- Keywords: augustus, bioinformatics, fasta, genbank, gene-prediction, genomics, gff

# GbkGen - GenBank File Generator

A comprehensive tool for converting DNA FASTA files to annotated GenBank format with automated gene prediction using Augustus. GbkGen provides a command-line interface for flexible genomic data processing.

## Features

- **FASTA to GenBank Conversion**: Convert DNA sequences from FASTA format to fully annotated GenBank files
- **Automated Gene Prediction**: Integrated Augustus gene prediction with support for multiple species models
- **GFF File Support**: Use existing GFF annotations or generate new ones with Augustus
- **Multiprocessing Support**: Parallel processing for large datasets with configurable CPU cores
- **Multi-sequence Processing**: Handle multiple DNA sequences in a single FASTA file
- **Species-Specific Models**: Configurable Augustus species models for accurate gene prediction
- **Robust Error Handling**: Comprehensive logging and error reporting
- **File Validation**: Automatic validation of input files and compatibility checking
- **Temporary File Management**: Automatic cleanup of intermediate files
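
To make the multi-sequence feature concrete, the following is a minimal sketch of how a FASTA file with several records can be split into (identifier, sequence) pairs using only the standard library. The function name `read_fasta` is hypothetical and is not part of GbkGen's documented API; the tool itself may well use Biopython for this step.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; not GbkGen's actual parser.
    """
    record_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The identifier is the first whitespace-delimited token after '>'
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif record_id is not None:
            # Sequence lines are concatenated and normalised to upper case
            chunks.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(chunks)


example = """>contig_1 first sequence
ATGCGT
acgt
>contig_2
TTGACA
""".splitlines()

records = list(read_fasta(example))
print(records)
```

Each yielded pair could then be annotated and written out as its own GenBank record.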

## Installation

### Prerequisites
- Python 3.13 or higher
- Augustus gene prediction tool (installed and available in PATH)
- pip or uv package manager

### Using uv (Recommended)
```bash
# Clone the repository
git clone https://github.com/darrengao628/genebank_file_generater
cd genebank_file_generater

# Install with uv
uv sync
```

### Using pip
```bash
# Clone the repository
git clone https://github.com/darrengao628/genebank_file_generater
cd genebank_file_generater

# Install dependencies
pip install -r genebank_file_generater/requirements.txt
```

### Augustus Installation
Make sure Augustus is installed and available in your PATH:

```bash
# For Ubuntu/Debian
sudo apt-get install augustus

# For macOS with Homebrew
brew install augustus

# Or build from source
# Follow instructions at: http://bioinf.uni-greifswald.de/augustus/
```
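
Whether Augustus is actually reachable can be checked from Python before starting a conversion. This is a small sketch using the standard library's `shutil.which`; the helper name `find_executable` is an assumption made for illustration and does not come from GbkGen.

```python
import shutil
from typing import Optional


def find_executable(name: str = "augustus") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("augustus not found on PATH; install it before running GbkGen")
else:
    print(f"augustus found at {path}")
```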

## Usage

### Command Line Interface

#### Basic Usage

**If installed from source:**
```bash
# Convert FASTA to GenBank (automatically creates input.gbk)
python -m genebank_file_generater.genebank_generater input.fasta

# With custom output filename
python -m genebank_file_generater.genebank_generater input.fasta -o output.gbk

# With specific species model
python -m genebank_file_generater.genebank_generater input.fasta -s human

# Using multiple CPU cores for faster processing
python -m genebank_file_generater.genebank_generater input.fasta -c 8
```

**If installed via pip:**
```bash
# Convert FASTA to GenBank (automatically creates input.gbk)
gbkgen input.fasta

# With custom output filename
gbkgen input.fasta -o output.gbk

# With specific species model
gbkgen input.fasta -s human

# Using multiple CPU cores for faster processing
gbkgen input.fasta -c 8
```

#### Automatic GFF File Detection
The program automatically detects corresponding GFF files:
- If `input.fasta` is provided, it looks for `input.gff` or `input.gff3`
- If found, the GFF file is used automatically (no need for `-g` flag)
- Unless `-o` is given, the output filename is derived from the input FASTA filename (for example, `299.fa` produces `299.gbk`)

```bash
# If 299.fa and 299.gff exist, this automatically uses 299.gff
python -m genebank_file_generater.genebank_generater 299.fa
# Creates 299.gbk as output

# Override automatic GFF detection with explicit GFF file
python -m genebank_file_generater.genebank_generater 299.fa -g custom.gff -o output.gbk
```
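
The detection rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GbkGen's implementation: the function names `find_companion_gff` and `default_output` are hypothetical, and checking `.gff` before `.gff3` is an assumed order that the documentation does not specify.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_companion_gff(fasta_path: str) -> Optional[Path]:
    """Return a sibling .gff or .gff3 file sharing the FASTA basename, if any."""
    fasta = Path(fasta_path)
    for suffix in (".gff", ".gff3"):  # assumed precedence: .gff first
        candidate = fasta.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None


def default_output(fasta_path: str) -> Path:
    """Derive the default GenBank filename from the FASTA filename."""
    return Path(fasta_path).with_suffix(".gbk")


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "299.fa"
    fasta.write_text(">seq\nATGC\n")
    print(find_companion_gff(str(fasta)))       # no GFF next to the FASTA yet
    (Path(tmp) / "299.gff").write_text("##gff-version 3\n")
    print(find_companion_gff(str(fasta)).name)  # companion file now exists
    print(default_output(str(fasta)).name)
```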

#### Advanced Usage
```bash
# Use existing GFF file instead of running Augustus
gbkgen input.fasta -g annotations.gff -o output.gbk

# Specify custom working directory
gbkgen input.fasta -w /tmp/augustus -o output.gbk

# Full example with all options
gbkgen input.fasta \
  --output output.gbk \
  --species aspergillus_fumigatus \
  --workdir ./augustus_output \
  --cpu 4
```
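
When an existing annotation is supplied with `-g`, each feature line in a GFF3 file carries nine tab-separated columns (sequence ID, source, type, start, end, score, strand, phase, attributes). The sketch below shows one way such a line might be parsed; `GffFeature` and `parse_gff3_line` are hypothetical names for illustration and are not taken from GbkGen's `gff_parser.py`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GffFeature:
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs
    attributes: Dict[str, str] = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return GffFeature(
        seqid=cols[0],
        source=cols[1],
        type=cols[2],
        start=int(cols[3]),
        end=int(cols[4]),
        strand=cols[6],
        attributes=attributes,
    )


feature = parse_gff3_line("contig_1\tAUGUSTUS\tgene\t101\t950\t0.87\t+\t.\tID=g1")
print(feature)
```

Checking that every `seqid` in the GFF also appears as a record in the FASTA file is one plausible form of the compatibility validation mentioned under Features.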

#### Command Line Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `input` | | Input DNA FASTA file (required) | |
| `--output` | `-o` | Output GenBank file | `<input basename>.gbk` |
| `--species` | `-s` | Augustus species model | aspergillus_fumigatus |
| `--workdir` | `-w` | Working directory for Augustus | ./augustus_output |
| `--gff` | `-g` | Pre-existing GFF3 file | None (a matching `.gff` or `.gff3` next to the input is detected automatically) |
| `--cpu` | `-c` | Number of CPU cores | All available |

## Supported Species Models

GbkGen supports all Augustus species models. Common models include:

- `aspergillus_fumigatus` - Aspergillus fumigatus (default)

For a complete list, run:
```bash
augustus --species=help
```
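
The chosen species model ends up as the `--species=` argument of the Augustus command line. The sketch below shows how such a call might be assembled and run from Python. The helper names are hypothetical, and the use of `--gff3=on` (a standard Augustus option that requests GFF3 output) is an assumption about how GbkGen invokes Augustus, not something the documentation confirms.

```python
import subprocess
from typing import List


def build_augustus_command(
    fasta_path: str, species: str = "aspergillus_fumigatus"
) -> List[str]:
    """Assemble an Augustus command line as an argument list (no shell needed)."""
    return ["augustus", f"--species={species}", "--gff3=on", fasta_path]


def run_augustus(fasta_path: str, species: str, gff_out: str) -> None:
    """Run Augustus and save its standard output, which holds the predictions."""
    command = build_augustus_command(fasta_path, species)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    with open(gff_out, "w") as handle:
        handle.write(result.stdout)


# Building the command does not require Augustus to be installed
print(" ".join(build_augustus_command("input.fasta", "human")))
```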

## Project Structure

```
GbkGen/
├── README.md                           # Main project documentation
├── pyproject.toml                      # Project configuration
├── main.py                             # Simple entry point
├── claude.md                           # Technical analysis
├── genebank_file_generater/            # Core conversion library
│   ├── __init__.py
│   ├── genebank_generater.py          # Main conversion logic
│   ├── gff_parser.py                  # GFF file parsing
│   ├── record.py                      # Record and feature management
│   ├── pyproject.toml                 # Package configuration
│   ├── requirements.txt               # Dependencies
│   ├── README.md                      # Package documentation
│   └── ToDO.md                        # Development roadmap
└── augustus_output/                   # Default Augustus output directory
```

## Getting Help
- Check the [Issues](https://github.com/darrengao628/genebank_file_generater/issues) page
- Review the [ToDO.md](genebank_file_generater/ToDO.md) for known limitations
- Create a new issue with detailed error information


## Changelog

### Version 0.1.0
- Initial release
- Core FASTA to GenBank conversion functionality
- Augustus integration with multiprocessing support
- GFF file parsing and validation
- Comprehensive error handling and logging
- Package distribution support with PyPI
- Simplified dependencies for easier installation


## Acknowledgments

- **Biopython** team for sequence handling libraries
- **Augustus** team for gene prediction software
- **antiSMASH** project for GFF parsing components


---

For more information, visit the [project repository](https://github.com/darrengao628/genebank_file_generater) or contact the development team.
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "genebank-file-generater",
    "maintainer": "GbkGen Development Team",
    "docs_url": null,
    "requires_python": ">=3.13",
    "maintainer_email": null,
    "keywords": "augustus, bioinformatics, fasta, genbank, gene-prediction, genomics, gff",
    "author": "GbkGen Development Team",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/3b/17/c7d3c2b2bde67f611e896457d01b24ccc2b98ad70b04951aaadbc3d91761/genebank_file_generater-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPL-3.0-or-later",
    "summary": "A comprehensive tool for converting DNA FASTA files to annotated GenBank format with automated gene prediction using Augustus",
    "version": "0.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/darrengao628/genebank_file_generater/issues",
        "Documentation": "https://github.com/darrengao628/genebank_file_generater#readme",
        "Homepage": "https://github.com/darrengao628/genebank_file_generater",
        "Repository": "https://github.com/darrengao628/genebank_file_generater"
    },
    "split_keywords": [
        "augustus",
        " bioinformatics",
        " fasta",
        " genbank",
        " gene-prediction",
        " genomics",
        " gff"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "94b0f4b18e42b500168e3ccfea035d8c79b8b0ca493b1ab4665412e13d4188e7",
                "md5": "38355fe73672e2ea09481687d559d769",
                "sha256": "b0a0f7dd24a00c635c086d069d4ae309ffe9db6ca15740ddfdb0bb24151493bb"
            },
            "downloads": -1,
            "filename": "genebank_file_generater-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "38355fe73672e2ea09481687d559d769",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.13",
            "size": 3739,
            "upload_time": "2025-07-29T03:54:06",
            "upload_time_iso_8601": "2025-07-29T03:54:06.329820Z",
            "url": "https://files.pythonhosted.org/packages/94/b0/f4b18e42b500168e3ccfea035d8c79b8b0ca493b1ab4665412e13d4188e7/genebank_file_generater-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3b17c7d3c2b2bde67f611e896457d01b24ccc2b98ad70b04951aaadbc3d91761",
                "md5": "385e54fbc5c2d4f76cdc96ca8733320b",
                "sha256": "9c797156413ddd2fc843b6c3438dff19e90bbfb35d49a189b99fcde76be71eba"
            },
            "downloads": -1,
            "filename": "genebank_file_generater-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "385e54fbc5c2d4f76cdc96ca8733320b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.13",
            "size": 3299,
            "upload_time": "2025-07-29T03:54:07",
            "upload_time_iso_8601": "2025-07-29T03:54:07.592396Z",
            "url": "https://files.pythonhosted.org/packages/3b/17/c7d3c2b2bde67f611e896457d01b24ccc2b98ad70b04951aaadbc3d91761/genebank_file_generater-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-29 03:54:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "darrengao628",
    "github_project": "genebank_file_generater",
    "github_not_found": true,
    "lcname": "genebank-file-generater"
}
        