PhenGO


- **Name:** PhenGO
- **Version:** 0.1.2
- **Home page:** https://github.com/NickJD/PhenoGO
- **Summary:** PhenoGO: A tool to build WEKA-ready ARFF files from model organism phenotype and Gene Ontology (GO) annotations.
- **Upload time:** 2025-08-06 18:39:10
- **Author:** Nicholas Dimonaco
- **Requires Python:** >=3.6
- **License:** None
- **Keywords:** phenotype, gene ontology, go, weka, arff, machine learning, model organism

[![DOI](https://zenodo.org/badge/1026270084.svg)](https://doi.org/10.5281/zenodo.16748708)

# PhenGO

## Overview

PhenGO is a Python tool that generates ready-to-use WEKA ARFF files, designed for machine learning applications involving gene essentiality prediction.
It integrates phenotype data with Gene Ontology (GO) annotations for genes from selected model organisms, streamlining data preparation.

## Purpose

The main goal of this project is to simplify and standardise the creation of ARFF files that combine phenotype information with GO-mapped gene data. 
This enables researchers to efficiently apply machine learning techniques (using WEKA or similar platforms) to analyse gene essentiality and related biological questions across various model organisms.
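To make the target format concrete, here is a minimal sketch of an ARFF file that pairs GO terms with a phenotype label. The layout is an assumption for illustration (one string attribute for the gene, one binary attribute per GO term, and a `lethal`/`viable` class); the exact attributes PhenGO writes are not described here, and `build_arff` is a hypothetical helper, not part of the package.

```python
def build_arff(relation, go_terms, gene_rows):
    """Return ARFF text: gene name, one 0/1 attribute per GO term, class label.

    gene_rows maps gene name -> (set of GO ids, label). Hypothetical layout.
    """
    lines = [f"@RELATION {relation}", "", "@ATTRIBUTE gene string"]
    for term in go_terms:
        # Quote the attribute name because GO ids contain a colon.
        lines.append(f"@ATTRIBUTE '{term}' {{0,1}}")
    lines.append("@ATTRIBUTE class {lethal,viable}")
    lines += ["", "@DATA"]
    for gene in sorted(gene_rows):
        terms, label = gene_rows[gene]
        flags = ["1" if term in terms else "0" for term in go_terms]
        lines.append(",".join([f"'{gene}'"] + flags + [label]))
    return "\n".join(lines) + "\n"


go_terms = ["GO:0003674", "GO:0008150"]
gene_rows = {
    "GeneA": ({"GO:0003674"}, "lethal"),
    "GeneB": ({"GO:0003674", "GO:0008150"}, "viable"),
}
arff_text = build_arff("phenotype_go", go_terms, gene_rows)
print(arff_text)
```

Each data row then reads as gene, GO flags, label, which is the shape a WEKA classifier expects once the gene identifier is excluded from the features.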

## Features

- **Unified Workflow:** Handles data collection, integration, and formatting in a single pipeline.
- **Model Organism Support:** Designed for commonly studied organisms (e.g., *Saccharomyces cerevisiae*, *Mus musculus*).
- **GO Annotation Integration:** Maps genes to their GO terms for comprehensive feature representation and traverses the GO OBO file to add the parent terms of each annotation.
- **Phenotype Data Inclusion:** Incorporates phenotype labels for supervised learning tasks.
- **WEKA ARFF Output:** Produces files in the ARFF format, ready for immediate use in WEKA.
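The parent-term lookup mentioned in the GO annotation feature can be sketched as follows. This is a simplified illustration, not PhenGO's implementation: it follows only `is_a` lines in `[Term]` stanzas, and which relationships PhenGO itself follows is not stated here. The helper names `parse_is_a` and `ancestors` are hypothetical.

```python
from collections import defaultdict


def parse_is_a(obo_text):
    """Map each term id to the set of its direct is_a parents."""
    parents = defaultdict(set)
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:]
        elif in_term and current and line.startswith("is_a: "):
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            parents[current].add(line[6:].split("!")[0].strip())
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


sample_obo = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0007049
name: cell cycle
is_a: GO:0009987 ! cellular process
"""

parents = parse_is_a(sample_obo)
print(sorted(ancestors("GO:0007049", parents)))
```

A gene annotated with `GO:0007049` would then also receive its two ancestors as features, so that genes sharing only a broad parent term still overlap in the feature matrix.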

## Installation
Install PhenGO from PyPI with pip:

```bash
pip install phengo
```

## Usage
### PhenGO Package:
#### PhenGO Example:
```commandline
PhenGO -species fly \
  -phenotype_file data/fly/phenotype_data/2017/allele_phenotypic_data_fb_2017_05.tsv.gz \
  -gene_association_file data/fly/gene_association/2017/gene_association_2017_05.fb.gz \
  -go_obo_file data/go/2017/go_2017-05-01.obo.gz \
  -output_dir Documents/PhenGO/fly_2017
```
The output will be saved in the specified output directory, which will contain the ARFF file and other relevant data files.
#### Menu:
```bash
usage: PhenGO.py [-h] -species SPECIES -phenotype_file PHENOTYPE_FILE
                 -gene_association_file GENE_ASSOCIATION_FILE -go_obo_file
                 GO_OBO_FILE -output_dir OUTPUT_DIR [-filter_unused_gos]
                 [-filter_mixed_terms] [-gene_go_pheno]
                 [-fly_assignments FLY_ASSIGNMENTS]
                 [-driver_lines DRIVER_LINES] [-filt_with]
                 [-worm_phenotypes WORM_PHENOTYPES]
                 [-mouse_phenotypes MOUSE_PHENOTYPES] [-v]

PhenGO v0.1.2 - Convert phenotype and GO data to ARFF format

Required Options:
  -species SPECIES      Species tag (e.g., fly, yeast)
  -phenotype_file PHENOTYPE_FILE
                        Path to the phenotype data file (.gz)
  -gene_association_file GENE_ASSOCIATION_FILE
                        Path to the gene association file (.gz)
  -go_obo_file GO_OBO_FILE
                        Path to the go.obo file
  -output_dir OUTPUT_DIR
                        Output directory

Optional parameters:
  -filter_unused_gos    Filter out unused GO terms from the FUNC and ARFF
                        output (default: True)
  -filter_mixed_terms   Filter out genes which have both lethal and viable
                        phenotypes - Terms not specifically lethal/viable are
                        not counted in this (default: False)
  -gene_go_pheno        Output "Gene-GO-Phenotype" (Rbbp5 GO:0003674 0) file
                        for overrepresentation analysis with tools such as
                        FUNC (default: False)

Fly specific parameters:
  -fly_assignments FLY_ASSIGNMENTS
                        Provide TSV file of fly assignments (file confirming
                        genes are assignment to drosophila melanogaster
                        (default: "data/fly/FlyBase_Fields_2017.txt.gz")
  -driver_lines DRIVER_LINES
                        Provide TSV file of fly driver lines (file containing
                        the name of driver lines (RNAi) to ignore when present
                        with the "with" tag (default: "data/fly/FlyBase_Driver
                        Line_Fields_2025_08_05.txt.gz")
  -filt_with            Filter out phenotype with "with" tag (default: DO NOT
                        FILTER)

Worm specific parameters:
  -worm_phenotypes WORM_PHENOTYPES
                        Provide TSV file of worm phenotypes (default:
                        "data/worm/WS297_lethal_terms.tsv.gz")

Mouse specific parameters:
  -mouse_phenotypes MOUSE_PHENOTYPES
                        Provide TSV file of mouse phenotypes (default:
                        "data/mouse/mouse_lethal_terms.txt.gz")

Misc:
  -v, --version         show program's version number and exit
```
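The `-filter_mixed_terms` behaviour described in the help text can be illustrated with a short sketch. It assumes phenotypes have already been reduced to plain labels per gene; the function name and data layout are hypothetical and do not come from the PhenGO source.

```python
def drop_mixed_genes(gene_phenotypes):
    """Drop genes annotated with both 'lethal' and 'viable'.

    Other phenotype labels are ignored when deciding, mirroring the help
    text ("Terms not specifically lethal/viable are not counted").
    """
    kept = {}
    for gene, phenotypes in gene_phenotypes.items():
        if {"lethal", "viable"} <= set(phenotypes):
            continue  # conflicting evidence: exclude this gene
        kept[gene] = phenotypes
    return kept


example = {
    "GeneA": ["lethal"],
    "GeneB": ["lethal", "viable"],
    "GeneC": ["viable", "sterile"],
}
kept = drop_mixed_genes(example)
print(sorted(kept))
```

Here only `GeneB` is removed, because it is the only gene carrying both labels; `GeneC` stays because `sterile` is neither lethal nor viable.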

### Compare-ARFF:
```commandline
usage: compare_arff_genes.py [-h] -arff_a ARFF_A -arff_b ARFF_B -o OUTPUT

PhenoGO v0.1.2 - Compare-ARFF: Compare two ARFF files.

options:
  -h, --help      show this help message and exit
  -arff_a ARFF_A  Master ARFF file (reference)
  -arff_b ARFF_B  Comparison ARFF file
  -o OUTPUT       Output CSV file

```

**Output:**
Compare-ARFF writes a CSV file that summarises the comparison between the two ARFF files, with one row per gene and a `Status` column describing how the gene differs, as in the example below.
```commandline
Gene,Label A,Label B,GO Terms Differ,Status
GeneA,lethal,,,"MISSING_IN_B"
GeneB,lethal,viable,,"LABEL_MISMATCH"
GeneC,viable,viable,GO:0008150;GO:0003674,"GO_TERM_MISMATCH"
GeneD,viable,viable,,"EXACT_MATCH"
```
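For a quick overview of a comparison, the CSV can be tallied by its `Status` column with the standard library. This is a small sketch using the example rows above; `summarise_statuses` is a hypothetical helper, and a real run would read the file passed to `-o` instead of an in-memory string.

```python
import csv
import io
from collections import Counter


def summarise_statuses(csv_text):
    """Count how many genes fall under each Status value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Status"] for row in reader)


sample_csv = '''Gene,Label A,Label B,GO Terms Differ,Status
GeneA,lethal,,,"MISSING_IN_B"
GeneB,lethal,viable,,"LABEL_MISMATCH"
GeneC,viable,viable,GO:0008150;GO:0003674,"GO_TERM_MISMATCH"
GeneD,viable,viable,,"EXACT_MATCH"
'''

counts = summarise_statuses(sample_csv)
for status, n in sorted(counts.items()):
    print(status, n)
```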

            
