yolo-health-checker


Name: yolo-health-checker
Version: 0.1.1
Summary: A tool to numerically analyze the health of a YOLO dataset.
Upload time: 2024-12-28 04:27:09
Keywords: yolo, dataset, analysis
Requirements: No requirements were recorded.
# yolo-health-checker

A Python tool to perform an in-depth analysis of YOLO-format datasets, providing both **multidimensional** and **unidimensional** metrics of dataset “health.” This package helps you **quantify** class distribution, spatial distribution, and other properties in a systematic and reproducible way.

## Features

- **Class Distribution Analysis**:  
  - Calculate entropy, Gini index, and standard deviation of instance counts per class.
  - Inspect the number of instances per class and identify imbalances quickly.

- **Spatial Distribution Analysis**:  
  - Compute spatial entropy of bounding boxes to see how they are spread across images.
  - Measure standard deviation of bounding box centers, detecting if objects cluster in certain image regions.
  - Calculate the average distance of bounding box centers from the image center, revealing potential “center bias” in your dataset.

- **Rich Visual Outputs**:  
  - Automatically generates heatmaps of bounding box footprints and centers.
  - Generates bar charts of the number of instances per class.

- **Comprehensive Logging**:  
  - Each analysis is logged into a log file, capturing potential warnings (e.g., missing annotations) and key statistics.

- **Modular**:  
  - Integrate directly into Python code, or run as a command-line script.
  - Produces CSV reports of class distribution and overall health metrics.

## Installation

Install `yolo-health-checker` via pip:

``pip install yolo-health-checker``

## Usage

There are two main ways to use this tool: **as a command-line script** or **via Python import**.

### Command-Line

``python -m yolo_health_checker.analyze_dataset /path/to/yolo_dataset --output_dir results --save_images --save_csv``

- `dataset_path` (positional): The path to your YOLO-format dataset, which contains a `data.yaml` file and `train/` and `val/` folders.
- `--output_dir`: Optional path to store the output artifacts (CSV, images, etc.).
- `--save_images`: Save class distribution bar charts and heatmaps.
- `--save_csv`: Save CSV files with class distributions and health metrics.
- `--log_file`: Specify the log file name (default: `main.log`).
- `--log_level`: Logging verbosity (default: `INFO`).

### Python Import

You can also integrate `yolo-health-checker` within your Python code:

```python
from yolo_health_checker import analyze_dataset

health_checker = analyze_dataset(
    dataset_path='/path/to/yolo_dataset',
    output_dir='results',
    save_images=True,
    save_csv=True,
    log_level='INFO',
    log_file='analysis.log'
)

# Once analysis is done, inspect the results
health_checker.show_health_metrics()
```

## Motivations & Numeric Measurements

> **Why numeric measurements?**  
> Numeric metrics allow us to systematically compare how well different YOLO versions handle dataset variations. By converting each characteristic into a **measurable number**, we make the research both **reproducible** and **statistically testable**.

Below we list the main “dataset health” metrics we measure. Each metric is **numeric** and has a clear interpretation, which makes it suitable for statistical analysis. The overarching principle: **if it cannot be expressed numerically, we cannot reliably correlate it with YOLO performance**.

### 1. Class Distribution Metrics

#### 1.1. Entropy of Class Distribution
- **Reason:** Measures the **uniformity** of the distribution of objects across classes. Higher entropy indicates a more balanced dataset; for K classes the maximum, log(K), is reached when every class has the same number of instances.
- **Formula:**  
  H = - Σ pᵢ log(pᵢ)  
  (where pᵢ is the proportion of class i)
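
As an illustration, the entropy formula can be sketched in a few lines of Python. This is a minimal sketch, not the package's implementation: it assumes the per-class instance counts are already available as a list and uses the natural logarithm, since the README does not state which base the tool uses. The function name `class_entropy` is hypothetical.

```python
import math
from typing import Sequence


def class_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) of a class distribution.

    Hypothetical helper. `counts` holds the number of instances per class.
    Classes with zero instances contribute nothing (p * log(p) -> 0 as p -> 0).
    The natural logarithm is an assumption; the tool's base is not documented.
    """
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # proportion of class i
            entropy -= p * math.log(p)
    return entropy


# A perfectly balanced 4-class dataset reaches the maximum, log(4) ≈ 1.386.
print(class_entropy([100, 100, 100, 100]))
# One dominant class pulls the entropy well below that maximum.
print(class_entropy([370, 10, 10, 10]))
```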

#### 1.2. Gini Index
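- **Reason:** Captures how **evenly** instances are spread among classes. As written, G is the Gini impurity: it is 0 when every instance belongs to a single class and reaches its maximum of 1 - 1/K (for K classes) when all classes are equally represented, so a higher value indicates a more balanced dataset.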
- **Formula:**  
  G = 1 - Σ (pᵢ)²  
  (where pᵢ is the proportion of class i)
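
The formula above is the Gini impurity (also called the Gini-Simpson index). A minimal sketch of it, assuming per-class counts as input, might look like this; the function name `gini_index` is hypothetical and not necessarily what the package exposes.

```python
from typing import Sequence


def gini_index(counts: Sequence[int]) -> float:
    """Gini impurity G = 1 - sum(p_i ** 2) of a class distribution.

    Hypothetical helper. G is 0 when every instance belongs to one class
    and reaches 1 - 1/K when K classes are equally populated.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(gini_index([100, 100, 100, 100]))  # → 0.75 (balanced: 1 - 1/4)
print(gini_index([400, 0, 0, 0]))        # → 0.0 (all instances in one class)
```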

#### 1.3. Standard Deviation of Instances per Class
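- **Reason:** Indicates how much the instance counts vary from class to class; a value of 0 means every class has the same number of instances. Unlike entropy and the Gini index, it is expressed in instances rather than proportions, so it grows with dataset size.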
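
A minimal sketch of this metric using Python's standard `statistics` module. The README does not say whether the tool uses the population or the sample standard deviation; this sketch assumes the population form (`pstdev`), and the helper name `instance_count_std` is hypothetical.

```python
import statistics
from typing import Sequence


def instance_count_std(counts: Sequence[int]) -> float:
    """Standard deviation of the number of instances per class.

    Hypothetical helper. Uses the population standard deviation; the
    sample form (statistics.stdev) would give a slightly larger value.
    """
    if not counts:
        return 0.0
    return statistics.pstdev(counts)


print(instance_count_std([100, 100, 100, 100]))  # balanced: 0.0
print(instance_count_std([370, 10, 10, 10]))     # one dominant class: large spread
```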

### 2. Spatial Distribution Metrics

#### 2.1. Entropy of Object Locations
- **Reason:** Checks if bounding boxes are **clustered** in a few regions or **spread** evenly.
- **Procedure:** The image area is divided into a 10×10 grid; the bounding box counts per cell are normalized into probabilities, and the entropy of that distribution is computed.
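
The grid procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the tool's code: it takes box centers in YOLO-normalized `[0, 1]` coordinates, assigns each box to the cell containing its center (the tool may instead count box footprints), and uses the natural logarithm. The name `spatial_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def spatial_entropy(centers: Iterable[Tuple[float, float]], grid: int = 10) -> float:
    """Entropy of bounding-box locations over a grid x grid partition of the image.

    Hypothetical helper. `centers` are (x_center, y_center) pairs in
    YOLO-normalized [0, 1] coordinates; each box is assigned to the cell
    containing its center (an assumption about the tool's behavior).
    """
    cells = Counter()
    for x, y in centers:
        # Clamp to [0, grid - 1] so coordinates of exactly 1.0 (or slightly
        # outside the image) still land in a border cell.
        col = min(max(int(x * grid), 0), grid - 1)
        row = min(max(int(y * grid), 0), grid - 1)
        cells[(row, col)] += 1

    total = sum(cells.values())
    entropy = 0.0
    for count in cells.values():  # empty cells have p = 0 and contribute nothing
        p = count / total  # per-cell probability
        entropy -= p * math.log(p)
    return entropy


# One box in every cell gives the maximum, log(100) ≈ 4.605 for a 10x10 grid.
uniform = [((c + 0.5) / 10, (r + 0.5) / 10) for r in range(10) for c in range(10)]
print(spatial_entropy(uniform))
# Every box in the same cell gives 0.0 (maximally clustered).
print(spatial_entropy([(0.5, 0.5)] * 20))
```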

#### 2.2. Standard Deviation of Object Centers
- **Reason:** Measures how widely scattered the center points of bounding boxes are across the image.
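
A minimal sketch of this measurement, assuming YOLO-normalized center coordinates and computing the population standard deviation separately along x and y (the README does not say whether the tool reports one value or two). The helper name `center_std` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def center_std(centers: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Population standard deviation of box centers along x and y.

    Hypothetical helper. `centers` are (x_center, y_center) pairs in
    YOLO-normalized [0, 1] coordinates, so both results are fractions of
    the image width and height respectively.
    """
    if not centers:
        return 0.0, 0.0
    xs = [x for x, _ in centers]
    ys = [y for _, y in centers]
    return statistics.pstdev(xs), statistics.pstdev(ys)


# Objects clustered near the middle of the image give small values on both axes.
print(center_std([(0.48, 0.50), (0.50, 0.52), (0.52, 0.49)]))
```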

#### 2.3. Distance from Center of Mass
- **Reason:** Quantifies how far bounding box centers lie from the image center, highlighting potential “center bias.”
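
Following the description above, a minimal sketch could average the Euclidean distance between each box center and the image center. It assumes YOLO-normalized coordinates, where the image center is `(0.5, 0.5)`; the helper name `mean_distance_from_center` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mean_distance_from_center(centers: Sequence[Tuple[float, float]]) -> float:
    """Average Euclidean distance of box centers from the image center.

    Hypothetical helper. `centers` are YOLO-normalized (x_center, y_center)
    pairs, so the image center is (0.5, 0.5) and the largest possible
    distance (a corner) is sqrt(0.5**2 + 0.5**2) ≈ 0.707.
    """
    if not centers:
        return 0.0
    return sum(math.hypot(x - 0.5, y - 0.5) for x, y in centers) / len(centers)


# Boxes sitting exactly at the image center indicate maximal center bias.
print(mean_distance_from_center([(0.5, 0.5), (0.5, 0.5)]))
# Boxes in opposite corners are as far from the center as possible.
print(mean_distance_from_center([(0.0, 0.0), (1.0, 1.0)]))
```

Because the coordinates are normalized per axis, one unit of x and one unit of y correspond to different pixel lengths on non-square images; scaling by the image width and height first would give a distance in pixels instead.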

## Example

To run a sample analysis (outputting both CSV and images):

``python -m yolo_health_checker.analyze_dataset /path/to/yolo_dataset --output_dir results --save_images --save_csv --log_file my_log.log``

- This will log the process in **my_log.log**, generate bar charts for class distribution, produce bounding box heatmaps, and create CSV reports with class counts and dataset health metrics in the `results/health` folder.

## Contributing

Feel free to open an issue or a pull request if you spot bugs or want to contribute improvements. We welcome new ideas on metrics or enhancements to support more YOLO-format variations.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "yolo-health-checker",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "yolo, dataset, analysis",
    "author": null,
    "author_email": "Rodrigo Ferraz Souza <dev.rodrigofs@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/15/94/68e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798/yolo_health_checker-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A tool to numerically analyze the health of a YOLO dataset.",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/CodeWracker/yolo-health-checker/issues",
        "Source Code": "https://github.com/CodeWracker/yolo-health-checker"
    },
    "split_keywords": [
        "yolo",
        " dataset",
        " analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1b8922fec5d682c7ba598b0ad1d07baa586068f75c10f165c03545d9a263cf0b",
                "md5": "34d220ff583a79600fea5cfe7d7bac96",
                "sha256": "c35d0a8854269d000ea90c907d3ac21f36ea5e8871c713b2e14a19502f5b5226"
            },
            "downloads": -1,
            "filename": "yolo_health_checker-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "34d220ff583a79600fea5cfe7d7bac96",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9259,
            "upload_time": "2024-12-28T04:27:07",
            "upload_time_iso_8601": "2024-12-28T04:27:07.603656Z",
            "url": "https://files.pythonhosted.org/packages/1b/89/22fec5d682c7ba598b0ad1d07baa586068f75c10f165c03545d9a263cf0b/yolo_health_checker-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "159468e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798",
                "md5": "05205ec13e8618c0f57803a471ad71fc",
                "sha256": "43dafe8e54eb8bf0cd6267586d55eb3496058600363c2b75b2675f419d8f62da"
            },
            "downloads": -1,
            "filename": "yolo_health_checker-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "05205ec13e8618c0f57803a471ad71fc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10968,
            "upload_time": "2024-12-28T04:27:09",
            "upload_time_iso_8601": "2024-12-28T04:27:09.731424Z",
            "url": "https://files.pythonhosted.org/packages/15/94/68e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798/yolo_health_checker-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-28 04:27:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "CodeWracker",
    "github_project": "yolo-health-checker",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "yolo-health-checker"
}
        
Elapsed time: 0.43479s