Name | yolo-health-checker |
Version | 0.1.1 |
Summary | A tool to numerically analyze the health of a YOLO dataset. |
upload_time | 2024-12-28 04:27:09 |
keywords | yolo, dataset, analysis |
# yolo-health-checker
A Python tool to perform an in-depth analysis of YOLO-format datasets, providing both **multidimensional** and **unidimensional** metrics of dataset “health.” This package helps you **quantify** class distribution, spatial distribution, and other properties in a systematic and reproducible way.
## Features
- **Class Distribution Analysis**:
  - Calculate entropy, Gini index, and standard deviation of instance counts per class.
  - Inspect the number of instances per class and identify imbalances quickly.
- **Spatial Distribution Analysis**:
  - Compute spatial entropy of bounding boxes to see how they are spread across images.
  - Measure standard deviation of bounding box centers, detecting if objects cluster in certain image regions.
  - Calculate average distance from image center, unveiling potential “center bias” in your dataset.
- **Rich Visual Outputs**:
  - Automatically generates heatmaps of bounding box footprints and centers.
  - Visual bar charts showing the number of instances per class.
- **Comprehensive Logging**:
  - Each analysis is logged into a log file, capturing potential warnings (e.g., missing annotations) and key statistics.
- **Modular**:
  - Integrate directly into Python code, or run as a command-line script.
  - Produces CSV reports of class distribution and overall health metrics.
## Installation
Install `yolo-health-checker` via pip:
``pip install yolo-health-checker``
## Usage
There are two main ways to use this tool: **as a command-line script** or **via Python import**.
### Command-Line
``python -m yolo_health_checker.analyze_dataset /path/to/yolo_dataset --output_dir results --save_images --save_csv``
- `dataset_path`: The path to your YOLO-format dataset (containing `data.yaml`, `train/`, `val/` folders).
- `--output_dir`: Optional path to store the output artifacts (CSV, images, etc.).
- `--save_images`: Save class distribution bar charts and heatmaps.
- `--save_csv`: Save CSV files with class distributions and health metrics.
- `--log_file`: Specify the log file name (default: `main.log`).
- `--log_level`: Logging verbosity (default: `INFO`).
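For readers unfamiliar with the input format, the following is a minimal sketch of how a single YOLO annotation line can be parsed. It assumes the common label layout of `class x_center y_center width height` with coordinates normalized to the range 0 to 1. The names `YoloBox` and `parse_label_line` are hypothetical and are not part of `yolo-health-checker`.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One annotation from a YOLO label file (hypothetical helper type)."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse a 'class x_center y_center width height' line.

    Assumes the five-field, normalized-coordinate layout; other
    YOLO variants (e.g. segmentation polygons) are not handled.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x, y, w, h = (float(p) for p in parts[1:])
    return YoloBox(class_id, x, y, w, h)


box = parse_label_line("2 0.5 0.25 0.2 0.1")
print(box)
```

The class IDs and box centers extracted this way are the raw inputs to the metrics described in the sections that follow.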
### Python Import
You can also integrate `yolo-health-checker` within your Python code:
```
from yolo_health_checker import analyze_dataset

health_checker = analyze_dataset(
    dataset_path='/path/to/yolo_dataset',
    output_dir='results',
    save_images=True,
    save_csv=True,
    log_level='INFO',
    log_file='analysis.log'
)

# Once analysis is done, inspect the results
health_checker.show_health_metrics()
```
## Motivations & Numeric Measurements
> **Why numeric measurements?**
> Numeric metrics allow us to systematically compare how well different YOLO versions handle dataset variations. By converting each characteristic into a **measurable number**, we make the research both **reproducible** and **statistically testable**.
Below we list the main “dataset health” metrics we measure. Each is **numeric** with a clear interpretation, making them suitable for statistical analyses. The overarching principle: **if it cannot be expressed numerically, we cannot reliably correlate it with YOLO performance**.
### 1. Class Distribution Metrics
#### 1.1. Entropy of Class Distribution
- **Reason:** Measures the **uniformity** of the distribution of objects across classes. A high entropy indicates a more balanced dataset.
- **Formula:**
H = - Σ pᵢ log(pᵢ)
(where pᵢ is the proportion of class i)
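As an illustration of the entropy formula, here is a small standalone sketch. It uses the natural logarithm, which is an assumption because the text does not state the log base, and the function name `class_entropy` is hypothetical rather than taken from the package.

```python
import math
from collections import Counter


def class_entropy(class_ids):
    """Shannon entropy H = -sum(p_i * log(p_i)) over class proportions.

    Uses the natural log (assumed; the base is not stated in the README).
    """
    counts = Counter(class_ids)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


balanced = class_entropy([0, 1, 2, 3])   # four classes, one instance each
skewed = class_entropy([0, 0, 0, 1])     # one class dominates
print(balanced, skewed)
```

With four equally represented classes the entropy reaches its maximum of log(4), while the skewed list yields a smaller value.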
#### 1.2. Gini Index
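- **Reason:** Captures how **unevenly** instances are distributed among classes. With the formula below, G is 0 when every instance belongs to a single class and approaches 1 - 1/K (K being the number of classes) when all classes are equally represented, so lower values indicate stronger imbalance.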
- **Formula:**
G = 1 - Σ (pᵢ)²
(where pᵢ is the proportion of class i)
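The Gini formula can be sketched in a few lines. This is an illustrative implementation of G = 1 - Σ (pᵢ)² only; the function name `gini_index` is hypothetical and the package's own code may differ.

```python
from collections import Counter


def gini_index(class_ids):
    """Compute G = 1 - sum(p_i ** 2) from a list of class IDs."""
    counts = Counter(class_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one instance is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


single_class = gini_index([0, 0, 0, 0])   # every instance in one class
four_even = gini_index([0, 1, 2, 3])      # four classes, evenly represented
print(single_class, four_even)
```

For K evenly represented classes the result is 1 - 1/K, which is 0.75 for the four-class example.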
#### 1.3. Standard Deviation of Instances per Class
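- **Reason:** Indicates the spread of instance counts across classes. A value near zero means every class has a similar number of instances; larger values point to imbalance.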
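A minimal sketch of this metric follows. Two choices here are assumptions, not confirmed by the text: the population standard deviation is used (rather than the sample one), and classes with zero instances are counted as 0 via a hypothetical `num_classes` argument.

```python
import statistics
from collections import Counter


def instance_count_std(class_ids, num_classes):
    """Population standard deviation of per-class instance counts.

    Classes that never appear contribute a count of 0 (assumed behavior).
    """
    counts = Counter(class_ids)
    per_class = [counts.get(c, 0) for c in range(num_classes)]
    return statistics.pstdev(per_class)


uneven = instance_count_std([0, 0, 0, 1, 2, 2], num_classes=3)  # counts 3, 1, 2
even = instance_count_std([0, 0, 1, 1], num_classes=2)          # counts 2, 2
print(uneven, even)
```

Unlike entropy and the Gini index, this value depends on the dataset size, so it is best compared between datasets of similar scale.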
### 2. Spatial Distribution Metrics
#### 2.1. Entropy of Object Locations
- **Reason:** Checks if bounding boxes are **clustered** in a few regions or **spread** evenly.
- **Procedure:** A 10×10 grid is created, and bounding box counts per cell are transformed into probabilities for entropy calculation.
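The grid procedure can be sketched as follows. This version bins each box by its normalized center point, which is an assumption: the text only says that bounding box counts per cell are used. The function name `spatial_entropy` and the use of the natural log are likewise illustrative choices.

```python
import math


def spatial_entropy(centers, grid_size=10):
    """Entropy of box-center counts over a grid_size x grid_size grid.

    `centers` is an iterable of (x, y) pairs normalized to [0, 1].
    """
    counts = [[0] * grid_size for _ in range(grid_size)]
    total = 0
    for x, y in centers:
        # Clamp so that coordinates at (or slightly beyond) the edges stay in range.
        col = max(0, min(int(x * grid_size), grid_size - 1))
        row = max(0, min(int(y * grid_size), grid_size - 1))
        counts[row][col] += 1
        total += 1
    if total == 0:
        return 0.0
    entropy = 0.0
    for row_counts in counts:
        for n in row_counts:
            if n:  # empty cells contribute nothing
                p = n / total
                entropy -= p * math.log(p)
    return entropy


clustered = spatial_entropy([(0.51, 0.52), (0.55, 0.58), (0.59, 0.50)])
spread = spatial_entropy([(0.05, 0.05), (0.95, 0.05), (0.05, 0.95), (0.95, 0.95)])
print(clustered, spread)
```

Centers that fall into a single cell give an entropy of 0, while centers spread over four distinct cells give log(4); the maximum for a 10×10 grid is log(100).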
#### 2.2. Standard Deviation of Object Centers
- **Reason:** Measures how widely scattered the center points of bounding boxes are across the image.
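One way to compute this, shown here as a sketch, is to take the standard deviation of the x and y coordinates separately. Whether the package reports per-axis values or a single combined number is not stated, so the per-axis return value and the name `center_std` are assumptions.

```python
import statistics


def center_std(centers):
    """Population standard deviation of box centers along x and y.

    `centers` is a list of (x, y) pairs normalized to [0, 1].
    """
    xs = [x for x, _ in centers]
    ys = [y for _, y in centers]
    return statistics.pstdev(xs), statistics.pstdev(ys)


std_x, std_y = center_std([(0.0, 0.5), (1.0, 0.5)])
print(std_x, std_y)
```

In this example the centers vary only horizontally, so the y component is 0 while the x component is 0.5.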
#### 2.3. Distance from Center of Mass
- **Reason:** Quantifies how far bounding box centers lie from the image center, highlighting potential “center bias.”
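A sketch of this measurement, assuming normalized coordinates with the image center at (0.5, 0.5) and a plain Euclidean mean, could look like the following. The function name `mean_center_distance` and the `reference` parameter are hypothetical; the parameter is included only so a different reference point (such as a center of mass) could be substituted.

```python
import math


def mean_center_distance(centers, reference=(0.5, 0.5)):
    """Mean Euclidean distance of box centers from a reference point."""
    centers = list(centers)
    if not centers:
        raise ValueError("at least one center is required")
    rx, ry = reference
    return sum(math.hypot(x - rx, y - ry) for x, y in centers) / len(centers)


centered = mean_center_distance([(0.5, 0.5)])
edges = mean_center_distance([(0.0, 0.5), (1.0, 0.5)])
print(centered, edges)
```

A value close to 0 suggests that objects sit near the middle of the frame, which is the “center bias” this metric is meant to reveal.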
## Example
To run a sample analysis (outputting both CSV and images):
``python -m yolo_health_checker.analyze_dataset /path/to/yolo_dataset --output_dir results --save_images --save_csv --log_file my_log.log``
- This will log the process in **my_log.log**, generate bar charts for class distribution, produce bounding box heatmaps, and create CSV reports with class counts and dataset health metrics in the `results/health` folder.
## Contributing
Feel free to open an issue or a pull request if you spot bugs or want to contribute improvements. We welcome new ideas on metrics or enhancements to support more YOLO-format variations.
Raw data
{
"_id": null,
"home_page": null,
"name": "yolo-health-checker",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "yolo, dataset, analysis",
"author": null,
"author_email": "Rodrigo Ferraz Souza <dev.rodrigofs@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/15/94/68e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798/yolo_health_checker-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A tool to numerically analyze the health of a YOLO dataset.",
"version": "0.1.1",
"project_urls": {
"Bug Tracker": "https://github.com/CodeWracker/yolo-health-checker/issues",
"Source Code": "https://github.com/CodeWracker/yolo-health-checker"
},
"split_keywords": [
"yolo",
" dataset",
" analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1b8922fec5d682c7ba598b0ad1d07baa586068f75c10f165c03545d9a263cf0b",
"md5": "34d220ff583a79600fea5cfe7d7bac96",
"sha256": "c35d0a8854269d000ea90c907d3ac21f36ea5e8871c713b2e14a19502f5b5226"
},
"downloads": -1,
"filename": "yolo_health_checker-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "34d220ff583a79600fea5cfe7d7bac96",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9259,
"upload_time": "2024-12-28T04:27:07",
"upload_time_iso_8601": "2024-12-28T04:27:07.603656Z",
"url": "https://files.pythonhosted.org/packages/1b/89/22fec5d682c7ba598b0ad1d07baa586068f75c10f165c03545d9a263cf0b/yolo_health_checker-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "159468e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798",
"md5": "05205ec13e8618c0f57803a471ad71fc",
"sha256": "43dafe8e54eb8bf0cd6267586d55eb3496058600363c2b75b2675f419d8f62da"
},
"downloads": -1,
"filename": "yolo_health_checker-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "05205ec13e8618c0f57803a471ad71fc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10968,
"upload_time": "2024-12-28T04:27:09",
"upload_time_iso_8601": "2024-12-28T04:27:09.731424Z",
"url": "https://files.pythonhosted.org/packages/15/94/68e246f7c6de73615a8be8cbe0c7dc7337b4361afc261610d8ea36217798/yolo_health_checker-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-28 04:27:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "CodeWracker",
"github_project": "yolo-health-checker",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "yolo-health-checker"
}