# DataGradients
<div align="center">
<p align="center">
<a href="https://github.com/Deci-AI/super-gradients#prerequisites"><img src="https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue" /></a>
<a href="https://pypi.org/project/data-gradients/"><img src="https://img.shields.io/pypi/v/data-gradients" /></a>
<a href="https://github.com/Deci-AI/data-gradients/releases"><img src="https://img.shields.io/github/v/release/Deci-AI/data-gradients" /></a>
<a href="https://github.com/Deci-AI/data-gradients/blob/master/LICENSE.md"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" /></a>
</p>
</div>
DataGradients is an open-source, Python-based library designed for **computer vision dataset analysis**.
Extract **valuable insights** from your datasets and get **comprehensive reports effortlessly**.
### 🔍 Detect Common Data Issues
- Corrupted data
- Labeling errors
- Underlying biases, and more.
### 💡 Extract Insights for Better Model Design
- Informed decisions based on data characteristics.
- Object size and location distributions.
- High-frequency details.
### 🎯 Reduce Guesswork for Hyperparameters
- Define the correct NMS and filtering parameters.
- Identify class distribution issues.
- Calibrate metrics for your unique dataset.
## 🛠 Capabilities
Non-exhaustive list of supported features.
- **General Image Metrics**: Explore key attributes like resolution, color distribution, and average brightness.
- **Class Overview**: Get a snapshot of class distributions, most frequent classes, and unlabelled images.
- **Positional Heatmaps**: Visualize where objects tend to appear within your images.
- **Bounding Box & Mask Details**: Delve into dimensions, area coverages, and resolutions of objects.
- **Class Frequencies Deep Dive**: Dive deeper into class distributions, understanding anomalies and rare classes.
- **Detailed Object Counts**: Examine the granularity of components per image, identifying patterns and outliers.
- And **[many more](./documentation/feature_description.md)**!
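
To make two of these metrics more concrete, here is a minimal sketch of what "area coverage" and a "positional heatmap" measure for bounding boxes. It is an illustration only, not DataGradients' internal implementation; the `(x_min, y_min, x_max, y_max)` pixel box layout, the function names, and the grid size are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def area_coverage(box: Box, image_w: int, image_h: int) -> float:
    """Fraction of the image area covered by a single box (0.0 to 1.0)."""
    x_min, y_min, x_max, y_max = box
    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / float(image_w * image_h)


def center_heatmap(boxes: Sequence[Box], image_w: int, image_h: int, grid: int = 4) -> List[List[int]]:
    """Count how many box centers fall into each cell of a grid x grid layout."""
    heatmap = [[0] * grid for _ in range(grid)]
    for x_min, y_min, x_max, y_max in boxes:
        # Normalize centers to [0, 1] so images of different sizes share one grid.
        cx = (x_min + x_max) / 2 / image_w
        cy = (y_min + y_max) / 2 / image_h
        # Clamp so centers on (or beyond) the image border land in an edge cell.
        col = min(max(int(cx * grid), 0), grid - 1)
        row = min(max(int(cy * grid), 0), grid - 1)
        heatmap[row][col] += 1
    return heatmap
```

For example, a 50 x 50 box in a 200 x 200 image covers 0.0625 of it, and accumulating `center_heatmap` over a whole dataset shows which regions of the frame objects tend to occupy.
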
> 📘 **Deep Dive into Data Profiling**
> Puzzled by some dataset challenges while using DataGradients? We've got you covered.
> Enrich your understanding with this **[🎓free online course](https://deci.ai/course/profiling-computer-vision-datasets-overview/?utm_campaign[…]=DG-PDF-report&utm_medium=DG-repo&utm_content=DG-Report-to-course)**. Dive into dataset profiling, confront its complexities, and harness the full potential of DataGradients.
<div align="center">
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_image_stats.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_image_stats.png" width="250px"></a>
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_mask_sample.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_mask_sample.png" width="250px"></a>
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_classes_distribution.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/report_classes_distribution.png" width="250px"></a>
<p><em>Example pages from the report</em></p>
</div>
<div align="center">
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationBoundingBoxArea.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationBoundingBoxArea.png" width="375px"></a>
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationBoundingBoxResolution.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationBoundingBoxResolution.png" width="375px"></a>
<br />
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationClassFrequency.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationClassFrequency.png" width="375px"></a>
<a href="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationComponentsPerImageCount.png"><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/SegmentationComponentsPerImageCount.png" width="375px"></a>
<p><em>Examples of specific features</em></p>
</div>
> Check out the [pre-computed dataset analysis](#pre-computed-dataset-analysis) for a deeper dive into reports.
## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
  - [Prerequisites](#prerequisites)
  - [Dataset Analysis](#dataset-analysis)
  - [Report](#report)
- [Feature Configuration](#feature-configuration)
- [Dataset Extractors](#dataset-extractors)
- [Pre-computed Dataset Analysis](#pre-computed-dataset-analysis)
- [Community](#community)
- [License](#license)
## Installation
You can install DataGradients from [PyPI](https://pypi.org/project/data-gradients/):
```
pip install data-gradients
```
## Quick Start
### Prerequisites
- **Dataset**: Includes a **Train** set and a **Validation** or a **Test** set.
- **Dataset Iterable**: A way to iterate over your dataset, providing images and labels. Can be any of the following:
  - PyTorch **Dataloader**
  - PyTorch **Dataset**
  - Generator that yields image/label pairs
  - Any other iterable you use for model training/validation
- One of:
  - **Class Names**: Either the list of all class names in the dataset OR a dictionary mapping `class_id` -> `class_name`.
  - **Number of classes**: Indicate how many unique classes are in your dataset. Ensure this number is greater than the highest class index (e.g., if your highest class index is 9, the number of classes should be at least 10).
Please ensure all the points above are checked before you proceed with **DataGradients**.
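
A plain generator qualifies as a dataset iterable. The sketch below yields synthetic `(image, label)` pairs, assuming images are `H x W x 3` `uint8` NumPy arrays and labels are integer class ids; `synthetic_samples` and `ReusableSamples` are hypothetical names, not part of DataGradients. Because a generator is exhausted after a single pass, the wrapper class creates a fresh one every time it is iterated, which is a safe choice whenever the data may need to be traversed more than once.

```python
from typing import Iterator, Tuple

import numpy as np


def synthetic_samples(num_samples: int, num_classes: int, seed: int = 0) -> Iterator[Tuple[np.ndarray, int]]:
    """Yield (image, label) pairs: a 64x64x3 uint8 image and an integer class id."""
    rng = np.random.default_rng(seed)
    for _ in range(num_samples):
        image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
        label = int(rng.integers(0, num_classes))
        yield image, label


class ReusableSamples:
    """Iterable wrapper: every `for` loop over it starts a fresh generator."""

    def __init__(self, num_samples: int, num_classes: int, seed: int = 0):
        self.num_samples = num_samples
        self.num_classes = num_classes
        self.seed = seed

    def __iter__(self) -> Iterator[Tuple[np.ndarray, int]]:
        # Same seed on every pass, so repeated iterations see identical data.
        return synthetic_samples(self.num_samples, self.num_classes, self.seed)
```
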
**Example**
```python
from torchvision.datasets import CocoDetection

# CocoDetection takes `root` (image folder) and `annFile` (annotation JSON path).
train_data = CocoDetection(...)
val_data = CocoDetection(...)
class_names = ["person", "bicycle", "car", "motorcycle", ...]
# OR
# class_names = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", ...}
```
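
The two ways of describing classes are related: in list form, each name's position is treated here as its class id (an assumption of this sketch), and the minimal number of classes is the highest class index plus one. The hypothetical helpers below make both rules explicit, which is useful for sparse ids such as COCO's category ids, which run from 1 to 90 with gaps.

```python
from typing import Dict, Iterable, List, Union

ClassNames = Union[List[str], Dict[int, str]]


def to_class_mapping(class_names: ClassNames) -> Dict[int, str]:
    """Normalize either accepted form to a `class_id -> class_name` dict."""
    if isinstance(class_names, dict):
        return dict(class_names)
    # Assumption of this sketch: in list form, a name's position is its class id.
    return dict(enumerate(class_names))


def min_num_classes(class_ids: Iterable[int]) -> int:
    """Smallest valid number of classes: the highest class index plus one."""
    return max(class_ids) + 1
```

For example, `min_num_classes(to_class_mapping(class_names))` works for both forms, since iterating a dict yields its keys.
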
> **Good to Know** - DataGradients will try to find out how the dataset returns images and labels.
> - If something cannot be automatically determined, you will be asked to provide some extra information through a text input.
> - In some edge cases, the process will crash and prompt you to implement a custom [dataset extractor](#dataset-extractors).
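
To anticipate what DataGradients will need to infer, it can help to inspect the structure of one sample yourself. The hypothetical `describe` helper below returns a nested outline of a sample, reporting shape and dtype for anything array-like (it only checks for `.shape` and `.dtype` attributes, so NumPy arrays and PyTorch tensors both qualify) and truncating long lists. It is a standalone sketch, not a DataGradients function.

```python
from typing import Any


def describe(sample: Any, indent: int = 0, max_items: int = 3) -> str:
    """Return an indented outline of a (possibly nested) sample's structure."""
    pad = "  " * indent
    if hasattr(sample, "shape") and hasattr(sample, "dtype"):
        # Array-like values (e.g. NumPy arrays, PyTorch tensors): shape and dtype only.
        return f"{pad}{type(sample).__name__} shape={tuple(sample.shape)} dtype={sample.dtype}"
    if isinstance(sample, dict):
        lines = [f"{pad}dict"]
        for key, value in sample.items():
            lines.append(f"{pad}  {key!r}:")
            lines.append(describe(value, indent + 2, max_items))
        return "\n".join(lines)
    if isinstance(sample, (list, tuple)):
        lines = [f"{pad}{type(sample).__name__} len={len(sample)}"]
        # Show only the first few items so long annotation lists stay readable.
        lines.extend(describe(item, indent + 1, max_items) for item in sample[:max_items])
        if len(sample) > max_items:
            lines.append(f"{pad}  ...")
        return "\n".join(lines)
    return f"{pad}{type(sample).__name__}: {sample!r}"
```

A typical call is `print(describe(next(iter(train_data))))`.
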
> **Heads up** - DataGradients provides a few out-of-the-box [dataset/dataloader](./documentation/datasets.md) implementations.
> You can find more dataset implementations in [PyTorch](https://pytorch.org/vision/stable/datasets.html)
> or [SuperGradients](https://docs.deci.ai/super-gradients/src/super_gradients/training/datasets/Dataset_Setup_Instructions.html).
## Dataset Analysis
You are now ready to go! Choose the relevant analyzer for your task and run it on your datasets.
**Image Classification**
```python
from data_gradients.managers.classification_manager import ClassificationAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = ClassificationAnalysisManager(
report_title="Testing Data-Gradients Classification",
train_data=train_data,
val_data=val_data,
class_names=class_names,
)
analyzer.run()
```
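
Any iterable works as `train_data`/`val_data`, including a plain Python list of `(image, label)` pairs. As a quick sanity check before generating the full report, the hypothetical helper below counts samples per class name, assuming integer labels that index into a list of `class_names`; classes that never appear are reported with a count of 0. It is an illustration, not part of the DataGradients API.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple


def class_frequencies(samples: Iterable[Tuple[Any, int]], class_names: List[str]) -> List[Tuple[str, int]]:
    """Count samples per class, most frequent first; absent classes get a count of 0."""
    counts = Counter(label for _, label in samples)
    per_class = [(name, counts.get(class_id, 0)) for class_id, name in enumerate(class_names)]
    # sorted() is stable, so classes with equal counts keep their original order.
    return sorted(per_class, key=lambda item: item[1], reverse=True)
```

For example, with `train_data = [(image_a, 0), (image_b, 0), (image_c, 2)]` and three class names, the second class shows up last with a count of 0, flagging it as missing from the split.
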
**Object Detection**
```python
from data_gradients.managers.detection_manager import DetectionAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = DetectionAnalysisManager(
report_title="Testing Data-Gradients Object Detection",
train_data=train_data,
val_data=val_data,
class_names=class_names,
)
analyzer.run()
```
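
`torchvision`'s `CocoDetection` returns each target as a list of COCO annotation dicts, where `bbox` is `[x, y, width, height]`. DataGradients tries to infer such formats on its own and may ask you for extra information about them; if you would rather hand it a flat, explicit layout, a per-sample conversion like the one below is one option. The `[class_id, x_min, y_min, x_max, y_max]` row layout and the function name are assumptions for illustration, not a format DataGradients requires.

```python
from typing import Any, Dict, List, Sequence


def coco_target_to_rows(target: Sequence[Dict[str, Any]]) -> List[List[float]]:
    """Convert COCO annotation dicts to rows of [class_id, x_min, y_min, x_max, y_max]."""
    rows = []
    for ann in target:
        x, y, w, h = ann["bbox"]  # COCO stores boxes as [x, y, width, height]
        rows.append([float(ann["category_id"]), float(x), float(y), float(x + w), float(y + h)])
    return rows
```

Applied to one sample: `image, target = train_data[0]`, then `rows = coco_target_to_rows(target)`.
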
**Semantic Segmentation**
```python
from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = SegmentationAnalysisManager(
report_title="Testing Data-Gradients Segmentation",
train_data=train_data,
val_data=val_data,
class_names=class_names,
)
analyzer.run()
```
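
Before running the segmentation analysis, a quick check of one mask can catch class ids outside the range implied by `class_names`. This sketch assumes masks are `H x W` integer arrays of class ids and treats `255` as an "ignore" value (a common convention, e.g. for void pixels in Pascal VOC, but an assumption here); `check_mask` is a hypothetical helper, not a DataGradients function.

```python
import numpy as np


def check_mask(mask: np.ndarray, num_classes: int, ignore_index: int = 255) -> dict:
    """Report the pixel fraction per class id and any ids outside [0, num_classes)."""
    ids, pixel_counts = np.unique(mask, return_counts=True)
    keep = ids != ignore_index  # drop the assumed "ignore" value from the summary
    fractions = {int(i): int(c) / mask.size for i, c in zip(ids[keep], pixel_counts[keep])}
    out_of_range = [i for i in fractions if i < 0 or i >= num_classes]
    return {"class_pixel_fraction": fractions, "out_of_range_ids": out_of_range}
```

A non-empty `out_of_range_ids` list usually means either `class_names` is incomplete or the mask uses a different ignore value.
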
**Example**
You can try the segmentation analysis tool with the following [example](https://github.com/Deci-AI/data-gradients/blob/master/examples/segmentation_example.py),
which does not require downloading any additional data.
## Report
Once the analysis is done, the path to your PDF report will be printed. Examples of [pre-computed dataset analysis reports](#pre-computed-dataset-analysis) are available below.
## Feature Configuration
The feature configuration allows you to run the analysis on a subset of features or adjust the parameters of existing features.
If you are interested in customizing this configuration, you can check out the [documentation](documentation/feature_configuration.md) on that topic.
## Dataset Extractors
**Ensuring Comprehensive Dataset Compatibility**
DataGradients is adept at automatic dataset inference; however, certain specifics, such as nested annotation structures or unique annotation formats, may require a tailored approach.
To address this, DataGradients offers `extractors` that improve compatibility with diverse dataset formats.
For an in-depth understanding and implementation details, we encourage a thorough review of the [Dataset Extractors Documentation](./documentation/dataset_extractors.md).
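
If you prefer not to write an extractor, a library-agnostic alternative is to wrap your dataset so that it yields `(image, label)` pairs directly. The sketch below is plain Python and is not the DataGradients extractor API described in the linked documentation; `FlattenedDataset` and the nested sample layout in the usage example are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


class FlattenedDataset:
    """Wrapper that turns each raw sample into an (image, label) pair on the fly."""

    def __init__(self, raw: Iterable[Any], to_pair: Callable[[Any], Tuple[Any, Any]]):
        self.raw = raw  # any iterable of raw samples (dataset, list, ...)
        self.to_pair = to_pair  # user-supplied function that knows the nesting

    def __iter__(self) -> Iterator[Tuple[Any, Any]]:
        # Re-iterable as long as `raw` itself can be iterated more than once.
        for sample in self.raw:
            yield self.to_pair(sample)
```

For example, samples shaped like `{"data": {"pixels": ...}, "meta": {"annotations": {"label": ...}}}` can be flattened with `FlattenedDataset(raw, lambda s: (s["data"]["pixels"], s["meta"]["annotations"]["label"]))`.
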
## Pre-computed Dataset Analysis
<table style="border: 0">
<tr>
<td><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/colab.png" width="80pt"></td>
<td><a href="https://colab.research.google.com/drive/1dswgeK0KF-n61p6ixRdFgbQKHEtOu8SE?usp=sharing"> Example notebook on Colab</a></td>
</tr>
</table>
<details>
<summary><h3>Detection</h3></summary>
Common Datasets
- [COCO](https://dgreports.deci.ai/detection/COCO/Report.pdf)
- [VOC](https://dgreports.deci.ai/detection/VOC/Report.pdf)
[Roboflow 100](https://universe.roboflow.com/roboflow-100?ref=blog.roboflow.com) Datasets
- [4-fold-defect](https://dgreports.deci.ai/detection/RF100_4-fold-defect/Report.pdf)
- [abdomen-mri](https://dgreports.deci.ai/detection/RF100_abdomen-mri/Report.pdf)
- [acl-x-ray](https://dgreports.deci.ai/detection/RF100_acl-x-ray/Report.pdf)
- [activity-diagrams-qdobr](https://dgreports.deci.ai/detection/RF100_activity-diagrams-qdobr/Report.pdf)
- [aerial-cows](https://dgreports.deci.ai/detection/RF100_aerial-cows/Report.pdf)
- [aerial-pool](https://dgreports.deci.ai/detection/RF100_aerial-pool/Report.pdf)
- [aerial-spheres](https://dgreports.deci.ai/detection/RF100_aerial-spheres/Report.pdf)
- [animals-ij5d2](https://dgreports.deci.ai/detection/RF100_animals-ij5d2/Report.pdf)
- [apex-videogame](https://dgreports.deci.ai/detection/RF100_apex-videogame/Report.pdf)
- [apples-fvpl5](https://dgreports.deci.ai/detection/RF100_apples-fvpl5/Report.pdf)
- [aquarium-qlnqy](https://dgreports.deci.ai/detection/RF100_aquarium-qlnqy/Report.pdf)
- [asbestos](https://dgreports.deci.ai/detection/RF100_asbestos/Report.pdf)
- [avatar-recognition-nuexe](https://dgreports.deci.ai/detection/RF100_avatar-recognition-nuexe/Report.pdf)
- [axial-mri](https://dgreports.deci.ai/detection/RF100_axial-mri/Report.pdf)
- [bacteria-ptywi](https://dgreports.deci.ai/detection/RF100_bacteria-ptywi/Report.pdf)
- [bccd-ouzjz](https://dgreports.deci.ai/detection/RF100_bccd-ouzjz/Report.pdf)
- [bees-jt5in](https://dgreports.deci.ai/detection/RF100_bees-jt5in/Report.pdf)
- [bone-fracture-7fylg](https://dgreports.deci.ai/detection/RF100_bone-fracture-7fylg/Report.pdf)
- [brain-tumor-m2pbp](https://dgreports.deci.ai/detection/RF100_brain-tumor-m2pbp/Report.pdf)
- [cable-damage](https://dgreports.deci.ai/detection/RF100_cable-damage/Report.pdf)
- [cables-nl42k](https://dgreports.deci.ai/detection/RF100_cables-nl42k/Report.pdf)
- [cavity-rs0uf](https://dgreports.deci.ai/detection/RF100_cavity-rs0uf/Report.pdf)
- [cell-towers](https://dgreports.deci.ai/detection/RF100_cell-towers/Report.pdf)
- [cells-uyemf](https://dgreports.deci.ai/detection/RF100_cells-uyemf/Report.pdf)
- [chess-pieces-mjzgj](https://dgreports.deci.ai/detection/RF100_chess-pieces-mjzgj/Report.pdf)
- [circuit-elements](https://dgreports.deci.ai/detection/RF100_circuit-elements/Report.pdf)
- [circuit-voltages](https://dgreports.deci.ai/detection/RF100_circuit-voltages/Report.pdf)
- [cloud-types](https://dgreports.deci.ai/detection/RF100_cloud-types/Report.pdf)
- [coins-1apki](https://dgreports.deci.ai/detection/RF100_coins-1apki/Report.pdf)
- [construction-safety-gsnvb](https://dgreports.deci.ai/detection/RF100_construction-safety-gsnvb/Report.pdf)
- [coral-lwptl](https://dgreports.deci.ai/detection/RF100_coral-lwptl/Report.pdf)
- [corrosion-bi3q3](https://dgreports.deci.ai/detection/RF100_corrosion-bi3q3/Report.pdf)
- [cotton-20xz5](https://dgreports.deci.ai/detection/RF100_cotton-20xz5/Report.pdf)
- [cotton-plant-disease](https://dgreports.deci.ai/detection/RF100_cotton-plant-disease/Report.pdf)
- [csgo-videogame](https://dgreports.deci.ai/detection/RF100_csgo-videogame/Report.pdf)
- [currency-v4f8j](https://dgreports.deci.ai/detection/RF100_currency-v4f8j/Report.pdf)
- [digits-t2eg6](https://dgreports.deci.ai/detection/RF100_digits-t2eg6/Report.pdf)
- [document-parts](https://dgreports.deci.ai/detection/RF100_document-parts/Report.pdf)
- [excavators-czvg9](https://dgreports.deci.ai/detection/RF100_excavators-czvg9/Report.pdf)
- [farcry6-videogame](https://dgreports.deci.ai/detection/RF100_farcry6-videogame/Report.pdf)
- [fish-market-ggjso](https://dgreports.deci.ai/detection/RF100_fish-market-ggjso/Report.pdf)
- [flir-camera-objects](https://dgreports.deci.ai/detection/RF100_flir-camera-objects/Report.pdf)
- [furniture-ngpea](https://dgreports.deci.ai/detection/RF100_furniture-ngpea/Report.pdf)
- [gauge-u2lwv](https://dgreports.deci.ai/detection/RF100_gauge-u2lwv/Report.pdf)
- [grass-weeds](https://dgreports.deci.ai/detection/RF100_grass-weeds/Report.pdf)
- [gynecology-mri](https://dgreports.deci.ai/detection/RF100_gynecology-mri/Report.pdf)
- [halo-infinite-angel-videogame](https://dgreports.deci.ai/detection/RF100_halo-infinite-angel-videogame/Report.pdf)
- [hand-gestures-jps7z](https://dgreports.deci.ai/detection/RF100_hand-gestures-jps7z/Report.pdf)
- [insects-mytwu](https://dgreports.deci.ai/detection/RF100_insects-mytwu/Report.pdf)
- [leaf-disease-nsdsr](https://dgreports.deci.ai/detection/RF100_leaf-disease-nsdsr/Report.pdf)
- [lettuce-pallets](https://dgreports.deci.ai/detection/RF100_lettuce-pallets/Report.pdf)
- [liver-disease](https://dgreports.deci.ai/detection/RF100_liver-disease/Report.pdf)
- [marbles](https://dgreports.deci.ai/detection/RF100_marbles/Report.pdf)
- [mask-wearing-608pr](https://dgreports.deci.ai/detection/RF100_mask-wearing-608pr/Report.pdf)
- [mitosis-gjs3g](https://dgreports.deci.ai/detection/RF100_mitosis-gjs3g/Report.pdf)
- [number-ops](https://dgreports.deci.ai/detection/RF100_number-ops/Report.pdf)
- [paper-parts](https://dgreports.deci.ai/detection/RF100_paper-parts/Report.pdf)
- [paragraphs-co84b](https://dgreports.deci.ai/detection/RF100_paragraphs-co84b/Report.pdf)
- [parasites-1s07h](https://dgreports.deci.ai/detection/RF100_parasites-1s07h/Report.pdf)
- [peanuts-sd4kf](https://dgreports.deci.ai/detection/RF100_peanuts-sd4kf/Report.pdf)
- [peixos-fish](https://dgreports.deci.ai/detection/RF100_peixos-fish/Report.pdf)
- [people-in-paintings](https://dgreports.deci.ai/detection/RF100_people-in-paintings/Report.pdf)
- [pests-2xlvx](https://dgreports.deci.ai/detection/RF100_pests-2xlvx/Report.pdf)
- [phages](https://dgreports.deci.ai/detection/RF100_phages/Report.pdf)
- [pills-sxdht](https://dgreports.deci.ai/detection/RF100_pills-sxdht/Report.pdf)
- [poker-cards-cxcvz](https://dgreports.deci.ai/detection/RF100_poker-cards-cxcvz/Report.pdf)
- [printed-circuit-board](https://dgreports.deci.ai/detection/RF100_printed-circuit-board/Report.pdf)
- [radio-signal](https://dgreports.deci.ai/detection/RF100_radio-signal/Report.pdf)
- [road-signs-6ih4y](https://dgreports.deci.ai/detection/RF100_road-signs-6ih4y/Report.pdf)
- [road-traffic](https://dgreports.deci.ai/detection/RF100_road-traffic/Report.pdf)
- [robomasters-285km](https://dgreports.deci.ai/detection/RF100_robomasters-285km/Report.pdf)
- [secondary-chains](https://dgreports.deci.ai/detection/RF100_secondary-chains/Report.pdf)
- [sedimentary-features-9eosf](https://dgreports.deci.ai/detection/RF100_sedimentary-features-9eosf/Report.pdf)
- [shark-teeth-5atku](https://dgreports.deci.ai/detection/RF100_shark-teeth-5atku/Report.pdf)
- [sign-language-sokdr](https://dgreports.deci.ai/detection/RF100_sign-language-sokdr/Report.pdf)
- [signatures-xc8up](https://dgreports.deci.ai/detection/RF100_signatures-xc8up/Report.pdf)
- [smoke-uvylj](https://dgreports.deci.ai/detection/RF100_smoke-uvylj/Report.pdf)
- [soccer-players-5fuqs](https://dgreports.deci.ai/detection/RF100_soccer-players-5fuqs/Report.pdf)
- [soda-bottles](https://dgreports.deci.ai/detection/RF100_soda-bottles/Report.pdf)
- [solar-panels-taxvb](https://dgreports.deci.ai/detection/RF100_solar-panels-taxvb/Report.pdf)
- [stomata-cells](https://dgreports.deci.ai/detection/RF100_stomata-cells/Report.pdf)
- [street-work](https://dgreports.deci.ai/detection/RF100_street-work/Report.pdf)
- [tabular-data-wf9uh](https://dgreports.deci.ai/detection/RF100_tabular-data-wf9uh/Report.pdf)
- [team-fight-tactics](https://dgreports.deci.ai/detection/RF100_team-fight-tactics/Report.pdf)
- [thermal-cheetah-my4dp](https://dgreports.deci.ai/detection/RF100_thermal-cheetah-my4dp/Report.pdf)
- [thermal-dogs-and-people-x6ejw](https://dgreports.deci.ai/detection/RF100_thermal-dogs-and-people-x6ejw/Report.pdf)
- [trail-camera](https://dgreports.deci.ai/detection/RF100_trail-camera/Report.pdf)
- [truck-movement](https://dgreports.deci.ai/detection/RF100_truck-movement/Report.pdf)
- [tweeter-posts](https://dgreports.deci.ai/detection/RF100_tweeter-posts/Report.pdf)
- [tweeter-profile](https://dgreports.deci.ai/detection/RF100_tweeter-profile/Report.pdf)
- [underwater-objects-5v7p8](https://dgreports.deci.ai/detection/RF100_underwater-objects-5v7p8/Report.pdf)
- [underwater-pipes-4ng4t](https://dgreports.deci.ai/detection/RF100_underwater-pipes-4ng4t/Report.pdf)
- [uno-deck](https://dgreports.deci.ai/detection/RF100_uno-deck/Report.pdf)
- [valentines-chocolate](https://dgreports.deci.ai/detection/RF100_valentines-chocolate/Report.pdf)
- [vehicles-q0x2v](https://dgreports.deci.ai/detection/RF100_vehicles-q0x2v/Report.pdf)
- [wall-damage](https://dgreports.deci.ai/detection/RF100_wall-damage/Report.pdf)
- [washroom-rf1fa](https://dgreports.deci.ai/detection/RF100_washroom-rf1fa/Report.pdf)
- [weed-crop-aerial](https://dgreports.deci.ai/detection/RF100_weed-crop-aerial/Report.pdf)
- [wine-labels](https://dgreports.deci.ai/detection/RF100_wine-labels/Report.pdf)
- [x-ray-rheumatology](https://dgreports.deci.ai/detection/RF100_x-ray-rheumatology/Report.pdf)
</details>
<details>
<summary><h3>Segmentation</h3></summary>
- [COCO](https://dgreports.deci.ai/segmentation/COCO/Report.pdf)
- [Cityscapes](https://dgreports.deci.ai/segmentation/Cityspace/Report.pdf)
- [VOC](https://dgreports.deci.ai/segmentation/VOC/Report.pdf)
</details>
## Community
<table style="border: 0">
<tr>
<td><img src="https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/discord.png" width="60pt"></td>
<td><a href="https://discord.gg/2v6cEGMREN"> Click here to join our Discord Community</a></td>
</tr>
</table>
## License
This project is released under the [Apache 2.0 license](https://github.com/Deci-AI/data-gradients/blob/master/LICENSE.md).
[secondary-chains](https://dgreports.deci.ai/detection/RF100_secondary-chains/Report.pdf)\n\n- [sedimentary-features-9eosf](https://dgreports.deci.ai/detection/RF100_sedimentary-features-9eosf/Report.pdf)\n\n- [shark-teeth-5atku](https://dgreports.deci.ai/detection/RF100_shark-teeth-5atku/Report.pdf)\n\n- [sign-language-sokdr](https://dgreports.deci.ai/detection/RF100_sign-language-sokdr/Report.pdf)\n\n- [signatures-xc8up](https://dgreports.deci.ai/detection/RF100_signatures-xc8up/Report.pdf)\n\n- [smoke-uvylj](https://dgreports.deci.ai/detection/RF100_smoke-uvylj/Report.pdf)\n\n- [soccer-players-5fuqs](https://dgreports.deci.ai/detection/RF100_soccer-players-5fuqs/Report.pdf)\n\n- [soda-bottles](https://dgreports.deci.ai/detection/RF100_soda-bottles/Report.pdf)\n\n- [solar-panels-taxvb](https://dgreports.deci.ai/detection/RF100_solar-panels-taxvb/Report.pdf)\n\n- [stomata-cells](https://dgreports.deci.ai/detection/RF100_stomata-cells/Report.pdf)\n\n- [street-work](https://dgreports.deci.ai/detection/RF100_street-work/Report.pdf)\n\n- [tabular-data-wf9uh](https://dgreports.deci.ai/detection/RF100_tabular-data-wf9uh/Report.pdf)\n\n- [team-fight-tactics](https://dgreports.deci.ai/detection/RF100_team-fight-tactics/Report.pdf)\n\n- [thermal-cheetah-my4dp](https://dgreports.deci.ai/detection/RF100_thermal-cheetah-my4dp/Report.pdf)\n\n- [thermal-dogs-and-people-x6ejw](https://dgreports.deci.ai/detection/RF100_thermal-dogs-and-people-x6ejw/Report.pdf)\n\n- [trail-camera](https://dgreports.deci.ai/detection/RF100_trail-camera/Report.pdf)\n\n- [truck-movement](https://dgreports.deci.ai/detection/RF100_truck-movement/Report.pdf)\n\n- [tweeter-posts](https://dgreports.deci.ai/detection/RF100_tweeter-posts/Report.pdf)\n\n- [tweeter-profile](https://dgreports.deci.ai/detection/RF100_tweeter-profile/Report.pdf)\n\n- [underwater-objects-5v7p8](https://dgreports.deci.ai/detection/RF100_underwater-objects-5v7p8/Report.pdf)\n\n- 
[underwater-pipes-4ng4t](https://dgreports.deci.ai/detection/RF100_underwater-pipes-4ng4t/Report.pdf)\n\n- [uno-deck](https://dgreports.deci.ai/detection/RF100_uno-deck/Report.pdf)\n\n- [valentines-chocolate](https://dgreports.deci.ai/detection/RF100_valentines-chocolate/Report.pdf)\n\n- [vehicles-q0x2v](https://dgreports.deci.ai/detection/RF100_vehicles-q0x2v/Report.pdf)\n\n- [wall-damage](https://dgreports.deci.ai/detection/RF100_wall-damage/Report.pdf)\n\n- [washroom-rf1fa](https://dgreports.deci.ai/detection/RF100_washroom-rf1fa/Report.pdf)\n\n- [weed-crop-aerial](https://dgreports.deci.ai/detection/RF100_weed-crop-aerial/Report.pdf)\n\n- [wine-labels](https://dgreports.deci.ai/detection/RF100_wine-labels/Report.pdf)\n\n- [x-ray-rheumatology](https://dgreports.deci.ai/detection/RF100_x-ray-rheumatology/Report.pdf)\n\n</details>\n\n\n<details>\n\n<summary><h3>Segmentation</h3></summary>\n\n- [COCO](https://dgreports.deci.ai/segmentation/COCO/Report.pdf)\n\n- [Cityscapes](https://dgreports.deci.ai/segmentation/Cityspace/Report.pdf)\n\n- [VOC](https://dgreports.deci.ai/segmentation/VOC/Report.pdf)\n\n</details>\n\n## Community\n<table style=\"border: 0\">\n <tr>\n <td><img src=\"https://github.com/Deci-AI/data-gradients/raw/master/documentation/assets/discord.png\" width=\"60pt\"></td>\n <td><a href=\"https://discord.gg/2v6cEGMREN\"> Click here to join our Discord Community</a></td>\n </tr>\n</table>\n\n## License\n\nThis project is released under the [Apache 2.0 license](https://github.com/Deci-AI/data-gradients/blob/master/LICENSE.md).\n",
"bugtrack_url": null,
"license": "",
"summary": "DataGradients",
"version": "0.3.2",
"project_urls": {
"Homepage": "https://github.com/Deci-AI/data-gradients"
},
"split_keywords": [
"deci",
"ai",
"data",
"deep learning",
"computer vision",
"pytorch"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9521d037976311f94c981abcadcfcbbc583546ab1e8346e5e3b7d24eaf4913ff",
"md5": "95a23a1ec1cd828a799de6fcd6ee959a",
"sha256": "4866ddb195a3800c73a50e33d1faf49b9bde054e670f705cdebea7a6d6adcd37"
},
"downloads": -1,
"filename": "data_gradients-0.3.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "95a23a1ec1cd828a799de6fcd6ee959a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 459508,
"upload_time": "2024-01-11T13:47:38",
"upload_time_iso_8601": "2024-01-11T13:47:38.217335Z",
"url": "https://files.pythonhosted.org/packages/95/21/d037976311f94c981abcadcfcbbc583546ab1e8346e5e3b7d24eaf4913ff/data_gradients-0.3.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-11 13:47:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Deci-AI",
"github_project": "data-gradients",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"circle": true,
"requirements": [],
"lcname": "data-gradients"
}