EDAExcelReport

Name	EDAExcelReport
Version	0.2.1
home_page	https://github.com/rohit180497/EDAExcelReport
Summary	A Python package for generating detailed EDA reports in Excel format with structured insights and visualizations.
upload_time	2025-07-15 23:07:40
maintainer	None
docs_url	None
author	Rohit Kosamkar, Sapna Chavan
requires_python	>=3.6
license	None
keywords	eda excel exploratory data analysis report pandas numpy openpyxl machine learning data science data analysis edaexcelreport profiling visualization excel report python eda report
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# EDAExcelReport

![PyPI](https://img.shields.io/pypi/v/EDAExcelReport?color=blue&label=PyPI) ![Python](https://img.shields.io/badge/Python-3.6%2B-blue.svg) ![License](https://img.shields.io/badge/License-MIT-green.svg) ![Downloads](https://img.shields.io/pypi/dm/EDAExcelReport?color=orange&label=Downloads) ![Issues](https://img.shields.io/github/issues/rohit180497/EDAExcelReport) ![EDA](https://img.shields.io/badge/EDA-Exploratory%20Data%20Analysis-yellow.svg) ![Machine Learning](https://img.shields.io/badge/Machine%20Learning-ML-red.svg) ![Statistics](https://img.shields.io/badge/Statistics-Data%20Science-purple.svg)


EDAExcelReport is a Python package that generates detailed exploratory data analysis (EDA) reports for datasets with a binary target variable. The reports are written to Excel and present statistics as formatted tables that show how each feature is distributed and how it relates to the target variable.

## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Important Note](#important-note)
- [Input Parameters](#input-parameters)
- [Example Usage](#example-usage)
- [Screenshots](#screenshots)
- [License](#license)


## Features

- Calculates frequency and distribution of feature values.
- Computes target rate, percentage of total target, and lift for each feature value.
- Automatically handles numeric and categorical data.
- Generates Excel reports with well-formatted tables and conditional formatting.
- Removes gridlines and adds borders for better readability.
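
The metrics listed above can be illustrated with a small pandas sketch. It is not the package's internal code: the function name `target_summary` and the output column names are hypothetical, and the formulas are the commonly used definitions (target rate = positives / rows in the group, lift = group target rate / overall target rate), which the package may compute differently.

```python
import pandas as pd


def target_summary(df: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Per-value frequency, target rate, share of total target, and lift.

    Hypothetical helper for illustration; assumes `target` holds 0/1 values.
    """
    grouped = df.groupby(feature)[target].agg(count="count", targets="sum")
    overall_rate = df[target].mean()  # share of positive rows in the whole dataset
    grouped["pct_of_rows"] = grouped["count"] / len(df)
    grouped["target_rate"] = grouped["targets"] / grouped["count"]
    grouped["pct_of_total_target"] = grouped["targets"] / df[target].sum()
    grouped["lift"] = grouped["target_rate"] / overall_rate
    return grouped


# Toy data, not taken from the credit dataset
toy = pd.DataFrame({
    "FLAG_OWN_CAR": ["Y", "Y", "N", "N", "N", "N"],
    "target": [1, 0, 1, 1, 1, 0],
})
summary = target_summary(toy, "FLAG_OWN_CAR", "target")
print(summary)
```

A lift above 1 means the feature value has a higher target rate than the dataset as a whole; a lift below 1 means a lower one.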

## Installation

You can install the package via pip:

```sh
pip install EDAExcelReport
```

## Usage

Import the report class from the `EDAR` module:

```python
from EDAR.excel_report import EDAExcelReport
```


## Example Usage

The walkthrough below uses a credit dataset with a binary `target` column.

```python
# Import the libraries used in this walkthrough
import pandas as pd
from EDAR.excel_report import EDAExcelReport
```

```python
# Loading the credit dataset
df = pd.read_csv(r"tests\credit_data.csv")
```

```python
df.columns
```
    Index(['ID', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
           'AMT_INCOME_TOTAL', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
           'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'DAYS_BIRTH',
           'DAYS_EMPLOYED', 'FLAG_MOBIL', 'FLAG_WORK_PHONE', 'FLAG_PHONE',
           'FLAG_EMAIL', 'OCCUPATION_TYPE', 'CNT_FAM_MEMBERS', 'target'],
          dtype='object')


```python
df.isna().sum()
```
    ID                         0
    CODE_GENDER                0
    FLAG_OWN_CAR               0
    FLAG_OWN_REALTY            0
    CNT_CHILDREN               0
    AMT_INCOME_TOTAL           0
    NAME_INCOME_TYPE           0
    NAME_EDUCATION_TYPE        0
    NAME_FAMILY_STATUS         0
    NAME_HOUSING_TYPE          0
    DAYS_BIRTH                 0
    DAYS_EMPLOYED              0
    FLAG_MOBIL                 0
    FLAG_WORK_PHONE            0
    FLAG_PHONE                 0
    FLAG_EMAIL                 0
    OCCUPATION_TYPE        11323
    CNT_FAM_MEMBERS            0
    target                     0
    dtype: int64


```python
ignore_feats = ["ID", "OCCUPATION_TYPE", "DAYS_BIRTH", "DAYS_EMPLOYED", "FLAG_MOBIL"]
```

```python
EDAExcelReport(df, 'target', r'tests\test_eda_report.xlsx', ignore_cols=ignore_feats)
```

    Your EDA report is ready at tests\test_eda_report_20240610_153828.xlsx
    
    <ed_report.excel_report.EDAExcelReport at 0x188c09ee9f0>
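
The output above shows that the saved file name carries a timestamp suffix (`_20240610_153828`) that was not part of the `report_path` argument. The sketch below shows one way such a name could be built with the standard library. It is an illustration only: `timestamped_path` is a hypothetical helper, and the `%Y%m%d_%H%M%S` format is inferred from the printed file name, not taken from the package source.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(report_path: str, now: datetime) -> Path:
    """Insert a YYYYMMDD_HHMMSS stamp between the file stem and its extension."""
    path = Path(report_path)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")


# Reproduce the file name from the example output with a fixed timestamp
result = timestamped_path(
    "tests/test_eda_report.xlsx", datetime(2024, 6, 10, 15, 38, 28)
)
print(result.name)
```

Because the name changes on every run, look for the path printed by the package rather than the exact `report_path` you passed in.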


## Important Note 

Make sure your dataset contains no null values before passing it to EDAExcelReport. Numeric columns are bucketed during the analysis, and null values can interfere with bucket creation. Nulls can also produce inaccurate or misleading figures in a report that is shown to stakeholders. In the example above, `OCCUPATION_TYPE` has 11,323 nulls and is excluded through `ignore_cols`.

### Example

```python
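# Impute null values by forward fill (rows with no earlier value stay null).
# fillna(method='ffill') is deprecated in recent pandas; use ffill() instead.
df = df.ffill()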
```
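
Forward fill depends on row order and leaves leading nulls untouched, so a column-wise imputation is sometimes preferable. The following is a minimal sketch under stated assumptions: `fill_missing` is a hypothetical helper, and filling numeric columns with the median and other columns with a `"Missing"` label is one possible policy, not a recommendation from the package authors.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without nulls: median for numeric columns, 'Missing' otherwise."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to impute in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("Missing")
    return out


# Toy data, not taken from the credit dataset
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100.0, None, 300.0],
    "OCCUPATION_TYPE": ["Drivers", None, "Managers"],
})
clean = fill_missing(toy)

# Confirm that no nulls remain before building the report
assert clean.isna().sum().sum() == 0
```

Keeping missing categories as their own label lets the report show their target rate separately instead of hiding them inside another value.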

## Input Parameters

### EDAExcelReport

```python
class EDAExcelReport:
    def __init__(self, data, target, report_path, ignore_cols=None, cat_label_enco_thresh=0.05, num_min_samples_leaf=0.1, conditional_color='red'):
        ...
```

- `data`: The input DataFrame containing the dataset.
- `target`: The name of the target column in the DataFrame.
- `report_path`: The file path where the Excel report will be saved.
- `ignore_cols`: (Optional) List of column names to ignore in the analysis.
- `cat_label_enco_thresh`: (Optional) Threshold for label encoding of categorical variables (default is 0.05).
- `num_min_samples_leaf`: (Optional) Minimum samples per leaf for numeric data bucketing (default is 0.1).
- `conditional_color`: (Optional) The color used for conditional formatting in the report (default is 'red').

### Example Report

The Excel report generated for the credit data above is available for download:

[Download Excel File](https://github.com/rohit180497/EDAExcelReport/blob/main/tests/test_eda_report_20240610_153828.xlsx)

## Screenshots

### Screenshot 1
![Screenshot 1](https://github.com/rohit180497/EDAExcelReport/blob/main/images/Snapshot_of_EDA_excel_report1.png?raw=true)

### Screenshot 2
![Screenshot 2](https://github.com/rohit180497/EDAExcelReport/blob/main/images/Snapshot_of_EDA_excel_report2.png?raw=true)

### Screenshot 3
![Screenshot 3](https://github.com/rohit180497/EDAExcelReport/blob/main/images/Snapshot_of_EDA_excel_report3.png?raw=true)

### Screenshot 4
![Screenshot 4](https://github.com/rohit180497/EDAExcelReport/blob/main/images/Snapshot_of_EDA_excel_roc_report.png?raw=true)


## License

This project is licensed under the MIT License.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/rohit180497/EDAExcelReport",
    "name": "EDAExcelReport",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "EDA, Excel, exploratory data analysis, report, pandas, numpy, openpyxl, machine learning, data science, data analysis, EDAExcelReport, profiling, Visualization, Excel report, python EDA report",
    "author": "Rohit Kosamkar, Sapna Chavan",
    "author_email": "rohitkosamkar97@gmail.com, chavansapna12@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ca/d7/250aea8b08f3b497c6e6f3cbbf5c40ebbea69916aa29230415b8e36bf905/edaexcelreport-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python package for generating detailed EDA reports in Excel format with structured insights and visualizations.",
    "version": "0.2.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/rohit180497/EDAExcelReport/issues",
        "Documentation": "https://github.com/rohit180497/EDAExcelReport#readme",
        "Homepage": "https://github.com/rohit180497/EDAExcelReport",
        "Source Code": "https://github.com/rohit180497/EDAExcelReport"
    },
    "split_keywords": [
        "eda",
        " excel",
        " exploratory data analysis",
        " report",
        " pandas",
        " numpy",
        " openpyxl",
        " machine learning",
        " data science",
        " data analysis",
        " edaexcelreport",
        " profiling",
        " visualization",
        " excel report",
        " python eda report"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1294c315818255a5e1a1154ce5fd2089d4d47308caa79fa9697fe5848a0373dd",
                "md5": "b423cf816084f29a23b6b4f8cfb8571b",
                "sha256": "22a938a4c41c0fbb555e16dd216ec9363c1478aed6cc1d494b9c6f8e8d7230c9"
            },
            "downloads": -1,
            "filename": "edaexcelreport-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b423cf816084f29a23b6b4f8cfb8571b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 10325,
            "upload_time": "2025-07-15T23:07:39",
            "upload_time_iso_8601": "2025-07-15T23:07:39.989339Z",
            "url": "https://files.pythonhosted.org/packages/12/94/c315818255a5e1a1154ce5fd2089d4d47308caa79fa9697fe5848a0373dd/edaexcelreport-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cad7250aea8b08f3b497c6e6f3cbbf5c40ebbea69916aa29230415b8e36bf905",
                "md5": "75aeb17a555ade4cd3ab209d995f61fc",
                "sha256": "e332f4897ea2fcd77f5557061b0ee8d58b1cbae66cc947b0ff0b7ffdaf6531fa"
            },
            "downloads": -1,
            "filename": "edaexcelreport-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "75aeb17a555ade4cd3ab209d995f61fc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 11874,
            "upload_time": "2025-07-15T23:07:40",
            "upload_time_iso_8601": "2025-07-15T23:07:40.999425Z",
            "url": "https://files.pythonhosted.org/packages/ca/d7/250aea8b08f3b497c6e6f3cbbf5c40ebbea69916aa29230415b8e36bf905/edaexcelreport-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-15 23:07:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rohit180497",
    "github_project": "EDAExcelReport",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "edaexcelreport"
}
