# `eda-report` - Automated Exploratory Data Analysis
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Tim-Abwao/eda-report/HEAD?filepath=eda-report-basics.ipynb)
[![PyPI version](https://badge.fury.io/py/eda-report.svg)](https://badge.fury.io/py/eda-report)
[![Python 3.9 - 3.11](https://github.com/Tim-Abwao/eda-report/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/Tim-Abwao/eda-report/actions/workflows/unit-tests.yml)
[![Documentation Status](https://readthedocs.org/projects/eda-report/badge/?version=latest)](https://eda-report.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/Tim-Abwao/eda-report/branch/main/graph/badge.svg?token=KNQD8XZCWG)](https://codecov.io/gh/Tim-Abwao/eda-report)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Python program to help automate the exploratory data analysis and reporting process.

Input data is analyzed using [pandas][pandas] and [SciPy][scipy]. Graphs are plotted using [matplotlib][matplotlib]. The results are then nicely packaged as a *Word (.docx)* document using [python-docx][python-docx].

![screencast of report document from iris dataset][report-screencast]
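
The summary in the interpreter session further down splits the iris data into *Numeric features (4)* and *Categorical features (1)*. eda-report does this with pandas. The standard-library sketch below only illustrates one way to make that split for a CSV file. `split_columns` is a hypothetical helper and is not part of the package's API. It assumes a column is numeric only when every non-empty value parses as a float.

```python
import csv
import io


def _parses_as_float(text: str) -> bool:
    """Return True if `text` can be converted with float()."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def split_columns(csv_text: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split CSV columns into (numeric, categorical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numeric, categorical = [], []
    for name in reader.fieldnames or []:
        # Ignore blanks (and short rows, which DictReader pads with None).
        values = [row[name] for row in rows if row[name] not in (None, "")]
        if values and all(_parses_as_float(v) for v in values):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical


sample = "sepal_length,species\n5.1,setosa\n4.9,setosa\n"
print(split_columns(sample))  # (['sepal_length'], ['species'])
```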
## Installation
You can install the package from [PyPI][eda-report-pypi] using:
```bash
pip install eda-report
```
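To confirm the installation from Python, a check along these lines can help. It is a small sketch that uses only `importlib.metadata` from the standard library. `installed_version` is a hypothetical name and is not something eda-report provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "eda-report") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"eda-report {found}" if found else "eda-report is not installed")
```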
## Basic Usage
### 1. Graphical User Interface
Running the `eda-report` command without any arguments launches a graphical window that helps you select a `csv`/`excel` file to analyze:
```bash
eda-report
```
![screencast of the gui][gui-screencast]

You'll be prompted to set a *report title*, an optional *group-by/target variable*, a *graph color* and an *output filename*. The contents of the input file are then analyzed, and the results are saved in a *Word (.docx)* document.
>**NOTE:** For help with `Tk`-related issues, consider visiting [TkDocs][tkdocs].
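
The CLI help further down describes the input as "A .csv or .xlsx file". The following sketch shows how a Tk file picker could be limited to those types. It is not eda-report's own GUI code. The `(label, pattern)` filter list and the helper names are assumptions made for illustration, and the case-insensitive extension check is a design choice of this sketch.

```python
from pathlib import Path

# Hypothetical filter list in the (label, pattern) form accepted by the
# `filetypes` option of tkinter.filedialog.askopenfilename.
INPUT_FILETYPES = [("CSV files", "*.csv"), ("Excel files", "*.xlsx")]
SUPPORTED_SUFFIXES = {".csv", ".xlsx"}


def is_supported_input(path: str) -> bool:
    """Return True if `path` ends in one of the extensions the CLI help lists."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def ask_for_input_file() -> str:
    """Open a Tk file picker restricted to CSV/Excel files (needs a display)."""
    from tkinter import Tk, filedialog  # imported lazily: tkinter may be absent

    root = Tk()
    root.withdraw()  # hide the empty main window and show only the dialog
    try:
        # Returns an empty string if the user cancels.
        return filedialog.askopenfilename(filetypes=INPUT_FILETYPES)
    finally:
        root.destroy()
```

`ask_for_input_file()` needs a display. Only `is_supported_input` is safe to call in a headless environment.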
### 2. Command Line Interface
```bash
$ eda-report -i iris.csv -o iris-report.docx
Analyze variables: 100%|███████████████████████████████████| 5/5
Plot variables: 100%|███████████████████████████████████| 5/5
Bivariate analysis: 100%|███████████████████████████████████| 6/6 pairs.
[INFO 02:12:22.146] Done. Results saved as 'iris-report.docx'
```
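The `6/6 pairs` in the bivariate step matches the number of ways to choose two of the four numeric iris columns (4 choose 2 = 6). The interpreter session below also lists exactly six correlated pairs. The sketch that follows enumerates such pairs, computes Pearson's r, ranks the pairs by absolute value, and describes each one in words. It is illustrative only. The cut-offs 0.8, 0.6, 0.4 and 0.2 are assumptions chosen to agree with the six labels printed below, not values taken from eda-report, and all helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def describe_strength(r: float) -> str:
    """Phrase r in words; the cut-offs here are assumptions."""
    size = abs(r)
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    direction = "positive" if r >= 0 else "negative"
    return f"{strength} {direction} correlation ({r:.2f})"


def rank_correlations(columns: dict[str, list[float]], top: int = 20) -> list[str]:
    """Correlate every pair of numeric columns and keep the `top` strongest."""
    pairs = list(combinations(columns, 2))  # 4 columns -> 6 pairs
    scored = [(a, b, pearson(columns[a], columns[b])) for a, b in pairs]
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return [f"{a} & {b} -> {describe_strength(r)}" for a, b, r in scored[:top]]


columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "d": [1, 3, 2, 4]}
for line in rank_correlations(columns):
    print(line)
```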
```bash
$ eda-report -h
usage: eda-report [-h] [-i INFILE] [-o OUTFILE] [-t TITLE] [-c COLOR]
                  [-g GROUPBY]

Automatically analyze data and generate reports. A graphical user interface
will be launched if none of the optional arguments is specified.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        A .csv or .xlsx file to analyze.
  -o OUTFILE, --outfile OUTFILE
                        The output name for analysis results (default: eda-
                        report.docx)
  -t TITLE, --title TITLE
                        The top level heading for the report (default:
                        Exploratory Data Analysis Report)
  -c COLOR, --color COLOR
                        The color to apply to graphs (default: cyan)
  -g GROUPBY, -T GROUPBY, --groupby GROUPBY, --target GROUPBY
                        The variable to use for grouping plotted values. An
                        integer value is treated as a column index, whereas a
                        string is treated as a column label.
```
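The options above map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction for illustration, not eda-report's source. It mirrors the flags and defaults shown in the help text and implements the stated `-g/--groupby` rule: an integer picks a column by position, and anything else is taken as a column label. Unlike the real command, it does not launch the GUI when no arguments are given.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="eda-report",
        description="Automatically analyze data and generate reports.",
    )
    parser.add_argument("-i", "--infile", help="A .csv or .xlsx file to analyze.")
    parser.add_argument("-o", "--outfile", default="eda-report.docx",
                        help="The output name for analysis results")
    parser.add_argument("-t", "--title", default="Exploratory Data Analysis Report",
                        help="The top level heading for the report")
    parser.add_argument("-c", "--color", default="cyan",
                        help="The color to apply to graphs")
    parser.add_argument("-g", "-T", "--groupby", "--target", dest="groupby",
                        help="The variable to use for grouping plotted values.")
    return parser


def resolve_groupby(value: Optional[str], columns: list[str]) -> Optional[str]:
    """An integer selects a column by position; anything else is a label."""
    if value is None:
        return None
    try:
        return columns[int(value)]
    except ValueError:
        return value  # not an integer, so treat it as a column label


args = build_parser().parse_args(["-i", "iris.csv", "-g", "4"])
iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(resolve_groupby(args.groupby, iris_columns))  # species
```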
### 3. Interpreter Session
```python
>>> import pandas as pd
>>> import eda_report
>>> iris_data = pd.read_csv("iris.csv")  # the same file as in the CLI example
>>> eda_report.summarize(iris_data)
Summary Statistics for Numeric features (4)
-------------------------------------------
              count     avg  stddev  min  25%   50%  75%  max  skewness  kurtosis
sepal_length    150  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
sepal_width     150  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
petal_length    150  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
petal_width     150  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406

Summary Statistics for Categorical features (1)
-----------------------------------------------
         count  unique     top  freq  relative freq
species    150       3  setosa    50         33.33%

Pearson's Correlation (Top 20)
------------------------------
petal_length & petal_width  -> very strong positive correlation (0.96)
sepal_length & petal_length -> very strong positive correlation (0.87)
sepal_length & petal_width  -> very strong positive correlation (0.82)
sepal_width & petal_length  -> moderate negative correlation (-0.43)
sepal_width & petal_width   -> weak negative correlation (-0.37)
sepal_length & sepal_width  -> very weak negative correlation (-0.12)
```
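eda-report computes these tables with pandas and SciPy. The standard-library sketch below makes each column concrete. It assumes a sample (n - 1) standard deviation, linearly interpolated quartiles, and the bias-corrected skewness and excess kurtosis that pandas' `Series.skew()` and `Series.kurt()` report. `numeric_summary` and `categorical_summary` are hypothetical helpers. Skewness needs at least 3 values, kurtosis needs at least 4, and the column must not be constant.

```python
import statistics
from collections import Counter


def numeric_summary(values: list[float]) -> dict[str, float]:
    """Hypothetical helper: the columns of the numeric summary table."""
    n = len(values)
    mean = statistics.fmean(values)
    # "inclusive" interpolates linearly between data points (pandas' default).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Central moments with denominator n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2**1.5  # biased sample skewness
    g2 = m4 / m2**2 - 3  # biased excess kurtosis
    return {
        "count": n,
        "avg": mean,
        "stddev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
        # Bias-corrected versions of g1 and g2.
        "skewness": g1 * (n * (n - 1)) ** 0.5 / (n - 2),
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    }


def categorical_summary(values: list[str]) -> dict[str, object]:
    """Hypothetical helper: the columns of the categorical summary table."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]  # ties keep first-seen order
    return {
        "count": len(values),
        "unique": len(counts),
        "top": top,
        "freq": freq,
        "relative freq": f"{freq / len(values):.2%}",
    }


species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
print(categorical_summary(species))
# {'count': 150, 'unique': 3, 'top': 'setosa', 'freq': 50, 'relative freq': '33.33%'}
```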
Check out the [documentation][docs] for more features and details.

[docs]: https://eda-report.readthedocs.io/
[eda-report-pypi]: https://pypi.org/project/eda-report/
[matplotlib]: https://matplotlib.org/
[pandas]: https://pandas.pydata.org/
[python-docx]: https://python-docx.readthedocs.io/
[scipy]: https://scipy.org/
[gui-screencast]: https://raw.githubusercontent.com/Tim-Abwao/eda-report/dev/docs/source/_static/screencast.gif
[report-screencast]: https://raw.githubusercontent.com/Tim-Abwao/eda-report/dev/docs/source/_static/report.gif
[tkdocs]: https://tkdocs.com/index.html
Raw data
{
"_id": null,
"home_page": "https://eda-report.readthedocs.io/",
"name": "eda-report",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "eda exploratory data analysis report",
"author": "Abwao",
"author_email": "abwaomusungu@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/aa/d6/436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077/eda_report-2.8.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Automate exploratory data analysis and reporting.",
"version": "2.8.1",
"project_urls": {
"Homepage": "https://eda-report.readthedocs.io/",
"Source Code": "https://github.com/Tim-Abwao/eda-report"
},
"split_keywords": [
"eda",
"exploratory",
"data",
"analysis",
"report"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "896a84663a1e660c2e422d90f8a0b488fb6e8c511da7be46fcedc9c48b0bded5",
"md5": "2749112b16a2f1bd90d156cfb3f5eb8a",
"sha256": "4705271cd8a3a5ee1ab99c93667b4fd5c25696bf7834adc791d8e7c42f7a2c01"
},
"downloads": -1,
"filename": "eda_report-2.8.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2749112b16a2f1bd90d156cfb3f5eb8a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 43602,
"upload_time": "2023-08-19T21:10:06",
"upload_time_iso_8601": "2023-08-19T21:10:06.074452Z",
"url": "https://files.pythonhosted.org/packages/89/6a/84663a1e660c2e422d90f8a0b488fb6e8c511da7be46fcedc9c48b0bded5/eda_report-2.8.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "aad6436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077",
"md5": "7e256aa9eed3e20b864c2f436f893d48",
"sha256": "42a3036241973def205085a854eacab147bd1bfd3ee1c084218334967f2430e3"
},
"downloads": -1,
"filename": "eda_report-2.8.1.tar.gz",
"has_sig": false,
"md5_digest": "7e256aa9eed3e20b864c2f436f893d48",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 42648,
"upload_time": "2023-08-19T21:10:09",
"upload_time_iso_8601": "2023-08-19T21:10:09.394992Z",
"url": "https://files.pythonhosted.org/packages/aa/d6/436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077/eda_report-2.8.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-19 21:10:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Tim-Abwao",
"github_project": "eda-report",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [],
"lcname": "eda-report"
}