eda-report


name: eda-report
version: 2.8.1
home_page: https://eda-report.readthedocs.io/
summary: Automate exploratory data analysis and reporting.
upload_time: 2023-08-19 21:10:09
docs_url: None
author: Abwao
requires_python: >=3.9
license: MIT
keywords: eda exploratory data analysis report
requirements: No requirements were recorded.
Travis-CI: No Travis.

# `eda-report` - Automated Exploratory Data Analysis

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Tim-Abwao/eda-report/HEAD?filepath=eda-report-basics.ipynb)
[![PyPI version](https://badge.fury.io/py/eda-report.svg)](https://badge.fury.io/py/eda-report)
[![Python 3.9 - 3.11](https://github.com/Tim-Abwao/eda-report/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/Tim-Abwao/eda-report/actions/workflows/unit-tests.yml)
[![Documentation Status](https://readthedocs.org/projects/eda-report/badge/?version=latest)](https://eda-report.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/Tim-Abwao/eda-report/branch/main/graph/badge.svg?token=KNQD8XZCWG)](https://codecov.io/gh/Tim-Abwao/eda-report)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Python program to help automate the exploratory data analysis and reporting process.

Input data is analyzed using [pandas][pandas] and [SciPy][scipy]. Graphs are plotted using [matplotlib][matplotlib]. The results are then nicely packaged as a *Word (.docx)* document using [python-docx][python-docx].

![screencast of report document from iris dataset][report-screencast]

## Installation

You can install the package from [PyPI][eda-report-pypi] using:

```bash
pip install eda-report
```

## Basic Usage

### 1. Graphical User Interface

Run without arguments, the `eda-report` command launches a graphical window to help select a `.csv` or `.xlsx` file to analyze:

```bash
eda-report
```

![screencast of the gui][gui-screencast]

You'll be prompted to set a *report title*, *group-by/target variable (optional)*, *graph color* and *output filename*. The contents of the input file are then analyzed, and the results are saved in a *Word (.docx)* document.

>**NOTE:** For help with `Tk`-related issues, consider visiting [TkDocs][tkdocs].

### 2. Command Line Interface

```bash
$ eda-report -i iris.csv -o iris-report.docx
Analyze variables:  100%|███████████████████████████████████| 5/5
Plot variables:     100%|███████████████████████████████████| 5/5
Bivariate analysis: 100%|███████████████████████████████████| 6/6 pairs.
[INFO 02:12:22.146] Done. Results saved as 'iris-report.docx'
```

```bash
$ eda-report -h
usage: eda-report [-h] [-i INFILE] [-o OUTFILE] [-t TITLE] [-c COLOR]
                  [-g GROUPBY]

Automatically analyze data and generate reports. A graphical user interface
will be launched if none of the optional arguments is specified.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        A .csv or .xlsx file to analyze.
  -o OUTFILE, --outfile OUTFILE
                        The output name for analysis results (default: eda-
                        report.docx)
  -t TITLE, --title TITLE
                        The top level heading for the report (default:
                        Exploratory Data Analysis Report)
  -c COLOR, --color COLOR
                        The color to apply to graphs (default: cyan)
  -g GROUPBY, -T GROUPBY, --groupby GROUPBY, --target GROUPBY
                        The variable to use for grouping plotted values. An
                        integer value is treated as a column index, whereas a
                        string is treated as a column label.
```
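
The flags listed in the help output above can also be assembled from a script. The following is a minimal sketch, not part of `eda-report` itself: the helper name `build_eda_report_command` is hypothetical, and it uses only the options shown by `eda-report -h`. The command is executed only if the `eda-report` executable and the input file are actually present.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_eda_report_command(
    infile: str,
    outfile: str = "eda-report.docx",
    title: Optional[str] = None,
    color: Optional[str] = None,
    groupby: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argument list from the documented flags."""
    command = ["eda-report", "-i", infile, "-o", outfile]
    if title is not None:
        command += ["-t", title]
    if color is not None:
        command += ["-c", color]
    if groupby is not None:
        # Per the help text, an integer is a column index and a string a label.
        command += ["-g", str(groupby)]
    return command


command = build_eda_report_command(
    "iris.csv",
    outfile="iris-report.docx",
    title="Iris Report",
    groupby="species",
)
print(command)

# Run the tool only when it is installed and the input file exists.
if shutil.which("eda-report") and os.path.exists("iris.csv"):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the title contains spaces.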

### 3. Interpreter Session

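```python
>>> import pandas as pd
>>> import eda_report
>>> # Assumption: iris.csv (the file from the CLI example) is in the working directory.
>>> iris_data = pd.read_csv("iris.csv")
>>> eda_report.summarize(iris_data)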

                  Summary Statistics for Numeric features (4)
                  -------------------------------------------
                count     avg  stddev  min  25%   50%  75%  max  skewness  kurtosis
  sepal_length    150  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
  sepal_width     150  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
  petal_length    150  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
  petal_width     150  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406

                Summary Statistics for Categorical features (1)
                -----------------------------------------------
                    count unique     top freq relative freq
            species   150      3  setosa   50        33.33%


                        Pearson's Correlation (Top 20)
                        ------------------------------
      petal_length & petal_width -> very strong positive correlation (0.96)
     sepal_length & petal_length -> very strong positive correlation (0.87)
      sepal_length & petal_width -> very strong positive correlation (0.82)
      sepal_width & petal_length -> moderate negative correlation (-0.43)
       sepal_width & petal_width -> weak negative correlation (-0.37)
      sepal_length & sepal_width -> very weak negative correlation (-0.12)
```
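
To make the correlation lines in the output above easier to read, here is a rough, standard-library sketch of how a Pearson coefficient can be computed and turned into a similar description. It is not the package's implementation: the function names are hypothetical, and the cut-offs (0.8, 0.6, 0.4, 0.2) are assumptions chosen only because they are consistent with the labels shown above.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Compute Pearson's correlation coefficient for two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariation / math.sqrt(spread_x * spread_y)


def describe_correlation(r: float) -> str:
    """Map a coefficient to a label, using assumed (hypothetical) cut-offs."""
    if r == 0:
        return "no correlation (0.00)"
    strength = "very weak"
    for cutoff, label in [(0.8, "very strong"), (0.6, "strong"),
                          (0.4, "moderate"), (0.2, "weak")]:
        if abs(r) >= cutoff:
            strength = label
            break
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation ({r:.2f})"


# Coefficients taken from the report output above.
for coefficient in (0.96, -0.43, -0.37, -0.12):
    print(describe_correlation(coefficient))
```

With these assumed cut-offs, the four coefficients reproduce the labels printed in the session above; the thresholds actually used by `eda-report` may differ.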

Check out the [documentation][docs] for more features and details.

[docs]: https://eda-report.readthedocs.io/
[eda-report-pypi]: https://pypi.org/project/eda-report/
[matplotlib]: https://matplotlib.org/
[pandas]: https://pandas.pydata.org/
[python-docx]: https://python-docx.readthedocs.io/
[scipy]: https://scipy.org/
[gui-screencast]: https://raw.githubusercontent.com/Tim-Abwao/eda-report/dev/docs/source/_static/screencast.gif
[report-screencast]: https://raw.githubusercontent.com/Tim-Abwao/eda-report/dev/docs/source/_static/report.gif
[tkdocs]: https://tkdocs.com/index.html

            

Raw data

{
    "_id": null,
    "home_page": "https://eda-report.readthedocs.io/",
    "name": "eda-report",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "eda exploratory data analysis report",
    "author": "Abwao",
    "author_email": "abwaomusungu@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/aa/d6/436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077/eda_report-2.8.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Automate exploratory data analysis and reporting.",
    "version": "2.8.1",
    "project_urls": {
        "Homepage": "https://eda-report.readthedocs.io/",
        "Source Code": "https://github.com/Tim-Abwao/eda-report"
    },
    "split_keywords": [
        "eda",
        "exploratory",
        "data",
        "analysis",
        "report"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "896a84663a1e660c2e422d90f8a0b488fb6e8c511da7be46fcedc9c48b0bded5",
                "md5": "2749112b16a2f1bd90d156cfb3f5eb8a",
                "sha256": "4705271cd8a3a5ee1ab99c93667b4fd5c25696bf7834adc791d8e7c42f7a2c01"
            },
            "downloads": -1,
            "filename": "eda_report-2.8.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2749112b16a2f1bd90d156cfb3f5eb8a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 43602,
            "upload_time": "2023-08-19T21:10:06",
            "upload_time_iso_8601": "2023-08-19T21:10:06.074452Z",
            "url": "https://files.pythonhosted.org/packages/89/6a/84663a1e660c2e422d90f8a0b488fb6e8c511da7be46fcedc9c48b0bded5/eda_report-2.8.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aad6436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077",
                "md5": "7e256aa9eed3e20b864c2f436f893d48",
                "sha256": "42a3036241973def205085a854eacab147bd1bfd3ee1c084218334967f2430e3"
            },
            "downloads": -1,
            "filename": "eda_report-2.8.1.tar.gz",
            "has_sig": false,
            "md5_digest": "7e256aa9eed3e20b864c2f436f893d48",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 42648,
            "upload_time": "2023-08-19T21:10:09",
            "upload_time_iso_8601": "2023-08-19T21:10:09.394992Z",
            "url": "https://files.pythonhosted.org/packages/aa/d6/436014225bd86a81b28a0c71baf6b11b22cdfca09f8c7eb49970074f5077/eda_report-2.8.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-19 21:10:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Tim-Abwao",
    "github_project": "eda-report",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "eda-report"
}
        