leo-data-analyser

Name: leo-data-analyser
Version: 1.0.0
Home page: https://github.com/anthropoleo
Summary: Generate profile report for pandas DataFrame
Upload time: 2024-08-30 07:14:30
Author: Leandro Falero
Requires Python: <3.13,>=3.7
License: MIT
Keywords: pandas, data-science, data-analysis, python, jupyter, ipython
# Data_analyser

data_analyser is a Python package for generating comprehensive profiling reports from pandas DataFrames, helping you quickly understand your data's structure and quality.


## ▶️ Quickstart

### Install
```cmd
pip install data_analyser
```
or
```cmd
conda install -c conda-forge data_analyser
```
### Start profiling

Start by loading your pandas `DataFrame` as you normally would, e.g. by using:

```python
import numpy as np
import pandas as pd
from data_analyser import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
```

To generate the standard profiling report, simply run:

```python
profile = ProfileReport(df, title="Profiling Report")
profile.to_file("output.html")
```

## 📊 Key features

- **Type inference**: automatic detection of columns' data types (*Categorical*, *Numerical*, *Date*, etc.)
- **Warnings**: A summary of the problems/challenges in the data that you might need to work on (*missing data*, *inaccuracies*, *skewness*, etc.)
- **Univariate analysis**: including descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
- **Multivariate analysis**: including correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise variable interactions
- **Time-Series**: including statistical information specific to time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots
- **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
- **File and Image analysis**: file sizes, creation dates, image dimensions, indication of truncated images and existence of EXIF metadata
- **Compare datasets**: one-line solution for a fast and complete report comparing two datasets
- **Flexible output formats**: all analyses can be exported as an HTML report that can easily be shared with different parties, as JSON for easy integration in automated systems, and as a widget in a Jupyter Notebook
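To give a feel for what type inference involves, here is a minimal, dependency-free sketch (this is *not* data_analyser's actual implementation): a profiler can classify a column by attempting progressively stricter parses, falling back to a distinct-value heuristic for categorical data.

```python
from datetime import datetime

def infer_type(values, cat_threshold=0.5):
    """Crude column type inference: Numeric, Date, Categorical, or Text."""
    def parses(fn):
        try:
            for v in values:
                fn(v)
            return True
        except (ValueError, TypeError):
            return False

    if parses(float):
        return "Numeric"
    if parses(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "Date"
    # Few distinct values relative to column length suggests categorical data.
    if len(set(values)) / max(len(values), 1) <= cat_threshold:
        return "Categorical"
    return "Text"

print(infer_type(["1.5", "2", "3.25"]))           # Numeric
print(infer_type(["2024-01-01", "2024-02-03"]))   # Date
print(infer_type(["red", "blue", "red", "red"]))  # Categorical
```

Real profilers handle many more cases (booleans, URLs, mixed types, missing values), but the cascade-of-parsers idea is the same.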

The report contains three additional sections:

- **Overview**: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
- **Alerts**: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
- **Reproduction**: technical details about the analysis (time, version and configuration)
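The Overview numbers are simple to compute in principle. As a dependency-free sketch (again, not the package's own code), with rows represented as tuples:

```python
def overview(rows, columns):
    """Dataset-level summary stats like those in the report's Overview section."""
    n = len(rows)
    cells = n * len(columns)
    missing = sum(v is None for row in rows for v in row)
    # A duplicate is any row identical to an earlier one.
    duplicates = n - len(set(rows))
    return {
        "n_records": n,
        "n_variables": len(columns),
        "missing_pct": round(100 * missing / cells, 1) if cells else 0.0,
        "n_duplicates": duplicates,
    }

rows = [(1, "a"), (2, None), (1, "a")]
print(overview(rows, ["x", "y"]))
```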


### Exporting the report to a file

To generate an HTML report file, assign the `ProfileReport` to a variable and call its `to_file()` method:

```python
profile.to_file("your_report.html")
```

Alternatively, the report's data can be obtained as a JSON string or file:

```python
# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")
```
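Because the JSON output is plain text, it plugs easily into automated pipelines. The exact schema is defined by the package; purely for illustration (the keys below are hypothetical, and the JSON string here is synthetic), a downstream data-quality gate might look like:

```python
import json

# Hypothetical report JSON -- the real schema is defined by data_analyser.
json_data = '{"table": {"n": 100, "p_cells_missing": 0.02}}'

report = json.loads(json_data)
# Fail the gate if more than 5% of cells are missing.
missing_frac = report["table"]["p_cells_missing"]
assert missing_frac <= 0.05, f"Too much missing data: {missing_frac:.0%}"
print("quality gate passed")
```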


## 🛠️ Installation


### Using pip


You can install using the `pip` package manager by running:

```sh
pip install -U data_analyser
```

#### Extras

The package declares "extras", sets of additional dependencies.

* `[notebook]`: support for rendering the report in Jupyter notebook widgets.
* `[unicode]`: support for more detailed Unicode analysis, at the expense of additional disk space.
* `[pyspark]`: support for PySpark, for analysing large datasets.

Install these with e.g.:

```sh
pip install -U "data_analyser[notebook,unicode,pyspark]"
```

(The quotes prevent shells such as zsh from interpreting the square brackets.)



## 🙋 Support
Need help? Want to share a perspective? Found a bug? Have ideas for collaboration?

Email me at leandroofalero@outlook.com



## 🤝🏽 Contributing

A big thank you to the team behind ydata-profiling, on whose work this package is based.


## License

This project is licensed under the MIT License.

            
