# Data_analyser
data_analyser is a Python package for generating comprehensive profiling reports from pandas DataFrames, helping you quickly understand your data's structure and quality.
## ▶️ Quickstart
### Install
```cmd
pip install leo-data-analyser
```
or
```cmd
conda install -c conda-forge data_analyser
```
### Start profiling
Start by loading your pandas `DataFrame` as you normally would, e.g. by using:
```python
import numpy as np
import pandas as pd
from data_analyser import ProfileReport
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
```
To generate the standard profiling report, simply run:
```python
profile = ProfileReport(df, title="Profiling Report")
profile.to_file("output.html")
```
## 📊 Key features
- **Type inference**: automatic detection of columns' data types (*Categorical*, *Numerical*, *Date*, etc.)
- **Warnings**: a summary of the problems/challenges in the data that you might need to work on (*missing data*, *inaccuracies*, *skewness*, etc.)
- **Univariate analysis**: including descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
- **Multivariate analysis**: including correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise interactions between variables
- **Time-Series**: including statistical information specific to time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots
- **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
- **File and Image analysis**: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
- **Compare datasets**: one-line solution to enable a fast and complete report on the comparison of datasets
- **Flexible output formats**: all analyses can be exported to an HTML report that can be easily shared with different parties, as JSON for easy integration in automated systems, and as a widget in a Jupyter Notebook
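To give a feel for the kind of figures the univariate analysis reports, here is a minimal sketch that computes a few of them directly with pandas. It is an illustration only and not the package's implementation; the helper name `describe_column` is hypothetical.

```python
import pandas as pd


def describe_column(series: pd.Series) -> dict:
    """Return a handful of descriptive statistics for a numeric column."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],  # first value if several modes tie
        "skewness": series.skew(),
        "missing": int(series.isna().sum()),  # count of NaN/None entries
    }


# Small example column with one missing value.
values = pd.Series([1.0, 2.0, 2.0, 3.0, None])
stats = describe_column(values)
print(stats)
```

The profiling report gathers this kind of information for every column and adds visualizations, so a helper like this is only useful for spot checks.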
The report contains three additional sections:
- **Overview**: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
- **Alerts**: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
- **Reproduction**: technical details about the analysis (time, version and configuration)
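The Overview and Alerts figures listed above can be approximated by hand with plain pandas, which may help when interpreting the report. The sketch below is an assumption about what such figures mean, not the package's own code; the function name `overview` and the sample data are made up for illustration.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Compute dataset-level figures similar to those in an overview section."""
    n_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isna().sum().sum())
    return {
        "n_records": len(df),
        "n_variables": df.shape[1],
        "missing_cells": missing_cells,
        "missing_share": missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows": int(df.duplicated().sum()),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # Columns holding a single distinct value are a typical alert.
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=False) <= 1
        ],
    }


sample = pd.DataFrame(
    {
        "a": [1, 1, 2, None],
        "b": ["x", "x", "y", "z"],
        "c": [0, 0, 0, 0],
    }
)
summary = overview(sample)
print(summary)
```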
### Exporting the report to a file
To generate an HTML report file, assign the `ProfileReport` to a variable and call its `to_file()` method:
```python
profile.to_file("your_report.html")
```
Alternatively, the report's data can be obtained as JSON, either as a string or as a file:
```python
# As a JSON string
json_data = profile.to_json()
# As a file
profile.to_file("your_report.json")
```
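Once a JSON report exists on disk, it can be inspected with the standard library alone. The sketch below assumes the file contains a JSON object at the top level; the helper name `top_level_sections` is hypothetical, and the actual key names depend on the package version, so none are assumed here.

```python
import json
from pathlib import Path


def top_level_sections(path):
    """Load a JSON report from disk and return its top-level keys, sorted."""
    with Path(path).open(encoding="utf-8") as handle:
        report = json.load(handle)
    return sorted(report)


# Only inspect the report if it has already been written by to_file().
report_path = Path("your_report.json")
if report_path.exists():
    print(top_level_sections(report_path))
```

Listing the keys first is a cautious way to explore the structure before wiring the report into an automated system.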
## 🛠️ Installation
### Using pip
You can install using the `pip` package manager by running:
```sh
pip install -U leo-data-analyser
```
#### Extras
The package declares "extras", sets of additional dependencies.
* `[notebook]`: support for rendering the report in Jupyter notebook widgets.
* `[unicode]`: support for more detailed Unicode analysis, at the expense of additional disk space.
* `[pyspark]`: support for PySpark, for analysing large datasets.
Install these with, e.g.:
```sh
pip install -U "leo-data-analyser[notebook,unicode,pyspark]"
```
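After installing extras, a quick way to confirm that the optional dependencies are importable is to probe for them with `importlib`. This is a rough sketch: the helper `missing_modules` is hypothetical, and the module names in the example call are assumptions about what each extra pulls in, so check the package metadata for the real list.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Assumed module names for the [notebook] and [pyspark] extras;
# adjust them to whatever the package metadata actually declares.
print(missing_modules(["ipywidgets", "pyspark"]))
```

An empty list means every probed module was found in the current environment.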
## 🙋 Support
Need help? Want to share a perspective? Report a bug? Ideas for collaborations?
Shoot me an email @ leandroofalero@outlook.com
## 🤝🏽 Contributing
A big thank you to the whole team at ydata-profiling, on whose work this package is based.
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": "https://github.com/anthropoleo",
"name": "leo-data-analyser",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.7",
"maintainer_email": null,
"keywords": "pandas data-science data-analysis python jupyter ipython",
"author": "Leandro Falero",
"author_email": "leandroofalero@outlook.com",
"download_url": "https://files.pythonhosted.org/packages/2b/80/52acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54/leo_data_analyser-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Generate profile report for pandas DataFrame",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/anthropoleo"
},
"split_keywords": [
"pandas",
"data-science",
"data-analysis",
"python",
"jupyter",
"ipython"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a5a6261882ec9052ffe50f54ccb1a87b66c502c2592e0e0b247f8a46307a8292",
"md5": "f16613af4ecc81ef84146a7d976fae8a",
"sha256": "110800e69ef67efeef3bcab7c5ebb3973af8ee37d989ed578535d1966fbf3b97"
},
"downloads": -1,
"filename": "leo_data_analyser-1.0.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "f16613af4ecc81ef84146a7d976fae8a",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": "<3.13,>=3.7",
"size": 350799,
"upload_time": "2024-08-30T07:14:27",
"upload_time_iso_8601": "2024-08-30T07:14:27.642637Z",
"url": "https://files.pythonhosted.org/packages/a5/a6/261882ec9052ffe50f54ccb1a87b66c502c2592e0e0b247f8a46307a8292/leo_data_analyser-1.0.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2b8052acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54",
"md5": "02f4c13c4661902348a97d6d739ecd1e",
"sha256": "2b14548ce9c1033d6a92fb40baa53685c9cbe07cc12ef1e63bafbcd66426b6ce"
},
"downloads": -1,
"filename": "leo_data_analyser-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "02f4c13c4661902348a97d6d739ecd1e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.7",
"size": 267506,
"upload_time": "2024-08-30T07:14:30",
"upload_time_iso_8601": "2024-08-30T07:14:30.032019Z",
"url": "https://files.pythonhosted.org/packages/2b/80/52acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54/leo_data_analyser-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-30 07:14:30",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "leo-data-analyser"
}