DataGuru


Name: DataGuru
Version: 0.0.2
Summary: The "DataGuru" library is a comprehensive Python toolkit designed to simplify common data analysis tasks for data science students and practitioners. With a focus on ease of use and efficiency, this library provides a range of functions to streamline your data analysis workflow.
upload_time: 2023-06-29 08:42:42
docs_url: None
author: Guna M
requires_python: >=3.6
license: MIT
keywords: data analysis
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

**Library Overview**

The "DataGuru" library provides data analysis and preprocessing helpers that simplify common tasks in data science projects. Its primary functions are `missingValues`, `FindOutliers`, and `analyzeData`, each described below:

1. **`missingValues(data)`**: This function computes missing-value statistics for each column of the input data. It generates a DataFrame listing the variable name, total values, total missing values, missing-value rate, data type, unique values, and total unique values. The DataFrame is sorted in descending order by the total number of missing values.

2. **`FindOutliers(data, method='zscore')`**: This function detects outliers in numeric columns of the input data. It supports two outlier detection methods: Z-score and IQR (interquartile range). By default, the Z-score method is used. The function iterates over each numeric column and applies the specified outlier detection method. It then collects information about the column, including the mean, standard deviation, outliers, total outliers, and percentage of outliers. The resulting DataFrame is sorted in descending order based on the percentage of outliers.

3. **`analyzeData(data, numCol, catCol)`**: This function performs an analysis on the input data by grouping a numeric column (`numCol`) based on a categorical column (`catCol`). It calculates the mean, standard deviation, and percentage of the numeric column for each category. The results are displayed in a DataFrame sorted in descending order based on the mean value. Additionally, a bar plot is generated using Plotly Express, visualizing the mean, standard deviation, and percentage for each category.
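
The three descriptions above can be approximated with plain pandas. The sketch below is not DataGuru's source code. The function names `missing_summary`, `outlier_summary`, and `group_summary`, the output column names, the |z| > 3 cutoff, the 1.5 × IQR fences, and the reading of "percentage" as each category's share of the column total are all assumptions made for illustration. The Plotly Express bar plot is left out.

```python
import pandas as pd


def missing_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing-value statistics, most-missing columns first."""
    summary = pd.DataFrame({
        "variable": list(data.columns),
        "total_values": len(data),  # assumption: row count, missing included
        "total_missing": data.isna().sum().to_numpy(),
        "dtype": data.dtypes.astype(str).to_numpy(),
        "total_unique": data.nunique().to_numpy(),  # NaN is not counted
    })
    summary["missing_rate"] = summary["total_missing"] / max(len(data), 1)
    return summary.sort_values(
        "total_missing", ascending=False, kind="stable"
    ).reset_index(drop=True)


def outlier_summary(data: pd.DataFrame, method: str = "zscore") -> pd.DataFrame:
    """Outlier counts for each numeric column, highest percentage first."""
    columns = ["variable", "mean", "std", "total_outliers", "outlier_pct"]
    rows = []
    for col in data.select_dtypes(include="number").columns:
        values = data[col].dropna()
        if method == "zscore":
            # Assumed cutoff: |z| > 3, using the sample standard deviation.
            z = (values - values.mean()) / values.std()
            mask = z.abs() > 3
        elif method == "iqr":
            # Assumed fences: 1.5 * IQR below Q1 and above Q3.
            q1, q3 = values.quantile(0.25), values.quantile(0.75)
            fence = 1.5 * (q3 - q1)
            mask = (values < q1 - fence) | (values > q3 + fence)
        else:
            raise ValueError(f"unknown method: {method!r}")
        rows.append({
            "variable": col,
            "mean": values.mean(),
            "std": values.std(),
            "total_outliers": int(mask.sum()),
            "outlier_pct": 100 * mask.mean(),
        })
    return pd.DataFrame(rows, columns=columns).sort_values(
        "outlier_pct", ascending=False, kind="stable"
    ).reset_index(drop=True)


def group_summary(data: pd.DataFrame, num_col: str, cat_col: str) -> pd.DataFrame:
    """Mean, std, and share of the total of num_col for each cat_col value."""
    out = data.groupby(cat_col)[num_col].agg(["mean", "std", "sum"])
    # Assumption: "percentage" is the category's share of the column total.
    out["percentage"] = 100 * out["sum"] / out["sum"].sum()
    return out.drop(columns="sum").sort_values("mean", ascending=False)
```

Each helper returns a sorted DataFrame, so a call such as `outlier_summary(df, method="iqr")` can be inspected directly or passed on to a plotting library.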

**Future Enhancements**

Planned enhancements include a segregation feature in `analyzeData` similar to the "hue" parameter in seaborn; model comparison for regression, classification, and clustering; and data preprocessing capabilities. These additions are intended to make the library more flexible and to support more advanced analyses.

By incorporating these features, the "DataGuru" library aims to simplify common data analysis tasks, automate repetitive code, and enhance the productivity of data science students and practitioners.

Change Log
==========

0.0.1 (12/06/2023)
-------------------
- First release
- Features:
  1. missingValues
  2. findOutliers
  3. analyzeData

=======================

0.0.2 (29/06/2023)
-------------------
- Fixed bugs in the features
 

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "DataGuru",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "data analysis",
    "author": "Guna M",
    "author_email": "guna0professional@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/1a/d1/ff8169e9801715f844da69b916440501ff8ad4314144d3e1cf042dba9c19/DataGuru-0.0.2.tar.gz",
    "platform": null,
    "description": "**Library Overview**\r\n\r\nThe \"DataGuru\" library is designed to provide various data analysis and preprocessing functionalities to simplify common tasks in data science projects. It includes primary functions such as `missingValues`, `FindOutliers`, and `analyzeData`. Let's analyze each of these functions in detail:\r\n\r\n1. **`missingValues(data)`**: This function computes the missing values statistics for each column in the input data. It generates a DataFrame containing information such as the variable name, total values, total missing values, missing value rate, data type, unique values, and total unique values. The missing data DataFrame is sorted in descending order based on the total number of missing values.\r\n\r\n2. **`FindOutliers(data, method='zscore')`**: This function detects outliers in numeric columns of the input data. It supports two outlier detection methods: Z-score and IQR (interquartile range). By default, the Z-score method is used. The function iterates over each numeric column and applies the specified outlier detection method. It then collects information about the column, including the mean, standard deviation, outliers, total outliers, and percentage of outliers. The resulting DataFrame is sorted in descending order based on the percentage of outliers.\r\n\r\n3. **`analyzeData(data, numCol, catCol)`**: This function performs an analysis on the input data by grouping a numeric column (`numCol`) based on a categorical column (`catCol`). It calculates the mean, standard deviation, and percentage of the numeric column for each category. The results are displayed in a DataFrame sorted in descending order based on the mean value. 
Additionally, a bar plot is generated using Plotly Express, visualizing the mean, standard deviation, and percentage for each category.\r\n\r\n**Future Enhancements**\r\n\r\nThe README mentions planned enhancements for the library, including segregation features in `analyzeData` similar to the \"hue\" parameter in seaborn, model comparison for regression, classification, and clustering, and data preprocessing capabilities. These additions will provide more flexibility and functionality to the library, enabling users to perform advanced analyses and streamline their data science workflows.\r\n\r\nBy incorporating these features, the \"DataGuru\" library aims to simplify common data analysis tasks, automate repetitive code, and enhance the productivity of data science students and practitioners.\r\n\r\nChange Log\r\n==========\r\n\r\n0.0.1 (12/06/2023)\r\n-------------------\r\n- First Release\r\n- features\r\n--- 1. missingValues\r\n--- 2. findOutliers\r\n--- 3. analyzeData\r\n\r\n=======================\r\n\r\n0.0.2 (29/06/2023)\r\n-------------------\r\nfixed the bugs in the features\r\n \r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "The \"DataGuru\" library is a comprehensive Python toolkit designed to simplify common data analysis tasks for data science students and practitioners. With a focus on ease of use and efficiency, this library provides a range of functions to streamline your data analysis workflow.",
    "version": "0.0.2",
    "project_urls": null,
    "split_keywords": [
        "data",
        "analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b4fb5d2bb6fbdb6200f428443fa119616e7865539290f26182f9991d8291d5fb",
                "md5": "16eb47528eb43b0a1304b73769173e39",
                "sha256": "5a3bd54c7f2e76fa0572f557467dfacc6f3dbcbfe75fd1041fe59b2bd5d6e74f"
            },
            "downloads": -1,
            "filename": "DataGuru-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "16eb47528eb43b0a1304b73769173e39",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 4876,
            "upload_time": "2023-06-29T08:42:39",
            "upload_time_iso_8601": "2023-06-29T08:42:39.794198Z",
            "url": "https://files.pythonhosted.org/packages/b4/fb/5d2bb6fbdb6200f428443fa119616e7865539290f26182f9991d8291d5fb/DataGuru-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1ad1ff8169e9801715f844da69b916440501ff8ad4314144d3e1cf042dba9c19",
                "md5": "91721a92276bc2553195a9142cf11e4c",
                "sha256": "ec6b222206e40ee823852ef87e8753954c8b311713b3c8c40a2795f528c35269"
            },
            "downloads": -1,
            "filename": "DataGuru-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "91721a92276bc2553195a9142cf11e4c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 4862,
            "upload_time": "2023-06-29T08:42:42",
            "upload_time_iso_8601": "2023-06-29T08:42:42.038792Z",
            "url": "https://files.pythonhosted.org/packages/1a/d1/ff8169e9801715f844da69b916440501ff8ad4314144d3e1cf042dba9c19/DataGuru-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-29 08:42:42",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "dataguru"
}
        