describr


Name: describr
Version: 0.0.31
Home page: https://github.com/famutimine/describr
Summary: Describr is a Python library that provides a convenient way to generate descriptive statistics for datasets.
Upload time: 2024-02-07 04:34:42
Author: Daniel Famutimi MD, MPH
License: MIT
Keywords: descriptive statistics
Requirements: No requirements were recorded.

## README.md

describr is a Python library that provides functionality for descriptive statistics and outlier detection in pandas DataFrames.

**Installation**

You can install describr using pip:

```shell
pip install describr
```

#### Example usage
```python
import pandas as pd
import numpy as np
from describr import FindOutliers, DescriptiveStats
```
#### Create a sample dataframe
```python
np.random.seed(0)
n = 500

data = {
    'MCID': ['MCID_' + str(i) for i in range(1, n + 1)],
    'Age': np.random.randint(18, 90, size=n),
    'Race': np.random.choice(['White', 'Black', 'Asian', 'Hispanic',''], size=n),
    'Educational_Status': np.random.choice(['High School', 'Bachelor', 'Master', 'PhD',''], size=n),
    'Gender': np.random.choice(['Male', 'Female', ''], size=n),
    'ER_COST': np.random.uniform(500, 5000, size=n),
    'ER_VISITS': np.random.randint(0, 10, size=n),
    'IP_COST': np.random.uniform(5000, 20000, size=n),
    'IP_ADMITS': np.random.randint(0, 5, size=n),
    'CHF': np.random.choice([0, 1], size=n),
    'COPD': np.random.choice([0, 1], size=n),
    'DM': np.random.choice([0, 1], size=n),
    'ASTHMA': np.random.choice([0, 1], size=n),
    'HYPERTENSION': np.random.choice([0, 1], size=n),
    'SCHIZOPHRENIA': np.random.choice([0, 1], size=n),
    'MOOD_DEPRESSED': np.random.choice([0, 1], size=n),
    'MOOD_BIPOLAR': np.random.choice([0, 1], size=n),
    'TREATMENT': np.random.choice(['Yes', 'No'], size=n)
}

df = pd.DataFrame(data)
```
#### Parameters
**df**: The pandas DataFrame to analyze.

**id_col**: Name of the column that serves as the primary key of the dataframe. Its values may be strings, integers, or floats.

**group_col**: Name of the column to group by. It must be a binary column; its values may be strings or integers.

**positive_class**: The response value for the primary outcome of interest. For a treatment cohort, for instance, the positive value is 'Yes' (or 1) and the other value is 'No' (or 0). Strings or integers are acceptable.

**continuous_var_summary**: The measure of central tendency used to summarize continuous variables. Only 'mean' and 'median' are accepted, and the value is case-insensitive.
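
The constraints listed above can be checked before any describr class is constructed. The helper below is a hypothetical sketch, not part of describr: `check_inputs` is an invented name, and treating "primary key" as "unique values" is an assumption made for illustration.

```python
import pandas as pd


def check_inputs(df, id_col, group_col, positive_class, continuous_var_summary):
    """Hypothetical pre-flight checks that mirror the documented constraints."""
    # Assumption: a primary key should contain no duplicate values.
    if not df[id_col].is_unique:
        raise ValueError(f"{id_col!r} contains duplicate values")
    # group_col must be binary, i.e. have exactly two distinct levels.
    levels = set(df[group_col].dropna().unique())
    if len(levels) != 2:
        raise ValueError(f"{group_col!r} must have exactly two levels, found {len(levels)}")
    if positive_class not in levels:
        raise ValueError(f"{positive_class!r} is not a value of {group_col!r}")
    # The summary measure is case-insensitive, so normalize before comparing.
    summary = continuous_var_summary.lower()
    if summary not in {"mean", "median"}:
        raise ValueError("continuous_var_summary must be 'mean' or 'median'")
    return summary


toy = pd.DataFrame({"MCID": ["a", "b", "c"], "TREATMENT": ["Yes", "No", "Yes"]})
summary = check_inputs(toy, "MCID", "TREATMENT", "Yes", "Median")
print(summary)  # median
```

Failing early with a clear message is usually easier to debug than an error raised from inside a grouped computation.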


#### Example usage of FindOutliers Class

This returns a dataframe (`outliers_flag_df`) with an `outlier_flag` column, where `outlier_flag = 1` means the record contains one or more outliers. Outliers are detected with Tukey's IQR method.

```python
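# Set up outlier detection, keyed by MCID and grouped by TREATMENT
outliers_flag = FindOutliers(df=df, id_col='MCID', group_col='TREATMENT')
# Dataframe that includes the outlier_flag column
outliers_flag_df = outliers_flag.flag_outliers()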
```
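For readers unfamiliar with Tukey's IQR method, the following is a minimal pandas sketch of the general rule. It is not describr's implementation: the function name `tukey_outlier_mask`, the conventional multiplier `k = 1.5`, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def tukey_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside Tukey's fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy data, not the sample dataframe built earlier.
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "cost": [10.0, 12.0, 11.0, 13.0, 12.0, 100.0],
    "visits": [1, 2, 3, 4, 2, 3],
})

numeric_cols = ["cost", "visits"]
# A row is flagged when any numeric column holds an outlying value.
flags = pd.concat([tukey_outlier_mask(toy[c]) for c in numeric_cols], axis=1)
toy["outlier_flag"] = flags.any(axis=1).astype(int)
print(toy["outlier_flag"].tolist())  # [0, 0, 0, 0, 0, 1]
```

Only the last row is flagged here, because its `cost` of 100.0 exceeds the upper fence of 15.0 while every `visits` value stays within its fences.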
#### This example counts the number of rows with outliers, stratified by the grouping variable
```python
outliers_flag.count_outliers()
```
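The README does not show what `count_outliers()` returns. As a rough plain-pandas equivalent, assuming a dataframe that already carries an `outlier_flag` column, the stratified count could be sketched like this (toy data, hypothetical variable names):

```python
import pandas as pd

# Toy flagged data standing in for the output of an outlier-flagging step.
flagged = pd.DataFrame({
    "TREATMENT": ["Yes", "Yes", "No", "No", "No"],
    "outlier_flag": [1, 0, 1, 1, 0],
})

# Summing a 0/1 flag within each group counts the flagged rows per group.
outlier_counts = flagged.groupby("TREATMENT")["outlier_flag"].sum()
print(outlier_counts.to_dict())  # {'No': 2, 'Yes': 1}
```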
#### This example removes all outliers
```python
df2 = outliers_flag.remove_outliers()
df2.shape
```
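
How `remove_outliers()` works internally is not documented here. Assuming that removal means dropping every row whose `outlier_flag` is 1, the same effect can be sketched in plain pandas (toy data, hypothetical variable names):

```python
import pandas as pd

flagged = pd.DataFrame({
    "MCID": ["MCID_1", "MCID_2", "MCID_3", "MCID_4"],
    "outlier_flag": [0, 1, 0, 1],
})

# Keep only unflagged rows, then drop the helper column.
cleaned = flagged.loc[flagged["outlier_flag"] == 0].drop(columns="outlier_flag")
print(cleaned.shape)  # (2, 1)
```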

#### Example usage of DescriptiveStats Class
```python
descriptive_stats = DescriptiveStats(df=df, id_col='MCID', group_col='TREATMENT', positive_class='Yes', continuous_var_summary='median')
```
#### Gets statistics for binary and categorical variables and returns a dataframe.
```python
binary_stats_df = descriptive_stats.get_binary_stats()
```
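
The exact columns of `binary_stats_df` are not listed in this README. A common way to summarize a binary or categorical variable by group is a table of counts and column percentages, which can be sketched with `pd.crosstab` (toy data; this is an illustration, not describr's output format):

```python
import pandas as pd

toy = pd.DataFrame({
    "TREATMENT": ["Yes", "Yes", "Yes", "No", "No"],
    "CHF": [1, 0, 1, 0, 1],
})

# Counts of each CHF level within each treatment group.
counts = pd.crosstab(toy["CHF"], toy["TREATMENT"])
# Percentages within each treatment group (each column sums to 100).
percents = pd.crosstab(toy["CHF"], toy["TREATMENT"], normalize="columns") * 100

print(counts.loc[1, "Yes"])              # 2
print(round(percents.loc[1, "Yes"], 1))  # 66.7
```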

#### Gets mean and standard deviation for continuous variables and returns a dataframe.

```python
continuous_stats_mean_df = descriptive_stats.get_continuous_mean_stats()
```
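
As a point of comparison, the mean and standard deviation per group can be computed directly in pandas. This sketch uses toy data and pandas' default sample standard deviation (`ddof=1`); whether describr uses the same convention is not stated in this README.

```python
import pandas as pd

toy = pd.DataFrame({
    "TREATMENT": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "Age": [30, 40, 50, 20, 40, 60],
})

# One row per group, with the mean and sample standard deviation of Age.
mean_sd = toy.groupby("TREATMENT")["Age"].agg(["mean", "std"])
print(mean_sd.loc["Yes", "mean"], mean_sd.loc["Yes", "std"])  # 40.0 10.0
```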

#### Gets median and interquartile range for continuous variables and returns a dataframe.
```python
continuous_stats_median_df = descriptive_stats.get_continuous_median_stats()
```
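
The median and interquartile range per group can likewise be sketched in plain pandas. The toy data and column names below are illustrative, and pandas' default linear interpolation for quantiles is an assumption; describr may compute quartiles differently.

```python
import pandas as pd

toy = pd.DataFrame({
    "TREATMENT": ["Yes"] * 5 + ["No"] * 5,
    "ER_COST": [100, 200, 300, 400, 500, 10, 20, 30, 40, 50],
})

grouped = toy.groupby("TREATMENT")["ER_COST"]
median_iqr = pd.DataFrame({
    "median": grouped.median(),
    "q1": grouped.quantile(0.25),
    "q3": grouped.quantile(0.75),
})
# The interquartile range is the distance between the quartiles.
median_iqr["iqr"] = median_iqr["q3"] - median_iqr["q1"]
print(median_iqr.loc["Yes"].tolist())  # [300.0, 200.0, 400.0, 200.0]
```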

#### Computes summary statistics for binary and continuous variables based on the chosen measure of central tendency. Method returns a dataframe.
```python
descriptive_stats.compute_descriptive_stats()
summary_stats = descriptive_stats.summary_stats()
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/famutimine/describr",
    "name": "describr",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "descriptive statistics",
    "author": "Daniel Famutimi MD, MPH",
    "author_email": "danielfamutimi@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/79/b6/f8508682d3f88a1732de3cfbcc8d1ff889174c4e90551374e2252569b549/describr-0.0.31.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Describr is a Python library that provides a convenient way to generate descriptive statistics for datasets.",
    "version": "0.0.31",
    "project_urls": {
        "Homepage": "https://github.com/famutimine/describr"
    },
    "split_keywords": [
        "descriptive",
        "statistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2d73b33cf46cec122fca1c8083d8862bec632e7543d237119575410d8b3c5c2b",
                "md5": "506b70bce3887597bed8ba704759bfff",
                "sha256": "8057ee6a95c04af49b233266d9f7814b67cd1ea179171856edd2c5167a0c91d6"
            },
            "downloads": -1,
            "filename": "describr-0.0.31-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "506b70bce3887597bed8ba704759bfff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6252,
            "upload_time": "2024-02-07T04:34:41",
            "upload_time_iso_8601": "2024-02-07T04:34:41.483848Z",
            "url": "https://files.pythonhosted.org/packages/2d/73/b33cf46cec122fca1c8083d8862bec632e7543d237119575410d8b3c5c2b/describr-0.0.31-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "79b6f8508682d3f88a1732de3cfbcc8d1ff889174c4e90551374e2252569b549",
                "md5": "bc84962c350601498a782a1a12194611",
                "sha256": "1a64fd7e36f6709944a4f88d63635f83613f343663cddb7b2b7e41fba140d1c9"
            },
            "downloads": -1,
            "filename": "describr-0.0.31.tar.gz",
            "has_sig": false,
            "md5_digest": "bc84962c350601498a782a1a12194611",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6578,
            "upload_time": "2024-02-07T04:34:42",
            "upload_time_iso_8601": "2024-02-07T04:34:42.597895Z",
            "url": "https://files.pythonhosted.org/packages/79/b6/f8508682d3f88a1732de3cfbcc8d1ff889174c4e90551374e2252569b549/describr-0.0.31.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-07 04:34:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "famutimine",
    "github_project": "describr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "describr"
}
        