sparse-profile


Namesparse-profile JSON
Version 0.1.1 PyPI version JSON
download
home_pagehttps://github.com/domarp/sparse_profile
SummaryEDA on sparse data for classification problems
upload_time2021-04-21 15:28:03
maintainer
docs_urlNone
authorPramod Balakrishnan
requires_python
licenseMIT
keywords eda sparse data classification
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Sparse profile  - EDA on sparse data
Module to perform EDA tasks for a classification problem with sparse data<br>
Curently takes only numeric values

Sample usage
------------
```python
import pandas as pd
import numpy as np
from sparse_profile import sparse_profile

df = pd.DataFrame({
        'target' : [1, 1, 1, 1, 0, 0 ,0 ,0, 1, 0],
        'col_1' :  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
        'col_2' :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    })
sProfile = sparse_profile(df, 'target')
print(sProfile.top_gain)
```
Output maximum gain obtained from each column
```
col_2    0.422810
col_1    0.074882
dtype: float64
```

```python
print(sProfile.report_sparsity)
```
Output percentage of zeros in column

```
col_1  0.8
col_2  0.1
```

Various sparse_profile reports can be accessed as attributes of the sparse_profile class object. List of all available attributes:
* report_sparsity:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pandas dataframe, Percentage of zeros in each column
* report_distinct:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Count of distinct non zero values in each column
* report_overall:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Overall summary of each column (similar to pandas describe())
* report_non_zero:&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Summary of each column after removing zeros
* gain_df:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Relative information gain at decile cutoffs for each column wrt target column
* auc_df:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, AUC of each column wrt target column
* top_gain:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Columns sorted by maximum gain obtained from gain_df


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/domarp/sparse_profile",
    "name": "sparse-profile",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "eda,sparse data,classification",
    "author": "Pramod Balakrishnan",
    "author_email": "prmdbk@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/63/82/e099d4738d968132a483d09187734158998ead6a9bfda7f269b3ad81a787/sparse_profile-0.1.1.tar.gz",
    "platform": "",
    "description": "# Sparse profile  - EDA on sparse data\nModule to perform EDA tasks for a classification problem with sparse data<br>\nCurently takes only numeric values\n\nSample usage\n------------\n```python\nimport pandas as pd\nimport numpy as np\nfrom sparse_profile import sparse_profile\n\ndf = pd.DataFrame({\n        'target' : [1, 1, 1, 1, 0, 0 ,0 ,0, 1, 0],\n        'col_1' :  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],\n        'col_2' :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n    })\nsProfile = sparse_profile(df, 'target')\nprint(sProfile.top_gain)\n```\nOutput maximum gain obtained from each column\n```\ncol_2    0.422810\ncol_1    0.074882\ndtype: float64\n```\n\n```python\nprint(sProfile.report_sparsity)\n```\nOutput percentage of zeros in column\n\n```\ncol_1  0.8\ncol_2  0.1\n```\n\nVarious sparse_profile reports can be accessed as attributes of the sparse_profile class object. List of all available attributes:\n* report_sparsity:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pandas dataframe, Percentage of zeros in each column\n* report_distinct:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Count of distinct non zero values in each column\n* report_overall:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Overall summary of each column (similar to pandas describe())\n* report_non_zero:&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Summary of each column after removing zeros\n* gain_df:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Relative information gain at decile cutoffs for each column wrt target column\n* auc_df:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, AUC of each column wrt target column\n* top_gain:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pandas dataframe, Columns sorted by maximum gain obtained from gain_df\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "EDA on sparse data for classification problems",
    "version": "0.1.1",
    "split_keywords": [
        "eda",
        "sparse data",
        "classification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "e97ccdbf8eacedc3ab5b9bf066889c58",
                "sha256": "d92195cb0f936ff6770de53ac81f472c716a6f58fe2f9544bfe4d7e61a8a1b2d"
            },
            "downloads": -1,
            "filename": "sparse_profile-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e97ccdbf8eacedc3ab5b9bf066889c58",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 4997,
            "upload_time": "2021-04-21T15:28:00",
            "upload_time_iso_8601": "2021-04-21T15:28:00.817456Z",
            "url": "https://files.pythonhosted.org/packages/4e/ff/35229ba56301a12b25cedd6f2afcd299082c040a2a20cbaf4f1ca0a829ff/sparse_profile-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "b3df4fc4e91b01f9e159cbaf663cf15e",
                "sha256": "69c97adc38046be4f54a77554b5bd723ccc7e4a7c9e7a4830f6810b6af1e9ed2"
            },
            "downloads": -1,
            "filename": "sparse_profile-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "b3df4fc4e91b01f9e159cbaf663cf15e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4868,
            "upload_time": "2021-04-21T15:28:03",
            "upload_time_iso_8601": "2021-04-21T15:28:03.074793Z",
            "url": "https://files.pythonhosted.org/packages/63/82/e099d4738d968132a483d09187734158998ead6a9bfda7f269b3ad81a787/sparse_profile-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-04-21 15:28:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": null,
    "github_project": "domarp",
    "error": "Could not fetch GitHub repository",
    "lcname": "sparse-profile"
}
        
Elapsed time: 0.28370s