tonic-reporting


Name: tonic-reporting
Version: 1.4.4
Home page: https://www.tonic.ai/
Summary: Tools for evaluating fidelity and privacy of synthetic data
Upload time: 2023-02-02 23:10:51
Author: Eric Timmerman
Requires Python: >=3.8
License: MIT
Keywords: tonic.ai, tonic

# Overview
This library contains tools for evaluating fidelity and privacy of synthetic data.

## Usage

Import the desired modules from the library:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tonic_reporting import univariate, multivariate, privacy
```

**Preface**

*Numeric* columns refer to columns *encoded* as numeric. Numerical data types in the schema underlying a model may be encoded as other types.

*Categorical* columns refer to columns *encoded* as categorical.

*source_df* is a pandas DataFrame of original data from the source database.

*synth_df* is a pandas DataFrame of data sampled from trained models.

The source and synthetic DataFrames should have the same schema and the same number of rows.
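
The requirement above can be checked before calling any of the reporting functions. The following is a minimal sketch, not part of tonic-reporting: the helper `check_comparable` and the example columns `age` and `plan` are hypothetical names chosen for illustration.

```python
from typing import List

import pandas as pd


def check_comparable(source_df: pd.DataFrame, synth_df: pd.DataFrame) -> List[str]:
    """Return a list of problems; an empty list means the frames line up."""
    problems = []
    if len(source_df) != len(synth_df):
        problems.append(f"row count differs: {len(source_df)} vs {len(synth_df)}")
    # Columns present in only one of the two frames.
    unshared = set(source_df.columns) ^ set(synth_df.columns)
    if unshared:
        problems.append(f"columns not shared: {sorted(unshared)}")
    # For shared columns, the dtypes should agree as well.
    for col in source_df.columns.intersection(synth_df.columns):
        if source_df[col].dtype != synth_df[col].dtype:
            problems.append(f"dtype differs for column {col!r}")
    return problems


# Hypothetical example data.
source_df = pd.DataFrame({"age": [31, 45, 27], "plan": ["a", "b", "a"]})
synth_df = pd.DataFrame({"age": [30, 44, 29], "plan": ["a", "a", "b"]})

# The column lists are chosen by hand, following how each column was encoded.
numeric_cols = ["age"]
categorical_cols = ["plan"]

problems = check_comparable(source_df, synth_df)
```

An empty `problems` list suggests the two DataFrames can be passed to the functions below.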

**Numeric Column Statistics**

`univariate.summarize_numeric(source_df, synth_df, numeric_cols)`
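
To give a sense of what a side-by-side numeric summary looks like, here is a rough sketch written with plain pandas. It is an assumption about the kind of comparison involved, not the output format or implementation of `summarize_numeric`; the helper `compare_numeric` and the example data are hypothetical.

```python
import pandas as pd


def compare_numeric(source_df, synth_df, numeric_cols):
    """Place basic statistics of source and synthetic columns side by side."""
    stats = ["mean", "std", "min", "max"]
    # After transposing, each row is a column name and each column a statistic.
    src = source_df[numeric_cols].agg(stats).T.add_prefix("source_")
    syn = synth_df[numeric_cols].agg(stats).T.add_prefix("synth_")
    return src.join(syn)


# Hypothetical example data.
source_df = pd.DataFrame({"age": [20, 30, 40]})
synth_df = pd.DataFrame({"age": [22, 30, 41]})
summary = compare_numeric(source_df, synth_df, ["age"])
```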

**Categorical Column Statistics**

`univariate.summarize_categorical(source_df, synth_df, categorical_cols)`
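
For categorical columns, a comparison usually comes down to how often each category appears in the two datasets. The sketch below illustrates that idea with plain pandas; it is not the implementation of `summarize_categorical`, and the helper `compare_frequencies` and the column `plan` are hypothetical.

```python
import pandas as pd


def compare_frequencies(source_df, synth_df, col):
    """Relative frequency of each category in both frames, plus the gap."""
    src = source_df[col].value_counts(normalize=True).rename("source")
    syn = synth_df[col].value_counts(normalize=True).rename("synth")
    # Categories missing from one side get a frequency of 0.
    table = pd.concat([src, syn], axis=1).fillna(0.0)
    table["abs_diff"] = (table["source"] - table["synth"]).abs()
    return table.sort_index()


# Hypothetical example data: category "c" never appears in the synthetic data.
source_df = pd.DataFrame({"plan": ["a", "a", "b", "c"]})
synth_df = pd.DataFrame({"plan": ["a", "b", "b", "b"]})
table = compare_frequencies(source_df, synth_df, "plan")
```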

**Numeric Column Comparative Histograms**

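```
# squeeze=False keeps axarr a 2-D array even when there is only one column,
# so the ravel() call below also works for a single numeric column.
fig, axarr = plt.subplots(1, len(numeric_cols), figsize=(9, 12), squeeze=False)
axarr = axarr.ravel()

for col, ax in zip(numeric_cols, axarr):
    univariate.plot_histogram(source_df, synth_df, col, ax)
```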

**Categorical Column Comparative Frequency Tables**

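```
for col in categorical_cols:
    # Create one figure per column so that `ax` is defined before it is used.
    fig, ax = plt.subplots(figsize=(8, 6))
    univariate.plot_frequency_table(source_df, synth_df, col, ax)
```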

**Numeric Column Aggregates Over Time**

If the data represent a time series, we can visualize the means and confidence intervals of numeric features
over time:

```
for col in numeric_cols:
    fig, ax = plt.subplots(figsize=(10, 8))
    univariate.plot_events_means(source_df, synth_df, col, order_col, ax=ax)
```

and

```
for col in numeric_cols:
    fig, ax = plt.subplots(figsize=(12, 10))
    univariate.plot_events_confidence_intervals(source_df, synth_df, col, order_col, ax=ax)
```
where `order_col` denotes the time/order column.
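
The quantities behind such plots can be sketched with a pandas group-by: the mean of a column at each value of the order column, with an interval around it. This is an illustration under assumptions, not the library's method. In particular, the normal-approximation interval (mean ± 1.96 standard errors), the helper `mean_with_ci`, and the column names `step` and `value` are all choices made here for the example.

```python
import pandas as pd


def mean_with_ci(df, col, order_col, z=1.96):
    """Per-step mean with an approximate 95% confidence interval."""
    grouped = df.groupby(order_col)[col].agg(["mean", "sem"])
    # Normal approximation: mean plus or minus z standard errors.
    grouped["lower"] = grouped["mean"] - z * grouped["sem"]
    grouped["upper"] = grouped["mean"] + z * grouped["sem"]
    return grouped


# Hypothetical example data: two observations at each of two time steps.
events = pd.DataFrame({"step": [0, 0, 1, 1], "value": [1.0, 3.0, 2.0, 6.0]})
intervals = mean_with_ci(events, "value", "step")
```

Computing the same table for `source_df` and `synth_df` and comparing them step by step gives a numeric counterpart to the plots.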

**Numeric Column Multivariate Correlations Table**

`multivariate.summarize_correlations(source_df, synth_df, numeric_cols)`
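
One simple way to compare correlation structure is to compute the pairwise correlation matrix of each dataset and look at the absolute difference. The sketch below does this with `DataFrame.corr`; it is an assumption about what such a comparison involves rather than the implementation of `summarize_correlations`, and the helper `correlation_gap` is hypothetical.

```python
import pandas as pd


def correlation_gap(source_df, synth_df, numeric_cols):
    """Absolute difference between the Pearson correlation matrices."""
    src_corr = source_df[numeric_cols].corr()
    syn_corr = synth_df[numeric_cols].corr()
    return (src_corr - syn_corr).abs()


# Hypothetical example data: x and y move together in the source
# but in opposite directions in the synthetic data.
source_df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
synth_df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [8, 6, 4, 2]})
gap = correlation_gap(source_df, synth_df, ["x", "y"])
```

Entries near 0 mean the synthetic data preserve the relationship between that pair of columns; larger entries flag pairs worth inspecting.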

**Numeric Column Multivariate Correlations Heat Map**

```
fig, axarr = plt.subplots(1, 2, figsize=(13, 8))
multivariate.plot_correlations(source_df, synth_df, numeric_cols, axarr=axarr)
fig.tight_layout()
```

**Distance to Closest Record Comparison**

```
syn_dcr, real_dcr = privacy.compute_dcr(source_df, synth_df, numeric_cols, categorical_cols)

fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.hist(real_dcr, bins=300, label='Real vs. real', color='mediumpurple')
ax.hist(syn_dcr, bins=300, label='Synthetic vs. real', color='mediumturquoise')
ax.tick_params(axis='both', which='major', labelsize=14)
ax.set_title('Distances to closest record', fontsize=22)
ax.legend(fontsize=16)
```
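
The idea behind a distance-to-closest-record comparison can be sketched with NumPy for purely numeric data. This is a simplified illustration, not the implementation of `compute_dcr`: it assumes Euclidean distance on already-scaled numeric features, ignores categorical columns, and uses a hypothetical helper named `nearest_distances`.

```python
import numpy as np


def nearest_distances(queries, reference, exclude_self=False):
    """Euclidean distance from each query row to its closest reference row."""
    # Pairwise differences, shape (n_queries, n_reference, n_features).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    if exclude_self:
        # Only valid when queries and reference are the same array:
        # ignore each record's zero distance to itself.
        np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1)


# Hypothetical example data; the second synthetic row copies a real record.
real = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
synth = np.array([[0.0, 1.0], [3.0, 4.0]])

syn_dcr = nearest_distances(synth, real)
real_dcr = nearest_distances(real, real, exclude_self=True)
```

A synthetic record whose distance is 0, or far smaller than typical real-to-real distances, sits on or very near a real record, which is the kind of privacy risk the histogram comparison is meant to surface. The pairwise array grows with the product of the two row counts, so this sketch only suits small datasets.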

            

Raw data

            {
    "_id": null,
    "home_page": "https://www.tonic.ai/",
    "name": "tonic-reporting",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "tonic.ai,tonic",
    "author": "Eric Timmerman",
    "author_email": "eric@tonic.ai",
    "download_url": "https://files.pythonhosted.org/packages/5f/91/e9a25877b3cb80289b2a7e7baf9326e67802478ddfef95b8c70accfc8ec8/tonic-reporting-1.4.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Tools for evaluating fidelity and privacy of synthetic data",
    "version": "1.4.4",
    "split_keywords": [
        "tonic.ai",
        "tonic"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6519118343c15c271d8003737af94b0d3a619680396fb5fdf085bf649d1ae26f",
                "md5": "cd073395cfb26fc348fe39b20b75fd0a",
                "sha256": "936cddf55bfdd7d64ba149443f1822d2cda6952f1a3db254d2bf85e752523556"
            },
            "downloads": -1,
            "filename": "tonic_reporting-1.4.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cd073395cfb26fc348fe39b20b75fd0a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 11381,
            "upload_time": "2023-02-02T23:10:53",
            "upload_time_iso_8601": "2023-02-02T23:10:53.994691Z",
            "url": "https://files.pythonhosted.org/packages/65/19/118343c15c271d8003737af94b0d3a619680396fb5fdf085bf649d1ae26f/tonic_reporting-1.4.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5f91e9a25877b3cb80289b2a7e7baf9326e67802478ddfef95b8c70accfc8ec8",
                "md5": "bac30ded2d320b343f7366373805cd20",
                "sha256": "8337b7cd61aeaf40152a2d0994a620d68240beca66e3d82cc8c71b265c928286"
            },
            "downloads": -1,
            "filename": "tonic-reporting-1.4.4.tar.gz",
            "has_sig": false,
            "md5_digest": "bac30ded2d320b343f7366373805cd20",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10006,
            "upload_time": "2023-02-02T23:10:51",
            "upload_time_iso_8601": "2023-02-02T23:10:51.208380Z",
            "url": "https://files.pythonhosted.org/packages/5f/91/e9a25877b3cb80289b2a7e7baf9326e67802478ddfef95b8c70accfc8ec8/tonic-reporting-1.4.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-02 23:10:51",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "tonic-reporting"
}
        