syndat


| Field | Value |
|---|---|
| Name | syndat |
| Version | 0.10.3 |
| Home page | https://github.com/SCAI-BIO/syndat |
| Summary | A library for evaluation & visualization of synthetic data. |
| Upload time | 2025-01-08 15:31:03 |
| Author | Tim Adams |
| Requires Python | >=3.9 |
| License | MIT |
| Keywords | synthetic-data, data-quality, data-visualization |
| Requirements | pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, setuptools, shap |
            
# Syndat
![tests](https://github.com/SCAI-BIO/syndat/actions/workflows/tests.yaml/badge.svg) ![docs](https://readthedocs.org/projects/syndat/badge/?version=latest&style=flat) ![version](https://img.shields.io/github/v/release/SCAI-BIO/syndat)

Syndat is a software package that provides basic functionality for the evaluation and visualization of synthetic data. Quality scores can be computed for three base metrics (Discrimination, Correlation and Distribution), and data can be visualized to inspect correlation structures or statistical distributions.

# Installation

Install via pip:

```bash
pip install syndat
```
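
After installing, it can be useful to confirm that the interpreter meets the `>=3.9` requirement listed in the package metadata and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `python_is_supported` are illustrative and not part of syndat.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(minimum=(3, 9)) -> bool:
    """Check the running interpreter against a minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("syndat version:", installed_version("syndat"))
```

If the second line prints `None`, the package was most likely installed into a different environment than the one running the script.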

# Usage

## Quality metrics

Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, 
distribution similarity or pairwise feature correlations:

```python
import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# How similar are the statistical distributions of real and synthetic features 
distribution_similarity_score = syndat.scores.distribution(real, synthetic)

# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)

# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)
```

Each score lies in the range 0-100; a higher score indicates better data fidelity.
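
Because all three scores share the same 0-100 scale, they can be compared side by side. The sketch below shows one possible way to do that. The helper `summarize_scores` is hypothetical (not part of syndat), the unweighted mean is an assumption rather than something the library prescribes, and the numbers are placeholders, not real results.

```python
def summarize_scores(scores: dict) -> dict:
    """Validate 0-100 scores and report their mean and the weakest metric."""
    if not scores:
        raise ValueError("at least one score is required")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value!r} is outside the 0-100 range")
    weakest = min(scores, key=scores.get)
    return {
        "mean": sum(scores.values()) / len(scores),  # unweighted mean (assumption)
        "weakest_metric": weakest,
        "weakest_score": scores[weakest],
    }


# Placeholder values for illustration only, not real results.
example = summarize_scores(
    {"distribution": 80.0, "discrimination": 60.0, "correlation": 70.0}
)
print(example)
```

In practice the dictionary values would be the three variables computed in the example above; reporting the weakest metric alongside the mean keeps a single poor score from being averaged away.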

## Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

```python
import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot *all* feature distributions and correlations, and store the image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display the distribution of a single feature
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# plot a SHAP plot of the features that differentiate real and synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)
```
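
The example above writes image files to `results/plots`. Whether syndat creates that directory itself is not stated here, so a cautious script may want to create it beforehand. This is a small standard-library sketch: `ensure_plot_dir` is a hypothetical helper, and the temporary directory is used only to keep the demonstration free of side effects.

```python
import tempfile
from pathlib import Path


def ensure_plot_dir(path: str) -> Path:
    """Create the plot directory (and any parents) if missing and return it."""
    plot_dir = Path(path)
    plot_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return plot_dir


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    plot_dir = ensure_plot_dir(str(Path(tmp) / "results" / "plots"))
    created = plot_dir.is_dir()

print(created)
```

In a real script, `ensure_plot_dir("results/plots")` would be called once before plotting, and the same path passed as `store_destination`.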


## Postprocessing

Postprocess synthetic data to improve data fidelity:

```python
import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data, feeding the result of each step into the next
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
```
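
The function names suggest two ideas: keeping synthetic values within the range observed in the real data, and matching the decimal precision of the real data. The pure-Python sketch below illustrates those ideas on a single column of numbers. It is an assumption about the general technique, not syndat's implementation, and the helpers `clip_to_range`, `decimal_places` and `match_precision` are hypothetical.

```python
def clip_to_range(real_values, synthetic_values):
    """Clip each synthetic value into the [min, max] range of the real values."""
    lo, hi = min(real_values), max(real_values)
    return [min(max(v, lo), hi) for v in synthetic_values]


def decimal_places(value: float) -> int:
    """Count decimals in the shortest repr of a float (e.g. 1.25 -> 2).

    Simplification: values printed in scientific notation count as 0.
    """
    text = repr(float(value))
    if "e" in text or "." not in text:
        return 0
    return len(text.split(".")[1].rstrip("0"))


def match_precision(real_values, synthetic_values):
    """Round synthetic values to the most decimals found in the real values."""
    digits = max(decimal_places(v) for v in real_values)
    return [round(v, digits) for v in synthetic_values]


# Made-up numbers for illustration only.
real_column = [1.5, 2.25, 4.0]
synthetic_column = [0.731, 2.2468, 5.9]

clipped = clip_to_range(real_column, synthetic_column)
cleaned = match_precision(real_column, clipped)
print(cleaned)
```

Note that the second step operates on the output of the first, which is the same chaining pattern a multi-step postprocessing pipeline needs.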


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SCAI-BIO/syndat",
    "name": "syndat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "synthetic-data, data-quality, data-visualization",
    "author": "Tim Adams",
    "author_email": "tim.adams@scai.fraunhofer.de",
    "download_url": "https://files.pythonhosted.org/packages/67/70/64a7cda51155ef1c0227b30e84bcb4965df64480d8fb9db7e11f03c8eeec/syndat-0.10.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library for evaluation & visualization of synthetic data.",
    "version": "0.10.3",
    "project_urls": {
        "Documentation": "https://github.com/SCAI-BIO/syndat#readme",
        "Homepage": "https://github.com/SCAI-BIO/syndat",
        "Source": "https://github.com/SCAI-BIO/syndat",
        "Tracker": "https://github.com/SCAI-BIO/syndat/issues"
    },
    "split_keywords": [
        "synthetic-data",
        " data-quality",
        " data-visualization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f427c54e325da48fb8fd7391945a4da9fff3802f29ab36d84ab613084f3207be",
                "md5": "efb587b5c0abea81df942b721fda3b66",
                "sha256": "9bd6c930701361fbde20d9493c6483db5fc71142f43fa2371cc8a18831108cd8"
            },
            "downloads": -1,
            "filename": "syndat-0.10.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "efb587b5c0abea81df942b721fda3b66",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12062,
            "upload_time": "2025-01-08T15:31:01",
            "upload_time_iso_8601": "2025-01-08T15:31:01.204309Z",
            "url": "https://files.pythonhosted.org/packages/f4/27/c54e325da48fb8fd7391945a4da9fff3802f29ab36d84ab613084f3207be/syndat-0.10.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "677064a7cda51155ef1c0227b30e84bcb4965df64480d8fb9db7e11f03c8eeec",
                "md5": "d6263635c344dd77798c8ab54f669f82",
                "sha256": "d12fab562b1b93bd269e70f15a85f5666526a6bed05d43f9a7b78cbd8262e2ec"
            },
            "downloads": -1,
            "filename": "syndat-0.10.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d6263635c344dd77798c8ab54f669f82",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 14259,
            "upload_time": "2025-01-08T15:31:03",
            "upload_time_iso_8601": "2025-01-08T15:31:03.888538Z",
            "url": "https://files.pythonhosted.org/packages/67/70/64a7cda51155ef1c0227b30e84bcb4965df64480d8fb9db7e11f03c8eeec/syndat-0.10.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-08 15:31:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SCAI-BIO",
    "github_project": "syndat",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    "~=",
                    "2.1.4"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "~=",
                    "1.26.2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "~=",
                    "1.11.4"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "~=",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "~=",
                    "3.8.2"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    "~=",
                    "0.13.0"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "70.0.0"
                ]
            ]
        },
        {
            "name": "shap",
            "specs": [
                [
                    "~=",
                    "0.42.0"
                ]
            ]
        }
    ],
    "lcname": "syndat"
}
        