sdmetrics


- Name: sdmetrics
- Version: 0.18.0
- Summary: Metrics for Synthetic Data Generation Projects
- Uploaded: 2024-12-13 21:39:26
- Requires Python: <3.13,>=3.8
- License: MIT license
- Keywords: sdmetrics, SDMetrics

<div align="center">
<br/>
<p align="center">
    <i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPI Shield](https://img.shields.io/pypi/v/sdmetrics.svg)](https://pypi.python.org/pypi/sdmetrics)
[![Downloads](https://pepy.tech/badge/sdmetrics)](https://pepy.tech/project/sdmetrics)
[![Tests](https://github.com/sdv-dev/SDMetrics/workflows/Run%20Tests/badge.svg)](https://github.com/sdv-dev/SDMetrics/actions?query=workflow%3A%22Run+Tests%22+branch%3Amain)
[![Coverage Status](https://codecov.io/gh/sdv-dev/SDMetrics/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/SDMetrics)
[![Slack](https://img.shields.io/badge/Community-Slack-blue?style=plastic&logo=slack)](https://bit.ly/sdv-slack-invite)
[![Tutorial](https://img.shields.io/badge/Demo-Get%20started-orange?style=plastic&logo=googlecolab)](https://bit.ly/sdmetrics-demo)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14279167.svg)](https://doi.org/10.5281/zenodo.14279167)

<div align="left">
<br/>
<p align="center">
<a href="https://github.com/sdv-dev/SDV">
<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/SDMetrics-DataCebo.png"></img>
</a>
</p>
</div>

</div>

# Overview

The SDMetrics library evaluates synthetic data by comparing it to the real data that you're trying to mimic. It includes a variety of metrics that capture different aspects of the data, such as **quality and privacy**. It also includes reports that you can run to generate insights, visualize the data, and share results with your team.

The SDMetrics library is **model-agnostic**, meaning you can use it with any synthetic data. The library does not need to know how you created the data.

<img align="center" src="docs/images/column_comparison.png"></img>
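As an illustration of what comparing synthetic data to real data can look like for a single column, the sketch below scores how closely the category frequencies of a synthetic column match those of the real column, using 1 minus the total variation distance (1.0 means identical frequencies, 0.0 means no overlap). It is modeled on the idea behind SDMetrics' `TVComplement` metric but is plain Python, not the library's code; `tv_complement` and the sample columns are hypothetical.

```python
from collections import Counter


def tv_complement(real, synthetic):
    """Return 1 - total variation distance between two categorical columns.

    Illustrative sketch only; not the SDMetrics implementation.
    """
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    # Total variation distance: half the sum of absolute differences
    # between the relative frequencies of each category.
    tvd = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1 - tvd


# Hypothetical sample columns.
real = ['A', 'A', 'B', 'C']
synthetic = ['A', 'B', 'B', 'C']
print(tv_complement(real, synthetic))
```

Here the real column is 50% `A` while the synthetic column is 50% `B`, so the score drops below 1.0 even though every category appears in both columns.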

# Install

Install SDMetrics using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.

```bash
pip install sdmetrics
```

```bash
conda install -c conda-forge sdmetrics
```

For more information about using SDMetrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics).

# Usage

Get started with **SDMetrics Reports** using some demo data:

```python
from sdmetrics import load_demo
from sdmetrics.reports.single_table import QualityReport

real_data, synthetic_data, metadata = load_demo(modality='single_table')

my_report = QualityReport()
my_report.generate(real_data, synthetic_data, metadata)
```
```
Creating report: 100%|██████████| 4/4 [00:00<00:00,  5.22it/s]

Overall Quality Score: 82.84%

Properties:
Column Shapes: 82.78%
Column Pair Trends: 82.9%
```
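The overall score in the output above matches the average of the two property scores: (82.78 + 82.9) / 2 = 82.84. The snippet below reproduces that arithmetic, assuming the overall score is an unweighted mean of the property scores. Because the printed property scores are rounded, treat this as a consistency check rather than the library's exact computation.

```python
# Property scores as printed in the report output above (in percent).
property_scores = {
    'Column Shapes': 82.78,
    'Column Pair Trends': 82.9,
}

# Assumption: the overall score is the unweighted mean of the property scores.
overall = sum(property_scores.values()) / len(property_scores)
print(f'Overall Quality Score: {overall:.2f}%')
```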

Once you generate the report, you can drill down on the details and visualize the results.

```python
my_report.get_visualization(property_name='Column Pair Trends')
```
<img align="center" src="docs/images/column_pairs.png"></img>
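The Column Pair Trends property looks at relationships between pairs of columns. For two numerical columns, one way to score this is to compare their correlation in the real data with their correlation in the synthetic data. The sketch below assumes Pearson correlation and the score 1 - |S - R| / 2, where R and S are the real and synthetic coefficients; we believe this mirrors SDMetrics' `CorrelationSimilarity` metric, but the code is an illustration, and `pearson`, `correlation_similarity`, and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_similarity(real_a, real_b, synth_a, synth_b):
    """Sketch of a pair-trend score in [0, 1]: 1 - |S - R| / 2."""
    real_corr = pearson(real_a, real_b)
    synth_corr = pearson(synth_a, synth_b)
    return 1 - abs(synth_corr - real_corr) / 2


# Hypothetical data: the real columns move together, the synthetic ones do not.
real_a, real_b = [1, 2, 3, 4], [2, 4, 6, 8]
synth_a, synth_b = [1, 2, 3, 4], [8, 6, 4, 2]
print(correlation_similarity(real_a, real_b, synth_a, synth_b))
```

Dividing by 2 keeps the score in [0, 1], since two correlation coefficients can differ by at most 2.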

Save the report and share it with your team.
```python
my_report.save(filepath='demo_data_quality_report.pkl')

# load it at any point in the future
my_report = QualityReport.load(filepath='demo_data_quality_report.pkl')
```

**Want more metrics?** You can also manually apply any of the metrics in this library to your data.

```python
# compute the fraction of synthetic values that respect the min/max bounds
# set by the real data
from sdmetrics.single_column import BoundaryAdherence

BoundaryAdherence.compute(
    real_data['start_date'],
    synthetic_data['start_date']
)
```
```
0.8503937007874016
```
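The score is a fraction rather than a yes/no answer: a value of about 0.85 means roughly 85% of the synthetic `start_date` values fall inside the real column's min/max range. A minimal stdlib sketch of that calculation follows; skipping missing values is an assumption of the sketch, and `boundary_adherence` is a hypothetical name, not the library function.

```python
from datetime import date


def boundary_adherence(real, synthetic):
    """Fraction of synthetic values inside the real column's [min, max] range.

    Sketch only: missing values (None) are skipped in both columns, which is
    an assumption, not necessarily how SDMetrics treats them.
    """
    real_values = [v for v in real if v is not None]
    synth_values = [v for v in synthetic if v is not None]
    low, high = min(real_values), max(real_values)
    in_bounds = sum(1 for v in synth_values if low <= v <= high)
    return in_bounds / len(synth_values)


# Hypothetical date columns.
real_dates = [date(2020, 1, 1), date(2020, 6, 1), date(2020, 12, 31)]
synthetic_dates = [date(2019, 12, 1), date(2020, 3, 15), date(2020, 7, 4), None]
print(boundary_adherence(real_dates, synthetic_dates))
```

Only the first synthetic date falls outside the real range, so two of the three non-missing values adhere.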

```python
# compute the fraction of synthetic rows that are new rather than exact copies of real rows
from sdmetrics.single_table import NewRowSynthesis

NewRowSynthesis.compute(
    real_data,
    synthetic_data,
    metadata
)
```
```
1.0
```
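A score of 1.0 means no synthetic row is an exact copy of a real row. The sketch below computes that fraction with exact tuple matching over plain Python rows; the library metric works on DataFrames plus metadata and may treat numerical values with a tolerance, whereas `new_row_fraction` and the sample rows here are hypothetical.

```python
def new_row_fraction(real_rows, synthetic_rows):
    """Fraction of synthetic rows that do not exactly match any real row.

    Sketch only: rows are compared as tuples with exact equality.
    """
    real_set = {tuple(row) for row in real_rows}
    new_rows = sum(1 for row in synthetic_rows if tuple(row) not in real_set)
    return new_rows / len(synthetic_rows)


# Hypothetical tables: the first synthetic row copies a real row.
real_rows = [('Alice', 30, 'NY'), ('Bob', 25, 'SF')]
synthetic_rows = [
    ('Alice', 30, 'NY'),
    ('Carol', 41, 'LA'),
    ('Dan', 25, 'SF'),
    ('Eve', 33, 'NY'),
]
print(new_row_fraction(real_rows, synthetic_rows))
```

Using a set of real rows keeps each lookup constant-time on average, so the check scales with the number of rows rather than with their product.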

# What's next?

To learn more about the reports and metrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics). 

---


<div align="center">
<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png"></img></a>
</div>
<br/>
<br/>

[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](https://dai.lids.mit.edu/)
in 2016. After four years of research and traction with enterprises, we
created [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of SDV, the largest ecosystem for
synthetic data generation & evaluation. It is home to multiple libraries that support synthetic
data, including:

* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
* 🧠 Multiple machine learning models, ranging from Copulas to Deep Learning, to create tabular,
  multi-table and time series data.
* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
  generation models.

[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.

            

Project links:

- Changes: https://github.com/sdv-dev/SDMetrics/blob/main/HISTORY.md
- Chat: https://bit.ly/sdv-slack-invite
- Issue Tracker: https://github.com/sdv-dev/SDMetrics/issues
- Source Code: https://github.com/sdv-dev/SDMetrics
- Twitter: https://twitter.com/sdv_dev

Author: MIT Data To AI Lab <dailabmit@gmail.com>

Distribution files for version 0.18.0 (uploaded 2024-12-13):

- `sdmetrics-0.18.0-py3-none-any.whl` (wheel, py3, 179,083 bytes), SHA256 `000333c47770dcd0fe9637ec9c13b3c97026b55f3adf418814d82dc5fb6f3315`
- `sdmetrics-0.18.0.tar.gz` (source distribution, 120,478 bytes), SHA256 `09b8f36106386f71855a695e8a9b9648caa93ce3c1553b97310a229ec6a8413b`