Name | sdmetrics |
Version | 0.18.0 |
home_page | None |
Summary | Metrics for Synthetic Data Generation Projects |
upload_time | 2024-12-13 21:39:26 |
maintainer | None |
docs_url | None |
author | None |
requires_python | <3.13,>=3.8 |
license | MIT license |
keywords | sdmetrics |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<div align="center">
<br/>
<p align="center">
<i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
</p>
[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPI Shield](https://img.shields.io/pypi/v/sdmetrics.svg)](https://pypi.python.org/pypi/sdmetrics)
[![Downloads](https://pepy.tech/badge/sdmetrics)](https://pepy.tech/project/sdmetrics)
[![Tests](https://github.com/sdv-dev/SDMetrics/workflows/Run%20Tests/badge.svg)](https://github.com/sdv-dev/SDMetrics/actions?query=workflow%3A%22Run+Tests%22+branch%3Amain)
[![Coverage Status](https://codecov.io/gh/sdv-dev/SDMetrics/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/SDMetrics)
[![Slack](https://img.shields.io/badge/Community-Slack-blue?style=plastic&logo=slack)](https://bit.ly/sdv-slack-invite)
[![Tutorial](https://img.shields.io/badge/Demo-Get%20started-orange?style=plastic&logo=googlecolab)](https://bit.ly/sdmetrics-demo)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14279167.svg)](https://doi.org/10.5281/zenodo.14279167)
<div align="left">
<br/>
<p align="center">
<a href="https://github.com/sdv-dev/SDV">
<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/SDMetrics-DataCebo.png"></img>
</a>
</p>
</div>
</div>
# Overview
The SDMetrics library evaluates synthetic data by comparing it to the real data that you're trying to mimic. It includes a variety of metrics to capture different aspects of the data, for example **quality and privacy**. It also includes reports that you can run to generate insights, visualize data and share with your team.
The SDMetrics library is **model-agnostic**, meaning you can use any synthetic data. The library does not need to know how you created the data.
<img align="center" src="docs/images/column_comparison.png"></img>
# Install
Install SDMetrics using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.
```bash
pip install sdmetrics
```
```bash
conda install -c conda-forge sdmetrics
```
For more information about using SDMetrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics).
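Before installing, it can help to confirm that the interpreter in your virtual environment is one the package declares support for. The following is a minimal sketch using only the standard library; the bounds are taken from the `requires_python` field of the package metadata (`<3.13,>=3.8`), and the helper name `python_is_supported` is hypothetical, not part of SDMetrics.

```python
import sys

# Bounds copied from the package metadata: requires_python is "<3.13,>=3.8".
MIN_PYTHON = (3, 8)    # inclusive lower bound
MAX_PYTHON = (3, 13)   # exclusive upper bound


def python_is_supported(version=None):
    """Return True if the (major, minor) version falls in the declared range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON


# Check the interpreter that is running this script.
print(python_is_supported())
```

If this prints `False`, pip will likely refuse to install this release, so switching to a supported interpreter first saves a failed install.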
# Usage
Get started with **SDMetrics Reports** using some demo data:
```python
from sdmetrics import load_demo
from sdmetrics.reports.single_table import QualityReport
real_data, synthetic_data, metadata = load_demo(modality='single_table')
my_report = QualityReport()
my_report.generate(real_data, synthetic_data, metadata)
```
```
Creating report: 100%|██████████| 4/4 [00:00<00:00, 5.22it/s]
Overall Quality Score: 82.84%
Properties:
Column Shapes: 82.78%
Column Pair Trends: 82.9%
```
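The overall score printed above matches a plain average of the two property scores. A small sketch of that arithmetic, using only the numbers from the output; the averaging rule is an assumption inferred from these figures, and the variable names are illustrative.

```python
# Property scores copied from the report output above, in percent.
property_scores = {
    "Column Shapes": 82.78,
    "Column Pair Trends": 82.9,
}

# Assumption: the overall score is the unweighted mean of the property scores.
overall = sum(property_scores.values()) / len(property_scores)
print(f"Overall Quality Score: {overall:.2f}%")
```

Reading the score this way makes it clear that a low result on either property pulls the overall number down by half of the difference.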
Once you generate the report, you can drill down on the details and visualize the results.
```python
my_report.get_visualization(property_name='Column Pair Trends')
```
<img align="center" src="docs/images/column_pairs.png"></img>
Save the report and share it with your team.
```python
my_report.save(filepath='demo_data_quality_report.pkl')
# load it at any point in the future
my_report = QualityReport.load(filepath='demo_data_quality_report.pkl')
```
**Want more metrics?** You can also manually apply any of the metrics in this library to your data.
```python
# calculate whether the synthetic data respects the min/max bounds
# set by the real data
from sdmetrics.single_column import BoundaryAdherence
BoundaryAdherence.compute(
    real_data['start_date'],
    synthetic_data['start_date']
)
```
```
0.8503937007874016
```
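To make the idea behind the boundary check concrete, here is a minimal pure-Python sketch of the description in the comment above: the fraction of synthetic values that fall within the min/max range of the real values. It is not the library's implementation; the function name `boundary_adherence` is hypothetical, and skipping missing values is an assumption made for this sketch.

```python
def boundary_adherence(real_values, synthetic_values):
    """Fraction of synthetic values inside the [min, max] range of the real values.

    Illustrative only. Assumption: missing values (None) are ignored.
    """
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    if not real or not synthetic:
        raise ValueError("Both columns need at least one non-missing value.")
    low, high = min(real), max(real)
    # Count the synthetic values that respect both bounds (inclusive).
    in_bounds = sum(low <= v <= high for v in synthetic)
    return in_bounds / len(synthetic)


# Toy data (not the demo dataset): 3 of the 4 synthetic values fall within [10, 30].
print(boundary_adherence([10, 20, 30], [5, 10, 25, 30]))  # 0.75
```

A score of 1.0 means every synthetic value stays inside the observed range, and lower scores indicate values outside it.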
```python
# calculate whether the synthetic data is new or whether it's an exact copy of the real data
from sdmetrics.single_table import NewRowSynthesis
NewRowSynthesis.compute(
    real_data,
    synthetic_data,
    metadata
)
```
```
1.0
```
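The new-row check can be pictured the same way. The sketch below computes the fraction of synthetic rows that do not exactly match any real row. It is a simplification under stated assumptions: rows are plain tuples, matching is exact equality, and the name `new_row_fraction` is hypothetical. The library's own matching rules (for example, for numerical columns) may differ.

```python
def new_row_fraction(real_rows, synthetic_rows):
    """Fraction of synthetic rows that do not exactly match any real row.

    Illustrative only. Assumption: a row is "copied" only on exact equality.
    """
    if not synthetic_rows:
        raise ValueError("synthetic_rows must not be empty.")
    # A set of tuples gives constant-time membership checks.
    real_set = {tuple(row) for row in real_rows}
    new_rows = sum(tuple(row) not in real_set for row in synthetic_rows)
    return new_rows / len(synthetic_rows)


# Toy data (not the demo dataset): one synthetic row copies a real row.
real = [("alice", 30), ("bob", 41)]
synthetic = [("alice", 30), ("carol", 29), ("dave", 52), ("erin", 41)]
print(new_row_fraction(real, synthetic))  # 0.75
```

Under this reading, the `1.0` shown above would mean that none of the synthetic rows duplicates a real row.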
# What's next?
To learn more about the reports and metrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics).
---
<div align="center">
<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png"></img></a>
</div>
<br/>
<br/>
[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](https://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we
created [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of SDV, the largest ecosystem for
synthetic data generation & evaluation. It is home to multiple libraries that support synthetic
data, including:
* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
multi-table and time series data.
* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
generation models.
[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.
Raw data
{
"_id": null,
"home_page": null,
"name": "sdmetrics",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.8",
"maintainer_email": null,
"keywords": "sdmetrics, sdmetrics, SDMetrics",
"author": null,
"author_email": "MIT Data To AI Lab <dailabmit@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/94/93/61270aeeff7fd580db64c837e81b00e768234e429ae274e95af16b9227d0/sdmetrics-0.18.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Metrics for Synthetic Data Generation Projects",
"version": "0.18.0",
"project_urls": {
"Changes": "https://github.com/sdv-dev/SDMetrics/blob/main/HISTORY.md",
"Chat": "https://bit.ly/sdv-slack-invite",
"Issue Tracker": "https://github.com/sdv-dev/SDMetrics/issues",
"Source Code": "https://github.com/sdv-dev/SDMetrics",
"Twitter": "https://twitter.com/sdv_dev"
},
"split_keywords": [
"sdmetrics",
" sdmetrics",
" sdmetrics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2416120508b9955181a55cebe6729989a1332c15e4458e86d976e64b895766f9",
"md5": "00b3e488306989bd17d5d9ae45963044",
"sha256": "000333c47770dcd0fe9637ec9c13b3c97026b55f3adf418814d82dc5fb6f3315"
},
"downloads": -1,
"filename": "sdmetrics-0.18.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "00b3e488306989bd17d5d9ae45963044",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.8",
"size": 179083,
"upload_time": "2024-12-13T21:39:23",
"upload_time_iso_8601": "2024-12-13T21:39:23.693739Z",
"url": "https://files.pythonhosted.org/packages/24/16/120508b9955181a55cebe6729989a1332c15e4458e86d976e64b895766f9/sdmetrics-0.18.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "949361270aeeff7fd580db64c837e81b00e768234e429ae274e95af16b9227d0",
"md5": "cc5cc2d9ed58be5ad6e9f309b05d1dc3",
"sha256": "09b8f36106386f71855a695e8a9b9648caa93ce3c1553b97310a229ec6a8413b"
},
"downloads": -1,
"filename": "sdmetrics-0.18.0.tar.gz",
"has_sig": false,
"md5_digest": "cc5cc2d9ed58be5ad6e9f309b05d1dc3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.8",
"size": 120478,
"upload_time": "2024-12-13T21:39:26",
"upload_time_iso_8601": "2024-12-13T21:39:26.126881Z",
"url": "https://files.pythonhosted.org/packages/94/93/61270aeeff7fd580db64c837e81b00e768234e429ae274e95af16b9227d0/sdmetrics-0.18.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-13 21:39:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sdv-dev",
"github_project": "SDMetrics",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "sdmetrics"
}