sdgym


* **Name**: sdgym
* **Version**: 0.7.0
* **Home page**: https://github.com/sdv-dev/SDGym
* **Summary**: Benchmark tabular synthetic data generators using a variety of datasets
* **Upload time**: 2023-06-14 18:20:23
* **Author**: DataCebo, Inc.
* **Requires Python**: >=3.7,<3.11
* **License**: BSL-1.1
* **Keywords**: machine learning synthetic data generation benchmark generative models

<div align="center">
<br/>
<p align="center">
    <i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![Travis](https://travis-ci.org/sdv-dev/SDGym.svg?branch=master)](https://travis-ci.org/sdv-dev/SDGym)
[![PyPi Shield](https://img.shields.io/pypi/v/sdgym.svg)](https://pypi.python.org/pypi/sdgym)
[![Downloads](https://pepy.tech/badge/sdgym)](https://pepy.tech/project/sdgym)
[![Slack](https://img.shields.io/badge/Community-Slack-blue?style=plastic&logo=slack)](https://bit.ly/sdv-slack-invite)

<div align="left">
<br/>
<p align="center">
<a href="https://github.com/sdv-dev/SDGym">
<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/SDGym-DataCebo.png"></img>
</a>
</p>
</div>

</div>

# Overview

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating
synthetic data. Measure performance and memory usage across different synthetic data modeling
techniques – classical statistics, deep learning and more!

<img align="center" src="docs/images/SDGym_Results.png"></img>

The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its
synthesizers, datasets or metrics for benchmarking. You can also customize the process to include
your own work.

* **Datasets**: Select any of the publicly available datasets from the SDV project, or input your own data.
* **Synthesizers**: Choose from any of the SDV synthesizers and baselines. Or write your own custom
machine learning model.
* **Evaluation**: In addition to performance and memory usage, you can also measure synthetic data
quality and privacy through a variety of metrics.

# Install

Install SDGym using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.

```bash
pip install sdgym
```

```bash
conda install -c pytorch -c conda-forge sdgym
```

For more information about using SDGym, visit the [SDGym Documentation](https://docs.sdv.dev/sdgym).

# Usage

Let's benchmark synthetic data generation for single tables. First, define which modeling
techniques to use: a few synthesizers from the SDV library and a few others
to serve as baselines.

```python
# these synthesizers come from the SDV library
# each one uses different modeling techniques
sdv_synthesizers = ['GaussianCopulaSynthesizer', 'CTGANSynthesizer']

# these basic synthesizers are available in SDGym
# as baselines
baseline_synthesizers = ['UniformSynthesizer']
```

Now, we can benchmark the different techniques:
```python
import sdgym

sdgym.benchmark_single_table(
    synthesizers=(sdv_synthesizers + baseline_synthesizers)
)
```

The result is a detailed performance, memory and quality evaluation across the synthesizers
on a variety of publicly available datasets.
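
To make the shape of that result more concrete, here is a minimal sketch of how per-synthesizer summaries could be computed from benchmark output. It uses only the standard library and a handful of placeholder rows. The column names (`Synthesizer`, `Dataset`, `Train_Time`, `Quality_Score`) and every number are illustrative assumptions, not real benchmark results; the actual output is assumed to contain one row per synthesizer and dataset pair.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows standing in for benchmark output.
# Column names and values are assumptions for illustration only.
example_rows = [
    {'Synthesizer': 'GaussianCopulaSynthesizer', 'Dataset': 'adult',
     'Train_Time': 4.0, 'Quality_Score': 0.75},
    {'Synthesizer': 'GaussianCopulaSynthesizer', 'Dataset': 'alarm',
     'Train_Time': 6.0, 'Quality_Score': 0.5},
    {'Synthesizer': 'UniformSynthesizer', 'Dataset': 'adult',
     'Train_Time': 1.0, 'Quality_Score': 0.5},
    {'Synthesizer': 'UniformSynthesizer', 'Dataset': 'alarm',
     'Train_Time': 1.0, 'Quality_Score': 0.25},
]


def summarize_by_synthesizer(rows):
    """Average quality score and total train time for each synthesizer."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row['Synthesizer']].append(row)

    summary = {}
    for name, group in grouped.items():
        summary[name] = {
            'mean_quality': mean(r['Quality_Score'] for r in group),
            'total_train_time': sum(r['Train_Time'] for r in group),
        }
    return summary


summary = summarize_by_synthesizer(example_rows)
for name, stats in summary.items():
    print(name, stats)
```

Grouping by synthesizer first keeps the comparison fair: each technique is judged on the same set of datasets rather than on individual runs.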

## Supplying a custom synthesizer

Benchmark your own synthetic data generation techniques. Define your synthesizer by
specifying the training logic (using machine learning) and the sampling logic.

```python
def my_training_logic(data, metadata):
    # create an object to represent your synthesizer
    # and train it using the data
    synthesizer = ...  # placeholder: replace with your trained model
    return synthesizer

def my_sampling_logic(trained_synthesizer, num_rows):
    # use the trained synthesizer to create
    # num_rows of synthetic data
    synthetic_data = ...  # placeholder: replace with your sampled rows
    return synthetic_data
```

Learn more in the [Custom Synthesizers Guide](https://docs.sdv.dev/sdgym/customization/synthesizers/custom-synthesizers).
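
As a deliberately simple illustration of the two functions above, the sketch below "trains" by remembering the values seen in each column and samples each column independently from that pool. It is a toy baseline, not a recommended model. It assumes `data` can be iterated for column names and indexed by column (true of a dict of lists and of a pandas DataFrame), and it returns a plain dict of lists; a real benchmark run would likely need the sample converted to a DataFrame. The `metadata` argument is accepted but unused, and `toy_data` is hypothetical.

```python
import random


def my_training_logic(data, metadata):
    # "Training" here just stores the observed values of each column.
    # metadata is unused in this toy example.
    return {column: list(data[column]) for column in data}


def my_sampling_logic(trained_synthesizer, num_rows):
    # Draw each column independently, with replacement,
    # from the values stored during training.
    return {
        column: [random.choice(values) for _ in range(num_rows)]
        for column, values in trained_synthesizer.items()
    }


# Hypothetical toy table, expressed as a dict of columns.
toy_data = {'age': [23, 35, 41], 'city': ['Lima', 'Oslo', 'Pune']}

synthesizer = my_training_logic(toy_data, metadata=None)
synthetic_data = my_sampling_logic(synthesizer, num_rows=5)
print(synthetic_data)
```

Because columns are sampled independently, any correlation between columns is lost, which is what makes this kind of synthesizer useful mainly as a baseline.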

## Customizing your datasets

The SDGym library includes many publicly available datasets that you can use right away.
List them with the `get_available_datasets` function.

```python
import sdgym

sdgym.get_available_datasets()
```

```
dataset_name   size_MB     num_tables
KRK_v1         0.072128    1
adult          3.907448    1
alarm          4.520128    1
asia           1.280128    1
...
```
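
When a benchmark needs to stay fast, it can help to pick only the small datasets from that listing. The sketch below filters the four rows shown above by size using plain Python. The 2 MB budget and the `select_small_datasets` helper are hypothetical, and the rows are copied by hand rather than fetched from SDGym.

```python
# Rows copied from the example listing above: (dataset_name, size_MB, num_tables).
listed_datasets = [
    ('KRK_v1', 0.072128, 1),
    ('adult', 3.907448, 1),
    ('alarm', 4.520128, 1),
    ('asia', 1.280128, 1),
]


def select_small_datasets(rows, max_size_mb):
    """Return names of single-table datasets no larger than max_size_mb."""
    return [
        name
        for name, size_mb, num_tables in rows
        if num_tables == 1 and size_mb <= max_size_mb
    ]


small = select_small_datasets(listed_datasets, max_size_mb=2.0)
print(small)  # ['KRK_v1', 'asia']
```

If the listing comes back as a pandas DataFrame, the same selection could presumably be written as a boolean filter on the `size_MB` column.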

You can also include any custom, private datasets that are stored on your computer or in an
Amazon S3 bucket.

```python
my_datasets_folder = 's3://my-datasets-bucket'
```

For more information, see the docs for [Customized Datasets](https://docs.sdv.dev/sdgym/customization/datasets).
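
Since a datasets folder may be given either as a local path or as an S3 location, a small check before launching a long benchmark can catch typos early. The helper below is a hypothetical sketch using only the standard library. It treats anything with an `s3` URL scheme as a bucket and everything else as a local path, and it does not contact AWS or touch the filesystem.

```python
from urllib.parse import urlparse


def describe_datasets_location(location):
    """Classify a datasets folder string as an S3 bucket or a local path."""
    parsed = urlparse(location)
    if parsed.scheme == 's3':
        if not parsed.netloc:
            raise ValueError(f'No bucket name found in {location!r}')
        # netloc is the bucket; path is an optional key prefix inside it.
        return {
            'kind': 's3',
            'bucket': parsed.netloc,
            'prefix': parsed.path.lstrip('/'),
        }
    return {'kind': 'local', 'path': location}


print(describe_datasets_location('s3://my-datasets-bucket'))
print(describe_datasets_location('my_datasets/'))
```

Failing early on a malformed location is cheaper than discovering the problem after the first synthesizer has already trained.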

# What's next?

Visit the [SDGym Documentation](https://docs.sdv.dev/sdgym) to learn more!

---


<div align="center">
<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/DataCebo.png"></img></a>
</div>
<br/>
<br/>

[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's
[Data to AI Lab](https://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with
enterprise, we created [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of SDV, the largest ecosystem for
synthetic data generation & evaluation. It is home to multiple libraries that support synthetic
data, including:

* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
  multi-table and time series data.
* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
  generation models.

[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.


# History

## v0.7.0 - 2023-06-13

This release adds support for SDV 1.0 and PyTorch 2.0!

### New Features

* Add functions to top level import - Issue [#229](https://github.com/sdv-dev/SDGym/issues/229) by @fealho
* Cleanup SDGym to the new SDV 1.0 metadata and synthesizers - Issue [#212](https://github.com/sdv-dev/SDGym/issues/212) by @fealho

### Bugs Fixed

* limit_dataset_size causes sdgym to crash - Issue [#231](https://github.com/sdv-dev/SDGym/issues/231) by @fealho
* benchmark_single_table crashes with metadata dict - Issue [#232](https://github.com/sdv-dev/SDGym/issues/232) by @fealho
* Passing None as synthesizers runs all of them - Issue [#233](https://github.com/sdv-dev/SDGym/issues/233) by @fealho
* timeout parameter causes sdgym to crash - Issue [#234](https://github.com/sdv-dev/SDGym/issues/234) by @pvk-developer
* SDGym is not working with latest torch - Issue [#210](https://github.com/sdv-dev/SDGym/issues/210) by @amontanez24
* Fix sdgym --help - Issue [#206](https://github.com/sdv-dev/SDGym/issues/206) by @katxiao

### Internal

* Increase code style lint - Issue [#123](https://github.com/sdv-dev/SDGym/issues/123) by @fealho
* Remove code support for synthesizers that are not strings/classes - PR [#236](https://github.com/sdv-dev/SDGym/pull/236) by @fealho
* Code Refactoring - Issue [#215](https://github.com/sdv-dev/SDGym/issues/215) by @fealho

### Maintenance

* Remove pomegranate - Issue [#230](https://github.com/sdv-dev/SDGym/issues/230) by @amontanez24

## v0.6.0 - 2023-02-01
This release introduces methods for benchmarking single table data and creating custom synthesizers, which can be based on existing SDGym-defined synthesizers or on user-defined functions. This release also adds support for Python 3.10 and drops support for Python 3.6.

### New Features
* Benchmarking progress bar should update on one line - Issue [#204](https://github.com/sdv-dev/SDGym/issues/204) by @katxiao
* Support local additional datasets folder with zip files - Issue [#186](https://github.com/sdv-dev/SDGym/issues/186) by @katxiao
* Enforce that each synthesizer is unique in benchmark_single_table - Issue [#190](https://github.com/sdv-dev/SDGym/issues/190) by @katxiao
* Simplify the file names inside the detailed_results_folder - Issue [#191](https://github.com/sdv-dev/SDGym/issues/191) by @katxiao
* Use SDMetrics silent report generation - Issue [#179](https://github.com/sdv-dev/SDGym/issues/179) by @katxiao
* Remove arguments in get_available_datasets - Issue [#197](https://github.com/sdv-dev/SDGym/issues/197) by @katxiao
* Accept metadata.json as valid metadata file - Issue [#194](https://github.com/sdv-dev/SDGym/issues/194) by @katxiao
* Check if file or folder exists before writing benchmarking results - Issue [#196](https://github.com/sdv-dev/SDGym/issues/196) by @katxiao
* Rename benchmarking argument "evaluate_quality" to "compute_quality_score" - Issue [#195](https://github.com/sdv-dev/SDGym/issues/195) by @katxiao
* Add option to disable sdmetrics in benchmarking - Issue [#182](https://github.com/sdv-dev/SDGym/issues/182) by @katxiao
* Prefix remote bucket with 's3' - Issue [#183](https://github.com/sdv-dev/SDGym/issues/183) by @katxiao
* Benchmarking error handling - Issue [#177](https://github.com/sdv-dev/SDGym/issues/177) by @katxiao
* Allow users to specify custom synthesizers' display names - Issue [#174](https://github.com/sdv-dev/SDGym/issues/174) by @katxiao
* Update benchmarking results columns - Issue [#172](https://github.com/sdv-dev/SDGym/issues/172) by @katxiao
* Allow custom datasets - Issue [#166](https://github.com/sdv-dev/SDGym/issues/166) by @katxiao
* Use new datasets s3 bucket - Issue [#161](https://github.com/sdv-dev/SDGym/issues/161) by @katxiao
* Create benchmark_single_table method - Issue [#151](https://github.com/sdv-dev/SDGym/issues/151) by @katxiao
* Update summary metrics - Issue [#134](https://github.com/sdv-dev/SDGym/issues/134) by @katxiao
* Benchmark individual methods - Issue [#159](https://github.com/sdv-dev/SDGym/issues/159) by @katxiao
* Add method to create a sdv variant synthesizer - Issue [#152](https://github.com/sdv-dev/SDGym/issues/152) by @katxiao
* Add method to generate a multi table synthesizer - Issue [#149](https://github.com/sdv-dev/SDGym/issues/149) by @katxiao
* Add method to create single table synthesizers - Issue [#148](https://github.com/sdv-dev/SDGym/issues/148) by @katxiao
* Updating existing synthesizers to new API - Issue [#154](https://github.com/sdv-dev/SDGym/issues/154) by @katxiao

### Bug Fixes
* Pip encounters dependency issues with ipython - Issue [#187](https://github.com/sdv-dev/SDGym/issues/187) by @katxiao
* IndependentSynthesizer is printing out ConvergeWarning too many times - Issue [#192](https://github.com/sdv-dev/SDGym/issues/192) by @katxiao
* Size values in benchmarking results seems inaccurate - Issue [#184](https://github.com/sdv-dev/SDGym/issues/184) by @katxiao
* Import error in the example for benchmarking the synthesizers - Issue [#139](https://github.com/sdv-dev/SDGym/issues/139) by @katxiao
* Updates and bugfixes - Issue [#132](https://github.com/sdv-dev/SDGym/issues/132) by @csala

### Maintenance
* Update README - Issue [#203](https://github.com/sdv-dev/SDGym/issues/203) by @katxiao
* Support Python Versions >=3.7 and <3.11 - Issue [#170](https://github.com/sdv-dev/SDGym/issues/170) by @katxiao
* SDGym Package Maintenance Updates documentation - Issue [#163](https://github.com/sdv-dev/SDGym/issues/163) by @katxiao
* Remove YData - Issue [#168](https://github.com/sdv-dev/SDGym/issues/168) by @katxiao
* Update to newest SDV - Issue [#157](https://github.com/sdv-dev/SDGym/issues/157) by @katxiao
* Update slack invite link. - Issue [#144](https://github.com/sdv-dev/SDGym/issues/144) by @pvk-developer
* updating workflows to work with windows - Issue [#136](https://github.com/sdv-dev/SDGym/issues/136) by @amontanez24
* Update conda dependencies - Issue [#130](https://github.com/sdv-dev/SDGym/issues/130) by @katxiao

## v0.5.0 - 2021-12-13
This release adds support for Python 3.9, and updates dependencies to accept the latest versions when possible.

### Issues closed

* Add support for Python 3.9 - [Issue #127](https://github.com/sdv-dev/SDGym/issues/127) by @katxiao
* Add pip check workflow - [Issue #124](https://github.com/sdv-dev/SDGym/issues/124) by @pvk-developer
* Fix meta.yaml dependencies - [PR #119](https://github.com/sdv-dev/SDGym/pull/119) by @fealho
* Upgrade dependency ranges - [Issue #118](https://github.com/sdv-dev/SDGym/issues/118) by @katxiao

## v0.4.1 - 2021-08-20
This release fixes a bug where passing a `json` file as configuration for a multi-table synthesizer crashed the model.
It also adds a number of fixes and enhancements, including: (1) a function and CLI command to list the available synthesizer names,
(2) a curated set of dependencies, with `Gretel` made an optional dependency, (3) an update to `Gretel` to use temp directories,
(4) use of `nvidia-smi` to get the number of GPUs and (5) multiple `dockerfile` updates to improve functionality.

### Issues closed

* Bug when using JSON configuration for multiple multi-table evaluation - [Issue #115](https://github.com/sdv-dev/SDGym/issues/115) by @pvk-developer
* Use nvidia-smi to get number of gpus - [PR #113](https://github.com/sdv-dev/SDGym/issues/113) by @katxiao
* List synthesizer names - [Issue #82](https://github.com/sdv-dev/SDGym/issues/82) by @fealho
* Use nvidia base for dockerfile - [PR #108](https://github.com/sdv-dev/SDGym/issues/108) by @katxiao
* Add Makefile target to install gretel and ydata - [PR #107](https://github.com/sdv-dev/SDGym/issues/107) by @katxiao
* Curate dependencies and make Gretel optional - [PR #106](https://github.com/sdv-dev/SDGym/issues/106) by @csala
* Update gretel checkpoints to use temp directory - [PR #105](https://github.com/sdv-dev/SDGym/issues/105) by @katxiao
* Initialize variable before reference - [PR #104](https://github.com/sdv-dev/SDGym/issues/104) by @katxiao

## v0.4.0 - 2021-06-17

This release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym.
It also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate
metrics, and adds the normalized score to the benchmark results.

### New Features

* Add normalized score to benchmark results - [Issue #102](https://github.com/sdv-dev/SDGym/issues/102) by @katxiao
* Add max rows and max columns args - [Issue #96](https://github.com/sdv-dev/SDGym/issues/96) by @katxiao
* Automatically detect number of workers - [Issue #97](https://github.com/sdv-dev/SDGym/issues/97) by @katxiao
* Add summary function and command - [Issue #92](https://github.com/sdv-dev/SDGym/issues/92) by @amontanez24
* Allow jobs list/JSON to be passed - [Issue #93](https://github.com/sdv-dev/SDGym/issues/93) by @fealho
* Add ydata to sdgym - [Issue #90](https://github.com/sdv-dev/SDGym/issues/90) by @fealho
* Add dockerfile for sdgym - [Issue #88](https://github.com/sdv-dev/SDGym/issues/88) by @katxiao
* Add Gretel to SDGym synthesizer - [Issue #87](https://github.com/sdv-dev/SDGym/issues/87) by @amontanez24

## v0.3.1 - 2021-05-20

This release adds new features to store results and cache contents into an S3 bucket
as well as a script to collect results from a cache dir and compile a single results
CSV file.

### Issues closed

* Collect cached results from s3 bucket - [Issue #85](https://github.com/sdv-dev/SDGym/issues/85) by @katxiao
* Store cache contents into an S3 bucket - [Issue #81](https://github.com/sdv-dev/SDGym/issues/81) by @katxiao
* Store SDGym results into an S3 bucket - [Issue #80](https://github.com/sdv-dev/SDGym/issues/80) by @katxiao
* Add a way to collect cached results - [Issue #79](https://github.com/sdv-dev/SDGym/issues/79) by @katxiao
* Allow reading datasets from private s3 bucket - [Issue #74](https://github.com/sdv-dev/SDGym/issues/74) by @katxiao
* Typos in the sdgym.run function docstring documentation - [Issue #69](https://github.com/sdv-dev/SDGym/issues/69) by @sbrugman

## v0.3.0 - 2021-01-27

Major rework of the SDGym functionality to support a collection of new features:

* Add relational and timeseries model benchmarking.
* Use SDMetrics for model scoring.
* Update datasets format to match SDV metadata based storage format.
* Centralize default datasets collection in the `sdv-datasets` S3 bucket.
* Add options to download and use datasets from different S3 buckets.
* Rename synthesizers to baselines and adapt to the new metadata format.
* Add model execution and metric computation time logging.
* Add optional synthetic data and error traceback caching.

## v0.2.2 - 2020-10-17

This version adds a rework of the benchmark function and a few new synthesizers.

### New Features

* New CLI with `run`, `make-leaderboard` and `make-summary` commands
* Parallel execution via Dask or Multiprocessing
* Download datasets without executing the benchmark
* Support for Python 3.6 to 3.8

### New Synthesizers

* `sdv.tabular.CTGAN`
* `sdv.tabular.CopulaGAN`
* `sdv.tabular.GaussianCopulaOneHot`
* `sdv.tabular.GaussianCopulaCategorical`
* `sdv.tabular.GaussianCopulaCategoricalFuzzy`

## v0.2.1 - 2020-05-12

New updated leaderboard and minor improvements.

### New Features

* Add parameters for PrivBNSynthesizer - [Issue #37](https://github.com/sdv-dev/SDGym/issues/37) by @csala

## v0.2.0 - 2020-04-10

New Benchmark API and lots of improved documentation.

### New Features

* The benchmark function now returns a complete leaderboard instead of only one score
* Class Synthesizers can be directly passed to the benchmark function

### Bug Fixes

* One hot encoding errors in the Independent, VEEGAN and Medgan Synthesizers.
* Proper usage of the `eval` mode during sampling.
* Fix improperly configured datasets.

## v0.1.0 - 2019-08-07

First release to PyPI.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/sdv-dev/SDGym",
    "name": "sdgym",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<3.11",
    "maintainer_email": "",
    "keywords": "machine learning synthetic data generation benchmark generative models",
    "author": "DataCebo, Inc.",
    "author_email": "info@sdv.dev",
    "download_url": "https://files.pythonhosted.org/packages/f4/96/dce97c0aae0c2a7c90902172b346df2d39b0aca3a47024e25ae311c534cf/sdgym-0.7.0.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n<br/>\n<p align=\"center\">\n    <i>This repository is part of <a href=\"https://sdv.dev\">The Synthetic Data Vault Project</a>, a project from <a href=\"https://datacebo.com\">DataCebo</a>.</i>\n</p>\n\n[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n[![Travis](https://travis-ci.org/sdv-dev/SDGym.svg?branch=master)](https://travis-ci.org/sdv-dev/SDGym)\n[![PyPi Shield](https://img.shields.io/pypi/v/sdgym.svg)](https://pypi.python.org/pypi/sdgym)\n[![Downloads](https://pepy.tech/badge/sdgym)](https://pepy.tech/project/sdgym)\n[![Slack](https://img.shields.io/badge/Community-Slack-blue?style=plastic&logo=slack)](https://bit.ly/sdv-slack-invite)\n\n<div align=\"left\">\n<br/>\n<p align=\"center\">\n<a href=\"https://github.com/sdv-dev/SDGym\">\n<img align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/master/docs/images/SDGym-DataCebo.png\"></img>\n</a>\n</p>\n</div>\n\n</div>\n\n# Overview\n\nThe Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating\nsynthetic data. Measure performance and memory usage across different synthetic data modeling\ntechniques \u2013 classical statistics, deep learning and more!\n\n<img align=\"center\" src=\"docs/images/SDGym_Results.png\"></img>\n\nThe SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its\nsynthesizers, datasets or metrics for benchmarking. You can also customize the process to include\nyour own work.\n\n* **Datasets**: Select any of the publicly available datasets from the SDV project, or input your own data.\n* **Synthesizers**: Choose from any of the SDV synthesizers and baselines. 
Or write your own custom\nmachine learning model.\n* **Evaluation**: In addition to performance and memory usage, you can also measure synthetic data\nquality and privacy through a variety of metrics.\n\n# Install\n\nInstall SDGym using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.\n\n```bash\npip install sdgym\n```\n\n```bash\nconda install -c pytorch -c conda-forge sdgym\n```\n\nFor more information about using SDGym, visit the [SDGym Documentation](https://docs.sdv.dev/sdgym).\n\n# Usage\n\nLet's benchmark synthetic data generation for single tables. First, let's define which modeling\ntechniques we want to use. Let's choose a few synthesizers from the SDV library and a few others\nto use as baselines.\n\n```python\n# these synthesizers come from the SDV library\n# each one uses different modeling techniques\nsdv_synthesizers = ['GaussianCopulaSynthesizer', 'CTGANSynthesizer']\n\n# these basic synthesizers are available in SDGym\n# as baselines\nbaseline_synthesizers = ['UniformSynthesizer']\n```\n\nNow, we can benchmark the different techniques:\n```python\nimport sdgym\n\nsdgym.benchmark_single_table(\n    synthesizers=(sdv_synthesizers + baseline_synthesizers)\n)\n```\n\nThe result is a detailed performance, memory and quality evaluation across the synthesizers\non a variety of publicly available datasets.\n\n## Supplying a custom synthesizer\n\nBenchmark your own synthetic data generation techniques. 
Define your synthesizer by\nspecifying the training logic (using machine learning) and the sampling logic.\n\n```python\ndef my_training_logic(data, metadata):\n    # create an object to represent your synthesizer\n    # train it using the data\n    return synthesizer\n\ndef my_sampling_logic(trained_synthesizer, num_rows):\n    # use the trained synthesizer to create\n    # num_rows of synthetic data\n    return synthetic_data\n```\n\nLearn more in the [Custom Synthesizers Guide](https://docs.sdv.dev/sdgym/customization/synthesizers/custom-synthesizers).\n\n## Customizing your datasets\n\nThe SDGym library includes many publicly available datasets that you can include right away.\nList these using the ``get_available_datasets`` feature.\n\n```python\nsdgym.get_available_datasets()\n```\n\n```\ndataset_name   size_MB     num_tables\nKRK_v1         0.072128    1\nadult          3.907448    1\nalarm          4.520128    1\nasia           1.280128    1\n...\n```\n\nYou can also include any custom, private datasets that are stored on your computer on an\nAmazon S3 bucket.\n\n```\nmy_datasets_folder = 's3://my-datasets-bucket'\n```\n\nFor more information, see the docs for [Customized Datasets](https://docs.sdv.dev/sdgym/customization/datasets).\n\n# What's next?\n\nVisit the [SDGym Documentation](https://docs.sdv.dev/sdgym) to learn more!\n\n---\n\n\n<div align=\"center\">\n<a href=\"https://datacebo.com\"><img align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/master/docs/images/DataCebo.png\"></img></a>\n</div>\n<br/>\n<br/>\n\n[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](\nhttps://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we\ncreated [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.\nToday, DataCebo is the proud developer of SDV, the largest ecosystem for\nsynthetic data generation & evaluation. 
It is home to multiple libraries that support synthetic\ndata, including:\n\n* \ud83d\udd04 Data discovery & transformation. Reverse the transforms to reproduce realistic data.\n* \ud83e\udde0 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,\n  multi table and time series data.\n* \ud83d\udcca Measuring quality and privacy of synthetic data, and comparing different synthetic data\n  generation models.\n\n[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully\nintegrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries\nfor specific needs.\n\n\n# History\n\n## v0.7.0 - 2023-06-13\n\nThis release adds support for SDV 1.0 and PyTorch 2.0!\n\n### New Features\n\n* Add functions to top level import - Issue [#229](https://github.com/sdv-dev/SDGym/issues/229) by @fealho\n* Cleanup SDGym to the new SDV 1.0 metadata and synthesizers - Issue [#212](https://github.com/sdv-dev/SDGym/issues/212) by @fealho\n\n### Bugs Fixed\n\n* limit_dataset_size causes sdgym to crash - Issue [#231](https://github.com/sdv-dev/SDGym/issues/231) by @fealho\n* benchmark_single_table crashes with metadata dict - Issue [#232](https://github.com/sdv-dev/SDGym/issues/232) by @fealho\n* Passing None as synthesizers runs all of them - Issue [#233](https://github.com/sdv-dev/SDGym/issues/233) by @fealho\n* timeout parameter causes sdgym to crash - Issue [#234](https://github.com/sdv-dev/SDGym/issues/234) by @pvk-developer\n* SDGym is not working with latest torch - Issue [#210](https://github.com/sdv-dev/SDGym/issues/210) by @amontanez24\n* Fix sdgym --help - Issue [#206](https://github.com/sdv-dev/SDGym/issues/206) by @katxiao\n\n### Internal\n\n* Increase code style lint - Issue [#123](https://github.com/sdv-dev/SDGym/issues/123) by @fealho\n* Remove code support for synthesizers that are not strings/classes - PR [#236](https://github.com/sdv-dev/SDGym/pull/236) by @fealho\n* 
Code Refactoring - Issue [#215](https://github.com/sdv-dev/SDGym/issues/215) by @fealho\n\n### Maintenance\n\n* Remove pomegranate - Issue [#230](https://github.com/sdv-dev/SDGym/issues/230) by @amontanez24\n\n## v0.6.0 - 2023-02-01\nThis release introduces methods for benchmarking single table data and creating custom synthesizers, which can be based on existing SDGym-defined synthesizers or on user-defined functions. This release also adds support for Python 3.10 and drops support for Python 3.6.\n\n### New Features\n* Benchmarking progress bar should update on one line - Issue [#204](https://github.com/sdv-dev/SDGym/issues/204) by @katxiao\n* Support local additional datasets folder with zip files - Issue [#186](https://github.com/sdv-dev/SDGym/issues/186) by @katxiao\n* Enforce that each synthesizer is unique in benchmark_single_table - Issue [#190](https://github.com/sdv-dev/SDGym/issues/190) by @katxiao\n* Simplify the file names inside the detailed_results_folder - Issue [#191](https://github.com/sdv-dev/SDGym/issues/191) by @katxiao\n* Use SDMetrics silent report generation - Issue [#179](https://github.com/sdv-dev/SDGym/issues/179) by @katxiao\n* Remove arguments in get_available_datasets - Issue [#197](https://github.com/sdv-dev/SDGym/issues/197) by @katxiao\n* Accept metadata.json as valid metadata file - Issue [#194](https://github.com/sdv-dev/SDGym/issues/194) by @katxiao\n* Check if file or folder exists before writing benchmarking results - Issue [#196](https://github.com/sdv-dev/SDGym/issues/196) by @katxiao\n* Rename benchmarking argument \"evaluate_quality\" to \"compute_quality_score\" - Issue [#195](https://github.com/sdv-dev/SDGym/issues/195) by @katxiao\n* Add option to disable sdmetrics in benchmarking - Issue [#182](https://github.com/sdv-dev/SDGym/issues/182) by @katxiao\n* Prefix remote bucket with 's3' - Issue [#183](https://github.com/sdv-dev/SDGym/issues/183) by @katxiao\n* Benchmarking error handling - Issue 
[#177](https://github.com/sdv-dev/SDGym/issues/177) by @katxiao\n* Allow users to specify custom synthesizers' display names - Issue [#174](https://github.com/sdv-dev/SDGym/issues/174) by @katxiao\n* Update benchmarking results columns - Issue [#172](https://github.com/sdv-dev/SDGym/issues/172) by @katxiao\n* Allow custom datasets - Issue [#166](https://github.com/sdv-dev/SDGym/issues/166) by @katxiao\n* Use new datasets s3 bucket - Issue [#161](https://github.com/sdv-dev/SDGym/issues/161) by @katxiao\n* Create benchmark_single_table method - Issue [#151](https://github.com/sdv-dev/SDGym/issues/151) by @katxiao\n* Update summary metrics - Issue [#134](https://github.com/sdv-dev/SDGym/issues/134) by @katxiao\n* Benchmark individual methods - Issue [#159](https://github.com/sdv-dev/SDGym/issues/159) by @katxiao\n* Add method to create a sdv variant synthesizer - Issue [#152](https://github.com/sdv-dev/SDGym/issues/152) by @katxiao\n* Add method to generate a multi table synthesizer - Issue [#149](https://github.com/sdv-dev/SDGym/issues/149) by @katxiao\n* Add method to create single table synthesizers - Issue [#148](https://github.com/sdv-dev/SDGym/issues/148) by @katxiao\n* Updating existing synthesizers to new API - Issue [#154](https://github.com/sdv-dev/SDGym/issues/154) by @katxiao\n\n### Bug Fixes\n* Pip encounters dependency issues with ipython - Issue [#187](https://github.com/sdv-dev/SDGym/issues/187) by @katxiao\n* IndependentSynthesizer is printing out ConvergeWarning too many times - Issue [#192](https://github.com/sdv-dev/SDGym/issues/192) by @katxiao\n* Size values in benchmarking results seems inaccurate - Issue [#184](https://github.com/sdv-dev/SDGym/issues/184) by @katxiao\n* Import error in the example for benchmarking the synthesizers - Issue [#139](https://github.com/sdv-dev/SDGym/issues/139) by @katxiao\n* Updates and bugfixes - Issue [#132](https://github.com/sdv-dev/SDGym/issues/132) by @csala\n\n### Maintenance\n* Update README - Issue 
[#203](https://github.com/sdv-dev/SDGym/issues/203) by @katxiao\n* Support Python Versions >=3.7 and <3.11 - Issue [#170](https://github.com/sdv-dev/SDGym/issues/170) by @katxiao\n* SDGym Package Maintenance Updates documentation  - Issue [#163](https://github.com/sdv-dev/SDGym/issues/163) by @katxiao\n* Remove YData - Issue [#168](https://github.com/sdv-dev/SDGym/issues/168) by @katxiao\n* Update to newest SDV - Issue [#157](https://github.com/sdv-dev/SDGym/issues/157) by @katxiao\n* Update slack invite link. - Issue [#144](https://github.com/sdv-dev/SDGym/issues/144) by @pvk-developer\n* updating workflows to work with windows - Issue [#136](https://github.com/sdv-dev/SDGym/issues/136) by @amontanez24\n* Update conda dependencies - Issue [#130](https://github.com/sdv-dev/SDGym/issues/130) by @katxiao\n\n## v0.5.0 - 2021-12-13\nThis release adds support for Python 3.9, and updates dependencies to accept the latest versions when possible.\n\n### Issues closed\n\n* Add support for Python 3.9 - [Issue #127](https://github.com/sdv-dev/SDGym/issues/127) by @katxiao\n* Add pip check worflow - [Issue #124](https://github.com/sdv-dev/SDGym/issues/124) by @pvk-developer\n* Fix meta.yaml dependencies - [PR #119](https://github.com/sdv-dev/SDGym/pull/119) by @fealho\n* Upgrade dependency ranges - [Issue #118](https://github.com/sdv-dev/SDGym/issues/118) by @katxiao\n\n## v0.4.1 - 2021-08-20\nThis release fixed a bug where passing a `json` file as configuration for a multi-table synthesizer crashed the model.\nIt also adds a number of fixes and enhancements, including: (1) a function and CLI command to list the available synthesizer names,\n(2) a curate set of dependencies and making `Gretel` into an optional dependency, (3) updating `Gretel` to use temp directories,\n(4) using `nvidia-smi` to get the number of gpus and (5) multiple `dockerfile` updates to improve functionality.\n\n### Issues closed\n\n* Bug when using JSON configuration for multiple multi-table evaluation - 
[Issue #115](https://github.com/sdv-dev/SDGym/issues/115) by @pvk-developer\n* Use nvidia-smi to get number of gpus - [PR #113](https://github.com/sdv-dev/SDGym/issues/113) by @katxiao\n* List synthesizer names - [Issue #82](https://github.com/sdv-dev/SDGym/issues/82) by @fealho\n* Use nvidia base for dockerfile - [PR #108](https://github.com/sdv-dev/SDGym/issues/108) by @katxiao\n* Add Makefile target to install gretel and ydata - [PR #107](https://github.com/sdv-dev/SDGym/issues/107) by @katxiao\n* Curate dependencies and make Gretel optional - [PR #106](https://github.com/sdv-dev/SDGym/issues/106) by @csala\n* Update gretel checkpoints to use temp directory - [PR #105](https://github.com/sdv-dev/SDGym/issues/105) by @katxiao\n* Initialize variable before reference - [PR #104](https://github.com/sdv-dev/SDGym/issues/104) by @katxiao\n\n## v0.4.0 - 2021-06-17\n\nThis release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym.\nIt also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate\nmetrics, and adds the normalized score to the benchmark results.\n\n### New Features\n\n* Add normalized score to benchmark results - [Issue #102](https://github.com/sdv-dev/SDGym/issues/102) by @katxiao\n* Add max rows and max columns args - [Issue #96](https://github.com/sdv-dev/SDGym/issues/96) by @katxiao\n* Automatically detect number of workers - [Issue #97](https://github.com/sdv-dev/SDGym/issues/97) by @katxiao\n* Add summary function and command - [Issue #92](https://github.com/sdv-dev/SDGym/issues/92) by @amontanez24\n* Allow jobs list/JSON to be passed - [Issue #93](https://github.com/sdv-dev/SDGym/issues/93) by @fealho\n* Add ydata to sdgym - [Issue #90](https://github.com/sdv-dev/SDGym/issues/90) by @fealho\n* Add dockerfile for sdgym - [Issue #88](https://github.com/sdv-dev/SDGym/issues/88) by @katxiao\n* Add Gretel to SDGym synthesizer - [Issue #87](https://github.com/sdv-dev/SDGym/issues/87) by 
@amontanez24\n\n## v0.3.1 - 2021-05-20\n\nThis release adds new features to store results and cache contents into an S3 bucket\nas well as a script to collect results from a cache dir and compile a single results\nCSV file.\n\n### Issues closed\n\n* Collect cached results from s3 bucket - [Issue #85](https://github.com/sdv-dev/SDGym/issues/85) by @katxiao\n* Store cache contents into an S3 bucket - [Issue #81](https://github.com/sdv-dev/SDGym/issues/81) by @katxiao\n* Store SDGym results into an S3 bucket - [Issue #80](https://github.com/sdv-dev/SDGym/issues/80) by @katxiao\n* Add a way to collect cached results - [Issue #79](https://github.com/sdv-dev/SDGym/issues/79) by @katxiao\n* Allow reading datasets from private s3 bucket - [Issue #74](https://github.com/sdv-dev/SDGym/issues/74) by @katxiao\n* Typos in the sdgym.run function docstring documentation - [Issue #69](https://github.com/sdv-dev/SDGym/issues/69) by @sbrugman\n\n## v0.3.0 - 2021-01-27\n\nMajor rework of the SDGym functionality to support a collection of new features:\n\n* Add relational and timeseries model benchmarking.\n* Use SDMetrics for model scoring.\n* Update datasets format to match SDV metadata based storage format.\n* Centralize default datasets collection in the `sdv-datasets` S3 bucket.\n* Add options to download and use datasets from different S3 buckets.\n* Rename synthesizers to baselines and adapt to the new metadata format.\n* Add model execution and metric computation time logging.\n* Add optional synthetic data and error traceback caching.\n\n## v0.2.2 - 2020-10-17\n\nThis version adds a rework of the benchmark function and a few new synthesizers.\n\n### New Features\n\n* New CLI with `run`, `make-leaderboard` and `make-summary` commands\n* Parallel execution via Dask or Multiprocessing\n* Download datasets without executing the benchmark\n* Support for Python 3.6 to 3.8\n\n### New Synthesizers\n\n* `sdv.tabular.CTGAN`\n* `sdv.tabular.CopulaGAN`\n* 
`sdv.tabular.GaussianCopulaOneHot`\n* `sdv.tabular.GaussianCopulaCategorical`\n* `sdv.tabular.GaussianCopulaCategoricalFuzzy`\n\n## v0.2.1 - 2020-05-12\n\nNew updated leaderboard and minor improvements.\n\n### New Features\n\n* Add parameters for PrivBNSynthesizer - [Issue #37](https://github.com/sdv-dev/SDGym/issues/37) by @csala\n\n## v0.2.0 - 2020-04-10\n\nNew Benchmark API and lots of improved documentation.\n\n### New Features\n\n* The benchmark function now returns a complete leaderboard instead of only one score\n* Class Synthesizers can be directly passed to the benchmark function\n\n### Bug Fixes\n\n* One hot encoding errors in the Independent, VEEGAN and Medgan Synthesizers.\n* Proper usage of the `eval` mode during sampling.\n* Fix improperly configured datasets.\n\n## v0.1.0 - 2019-08-07\n\nFirst release to PyPI\n",
    "bugtrack_url": null,
    "license": "BSL-1.1",
    "summary": "Benchmark tabular synthetic data generators using a variety of datasets",
    "version": "0.7.0",
    "project_urls": {
        "Homepage": "https://github.com/sdv-dev/SDGym"
    },
    "split_keywords": [
        "machine",
        "learning",
        "synthetic",
        "data",
        "generation",
        "benchmark",
        "generative",
        "models"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0a53dd9bded899c44f6c4f4b9a2ab90dfb135e796ece8a296247b96f825cbab3",
                "md5": "f8ee2290e274f2e79829e34634e4815d",
                "sha256": "0a44ac4109fae29b4d57adb02dfbf57c9362d23dfe86f0d81ed8a23575d1e00e"
            },
            "downloads": -1,
            "filename": "sdgym-0.7.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f8ee2290e274f2e79829e34634e4815d",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7,<3.11",
            "size": 39329,
            "upload_time": "2023-06-14T18:20:20",
            "upload_time_iso_8601": "2023-06-14T18:20:20.029274Z",
            "url": "https://files.pythonhosted.org/packages/0a/53/dd9bded899c44f6c4f4b9a2ab90dfb135e796ece8a296247b96f825cbab3/sdgym-0.7.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f496dce97c0aae0c2a7c90902172b346df2d39b0aca3a47024e25ae311c534cf",
                "md5": "7008ec58d3091fff2300024b5433edc5",
                "sha256": "7578c253516ea5d7dbe00765fbb2956273842ba5c3deda5665f07789cd7cf189"
            },
            "downloads": -1,
            "filename": "sdgym-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7008ec58d3091fff2300024b5433edc5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<3.11",
            "size": 341873,
            "upload_time": "2023-06-14T18:20:23",
            "upload_time_iso_8601": "2023-06-14T18:20:23.352169Z",
            "url": "https://files.pythonhosted.org/packages/f4/96/dce97c0aae0c2a7c90902172b346df2d39b0aca3a47024e25ae311c534cf/sdgym-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-14 18:20:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sdv-dev",
    "github_project": "SDGym",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "sdgym"
}
        