| Field | Value |
| --- | --- |
| Name | ctgan |
| Version | 0.11.0 |
| Summary | Create tabular synthetic data using a conditional GAN |
| Upload time | 2025-02-26 20:05:34 |
| Requires Python | >=3.8, <3.14 |
| License | BSL-1.1 |
| Keywords | ctgan, CTGAN |
<div align="center">
<br/>
<p align="center">
<i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
</p>
<div align="left">
<br/>
<p align="center">
<a href="https://github.com/sdv-dev/CTGAN">
<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/CTGAN-DataCebo.png"></img>
</a>
</p>
</div>
</div>
# Overview
CTGAN is a collection of deep-learning-based synthetic data generators for single-table data. They learn from real data and generate synthetic data with high fidelity.
| Important Links | |
| --------------------------------------------- | -------------------------------------------------------------------- |
| :computer: **[Website]** | Check out the SDV Website for more information about our overall synthetic data ecosystem.|
| :orange_book: **[Blog]** | A deeper look at open source, synthetic data creation and evaluation.|
| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. |
| :octocat: **[Repository]** | The GitHub repository of this library. |
| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage. |
| [![][Slack Logo] **Community**][Community] | Join our Slack Workspace for announcements and discussions. |
[Website]: https://sdv.dev
[Blog]: https://datacebo.com/blog
[Documentation]: https://bit.ly/sdv-docs
[Repository]: https://github.com/sdv-dev/CTGAN
[License]: https://github.com/sdv-dev/CTGAN/blob/main/LICENSE
[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
[Slack Logo]: https://github.com/sdv-dev/SDV/blob/stable/docs/images/slack.png
[Community]: https://bit.ly/sdv-slack-invite
Currently, this library implements the **CTGAN** and **TVAE** models described in the [Modeling Tabular data using Conditional GAN](https://arxiv.org/abs/1907.00503) paper, presented at the 2019 NeurIPS conference.
# Install
## Use CTGAN through the SDV library
:warning: If you're just getting started with synthetic data, we recommend installing the SDV library, which provides user-friendly APIs for accessing CTGAN. :warning:
The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. See the [SDV documentation](https://bit.ly/sdv-docs) to get started.
## Use the CTGAN standalone library
Alternatively, you can install and use **CTGAN** directly as a standalone library:
**Using `pip`:**
```bash
pip install ctgan
```
**Using `conda`:**
```bash
conda install -c pytorch -c conda-forge ctgan
```
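The package metadata declares `requires_python` as `>=3.8, <3.14`. pip skips releases whose declared Python range excludes the running interpreter, so on an unsupported version you may end up with an older ctgan release or a resolution error instead of 0.11.0. A quick pre-install check, written as a minimal sketch (`python_supported` is a hypothetical helper, not part of ctgan):

```python
import sys

# Bounds taken from the package metadata: requires_python ">=3.8, <3.14".
MIN_PYTHON = (3, 8)
MAX_PYTHON_EXCLUSIVE = (3, 14)


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_supported():
        print(f"Python {current} is within the range declared by ctgan 0.11.0.")
    else:
        print(f"Python {current} is outside the range declared by ctgan 0.11.0 (>=3.8, <3.14).")
```

Comparing `(major, minor)` tuples keeps the check to the granularity the metadata uses; patch releases do not affect the result.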
When using the CTGAN library directly, you may need to manually preprocess your data into the correct format, for example:
* Continuous data must be represented as floats
* Discrete data must be represented as ints or strings
* The data should not contain any missing values
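The three requirements above can be applied with pandas before calling `fit`. This is a minimal sketch, assuming your data is already in a pandas DataFrame; `prepare_for_ctgan` is a hypothetical helper, not part of the ctgan API, and dropping incomplete rows is only one way to satisfy the no-missing-values rule (imputation is another):

```python
import pandas as pd


def prepare_for_ctgan(df, discrete_columns):
    """Hypothetical helper: coerce a DataFrame into the format CTGAN expects."""
    # No missing values: this sketch simply drops incomplete rows.
    out = df.dropna().copy()
    for column in out.columns:
        series = out[column]
        if column in discrete_columns:
            # pandas upcasts int columns to float when NaNs are present; undo that
            # so a category like 1 becomes "1" rather than "1.0".
            if pd.api.types.is_float_dtype(series) and (series % 1 == 0).all():
                series = series.astype("int64")
            # Discrete data represented as strings.
            out[column] = series.astype(str)
        else:
            # Continuous data represented as floats.
            out[column] = series.astype(float)
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "age": [25, 32, None, 47],
    "sex": ["Male", "Female", "Female", None],
    "code": [1, 2, 3, None],
})
clean = prepare_for_ctgan(raw, discrete_columns=["sex", "code"])
print(clean.dtypes)
```

The cleaned frame, together with the same list of discrete column names, can then be passed to `CTGAN.fit` as in the Usage Example section.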
# Usage Example
In this example, we load the [Adult Census Dataset](https://archive.ics.uci.edu/ml/datasets/adult)*, a built-in demo dataset. We use CTGAN to learn from the real data and then generate synthetic data.
```python3
from ctgan import CTGAN
from ctgan import load_demo
real_data = load_demo()
# Names of the columns that are discrete
discrete_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
    'income'
]
ctgan = CTGAN(epochs=10)
ctgan.fit(real_data, discrete_columns)
# Create synthetic data
synthetic_data = ctgan.sample(1000)
```
*For more information about the dataset, see:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science.
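
The `discrete_columns` list in the example above matters because CTGAN conditions its generator on discrete columns. The paper describes a *training-by-sampling* step: pick one discrete column with equal probability, build a probability mass function over its categories in which each category's mass is the logarithm of its frequency, and draw a category to place in the conditional vector. Log weighting gives rare categories a larger share than their raw frequency would, so the model sees them often enough to learn them. The NumPy sketch below illustrates that idea only; `sample_condition` is a hypothetical name, not ctgan's internal code, and adding 1 before the logarithm (so a category seen once still gets non-zero mass) is an assumption of this sketch:

```python
import numpy as np


def sample_condition(discrete_data, rng):
    """Hypothetical sketch of log-frequency training-by-sampling.

    discrete_data maps each discrete column name to a sequence of its values.
    Returns the chosen column, the chosen category, and the category PMF.
    """
    # 1. Choose one discrete column with equal probability.
    columns = sorted(discrete_data)
    column = columns[rng.integers(len(columns))]

    # 2. Build a PMF over that column's categories from log frequencies.
    categories, counts = np.unique(np.asarray(discrete_data[column]), return_counts=True)
    weights = np.log(counts + 1)  # +1 so a category seen once keeps non-zero mass (sketch assumption)
    pmf = weights / weights.sum()

    # 3. Draw one category; CTGAN would one-hot encode it into the conditional vector.
    category = categories[rng.choice(len(categories), p=pmf)]
    return column, str(category), dict(zip(categories.tolist(), pmf.tolist()))


rng = np.random.default_rng(0)
data = {"sex": ["Male"] * 9 + ["Female"]}
column, category, pmf = sample_condition(data, rng)
print(column, category, pmf)
```

With 9 "Male" rows and 1 "Female" row, raw frequencies would give "Female" 10% of draws; the log-weighted PMF gives it roughly 23%.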
# Join our community
Join our [Slack channel](https://bit.ly/sdv-slack-invite) to discuss CTGAN and synthetic data. If you find a bug or have a feature request, you can also [open an issue](https://github.com/sdv-dev/CTGAN/issues) on GitHub.
**Interested in contributing to CTGAN?** Read our [Contribution Guide](CONTRIBUTING.rst) to get started.
# Citing CTGAN
If you use CTGAN, please cite the following work:
*Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni.* **Modeling Tabular data using Conditional GAN**. NeurIPS, 2019.
```LaTeX
@inproceedings{ctgan,
title={Modeling Tabular data using Conditional GAN},
author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
booktitle={Advances in Neural Information Processing Systems},
year={2019}
}
```
# Related Projects
Please note that these projects are external to the SDV Ecosystem. They are not affiliated with or maintained by DataCebo.
* **R Interface for CTGAN**: A wrapper around **CTGAN** that brings its functionality to **R** users.
  More details can be found in the corresponding repository: https://github.com/kasaai/ctgan
* **CTGAN Server CLI**: A package for easily deploying CTGAN to a remote server, created by Timothy Pillow (@oregonpillow): https://github.com/oregonpillow/ctgan-server-cli
---
<div align="center">
<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png"></img></a>
</div>
<br/>
<br/>
[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](
https://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we
created [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of SDV, the largest ecosystem for
synthetic data generation & evaluation. It is home to multiple libraries that support synthetic
data, including:
* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
  multi-table and time series data.
* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
generation models.
[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.
Raw data
{
"_id": null,
"home_page": null,
"name": "ctgan",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.8",
"maintainer_email": null,
"keywords": "ctgan, CTGAN",
"author": null,
"author_email": "\"DataCebo, Inc.\" <info@sdv.dev>",
"download_url": "https://files.pythonhosted.org/packages/89/95/9ddfd01c8f668fc85048eaa6caf49009fc4cda2395848347261eb67c64cc/ctgan-0.11.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSL-1.1",
"summary": "Create tabular synthetic data using a conditional GAN",
"version": "0.11.0",
"project_urls": {
"Changes": "https://github.com/sdv-dev/CTGAN/blob/main/HISTORY.md",
"Chat": "https://bit.ly/sdv-slack-invite",
"Issue Tracker": "https://github.com/sdv-dev/CTGAN/issues",
"Source Code": "https://github.com/sdv-dev/CTGAN/",
"Twitter": "https://twitter.com/sdv_dev"
},
"split_keywords": [
"ctgan",
" ctgan"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "900cdb2b3039762226ba93004aa4a104e3471c4ae596fbecbb3db236900663bf",
"md5": "e7276861b3c4bfb029a28a4b617cdff5",
"sha256": "ae84b28ae0d131b729c7bd3439db871941c2ffc92755a106f811a085f013b656"
},
"downloads": -1,
"filename": "ctgan-0.11.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e7276861b3c4bfb029a28a4b617cdff5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.8",
"size": 24377,
"upload_time": "2025-02-26T20:05:32",
"upload_time_iso_8601": "2025-02-26T20:05:32.941303Z",
"url": "https://files.pythonhosted.org/packages/90/0c/db2b3039762226ba93004aa4a104e3471c4ae596fbecbb3db236900663bf/ctgan-0.11.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "89959ddfd01c8f668fc85048eaa6caf49009fc4cda2395848347261eb67c64cc",
"md5": "777a77f45f647eb710a68aa659ce9da0",
"sha256": "dd08b02370d375663f282f020d1729ee80e4b16bf27a61c82156bfe690c2092b"
},
"downloads": -1,
"filename": "ctgan-0.11.0.tar.gz",
"has_sig": false,
"md5_digest": "777a77f45f647eb710a68aa659ce9da0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.8",
"size": 26140,
"upload_time": "2025-02-26T20:05:34",
"upload_time_iso_8601": "2025-02-26T20:05:34.869668Z",
"url": "https://files.pythonhosted.org/packages/89/95/9ddfd01c8f668fc85048eaa6caf49009fc4cda2395848347261eb67c64cc/ctgan-0.11.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-26 20:05:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sdv-dev",
"github_project": "CTGAN",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "ctgan"
}