dp-cgans


Name: dp-cgans
Version: 0.0.6
home_page: None
Summary: A library to generate synthetic tabular or RDF data using Conditional Generative Adversary Networks (GANs) combined with Differential Privacy techniques.
upload_time: 2023-12-04 12:26:45
maintainer: None
docs_url: None
author: None
requires_python: >=3.8
license: MIT License

Copyright (c) 2023-present Sun Chang <sunchang0124@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

keywords: cgan, dp, differential privacy, gan, synthetic data
requirements: No requirements were recorded.

# 👯 DP-CGANS (Differentially Private - Conditional Generative Adversarial NetworkS)

[![PyPi Shield](https://img.shields.io/pypi/v/dp-cgans)](https://pypi.org/project/dp-cgans/) [![Py versions](https://img.shields.io/pypi/pyversions/dp-cgans)](https://pypi.org/project/dp-cgans/) [![Test package](https://github.com/sunchang0124/dp_cgans/actions/workflows/test.yml/badge.svg)](https://github.com/sunchang0124/dp_cgans/actions/workflows/test.yml) [![Publish package](https://github.com/sunchang0124/dp_cgans/actions/workflows/publish.yml/badge.svg)](https://github.com/sunchang0124/dp_cgans/actions/workflows/publish.yml)




**Abstract**: This repository presents Conditional Generative Adversarial Networks (CGANs) for tabular data (and RDF data), combined with Differential Privacy techniques. Our publication: [Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy](https://doi.org/10.1016/j.jbi.2023.104404).

**Author**: Chang Sun, Institute of Data Science, Maastricht University
**Start date**: Nov-2021
**Status**: Under development

**Note**: "Standing on the shoulders of giants". This repository is inspired by the excellent work of [CTGAN](https://github.com/sdv-dev/CTGAN) from [Synthetic Data Vault (SDV)](https://github.com/sdv-dev/SDV), [Tensorflow Privacy](https://github.com/tensorflow/privacy), and [rdfpandas](https://github.com/cadmiumkitty/rdfpandas). We are grateful to these projects for sharing their ideas and implementations, making their code publicly available, and providing well-written documentation. More related work can be found in the References below.

This package extends SDV (https://github.com/sdv-dev/SDV), CTGAN (https://github.com/sdv-dev/CTGAN), and Differential Privacy in GANs (https://github.com/civisanalytics/dpwgan). The author modified the conditional matrix and cost functions to emphasize the relations between variables. The main changes are in `ctgan/synthesizers/ctgan.py`, `../data_sampler.py`, and `../data_transformer.py`.
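To make "relations between variables" concrete, the toy sketch below counts how often each pair of categories from two columns occurs together and turns the counts into log-scaled sampling weights, the kind of quantity a condition over two variables could be drawn from. It is an illustration only and not the code in this repository: the function names, column names, and sample rows are all hypothetical.

```python
import math
import random
from collections import Counter


def joint_counts(rows, col_a, col_b):
    """Count co-occurrences of (value in col_a, value in col_b) across rows."""
    return Counter((row[col_a], row[col_b]) for row in rows)


def log_frequency_weights(counts):
    """Normalize log(1 + count) so rare pairs get a larger share than their raw frequency."""
    logs = {pair: math.log1p(n) for pair, n in counts.items()}
    total = sum(logs.values())
    return {pair: value / total for pair, value in logs.items()}


def sample_condition(weights, rng=random):
    """Draw one (col_a value, col_b value) pair according to the weights."""
    pairs = list(weights)
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]


# Hypothetical rows, only for illustration.
rows = [
    {"sex": "F", "income": ">50K"},
    {"sex": "F", "income": "<=50K"},
    {"sex": "M", "income": "<=50K"},
    {"sex": "M", "income": "<=50K"},
]
weights = log_frequency_weights(joint_counts(rows, "sex", "income"))
print(sample_condition(weights, random.Random(0)))
```

Conditioning on pairs rather than single columns means the sampled condition carries information about which values tend to appear together, which is the intuition behind the change described above.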


## 📥️ Installation

You will need Python >=3.8 and <3.10.

```shell
pip install dp-cgans
```
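To fail early on an unsupported interpreter, a script could check the Python version before importing the library. The sketch below assumes the range stated above (at least 3.8, below 3.10); `is_supported` is a hypothetical helper and not part of `dp-cgans`.

```python
import sys

# Range taken from the installation note above; adjust if the package metadata changes.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported(version_info=None):
    """Return True if the (major, minor) version lies in [3.8, 3.10)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE
```

Calling `is_supported()` with no argument checks the running interpreter; passing a tuple such as `(3, 9, 0)` checks an arbitrary version.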

## 🪄 Usage

### ⌨️ Use as a command-line interface

You can easily generate synthetic data for a file using your terminal after installing `dp-cgans` with pip.

To quickly run our example, you can download the [example data](https://raw.githubusercontent.com/sunchang0124/dp_cgans/main/resources/example_tabular_data_UCIAdult.csv):

```bash
wget https://raw.githubusercontent.com/sunchang0124/dp_cgans/main/resources/example_tabular_data_UCIAdult.csv
```

Then run `dp-cgans`:

```bash
dp-cgans gen example_tabular_data_UCIAdult.csv --epochs 2 --output out.csv --gen-size 100
```

Get a full rundown of the available options for generating synthetic data with:

```bash
dp-cgans gen --help
```
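If you want to drive the CLI from a Python script (for example, to loop over several input files), one option is to call it through `subprocess`. This is a minimal sketch that uses only the flags shown above (`--epochs`, `--output`, `--gen-size`); `build_gen_command` and `run_gen` are hypothetical helper names.

```python
import shutil
import subprocess


def build_gen_command(input_csv, output_csv, epochs=2, gen_size=100):
    """Assemble the argument list for `dp-cgans gen` using the flags shown above."""
    return [
        "dp-cgans", "gen", str(input_csv),
        "--epochs", str(epochs),
        "--output", str(output_csv),
        "--gen-size", str(gen_size),
    ]


def run_gen(input_csv, output_csv, epochs=2, gen_size=100):
    """Run the CLI if it is on PATH; raise if it is missing or exits with an error."""
    if shutil.which("dp-cgans") is None:
        raise FileNotFoundError("dp-cgans executable not found; install it with pip first")
    subprocess.run(build_gen_command(input_csv, output_csv, epochs, gen_size), check=True)


# Example call (requires dp-cgans to be installed and the example CSV to be present):
# run_gen("example_tabular_data_UCIAdult.csv", "out.csv")
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.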

### 🐍 Use with Python

This library can also be used directly in Python scripts.

If your input is tabular data (e.g., csv):

```python
from dp_cgans import DP_CGAN
import pandas as pd

tabular_data = pd.read_csv("../resources/example_tabular_data_UCIAdult.csv")

# We adjusted the original CTGAN model from SDV. Instead of looking at the distribution
# of individual variables, we extended it to pairs of variables to keep their correlation.
model = DP_CGAN(
    epochs=100,  # number of training epochs
    batch_size=1000,  # the size of each batch
    log_frequency=True,
    verbose=True,
    generator_dim=(128, 128, 128),
    discriminator_dim=(128, 128, 128),
    generator_lr=2e-4,
    discriminator_lr=2e-4,
    discriminator_steps=1,
    private=False,
)

print("Start training model")
model.fit(tabular_data)
model.save("generator.pkl")

# Generate 100 synthetic rows
syn_data = model.sample(100)
syn_data.to_csv("syn_data_file.csv")
```
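Once a synthetic CSV has been written, a quick sanity check is to compare how often each category appears in the real and synthetic files. The sketch below uses only the standard library; `category_shares` and `share_gaps` are hypothetical helpers, and the column name you pass must exist in both files.

```python
import csv
from collections import Counter


def category_shares(csv_path, column):
    """Return {category: share of rows} for one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        counts = Counter(row[column] for row in csv.DictReader(handle))
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}


def share_gaps(real_csv, synthetic_csv, column):
    """Absolute difference in category share between real and synthetic data."""
    real = category_shares(real_csv, column)
    synthetic = category_shares(synthetic_csv, column)
    categories = sorted(set(real) | set(synthetic))
    return {c: abs(real.get(c, 0.0) - synthetic.get(c, 0.0)) for c in categories}
```

Large gaps for a column suggest the model needs more epochs or different hyperparameters; small gaps on single columns do not by themselves show that relations between columns are preserved.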

<!-- 
2. If your input data is in RDF format:

  ```python
from dp_cgans import DP_CGAN
from dp_cgans import RDF_to_Tabular

# Step 1. Load RDF to a plain table (dataframe)
plain_tabular=RDF_to_Tabular(file_path="../resources/example_rdf_data.ttl")

# Step 2. Convert plain table to a structured table 
# After step 1, RDF data will be converted a plain tabular dataset (all the nodes/entities will be presented as rows. Step 2 will structure the table by recognizing and sorting the types of the entities, replacing the URI with actual value which is attached to that URI. Users can decide how many levels they want to unfold their RDF models to tabular datasets.)
tabular_data,rel_pred_obj=plain_tabular.fit_convert(user_define_data_instance="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C16960", 
                                                    user_define_is_a=["rdf:type{URIRef}"], 
                                                    user_define_has_value=["http://www.cancerdata.org/roo/P100042"], 
                                                    set_level="full", 
                                                    as_column='object', 
                                                    remove_columns_unique_values=True)

# Step 3. Define your GANS model
model = DP_CGAN(
    epochs=100, # number of training epochs
    batch_size=1000, # the size of each batch
    log_frequency=True,
    verbose=True,
    generator_dim=(128, 128, 128),
    discriminator_dim=(128, 128, 128),
    generator_lr=2e-4, 
    discriminator_lr=2e-4,
    discriminator_steps=1, 
    private=False,
)

print("Start training model")
model.fit(tabular_data)

# Sample the generated synthetic data
model.sample(100)
  ```
-->


## 🧑‍💻 Development setup


For development, we recommend installing and using [Hatch](https://hatch.pypa.io/latest/), as it automatically installs and syncs the dependencies when running development scripts. You can also create a virtual environment yourself and install the library with `pip install -e .`

### Install

Clone the repository:

```bash
git clone https://github.com/sunchang0124/dp_cgans
cd dp_cgans
```


### Run

Run the library with the CLI:

```bash
hatch -v run dp-cgans gen --help
```

You can also enter a new shell with the virtual environment automatically activated:

```bash
hatch shell
dp-cgans gen --help
```

### Tests

Run the tests locally:

```bash
hatch run pytest -s
```

### Format

Run formatting and linting (black and ruff):

```bash
hatch run fmt
```

### Reset the virtual environments

If the virtual environment is not updating as expected, you can reset it with:

```bash
hatch env prune
```

## 📦️ New release process

The deployment of new releases is done automatically by a GitHub Action workflow when a new release is created on GitHub. To release a new version:

1. Make sure the `PYPI_API_TOKEN` secret has been defined in the GitHub repository (in Settings > Secrets > Actions). You can get an API token from PyPI [here](https://pypi.org/manage/account/).

2. Increment the `version` number in the `src/dp_cgans/__init__.py` file:

   ```bash
   hatch version fix    # Bump from 0.0.1 to 0.0.2
   hatch version minor  # Bump from 0.0.1 to 0.1.0
   hatch version 0.1.1  # Bump to the specified version
   ```

3. Create a new release on GitHub, which will automatically trigger the publish workflow, and publish the new release to PyPI.
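The bump rules in the comments of step 2 can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not how Hatch implements it; `bump` is a hypothetical function and it assumes a plain three-part version string.

```python
def bump(version, segment):
    """Bump a 'major.minor.fix' version string at the given segment."""
    major, minor, fix = (int(part) for part in version.split("."))
    if segment == "major":
        return f"{major + 1}.0.0"
    if segment == "minor":
        return f"{major}.{minor + 1}.0"
    if segment == "fix":
        return f"{major}.{minor}.{fix + 1}"
    raise ValueError(f"unknown segment: {segment!r}")
```

Bumping a segment resets every segment to its right to zero, which is why a `minor` bump of 0.0.1 gives 0.1.0 rather than 0.1.1.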

You can also manually build and publish from your laptop:

```bash
hatch build
hatch publish
```

## 📚️ References / Further reading 

There is much excellent work on generating synthetic data using GANs and other methods. We list the studies that made great contributions to the field and inspired our work.

##### GANS

   1. Synthetic Data Vault (SDV) [[Paper](https://dai.lids.mit.edu/wp-content/uploads/2018/03/SDV.pdf)] [[Github](https://github.com/sdv-dev/SDV)]
   2. Modeling Tabular Data using Conditional GAN (a part of SDV) [[Paper](https://arxiv.org/abs/1907.00503)] [[Github](https://github.com/sdv-dev/CTGAN)]
   3. Wasserstein GAN [[Paper](https://arxiv.org/pdf/1701.07875.pdf)]
   4. Improved Training of Wasserstein GANs [[Paper](https://papers.nips.cc/paper/2017/file/892c3b1c6dccd52936e27cbd0ff683d6-Paper.pdf)]
   5. Synthesising Tabular Data using Wasserstein Conditional GANs with Gradient Penalty (WCGAN-GP) [[Paper](http://ceur-ws.org/Vol-2771/AICS2020_paper_57.pdf)]
   6. PacGAN: The power of two samples in generative adversarial networks [[Paper](https://proceedings.neurips.cc/paper/2018/file/288cc0ff022877bd3df94bc9360b9c5d-Paper.pdf)]
   7. CTAB-GAN: Effective Table Data Synthesizing [[Paper](https://arxiv.org/pdf/2102.08369.pdf)]
   8. Conditional Tabular GAN-Based Two-Stage Data Generation Scheme for Short-Term Load Forecasting [[Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9253644)]
   9. TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks [[Paper](https://arxiv.org/pdf/2109.00666.pdf)]
   10. Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [[Paper](https://arxiv.org/pdf/2008.09202.pdf)]

##### Differential Privacy

   1. Tensorflow Privacy [[Github](https://github.com/tensorflow/privacy)]
   2. Renyi Differential Privacy [[Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46029.pdf)]
   3. DP-CGAN : Differentially Private Synthetic Data and Label Generation [[Paper](https://arxiv.org/pdf/2001.09700.pdf)]
   4. Differentially Private Generative Adversarial Network [[Paper](https://arxiv.org/pdf/1802.06739.pdf)] [[Github](https://github.com/illidanlab/dpgan)] Another implementation [[Github](https://github.com/civisanalytics/dpwgan)]
   5. Private Data Generation Toolbox [[Github](https://github.com/BorealisAI/private-data-generation)]
   6. autodp: Automating differential privacy computation [[Github](https://github.com/yuxiangw/autodp)]
   7. Differentially Private Synthetic Medical Data Generation using Convolutional GANs [[Paper](https://arxiv.org/pdf/2012.11774.pdf)]
   8. DTGAN: Differential Private Training for Tabular GANs [[Paper](https://arxiv.org/pdf/2107.02521.pdf)]
   9. DIFFERENTIALLY PRIVATE SYNTHETIC DATA: APPLIED EVALUATIONS AND ENHANCEMENTS [[Paper](https://arxiv.org/pdf/2011.05537.pdf)]
   10. FFPDG: FAST, FAIR AND PRIVATE DATA GENERATION [[Paper](https://sdg-quality-privacy-bias.github.io/papers/SDG_paper_19.pdf)]

##### Others

   1. EvoGen: a Generator for Synthetic Versioned RDF [[Paper](http://ceur-ws.org/Vol-1558/paper9.pdf)]
   2. Generation and evaluation of synthetic patient data [[Paper](https://bmcmedresmethodol.biomedcentral.com/track/pdf/10.1186/s12874-020-00977-1.pdf)]
   3. Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation [[Paper](https://www.mdpi.com/2076-3417/11/5/2158)]
   4. Generating and evaluating cross-sectional synthetic electronic healthcare data: Preserving data utility and patient privacy [[Paper](https://onlinelibrary.wiley.com/doi/epdf/10.1111/coin.12427)]
   5. Synthetic data for open and reproducible methodological research in social sciences and official statistics [[Paper](https://link.springer.com/article/10.1007/s11943-017-0214-8#Sec2)]
   6. A Study of the Impact of Synthetic Data Generation Techniques on Data Utility using the 1991 UK Samples of Anonymised Records [[Paper](https://unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2017/4_utility_paper.pdf)]

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "dp-cgans",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Sun Chang <sunchang0124@gmail.com>, Vincent Emonet <vincent.emonet@gmail.com>",
    "keywords": "CGAN,DP,Differential Privacy,GAN,synthetic data",
    "author": null,
    "author_email": "Sun Chang <sunchang0124@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b9/8e/2102626bac50487da6b63e5967dd4b46a816fe19fdab1c582a90975d5aa8/dp_cgans-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2023-present Sun Chang <sunchang0124@gmail.com>\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "A library to generate synthetic tabular or RDF data using Conditional Generative Adversary Networks (GANs) combined with Differential Privacy techniques.",
    "version": "0.0.6",
    "project_urls": {
        "Documentation": "https://github.com/sunchang0124/dp_cgans",
        "History": "https://github.com/sunchang0124/dp_cgans/releases",
        "Homepage": "https://github.com/sunchang0124/dp_cgans",
        "Source": "https://github.com/sunchang0124/dp_cgans",
        "Tracker": "https://github.com/sunchang0124/dp_cgans/issues"
    },
    "split_keywords": [
        "cgan",
        "dp",
        "differential privacy",
        "gan",
        "synthetic data"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3d8ec76535aa3f4847bb2c16fb5b852b8f25a05b5f76531e73b090c62893d629",
                "md5": "cd35dac979a3e8b8f3f5cbadee99bf20",
                "sha256": "c8dd9896cc729bcd0ee9a19414d15bddd36e75d086cae0c3ec88b8c978fd48bb"
            },
            "downloads": -1,
            "filename": "dp_cgans-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cd35dac979a3e8b8f3f5cbadee99bf20",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 72687,
            "upload_time": "2023-12-04T12:26:44",
            "upload_time_iso_8601": "2023-12-04T12:26:44.122351Z",
            "url": "https://files.pythonhosted.org/packages/3d/8e/c76535aa3f4847bb2c16fb5b852b8f25a05b5f76531e73b090c62893d629/dp_cgans-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b98e2102626bac50487da6b63e5967dd4b46a816fe19fdab1c582a90975d5aa8",
                "md5": "71f583abdf33175aa7e94a25792219ec",
                "sha256": "22533a11c66b749c12b3db8733a5233c5abffb044846daeb507acd1f189bcfc9"
            },
            "downloads": -1,
            "filename": "dp_cgans-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "71f583abdf33175aa7e94a25792219ec",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 99713,
            "upload_time": "2023-12-04T12:26:45",
            "upload_time_iso_8601": "2023-12-04T12:26:45.700905Z",
            "url": "https://files.pythonhosted.org/packages/b9/8e/2102626bac50487da6b63e5967dd4b46a816fe19fdab1c582a90975d5aa8/dp_cgans-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-04 12:26:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sunchang0124",
    "github_project": "dp_cgans",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dp-cgans"
}
        