DataSynthesizer


- Name: DataSynthesizer
- Version: 0.1.13
- Home page: https://github.com/DataResponsibly/DataSynthesizer
- Summary: Generate synthetic data that simulate a given dataset.
- Upload time: 2023-10-18 20:58:32
- Author: Data, Responsibly (dataresponsibly@gmail.com)
- Requires Python: >=3.7
- License: MIT license
- Keywords: datasynthesizer

[![PyPi Shield](https://img.shields.io/pypi/v/DataSynthesizer.svg)](https://pypi.python.org/pypi/DataSynthesizer) [![Travis CI Shield](https://travis-ci.com/DataResponsibly/DataSynthesizer.svg?branch=master)](https://travis-ci.com/DataResponsibly/DataSynthesizer)

# DataSynthesizer

DataSynthesizer generates synthetic data that simulates a given dataset.

> It aims to facilitate collaboration between data scientists and owners of sensitive data. It applies differential privacy techniques to achieve a strong privacy guarantee.
>
> For more details, please refer to [DataSynthesizer: Privacy-Preserving Synthetic Datasets](docs/cr-datasynthesizer-privacy.pdf)
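
For reference, the standard definition behind such a guarantee is given below. This is textbook background and not a statement of DataSynthesizer's exact guarantee. A randomized mechanism $M$ is $\varepsilon$-differentially private if the first inequality holds for every pair of datasets $D, D'$ that differ in one record and for every set of outputs $S$. The history below mentions a Laplace noise parameter. The textbook Laplace mechanism satisfies this definition for a numeric query $f$ by adding noise scaled to the query's $L_1$ sensitivity $\Delta f$:

```latex
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]

M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad
\Delta f = \max_{D, D' \text{ neighbors}} \lVert f(D) - f(D') \rVert_1
```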

### Install DataSynthesizer

```bash
pip install DataSynthesizer
```

### Usage

##### Assumptions for the Input Dataset

1. The input dataset is a table in first normal form ([1NF](https://en.wikipedia.org/wiki/First_normal_form)).
2. When applying differential privacy, DataSynthesizer injects noise into the statistics within the **active domain**, i.e., the values present in the table.
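
To make the active-domain assumption concrete, here is a minimal standard-library sketch of a Laplace-noised histogram that only gets bins for values that actually occur in a column. It is not DataSynthesizer's implementation, and the helper name `noisy_active_domain_distribution` is hypothetical. The sketch assumes add/remove-one-record neighboring datasets. Under that assumption each row affects exactly one count, so the count vector has $L_1$ sensitivity 1 and the noise scale is `1 / epsilon`.

```python
import random
from collections import Counter


def noisy_active_domain_distribution(values, epsilon, seed=None):
    """Hypothetical helper: differentially private distribution over the active domain.

    Only values that occur in `values` receive a bin; a value absent from the
    table never appears in the output. Assumes add/remove-one-record
    neighbors, so each count has L1 sensitivity 1 and Laplace scale 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)  # the active domain is counts.keys()
    if not counts:
        return {}
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    noisy = {}
    for value, count in counts.items():
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[value] = max(count + noise, 0.0)  # clip negative noisy counts to zero
    total = sum(noisy.values())
    if total == 0:
        # Every noisy count was clipped: fall back to uniform over the active domain.
        return {value: 1.0 / len(noisy) for value in noisy}
    return {value: c / total for value, c in noisy.items()}


print(noisy_active_domain_distribution(["a", "b", "a", "c"], epsilon=1.0, seed=0))
```

A smaller `epsilon` produces larger noise and a stronger guarantee. In this sketch the set of keys is taken from the data as-is, and only the counts are noised.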

##### Use Jupyter Notebook

After installing DataSynthesizer and [Jupyter Notebook](https://jupyter.org/install), open and run the demo notebooks in `./notebooks/`:

- [DataSynthesizer__random_mode.ipynb](notebooks/DataSynthesizer__random_mode.ipynb)
- [DataSynthesizer__independent_attribute_mode.ipynb](notebooks/DataSynthesizer__independent_attribute_mode.ipynb)
- [DataSynthesizer__correlated_attribute_mode.ipynb](notebooks/DataSynthesizer__correlated_attribute_mode.ipynb)
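
The notebooks above cover three modes. The following self-contained sketch roughly illustrates how two of them differ. It is not DataSynthesizer's code: all names and the toy data are hypothetical, and differential-privacy noise is omitted for brevity.

- Independent attribute mode samples each column from its own marginal distribution. This preserves per-column frequencies but breaks correlations between columns.
- Correlated attribute mode samples from a Bayesian network. DataSynthesizer learns the network structure (compare `greedy_bayes()` in the history), whereas this sketch fixes a single edge `education -> income` by hand.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy table: income is fully determined by education.
ROWS = (
    [{"education": "HS", "income": "low"} for _ in range(5)]
    + [{"education": "BS", "income": "high"} for _ in range(5)]
)
PAIRING = {"HS": "low", "BS": "high"}


def fit_marginals(rows):
    """Per-attribute value counts (independent-attribute idea, no noise)."""
    return {col: Counter(row[col] for row in rows) for col in rows[0]}


def sample_independent(marginals, n, rng):
    """Draw every attribute independently from its own marginal."""
    columns = {
        col: rng.choices(list(counts), weights=list(counts.values()), k=n)
        for col, counts in marginals.items()
    }
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def fit_parent_child(rows, parent, child):
    """Degree-1 network with a hand-picked edge parent -> child."""
    parent_counts = Counter(row[parent] for row in rows)
    conditional = defaultdict(Counter)  # parent value -> counts of child values
    for row in rows:
        conditional[row[parent]][row[child]] += 1
    return parent_counts, conditional


def sample_parent_child(parent_counts, conditional, parent, child, n, rng):
    """Ancestral sampling: draw the parent first, then the child given the parent."""
    rows = []
    parents = rng.choices(list(parent_counts), weights=list(parent_counts.values()), k=n)
    for p in parents:
        child_counts = conditional[p]
        c = rng.choices(list(child_counts), weights=list(child_counts.values()), k=1)[0]
        rows.append({parent: p, child: c})
    return rows


def mismatches(rows):
    """Count rows whose income does not match the original education pairing."""
    return sum(row["income"] != PAIRING[row["education"]] for row in rows)


rng = random.Random(0)
independent = sample_independent(fit_marginals(ROWS), 200, rng)
parent_counts, conditional = fit_parent_child(ROWS, "education", "income")
correlated = sample_parent_child(parent_counts, conditional, "education", "income", 200, rng)

print("independent mismatches:", mismatches(independent))
print("correlated mismatches:", mismatches(correlated))  # always 0 for this toy table
```

On average, the independent sample pairs `HS` with `high` (or `BS` with `low`) in about half of its rows. The parent-child sample never does.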

##### Use Web UI

The [dataResponsiblyUI](https://github.com/DataResponsibly/dataResponsiblyUI) is a Django project that includes DataSynthesizer. Please follow the steps in [Run the Web UIs locally](https://github.com/DataResponsibly/dataResponsiblyUI#run-the-web-uis-locally) and run DataSynthesizer by visiting http://127.0.0.1:8000/synthesizer in a browser.



# History

## 0.1.0 - 2020-06-11

* First release on PyPI.

## 0.1.1 - 2020-07-05

### Bugs Fixed

* NumPy error when synthesising data with unique identifiers. - [Issue #23](https://github.com/DataResponsibly/DataSynthesizer/issues/23) by @raids

## 0.1.2 - 2020-07-19

### Bugs Fixed

* `infer_distribution()` for string attributes fails to sort an index of mixed types. - [Issue #24](https://github.com/DataResponsibly/DataSynthesizer/issues/24) by @raids

## 0.1.3 - 2020-09-13

### Bugs Fixed

* The DataFrames are not appended to the full space in `get_noisy_distribution_of_attributes()`. - [Issue #26](https://github.com/DataResponsibly/DataSynthesizer/issues/26) by @zjroth

## 0.1.4 - 2021-01-14

### Bugs Fixed

* Fix a bug in candidate key identification.

## 0.1.5 - 2021-03-11

### What's New

* Lower the minimum required Python version from >=3.8 to >=3.7.

## 0.1.6 - 2021-03-11

### What's New

* Update example notebooks.

## 0.1.7 - 2021-03-31

### Bugs Fixed

* Fixed an error in Laplace noise parameter. - [Issue #34](https://github.com/DataResponsibly/DataSynthesizer/issues/34) by @ganevgv

## 0.1.8 - 2021-04-09

### Bugs Fixed

* Random seeding now takes effect across the entire project.

## 0.1.9 - 2021-07-18

### Bugs Fixed

* Optimized the datetime datatype detection.

## 0.1.10 - 2021-11-15

### Bugs Fixed

* Seed the randomness in `greedy_bayes()`.

## 0.1.11 - 2022-03-31

### Bugs Fixed

* Fixed a bug in DateTime generation. - [Issue #37](https://github.com/DataResponsibly/DataSynthesizer/issues/37) by @artemgur

## 0.1.12 - 2023-10-17

### Bugs Fixed

* Support Python 3.11+ and pandas 2.0+. - [Issue #40](https://github.com/DataResponsibly/DataSynthesizer/issues/40) by @artemgur
* Added empty file creation before saving files. - [Issue #41](https://github.com/DataResponsibly/DataSynthesizer/issues/41) by @PepijndeReus

## 0.1.13 - 2023-10-18

### Bugs Fixed

* Support pandas 2.0+.
