metasynth


- **Name**: metasynth
- **Version**: 0.5.0
- **Summary**: Package for creating synthetic datasets while preserving privacy.
- **Upload time**: 2023-09-18 09:36:34
- **Requires Python**: >=3.8
- **License**: MIT License, Copyright (c) 2023 SoDa
- **Keywords**: metadata, open-data, privacy, synthetic-data, tabular datasets
- **Requirements**: No requirements were recorded.

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/metasynth)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sodascience/metasynth/HEAD?labpath=examples%2Fgetting_started.ipynb)
[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sodascience/metasynth/blob/main/examples/getting_started.ipynb)
[![docs](https://readthedocs.org/projects/metasynth/badge/?version=latest)](https://metasynth.readthedocs.io/en/latest/index.html)

![MetaSynth Logo](docs/source/images/logos/blue.svg)

# MetaSynth
MetaSynth is a Python package designed to generate tabular synthetic data for rigorous code testing and reproducibility.

The package has two main functionalities. First, it allows for the **creation of metadata** from an input dataset. This metadata describes the overarching structure and traits of the dataset. Second, MetaSynth allows for **generation of synthetic data** that aligns with this metadata. Instead of relying on the original dataset, the synthetic data is produced using the metadata. This approach ensures that the synthetic dataset remains separate and independent from any sensitive source data. Researchers and data owners can leverage this capability to generate and share synthetic versions of their sensitive data, mitigating privacy concerns. Furthermore, this separation between metadata and original data promotes reproducibility, as the metadata file can be easily shared and used to generate consistent synthetic datasets.


## Features
### Generating metadata from a dataset
MetaSynth can generate metadata from any given dataset (provided as a Polars or Pandas DataFrame) in the form of a MetaFrame. A MetaFrame encapsulates the structure and characteristics of each column in the original dataset (including its name, variable type, data type, proportion of missing values and distribution specification) and serves as a complete recipe for generating new synthetic data.

MetaFrames follow the [Generative Metadata Format (GMF)](https://github.com/sodascience/generative_metadata_format) standard and as such are designed to be easy to read. MetaFrames can be exported as a .JSON file, allowing for manual and automatic editing, as well as easy sharing.

![Metadata_generation_flowchart](docs/source/images/flow_metadata_generation.png)

<details> 
<summary> A simple example of an exported MetaFrame (following the GMF standard): </summary>

```json
 {
    "n_rows": 5,
    "n_columns": 5,
    "provenance": {
        "created by": {
            "name": "MetaSynth",
            "version": "0.4.0"
        },
        "creation time": "2023-08-07T12:04:40.669740"
    },
    "vars": [
        {
            "name": "ID",
            "type": "discrete",
            "dtype": "Int64",
            "prop_missing": 0.0,
            "distribution": {
                "implements": "core.unique_key",
                "provenance": "builtin",
                "class_name": "UniqueKeyDistribution",
                "parameters": {
                    "low": 1,
                    "consecutive": 1
                }
            }
        },
        {
            "name": "fruits",
            "type": "categorical",
            "dtype": "Categorical",
            "prop_missing": 0.0,
            "distribution": {
                "implements": "core.multinoulli",
                "provenance": "builtin",
                "class_name": "MultinoulliDistribution",
                "parameters": {
                    "labels": [
                        "apple",
                        "banana"
                    ],
                    "probs": [
                        0.4,
                        0.6
                    ]
                }
            }
        },
        {
            "name": "B",
            "type": "discrete",
            "dtype": "Int64",
            "prop_missing": 0.0,
            "distribution": {
                "implements": "core.poisson",
                "provenance": "builtin",
                "class_name": "PoissonDistribution",
                "parameters": {
                    "mu": 3.0
                }
            }
        },
        {
            "name": "cars",
            "type": "categorical",
            "dtype": "Categorical",
            "prop_missing": 0.0,
            "distribution": {
                "implements": "core.multinoulli",
                "provenance": "builtin",
                "class_name": "MultinoulliDistribution",
                "parameters": {
                    "labels": [
                        "audi",
                        "beetle"
                    ],
                    "probs": [
                        0.2,
                        0.8
                    ]
                }
            }
        },
        {
            "name": "optional",
            "type": "discrete",
            "dtype": "Int64",
            "prop_missing": 0.2,
            "distribution": {
                "implements": "core.discrete_uniform",
                "provenance": "builtin",
                "class_name": "DiscreteUniformDistribution",
                "parameters": {
                    "low": -30,
                    "high": 301
                }
            }
        }
    ]
}
```

A more advanced example GMF, based on the [Titanic](https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv) dataset, can be found [here](examples/titanic_example.json).
</details>
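Because a GMF file is plain JSON, it can also be inspected with standard tooling. The following is a minimal sketch that uses only Python's built-in `json` module on a trimmed-down copy of the example above (two of the five variables, with some fields left out). It does not use MetaSynth itself, and the names `gmf_text` and `summary` are purely illustrative.

```python
import json

# Trimmed-down copy of the example MetaFrame above (illustrative only).
gmf_text = """
{
    "n_rows": 5,
    "n_columns": 2,
    "vars": [
        {
            "name": "fruits",
            "type": "categorical",
            "dtype": "Categorical",
            "prop_missing": 0.0,
            "distribution": {
                "implements": "core.multinoulli",
                "parameters": {
                    "labels": ["apple", "banana"],
                    "probs": [0.4, 0.6]
                }
            }
        },
        {
            "name": "optional",
            "type": "discrete",
            "dtype": "Int64",
            "prop_missing": 0.2,
            "distribution": {
                "implements": "core.discrete_uniform",
                "parameters": {"low": -30, "high": 301}
            }
        }
    ]
}
"""

gmf = json.loads(gmf_text)

# Collect one (name, type, distribution, prop_missing) tuple per variable.
summary = [
    (var["name"], var["type"], var["distribution"]["implements"], var["prop_missing"])
    for var in gmf["vars"]
]

for name, var_type, dist, prop_missing in summary:
    print(f"{name}: {var_type}, {dist}, missing={prop_missing}")
```

Each entry in `vars` describes one column, so a loop like this is often enough to review what a MetaFrame would disclose before sharing it.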

### Generating synthetic data from a GMF file
MetaSynth can then be used to **generate synthetic data** from any GMF standard .JSON file.

![Synthetic_data_generation](docs/source/images/flow_synthetic_data_generation.png)

The generated synthetic data emulates the original data's format and plausibility at the individual record level and attempts to reproduce marginal (univariate) distributions where possible. Generated values are based on the observed distributions while adding a degree of variance and smoothing. The generated data does **not** aim to preserve the relationships between variables. The frequency of missing values and their codes are maintained in the synthetic dataset.
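To make the idea of column-by-column (marginal) generation concrete, here is a rough sketch in plain Python. It is not MetaSynth's implementation: the helper `sample_column` is hypothetical, and the missing-value proportion of 0.2 for `cars` is chosen for illustration (the example MetaFrame above uses 0.0). Each column is drawn independently from its own label probabilities, which is why marginal frequencies are approximately reproduced while relationships between columns are not.

```python
import random


def sample_column(labels, probs, prop_missing, n, rng):
    """Draw n categorical values, then replace a share of them with None."""
    values = rng.choices(labels, weights=probs, k=n)
    return [None if rng.random() < prop_missing else v for v in values]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 10_000

# Each column is sampled on its own; no information is shared between them.
fruits = sample_column(["apple", "banana"], [0.4, 0.6], 0.0, n, rng)
cars = sample_column(["audi", "beetle"], [0.2, 0.8], 0.2, n, rng)

banana_share = fruits.count("banana") / n
missing_share = cars.count(None) / n

print(f"share of 'banana' in fruits: {banana_share:.3f}")
print(f"share of missing values in cars: {missing_share:.3f}")
```

With enough rows, the printed shares should land close to 0.6 and 0.2 respectively, mirroring the `probs` and `prop_missing` values in the metadata.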

### Overview of features
- **Metadata Generation**: MetaSynth allows the extraction of metadata from a dataset provided as a Polars or Pandas dataframe. Metadata includes key characteristics such as variable names, types, data types, the percentage of missing values, and distribution attributes.
- **Synthetic Data Generation**: MetaSynth allows for the generation of a polars DataFrame with synthetic data that resembles the original data.
- **GMF Standard**: MetaSynth utilizes the Generative Metadata Format (GMF) standard for metadata export and import. 
- **Distribution Fitting**: MetaSynth allows for manual and automatic distribution fitting.
- **Data Type Support**: MetaSynth supports generating synthetic data for a variety of common data types including `categorical`, `string`, `integer`, `float`, `date`, `time`, and `datetime`.
- **Integration with Faker**: MetaSynth integrates with the [faker](https://github.com/joke2k/faker) package, a Python library for generating fake data such as names and emails, allowing for more realistic synthetic data.
- **Structured String Detection**: MetaSynth identifies structured strings within your dataset, which can include formatted text, codes, identifiers, or any string that follows a specific pattern.
- **Handling Unique Values**: MetaSynth can identify and process variables with unique values or keys in the data, preserving their uniqueness in the synthetic dataset, which is crucial for generating synthetic data that maintains the characteristics of the original dataset.
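
As an illustration of what preserving uniqueness can mean for a key column, the sketch below generates integer keys that never repeat. It is an assumption-laden toy, not MetaSynth's algorithm: the function `unique_keys` is hypothetical, its `low` and `consecutive` arguments merely echo the parameter names in the example MetaFrame above, and the random gap rule for the non-consecutive case is invented for this example.

```python
import random


def unique_keys(low, consecutive, n, rng):
    """Return n distinct, increasing integer keys starting at `low`."""
    if consecutive:
        # Consecutive keys: low, low + 1, low + 2, ...
        return list(range(low, low + n))
    keys = []
    current = low
    for _ in range(n):
        keys.append(current)
        # Invented rule: advance by a random positive gap so keys stay unique.
        current += rng.randint(1, 5)
    return keys


rng = random.Random(0)
consecutive_ids = unique_keys(low=1, consecutive=True, n=5, rng=rng)
gapped_ids = unique_keys(low=1, consecutive=False, n=5, rng=rng)

print(consecutive_ids)
print(gapped_ids)
```

Because the gap is always at least 1, both variants yield strictly increasing values, so no key can occur twice.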


## Getting Started
### Try it out online
If you're new to Python or simply want to quickly explore the basic features of MetaSynth, you can try it out using the online Google Colab tutorial. [Click here](https://colab.research.google.com/github/sodascience/metasynth/blob/main/examples/getting_started.ipynb) to access the tutorial. It provides a step-by-step walkthrough and example dataset to help you get started. However, please exercise caution when using sensitive data, as it will be handled through Google servers.

### Local Installation

If you are a more advanced user or a researcher who prefers working on a local machine, you can install MetaSynth directly from PyPI by running the following command in a terminal (not in Python):

```sh
pip install metasynth
```

## Usage

To learn how to use MetaSynth effectively, refer to the comprehensive [documentation](https://metasynth.readthedocs.io/en/latest/index.html). The documentation covers all the necessary information and provides detailed explanations, examples, and usage guidelines.

Additionally, the documentation offers a series of [tutorials](https://metasynth.readthedocs.io/en/latest/index.html) that delve into specific features and use cases. These tutorials can further assist you in understanding and leveraging the capabilities of MetaSynth.

### Quick start
Get started quickly with MetaSynth using the following example. In this concise demonstration, you'll learn the basic functionality of MetaSynth by generating synthetic data from the [Titanic](https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv) dataset.

It is important to start by importing the appropriate libraries:

```python
# import libraries
import polars as pl
from metasynth import MetaFrame, demo_file
```

#### Generating a MetaFrame 
##### 1.  Begin by creating a polars dataframe:
```python
# import the demo csv 
dataset_csv = demo_file()  # provides the Titanic demo dataset as a CSV file


# create dataframe
data_types = {
    "Sex": pl.Categorical,
    "Embarked": pl.Categorical,
    "Survived": pl.Categorical,
    "Pclass": pl.Categorical,
    "SibSp": pl.Categorical,
    "Parch": pl.Categorical
}

df = pl.read_csv(dataset_csv, dtypes=data_types)
```

<details>
     <summary> 
     Note on using Pandas
     </summary>
     
Internally, MetaSynth uses Polars (instead of Pandas), mainly because typing and the handling of non-existing data are more
consistent. It is possible to supply a Pandas DataFrame instead of a Polars DataFrame to `MetaFrame.fit_dataframe`.
However, this uses the automatic Polars conversion functionality, which can cause problems in some edge cases. Therefore,
we advise users to create Polars DataFrames. The resulting synthetic dataset is always a Polars DataFrame, but this can
be easily converted back to a Pandas DataFrame by using `df_pandas = df_polars.to_pandas()`.
</details>

##### 2. Next, we can generate a MetaFrame from the polars DataFrame.

```python
# create a MetaFrame (mf) from the DataFrame (df)
mf = MetaFrame.fit_dataframe(df)
```

> Note: if at this point you get the following warning about a potential unique variable, do not worry, it is safe to continue.
> 
> ```
> Variable PassengerId seems unique, but not set to be unique.
> Set the variable to be either unique or not unique to remove this warning.
>   warnings.warn(f"\nVariable {series.name} seems unique, but not set to be unique.\n"
> ```

##### 3. We can export this MetaFrame to a .JSON file using:

```python
# export MetaFrame
mf.to_json("exported_metaframe.json")
```
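
Since the exported file is plain JSON, it can be adjusted programmatically before it is shared. The sketch below shows one way this might look using only the standard library. To stay self-contained, it first writes a small stand-in file rather than a real MetaSynth export, so the `metadata` dictionary (with most GMF fields omitted) is an assumption made for illustration. Whether an edited file still loads with `MetaFrame.from_json` depends on it remaining a valid GMF file.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a file written by mf.to_json(...); most GMF fields are omitted.
metadata = {
    "n_rows": 5,
    "vars": [
        {"name": "ID", "prop_missing": 0.0},
        {"name": "optional", "prop_missing": 0.2},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "exported_metaframe.json"
    path.write_text(json.dumps(metadata, indent=4))

    # Load the file, change one value, and write it back.
    gmf = json.loads(path.read_text())
    for var in gmf["vars"]:
        if var["name"] == "optional":
            var["prop_missing"] = 0.0  # e.g. decide not to generate missing values
    path.write_text(json.dumps(gmf, indent=4))

    # Re-read the file to confirm the change was persisted.
    edited = json.loads(path.read_text())

print(edited["vars"][1])
```

The same pattern applies to manual review: open the file, check each variable's entry, and remove or coarsen anything that should not be disclosed.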

#### Generating synthetic data

##### 1. We can load metadata from a .JSON file:
```python
# load MetaFrame
mf = MetaFrame.from_json("exported_metaframe.json")
```

##### 2. We can then synthesize a DataFrame based on a loaded MetaFrame using:

```python
# synthesize a DataFrame with 5 rows of data based on a MetaFrame
synthetic_data = mf.synthesize(5) 
```



<!-- CONTRIBUTING -->
## Contributing
Contributions are what make the open source community an amazing place to learn, inspire, and create.

Any contributions you make are greatly appreciated.

To contribute:
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request


<!-- CONTACT -->
## Contact
**MetaSynth** is a project by the [ODISSEI Social Data Science (SoDa)](https://odissei-data.nl/nl/soda/) team.
Do you have questions, suggestions, or remarks on the technical implementation? File an issue in the
issue tracker or feel free to contact [Erik-Jan van Kesteren](https://github.com/vankesteren)
or [Raoul Schram](https://github.com/qubixes).

<img src="docs/source/images/logos/soda.png" alt="SoDa logo" width="250px"/> 

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "metasynth",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "metadata,open-data,privacy,synthetic-data,tabular datasets",
    "author": "",
    "author_email": "Raoul Schram <r.d.schram@uu.nl>, Erik-Jan van Kesteren <e.vankesteren1@uu.nl>",
    "download_url": "https://files.pythonhosted.org/packages/91/8c/1df24f9004196267403fcb40de5b6cef65ebe3ceb5e73a2b46080cf6e08c/metasynth-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2023 SoDa  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Package for creating synthetic datasets while preserving privacy.",
    "version": "0.5.0",
    "project_urls": {
        "GitHub": "https://github.com/sodascience/metasynth",
        "documentation": "https://metasynth.readthedocs.io/en/latest/index.html"
    },
    "split_keywords": [
        "metadata",
        "open-data",
        "privacy",
        "synthetic-data",
        "tabular datasets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dc5a58a27e17778ffc0c7d143e5f6053c0fe83aaf95a10bbb92a2b6307dd500a",
                "md5": "d78279b7ae65124abb96edd569e410ba",
                "sha256": "424bc8d277d323a15799b3b8eaf219f52efffe2eb09f4326ef03510c805007e1"
            },
            "downloads": -1,
            "filename": "metasynth-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d78279b7ae65124abb96edd569e410ba",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 86142,
            "upload_time": "2023-09-18T09:36:32",
            "upload_time_iso_8601": "2023-09-18T09:36:32.402142Z",
            "url": "https://files.pythonhosted.org/packages/dc/5a/58a27e17778ffc0c7d143e5f6053c0fe83aaf95a10bbb92a2b6307dd500a/metasynth-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "918c1df24f9004196267403fcb40de5b6cef65ebe3ceb5e73a2b46080cf6e08c",
                "md5": "92fdbfb8aabcd3974d9c445eb746d6d5",
                "sha256": "c4f22ab3a4bfbd4f7d5b2cfbcad19ad25466c6c7ef056695a897a3020790f62d"
            },
            "downloads": -1,
            "filename": "metasynth-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "92fdbfb8aabcd3974d9c445eb746d6d5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 3507786,
            "upload_time": "2023-09-18T09:36:34",
            "upload_time_iso_8601": "2023-09-18T09:36:34.541893Z",
            "url": "https://files.pythonhosted.org/packages/91/8c/1df24f9004196267403fcb40de5b6cef65ebe3ceb5e73a2b46080cf6e08c/metasynth-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-18 09:36:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sodascience",
    "github_project": "metasynth",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "metasynth"
}
        