py-synd

- Name: py-synd
- Version: 0.0.1
- Home page: https://github.com/wilhelmagren/synd
- Summary: SYNthetic Data generation for complex tabular datasets
- Author: Wilhelm Ågren
- Requires Python: >=3.7
- Upload time: 2023-05-30 18:09:12

<div align="center">
<br/>
<div align="left">
<br/>
<p align="center">
<a href="https://github.com/wilhelmagren/synd">
<img align="center" width=40% src="https://github.com/wilhelmagren/synd/blob/120ad15bf411807073b7f279c6390560ae1054c3/docs/images/synd-transparent.png"></img>
</a>
</p>
</div>

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![codecov](https://codecov.io/github/wilhelmagren/synd/branch/main/graph/badge.svg?token=PBUVJ2LNMM)](https://codecov.io/github/wilhelmagren/synd)
[![Lines of code](https://img.shields.io/tokei/lines/github/wilhelmagren/synd)](https://github.com/wilhelmagren/synd/tree/58bcd31b37c5bde0c8656717ed6c0f81cc3ec562/synd)
[![Unit Tests](https://github.com/wilhelmagren/synd/actions/workflows/unittest.yml/badge.svg)](https://github.com/wilhelmagren/synd/actions/workflows/unittest.yml)

</div>

## 🔎 Overview
SYNthetic Data generation for complex tabular datasets. Speed up testing and data integration by using synthetically generated data in place of real data.

Fully open-source with data transparency and compliance at heart.

Ongoing work:
- Anonymization of identifiable fields
- Multi-tabular data
- Sequential data
- Image data
- Data & model lineage
- Multi-GPU support
- Database connections (?)
- Weights & Biases support (?)
- ...


## 🔒 Requirements
- If installing locally, you need the dependencies listed in the [requirements.txt](https://github.com/wilhelmagren/synd/blob/main/requirements.txt) file.
- To train and sample efficiently, you need a CUDA-compatible GPU; see NVIDIA's [list of CUDA GPUs](https://developer.nvidia.com/cuda-gpus).
- Python >= 3.7 and <= 3.10
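The supported interpreter range can be checked before installing. Below is a minimal sketch using only the standard library; the helper `python_supported` is hypothetical (synd does not ship it), and the inclusive bounds 3.7 and 3.10 are copied from the list above. Note that the package metadata only declares `>=3.7`, so pip itself will not enforce the upper bound.

```python
import sys

# Bounds copied from the requirements list (Python >= 3.7 and <= 3.10).
# Hypothetical helper for illustration; not part of synd.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_supported():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range 3.7 to 3.10.")
```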


## 📦 Installation
Either clone this repository and install it locally:
```
git clone https://github.com/wilhelmagren/synd.git
cd synd
pip install -e .
```
or install the most recent release from the Python Package Index (PyPI).
```
pip install py-synd
```
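After either install method, you can confirm which release is present. This is a minimal sketch assuming Python 3.8+, where `importlib.metadata` is part of the standard library (on 3.7, the `importlib_metadata` backport provides the same functions). The distribution name is `py-synd`, while the import name is `synd`; the helper `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="py-synd"):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"py-synd {found}" if found else "py-synd is not installed")
```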


## 🚀 Example usage
The models are designed to be used like scikit-learn models. Load some data into a pandas DataFrame, create a dataset and fit its transformer and sampler, then create the synthesizer model and fit it on the dataset. You can then generate an arbitrary amount of synthetic data with the trained model and compare its quality to the real data. The example below also uses the `sdv` package (for the demo data) and the `sdmetrics` package (for the quality report).

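```python
import torch
from sdmetrics.reports.single_table import QualityReport
from sdv.datasets.demo import download_demo

from synd.datasets import SingleTable
from synd.models import CTGAN

# Download the "adult" demo dataset (a pandas DataFrame) and its metadata from SDV.
data, metadata = download_demo('single_table', 'adult')

# Columns to model as categorical rather than continuous.
discrete_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
    'label',
]

# Wrap the data and fit the dataset's transformer and sampler.
dataset = SingleTable(data, metadata.to_dict(), discrete_columns=discrete_columns)
dataset.fit()

batch_size = 500
epochs = 100
# Train on the first GPU if one is available, otherwise fall back to the CPU.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Create the synthesizer and fit it on the dataset.
model = CTGAN(batch_size=batch_size, device=device)
model.fit(dataset, epochs=epochs, critic_steps=2)

# Generate synthetic rows with the trained model.
fake_samples = model.sample(n_samples=10000)

# Compare the synthetic data against the real data.
report = QualityReport()
report.generate(data, fake_samples, metadata.to_dict())
...
```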
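The sampler fitted on the dataset is what lets CTGAN-style models condition on rare categories. Assuming synd follows the training-by-sampling scheme from the original CTGAN paper (Xu et al., 2019), which this README does not confirm, the sampler picks a discrete column uniformly at random and then picks one of its categories with probability proportional to the logarithm of the category's frequency. The stdlib-only sketch below illustrates that idea; `log_frequency_probs` and `sample_condition` are hypothetical names, not synd's API, and the counts are made-up illustrative numbers.

```python
import math
import random


def log_frequency_probs(counts):
    """Map {category: count} to {category: probability}, proportional to log(1 + count)."""
    weights = {cat: math.log1p(n) for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_condition(column_counts, rng=random):
    """Pick a discrete column uniformly, then one of its categories by log-frequency.

    column_counts maps column name -> {category: count}. Returns the
    (column, category) pair a conditional generator would be asked to produce.
    """
    column = rng.choice(sorted(column_counts))
    probs = log_frequency_probs(column_counts[column])
    categories = list(probs)
    category = rng.choices(categories, weights=[probs[c] for c in categories], k=1)[0]
    return column, category


if __name__ == "__main__":
    # Illustrative counts only; not taken from any real dataset.
    counts = {
        "color": {"red": 900, "green": 90, "blue": 10},
        "size": {"small": 500, "large": 500},
    }
    print(log_frequency_probs(counts["color"]))
    print(sample_condition(counts, random.Random(0)))
```

Compared with sampling categories by raw frequency, the logarithm flattens the distribution: in the illustrative counts above, "red" covers 90% of rows but receives only about half of the probability mass, so rare categories are conditioned on more often during training.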

## 📋 License
All code is released under the MIT license; see [LICENSE](https://github.com/wilhelmagren/synd/blob/fa06666402cfa0aa05846c9513aff19fc720a8f1/LICENSE) for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/wilhelmagren/synd",
    "name": "py-synd",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Wilhelm \u00c5gren",
    "author_email": "Wilhelm \u00c5gren <wilhelmagren98@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/9d/da/b79be1a6998cd47647d4fd819985d209459ede88d80bd8620d24d3e5e568/py-synd-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "SYNthetic Data generation for complex tabular datasets",
    "version": "0.0.1",
    "project_urls": {
        "Bug tracker": "https://github.com/wilhelmagren/synd/issues",
        "Homepage": "https://github.com/wilhelmagren/synd"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fd3623942e27e59f33dc5f6eef57a58264ec97764106e5baaa5f57ffaf9f5ac2",
                "md5": "cfeddc8b826ac81753a13c08efce1adc",
                "sha256": "9db30bd3086f96e33d977f58382f3589245bf5d45ee86e30e04f9ac4dd18c55f"
            },
            "downloads": -1,
            "filename": "py_synd-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cfeddc8b826ac81753a13c08efce1adc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 48217,
            "upload_time": "2023-05-30T18:09:10",
            "upload_time_iso_8601": "2023-05-30T18:09:10.658487Z",
            "url": "https://files.pythonhosted.org/packages/fd/36/23942e27e59f33dc5f6eef57a58264ec97764106e5baaa5f57ffaf9f5ac2/py_synd-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9ddab79be1a6998cd47647d4fd819985d209459ede88d80bd8620d24d3e5e568",
                "md5": "69c0cca90984762039f7cd7c04e8f38c",
                "sha256": "625fa2587304c2c71f5a40d40004be174150f811b92100ff6dcee383f49cad9d"
            },
            "downloads": -1,
            "filename": "py-synd-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "69c0cca90984762039f7cd7c04e8f38c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 20326,
            "upload_time": "2023-05-30T18:09:12",
            "upload_time_iso_8601": "2023-05-30T18:09:12.829177Z",
            "url": "https://files.pythonhosted.org/packages/9d/da/b79be1a6998cd47647d4fd819985d209459ede88d80bd8620d24d3e5e568/py-synd-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-30 18:09:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wilhelmagren",
    "github_project": "synd",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "py-synd"
}