synthetic-dataset


Name: synthetic-dataset
Version: 0.0.0.2
Summary: Generating accurate and safe synthetic datasets for tabular, classification, and time-series labeling tasks
Upload time: 2023-04-10 04:37:01
Author: Synthetic Dataset AI Team
Requires Python: >=3.8
Keywords: python, pandas, numpy, scikit-learn, scipy, matplotlib, seaborn
Requirements: No requirements were recorded.

# Synthetic Data Generation for Tabular, Classification, and Time-Series Labels



This repository contains a Python-based framework for generating accurate and safe synthetic datasets for tabular, classification, and time-series labeling tasks. It is designed to help researchers, data scientists, and machine learning engineers create high-quality, realistic datasets for training and evaluating their models while ensuring privacy and compliance with data protection regulations.



## Features



1. **Tabular Data Generation**: Easily generate synthetic tabular datasets with customizable column types, distribution patterns, and correlations between variables.

2. **Classification Data Generation**: Create datasets for binary or multi-class classification tasks, controlling class imbalance and feature importance.

3. **Time-Series Data Generation**: Generate synthetic time-series datasets with user-defined seasonality, trend, and noise components.

4. **Data Privacy**: Ensure data privacy by using differential privacy techniques and limiting the degree of similarity between the original and synthetic datasets.

5. **Flexible and Extensible**: The framework is designed to be easily extended and adapted to a wide range of data generation tasks, with support for custom data generation modules and integration with other data generation tools.
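
The README does not say which differential privacy technique the framework uses. As a rough illustration of the idea behind feature 4, here is a minimal standalone sketch of the textbook Laplace mechanism applied to a bounded mean. It uses only the Python standard library and is not the package's API: `laplace_noise` and `private_mean` are hypothetical names chosen for this example, and the sensitivity bound assumes the number of records is public.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially-private estimate of the mean (hypothetical helper).

    Values are clipped to [lower, upper], so replacing one record changes the
    mean by at most (upper - lower) / n. That bound is the sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    # Laplace mechanism: add noise with scale = sensitivity / epsilon.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 38]
noisy = private_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy and a noisier estimate; a larger one does the opposite.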



## Installation



Clone the repository and install the required dependencies:



```bash
git clone https://github.com/syntheticdataset/synthetic-dataset.git
cd synthetic-dataset
pip install -r requirements.txt
```



## Usage

Refer to the provided examples and documentation for guidance on how to generate synthetic datasets for your specific use case.



```python
from synthetic_data import TabularDataGenerator, ClassificationDataGenerator, TimeSeriesDataGenerator

# Tabular data generation
tabular_gen = TabularDataGenerator(num_rows=1000)
tabular_data = tabular_gen.generate()

# Classification data generation
classification_gen = ClassificationDataGenerator(num_samples=1000, num_classes=3)
classification_data, labels = classification_gen.generate()

# Time-series data generation
time_series_gen = TimeSeriesDataGenerator(num_samples=1000, seasonal_period=12)
time_series_data = time_series_gen.generate()
```
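
The generator internals are not documented here, so the following is only a standalone sketch of what a series with a seasonal period of 12 might look like under an additive model (linear trend plus sinusoidal seasonality plus Gaussian noise). It does not use the package: `make_series` and the parameters `trend_slope`, `seasonal_amplitude`, `noise_std`, and `seed` are assumptions made for this illustration.

```python
import math
import random


def make_series(num_samples, seasonal_period, trend_slope=0.05,
                seasonal_amplitude=1.0, noise_std=0.2, seed=0):
    """Return a list of floats: linear trend + sinusoidal seasonality + Gaussian noise."""
    rng = random.Random(seed)  # seeded so repeated calls give the same series
    series = []
    for t in range(num_samples):
        trend = trend_slope * t
        seasonal = seasonal_amplitude * math.sin(2 * math.pi * t / seasonal_period)
        noise = rng.gauss(0.0, noise_std)
        series.append(trend + seasonal + noise)
    return series


# Mirrors the sizes used in the usage example above.
series = make_series(num_samples=1000, seasonal_period=12)
```

Setting `noise_std=0.0` and `trend_slope=0.0` leaves only the seasonal component, which is a convenient way to check that the period is what you expect.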





## Contributing

Please read the CONTRIBUTING.md file for details on how to contribute to the project. We welcome pull requests, bug reports, and feature requests.



## License

This project is licensed under the MIT License. See the [LICENSE](https://github.com/syntheticdataset/synthetic-dataset/blob/main/LICENSE) file for details.


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "synthetic-dataset",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "python,pandas,numpy,scikit-learn,scipy,matplotlib,seaborn",
    "author": "Synthetic Dataset AI Team",
    "author_email": "<admin@syntheticdataset.ai>",
    "download_url": "https://files.pythonhosted.org/packages/26/ea/2f021b6a2a16c960aece62899bfc33fc19fe2516d07d3ad88f6cfa4bbc27/synthetic-dataset-0.0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Generating accurate and safe synthetic datasets for tabular, classification, and time-series labeling tasks",
    "version": "0.0.0.2",
    "split_keywords": [
        "python",
        "pandas",
        "numpy",
        "scikit-learn",
        "scipy",
        "matplotlib",
        "seaborn"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2e324614b7ca4899ff2a5ab1bf39f04bb0344654d0dbba14ba72d16a124ff8fc",
                "md5": "57de5825ef26283b781b9e474fdb5428",
                "sha256": "ea0bfa4cd8b0039e0b78c70e24e7e9fa053eabb7125ece98cb1851df587bfbc0"
            },
            "downloads": -1,
            "filename": "synthetic_dataset-0.0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "57de5825ef26283b781b9e474fdb5428",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 3322,
            "upload_time": "2023-04-10T04:36:59",
            "upload_time_iso_8601": "2023-04-10T04:36:59.519759Z",
            "url": "https://files.pythonhosted.org/packages/2e/32/4614b7ca4899ff2a5ab1bf39f04bb0344654d0dbba14ba72d16a124ff8fc/synthetic_dataset-0.0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "26ea2f021b6a2a16c960aece62899bfc33fc19fe2516d07d3ad88f6cfa4bbc27",
                "md5": "6716b0b014950fce8ad53fdad1dd9898",
                "sha256": "41b8ab040623c3b440fc518275a1260c82e1282c172d0603e044a4c910b3125d"
            },
            "downloads": -1,
            "filename": "synthetic-dataset-0.0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "6716b0b014950fce8ad53fdad1dd9898",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 3457,
            "upload_time": "2023-04-10T04:37:01",
            "upload_time_iso_8601": "2023-04-10T04:37:01.684270Z",
            "url": "https://files.pythonhosted.org/packages/26/ea/2f021b6a2a16c960aece62899bfc33fc19fe2516d07d3ad88f6cfa4bbc27/synthetic-dataset-0.0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-10 04:37:01",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "synthetic-dataset"
}
        