| Name | synth-data-eval |
| Version | 0.1.4 |
| Summary | Comprehensive evaluation framework for tabular synthetic data generators |
| upload_time | 2025-10-25 18:32:58 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.12,>=3.8 |
| keywords | synthetic-data, machine-learning, evaluation, tabular-data, ctgan, privacy, data-generation |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
| license | MIT License (full text below) |

MIT License

Copyright (c) 2025 Eötvös Loránd University (ELTE)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# synth-data-eval repo
[CI](https://github.com/ahmed-fouad-lagha/synth-data-eval/actions/workflows/ci.yml)
[Code Quality](https://github.com/ahmed-fouad-lagha/synth-data-eval/actions/workflows/code-quality.yml)
[Coverage](https://codecov.io/gh/ahmed-fouad-lagha/synth-data-eval)
[PyPI](https://pypi.org/project/synth-data-eval/)
[Python Versions](https://pypi.org/project/synth-data-eval/)
[License: MIT](https://opensource.org/licenses/MIT)
A collaborative research project investigating methods for generating and evaluating synthetic tabular data across multiple domains.
This repository contains reproducible code, datasets, and experiment configurations used in our paper preparation.
---
## 📚 Project Overview
Synthetic data is crucial for privacy-preserving machine learning.
This project evaluates different synthetic data generators (CTGAN, TVAE, Gaussian Copula) across statistical fidelity, ML utility, privacy, and data quality.
**Research Objective:**
To provide a systematic benchmark framework and identify trade-offs between realism, privacy, and downstream task performance.
---
## 🚀 Installation
### From PyPI (Recommended)
```bash
pip install synth-data-eval
```
### From Source (Development)
```bash
git clone https://github.com/ahmed-fouad-lagha/synth-data-eval.git
cd synth-data-eval
pip install -e ".[all]" # Install with all optional dependencies
```
### Optional Dependencies
```bash
pip install -e ".[dev]" # Development tools (pytest, mypy, black, etc.)
pip install -e ".[docs]" # Documentation building
pip install -e ".[notebooks]" # Jupyter notebook support
```
---
## 🧵 Repository Structure
```
synthetic-tabular-eval/
├── pyproject.toml
├── README.md
├── CONTRIBUTING.md
├── LICENSE
├── .gitignore
├── generators/
│   ├── __init__.py
│   ├── base_generator.py
│   ├── ctgan_model.py
│   ├── tvae_model.py
│   └── gaussian_copula.py
├── evaluation/
│   ├── __init__.py
│   ├── sdmetrics_evaluation.py
│   ├── ml_utility.py
│   └── privacy_metrics.py
├── scripts/
│   ├── config.yaml
│   ├── run_benchmark.py
│   ├── visualize_results.py
│   └── download_datasets.py
├── tests/
│   ├── __init__.py
│   ├── test_generators.py
│   └── test_evaluation.py
├── datasets/
├── results/
└── logs/
```
---
## 🔬 Experimental Setup
### Datasets
We evaluated on five benchmark datasets spanning classification and regression:
- **Adult Income**: 32,561 training samples, 14 features (8 categorical, 6 numerical) - *Classification*
- **Credit Card Default**: 30,000 training samples, 23 features (mixed) - *Classification*
- **Diabetes**: 442 training samples, 10 numerical features - *Regression*
- **California Housing**: 20,640 training samples, 8 numerical features - *Regression*
- **Wine Quality**: 1,599 training samples, 11 numerical features - *Regression*
### Generators
- **CTGAN**: GAN-based with mode-specific normalization for categorical data
- **TVAE**: Variational autoencoder approach optimized for tabular data
- **Gaussian Copula**: Parametric baseline using copula-based modeling
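All three generators are presumably wrapped behind a common fit/sample interface (`generators/base_generator.py` in the tree above). A minimal sketch of what such an interface might look like, with a toy baseline for illustration — the class and method names here are assumptions, not the repository's actual API:

```python
import random
from abc import ABC, abstractmethod

Table = dict[str, list]  # column name -> column values

class BaseGenerator(ABC):
    """Hypothetical shared interface for the CTGAN / TVAE / Gaussian Copula wrappers."""

    @abstractmethod
    def fit(self, data: Table) -> None:
        """Learn the joint distribution of the real table."""

    @abstractmethod
    def sample(self, n_rows: int) -> Table:
        """Draw n_rows synthetic records from the fitted model."""

class IndependentSampler(BaseGenerator):
    """Toy baseline: resamples each column independently, ignoring correlations."""

    def fit(self, data: Table) -> None:
        self._data = data

    def sample(self, n_rows: int) -> Table:
        # Bootstrap each column on its own; real generators model the joint distribution.
        return {col: random.choices(values, k=n_rows) for col, values in self._data.items()}
```

A uniform interface like this is what lets the benchmark loop over generators without special-casing each library.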
### Evaluation Metrics
- **Statistical Fidelity**: Correlation similarity, Kolmogorov-Smirnov complement
- **ML Utility**: Train-on-Synthetic-Test-on-Real (TSTR) paradigm with utility ratios
- **Privacy**: Distance to Closest Record (DCR), Nearest Neighbor Distance Ratio (NNDR)
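The two privacy metrics can be sketched in a few lines of NumPy. This is a simplified Euclidean version for purely numerical data; the package's actual implementation in `evaluation/privacy_metrics.py` may scale features or handle categoricals differently:

```python
import numpy as np

def dcr_nndr(synthetic: np.ndarray, real: np.ndarray):
    """Mean Distance to Closest Record and mean Nearest Neighbor Distance Ratio.

    For each synthetic row, DCR is its Euclidean distance to the nearest real
    row; NNDR divides that by the distance to the second-nearest real row.
    Values near zero suggest the generator memorized real records.
    """
    # Pairwise distances, shape (n_synthetic, n_real)
    dists = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=2)
    dists.sort(axis=1)  # nearest real record first
    dcr = dists[:, 0]
    nndr = dcr / np.maximum(dists[:, 1], 1e-12)  # guard against division by zero
    return float(dcr.mean()), float(nndr.mean())
```

An exact copy of a real record yields DCR 0, which is why both metrics are read as "higher is safer" up to the point where the synthetic data stops resembling the real distribution at all.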
### Implementation Details
- **5 independent runs** per configuration for statistical robustness
- **300 epochs** for deep learning models (CTGAN, TVAE)
- **Python 3.10**, **SDV 1.28**, **CTGAN 0.7**
- **Statistical significance testing** with t-tests and confidence intervals
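With only 5 runs per configuration, the reported mean ± interval figures can be sketched with the standard library. In practice the p-values would come from `scipy.stats.ttest_ind(equal_var=False)`; this sketch shows only Welch's statistic, and uses a normal-approximation interval where small samples strictly call for a t quantile:

```python
import math
import statistics

def summarize_runs(runs):
    """Mean and half-width of an approximate 95% interval across repeated runs."""
    mean = statistics.mean(runs)
    se = statistics.stdev(runs) / math.sqrt(len(runs))
    return mean, 1.96 * se  # normal approximation; small n really wants a t quantile

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))
```

For example, `summarize_runs([0.90, 0.92, 0.91, 0.89, 0.93])` yields a mean of 0.91 with a small half-width, matching the "mean ± interval" style used in the findings below.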
---
## 📊 Key Findings
**Performance Highlights:**
- **TVAE excels on classification tasks** (Adult Income: 0.908 ± 0.028 utility ratio)
- **Gaussian Copula dominates regression tasks** (Diabetes: 0.964 ± 0.000 utility ratio)
- **Large training-time differences**: CTGAN (1022 s) vs. Gaussian Copula (4.9 s), roughly a 200× gap
- **8 statistically significant differences** detected across metrics and datasets
**Trade-offs Identified:**
- GAN-based generators (CTGAN, TVAE) show negative utility on small regression datasets
- Gaussian Copula provides best privacy-utility balance, especially for smaller datasets
- Dataset size significantly impacts generator performance and optimal choice
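The utility ratios quoted above can be illustrated in miniature with a toy nearest-centroid classifier in pure NumPy. The actual benchmark presumably uses standard scikit-learn models, so this only demonstrates the ratio itself, not the reported numbers:

```python
import numpy as np

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier and return accuracy on the test set."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = labels[np.argmin(dists, axis=1)]
    return float((pred == y_test).mean())

def tstr_utility_ratio(real_train, synth_train, test):
    """TSTR score divided by TRTR score, both evaluated on real test data."""
    trtr = centroid_accuracy(*real_train, *test)   # Train-on-Real, Test-on-Real baseline
    tstr = centroid_accuracy(*synth_train, *test)  # Train-on-Synthetic, Test-on-Real
    return tstr / trtr
```

A ratio near 1.0 means the synthetic data trains a model almost as good as the real data does; ratios well below 1.0 (or negative ones when the score is R²) indicate the generator lost task-relevant structure.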
---
## 🧬 Experiment Pipeline
**Completed Research Workflow:**
- **Data Preparation:** 5 diverse datasets (Adult Income 32K, Credit 30K, California Housing 20K, Wine Quality 1.6K, Diabetes 442 samples)
- **Generation:** 5 independent runs each of CTGAN (300 epochs), TVAE (300 epochs), Gaussian Copula
- **Evaluation:** Statistical fidelity (SDMetrics), ML utility (TSTR paradigm), privacy metrics (DCR, NNDR)
- **Analysis:** Statistical significance testing, confidence intervals, comprehensive visualizations
**Key Scripts:**
- `scripts/run_benchmark.py` - Execute complete experimental pipeline
- `scripts/statistical_analysis.py` - Generate significance tests and LaTeX tables
- `scripts/visualize_results.py` - Create radar plots, heatmaps, and utility comparisons
- `paper/main.tex` - Complete research paper with results and analysis
---
## 🔄 Reproducing Results
```bash
# 1. Install dependencies
pip install -e ".[all]"
# 2. Download datasets
python scripts/download_datasets.py
# 3. Run complete benchmark (will take several hours)
python scripts/run_benchmark.py
# 4. Generate statistical analysis
python scripts/statistical_analysis.py
# 5. Create visualizations
python scripts/visualize_results.py
# 6. Compile paper
cd paper && pdflatex main.tex
```
**Expected Runtime:** ~6-8 hours for full experimental pipeline with 5 runs × 3 generators × 5 datasets.
---
## 🛠️ Development
### Prerequisites
- Python 3.8+
- pip
### Setup
```bash
# Clone the repository
git clone https://github.com/ahmed-fouad-lagha/synth-data-eval.git
cd synth-data-eval
# Install in development mode with all dependencies
pip install -e ".[dev,docs,notebooks]"
# Optional: Install pre-commit hooks for code quality
pip install pre-commit
pre-commit install
```
### Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=generators --cov=evaluation
# Run specific test file
pytest tests/test_generators.py
```
### Code Quality
```bash
# Format code
black .
isort .
# Lint code
flake8 .
# Type check
mypy generators/ evaluation/ scripts/
```
### Documentation
```bash
# Build documentation
cd docs
sphinx-build -b html . _build/html
# View documentation
open _build/html/index.html
```
### CI/CD
This project uses GitHub Actions for continuous integration:
- **CI Pipeline**: Runs on every push/PR with testing, linting, documentation building, and security scanning
- **Multi-Python Support**: Tests on Python 3.8, 3.9, 3.10, and 3.11
- **Code Quality**: Automated checks for formatting, linting, and type safety
- **Coverage**: Code coverage reporting with Codecov integration
- **Security**: Automated vulnerability scanning
- **Release**: Automated PyPI publishing on version tags
---
## 📦 Creating Releases
### Automated Release Process
Use the provided release script for consistent versioning and publishing:
```bash
# Patch release (0.1.0 -> 0.1.1)
python scripts/make_release.py patch
# Minor release (0.1.0 -> 0.2.0)
python scripts/make_release.py minor
# Major release (0.1.0 -> 1.0.0)
python scripts/make_release.py major
# Specific version release
python scripts/make_release.py v1.0.0
```
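The bump logic that such a release script typically implements can be sketched as follows (hypothetical; the actual `scripts/make_release.py` may behave differently):

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semantic version for a 'major' | 'minor' | 'patch' bump."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump part: {part!r}")
```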
The script will:
- ✅ Run all quality checks (tests, linting, type checking)
- ✅ Update version in `pyproject.toml`
- ✅ Update `CHANGELOG.md` with release date
- ✅ Build and validate the package
- ✅ Create a git tag and push to trigger PyPI publishing
### Manual Release Process
If you prefer manual control:
1. Update version in `pyproject.toml`
2. Update `CHANGELOG.md`
3. Commit changes: `git commit -m "Release v1.0.0"`
4. Create tag: `git tag -a v1.0.0 -m "Release v1.0.0"`
5. Push: `git push origin v1.0.0`
6. GitHub Actions will automatically publish to PyPI
### Testing Releases
You can test releases on TestPyPI before publishing to production:
1. Go to GitHub Actions → Release workflow
2. Click "Run workflow"
3. Select "testpypi" target
4. Install from TestPyPI: `pip install --index-url https://test.pypi.org/simple/ synth-data-eval`
### Private Repository Setup
If your repository is private, GitHub release creation requires a Personal Access Token (PAT):
1. **Create a Personal Access Token (PAT)**:
- Go to https://github.com/settings/tokens
- Generate a new token with `repo` scope
- Copy the token
2. **Add to Repository Secrets**:
- Go to your repo → Settings → Secrets and variables → Actions
- Add a new secret named `RELEASE_TOKEN`
- Paste your PAT as the value
3. **✅ Status**: RELEASE_TOKEN is now configured - GitHub releases will work automatically!
---
## 🔒 Repository Policy
This repository is currently **private** and contains research code under development. It will be made public upon publication of the associated research paper to ensure proper attribution and compliance with venue policies.
- Do not upload confidential or non-public datasets
- Results and scripts shared here are for pre-publication collaboration only
- Contact authors for pre-publication access requests
---
## 📄 License
This repository contains research code that will be made publicly available under the MIT License upon publication of the associated research paper.
For pre-publication access, please contact the authors.