<p>ARTSyn is a library of models and algorithm implementations for synthesizing artificial tabular data. Such synthetic data are useful in classification and regression tasks on imbalanced datasets, for example fault/defect detection, intrusion detection, medical diagnosis, and financial prediction.</p><p>Most models in ARTSyn support conditional data generation, that is, the generation of data instances that belong to a particular class. The models accept tabular data in CSV format together with information about the column structure (e.g. which columns hold numeric or discrete values and which column holds the class). They are then trained to generate additional samples, either from a specific class or without any condition. For the moment, ARTSyn focuses on Generative Adversarial Networks (GANs), but more models and algorithms will be supported in the future.</p><p><b>Licence:</b> Apache License, 2.0 (Apache-2.0)</p><p><b>Dependencies:</b> NumPy, Pandas, Matplotlib, Seaborn, joblib, Synthetic Data Vault (SDV), PyTorch, scikit-learn, xgboost, imblearn, Reversible Data Transforms (RDT), tqdm.</p><p><b>GitHub repository:</b> <a href="https://github.com/lakritidis/ARTSyn">https://github.com/lakritidis/artsyn</a></p><p><b>Publications:</b><ul><li>L. Akritidis, P. Bozanis, "A Clustering-Based Resampling Technique with Cluster Structure Analysis for Software Defect Detection in Imbalanced Datasets", Information Sciences, vol. 674, pp. 120724, 2024.</li><li>L. Akritidis, A. Fevgas, M. Alamaniotis, P. Bozanis, "Conditional Data Synthesis with Deep Generative Models for Imbalanced Dataset Oversampling", In Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, pp. 444-451, 2023.</li><li>L. Akritidis, P. Bozanis, "A Multi-Dimensional Survey on Learning from Imbalanced Data", Chapter in International Conference on Information, Intelligence, Systems, and Applications, pp. 13-45, 2024.</li></ul></p>
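<p>To make the imbalance problem concrete, the sketch below counts the rows per class in a small CSV and reports how many extra rows each class would need to match the largest one. That is the kind of per-class quota a conditional generator would be asked to fill. It does not use the ARTSyn API: the function name <code>samples_needed_per_class</code>, the toy columns and the class column <code>defective</code> are all hypothetical, and only the Python standard library is used.</p>

```python
import csv
import io
from collections import Counter


def samples_needed_per_class(csv_text: str, class_column: str) -> dict[str, int]:
    """Return how many extra rows each class needs to match the largest class.

    Hypothetical helper for illustration only; it is not part of ARTSyn.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[class_column] for row in reader)
    if not counts:
        return {}
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Toy imbalanced dataset (made up): a numeric column, a discrete column,
# and a class column with four negative rows and one positive row.
toy_csv = """loc,language,defective
120,java,0
45,python,0
300,java,0
80,c,0
510,java,1
"""

deficits = samples_needed_per_class(toy_csv, "defective")
print(deficits)  # {'0': 0, '1': 3}
```

<p>Here the minority class <code>1</code> would need three synthetic rows to reach parity with class <code>0</code>. Balancing to the majority class is only one possible target; other oversampling ratios are equally valid.</p>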
Raw data
{
"_id": null,
"home_page": "https://github.com/lakritidis/artsyn",
"name": "ARTSyn",
"maintainer": "Leonidas Akritidis",
"docs_url": null,
"requires_python": null,
"maintainer_email": "lakritidis@ihu.gr",
"keywords": "tabular data, tabular data synthesis, data engineering, imbalanced data, GAN, VAE, oversampling, machine learning, deep learning",
"author": "Leonidas Akritidis",
"author_email": "lakritidis@ihu.gr",
"download_url": "https://files.pythonhosted.org/packages/61/a1/9d403329ae920aec749b220a798637c4734f50eb967ee6152388c132b051/artsyn-0.5.2.tar.gz",
"platform": null,
"description": "<p>ARTSyn is a library containing models and algorithm implementations for synthesizing artificial tabular data. Such synthetic data are frequently useful in numerous classification and regression tasks under the presence of imbalanced datasets. Examples include fault/defect detection, intrusion detection, medical diagnoses, financial predictions, etc.</p><p>Most models in ARTSyn support conditional data generation, namely, generation of data instances that belong to a particular class. The models accept tabular data in CSV format and additional information about the column structure (e.g. columns with numeric/discrete values, class columns, etc.). Then, they are trained to generate additional samples either from a specific class, or without any condition. For the moment, ARTSyn emphasizes on Generative Adversarial Networks (GANs), but more models and algorithms will be supported in the future.</p><p><b>Licence:</b> Apache License, 2.0 (Apache-2.0)</p><p><b>Dependencies:</b>NumPy, Pandas, Matplotlib, Seaborn, joblib, Synthetic Data Vault (SDV), pyTorch, scikit-learn, xgboost, imblearn, Reversible Data Transforms (RDT), tqdm.</p><p><b>GitHub repository:</b> <a href=\"https://github.com/lakritidis/ARTSyn\">https://github.com/lakritidis/artsyn</a></p><p><b>Publications:</b><ul><li>L. Akritidis, P. Bozanis, \"A Clustering-Based Resampling Technique with Cluster Structure Analysis for Software Defect Detection in Imbalanced Datasets\", Information Sciences, vol. 674,pp. 120724, 2024.</li><li>L. Akritidis, A. Fevgas, M. Alamaniotis, P. Bozanis, \"Conditional Data Synthesis with Deep Generative Models for Imbalanced Dataset Oversampling\", In Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, pp. 444-451, 2023, 2023.</li><li>L. Akritidis, P. Bozanis, \"A Multi-Dimensional Survey on Learning from Imbalanced Data\", Chapter in International Conference on Information, Intelligence, Systems, and Applications, pp. 13-45, 2024.</li>\n",
"bugtrack_url": null,
"license": "Apache",
"summary": "Artificial Tabular Data Synthesizers",
"version": "0.5.2",
"project_urls": {
"Homepage": "https://github.com/lakritidis/artsyn"
},
"split_keywords": [
"tabular data",
" tabular data synthesis",
" data engineering",
" imbalanced data",
" gan",
" vae",
" oversampling",
" machine learning",
" deep learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "61a19d403329ae920aec749b220a798637c4734f50eb967ee6152388c132b051",
"md5": "68998392ae6b4dbf912b2dc82c227dc1",
"sha256": "b7d199a0e80c89dd703a751910ad4723e4a81a2d4c58d33688d53ee03b1d0e94"
},
"downloads": -1,
"filename": "artsyn-0.5.2.tar.gz",
"has_sig": false,
"md5_digest": "68998392ae6b4dbf912b2dc82c227dc1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 108675,
"upload_time": "2025-07-29T13:26:50",
"upload_time_iso_8601": "2025-07-29T13:26:50.985141Z",
"url": "https://files.pythonhosted.org/packages/61/a1/9d403329ae920aec749b220a798637c4734f50eb967ee6152388c132b051/artsyn-0.5.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-29 13:26:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lakritidis",
"github_project": "artsyn",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "artsyn"
}
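The "urls" entry above records a size and a sha256 digest for the source archive. As a minimal sketch, assuming the archive has already been downloaded to a local path, the following checks a file against those two fields using only the Python standard library. The helper names `sha256_of` and `matches_metadata` are hypothetical, and the demo uses a small stand-in file rather than the real artsyn-0.5.2.tar.gz.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path: Path, url_entry: dict) -> bool:
    """Compare a local file with the "size" and "digests.sha256" fields
    of one entry from the "urls" list (hypothetical helper)."""
    return (
        path.stat().st_size == url_entry["size"]
        and sha256_of(path) == url_entry["digests"]["sha256"]
    )


# Demo on a stand-in file; a real check would pass the downloaded sdist
# and the entry parsed from the metadata record.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.tar.gz"
    sample.write_bytes(b"hello")
    entry = {
        "size": 5,
        "digests": {"sha256": hashlib.sha256(b"hello").hexdigest()},
    }
    ok = matches_metadata(sample, entry)
    tampered = matches_metadata(sample, {**entry, "size": 6})

print(ok, tampered)
```

Checking the size first is a cheap early rejection; the sha256 comparison is the one that actually establishes that the bytes match the published record.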