ForestDiffusion

Name	ForestDiffusion
Version	1.0.5
home_page	https://github.com/SamsungSAILMontreal/ForestDiffusion
Summary	Generating and Imputing Tabular Data via Diffusion and Flow XGBoost Models
upload_time	2023-12-15 18:16:24
maintainer
docs_url	None
author	Alexia Jolicoeur-Martineau
requires_python
license
keywords	python ai xgboost gbt tree forest tabular diffusion flow
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. Our method is elegant, and we empirically show on various datasets that it i) generates highly realistic synthetic data whether the training dataset is clean or tainted by missing data and ii) generates diverse, plausible data imputations. It often outperforms deep-learning generation methods and can be trained in parallel on CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library and an R package <arXiv:2309.09968>.
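The summary describes replacing the neural network in conditional flow matching with gradient-boosted trees. To make that idea concrete, here is a minimal NumPy sketch under loud assumptions: it is not the ForestDiffusion library's API or the paper's exact recipe; XGBoost is swapped for a tiny hand-written booster of depth-1 trees (stumps) so the example needs nothing beyond NumPy; only continuous columns are handled; and the names `train_flow`, `generate`, `BoostedStumps`, `n_t` and `duplicate_k` are hypothetical. Each step t on a time grid gets its own regressors, trained to predict the velocity x1 - x0 from the interpolated point x_t = (1 - t) x0 + t x1, where x0 is Gaussian noise and x1 is a data row; sampling starts from noise and follows the learned velocity with Euler steps.

```python
import numpy as np


def fit_stump(X, r):
    """Fit a depth-1 tree: the (feature, threshold) split minimizing squared error on r."""
    mean = r.mean()
    best = (np.inf, 0, 0.0, mean, mean)  # fallback: constant prediction
    for j in range(X.shape[1]):
        # Candidate thresholds: a few quantiles of feature j (cheap, approximate).
        for thr in np.quantile(X[:, j], [0.1, 0.25, 0.5, 0.75, 0.9]):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]


class BoostedStumps:
    """Tiny gradient-boosted regressor with squared loss (hypothetical stand-in for XGBoost)."""

    def __init__(self, n_rounds=30, lr=0.2):
        self.n_rounds, self.lr = n_rounds, lr

    def fit(self, X, y):
        self.base = y.mean()
        pred = np.full(len(y), self.base)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Each round fits a stump to the current residuals (the negative gradient).
            j, thr, lv, rv = fit_stump(X, y - pred)
            pred += self.lr * np.where(X[:, j] <= thr, lv, rv)
            self.stumps.append((j, thr, lv, rv))
        return self

    def predict(self, X):
        out = np.full(len(X), self.base)
        for j, thr, lv, rv in self.stumps:
            out += self.lr * np.where(X[:, j] <= thr, lv, rv)
        return out


def train_flow(x1, n_t=10, duplicate_k=10, seed=0):
    """Fit one regressor per (time step, column) to the flow-matching velocity target."""
    rng = np.random.default_rng(seed)
    x1 = np.repeat(x1, duplicate_k, axis=0)  # pair each data row with k noise draws
    x0 = rng.standard_normal(x1.shape)       # source samples: standard Gaussian noise
    ts = np.linspace(0.0, 1.0, n_t, endpoint=False)
    models = []
    for t in ts:
        xt = (1.0 - t) * x0 + t * x1         # point on the straight path from x0 to x1
        v = x1 - x0                          # target velocity along that path
        models.append([BoostedStumps().fit(xt, v[:, d]) for d in range(x1.shape[1])])
    return ts, models


def generate(ts, models, n, seed=1):
    """Start from noise and integrate the learned velocity field with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, len(models[0])))
    h = 1.0 / len(ts)
    for per_column in models:
        v = np.column_stack([m.predict(x) for m in per_column])
        x = x + h * v
    return x
```

For example, calling `train_flow(data)` on a 200 x 2 NumPy array and then `generate(ts, models, n=500)` returns a 500 x 2 array of synthetic rows. Training a separate small model per time step keeps each regression problem simple, which is what makes tree ensembles a workable substitute for a single time-conditioned neural network in this sketch.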

Raw data

{
    "_id": null,
    "home_page": "https://github.com/SamsungSAILMontreal/ForestDiffusion",
    "name": "ForestDiffusion",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,AI,xgboost,GBT,tree,forest,tabular,diffusion,flow",
    "author": "Alexia Jolicoeur-Martineau",
    "author_email": "<alexia.jolicoeur-martineau@mail.mcgill.ca>",
    "download_url": "",
    "platform": null,
    "description": "Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. In addition to being elegant, we empirically show on various datasets that our method i) generates highly realistic synthetic data when the training dataset is either clean or tainted by missing data and ii) generates diverse plausible data imputations. Our method often outperforms deep-learning generation methods and can trained in parallel using CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library and an R package <arXiv:2309.09968>.\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Generating and Imputing Tabular Data via Diffusion and Flow XGBoost Models",
    "version": "1.0.5",
    "project_urls": {
        "Homepage": "https://github.com/SamsungSAILMontreal/ForestDiffusion"
    },
    "split_keywords": [
        "python",
        "ai",
        "xgboost",
        "gbt",
        "tree",
        "forest",
        "tabular",
        "diffusion",
        "flow"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c0efdc059566bc768906c876e600805dd8f3ce49798d87b2f37eba0c4065f9cf",
                "md5": "aed2783f0cea8670218904c4ee0e4603",
                "sha256": "43c81fd2b736b7b5a7a7174c23b156c64b3db5f3effb34db0b11bcfdeaa1866f"
            },
            "downloads": -1,
            "filename": "ForestDiffusion-1.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aed2783f0cea8670218904c4ee0e4603",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 13643,
            "upload_time": "2023-12-15T18:16:24",
            "upload_time_iso_8601": "2023-12-15T18:16:24.328139Z",
            "url": "https://files.pythonhosted.org/packages/c0/ef/dc059566bc768906c876e600805dd8f3ce49798d87b2f37eba0c4065f9cf/ForestDiffusion-1.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-15 18:16:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SamsungSAILMontreal",
    "github_project": "ForestDiffusion",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "forestdiffusion"
}
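The `digests` and `size` fields above describe the 13,643-byte wheel. A minimal sketch for checking a locally downloaded copy against them, using only Python's standard `hashlib`; the helper names `file_digests` and `verify_wheel` and the local file path are assumptions for illustration. PyPI's `blake2b_256` field corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# Values copied from the wheel entry in the raw data above.
EXPECTED = {
    "sha256": "43c81fd2b736b7b5a7a7174c23b156c64b3db5f3effb34db0b11bcfdeaa1866f",
    "md5": "aed2783f0cea8670218904c4ee0e4603",
    "blake2b_256": "c0efdc059566bc768906c876e600805dd8f3ce49798d87b2f37eba0c4065f9cf",
}
EXPECTED_SIZE = 13643


def file_digests(path):
    """Hash the file with each algorithm, streaming it in 64 KiB chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_wheel(path, expected=EXPECTED, expected_size=EXPECTED_SIZE):
    """Return True only if the size and every recorded digest match the file on disk."""
    if Path(path).stat().st_size != expected_size:
        return False
    actual = file_digests(path)
    return all(actual[name] == value for name, value in expected.items())
```

For an intact download, `verify_wheel("ForestDiffusion-1.0.5-py3-none-any.whl")` should return `True`.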
