cleandat

- Name: cleandat
- Version: 0.0.3
- Home page: https://github.com/tiadams/cleandat
- Summary: Python functions to facilitate the pre-processing of data for ML tasks in a clinical context.
- Author: Tim Adams (tim-adams@gmx.net)
- License: MIT License
- Upload time: 2024-01-17 15:21:21

# CleanDat
Python functions that facilitate the pre-processing of data for ML tasks, especially data from a clinical context.

---

Major functionalities include heuristic-based data cleaning and feature engineering, such as:
- Automatic detection of encoding strings (e.g. 1=m) and application of the detected encoding to the un-encoded values in the same column
- Automatic detection of date strings in different formats (e.g. 2019-01-01, 01/01/2019, January 2022) and conversion to a unified format
- Decomposition of date strings into date features (e.g. year, month, day, weekday)
- Heuristics for unifying different number formats, e.g. 1,000.00 vs. 1.000,00, or exponential notations such as 1e3 vs. 10x10^2
- Detection and replacement of inconsistent data values
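The encoding-string detection from the first bullet can be illustrated with a minimal sketch. It assumes that an encoding string has the form `<code>=<label>` (as in 1=m) and that "un-encoded" values are bare labels such as m, which are replaced by their code. The helpers `detect_encoding` and `apply_encoding` are hypothetical names for this illustration, not part of the cleandat API.

```python
import re

# Assumed format of an encoding string: "<code>=<label>", e.g. "1=m".
ENCODING_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*$")


def detect_encoding(values):
    """Hypothetical helper: build a label -> code mapping from cells such as '1=m'."""
    mapping = {}
    for value in values:
        match = ENCODING_RE.match(str(value))
        if match:
            code, label = match.groups()
            mapping[label] = code
    return mapping


def apply_encoding(values):
    """Hypothetical helper: replace encoding strings and bare labels by their code."""
    mapping = detect_encoding(values)
    encoded = []
    for value in values:
        text = str(value).strip()
        match = ENCODING_RE.match(text)
        if match:
            # The cell carries both code and label; keep only the code.
            encoded.append(match.group(1))
        else:
            # Encode a bare label; values without a known code are kept
            # (whitespace-stripped) as they are.
            encoded.append(mapping.get(text, text))
    return encoded


column = ["1=m", "2=f", "m", "f", "1", "x"]
print(apply_encoding(column))  # ['1', '2', '1', '2', '1', 'x']
```

Values such as x that match no detected label are left alone rather than guessed, which keeps the heuristic conservative.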
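Date detection, conversion to a unified format, and decomposition into features can be sketched with the standard library's `datetime.strptime`. The list of candidate formats, the month-first reading of strings like 01/01/2019, and the helper names `parse_date`, `unify_date`, and `decompose_date` are assumptions for illustration; cleandat's own heuristics may differ.

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption, not cleandat's list).
# Ambiguous inputs such as 01/02/2019 are read month-first here.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%B %Y", "%b %Y"]


def parse_date(text):
    """Hypothetical helper: return a datetime for the first matching format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def unify_date(text):
    """Convert a recognized date string to ISO 8601 (YYYY-MM-DD), else None."""
    parsed = parse_date(text)
    return parsed.date().isoformat() if parsed else None


def decompose_date(text):
    """Split a recognized date into numeric features, else None."""
    parsed = parse_date(text)
    if parsed is None:
        return None
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "weekday": parsed.weekday(),  # Monday == 0
    }


for raw in ["2019-01-01", "01/01/2019", "January 2022"]:
    print(raw, "->", unify_date(raw), decompose_date(raw))
```

Strings that give only a month and year, such as January 2022, resolve to the first day of that month in this sketch, so their day and weekday features carry no real information.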
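One possible heuristic for the number formats in the list above decides which separator is the decimal point by checking which one appears last. The sketch below uses a hypothetical helper, `normalize_number`, and is an assumption for illustration, not cleandat's implementation. Inputs like 1.000 (a single dot followed by three digits) remain ambiguous and are read as 1.0 here.

```python
import re

# Assumed pattern for scientific notation written as "<mantissa>x10^<exponent>".
SCI_RE = re.compile(r"([-+]?[\d.,]+)[xX*]10\^([-+]?\d+)")
# Commas followed by groups of exactly three digits, e.g. "1,000" or "12,345,678".
THOUSANDS_COMMA_RE = re.compile(r"[-+]?\d{1,3}(,\d{3})+")


def normalize_number(text):
    """Hypothetical heuristic: parse a number string in one of several notations."""
    s = text.strip().replace(" ", "")
    match = SCI_RE.fullmatch(s)
    if match:
        mantissa = normalize_number(match.group(1))
        return mantissa * 10 ** int(match.group(2))
    if "," in s and "." in s:
        # Both separators present: the one appearing last is the decimal separator.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # Only commas: thousands separators if grouped in threes, else a decimal comma.
        if THOUSANDS_COMMA_RE.fullmatch(s):
            s = s.replace(",", "")
        else:
            s = s.replace(",", ".")
    elif s.count(".") > 1:
        # Several dots, e.g. "1.000.000", can only be thousands separators.
        s = s.replace(".", "")
    # float() also understands exponent notation such as "1e3".
    return float(s)


for raw in ["1,000.00", "1.000,00", "1e3", "10x10^2", "3,5"]:
    print(raw, "->", normalize_number(raw))
```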
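The last bullet does not define what counts as an inconsistent value. One simple interpretation, assumed here, treats a value as inconsistent when its type differs from the majority type of its column, such as n/a in an otherwise numeric column. The hypothetical helpers `value_kind` and `replace_inconsistent` sketch that idea; cleandat may use different rules.

```python
from collections import Counter


def value_kind(value):
    """Classify a raw cell as 'missing', 'number' or 'text'."""
    text = str(value).strip()
    if text == "":
        return "missing"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def replace_inconsistent(values, replacement=None):
    """Hypothetical helper: replace cells whose kind differs from the column's majority kind."""
    kinds = [value_kind(v) for v in values]
    # Missing cells do not vote and are never replaced.
    counts = Counter(kind for kind in kinds if kind != "missing")
    if not counts:
        return list(values)
    majority, _ = counts.most_common(1)[0]
    return [
        v if kind in (majority, "missing") else replacement
        for v, kind in zip(values, kinds)
    ]


print(replace_inconsistent(["5", "7", "n/a", "6", ""]))  # ['5', '7', None, '6', '']
```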

# Setup

Install via pip:

    pip install cleandat

            
