clustered-imputation


| Field | Value |
| --- | --- |
| Name | clustered-imputation |
| Version | 1.0.2 |
| Summary | Adding correlation to handle MNAR |
| Upload time | 2025-02-22 08:00:57 |
| Author | MRINAL KANGSA BANIK |
| Keywords | python, imputation, mnar |
            
# Clustering Imputation

## Installation

To install the package, run:

```bash
pip install clustered-imputation
```

## Usage



```python
from clustered_imputation import clusterImputer

df = ...  # Load your dataset as a dataframe

# Arguments, in order: data, basic_imputation, num_imputation, corr_threshold, max_iter
x = clusterImputer(df, "mice", "mean", 0.6, 10)

x.impute()
```

# About the Package

## Parameters of the `clusterImputer` class

* data --> The dataframe to impute.

* basic_imputation : Literal["mice", "sice", "em"] --> The imputation method applied within each cluster.

* num_imputation : Literal["mean", "median"] --> How numeric columns are initially filled before the correlation matrix is computed.

* corr_threshold : 0.6 --> Correlation threshold used to group features into clusters.

* max_iter : 10 --> Maximum number of iterations for MICE and SICE.
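
How `num_imputation` and `corr_threshold` might work together can be pictured with a short sketch. The code below is not taken from the package. It assumes that the initial fill is a per-column mean or median, that the correlation matrix holds absolute Pearson correlations, and that clusters are the connected components of the graph linking columns whose correlation reaches the threshold. The helper name `correlation_clusters` is hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_clusters(df, num_imputation="mean", corr_threshold=0.6):
    """Group numeric columns whose absolute correlation reaches the threshold.

    Hypothetical helper for illustration; not part of clustered_imputation.
    """
    numeric = df.select_dtypes(include="number")
    # Initial fill so that every row contributes to the correlation matrix.
    fill = numeric.mean() if num_imputation == "mean" else numeric.median()
    corr = numeric.fillna(fill).corr().abs()

    # Connected components of the graph "corr >= threshold" (depth-first search).
    clusters, seen = [], set()
    for start in corr.columns:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            col = stack.pop()
            component.append(col)
            for other in corr.columns:
                if other not in seen and corr.loc[col, other] >= corr_threshold:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


rng = np.random.default_rng(0)
a = rng.normal(size=200)
c = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=200),   # strongly correlated with "a"
    "c": c,
    "d": -c + 0.1 * rng.normal(size=200),  # strongly anti-correlated with "c"
})
toy.loc[toy.index[::10], "b"] = np.nan     # remove every tenth value of "b"

print(correlation_clusters(toy))
```

On this toy data the sketch groups `a` with `b` and `c` with `d`. Using connected components means a column can join a cluster through a chain of pairwise correlations; the package may group columns differently.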

## Problem Statement



Traditional imputation techniques face several challenges:

* High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets, so more efficient techniques are needed for such cases.

* Temporal Dependencies: Current methods often overlook temporal correlations in the data.

## Need for a New Algorithm

* Non-Random Missingness: Few methods address non-random missing patterns; improvements here could increase accuracy in real-world applications. We aim to develop an imputation method that accounts for data that is "Missing Not at Random" (MNAR).

* Computational Complexity: MICE and EM are computationally expensive on high-dimensional data. Our approach aims to reduce the time complexity.
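
The potential saving can be estimated with back-of-the-envelope arithmetic. The sketch below rests on assumptions that the package does not state: one chained-equations sweep fits one least-squares regression per column, using the other columns of its group as predictors, and a fit with `q` predictors on `n_rows` rows costs roughly `n_rows * q**2` operations. The function name `sweep_cost` is hypothetical.

```python
def sweep_cost(n_rows, cluster_sizes):
    """Rough operation count for one chained-equations sweep.

    Assumption: each column is regressed on the other columns of its cluster,
    and a least-squares fit with q predictors costs about n_rows * q**2.
    """
    return sum(size * n_rows * (size - 1) ** 2 for size in cluster_sizes)


n_rows, n_cols = 10_000, 100
full = sweep_cost(n_rows, [n_cols])        # every column predicted from all 99 others
clustered = sweep_cost(n_rows, [10] * 10)  # ten clusters of ten columns each

print(full, clustered, round(full / clustered, 1))  # 9801000000 81000000 121.0
```

Under these assumptions the clustered sweep is about 121 times cheaper. The real gain depends on how uneven the cluster sizes are and on the cost of the estimator actually used.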



## Philosophy of Our Solution: Clustered MICE/EM



We propose a clustering-based approach:

* Identify correlations between features and group correlated features into clusters.

* Apply MICE/EM within each cluster rather than on the entire dataset.

* Combine the results to reconstruct the dataset.

By leveraging feature correlations, this method is intended to handle MNAR data.
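
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: the cluster assignment is taken as given, all columns are numeric, every column belongs to exactly one cluster, and a deterministic chained least-squares imputer stands in for MICE, SICE, or EM. The names `chained_regression_impute` and `clustered_impute` are hypothetical.

```python
import numpy as np
import pandas as pd


def chained_regression_impute(block, max_iter=10):
    """MICE-style single imputation of one cluster using least squares."""
    values = block.to_numpy(dtype=float)
    missing = np.isnan(values)
    # Start from column means, then refine each column from the others.
    filled = np.where(missing, np.nanmean(values, axis=0), values)
    if values.shape[1] >= 2:
        for _ in range(max_iter):
            for j in range(values.shape[1]):
                rows = missing[:, j]
                if not rows.any() or rows.all():
                    continue  # nothing to impute, or nothing to learn from
                others = np.delete(filled, j, axis=1)
                design = np.column_stack([np.ones(len(others)), others])
                coef, *_ = np.linalg.lstsq(
                    design[~rows], values[~rows, j], rcond=None
                )
                filled[rows, j] = design[rows] @ coef
    return pd.DataFrame(filled, index=block.index, columns=block.columns)


def clustered_impute(df, clusters, max_iter=10):
    """Impute each cluster of columns separately, then reassemble the frame."""
    parts = [chained_regression_impute(df[cols], max_iter) for cols in clusters]
    return pd.concat(parts, axis=1)[list(df.columns)]


rng = np.random.default_rng(1)
a = rng.normal(size=300)
c = rng.normal(size=300)
truth = pd.DataFrame({
    "a": a,
    "b": 2 * a + 0.1 * rng.normal(size=300),
    "c": c,
    "d": c + 0.1 * rng.normal(size=300),
})
data = truth.copy()
data.loc[data.index[::7], "b"] = np.nan
data.loc[data.index[::5], "d"] = np.nan

clusters = [["a", "b"], ["c", "d"]]  # assumed to come from a correlation threshold
result = clustered_impute(data, clusters)

print(result.isna().sum().sum())            # remaining missing values
print((result - truth).abs().max().max())   # largest imputation error
```

Because each cluster is imputed independently, the per-cluster calls could also run in parallel; the final `pd.concat` restores the original column order.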



For further details, refer to this [presentation](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing).
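
As a small illustration of why feature correlations can help under MNAR, the simulation below hides every value of `y` above a cutoff and compares two fills: the observed mean, and a regression on a correlated, fully observed feature `x`. The data, the cutoff, and the single-predictor regression are assumptions made for the example; they are not taken from the package.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
x = rng.normal(size=n)                   # fully observed feature
y = x + rng.normal(scale=0.5, size=n)    # correlated feature with missing values
missing = y > 1.0                        # MNAR: large values of y are never recorded

observed_mean = y[~missing].mean()       # the value mean imputation would fill in

# Regress y on x using the observed rows, then predict the missing rows.
slope, intercept = np.polyfit(x[~missing], y[~missing], deg=1)
y_regression = np.where(missing, slope * x + intercept, y)

print("true mean              ", round(y.mean(), 3))
print("mean-imputed mean      ", round(observed_mean, 3))
print("regression-imputed mean", round(y_regression.mean(), 3))
```

In this setup the regression-based fill lands closer to the true mean than mean imputation does, because `x` carries information about the hidden values. The bias shrinks but does not vanish, since the regression itself is fitted only on the rows that survived the cutoff.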

## Contributing



Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "clustered-imputation",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "python, imputation, MNAR",
    "author": "MRINAL KANGSA BANIK",
    "author_email": "<manukbanik30@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e5/4a/2e499957f699e8ef5133c656a4e7cd172ec464b666bbb30c021fc6638f85/clustered_imputation-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Adding correlation to handle MNAR",
    "version": "1.0.2",
    "project_urls": null,
    "split_keywords": [
        "python",
        " imputation",
        " mnar"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "06a154aa5207a855f53cb2866458ccae312aa25b1f162a80b28a21ed99ab363c",
                "md5": "882a0999cd341a180028ec32632f37b2",
                "sha256": "0597753249dfa1740789c615b7a6ae904cc67016d33bde115109cdb2be3ee68c"
            },
            "downloads": -1,
            "filename": "clustered_imputation-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "882a0999cd341a180028ec32632f37b2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8263,
            "upload_time": "2025-02-22T08:00:54",
            "upload_time_iso_8601": "2025-02-22T08:00:54.530748Z",
            "url": "https://files.pythonhosted.org/packages/06/a1/54aa5207a855f53cb2866458ccae312aa25b1f162a80b28a21ed99ab363c/clustered_imputation-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e54a2e499957f699e8ef5133c656a4e7cd172ec464b666bbb30c021fc6638f85",
                "md5": "12a87d6d33e0c1dd66b8eba414419ab2",
                "sha256": "5d7679489779ad7446065f846ec277bfc30f89d4ccfbc52eb450618eb3a68f7a"
            },
            "downloads": -1,
            "filename": "clustered_imputation-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "12a87d6d33e0c1dd66b8eba414419ab2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7293,
            "upload_time": "2025-02-22T08:00:57",
            "upload_time_iso_8601": "2025-02-22T08:00:57.500236Z",
            "url": "https://files.pythonhosted.org/packages/e5/4a/2e499957f699e8ef5133c656a4e7cd172ec464b666bbb30c021fc6638f85/clustered_imputation-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-22 08:00:57",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "clustered-imputation"
}
        