KNearestNeighborSampling


Name: KNearestNeighborSampling
Version: 0.0.2
Home page: https://github.com/snimale/KNNSampler
Summary: Dataset size reduction using KNN Sampling algorithm
Upload time: 2023-12-31 19:02:47
Author: Soham S. Nimale
License: MIT
Keywords: python, k-nn, sampling, size reduction, optimization, knn sampler
Docs URL: None
Requirements: No requirements were recorded.
            
# KNNSampler

KNNSampler implements the k-NN sampling algorithm described in this [research paper](https://ieeexplore.ieee.org/document/8990391). It helps developers reduce the size of their datasets by sampling "representatives" from them. These representatives are found using the NN_SCORES and MNN_SCORES discussed in the paper. KNNSampler works in both the dynamic and the static way, as discussed by the authors in the paper.

### Setup

- Python 3.10

- Requirements: numpy, pandas, sklearn (published on PyPI as `scikit-learn`)

### Enhancements

- In the algorithm suggested in the research paper, MNN_SCORES are recalculated for the entire dataset after every iteration, which leads to redundant calculations. This package calculates MNN_SCORES only for the rows shortlisted using NN_SCORES, producing the same result as the original algorithm with less work.

- The algorithm given in the research paper contains an error in the line `train sample = train sample ∪ X[index]`. This package replaces `X[index]` with `X[train_index]` to get the correct outcome.

- The `Until` loop condition of the algorithm in the research paper, `(NN − score(X) = 0) ∨ (| train sample | ≤ k)`, also contains an error: the second condition must be `|X| <= k`. This package uses the corrected condition.

- The paper does not provide values of t, m, and s for the (t,m,s)-nets. Users are free to choose their own t, m, and s values or to use the default values provided.
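
The first and third items above can be illustrated with a minimal sketch. It is not the package's code: it uses only the Python standard library, and it assumes that the NN score of a row counts how many other rows list it among their k nearest neighbours, that the MNN score counts how many of those relationships are mutual, and that the shortlist consists of the rows with the highest NN score. The paper's exact definitions may differ, and all function names here are hypothetical.

```python
import math


def k_nearest(points, k):
    """Map each row index to the set of its k nearest other rows (Euclidean)."""
    neighbours = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Ties in distance are broken by the smaller row index.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        neighbours[i] = set(others[:k])
    return neighbours


def nn_scores(neighbours):
    """Assumed NN score: how many rows list row i among their k nearest."""
    scores = {i: 0 for i in neighbours}
    for nbrs in neighbours.values():
        for j in nbrs:
            scores[j] += 1
    return scores


def mnn_score(neighbours, i):
    """Assumed MNN score: how many of row i's neighbours also list row i."""
    return sum(1 for j in neighbours[i] if i in neighbours[j])


def pick_representative(points, k):
    """Pick one representative, scoring MNN for the shortlist only."""
    neighbours = k_nearest(points, k)
    nn = nn_scores(neighbours)
    best = max(nn.values())
    shortlist = [i for i, s in nn.items() if s == best]
    # MNN scores are computed for shortlisted rows, not for every row.
    # Remaining ties go to the smaller row index.
    return max(shortlist, key=lambda i: (mnn_score(neighbours, i), -i))


def should_stop(nn, remaining, k):
    """Corrected stop test, read here as: no row has a positive NN score,
    or at most k rows remain in X (the |X| <= k condition)."""
    return max(nn.values(), default=0) == 0 or remaining <= k
```

For the four points `[(0, 0), (1, 0), (2, 0), (10, 0)]` with `k = 2`, rows 1 and 2 share the highest NN score of 3, so only those two rows have their MNN score evaluated, and `pick_representative` returns row 1.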

### Important

- The dataset passed to the sample() function must **NOT CONTAIN A COLUMN NAMED "idx"**.

- Warnings produced by the "drop()" function of pandas.DataFrame can be **IGNORED**, since they were added for debugging purposes.
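
One way to follow both points is sketched below, using only the Python standard library. The helper names `idx_rename_mapping` and `call_quietly` are hypothetical and are not part of this package; the exact signature of sample() is not documented here, so the sketch wraps an arbitrary callable instead.

```python
import warnings


def idx_rename_mapping(columns, reserved="idx"):
    """Return {reserved: new_name} if `reserved` is among the column names,
    otherwise an empty mapping. The new name never clashes with `columns`."""
    if reserved not in columns:
        return {}
    candidate = reserved + "_original"
    while candidate in columns:
        candidate += "_"
    return {reserved: candidate}


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) with all warnings suppressed for that call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)
```

With pandas, the mapping can be passed to `DataFrame.rename(columns=...)` before sampling, and the sample() call can then be wrapped in `call_quietly` so that the debug warnings do not clutter the output.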

### Navigate

- [Package](https://github.com/snimale/KNNSampler/tree/dev/KNearestNeighborSampling)

- [Example Use Case](https://github.com/snimale/KNNSampler/tree/dev/others/example-usecase)

- [Example Results](https://github.com/snimale/KNNSampler/tree/dev/others/sampled_data_plotted_results)

- [Scratch Code](https://github.com/snimale/KNNSampler/tree/dev/others/knn-sampling-scratch-code)



### Acknowledgement

This package implements, and adds optimizations to, the original research work done by Bheekya Dharamsotu, K. Swarupa Rani, Salman Abdul Moiz, and C. Raghavendra Rao in the research paper:

B. Dharamsotu, K. S. Rani, S. Abdul Moiz and C. R. Rao, "k-NN Sampling for Visualization of Dynamic Data Using LION-tSNE," 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), Hyderabad, India, 2019, pp. 63-72, doi: 10.1109/HiPC.2019.00019.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/snimale/KNNSampler",
    "name": "KNearestNeighborSampling",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,K-NN,Sampling,Size reduction,Optimization,knn sampler",
    "author": "Soham S. Nimale",
    "author_email": "soham.sachin.nimale@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d0/5f/2f2eed1cd8b8355110ce8a3d04575ce55009185356a2f7d3c03e215ce99f/KNearestNeighborSampling-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Dataset size reduction using KNN Sampling algorithm",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/snimale/KNNSampler"
    },
    "split_keywords": [
        "python",
        "k-nn",
        "sampling",
        "size reduction",
        "optimization",
        "knn sampler"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "821662f5955f5dedc98ee5296ce50bf3ded1f5e910cf6637972c5c8d93ca58af",
                "md5": "11232ab95635c4caa14c611e5adb5a3b",
                "sha256": "46accfe7825d6ed8b85def5c533bb5da1a2b4fe5cdb46252624e97dd42681382"
            },
            "downloads": -1,
            "filename": "KNearestNeighborSampling-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "11232ab95635c4caa14c611e5adb5a3b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6604,
            "upload_time": "2023-12-31T19:02:45",
            "upload_time_iso_8601": "2023-12-31T19:02:45.358185Z",
            "url": "https://files.pythonhosted.org/packages/82/16/62f5955f5dedc98ee5296ce50bf3ded1f5e910cf6637972c5c8d93ca58af/KNearestNeighborSampling-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d05f2f2eed1cd8b8355110ce8a3d04575ce55009185356a2f7d3c03e215ce99f",
                "md5": "17eab86d1efefbce1b44fdd78b41bdd5",
                "sha256": "61042b02f59ad4108750bf4afcbf85b40bcba38f82c3c5f37f10f25c927353ea"
            },
            "downloads": -1,
            "filename": "KNearestNeighborSampling-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "17eab86d1efefbce1b44fdd78b41bdd5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6260,
            "upload_time": "2023-12-31T19:02:47",
            "upload_time_iso_8601": "2023-12-31T19:02:47.309897Z",
            "url": "https://files.pythonhosted.org/packages/d0/5f/2f2eed1cd8b8355110ce8a3d04575ce55009185356a2f7d3c03e215ce99f/KNearestNeighborSampling-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-31 19:02:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "snimale",
    "github_project": "KNNSampler",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "knearestneighborsampling"
}
        