# distfeatselect
distfeatselect: distributed feature selection
## Authors and Contributors
[Dzeneta Kudumovic](https://github.com/dkudumovic1), [Dr Aida Brankovic](https://www.linkedin.com/in/aida-brankovic-phd-it-msc-ee-4616a038), [Prof Luigi Piroddi](https://www.deib.polimi.it/eng/people/details/318548#:~:text=Born%20in%20London%20in%201966,D)
## Overview
distfeatselect is a Python package implementing a Distributed Feature Selection algorithm based on vertical data partitioning and distributed search. The features are divided into subsets, and each subset is assigned to a dedicated processor that runs a local search. The searches therefore proceed in parallel, which makes the approach scalable to large datasets. Because each processor explores a much smaller search space, overall computation time is reduced. Moreover, the algorithm tends to produce simple model structures, which improves interpretability and robustness.
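The partition-then-search idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from distfeatselect: the local search is a plain greedy forward selection scored by least-squares R² (a swapped-in technique, not the randomized model-population search of [1]), threads stand in for dedicated processors, and the results are merged as a simple union. The helper names `partition_features`, `local_forward_selection` and `distributed_selection` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def partition_features(n_features, n_workers, seed=0):
    """Vertical partitioning: randomly split feature indices into disjoint subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_features), n_workers)


def local_forward_selection(X, y, candidates, max_features=3, tol=0.01):
    """Greedy forward selection by R^2, restricted to one worker's feature subset."""
    selected, best_r2 = [], 0.0
    y_c = y - y.mean()
    ss_tot = float(y_c @ y_c)
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        scores = []
        for j in remaining:
            # Fit an intercept plus the already-selected features plus candidate j
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            scores.append(1.0 - float(resid @ resid) / ss_tot)
        k = int(np.argmax(scores))
        if scores[k] - best_r2 < tol:  # stop when the best candidate barely helps
            break
        best_r2 = scores[k]
        selected.append(remaining.pop(k))
    return selected


def distributed_selection(X, y, n_workers=4, max_features=3):
    """Run one local search per feature subset in parallel and merge the results."""
    subsets = partition_features(X.shape[1], n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda s: local_forward_selection(X, y, s, max_features), subsets))
    # Merge step: union of the features kept by each local search
    return sorted({int(j) for local in results for j in local})
```

With data generated as `y = 3 * X[:, 2] - 2 * X[:, 11] + noise`, `distributed_selection(X, y)` should return a list containing 2 and 11. A worker that sees neither informative feature may still keep a spurious one, which is the kind of weakness the published scheme addresses with partial information sharing between processors [2].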
The DFS (Distributed Feature Selection) algorithm is implemented as a scikit-learn-compatible transformer that follows the scikit-learn API conventions, so it can be used directly in scikit-learn workflows such as pipelines.
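What "scikit-learn-compatible transformer" means in practice can be shown with a toy selector. `CorrelationSelector` below is hypothetical: it ranks features by absolute Pearson correlation with the target and is not the package's DFS class, whose actual name and parameters are documented in the DFS API docs. Any estimator exposing `fit`, `transform` and `get_support` in this way can occupy the `select` step of a `Pipeline`, which is the slot a DFS selector is meant to fill.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CorrelationSelector(TransformerMixin, BaseEstimator):
    """Hypothetical selector keeping the k features most correlated with y."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        Xc, yc = X - X.mean(axis=0), y - y.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
        # Boolean mask over the input columns, as scikit-learn selectors use
        self.support_ = np.zeros(X.shape[1], dtype=bool)
        self.support_[np.argsort(scores)[::-1][: self.k]] = True
        self.n_features_in_ = X.shape[1]
        return self

    def get_support(self, indices=False):
        check_is_fitted(self, "support_")
        return np.flatnonzero(self.support_) if indices else self.support_

    def transform(self, X):
        check_is_fitted(self, "support_")
        return check_array(X)[:, self.support_]


pipe = Pipeline([
    ("select", CorrelationSelector(k=2)),  # a feature selector would go here
    ("model", LinearRegression()),
])
```

After `pipe.fit(X, y)`, the selected column indices are available through `pipe.named_steps["select"].get_support(indices=True)`, and `pipe.predict` applies the same selection before the model.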
To get started with distfeatselect, explore our provided notebooks, which offer hands-on examples and demonstrations of the package's functionality.
### Requirements
distfeatselect depends on the following libraries:
- numpy
- pandas
- statsmodels
- scikit-learn
- dcor
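dcor is a library for distance correlation, the dependence measure named in the title of reference [3]. As an illustration of what that measure captures, here is a NumPy-only sketch of the (biased, V-statistic) sample distance correlation between two 1-D samples. It is a reimplementation for explanation only, not code from distfeatselect; in practice `dcor.distance_correlation(x, y)` computes this quantity.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise |v_i - v_j| distances with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (value in [0, 1])."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dcov2_xx = (a * a).mean()  # squared distance variance of x
    dcov2_yy = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2_xy, 0.0) / denom))


x = np.linspace(-1.0, 1.0, 201)
pearson = np.corrcoef(x, x ** 2)[0, 1]
dcor_value = distance_correlation(x, x ** 2)
```

For the symmetric parabola `y = x ** 2`, the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which is why a distance-correlation criterion can detect nonlinear relevance that linear correlation misses.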
## Installation
distfeatselect is on PyPI and can be installed with pip:
```bash
pip install distfeatselect
```
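To confirm which release pip installed, the standard-library `importlib.metadata` module (Python 3.8+) can be queried by distribution name. The helper `installed_version` is hypothetical; the distribution name `distfeatselect` is the one used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "distfeatselect") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"distfeatselect {v}" if v else "distfeatselect is not installed")
```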
## Documentation
- [Quick start notebook](https://dkudumovic1.github.io/distfeatselect/quick_start.html)
- [DFS API Docs](https://dkudumovic1.github.io/distfeatselect/dfs.html)
- [RFS API Docs](https://dkudumovic1.github.io/distfeatselect/rfs.html)
## References
The algorithms in this package are based on the research papers [1], [2] and [3], and the implementation follows the methods described in them.
[1] Brankovic, A., Falsone, A., Prandini, M., Piroddi, L. (2018). [A feature selection and classification algorithm based on randomized extraction of model populations](https://ieeexplore.ieee.org/document/7890437)
[2] Brankovic, A., Piroddi, L. (2019). [A distributed feature selection scheme with partial information sharing](https://link.springer.com/article/10.1007/s10994-019-05809-y)
[3] Brankovic, A., Hosseini, M., Piroddi, L. (2018). [A Distributed Feature Selection Algorithm Based on Distance Correlation with an Application to Microarrays](https://ieeexplore.ieee.org/abstract/document/8356595)
## License
This project is licensed under the [MIT License](LICENSE).
Raw data
{
"_id": null,
"home_page": "https://github.com/dkudumovic1/distfeatselect",
"name": "distfeatselect",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "feature selection, distributed feature selection, randomised feature selection",
"author": "Dzeneta Kudumovic",
"author_email": "dzeneta.kudum@gmail.com",
"download_url": "https://github.com/dkudumovic1/distfeatselect/archive/refs/tags/0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python package implementing distributed feature selection algorithm.",
"version": "0.1.4",
"project_urls": {
"Documentation": "https://dkudumovic1.github.io/distfeatselect/",
"Download": "https://github.com/dkudumovic1/distfeatselect/archive/refs/tags/0.1.4.tar.gz",
"Homepage": "https://github.com/dkudumovic1/distfeatselect"
},
"split_keywords": [
"feature selection",
" distributed feature selection",
" randomised feature selection"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c4165a3f1f32fbd144558e51a0a6adbc339f3850dbbcc4c116d383910055a250",
"md5": "b10cb7479ed58c0c2c7f2c246739e5e7",
"sha256": "5ef496b8777a787e0d261e94b28d403511638816888edf91e807d08bd3886f9f"
},
"downloads": -1,
"filename": "distfeatselect-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b10cb7479ed58c0c2c7f2c246739e5e7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19089,
"upload_time": "2024-11-06T22:54:46",
"upload_time_iso_8601": "2024-11-06T22:54:46.201925Z",
"url": "https://files.pythonhosted.org/packages/c4/16/5a3f1f32fbd144558e51a0a6adbc339f3850dbbcc4c116d383910055a250/distfeatselect-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-06 22:54:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dkudumovic1",
"github_project": "distfeatselect",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "distfeatselect"
}