# RACEkNN: A Hybrid Rule-Guided k-Nearest Neighbor Classifier
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
This repository contains the official Python implementation for the paper: **"RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm"**.
RACEkNN is a hybrid classifier that integrates kNN with **RACER** (Rule Aggregating ClassifiEr), a novel rule-based classifier. RACER generates generalized rules to identify the most relevant subset of the training data for a given test instance. This pre-selection significantly reduces the search space for kNN, leading to faster execution times and improved classification accuracy.
---
## 📖 About the Paper
**Title:** RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm
**Journal:** *Knowledge-Based Systems* (Volume 301), 2024
**DOI:** [10.1016/j.knosys.2024.112357](https://doi.org/10.1016/j.knosys.2024.112357)
**Authors:** Mahdiyeh Ebrahimi, Alireza Basiri
### Abstract
> Classification is a fundamental task in data mining, involving the prediction of class labels for new data. k-Nearest Neighbors (kNN), a lazy learning algorithm, is sensitive to data distribution and suffers from high computational costs due to the requirement of finding the closest neighbors across the entire training set. Recent advancements in classification techniques have led to the development of hybrid algorithms that combine the strengths of multiple methods to address specific limitations. In response to the inherent execution time constraint of kNN and the impact of data distribution on its performance, we propose RACEkNN (Rule Aggregating ClassifiEr kNN), a hybrid solution that integrates kNN with RACER, a newly devised rule-based classifier. RACER improves predictive capability and decreases kNN’s runtime by creating more generalized rules, each encompassing a subset of training instances with similar characteristics. During prediction, a test instance is compared to these rules based on its features. By selecting the rule with the closest match, the test instance identifies the most relevant subset of training data for kNN. This significantly reduces the data kNN needs to consider, leading to faster execution times and enhanced prediction accuracy. Empirical findings demonstrate that RACEkNN outperforms kNN in terms of both runtime and accuracy. Additionally, it surpasses RACER, four well-known classifiers, and certain kNN-based methods in terms of accuracy.
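The prediction pipeline described in the abstract — match a test instance against the induced rules, take the training subset covered by the best-matching rule, then run kNN only on that subset — can be illustrated with a minimal sketch. Note this is *not* the paper's RACER rule-induction algorithm: the rules here are hypothetical hand-written patterns (a dict of feature-value conditions, missing features acting as wildcards), used only to show how rule-guided pre-selection shrinks the kNN search space.

```python
# Conceptual sketch of rule-guided kNN pre-selection (illustration only,
# not the RACER algorithm). Rules are hypothetical feature->value patterns.

def rule_match_score(rule, instance):
    """Count how many of the rule's conditions the instance satisfies."""
    return sum(1 for feat, val in rule.items() if instance.get(feat) == val)

def select_subset(rules, covered, instance):
    """Pick the best-matching rule; return the training rows it covers."""
    best = max(range(len(rules)), key=lambda i: rule_match_score(rules[i], instance))
    return covered[best]

def knn_vote(subset, instance, k):
    """Plain kNN (Hamming distance over categorical features) on the subset."""
    def dist(row):
        return sum(row[f] != instance[f] for f in instance)
    nearest = sorted(subset, key=dist)[:k]
    labels = [row["class"] for row in nearest]
    return max(set(labels), key=labels.count)
```

Because `knn_vote` only ever sees the rows covered by one rule, the neighbor search cost scales with the rule's coverage rather than with the full training set — the source of the runtime gain the abstract describes.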
---
## 🚀 Installation
To get started, clone the repository and install the required dependencies.
1. **Clone the repository:**
```bash
git clone https://github.com/mahdiyehebrahimi/RACEkNN.git
cd RACEkNN
```
2. **Install dependencies:**
It is recommended to use a virtual environment.
```bash
pip install -r requirements.txt
```
---
## 💡 Usage Example
You can use `RACEKNNClassifier` just like any other scikit-learn classifier. Here is a simple example using the "Car Evaluation" dataset included in the `Datasets/` directory.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from raceknn import RACEKNNClassifier
# Load data
df = pd.read_csv(
"Datasets/car_evaluation.data",
names=["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
)
X = df.drop(columns=['class'])
y = df['class']
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Initialize and fit the classifier
# alpha: RACER fitness trade-off (accuracy vs. coverage)
# k: Number of neighbors for the final kNN vote
clf = RACEKNNClassifier(alpha=0.9, k=5)
clf.fit(X_train, y_train)
# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of RACEKNN Classifier: {accuracy:.4f}")
```
For more examples, including k-fold cross-validation, see `example.py`.
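Since `RACEKNNClassifier` follows the scikit-learn estimator interface, the standard cross-validation utilities apply directly. The sketch below uses scikit-learn's `KNeighborsClassifier` on the bundled Iris dataset as a stand-in so it runs without this package installed; `RACEKNNClassifier(alpha=0.9, k=5)` can be dropped in the same way.

```python
# Generic k-fold cross-validation pattern for any scikit-learn-compatible
# estimator. KNeighborsClassifier is a stand-in; swap in RACEKNNClassifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Stratified folds preserve class proportions, as train_test_split(stratify=y)
# does in the example above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv)
print(f"Mean accuracy over 5 folds: {scores.mean():.4f}")
```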
---
## 🎓 Citing This Work
If you use RACEkNN in your research, please cite our paper.
### BibTeX
```bibtex
@article{EBRAHIMI2024112357,
title = {RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm},
journal = {Knowledge-Based Systems},
volume = {301},
pages = {112357},
year = {2024},
issn = {0950-7051},
doi = {10.1016/j.knosys.2024.112357},
author = {Mahdiyeh Ebrahimi and Alireza Basiri}
}
```
---
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.