# VishuML
A machine learning library implementing fundamental algorithms from scratch in Python. It provides educational implementations of popular ML algorithms without relying on external frameworks such as scikit-learn.
## Features
**🎯 sklearn-compatible API** - Works seamlessly with pandas DataFrames and CSV data!
VishuML implements the following machine learning algorithms:
### Supervised Learning
- **Linear Regression** - For continuous target prediction
- **Logistic Regression** - For binary classification
- **K-Nearest Neighbors (KNN)** - For classification and regression
- **Support Vector Machine (SVM)** - For binary classification with linear and RBF kernels
- **Decision Tree** - For classification using the CART algorithm
- **Naive Bayes** - Gaussian Naive Bayes for classification
- **Perceptron** - Linear binary classifier
### Unsupervised Learning
- **K-Means Clustering** - For data clustering
### Utilities
- Data splitting (train/test split)
- Evaluation metrics (accuracy, R², MSE)
- Distance functions
- Data normalization
- Confusion matrix
## Installation
### From PyPI
```bash
pip install vishuml
```
### From Source
```bash
git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e .
```
## Quick Start
### 🚀 Works with pandas DataFrames (Just like sklearn!)
```python
import pandas as pd
from vishuml import LinearRegression, LogisticRegression
from vishuml.utils import train_test_split, r2_score, accuracy_score
# Load your CSV data (just like sklearn!)
df = pd.read_csv('your_data.csv')
X = df[['feature1', 'feature2', 'feature3']] # Select features
y = df['target'] # Select target
# Train-test split (works with DataFrames!)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model (accepts DataFrames!)
model = LinearRegression()
model.fit(X_train, y_train) # DataFrame input!
# Make predictions (works with DataFrames!)
predictions = model.predict(X_test)
score = model.score(X_test, y_test)
print(f"R² Score: {score:.4f}")
# Classification Example with real data
from vishuml import datasets as ds
X, y = ds.load_iris()
# Convert to DataFrame for realistic workflow
iris_df = pd.DataFrame(X, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_df['species'] = y
# sklearn-like feature selection
features = iris_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
target = (iris_df['species'] == 0).astype(int) # Binary target: class 0 vs. the rest
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
classifier = LogisticRegression()
classifier.fit(X_train, y_train) # DataFrame input!
accuracy = classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.4f}")
```
### Traditional NumPy Arrays
```python
import numpy as np
from vishuml import LinearRegression, KMeans
# NumPy arrays also work (backward compatibility)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression()
model.fit(X, y)
predictions = model.predict([[6], [7]])
print(f"Predictions: {predictions}") # Should be close to [12, 14]
# Clustering Example
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(k=2, random_state=42)
clusters = kmeans.fit_predict(X)
print(f"Cluster labels: {clusters}")
```
## Algorithm Documentation
### Linear Regression
```python
from vishuml import LinearRegression
# Create and train model
model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Get R² score
score = model.score(X_test, y_test)
```
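For intuition, ordinary least squares with an intercept can be solved in closed form via the normal equations. Here is a minimal NumPy sketch of that idea (an illustration only, not necessarily how VishuML solves it internally):
```python
import numpy as np

# OLS in closed form: w = (XᵀX)⁻¹Xᵀy, with a bias column prepended for the intercept
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)      # least-squares solve
print(w)  # ≈ [0., 2.], i.e. y ≈ 0 + 2x
```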
### Logistic Regression
```python
from vishuml import LogisticRegression
# Create and train model
model = LogisticRegression(learning_rate=0.01, max_iterations=1000)
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
# Get accuracy
accuracy = model.score(X_test, y_test)
```
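The `learning_rate` and `max_iterations` parameters exist because logistic regression models P(y=1 | x) = sigmoid(wᵀx + b) and fits w by gradient descent. A quick look at the sigmoid itself (a standalone sketch, independent of the library's internals):
```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5, the default decision threshold
print(sigmoid(4.0))  # ~0.982, a confident positive
```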
### K-Nearest Neighbors
```python
from vishuml import KNearestNeighbors
# For classification
knn_clf = KNearestNeighbors(k=3, task_type='classification')
knn_clf.fit(X_train, y_train)
predictions = knn_clf.predict(X_test)
# For regression
knn_reg = KNearestNeighbors(k=5, task_type='regression')
knn_reg.fit(X_train, y_train)
predictions = knn_reg.predict(X_test)
```
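Because KNN predicts from distances between samples, features on very different scales can dominate the metric. A minimal sketch using the `normalize` utility (assuming it rescales features column-wise as in the Utility Functions section; a stricter workflow would reuse the training-set statistics for the test set):
```python
from vishuml import KNearestNeighbors
from vishuml.utils import normalize

# Rescale features before fitting; KNN's distance metric is scale-sensitive
X_train_n = normalize(X_train)
X_test_n = normalize(X_test)

knn = KNearestNeighbors(k=3, task_type='classification')
knn.fit(X_train_n, y_train)
predictions = knn.predict(X_test_n)
```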
### Support Vector Machine
```python
from vishuml import SupportVectorMachine
# Linear SVM
svm_linear = SupportVectorMachine(C=1.0, kernel='linear')
svm_linear.fit(X_train, y_train)
# RBF SVM
svm_rbf = SupportVectorMachine(C=1.0, kernel='rbf', gamma=1.0)
svm_rbf.fit(X_train, y_train)
predictions = svm_rbf.predict(X_test)
decision_scores = svm_rbf.decision_function(X_test)
```
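For reference, the RBF kernel that `gamma` parameterizes is K(x, x') = exp(-gamma * ||x - x'||^2); a quick numeric check of how gamma shrinks similarity with distance (a standalone sketch, independent of VishuML's internals):
```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Standard RBF kernel; larger gamma means a narrower radius of influence
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

print(rbf_kernel([0, 0], [1, 1], gamma=1.0))  # exp(-2) ≈ 0.135
print(rbf_kernel([0, 0], [1, 1], gamma=5.0))  # exp(-10) ≈ 4.5e-05
```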
### Decision Tree
```python
from vishuml import DecisionTree
# Create and train model
tree = DecisionTree(max_depth=5, min_samples_split=2, min_samples_leaf=1)
tree.fit(X_train, y_train)
# Make predictions
predictions = tree.predict(X_test)
accuracy = tree.score(X_test, y_test)
```
### Naive Bayes
```python
from vishuml import NaiveBayes
# Create and train model
nb = NaiveBayes()
nb.fit(X_train, y_train)
# Make predictions
predictions = nb.predict(X_test)
probabilities = nb.predict_proba(X_test)
```
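Gaussian Naive Bayes assumes each feature is normally distributed within each class, so a prediction multiplies the class prior by per-feature Gaussian likelihoods. The building block looks like this (a standalone sketch of the standard formula, not VishuML's internal code):
```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # Likelihood of feature value x under N(mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print(gaussian_pdf(1.0, 0.0, 1.0))  # ≈ 0.242
```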
### Perceptron
```python
from vishuml import Perceptron
# Create and train model
perceptron = Perceptron(learning_rate=0.01, max_iterations=1000)
perceptron.fit(X_train, y_train)
# Make predictions
predictions = perceptron.predict(X_test)
decision_scores = perceptron.decision_function(X_test)
```
### K-Means Clustering
```python
from vishuml import KMeans
# Create and train model
kmeans = KMeans(k=3, init='k-means++', random_state=42)
kmeans.fit(X)
# Get cluster labels
labels = kmeans.labels
# Or predict for new data
new_labels = kmeans.predict(X_new)
# Transform to distance space
distances = kmeans.transform(X)
```
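Because `transform` returns each sample's distance to every centroid, it can drive a simple elbow-style comparison across values of `k`. A hedged sketch (it assumes `transform(X)` yields an `(n_samples, k)` distance matrix, as described above):
```python
import numpy as np
from vishuml import KMeans

# Sum of squared nearest-centroid distances, an inertia-style score per k
for k in (2, 3, 4, 5):
    km = KMeans(k=k, random_state=42)
    km.fit(X)
    d = km.transform(X)                        # distances to each centroid
    inertia = np.sum(np.min(d, axis=1) ** 2)   # within-cluster dispersion
    print(f"k={k}: inertia={inertia:.2f}")
```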
## Utility Functions
```python
from vishuml.utils import (
train_test_split, accuracy_score, r2_score,
mean_squared_error, euclidean_distance,
normalize, confusion_matrix
)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Evaluate predictions
accuracy = accuracy_score(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
# Normalize features
X_normalized = normalize(X)
# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
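# Distance between two points (assuming euclidean_distance accepts two
# array-like vectors and returns a float; here, 5.0)
d = euclidean_distance([0, 0], [3, 4])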
```
## Sample Datasets
The library includes sample datasets in CSV format:
- `datasets/iris.csv` - Classic iris flower classification dataset
- `datasets/housing.csv` - Housing price regression dataset
- `datasets/wine.csv` - Wine quality classification dataset
```python
import pandas as pd
# Load sample datasets
iris_data = pd.read_csv('datasets/iris.csv')
housing_data = pd.read_csv('datasets/housing.csv')
wine_data = pd.read_csv('datasets/wine.csv')
```
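From there, splitting a loaded DataFrame into features and target follows the Quick Start pattern (a sketch; the label column name `species` is an assumption, so adjust it to the CSV's actual header):
```python
# Separate features from the label column (name assumed here)
X = iris_data.drop(columns=['species'])
y = iris_data['species']
```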
## Examples
Check out the `examples/` directory for Jupyter notebook tutorials demonstrating each algorithm:
- `examples/linear_regression_example.ipynb`
- `examples/logistic_regression_example.ipynb`
- `examples/knn_example.ipynb`
- `examples/svm_example.ipynb`
- `examples/decision_tree_example.ipynb`
- `examples/naive_bayes_example.ipynb`
- `examples/perceptron_example.ipynb`
- `examples/kmeans_example.ipynb`
## Development
### Setup Development Environment
```bash
git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e ".[dev]"
```
### Running Tests
```bash
pytest tests/ -v --cov=vishuml
```
### Code Formatting
```bash
black vishuml/
flake8 vishuml/
```
## Requirements
- Python >= 3.7
- NumPy >= 1.19.0
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## Educational Purpose
This library is designed for educational purposes to help understand how machine learning algorithms work under the hood. For production use, consider using mature libraries like scikit-learn, which are more optimized and feature-complete.
## Author
**Vishu** - [GitHub Profile](https://github.com/vishuRizz)
## Acknowledgments
- Inspired by scikit-learn's API design
- Algorithms implemented based on standard textbook descriptions
- Built for educational and learning purposes