# Stripje - Make sklearn pipelines lean
[Python 3.10+](https://www.python.org/downloads/)
[MIT License](https://opensource.org/licenses/MIT)
[Source on GitHub](https://github.com/hadi-gharibi/stripje)
**Speed up your scikit-learn pipelines for single-row predictions by 2-10x!**
Stripje is a high-performance compiler that converts trained scikit-learn pipelines into optimized Python functions, eliminating numpy overhead for single-row inference.
## 🚀 Why Stripje?
- **⚡ 2-200x faster** single-row predictions, depending on pipeline complexity
- **🔧 Drop-in replacement** - works with your existing pipelines
- **🎯 Zero configuration** - just compile and use
- **🛠️ Production ready** - optimized for real-time inference
## 📦 Installation
```bash
pip install stripje
```
Or with uv (recommended):
```bash
uv add stripje
```
## ⚡ Quick Start
```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from stripje import compile_pipeline

# 0. Any training data works; a small synthetic set keeps the example runnable
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Create and fit your pipeline as usual
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)

# 2. Compile for fast single-row inference
fast_predict = compile_pipeline(pipeline)

# 3. Get predictions up to 10x faster!
test_row = [1.2, -0.5, 0.8, -1.1]
prediction = fast_predict(test_row)  # Much faster than pipeline.predict([test_row])
```
## 🎯 The Problem We Solve
**Standard scikit-learn pipelines are slow for single predictions** because they're optimized for batch processing. When you need to predict one row at a time (like in web APIs), numpy operations create unnecessary overhead.
**Stripje compiles your trained pipeline** into a specialized function that:
- ✅ Extracts fitted parameters once
- ✅ Eliminates array creation overhead
- ✅ Uses native Python operations
- ✅ Matches the original pipeline's results (up to the approximations noted in Limitations)
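
To make the idea concrete, here is a rough, hand-written sketch of what such a specialized single-row path can look like for a `StandardScaler` + `LogisticRegression` pipeline. It illustrates the approach only; it is not the code stripje actually generates.

```python
def make_fast_predict(scaler, clf):
    # Illustrative only: pull fitted parameters out of the sklearn objects once,
    # as plain Python floats and lists, so the hot path never touches numpy.
    means = scaler.mean_.tolist()
    scales = scaler.scale_.tolist()
    weights = clf.coef_[0].tolist()
    intercept = float(clf.intercept_[0])
    classes = clf.classes_.tolist()

    def predict_one(row):
        # Scale each feature and accumulate the decision function in pure Python
        z = intercept
        for x, m, s, w in zip(row, means, scales, weights):
            z += ((x - m) / s) * w
        # Binary logistic regression: positive class when the decision function > 0
        return classes[1] if z > 0.0 else classes[0]

    return predict_one
```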
## 📊 Performance Comparison
```python
import time
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from stripje import compile_pipeline

# Setup
X, y = make_classification(n_samples=1000, n_features=20)
pipeline = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])
pipeline.fit(X, y)
fast_predict = compile_pipeline(pipeline)

test_row = X[0].tolist()

# Benchmark single-row predictions
def benchmark_standard():
    start = time.time()
    for _ in range(1000):
        pipeline.predict([test_row])
    return time.time() - start

def benchmark_compiled():
    start = time.time()
    for _ in range(1000):
        fast_predict(test_row)
    return time.time() - start
standard_time = benchmark_standard()
compiled_time = benchmark_compiled()
speedup = standard_time / compiled_time
print(f"Standard pipeline: {standard_time:.3f}s")
print(f"Compiled pipeline: {compiled_time:.3f}s")
print(f"Speedup: {speedup:.1f}x faster!")
```
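
For less noisy numbers than the hand-rolled timers above, the standard library's `timeit` runs the same comparison using `time.perf_counter` under the hood:

```python
import timeit

# Same workload as the benchmark above, timed with timeit
standard = timeit.timeit(lambda: pipeline.predict([test_row]), number=1000)
compiled = timeit.timeit(lambda: fast_predict(test_row), number=1000)
print(f"Speedup: {standard / compiled:.1f}x")
```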
## 🔧 Supported Components
Stripje supports the most commonly used scikit-learn components:
### 🔄 Transformers
- **Scalers**: `StandardScaler`, `MinMaxScaler`, `RobustScaler`, `MaxAbsScaler`
- **Encoders**: `OneHotEncoder`, `OrdinalEncoder`, `LabelEncoder`
- **Other**: `Normalizer`, `QuantileTransformer`, `SelectKBest`
### 🎯 Estimators
- **Classification**: `LogisticRegression`, `RandomForestClassifier`, `DecisionTreeClassifier`, `GaussianNB`
- **Regression**: `LinearRegression`
### 🏗️ Composite
- **`ColumnTransformer`** - Full support with nested compilation
*More components coming soon! See [Contributing](#-contributing) to request or add support.*
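
Before compiling, you can check a fitted pipeline's top-level steps against this registry. The sketch below assumes `get_supported_transformers()` returns the supported classes, as described in the API reference:

```python
from stripje import get_supported_transformers

supported = set(get_supported_transformers())

# pipeline.steps is the standard sklearn list of (name, estimator) pairs
unsupported = [name for name, step in pipeline.steps if type(step) not in supported]
if unsupported:
    print(f"Steps without a compiled handler yet: {unsupported}")
```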
## 📖 More Examples
### Complex Pipeline with ColumnTransformer
```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from stripje import compile_pipeline

# Example training data: ColumnTransformer selects columns by name, so fit on a DataFrame
X_train = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [50000, 64000, 120000, 98000],
    'category': ['A', 'B', 'A', 'C'],
    'region': ['North', 'South', 'North', 'East'],
})
y_train = [0, 1, 0, 1]

# Create a complex pipeline
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['category', 'region'])
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=10))
])

# Fit and compile
pipeline.fit(X_train, y_train)
fast_predict = compile_pipeline(pipeline)

# Single-row prediction
row = [25, 50000, 'A', 'North']  # [age, income, category, region]
prediction = fast_predict(row)
```
### Real-World API Usage
```python
from flask import Flask, request, jsonify
import joblib
from stripje import compile_pipeline

app = Flask(__name__)

# Load and compile your model once at startup
model = joblib.load('trained_pipeline.pkl')
fast_predict = compile_pipeline(model)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['features']
    prediction = fast_predict(data)  # Super fast!
    return jsonify({'prediction': prediction.tolist()})
```
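
Calling the endpoint is then an ordinary HTTP POST. A hypothetical client, assuming the app above is running locally on port 5000:

```python
import requests

resp = requests.post(
    "http://localhost:5000/predict",
    json={"features": [1.2, -0.5, 0.8, -1.1]},
)
print(resp.json())  # e.g. {'prediction': ...}
```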
## 🚫 Limitations
- Input must be lists/arrays, not pandas DataFrames directly (see the conversion sketch below)
- No sparse matrix support
- Some transformers use approximations (e.g., `QuantileTransformer`)
- Only listed components are supported
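
In practice the DataFrame limitation just means converting a row to a plain list before calling the compiled function, for example (illustrative, assuming the columns match the training layout):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25], "income": [50000], "category": ["A"], "region": ["North"]}
)
row = df.iloc[0].tolist()   # [25, 50000, 'A', 'North']
prediction = fast_predict(row)
```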
## 📚 API Reference
### `compile_pipeline(pipeline)`
Compiles a fitted scikit-learn pipeline into a fast single-row prediction function.
**Args:**
- `pipeline`: A fitted scikit-learn Pipeline
**Returns:**
- A function that takes a single row (list/array) and returns the prediction
**Raises:**
- `ValueError`: If pipeline contains unsupported components
### `get_supported_transformers()`
Returns a list of all supported transformer and estimator classes.
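
A small sketch of how the two functions fit together, assuming `compile_pipeline` raises `ValueError` for unsupported steps as documented above:

```python
from stripje import compile_pipeline, get_supported_transformers

try:
    fast_predict = compile_pipeline(pipeline)
except ValueError as err:
    # Raised when a step has no registered handler
    print(f"Cannot compile this pipeline: {err}")
    print("Supported components:",
          [cls.__name__ for cls in get_supported_transformers()])
```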
## 📁 Examples & Benchmarks
Check out the `examples/` directory for:
- **`simple_example.py`** - Basic usage
- **`benchmark.py`** - Performance comparisons
- **`comprehensive_benchmark.py`** - Detailed benchmarks
- **`profiler_demo.py`** - Profiling tools
## 🔌 Extending Support
Want to add support for a new transformer? It's easy:
```python
from stripje import register_step_handler

@register_step_handler(YourTransformer)
def handle_your_transformer(step):
    # Extract parameters from the fitted step
    param1 = step.param1_
    param2 = step.param2_

    def transform_one(x):
        # Implement single-row transformation logic
        result = []
        for val in x:
            # Your transformation logic here
            transformed_val = val * param1 + param2
            result.append(transformed_val)
        return result

    return transform_one
```
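
As a more concrete illustration, a hypothetical handler for `sklearn.preprocessing.Binarizer` (not in the supported list above) might look like this, assuming `register_step_handler` dispatches on the step's class exactly as in the template:

```python
from sklearn.preprocessing import Binarizer
from stripje import register_step_handler

@register_step_handler(Binarizer)
def handle_binarizer(step):
    threshold = step.threshold  # Binarizer keeps its threshold as a plain attribute

    def transform_one(x):
        # Values strictly greater than the threshold map to 1.0, the rest to 0.0
        return [1.0 if val > threshold else 0.0 for val in x]

    return transform_one
```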
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 🛠️ Development
### Setup Development Environment
1. Clone the repository:
```bash
git clone https://github.com/hadi-gharibi/stripje.git
cd stripje
```
2. Install all dependencies (including optional ones for full testing):
```bash
uv sync --all-extras
```
3. Install pre-commit hooks:
```bash
uv run pre-commit install
```
### Code Quality Tools
This project uses modern Python development tools:
- **Ruff**: Fast linting, formatting, and import sorting
- **MyPy**: Static type checking
- **pre-commit**: Automated code quality checks
Run code quality checks:
```bash
# Lint and auto-fix issues
uv run ruff check src/ tests/ --fix
# Format code
uv run ruff format src/ tests/
# Type checking
uv run mypy src/
# Run all pre-commit hooks
uv run pre-commit run --all-files
```
### Testing
Run tests:
```bash
uv run pytest
```
Run tests with coverage:
```bash
uv run pytest --cov=stripje
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.