stripje


- **Name**: stripje
- **Version**: 0.1.0
- **Summary**: High-performance single-row inference compiler for scikit-learn pipelines with 2-10x speedup
- **Upload time**: 2025-10-26 00:52:03
- **Requires Python**: >=3.10
- **License**: MIT
- **Keywords**: scikit-learn, sklearn, machine-learning, pipeline, optimization, performance, inference, compiler, single-row, prediction, transformation, preprocessing

# Stripje - Make sklearn pipelines lean

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://github.com/hadi-gharibi/stripje)

**Speed up your scikit-learn pipelines for single-row predictions by 2-10x!**

Stripje is a high-performance compiler that converts trained scikit-learn pipelines into optimized Python functions, eliminating NumPy overhead for single-row inference.

## 🚀 Why Stripje?

- **⚡ 2-10x faster** single-row predictions, with the exact gain depending on pipeline complexity
- **🔧 Drop-in replacement** - works with your existing pipelines
- **🎯 Zero configuration** - just compile and use
- **🛠️ Production ready** - optimized for real-time inference

## 📦 Installation

```bash
pip install stripje
```

Or with uv (recommended):
```bash
uv add stripje
```

## ⚡ Quick Start

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from stripje import compile_pipeline

# Toy training data with 4 features (a stand-in for your own X_train, y_train)
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Create and fit your pipeline as usual
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)

# 2. Compile for fast single-row inference
fast_predict = compile_pipeline(pipeline)

# 3. Get predictions up to 10x faster!
test_row = [1.2, -0.5, 0.8, -1.1]  # A flat list, not a 2D array
prediction = fast_predict(test_row)  # Much faster than pipeline.predict([test_row])
```

## 🎯 The Problem We Solve

**Standard scikit-learn pipelines are slow for single predictions** because they're optimized for batch processing. When you need to predict one row at a time (for example, in a web API), every call pays NumPy's fixed costs, such as array creation, for just one row of work.

**Stripje compiles your trained pipeline** into a specialized function that:
- ✅ Extracts fitted parameters once
- ✅ Eliminates array creation overhead
- ✅ Uses native Python operations
- ✅ Maintains identical results
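
To make the idea concrete, here is a minimal pure-Python sketch of what a compiled scaler step could look like. It is an illustration only, not Stripje's actual generated code: `make_scaler_fn` is a hypothetical helper, and the hard-coded means and scales stand in for values read once from a fitted `StandardScaler` (`mean_` and `scale_`).

```python
def make_scaler_fn(means, scales):
    """Build a single-row scaling function from already-fitted parameters.

    `means` and `scales` stand in for a fitted StandardScaler's `mean_` and
    `scale_` attributes, converted to plain Python lists once, up front.
    """
    # Pair the parameters once so the per-row work is a single pass.
    params = list(zip(means, scales))

    def scale_one(row):
        # Plain arithmetic on Python floats: no array is allocated per call.
        return [(value - mean) / scale for value, (mean, scale) in zip(row, params)]

    return scale_one


# Hypothetical fitted parameters for a 3-feature scaler.
scale_one = make_scaler_fn(means=[10.0, 0.5, 100.0], scales=[2.0, 0.25, 50.0])
scaled = scale_one([12.0, 0.5, 50.0])
print(scaled)  # [1.0, 0.0, -1.0]
```

The parameters are captured in a closure at build time, so each call does only the arithmetic for one row.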

## 📊 Performance Comparison

```python
import time
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from stripje import compile_pipeline

# Setup
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
pipeline = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])
pipeline.fit(X, y)
fast_predict = compile_pipeline(pipeline)

test_row = X[0].tolist()

# Benchmark single-row predictions
# (time.perf_counter() is a monotonic, high-resolution clock suited to timing)
def benchmark_standard():
    start = time.perf_counter()
    for _ in range(1000):
        pipeline.predict([test_row])
    return time.perf_counter() - start

def benchmark_compiled():
    start = time.perf_counter()
    for _ in range(1000):
        fast_predict(test_row)
    return time.perf_counter() - start

standard_time = benchmark_standard()
compiled_time = benchmark_compiled()
speedup = standard_time / compiled_time

print(f"Standard pipeline: {standard_time:.3f}s")
print(f"Compiled pipeline: {compiled_time:.3f}s")
print(f"Speedup: {speedup:.1f}x faster!")
```

## 🔧 Supported Components

Stripje supports the most commonly used scikit-learn components:

### 🔄 Transformers
- **Scalers**: `StandardScaler`, `MinMaxScaler`, `RobustScaler`, `MaxAbsScaler`
- **Encoders**: `OneHotEncoder`, `OrdinalEncoder`, `LabelEncoder`
- **Other**: `Normalizer`, `QuantileTransformer`, `SelectKBest`

### 🎯 Estimators
- **Classification**: `LogisticRegression`, `RandomForestClassifier`, `DecisionTreeClassifier`, `GaussianNB`
- **Regression**: `LinearRegression`

### 🏗️ Composite
- **`ColumnTransformer`** - Full support with nested compilation

*More components coming soon! See [Contributing](#-contributing) to request or add support.*

## 📖 More Examples

### Complex Pipeline with ColumnTransformer

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from stripje import compile_pipeline

# Create a complex pipeline
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['category', 'region'])
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=10))
])

# Fit and compile (X_train is assumed to be a pandas DataFrame with the
# columns 'age', 'income', 'category', and 'region')
pipeline.fit(X_train, y_train)
fast_predict = compile_pipeline(pipeline)

# Single-row prediction
row = [25, 50000, 'A', 'North']  # [age, income, category, region]
prediction = fast_predict(row)
```
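
One way to picture nested compilation is as a router that sends each group of columns to its own single-row function and concatenates the results. The sketch below is an assumption about the general technique, not Stripje's internals: `make_column_router`, `scale_numeric`, `one_hot_category`, and all fitted values are hypothetical.

```python
def make_column_router(branches):
    """Combine per-branch single-row functions into one row-level function.

    `branches` is a list of (column_indices, fn) pairs, where `fn` maps a list
    of raw values to a list of transformed values.
    """
    def transform_one(row):
        out = []
        for indices, fn in branches:
            # Pick this branch's columns, transform them, and append in order.
            out.extend(fn([row[i] for i in indices]))
        return out

    return transform_one


def scale_numeric(values):
    # Hypothetical fitted means and scales for (age, income).
    return [(values[0] - 40.0) / 10.0, (values[1] - 60000.0) / 20000.0]


def one_hot_category(values):
    # Hypothetical fitted categories for a single 'category' column.
    categories = ["A", "B", "C"]
    return [1.0 if values[0] == c else 0.0 for c in categories]


transform_one = make_column_router([
    ([0, 1], scale_numeric),
    ([2], one_hot_category),
])
features = transform_one([25, 50000, "A"])
print(features)  # [-1.5, -0.5, 1.0, 0.0, 0.0]
```

Branch outputs are concatenated in the order the branches are listed, which mirrors how `ColumnTransformer` orders its output columns.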

### Real-World API Usage

```python
from flask import Flask, request, jsonify
import joblib
from stripje import compile_pipeline

app = Flask(__name__)

# Load and compile your model once at startup
model = joblib.load('trained_pipeline.pkl')
fast_predict = compile_pipeline(model)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['features']
    prediction = fast_predict(data)  # Super fast!
    # Convert NumPy scalars/arrays to JSON-serializable Python types when needed
    if hasattr(prediction, 'tolist'):
        prediction = prediction.tolist()
    return jsonify({'prediction': prediction})
```

## 🚫 Limitations

- Input must be lists/arrays (no pandas DataFrames directly)
- No sparse matrix support
- Some transformers use approximations (e.g., `QuantileTransformer`)
- Only listed components are supported
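
Because the compiled function expects a flat list, records that arrive as mappings (a JSON payload, or a DataFrame row converted to a dict) need to be put into the training column order first. A minimal sketch, assuming a hypothetical helper `row_from_record` and a hypothetical `FEATURE_ORDER` that matches the columns the pipeline was fitted on:

```python
def row_from_record(record, feature_order):
    """Turn a mapping of feature name -> value into a flat, ordered list."""
    missing = [name for name in feature_order if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [record[name] for name in feature_order]


# Hypothetical column order used when the pipeline was fitted.
FEATURE_ORDER = ["age", "income", "category", "region"]

record = {"region": "North", "age": 25, "category": "A", "income": 50000}
row = row_from_record(record, FEATURE_ORDER)
print(row)  # [25, 50000, 'A', 'North']
```

With pandas, a single row can be converted in a similar spirit using `df.iloc[i].tolist()` or `df.iloc[i].to_dict()` before calling the compiled function.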

## 📚 API Reference

### `compile_pipeline(pipeline)`
Compiles a fitted scikit-learn pipeline into a fast single-row prediction function.

**Args:**
- `pipeline`: A fitted scikit-learn Pipeline

**Returns:**
- Function that takes a single row (list/array) and returns predictions

**Raises:**
- `ValueError`: If pipeline contains unsupported components
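
Since compilation can raise `ValueError`, a service may want to fall back to the regular pipeline rather than fail at startup. The sketch below shows one possible pattern; `compile_with_fallback` is a hypothetical helper, and the compile function is passed in as an argument (in real use it would be `stripje.compile_pipeline`). The stub classes exist only to make the example runnable without scikit-learn.

```python
def compile_with_fallback(pipeline, compile_fn):
    """Return a single-row predictor, falling back to `pipeline.predict`.

    `compile_fn` is expected to behave like `stripje.compile_pipeline`:
    return a callable, or raise ValueError for unsupported components.
    """
    try:
        return compile_fn(pipeline)
    except ValueError:
        def slow_predict(row):
            # Wrap the row in a list to form a one-row batch, then unwrap.
            return pipeline.predict([row])[0]
        return slow_predict


class StubPipeline:
    """Stand-in for a fitted pipeline: predicts the sum of each row."""
    def predict(self, rows):
        return [sum(r) for r in rows]


def failing_compile(pipeline):
    # Simulates a compiler that does not support this pipeline.
    raise ValueError("unsupported component")


predict_one = compile_with_fallback(StubPipeline(), failing_compile)
result = predict_one([1, 2, 3])
print(result)  # 6
```

The fallback unwraps the one-row batch with `[0]`; whether that matches the compiled function's return type exactly is worth checking for your own pipeline.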

### `get_supported_transformers()`
Returns a list of all supported transformer and estimator classes.
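
That list can be used to check a pipeline before compiling it. The sketch below assumes the returned items are classes usable with `isinstance`, and that the steps follow scikit-learn's `Pipeline.steps` layout of `(name, estimator)` pairs; `find_unsupported_steps` is a hypothetical helper, and the stub classes stand in for real estimators. It does not look inside nested components such as `ColumnTransformer`.

```python
def find_unsupported_steps(steps, supported_classes):
    """Return the names of steps whose estimator is not a supported class."""
    supported = tuple(supported_classes)
    return [name for name, estimator in steps if not isinstance(estimator, supported)]


# Stub classes standing in for real transformers and estimators.
class Scaler: ...
class Model: ...
class Exotic: ...


steps = [("scaler", Scaler()), ("exotic", Exotic()), ("model", Model())]
unsupported = find_unsupported_steps(steps, [Scaler, Model])
print(unsupported)  # ['exotic']
```

In real use the call would look like `find_unsupported_steps(pipeline.steps, get_supported_transformers())`.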

## 📁 Examples & Benchmarks

Check out the `examples/` directory for:
- **`simple_example.py`** - Basic usage
- **`benchmark.py`** - Performance comparisons
- **`comprehensive_benchmark.py`** - Detailed benchmarks
- **`profiler_demo.py`** - Profiling tools

## 🔌 Extending Support

Want to add support for a new transformer? It's easy:

```python
from stripje import register_step_handler

# YourTransformer is a placeholder for the fitted transformer class you want to support
@register_step_handler(YourTransformer)
def handle_your_transformer(step):
    # Extract parameters from the fitted step
    param1 = step.param1_
    param2 = step.param2_

    def transform_one(x):
        # Implement single-row transformation logic
        result = []
        for val in x:
            # Your transformation logic here
            transformed_val = val * param1 + param2
            result.append(transformed_val)
        return result

    return transform_one
```
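
When writing a new handler, it helps to confirm that the single-row function agrees with the original transformer on a handful of rows. The sketch below is a generic parity check under stated assumptions: `max_abs_difference` is a hypothetical helper, and `reference` and `candidate` are toy functions standing in for the original transform and the handler's `transform_one`.

```python
def max_abs_difference(reference_fn, candidate_fn, rows):
    """Largest element-wise absolute difference between two single-row transforms."""
    worst = 0.0
    for row in rows:
        expected = reference_fn(row)
        actual = candidate_fn(row)
        if len(expected) != len(actual):
            raise ValueError(f"length mismatch for row {row!r}")
        for e, a in zip(expected, actual):
            worst = max(worst, abs(e - a))
    return worst


def reference(row):
    # Stand-in for the original transformer applied to one row.
    return [2.0 * v + 1.0 for v in row]


def candidate(row):
    # Stand-in for the hand-written single-row handler.
    return [v + v + 1.0 for v in row]


rows = [[0.0, 1.5, -3.0], [10.0, 0.25, 7.0]]
diff = max_abs_difference(reference, candidate, rows)
assert diff <= 1e-12
```

For a real scikit-learn step with dense output, the reference could be something like `lambda row: step.transform([row])[0].tolist()`, with a tolerance chosen to suit transformers that are only approximated.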

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 🛠️ Development

### Setup Development Environment

1. Clone the repository:
```bash
git clone https://github.com/hadi-gharibi/stripje.git
cd stripje
```

2. Install all dependencies (including optional ones for full testing):
```bash
uv sync --all-extras
```

3. Install pre-commit hooks:
```bash
uv run pre-commit install
```

### Code Quality Tools

This project uses modern Python development tools:

- **Ruff**: Fast linting, formatting, and import sorting
- **MyPy**: Static type checking
- **pre-commit**: Automated code quality checks

Run code quality checks:

```bash
# Lint and auto-fix issues
uv run ruff check src/ tests/ --fix

# Format code
uv run ruff format src/ tests/

# Type checking
uv run mypy src/

# Run all pre-commit hooks
uv run pre-commit run --all-files
```

### Testing

Run tests:

```bash
uv run pytest
```

Run tests with coverage:

```bash
uv run pytest --cov=stripje
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "stripje",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "scikit-learn, sklearn, machine-learning, pipeline, optimization, performance, inference, compiler, single-row, prediction, transformation, preprocessing",
    "author": null,
    "author_email": "Hadi Gharibi <hady.gharibi@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a8/ae/013436a0ac2b2f036a00f0a973999742fc12dac5e9bcc4e8f85e9dd798f7/stripje-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "High-performance single-row inference compiler for scikit-learn pipelines with 2-10x speedup",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/hadi-gharibi/stripje"
    },
    "split_keywords": [
        "scikit-learn",
        " sklearn",
        " machine-learning",
        " pipeline",
        " optimization",
        " performance",
        " inference",
        " compiler",
        " single-row",
        " prediction",
        " transformation",
        " preprocessing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3c8cdd13d838d98129ecef1a4a25b04f838abff2cde5f3a845533e218ae0c3aa",
                "md5": "3ffcb14ace90215dd28fe9650ab1ec54",
                "sha256": "ee4b17446551b244284a34a75223e0d58f50f5348d948caf893b64562d134c60"
            },
            "downloads": -1,
            "filename": "stripje-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3ffcb14ace90215dd28fe9650ab1ec54",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 30494,
            "upload_time": "2025-10-26T00:52:02",
            "upload_time_iso_8601": "2025-10-26T00:52:02.186836Z",
            "url": "https://files.pythonhosted.org/packages/3c/8c/dd13d838d98129ecef1a4a25b04f838abff2cde5f3a845533e218ae0c3aa/stripje-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a8ae013436a0ac2b2f036a00f0a973999742fc12dac5e9bcc4e8f85e9dd798f7",
                "md5": "0143ac4d27c0616b3d280b44c5fb4baf",
                "sha256": "b7a01720c99cd3868b3ed4436443dc3d83a0965a06e76b3e2f19a0819218a9a2"
            },
            "downloads": -1,
            "filename": "stripje-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0143ac4d27c0616b3d280b44c5fb4baf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 52628,
            "upload_time": "2025-10-26T00:52:03",
            "upload_time_iso_8601": "2025-10-26T00:52:03.557086Z",
            "url": "https://files.pythonhosted.org/packages/a8/ae/013436a0ac2b2f036a00f0a973999742fc12dac5e9bcc4e8f85e9dd798f7/stripje-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-26 00:52:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hadi-gharibi",
    "github_project": "stripje",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "stripje"
}
        