tinyshift


| Field | Value |
| --- | --- |
| Name | tinyshift |
| Version | 0.1.3 |
| Summary | A small toolbox for mlops |
| Upload time | 2025-07-27 02:54:44 |
| Author | Lucas Leão |
| Requires Python | <4.0,>=3.10 |
| License | MIT |
| Keywords | mlops, toolbox, machine-learning |
| Requirements | No requirements were recorded. |

# TinyShift

**TinyShift** is a small experimental Python library designed to detect **data drift** and **performance drops** in machine learning models over time. The main goal of the project is to provide quick, tiny monitoring tools that help identify when data or model performance changes unexpectedly.
For more robust solutions, I highly recommend [NannyML](https://github.com/NannyML/nannyml).

## Technologies Used

- **Python 3.10+**
- **Scikit-learn**
- **Pandas**
- **NumPy**
- **Plotly**
- **SciPy**

## Installation

To install **TinyShift** in your development environment, use **pip**:


```bash
pip install tinyshift
```
If you prefer to clone the repository and install manually:
```bash
git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift    
pip install .
```

> **Note:** If you want to enable plotting capabilities, you need to install the extras using Poetry:

```bash
poetry install --all-extras
```

## Usage
Below are basic examples of how to use TinyShift's features.
### 1. Data Drift Detection
To detect data drift, fit a detector on the reference data and then score a new dataset against it. The `CategoricalDriftDetector` calculates metrics that identify significant differences between the two.

```python
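import pandas as pd
from tinyshift.detector import CategoricalDriftDetector

# Load the data and parse the timestamp column so it can be compared and grouped by week.
df = pd.read_csv("examples.csv", parse_dates=["datetime"])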
df_reference = df[(df["datetime"] < '2024-07-01')].copy()
df_analysis = df[(df["datetime"] >= '2024-07-01')].copy()

detector = CategoricalDriftDetector(df_reference, 'discrete_1', "datetime", "W", drift_limit='mad')

analysis_score = detector.score(df_analysis, "discrete_1", "datetime")

print(analysis_score)
```
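
The `drift_limit='mad'` option presumably refers to the median absolute deviation (MAD). How TinyShift derives its limits internally is not shown here, so the following is only a minimal, dependency-free sketch of one common way to turn reference scores into a MAD-based band. The function name `mad_limits`, the multiplier `k`, and all score values are hypothetical.

```python
from statistics import median


def mad_limits(scores, k=3.0):
    """Return (lower, upper) bounds as median +/- k * MAD of the given scores."""
    center = median(scores)
    mad = median([abs(s - center) for s in scores])
    return center - k * mad, center + k * mad


# Hypothetical per-week drift scores computed on the reference period.
reference_scores = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.12]
lower, upper = mad_limits(reference_scores)

# Hypothetical scores from the analysis period; flag those outside the band.
analysis_scores = [0.11, 0.25, 0.10]
flags = [not (lower <= s <= upper) for s in analysis_scores]
print(flags)
```

Because the median and MAD are less sensitive to extreme values than the mean and standard deviation, a band built this way is not easily widened by a few unusual weeks in the reference period.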

### 2. Performance Tracker
To track model performance over time, use the `PerformanceTracker`, which compares model performance on the reference data with performance on new data.
```python
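import pickle

import pandas as pd
from tinyshift.tracker import PerformanceTracker

df_reference = pd.read_csv('reference.csv')
df_analysis = pd.read_csv('analysis.csv')

# Load a previously trained model; this assumes it was serialized with pickle.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# Double brackets keep the input 2-D, as scikit-learn style estimators expect.
df_analysis['prediction'] = model.predict(df_analysis[["feature_0"]])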

tracker = PerformanceTracker(df_reference, 'target', 'prediction', 'datetime', "W")

analysis_score = tracker.score(df_analysis, 'target', 'prediction', 'datetime')

print(analysis_score)
```
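
To make the idea of a per-period score concrete, here is a small dependency-free sketch that groups labelled predictions by ISO week and computes accuracy for each week. It is not TinyShift's implementation: the function `weekly_accuracy`, the choice of accuracy as the metric, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_accuracy(rows):
    """Map (iso_year, iso_week) to accuracy for rows of (date, target, prediction)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, target, prediction in rows:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        totals[key] += 1
        hits[key] += int(target == prediction)
    return {key: hits[key] / totals[key] for key in sorted(totals)}


# Hypothetical labelled predictions spanning two consecutive weeks.
rows = [
    (date(2024, 7, 1), 1, 1),
    (date(2024, 7, 2), 0, 0),
    (date(2024, 7, 3), 1, 0),
    (date(2024, 7, 4), 1, 1),
    (date(2024, 7, 8), 0, 1),
    (date(2024, 7, 9), 1, 0),
]
scores = weekly_accuracy(rows)
print(scores)
```

A tracker can then compare each weekly value with limits derived from the reference period and report the weeks that fall outside them.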

### 3. Visualization
TinyShift also provides graphs to visualize the magnitude of drift and performance changes over time.
```python
tracker.plot.scatter(analysis_score, fig_type="png")

tracker.plot.bar(analysis_score, fig_type="png")
```

### 4. Outlier Detection
To detect outliers in your dataset, you can use the models provided by TinyShift. Currently, it offers the Histogram-Based Outlier Score (HBOS), Simple Probabilistic Anomaly Detector (SPAD), and SPAD+.

```python
import pandas as pd
from tinyshift.outlier import SPAD

df = pd.read_csv('data.csv')

spad_plus = SPAD(plus=True)
spad_plus.fit(df)

anomaly_scores = spad_plus.decision_function(df)

print(anomaly_scores)
```
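
For intuition about what a histogram-based score measures, the sketch below follows the general HBOS idea: build one equal-width histogram per feature, normalize bin heights so the tallest bin is 1, and sum the logarithm of the inverse height across features. It is a simplified pure-Python illustration, not the code in `tinyshift.outlier`. The helper names, the bin count, and the data are hypothetical.

```python
import math


def fit_histogram(values, n_bins=5):
    """Build an equal-width histogram; heights are scaled so the tallest bin is 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    peak = max(counts)
    return lo, hi, width, [c / peak for c in counts]


def hbos_score(row, histograms, eps=1e-9):
    """Sum log(1 / height) over features; rarer bins give higher scores."""
    score = 0.0
    for value, (lo, hi, width, heights) in zip(row, histograms):
        if value < lo or value > hi:
            height = 0.0  # outside the fitted range counts as an empty bin
        else:
            height = heights[min(int((value - lo) / width), len(heights) - 1)]
        score += math.log(1.0 / max(height, eps))
    return score


# Hypothetical two-feature dataset whose last row is far from the rest.
data = [(1.0, 10.0), (1.1, 10.5), (0.9, 9.8), (1.2, 10.2), (1.0, 10.1), (5.0, 30.0)]
histograms = [fit_histogram(column) for column in zip(*data)]
scores = [hbos_score(row, histograms) for row in data]
print(scores)
```

Rows that fall in the densest bin of every feature score 0, while the isolated last row receives the highest score. Treating features independently keeps the method fast, at the cost of ignoring interactions between features.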
### 5. Anomaly Tracker
The Anomaly Tracker in TinyShift allows you to identify potential outliers based on the drift limit and anomaly scores generated during training. By setting a drift limit, the tracker can flag data points that exceed this threshold as possible outliers.

```python
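import pickle

import pandas as pd
from tinyshift.tracker import AnomalyTracker

# Load a previously fitted model; this assumes it was serialized with pickle.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)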

tracker = AnomalyTracker(model, drift_limit='mad')

df_analysis = pd.read_csv('analysis.csv')

outliers = tracker.score(df_analysis)

print(outliers)
```
In this example, the `AnomalyTracker` is initialized with a reference model and a specified drift limit. The `score` method evaluates the analysis dataset, calculating anomaly scores and flagging data points that exceed the drift limit as potential outliers.

## Project Structure
The basic structure of the project is as follows:
```
tinyshift
├── LICENSE
├── README.md
├── poetry.lock
├── pyproject.toml
└── tinyshift
    ├── examples
    │   ├── outlier.ipynb
    │   └── tracker.ipynb
    ├── outlier
    │   ├── __init__.py
    │   ├── base.py
    │   ├── hbos.py
    │   └── spad.py
    ├── plot
    │   ├── __init__.py
    │   └── plot.py
    ├── tests
    │   ├── test_hbos.py
    │   └── test_spad.py
    └── tracker
        ├── anomaly.py
        ├── base.py
        ├── categorical.py
        ├── continuous.py
        └── performance.py      
```

## License
This project is licensed under the MIT License - see the LICENSE file for more details.


            
