# Haphazard
A Python package for **haphazard dataset and model management**.
Provides a standardized interface for loading datasets (with online normalization) and models, running experiments, and extending with custom datasets or models.
---
## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Datasets](#datasets)
- [Models](#models)
- [Normalization](#normalization)
- [Versions](#versions)
- [Contributing](#contributing)
- [License](#license)
---
## Installation
Install via pip:
```bash
pip install haphazard
```
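The version notes below also reference optional extras for models with additional dependencies. Assuming those extras are defined in your installed build, they can be installed as:

```bash
pip install haphazard[orf3v]   # adds the optional `tdigest` dependency used by ORF3V
pip install haphazard[all]     # installs all optional dependencies
```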
---
## Project Structure
The Haphazard package follows a modular, extensible design:
```
haphazard/
├── __init__.py
├── data/                     # Dataset management
│   ├── __init__.py
│   ├── base_dataset.py
│   ├── mask.py
│   └── datasets/
│       ├── __init__.py
│       ├── dummy_dataset/
│       ├── magic04/
│       ├── a8a/
│       ├── imdb/
│       ├── susy/
│       ├── higgs/
│       ├── dry_bean/
│       └── gas/
├── models/                    # Model management
│   ├── __init__.py
│   ├── base_model.py
│   └── model_zoo/
│       ├── __init__.py
│       ├── dummy_model/
│       ├── dynfo/
│       ├── fae/
│       ├── nb3/
│       ├── ocds/
│       ├── olifl/
│       ├── olvf/
│       ├── orf3v/
│       └── ovfm/
├── normalization/             # New in v1.1.0
│   ├── __init__.py
│   ├── base_normalizer.py
│   └── normalizer_zoo/
│       ├── __init__.py
│       ├── decimal_scale.py
│       ├── mean.py
│       ├── minmax.py
│       ├── no_normalization.py
│       ├── unit_vector.py
│       └── zscore.py
└── utils/                     # Utilities
    ├── __init__.py
    ├── file_utils.py
    ├── metrics.py
    └── seeding.py
```
### Notes
* `data/base_dataset.py` defines the `BaseDataset` class and integrates normalization support.
* `normalization/base_normalizer.py` defines a universal `BaseNormalizer` base class.
* `normalizer_zoo/` provides built-in normalizers (e.g., **zscore**, **mean**, **decimal_scale**, **no_normalization**).
* `models/base_model.py` defines `BaseModel`, used by all models in `model_zoo/`.
* Dynamic registration of **datasets**, **models**, and now **normalizers** is handled via decorators.
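
The registration pattern itself is simple. As a rough, generic sketch of how a decorator-based registry works (not the package's actual internals; the registry dict and its layout here are hypothetical):

```python
from typing import Callable, Dict

# Hypothetical module-level registry; the real package keeps separate
# registries for datasets, models, and normalizers.
_DATASET_REGISTRY: Dict[str, type] = {}

def register_dataset(name: str) -> Callable[[type], type]:
    """Record a class under `name` so the loader can find it later."""
    def decorator(cls: type) -> type:
        _DATASET_REGISTRY[name] = cls
        return cls
    return decorator

def load_dataset(name: str, **kwargs):
    """Look up the registered class by name and instantiate it."""
    return _DATASET_REGISTRY[name](**kwargs)
```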
---
## Quick Start
```python
from haphazard import load_dataset, load_model
# Load dataset
dataset = load_dataset("dummy", n_samples=100, n_features=10, norm="zscore")
# Load model
model = load_model("dummy")
# Run model
model_params = {}  # The dummy model has no hyperparameters
outputs = model(dataset, model_params)
print(outputs)
```
---
## Datasets
* All datasets inherit from `BaseDataset`.
* Example dataset: `DummyDataset`.
* Main interface:
```python
from haphazard import load_dataset

dataset = load_dataset(
    "magic04",
    base_path="./data",
    scheme="probabilistic",
    availability_prob=0.5,
    norm="none",
)

x, y = dataset.x, dataset.y
mask = dataset.mask
```
### Dataset Attributes
* `name`: str - dataset name
* `task`: `"classification"` | `"regression"`
* `haphazard_type`: `"controlled"` | `"intrinsic"`
* `n_samples`, `n_features`: int
* `num_classes`: int (for classification)
* `normalizer`: optional; the normalization scheme applied to the dataset (default `"none"`). An example of inspecting these attributes follows below.
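
For example, a quick check of these attributes after loading (a minimal sketch; the printed values are illustrative):

```python
from haphazard import load_dataset

dataset = load_dataset("magic04", base_path="./data",
                       scheme="probabilistic", availability_prob=0.5, norm="none")

print(dataset.name)            # "magic04"
print(dataset.task)            # "classification"
print(dataset.haphazard_type)  # "controlled"
print(dataset.n_samples, dataset.n_features)
if dataset.task == "classification":
    print(dataset.num_classes)
```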
### Available Datasets
* Dummy (`"dummy"`)
* Magic04 (`"magic04"`)
* A8a (`"a8a"`)
* IMDB (`"imdb"`)
* Susy (`"susy"`)
* Higgs (`"higgs"`)
* DryBean (`"dry_bean"`)
* Gas (`"gas"`)
---
## Models
* All models inherit from `BaseModel`.
* Example model: `DummyModel`.
```python
from haphazard import load_model
model = load_model("dummy")
model_params = {} # Hyperparameters of the model
outputs = model(dataset, model_params)
```
### Output
* **Classification**: `labels`, `preds`, `logits`, `time_taken`, `is_logit`
* **Regression**: `targets`, `preds`, `time_taken`
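
A minimal sketch of consuming the classification output (plain NumPy here, not the package's `utils.metrics` helpers; `dataset`, `model`, and `model_params` are the objects from the snippet above):

```python
import numpy as np

outputs = model(dataset, model_params)

labels = np.asarray(outputs["labels"])
preds = np.asarray(outputs["preds"])

accuracy = float((labels == preds).mean())
print(f"accuracy={accuracy:.3f}, time_taken={outputs['time_taken']:.2f}s")

# When `is_logit` is True, `logits` can feed threshold-based metrics,
# e.g. sklearn.metrics.roc_auc_score(labels, outputs["logits"]) for binary tasks.
```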
### Available Models
* Dummy - testing/prototyping.
* NB3, FAE - Naive Bayes based models.
* DynFo, ORF3V - Decision stump based models.
* OLVF, OLIFL, OVFM, OCDS - Linear classifier based models.
---
## Normalization
### Overview
Introduced in **v1.1.0**, the normalization module provides standardized interfaces for **online feature normalization** across datasets and models.
### Using Built-in Normalizers
```python
from haphazard import load_dataset, load_normalizer

dataset = load_dataset(
    "a8a",
    base_path="./",
    scheme="sudden",
    num_chunks=4,
    norm="none",  # No normalization applied internally
)

# Load z-score normalization
normalizer = load_normalizer("zscore", num_features=dataset.n_features)
for x, mask, y in dataset:
    x_norm = normalizer(x, mask)
    ...  # Further processing as required

# Load mean normalization
normalizer = load_normalizer("mean", num_features=dataset.n_features)
X, Mask, Y = dataset.x, dataset.mask, dataset.y
for x, mask, y in zip(X, Mask, Y):
    x_norm = normalizer(x, mask)
    ...  # Further processing as required
```
or
```python
from haphazard import load_dataset, load_normalizer

dataset = load_dataset(
    "a8a",
    base_path="./",
    scheme="sudden",
    num_chunks=4,
    norm="zscore",  # Apply z-score normalization internally
)

# Iterating through the dataset normalizes the input at every step
for x_norm, mask, y in dataset:
    ...  # Further processing as required

# Un-normalized values can still be extracted as follows
X, Mask, Y = dataset.x, dataset.mask, dataset.y

# Load mean normalization and apply it manually
normalizer = load_normalizer("mean", num_features=dataset.n_features)
for x, mask, y in zip(X, Mask, Y):
    x_norm = normalizer(x, mask)
    ...  # Further processing as required
```
### Available Normalizers
| Normalizer Name | Description |
| ------------------ | ----------- |
| `decimal_scale` | Scales feature values by powers of 10 |
| `mean` | Online mean normalization: subtracts the running mean |
| `minmax` | Scales features by the observed range (max - min) of each feature |
| `no_normalization` | Pass-through; no normalization applied |
| `unit_vector` | Normalizes the observed values to a unit vector (scales by the L2 norm) |
| `zscore` | Online z-score normalization using running mean and variance (subtract mean, scale by variance) |
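
To make the "online" aspect concrete, here is a standalone sketch of running z-score normalization over only the features observed at each step. This is not the package's implementation; it uses Welford-style running statistics as one common choice, and its call signature mirrors the `normalizer(x, mask)` usage above:

```python
import numpy as np

class RunningZScore:
    """Per-feature running mean/variance, updated only for observed features."""

    def __init__(self, num_features: int):
        self.count = np.zeros(num_features)
        self.mean = np.zeros(num_features)
        self.m2 = np.zeros(num_features)  # running sum of squared deviations

    def __call__(self, x: np.ndarray, mask: np.ndarray) -> np.ndarray:
        idx = np.flatnonzero(mask)  # indices of features observed at this step
        self.count[idx] += 1
        delta = x[idx] - self.mean[idx]
        self.mean[idx] += delta / self.count[idx]
        self.m2[idx] += delta * (x[idx] - self.mean[idx])
        var = np.where(self.count[idx] > 1,
                       self.m2[idx] / np.maximum(self.count[idx] - 1, 1), 1.0)
        std = np.sqrt(var)
        x_norm = x.astype(float)
        x_norm[idx] = (x[idx] - self.mean[idx]) / np.where(std > 0, std, 1.0)
        return x_norm
```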
### Extending Normalization
Developers can register their own normalization schemes; see [Contributing](#contributing).
---
## Versions
### v1.1.0
**Major Features**
* Added **Normalization Framework**
* Introduced new module `normalization/` with base and zoo submodules.
* Built-in normalizers: `mean`, `zscore`, `no_normalization`, etc.
* Unified registration via `@register_normalizer`.
* Datasets and models now support integrated normalization.
**Modifications**
* Updated:
* `data/base_dataset.py` - normalization integration.
* `models/base_model.py` - normalization compatibility.
* `model_zoo` and `datasets` modules - decorator consistency.
### v1.0.9
- Added model **FAE**
- **Bug Fix**
> Updated the X2 calculation in the NB3 model.
### v1.0.8
- Added model **NB3**.
### v1.0.7
- **Bug Fix**
> - Set `RunOCDS.deterministic = False`, as the model uses random initialization.
> - Not passing the `tau` hyperparameter (or passing `None`) in OCDS now results in
> using `tau = np.sqrt(1.0 / t)` as a varying step size, as described in the OCDS paper (but not the GLSC paper).
### v1.0.6
- Added datasets **A8a**, **IMDB**, **Susy**, and **Higgs**.
### v1.0.5
- Added model **OCDS**.
- **Bug Fixes and Improvements:**
> - In `haphazard/models/model_zoo/dynfo/dynfo.py`:
> Updated the `dropLearner()` method to prevent errors when attempting to remove the last remaining weak learner.
> ```python
> def dropLearner(self, i):
>     if len(self.learners) == 1:
>         return
>     self.learners.pop(i)
>     self.weights.pop(i)
>     self.acceptedFeatures.pop(i)
>     assert len(self.weights) == len(self.learners) == len(self.acceptedFeatures)
> ```
> This ensures stability in low-learner configurations and prevents `IndexError` during runtime.
### v1.0.4
- Added model **ORF3V**.
> NOTE:
>
> * ORF3V also requires an initial buffer, which works similarly to DynFo.
> * ORF3V depends on the optional package `tdigest`, which requires Microsoft Visual C++ Build Tools.
> * To install with this dependency:
>
> 1. Visit: [https://visualstudio.microsoft.com/visual-cpp-build-tools/](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
> 2. Download and install Build Tools for Visual Studio.
> During installation:
>
> * Select “Desktop development with C++” workload.
> * Ensure **MSVC v143 or later**, **Windows 10/11 SDK**, and **CMake tools** are checked.
> * After installation, restart your terminal and re-run:
>
> ```
> pip install haphazard[orf3v]
> ```
>
> or
>
> ```
> pip install haphazard[all] # installs all optional dependencies
> ```
> * The package can still be used without installing `tdigest`; only the `ORF3V` model will be unavailable.
- **Bug Fixes and Improvements:**
> - In `haphazard/models/model_zoo/dynfo/__init__.py`: corrected docstring from
> `"Initialize the OLVF runner class."` -> `"Initialize the DynFo runner class."`
> - In `haphazard/models/model_zoo/dynfo/dynfo.py`: changed
>
> ```python
> return int(np.argmax(wc)), float(max(wc))
> ```
>
> to
>
> ```python
> return int(np.argmax(wc)), float(wc[1])
> ```
>
> for correct AUROC/AUPRC compatibility.
### v1.0.3
- Added model **DynFo**
> NOTE:
> - DynFo requires an initial buffer.
> - If no initial buffer size is provided, it is set to 1.
> - The length of the output labels/preds/logits is reduced by the initial buffer size.
### v1.0.2
- Added model **OVFM**
### v1.0.0
(Considered the base version; ignore versions before this.)
- Includes models **OLVF** and **OLIFL** natively.
- Includes datasets **Magic04**, **Dry Bean**, and **Gas**. (Raw data files are not bundled; use the `base_path` argument to point to the directory containing them.)
---
## Contributing
Haphazard supports easy extensibility for new **datasets**, **models**, and now **normalizers**.
### Adding a new dataset
1. Create a new folder under `haphazard/data/datasets/`, e.g., `my_dataset/`.
2. Add `__init__.py`:
```python
from ...base_dataset import BaseDataset
from ...datasets import register_dataset
import numpy as np


@register_dataset("my_dataset")
class MyDataset(BaseDataset):
    def __init__(self, base_path="./", **kwargs):
        self.name = "my_dataset"
        self.haphazard_type = "controlled"
        self.task = "classification"
        super().__init__(base_path=base_path, **kwargs)

    def read_data(self, base_path="./"):
        # Load or generate x, y
        x = np.random.random((100, 10))
        y = np.random.randint(0, 2, 100)
        return x, y
```
3. The dataset is automatically registered and can be loaded with `load_dataset("my_dataset")`.
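
For instance (a minimal sketch; which keyword arguments apply depends on your dataset, mirroring the Datasets section):

```python
from haphazard import load_dataset

dataset = load_dataset("my_dataset", base_path="./", norm="zscore")
print(dataset.n_samples, dataset.n_features)
```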
### Adding a new model
1. Create a new folder under `haphazard/models/model_zoo/`, e.g., `my_model/`.
2. Add `__init__.py`:
```python
from ...base_model import BaseModel, BaseDataset
from ...model_zoo import register_model
import numpy as np


@register_model("my_model")
class MyModel(BaseModel):
    def __init__(self, **kwargs):
        self.name = "MyModel"
        self.tasks = {"classification", "regression"}
        self.deterministic = True
        self.hyperparameters = set()
        super().__init__(**kwargs)

    def fit(self, dataset: BaseDataset, model_params=None, seed=42):
        # Dummy implementation: predict a random class for every sample
        preds = []
        for x, mask, y in dataset:
            preds.append(int(np.random.randint(0, 2)))
        if dataset.task == "classification":
            return {
                "labels": dataset.y,
                "preds": preds,
                "logits": preds,
                "time_taken": 0.0,
                "is_logit": True,
            }
        elif dataset.task == "regression":
            return {
                "targets": dataset.y,
                "preds": preds,
                "time_taken": 0.0,
            }
```
3. The model is automatically registered and can be loaded with `load_model("my_model")`.
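
For example, running the newly registered model end to end (a minimal sketch reusing the dummy dataset from Quick Start):

```python
from haphazard import load_dataset, load_model

dataset = load_dataset("dummy", n_samples=100, n_features=10, norm="none")
model = load_model("my_model")
outputs = model(dataset, {})  # this sketch model defines no hyperparameters
print(outputs["time_taken"])
```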
### Adding a New Normalizer
1. Create a folder under:
```
haphazard/normalization/normalizer_zoo/my_normalizer/
```
2. Add `__init__.py`:
```python
from ...base_normalizer import OnlineNormalization
from ...normalizer_zoo import register_normalizer
import numpy as np
from numpy.typing import NDArray


@register_normalizer("my_normalizer")
class MyNormalizer(OnlineNormalization):
    def __init__(self, num_features: int, replace_with: float | str = "nan"):
        # Initialize required parameters
        super().__init__(num_features, replace_with)

    def update_params(self, x: NDArray[np.float64], indices: NDArray[np.int64]) -> None:
        # Update running parameters from the observed features
        ...

    def normalize(self, x: NDArray[np.float64], indices: NDArray[np.int64]) -> NDArray[np.float64]:
        # Normalize x using the current parameters
        x_norm = ...
        return x_norm
```
3. Load dynamically with:
```python
from haphazard import load_normalizer
normalizer = load_normalizer("my_normalizer", num_features=10)
```
---
## License
MIT License.