haphazard

- Name: haphazard
- Version: 1.1.0
- Summary: A modular framework for registering and running haphazard datasets and models.
- Home page: https://github.com/theArijitDas/Haphazard-Package/
- Author: Arijit Das
- Requires Python: >=3.10
- License: MIT
- Keywords: machine-learning, haphazard, models, datasets, registration, framework
- Uploaded: 2025-11-02 23:20:06
- Requirements: none recorded
# Haphazard

A Python package for **haphazard dataset and model management**.  
Provides a standardized interface for loading datasets (with online normalization) and models, running experiments, and extending with custom datasets or models.

---

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Datasets](#datasets)
- [Models](#models)
- [Normalization](#normalization)
- [Versions](#versions)
- [Contributing](#contributing)
- [License](#license)

---

## Installation

Install from PyPI via pip:

```bash
pip install haphazard
```

---

## Project Structure

The Haphazard package follows a modular, extensible design:

```
haphazard/
├── __init__.py
├── data/                          # Dataset management
│   ├── __init__.py
│   ├── base_dataset.py
│   ├── mask.py
│   └── datasets/
│       ├── __init__.py
│       ├── dummy_dataset/
│       ├── magic04/
│       ├── a8a/
│       ├── imdb/
│       ├── susy/
│       ├── higgs/
│       ├── dry_bean/
│       └── gas/
├── models/                        # Model management
│   ├── __init__.py
│   ├── base_model.py
│   └── model_zoo/
│       ├── __init__.py
│       ├── dummy_model/
│       ├── dynfo/
│       ├── fae/
│       ├── nb3/
│       ├── ocds/
│       ├── olifl/
│       ├── olvf/
│       ├── orf3v/
│       └── ovfm/
├── normalization/                 # New in v1.1.0
│   ├── __init__.py
│   ├── base_normalizer.py
│   └── normalizer_zoo/
│       ├── __init__.py
│       ├── decimal_scale.py
│       ├── mean.py
│       ├── minmax.py
│       ├── no_normalization.py
│       ├── unit_vector.py
│       └── zscore.py
└── utils/                         # Utilities
    ├── __init__.py
    ├── file_utils.py
    ├── metrics.py
    └── seeding.py
```

### Notes

* `data/base_dataset.py` defines the `BaseDataset` class and integrates normalization support.
* `normalization/base_normalizer.py` defines a universal `BaseNormalizer` base class.
* `normalizer_zoo/` provides built-in normalizers (e.g., **zscore**, **mean**, **decimal_scale**, **no_normalization**).
* `models/base_model.py` defines `BaseModel`, used by all models in `model_zoo/`.
* Dynamic registration of **datasets**, **models**, and now **normalizers** is handled via decorators.
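
For illustration, decorator-based registration typically amounts to a module-level registry. A minimal sketch, assuming a plain dict-based registry (the package's actual decorators may differ):

```python
# Illustrative dict-based registry (not the package's actual implementation).
_DATASET_REGISTRY: dict[str, type] = {}

def register_dataset(name: str):
    """Register a dataset class under `name` when its module is imported."""
    def decorator(cls: type) -> type:
        _DATASET_REGISTRY[name] = cls
        return cls
    return decorator

def load_dataset(name: str, **kwargs):
    """Instantiate a registered dataset class by name."""
    try:
        return _DATASET_REGISTRY[name](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown dataset: {name!r}") from None
```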

---

## Quick Start

```python
from haphazard import load_dataset, load_model

# Load dataset
dataset = load_dataset("dummy", n_samples=100, n_features=10, norm="zscore")

# Load model
model = load_model("dummy")

# Run model
model_params = {}  # The dummy model has no hyperparameters
outputs = model(dataset, model_params)
print(outputs)
```

---

## Datasets

* All datasets inherit from `BaseDataset`.
* Example dataset: `DummyDataset`.
* Main interface:

```python
from haphazard import load_dataset
dataset = load_dataset(
   "magic04", 
   base_path="./data", 
   scheme="probabilistic", 
   availability_prob=0.5,
   norm="none"
   )

x, y = dataset.x, dataset.y
mask = dataset.mask
```
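
A reasonable mental model (an assumption here, not a documented guarantee): `mask` is a binary array aligned with `x`, where a nonzero entry marks a feature as observed at that step. A hedged sketch:

```python
import numpy as np

# Assumption (not a documented guarantee): mask shares x's 2-D shape
# and flags which features are observed at each step.
observed_per_step = np.asarray(dataset.mask).sum(axis=1)
print("mean feature availability:",
      observed_per_step.mean() / dataset.n_features)  # ~0.5 for availability_prob=0.5
```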

### Dataset Attributes

* `name`: str - dataset name
* `task`: `"classification"` | `"regression"`
* `haphazard_type`: `"controlled"` | `"intrinsic"`
* `n_samples`, `n_features`: int
* `num_classes`: int (for classification)
* `normalizer`: optional normalization scheme (default `"none"`)
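
Putting these together, a quick inspection snippet (assuming a `dataset` loaded as above):

```python
print(f"{dataset.name}: task={dataset.task}, type={dataset.haphazard_type}")
print(f"samples={dataset.n_samples}, features={dataset.n_features}")
if dataset.task == "classification":
    print(f"classes={dataset.num_classes}")
```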

### Available Datasets

* Dummy (`"dummy"`)
* Magic04 (`"magic04"`)
* A8a (`"a8a"`)
* IMDB (`"imdb"`)
* Susy (`"susy"`)
* Higgs (`"higgs"`)
* DryBean (`"dry_bean"`)
* Gas (`"gas"`)

---

## Models

* All models inherit from `BaseModel`.
* Example model: `DummyModel`.

```python
from haphazard import load_model
model = load_model("dummy")
model_params = {}  # Hyperparameters of the model
outputs = model(dataset, model_params)
```

### Output

* **Classification**: `labels`, `preds`, `logits`, `time_taken`, `is_logit`
* **Regression**: `targets`, `preds`, `time_taken`
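
As a hedged example of consuming these outputs for a binary classification run (assumes `labels`/`preds` are aligned sequences; scikit-learn is used purely for illustration and is not a package dependency):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

labels = np.asarray(outputs["labels"])
preds = np.asarray(outputs["preds"])
print("accuracy:", accuracy_score(labels, preds))

# Assumption: `is_logit` flags whether `logits` holds raw scores or probabilities.
scores = np.asarray(outputs["logits"], dtype=float)
if outputs["is_logit"]:
    scores = 1.0 / (1.0 + np.exp(-scores))  # squash logits to probabilities
print("AUROC:", roc_auc_score(labels, scores))
print(f"time taken: {outputs['time_taken']:.3f}s")
```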

### Available Models

* Dummy - testing/prototyping.
* NB3, FAE - Naive Bayes-based models.
* DynFo, ORF3V - decision-stump-based models.
* OLVF, OLIFL, OVFM, OCDS - linear-classifier-based models.

---

## Normalization

### Overview

Introduced in **v1.1.0**, the normalization module provides standardized interfaces for **online feature normalization** across datasets and models.

### Using Built-in Normalizers

```python
from haphazard import load_dataset, load_normalizer

dataset = load_dataset(
   "a8a",
   base_path="./",
   scheme="sudden",
   num_chunks=4,
   norm="none"  # No normalization applied internally
)

# Load z-score normalization
normalizer = load_normalizer("zscore", num_features=dataset.n_features)
for x, mask, y in dataset:
   x_norm = normalizer(x, mask)
   ...  
   # Further processing as required

# Load mean normalization
normalizer = load_normalizer("mean", num_features=dataset.n_features)
X, Mask, Y = dataset.x, dataset.mask, dataset.y
for x, mask, y in zip(X, Mask, Y):
   x_norm = normalizer(x, mask)
   ...  
   # Further processing as required
```

or 

```python
from haphazard import load_dataset, load_normalizer

dataset = load_dataset(
   "a8a",
   base_path="./",
   scheme="sudden",
   num_chunks=4,
   norm="zscore"  # Apply normalization internally
)

# Load z-score normalization
# Iterating through the dataset normalizes the input at every step
for x_norm, mask, y in dataset:
   ...  
   # Further processing as required


# Un-normalized values can still be extracted directly:
X, Mask, Y = dataset.x, dataset.mask, dataset.y

# Load mean normalization
normalizer = load_normalizer("mean", num_features=dataset.n_features)
for x, mask, y in zip(X, Mask, Y):
   x_norm = normalizer(x, mask)
   ...  # Further processing as required
```

### Available Normalizers

| Normalizer Name    | Description |
| ------------------ | ----------- |
| `decimal_scale`    | Scales feature values by powers of 10 |
| `mean`             | Online mean normalization: subtracts the running mean |
| `minmax`           | Scales each feature by its observed range (max - min) |
| `no_normalization` | Pass-through; no normalization applied |
| `unit_vector`      | Scales the observed values to a unit vector (divides by the L2 norm) |
| `zscore`           | Online z-score normalization using running statistics (subtract running mean, divide by running standard deviation) |
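
To make the `zscore` row concrete, here is an illustrative running update. This is a generic Welford-style sketch under assumed `(x, mask)` conventions, not the package's actual normalizer:

```python
import numpy as np

class RunningZScore:
    """Illustrative per-feature online z-score over observed entries only
    (a textbook Welford-style sketch, not the package's implementation)."""

    def __init__(self, num_features: int):
        self.count = np.zeros(num_features)
        self.mean = np.zeros(num_features)
        self.m2 = np.zeros(num_features)  # running sum of squared deviations

    def __call__(self, x: np.ndarray, mask: np.ndarray) -> np.ndarray:
        idx = np.flatnonzero(mask)
        # Welford update restricted to the observed coordinates
        self.count[idx] += 1
        delta = x[idx] - self.mean[idx]
        self.mean[idx] += delta / self.count[idx]
        self.m2[idx] += delta * (x[idx] - self.mean[idx])
        var = np.where(self.count[idx] > 1, self.m2[idx] / self.count[idx], 1.0)
        out = np.zeros_like(x, dtype=float)
        out[idx] = (x[idx] - self.mean[idx]) / np.sqrt(np.maximum(var, 1e-12))
        return out
```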

### Extending Normalization

Developers can register their own normalization schemes; see [Contributing](#contributing).

---

## Versions

### v1.1.0

**Major Features**

* Added **Normalization Framework**

  * Introduced new module `normalization/` with base and zoo submodules.
  * Built-in normalizers: `mean`, `zscore`, `no_normalization`, etc.
  * Unified registration via `@register_normalizer`.
  * Datasets and models now support integrated normalization.

**Modifications**

* Updated:

  * `data/base_dataset.py` - normalization integration.
  * `models/base_model.py` - normalization compatibility.
  * `model_zoo` and `datasets` modules - decorator consistency.


### v1.0.9

- Added model **FAE**

- **Bug Fix**
> Updated the X2 (chi-squared) calculation in the NB3 model.


### v1.0.8
- Added model **NB3**.


### v1.0.7

- **Bug Fix**
> - Set `RunOCDS.deterministic = False`, since the model uses random initialization.
> - Omitting the `tau` hyperparameter in OCDS (or passing `None`) now results in
> using `tau = np.sqrt(1.0 / t)` as a varying step size, as described in the OCDS paper (but not the GLSC paper).


### v1.0.6

- Added datasets **A8a**, **IMDB**, **Susy**, and **Higgs**.


### v1.0.5

- Added model **OCDS**.

- **Bug Fixes and Improvements:**
> - In `haphazard/models/model_zoo/dynfo/dynfo.py`:  
>   Updated the `dropLearner()` method to prevent errors when attempting to remove the last remaining weak learner.
>   ```python
>   def dropLearner(self, i):
>       if len(self.learners) == 1:
>           return
>       self.learners.pop(i)
>       self.weights.pop(i)
>       self.acceptedFeatures.pop(i)
>       assert len(self.weights) == len(self.learners) == len(self.acceptedFeatures)
>   ```
>   This ensures stability in low-learner configurations and prevents `IndexError` during runtime.


### v1.0.4

- Added model **ORF3V**.

> NOTE:
>
> * ORF3V also requires an initial buffer, which works similarly to DynFo.
> * ORF3V depends on the optional package `tdigest`, which requires Microsoft Visual C++ Build Tools.
> * To install with this dependency:
>
>   1. Visit: [https://visualstudio.microsoft.com/visual-cpp-build-tools/](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
>   2. Download and install Build Tools for Visual Studio.
>      During installation:
>
>      * Select “Desktop development with C++” workload.
>      * Ensure **MSVC v143 or later**, **Windows 10/11 SDK**, and **CMake tools** are checked.
> * After installation, restart your terminal and re-run:
>
>   ```
>   pip install haphazard[orf3v]
>   ```
>
>   or
>
>   ```
>   pip install haphazard[all]   # installs all optional dependencies
>   ```
> * The package can still be used without installing `tdigest`; only the `ORF3V` model will be unavailable.

- **Bug Fixes and Improvements:**

> - In `haphazard/models/model_zoo/dynfo/__init__.py`: corrected docstring from
>   `"Initialize the OLVF runner class."` -> `"Initialize the DynFo runner class."`
> - In `haphazard/models/model_zoo/dynfo/dynfo.py`: changed
>
>   ```python
>   return int(np.argmax(wc)), float(max(wc))
>   ```
>
>   to
>
>   ```python
>   return int(np.argmax(wc)), float(wc[1])
>   ```
>
>   for correct AUROC/AUPRC compatibility.


### v1.0.3

- Added model **DynFo**

> NOTE:
> - DynFo requires an initial buffer.
> - If no initial buffer size is provided, it is set to 1.
> - The length of the output labels/preds/logits is reduced by the initial buffer size.


### v1.0.2

- Added model **OVFM**


### v1.0.0

(Considered the base version; ignore versions before this.)

- Includes models **OLVF** and **OLIFL** natively.
- Includes datasets **Magic04**, **Dry Bean**, and **Gas**. (Raw data files are not bundled; use the `base_path` argument to point to the directory containing them.)

---

## Contributing

Haphazard supports easy extensibility for new **datasets**, **models**, and now **normalizers**.

### Adding a new dataset

1. Create a new folder under `haphazard/data/datasets/`, e.g., `my_dataset/`.
2. Add `__init__.py`:

```python
from ...base_dataset import BaseDataset
from ...datasets import register_dataset
import numpy as np

@register_dataset("my_dataset")
class MyDataset(BaseDataset):
    def __init__(self, base_path="./", **kwargs):
        self.name = "my_dataset"
        self.haphazard_type = "controlled"
        self.task = "classification"
        super().__init__(base_path=base_path, **kwargs)

    def read_data(self, base_path="./"):
        # Load or generate x, y
        x = np.random.random((100, 10))
        y = np.random.randint(0, 2, 100)
        return x, y
```

3. The dataset is automatically registered and can be loaded with `load_dataset("my_dataset")`.
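
A hedged sanity check that registration worked (assumes `BaseDataset` derives `n_samples`/`n_features` from `read_data`'s output):

```python
from haphazard import load_dataset

# `my_dataset` is the hypothetical example registered above.
dataset = load_dataset("my_dataset")
print(dataset.name, dataset.n_samples, dataset.n_features)
```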

### Adding a new model

1. Create a new folder under `haphazard/models/model_zoo/`, e.g., `my_model/`.
2. Add `__init__.py`:

```python
from ...base_model import BaseModel, BaseDataset
from ...model_zoo import register_model
import numpy as np

@register_model("my_model")
class MyModel(BaseModel):
    def __init__(self, **kwargs):
        self.name = "MyModel"
        self.tasks = {"classification", "regression"}
        self.deterministic = True
        self.hyperparameters = set()
        super().__init__(**kwargs)

    def fit(self, dataset: BaseDataset, model_params=None, seed=42):
        # Dummy implementation
        preds = []
        for x, mask, y in dataset:
            preds.append(int(np.random.randint(0, 2)))
        if dataset.task == "classification":
            return {
                "labels": y,
                "preds": preds,
                "logits": preds,
                "time_taken": 0.0,
                "is_logit": True
            }
        elif dataset.task == "regression":
            return {
                "targets": dataset.y,
                "preds": preds,
                "time_taken": 0.0,
            }
```

3. The model is automatically registered and can be loaded with `load_model("my_model")`.


### Adding a New Normalizer

1. Create a folder under:

   ```
   haphazard/normalization/normalizer_zoo/my_normalizer/
   ```

2. Add `__init__.py`:

   ```python
   from ...base_normalizer import OnlineNormalization
   from ...normalizer_zoo import register_normalizer
   import numpy as np
   from numpy.typing import NDArray

   @register_normalizer("my_normalizer")
   class MyNormalizer(OnlineNormalization):
       def __init__(self, num_features: int, replace_with: float | str = "nan"):
           # initialize required parameters
           super().__init__(num_features, replace_with)

       def update_params(self, x: NDArray[np.float64], indices: NDArray[np.int64]) -> None:
           # Update running parameters from the observed entries
           ...

       def normalize(self, x: NDArray[np.float64], indices: NDArray[np.int64]) -> NDArray[np.float64]:
           # normalize x
           x_norm = ...
           return x_norm
   ```

3. Load dynamically with:

   ```python
   from haphazard import load_normalizer
   normalizer = load_normalizer("my_normalizer", num_features=10)
   ```

---

## License

MIT License.

            
