- Name: rindle
- Version: 0.1.0
- Summary: Dataset preparation library with Python bindings for sliding window tensors
- Author: Eric Gilerson <ericgilerson@gmail.com>
- Upload time: 2025-10-22 19:12:25
- Requires Python: >=3.9
- Keywords: time series, windowing, sliding window, dataset builder, feature engineering, ML, forecasting, finance, quant, trading, stocks, normalization, scaling, numpy, pandas

# Rindle for Python

Rindle turns collections of per-ticker CSV files into contiguous sliding-window
tensors that are ready for deep learning workflows. The Python extension wraps
the C++20 data preparation engine behind a small, NumPy-friendly API so you can
configure builds, materialize datasets, and recover fitted scalers directly
from notebooks or training scripts.

## Highlights

- **Deterministic dataset builds** – declare the window geometry, scaler, and
  input schema with `rindle.create_config` and let the engine emit consistent
  results across runs.
- **Manifest-driven reloads** – rehydrate tensors on demand with
  `rindle.get_dataset` using the in-memory manifest returned by a build or a
  saved `manifest.json` file.
- **NumPy integration** – feature (`Dataset.X`) and target (`Dataset.Y`) tensors
  are exposed as NumPy arrays with shape `(windows, sequence_length, features)`
  and `float32` precision for direct use with frameworks such as PyTorch or
  TensorFlow.
- **Scaler introspection** – fetch the fitted scaler for any ticker/feature pair
  to invert predictions or understand the normalization that was applied.

## Installation

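The package ships pre-built wheels where available (the 0.1.0 release includes a
CPython 3.9 wheel for macOS arm64). On other platforms pip falls back to the
source distribution, which is compiled locally with a C++20 toolchain.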

```bash
pip install rindle
```

Building from source requires a compiler with C++20 support, CMake 3.18+, and
Python 3.9 or newer. When working from a clone of the repository:

```bash
python -m pip install --upgrade pip
python -m pip install build
python -m build
python -m pip install dist/rindle-*.whl
```
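
Before running `python -m build`, a quick preflight can catch the Python and CMake requirements listed above. This is a hypothetical helper script, not something shipped with rindle: it parses `cmake --version` output and does not check for a C++20 compiler, so compiler problems will still surface only during the build itself.

```python
import re
import shutil
import subprocess
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 9)
MIN_CMAKE = (3, 18)


def parse_cmake_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def preflight() -> List[str]:
    """Return human-readable problems that would block a source build."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} found; 3.9+ is required")
    cmake = shutil.which("cmake")
    if cmake is None:
        problems.append("cmake was not found on PATH")
        return problems
    out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
    version = parse_cmake_version(out)
    if version is None or version[:2] < MIN_CMAKE:
        first_line = out.strip().splitlines()[0] if out.strip() else "unknown"
        problems.append(f"CMake 3.18+ is required; found: {first_line}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Python and CMake look OK")
```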

## Quickstart

```python
from pathlib import Path
import rindle

config = rindle.create_config(
    input_dir=Path("data/raw_prices"),
    output_dir=Path("data/processed"),
    feature_columns=["Open", "High", "Low", "Close", "Volume"],
    seq_length=64,
    future_horizon=8,
    target_column="Close",
    time_mode=rindle.TimeMode.UTC_NS,
    row_major=False,
    scaler_kind=rindle.ScalerKind.Standard,
)

manifest = rindle.build_dataset(config)
dataset = rindle.get_dataset(manifest)

X = dataset.X  # NumPy array: (windows, seq_length, n_features), dtype=float32
Y = dataset.Y  # NumPy array aligned with X when targets are enabled
meta = dataset.meta  # List of WindowMeta objects with ticker provenance
print("total windows:", dataset.n_windows())
```
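
To make `seq_length` and `future_horizon` concrete, here is a minimal pure-Python sketch of stride-1 windowing over one ticker's rows. It is not rindle's implementation: the hypothetical `make_windows` assumes a stride of 1, targets taken from the `future_horizon` rows immediately after each input window, and no scaling, so its `Y` has shape `(windows, future_horizon)`. The C++ engine may lay out targets, strides, and gaps differently.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def make_windows(
    rows: Sequence[Row], target_index: int, seq_length: int, future_horizon: int
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Slide a stride-1 window over one ticker's rows (oldest first).

    X[w] is rows[w : w + seq_length], i.e. shape (seq_length, n_features).
    Y[w] holds the target column for the `future_horizon` rows that follow.
    """
    n_windows = len(rows) - seq_length - future_horizon + 1
    X: List[List[List[float]]] = []
    Y: List[List[float]] = []
    for w in range(max(n_windows, 0)):
        X.append([list(r) for r in rows[w : w + seq_length]])
        future = rows[w + seq_length : w + seq_length + future_horizon]
        Y.append([r[target_index] for r in future])
    return X, Y


# Toy series: two features per row; the target is the second column.
rows = [[float(t), float(10 * t)] for t in range(10)]
X, Y = make_windows(rows, target_index=1, seq_length=4, future_horizon=2)
print(len(X), Y[0])  # 5 [40.0, 50.0]
```

Under these assumptions a ticker with N rows yields `N - seq_length - future_horizon + 1` windows (none when that is not positive), so the quickstart's `seq_length=64, future_horizon=8` spans 72 consecutive rows per window.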

The manifest stores the configuration, aggregate statistics, and ticker-level
metadata. A copy is written to `<output_dir>/manifest.json` during the build so
you can reload tensors later without repeating the pipeline:

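```python
from pathlib import Path

import rindle

# Reuses `config` from the quickstart; the build wrote manifest.json into output_dir.
manifest_path = Path(config.output_dir) / "manifest.json"
reloaded = rindle.get_dataset(manifest_path)
```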
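
A common pattern is to build once and reload on later runs. The sketch below is a hypothetical helper, not part of rindle; it takes the build and load steps as callables so it stays independent of rindle's exact signatures, and it only assumes that the manifest lands at `<output_dir>/manifest.json` as described above.

```python
from pathlib import Path
from typing import Any, Callable, Union


def load_or_build(
    output_dir: Union[str, Path],
    build: Callable[[], Any],
    load: Callable[[Any], Any],
) -> Any:
    """Reuse <output_dir>/manifest.json when it exists; otherwise build first.

    build: zero-argument callable that runs the pipeline and returns a manifest.
    load:  callable that accepts either a manifest object or a manifest path.
    """
    manifest_path = Path(output_dir) / "manifest.json"
    if manifest_path.is_file():
        # Skip discovery, scaling, and windowing; just reload the tensors.
        return load(manifest_path)
    return load(build())
```

With the quickstart names this might be called as `load_or_build(config.output_dir, lambda: rindle.build_dataset(config), rindle.get_dataset)`. The check is deliberately naive: it does not notice a changed configuration, so delete the output directory (or compare the stored configuration) before trusting a cached manifest.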

## Inspecting manifests and scalers

Each `ManifestContent` instance exposes the fields captured during the build,
including `feature_columns`, `total_windows`, and `ticker_stats`. The helper
method `find_stats("AAPL")` returns the `TickerStats` record for a ticker, and
`build_ticker_index()` can be called if you mutate `ticker_stats` manually.
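
The README does not say how `find_stats` works internally. One plausible reason for a separate `build_ticker_index()` is a cached ticker-to-position lookup that goes stale when `ticker_stats` changes. The toy classes below (`ToyTickerStats`, `ToyManifest`, and the `n_windows` field) are hypothetical and only illustrate that pattern; rindle's real classes and their behavior for unknown tickers may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTickerStats:
    ticker: str
    n_windows: int  # hypothetical field, for illustration only


@dataclass
class ToyManifest:
    ticker_stats: List[ToyTickerStats] = field(default_factory=list)
    _index: Dict[str, int] = field(default_factory=dict, repr=False)

    def build_ticker_index(self) -> None:
        # Cache ticker -> position so lookups don't scan the whole list.
        self._index = {s.ticker: i for i, s in enumerate(self.ticker_stats)}

    def find_stats(self, ticker: str) -> Optional[ToyTickerStats]:
        # Reads only the cached index, so it can miss entries added later.
        i = self._index.get(ticker)
        return None if i is None else self.ticker_stats[i]


toy = ToyManifest([ToyTickerStats("AAPL", 10)])
toy.build_ticker_index()
toy.ticker_stats.append(ToyTickerStats("MSFT", 7))
print(toy.find_stats("MSFT"))  # None: the index is stale
toy.build_ticker_index()
print(toy.find_stats("MSFT"))  # ToyTickerStats(ticker='MSFT', n_windows=7)
```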

To invert normalized values or apply identical scaling elsewhere:

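```python
import rindle

# `manifest` comes from rindle.build_dataset(config); a manifest.json path also works.
scaler = rindle.get_feature_scaler(manifest, ticker="AAPL", feature="Close")
# 0.42 is a scaled Close value (e.g. a model prediction) mapped back to price units.
original_value = rindle.inverse_transform_value(scaler, value=0.42)
```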

The returned `FittedScaler` exposes `transform` and `inverse_transform` methods
as well as a `params` property that includes summary statistics (mean, standard
deviation, quartiles, and min/max bounds).
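
The README lists the statistics in `params` but not the formulas. For reference, the sketch below shows the conventional standard (z-score), min-max, and robust (median/IQR) scalers with their inverses, which is conceptually what `inverse_transform_value` undoes. Whether rindle uses exactly these formulas, the population standard deviation, or these zero-division guards is an assumption; only `ScalerKind.Standard` appears in this README, and `fit_standard`, `fit_minmax`, and `fit_robust` are hypothetical names.

```python
import statistics
from typing import Callable, Sequence, Tuple

# (transform, inverse_transform) pair for one ticker/feature.
ScalerPair = Tuple[Callable[[float], float], Callable[[float], float]]


def fit_standard(values: Sequence[float]) -> ScalerPair:
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant series
    return (lambda x: (x - mean) / std), (lambda z: z * std + mean)


def fit_minmax(values: Sequence[float]) -> ScalerPair:
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return (lambda x: (x - lo) / span), (lambda z: z * span + lo)


def fit_robust(values: Sequence[float]) -> ScalerPair:
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = (q3 - q1) or 1.0
    return (lambda x: (x - median) / iqr), (lambda z: z * iqr + median)


closes = [10.0, 12.0, 14.0, 16.0]
transform, inverse = fit_standard(closes)
z = transform(16.0)
print(round(z, 4), inverse(z))
```

Each inverse is the algebraic inverse of its transform, so scaling and then inverting recovers the original value up to floating-point rounding.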

## Data layout

- `Dataset.X` and `Dataset.Y` are three-dimensional NumPy arrays backed by the
  underlying C++ tensors (`float32`). When `row_major=False` (the default), the
  layout is `[window][time][feature]` with contiguous storage, making it ideal
  for training recurrent and convolutional models.
- `Dataset.meta` is a list of `WindowMeta` objects describing where each window
  originated. Fields include `ticker`, `start_row`, `end_row`, and optional
  `target_start` / `target_end` indices.
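
In a contiguous `[window][time][feature]` buffer, element `X[w][t][f]` sits at a fixed flat offset, and NumPy reports matching byte strides. The helpers below (`flat_offset`, `c_contiguous_strides`) are hypothetical illustrations of that arithmetic, not part of rindle; `itemsize=4` reflects the `float32` dtype stated above.

```python
from typing import Sequence, Tuple


def flat_offset(w: int, t: int, f: int, seq_length: int, n_features: int) -> int:
    """Element offset of X[w][t][f] in a contiguous [window][time][feature] buffer."""
    return (w * seq_length + t) * n_features + f


def c_contiguous_strides(shape: Sequence[int], itemsize: int = 4) -> Tuple[int, ...]:
    """Byte strides of a C-contiguous array; itemsize 4 matches float32."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


# 2 windows x 3 time steps x 2 features, stored flat.
seq_length, n_features = 3, 2
buffer = list(range(2 * seq_length * n_features))
print(buffer[flat_offset(1, 2, 1, seq_length, n_features)])  # 11
print(c_contiguous_strides((2, seq_length, n_features)))  # (24, 8, 4)
```

If the layout described above holds for `row_major=False`, then `dataset.X.flags['C_CONTIGUOUS']` should be true and `dataset.X.strides` should equal `c_contiguous_strides(dataset.X.shape)`; both are standard NumPy attributes, so this is a cheap sanity check before handing the array to a framework.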

## API reference snapshot

| Function | Description |
| --- | --- |
| `rindle.create_config(...)` | Validate paths, choose feature columns, configure window geometry and scaling. Returns a `DatasetConfig`. |
| `rindle.build_dataset(config)` | Run discovery → scaling → windowing and return a `ManifestContent`. |
| `rindle.get_dataset(manifest_or_path)` | Load feature/target tensors from an in-memory manifest or a saved `manifest.json`. |
| `rindle.get_feature_scaler(manifest_or_path, ticker, feature)` | Retrieve the fitted scaler for a ticker/feature pair to apply or invert scaling. |
| `rindle.inverse_transform_value(scaler, value)` | Convenience helper to undo scaling with a `FittedScaler`. |

Additional classes such as `DatasetConfig`, `ManifestContent`, `Dataset`, and
`TickerStats` expose their fields as Python attributes for straightforward
inspection or serialization.
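
The README does not document a serialization method on these classes, so here is a hypothetical, generic helper that collects public, non-callable attributes via `dir()` and `getattr()` and dumps them as JSON. It avoids `vars()`, which fails on objects without a `__dict__`; values that `json` cannot encode (nested objects, enums) fall back to `str()`. It is best suited to `DatasetConfig`, `ManifestContent`, and `TickerStats`; on a `Dataset` it would also read the large `X` and `Y` arrays.

```python
import json
from typing import Any, Dict


def public_attributes(obj: Any) -> Dict[str, Any]:
    """Collect public, non-callable attributes using dir()/getattr().

    Unlike vars(), this also works for objects without a __dict__, such as
    classes that use __slots__ or many compiled extension types.
    """
    result: Dict[str, Any] = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        value = getattr(obj, name)
        if callable(value):
            continue  # skip methods such as find_stats()
        result[name] = value
    return result


def to_json(obj: Any) -> str:
    # Values json can't encode (nested objects, enums) fall back to str().
    return json.dumps(public_attributes(obj), default=str, indent=2, sort_keys=True)
```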

## Project resources

- Source repository: <https://github.com/EricGilerson/rindle>
- Issue tracker: <https://github.com/EricGilerson/rindle/issues>

Although the core engine is implemented in C++, the Python package provides a
self-contained workflow for assembling time-series datasets without leaving the
Python ecosystem.

            
