# Rindle for Python
Rindle turns collections of per-ticker CSV files into contiguous sliding-window
tensors that are ready for deep learning workflows. The Python extension wraps
the C++20 data preparation engine behind a small, NumPy-friendly API so you can
configure builds, materialize datasets, and recover fitted scalers directly
from notebooks or training scripts.
## Highlights
- **Deterministic dataset builds** – declare the window geometry, scaler, and
input schema with `rindle.create_config` and let the engine emit consistent
results across runs.
- **Manifest-driven reloads** – rehydrate tensors on demand with
`rindle.get_dataset` using the in-memory manifest returned by a build or a
saved `manifest.json` file.
- **NumPy integration** – feature (`Dataset.X`) and target (`Dataset.Y`) tensors
are exposed as NumPy arrays with shape `(windows, seq_length, n_features)`
and `float32` precision for direct use with frameworks such as PyTorch or
TensorFlow.
- **Scaler introspection** – fetch the fitted scaler for any ticker/feature pair
to invert predictions or understand the normalization that was applied.
## Installation
`pip` installs a pre-built wheel when one is published for your platform and
Python version; otherwise it compiles the source distribution locally, which
requires a C++20 toolchain.
```bash
pip install rindle
```
Building from source requires a compiler with C++20 support, CMake 3.18+, and
Python 3.9 or newer. When working from a clone of the repository:
```bash
python -m pip install --upgrade pip
python -m pip install build
python -m build
python -m pip install dist/rindle-*.whl
```
## Quickstart
```python
from pathlib import Path

import rindle

config = rindle.create_config(
    input_dir=Path("data/raw_prices"),
    output_dir=Path("data/processed"),
    feature_columns=["Open", "High", "Low", "Close", "Volume"],
    seq_length=64,
    future_horizon=8,
    target_column="Close",
    time_mode=rindle.TimeMode.UTC_NS,
    row_major=False,
    scaler_kind=rindle.ScalerKind.Standard,
)

manifest = rindle.build_dataset(config)
dataset = rindle.get_dataset(manifest)

X = dataset.X  # NumPy array: (windows, seq_length, n_features), dtype=float32
Y = dataset.Y  # NumPy array aligned with X when targets are enabled
meta = dataset.meta  # List of WindowMeta objects with ticker provenance
print("total windows:", dataset.n_windows())
```
The manifest stores the configuration, aggregate statistics, and ticker-level
metadata. A copy is written to `<output_dir>/manifest.json` during the build so
you can reload tensors later without repeating the pipeline:
```python
from pathlib import Path

# Reuses `rindle` and `config` from the Quickstart above.
manifest_path = Path(config.output_dir) / "manifest.json"
reloaded = rindle.get_dataset(manifest_path)
```
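To make the window geometry concrete, the following NumPy sketch shows how settings like `seq_length` and `future_horizon` from the Quickstart could carve one ticker's rows into aligned feature and target windows. It does not call Rindle: the helper `make_windows`, the stride of 1, and the alignment (targets are the `future_horizon` rows immediately after each input window) are assumptions for illustration, and the engine's actual convention may differ.

```python
import numpy as np


def make_windows(rows, target, seq_length, future_horizon):
    """Cut (n_rows, n_features) rows into stride-1 windows plus future targets."""
    n_windows = rows.shape[0] - seq_length - future_horizon + 1
    if n_windows <= 0:
        raise ValueError("not enough rows for a single window")
    # Input window i covers rows [i, i + seq_length).
    X = np.stack([rows[i:i + seq_length] for i in range(n_windows)])
    # Assumed alignment: targets are the future_horizon rows right after the window.
    Y = np.stack([
        target[i + seq_length:i + seq_length + future_horizon]
        for i in range(n_windows)
    ])
    return X.astype(np.float32), Y.astype(np.float32)


rows = np.arange(40, dtype=np.float64).reshape(20, 2)  # 20 rows, 2 features
target = rows[:, [1]]  # second column as a (20, 1) target series
X, Y = make_windows(rows, target, seq_length=4, future_horizon=2)
print(X.shape, Y.shape)  # (15, 4, 2) (15, 2, 1)
```

Under these assumptions, 20 rows yield `20 - 4 - 2 + 1 = 15` windows, which is a quick way to sanity-check the window count of a real build against the length of each input file.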
## Inspecting manifests and scalers
Each `ManifestContent` instance exposes the fields captured during the build,
including `feature_columns`, `total_windows`, and `ticker_stats`. The helper
method `find_stats("AAPL")` returns the `TickerStats` record for a ticker. If you
mutate `ticker_stats` manually, call `build_ticker_index()` afterwards to rebuild
the ticker index.

To invert normalized values or apply identical scaling elsewhere:
```python
scaler = rindle.get_feature_scaler(manifest, ticker="AAPL", feature="Close")
original_value = rindle.inverse_transform_value(scaler, value=0.42)
```
The returned `FittedScaler` exposes `transform` and `inverse_transform` methods
as well as a `params` property that includes summary statistics (mean, standard
deviation, quartiles, and min/max bounds).
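As a rough picture of what inverting a scaled value involves, here is a standard-library sketch of a z-score scaler round trip. The class `StandardScalerSketch` is a hypothetical stand-in, not Rindle's `FittedScaler`; it assumes that `ScalerKind.Standard` means subtracting the mean and dividing by the standard deviation, and it uses the population standard deviation, which may not match what the engine computes.

```python
from statistics import fmean, pstdev


class StandardScalerSketch:
    """Hypothetical z-score scaler; illustrates the math only."""

    def __init__(self, values):
        self.mean = fmean(values)
        # Fall back to 1.0 so a constant series does not divide by zero.
        self.std = pstdev(values) or 1.0

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, z):
        return z * self.std + self.mean


closes = [101.0, 103.5, 99.0, 104.0, 102.5]  # made-up closing prices
scaler = StandardScalerSketch(closes)
z = scaler.transform(104.0)
restored = scaler.inverse_transform(z)
print(round(z, 4), round(restored, 6))
```

The same round trip is what makes a model's normalized prediction interpretable again: the prediction plays the role of `z`, and the fitted parameters for that ticker and feature map it back to price units.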
## Data layout
- `Dataset.X` and `Dataset.Y` are three-dimensional NumPy arrays backed by the
underlying C++ tensors (`float32`). When `row_major=False` (the default), the
layout is `[window][time][feature]` with contiguous storage, making it ideal
for training recurrent and convolutional models.
- `Dataset.meta` is a list of `WindowMeta` objects describing where each window
originated. Fields include `ticker`, `start_row`, `end_row`, and optional
`target_start` / `target_end` indices.
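The `[window][time][feature]` layout can be illustrated with a small stand-in array. This sketch uses a synthetic NumPy array in place of `Dataset.X`; whether the arrays returned by `rindle.get_dataset` report `C_CONTIGUOUS` in the same way is an assumption worth checking on a real build.

```python
import numpy as np

# Stand-in for Dataset.X: 3 windows, 4 time steps, 2 features.
X = np.arange(3 * 4 * 2, dtype=np.float32).reshape(3, 4, 2)

# In C-contiguous [window][time][feature] storage the feature index varies fastest.
assert X.flags["C_CONTIGUOUS"]
w, t, f = 1, 2, 1
flat_offset = (w * X.shape[1] + t) * X.shape[2] + f
assert X.ravel()[flat_offset] == X[w, t, f]

# Some convolutional layers (for example PyTorch's Conv1d) expect
# (batch, channels, time). transpose() returns a strided view, so make a
# contiguous copy if the downstream framework needs one.
X_channels_first = np.ascontiguousarray(X.transpose(0, 2, 1))
print(X_channels_first.shape)  # (3, 2, 4)
```

Recurrent layers configured for batch-first input can typically consume the `(windows, seq_length, n_features)` shape directly, so the transpose is only needed for channel-first consumers.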
## API reference snapshot
| Function | Description |
| --- | --- |
| `rindle.create_config(...)` | Validate paths, choose feature columns, configure window geometry and scaling. Returns a `DatasetConfig`. |
| `rindle.build_dataset(config)` | Run discovery → scaling → windowing and return a `ManifestContent`. |
| `rindle.get_dataset(manifest_or_path)` | Load feature/target tensors from an in-memory manifest or a saved `manifest.json`. |
| `rindle.get_feature_scaler(manifest_or_path, ticker, feature)` | Retrieve the fitted scaler for a ticker/feature pair to apply or invert scaling. |
| `rindle.inverse_transform_value(scaler, value)` | Convenience helper to undo scaling with a `FittedScaler`. |

Additional classes such as `DatasetConfig`, `ManifestContent`, `Dataset`, and
`TickerStats` expose their fields as Python attributes for straightforward
inspection or serialization.
## Project resources
- Source repository: <https://github.com/EricGilerson/rindle>
- Issue tracker: <https://github.com/EricGilerson/rindle/issues>

Although the core engine is implemented in C++, the Python package provides a
self-contained workflow for assembling time-series datasets without leaving the
Python ecosystem.

Raw data
{
"_id": null,
"home_page": null,
"name": "rindle",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "time series, windowing, sliding window, dataset builder, feature engineering, ML, forecasting, finance, quant, trading, stocks, normalization, scaling, numpy, pandas",
"author": null,
"author_email": "Eric Gilerson <ericgilerson@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/46/be/059a71699ab0c970709770c56f74035ebe7b41073db878b2a129a5e5166e/rindle-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Dataset preparation library with Python bindings for sliding window tensors",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/EricGilerson/rindle",
"Issues": "https://github.com/EricGilerson/rindle/issues",
"Repository": "https://github.com/EricGilerson/rindle"
},
"split_keywords": [
"time series",
" windowing",
" sliding window",
" dataset builder",
" feature engineering",
" ml",
" forecasting",
" finance",
" quant",
" trading",
" stocks",
" normalization",
" scaling",
" numpy",
" pandas"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2a48277341e4b745146dbb20545d23ba9482c9f4df018c2962997d304f5a7a55",
"md5": "87438b1d71f7ebcaa98ee3290124fd86",
"sha256": "2d01c590022514bcfafde4a76f895be5f9261fb4963b88f9f13af5fe69524de2"
},
"downloads": -1,
"filename": "rindle-0.1.0-cp39-cp39-macosx_15_0_arm64.whl",
"has_sig": false,
"md5_digest": "87438b1d71f7ebcaa98ee3290124fd86",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 218151,
"upload_time": "2025-10-22T19:12:23",
"upload_time_iso_8601": "2025-10-22T19:12:23.493803Z",
"url": "https://files.pythonhosted.org/packages/2a/48/277341e4b745146dbb20545d23ba9482c9f4df018c2962997d304f5a7a55/rindle-0.1.0-cp39-cp39-macosx_15_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "46be059a71699ab0c970709770c56f74035ebe7b41073db878b2a129a5e5166e",
"md5": "b376ffeecc5c31bd95350048f477e854",
"sha256": "4dc8a0f5a396f657f4980797e8566838596e170c1832de903cc97e37dd1a1e16"
},
"downloads": -1,
"filename": "rindle-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "b376ffeecc5c31bd95350048f477e854",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 16295479,
"upload_time": "2025-10-22T19:12:25",
"upload_time_iso_8601": "2025-10-22T19:12:25.417525Z",
"url": "https://files.pythonhosted.org/packages/46/be/059a71699ab0c970709770c56f74035ebe7b41073db878b2a129a5e5166e/rindle-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-22 19:12:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "EricGilerson",
"github_project": "rindle",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "rindle"
}