polars-expr-hopper


- Name: polars-expr-hopper
- Version: 0.6.5
- Summary: A Polars plugin providing a 'hopper' of expressions for automatic, schema-aware application.
- Upload time: 2025-02-13 09:53:06
- Requires Python: >=3.9
- License: MIT
- Keywords: polars, plugin, filter, expr, metadata

# polars-expr-hopper

<!-- [![downloads](https://static.pepy.tech/badge/polars-expr-hopper/month)](https://pepy.tech/project/polars-expr-hopper) -->
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev)
[![PyPI](https://img.shields.io/pypi/v/polars-expr-hopper.svg)](https://pypi.org/project/polars-expr-hopper)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/polars-expr-hopper.svg)](https://pypi.org/project/polars-expr-hopper)
[![License](https://img.shields.io/pypi/l/polars-expr-hopper.svg)](https://pypi.org/project/polars-expr-hopper)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/lmmx/polars-expr-hopper/master.svg)](https://results.pre-commit.ci/latest/github/lmmx/polars-expr-hopper/master)

**Polars plugin providing an “expression hopper”**: a flexible, DataFrame-level container of **Polars expressions** (`pl.Expr`) that apply themselves **as soon as** the relevant columns are available.

Powered by [polars-config-meta](https://pypi.org/project/polars-config-meta/) for persistent DataFrame-level metadata.

Simplify data pipelines by storing your expressions in a single location and letting them apply **as soon as** the corresponding columns exist in the DataFrame schema.

## Installation

```bash
pip install polars-expr-hopper
```

> The `polars` dependency is required but is not installed by default.
> It is shipped as an optional extra, which you select in square brackets (quote the argument in shells such as zsh, where brackets are glob characters):
> ```bash
> pip install "polars-expr-hopper[polars]"           # for standard Polars
> pip install "polars-expr-hopper[polars-lts-cpu]"   # for older CPUs
> ```

### Requirements

- Python 3.9+
- Polars (any recent version, installed via `[polars]` or `[polars-lts-cpu]` extras)
- _(Optional)_ [pyarrow](https://pypi.org/project/pyarrow) for Parquet I/O that preserves the hopper’s metadata

## Features

- **DataFrame-Level Expression Management**: Store multiple Polars **expressions** on a DataFrame via the `.hopper` namespace.
- **Apply When Ready**: Each expression is applied by the next `apply_ready_filters()` call once the DataFrame has all the columns that expression requires.
- **Namespace Plugin**: Access everything through `df.hopper.*(...)`, with no subclassing or monkey-patching.
- **Metadata Preservation**: Transformations called through `df.hopper.<method>()` keep the same expression hopper on the new DataFrame.
- **No Central Orchestration**: Avoid fiddly pipeline step names or schemas. Attach your expressions once, and each one is applied as soon as the columns it references exist.
- **Optional Serialisation**: If you want to store or share expressions across runs (e.g., Parquet round-trip), you can serialise them to JSON or binary and restore them later, without adding overhead in normal usage.

## Usage

### Basic Usage Example

```python
import polars as pl
import polars_hopper  # This registers the .hopper plugin under pl.DataFrame

# Create an initial DataFrame
df = pl.DataFrame({
    "user_id": [1, 2, 3, 0],
    "name": ["Alice", "Bob", "Charlie", "NullUser"]
})

# Add expressions to the hopper:
#  - This one is valid right away: pl.col("user_id") != 0
#  - Another needs a future 'age' column
df.hopper.add_filters(pl.col("user_id") != 0)
df.hopper.add_filters(pl.col("age") > 18)  # 'age' doesn't exist yet

# Apply what we can; the first expression is immediately valid:
df = df.hopper.apply_ready_filters()
print(df)
# Rows with user_id=0 are dropped.

# Now let's do a transformation that adds an 'age' column.
# By calling df.hopper.with_columns(...), the plugin
# automatically copies the hopper metadata to the new DataFrame.
df2 = df.hopper.with_columns(
    pl.Series("age", [25, 15, 30])  # new column
)

# Now the second expression can be applied:
df2 = df2.hopper.apply_ready_filters()
print(df2)
# Only rows with age > 18 remain. That expression is then removed from the hopper.
```

### How It Works

Internally, **polars-expr-hopper** attaches a small “manager” object (a plugin namespace) to each `DataFrame`. This manager uses [polars-config-meta](https://pypi.org/project/polars-config-meta/) to store data in `df.config_meta.get_metadata()`, keyed by `id(df)`.

1. **List of In-Memory Expressions**
   - Maintains a `hopper_filters` list of Polars expressions (`pl.Expr`) in the DataFrame’s metadata.
   - Avoids Python callables or lambdas so that `.meta.root_names()` can be used for schema checks and optional serialisation is possible.

2. **Automatic Column Check** (`apply_ready_filters()`)
   - On `apply_ready_filters()`, each expression’s required columns (via `.meta.root_names()`) are compared to the current DataFrame schema.
   - Expressions referencing any missing column remain pending.
   - Expressions whose referenced columns are all present are applied via `df.filter(expr)`.
   - Successfully applied expressions are removed from the hopper.

3. **Metadata Preservation**
   - Because we rely on **polars-config-meta**, transformations called through `df.hopper.select(...)`, `df.hopper.with_columns(...)`, etc. automatically copy the same `hopper_filters` list to the new DataFrame.
   - This ensures **pending** expressions stay attached to the DataFrame throughout your pipeline until their columns finally appear.

4. **No Monkey-Patching**
   - Polars’ plugin system is used, so there is no monkey-patching of core Polars classes.
   - The plugin registers a `.hopper` namespace, just like `df.config_meta`, but specialised for expression management.

Together, these features allow you to:

- store **multiple** Polars expressions in one place
- apply them **as soon as** their required columns exist
- easily carry them forward through the pipeline

All without global orchestration or hand-written schema checks at each step.

The motivation was a flexible CLI tool that could express filters on its results at different steps, without a proliferation of CLI flags. That led to the idea of a 'queue' drained on demand in FIFO order, where an expression is only pulled once the current schema can satisfy it.

This idea **could be extended to `select` statements**, but initially filtering was the primary deliverable.

### API Methods

- `add_filters(*exprs: pl.Expr)`
  Add one or more Polars expressions (`pl.Expr`) to the hopper as pending filters. Filters are Polars expressions rather than Python callables, so that `.meta.root_names()` can be used for the schema check.
- `apply_ready_filters() -> pl.DataFrame`
  Check each stored expression’s root names. If all of those columns exist, `df.filter(expr)` is applied. Successfully applied expressions are removed from the hopper.
- `list_filters() -> List[pl.Expr]`
  Inspect the still-pending expressions in the hopper.
- `serialise_filters(format="binary"|"json") -> List[str|bytes]`
  Convert the pending expressions to binary bytes or JSON strings.
- `deserialise_filters(serialised_list, format="binary"|"json")`
  Re-create in-memory `pl.Expr` objects from the serialised data, overwriting any existing expressions.

## Contributing

Maintained by [Louis Maddox](https://github.com/lmmx/polars-expr-hopper). Contributions welcome!

1. **Issues & Discussions**: Please open a GitHub issue or discussion for bugs, feature requests, or questions. If reporting a bug, please include the version and any error messages or tracebacks.
2. **Pull Requests**: PRs are welcome!
   - Install the project in editable mode with the dev extra (e.g. with [uv](https://docs.astral.sh/uv/)):
     `uv pip install -e ".[dev]"`
   - Run tests (when available) and include updates to docs or examples if relevant.

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "polars-expr-hopper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "polars, plugin, filter, expr, metadata",
    "author": null,
    "author_email": "Louis Maddox <louismmx@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/53/27/f479f2b6a79c0ea0fbc82a3faed2f83634d2e2b141edc7a04d77a0dfe5d1/polars_expr_hopper-0.6.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Polars plugin providing a 'hopper' of expressions for automatic, schema-aware application.",
    "version": "0.6.5",
    "project_urls": {
        "Documentation": "https://polars-expr-hopper.vercel.app/",
        "Homepage": "https://github.com/lmmx/polars-expr-hopper",
        "Repository": "https://github.com/lmmx/polars-expr-hopper.git"
    },
    "split_keywords": [
        "polars",
        " plugin",
        " filter",
        " expr",
        " metadata"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ded35ffab6436cf1bf940ccf378d4610224563c1a35b412c7570e689f424b270",
                "md5": "92a592dc82d52a15f16fea183fe87586",
                "sha256": "57d71e189fa4f177a14d1a3d0fe685a5cefe545f1beed440a795fed6293b0af6"
            },
            "downloads": -1,
            "filename": "polars_expr_hopper-0.6.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "92a592dc82d52a15f16fea183fe87586",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 10478,
            "upload_time": "2025-02-13T09:53:04",
            "upload_time_iso_8601": "2025-02-13T09:53:04.726699Z",
            "url": "https://files.pythonhosted.org/packages/de/d3/5ffab6436cf1bf940ccf378d4610224563c1a35b412c7570e689f424b270/polars_expr_hopper-0.6.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5327f479f2b6a79c0ea0fbc82a3faed2f83634d2e2b141edc7a04d77a0dfe5d1",
                "md5": "bdb185e7a6021b54bb9023dabf92454e",
                "sha256": "dca7335750901b85209145ab870371143e356261105392ec1a3db54e18216134"
            },
            "downloads": -1,
            "filename": "polars_expr_hopper-0.6.5.tar.gz",
            "has_sig": false,
            "md5_digest": "bdb185e7a6021b54bb9023dabf92454e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 21306,
            "upload_time": "2025-02-13T09:53:06",
            "upload_time_iso_8601": "2025-02-13T09:53:06.662917Z",
            "url": "https://files.pythonhosted.org/packages/53/27/f479f2b6a79c0ea0fbc82a3faed2f83634d2e2b141edc7a04d77a0dfe5d1/polars_expr_hopper-0.6.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-13 09:53:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lmmx",
    "github_project": "polars-expr-hopper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "polars-expr-hopper"
}
        