ray-map

Name	ray-map
Version	0.1.2
home_page	None
Summary	Efficient Ray-powered map/imap with backpressure, checkpointing, timeouts, and async
upload_time	2025-10-25 13:53:40
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT License Copyright (c) 2025 Tovarnov Mikhail Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	ray parallel map distributed-computing async

# ray-map

Efficient Ray-powered `imap`/`map` with backpressure, checkpointing, per-item timeouts, safe exceptions, and async variants.

---

## ✨ Features

* ⚡ Parallel map/imap on **Ray** with **back-pressure** and **batching**
* 💾 **Checkpointing** + replay of already computed points
* ⏱️ **Per-item timeouts** (worker-side, via threads)
* 🧯 **Safe exceptions**: return `Exception` objects instead of failing the whole run
* 🔁 **Ordered** or **as-ready** modes, `(arg, res)` or plain `res`
* 🧰 **Async API**: `imap_async`, `map_async`
* 🧩 Single-file module `ray_map.py`, `src`-layout friendly

---

## 📦 Install

```bash
pip install ray-map
```

> Requires Python 3.9+ and Ray 2.6+.

---

## 🚀 Quickstart

```python
from ray_map import RayMap

def foo(x):
    if x == 3:
        raise ValueError("oops")
    return x * x

rmap = RayMap(foo, batch_size=8, max_pending=-1, checkpoint_path="res.pkl")

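# 1) Stream (ordered). Exceptions are raised by default (safe_exceptions=False),
#    so foo(3) interrupts this loop; catch the error to let the script continue.
try:
    for y in rmap.imap(range(10)):
        print(y)
except Exception as exc:
    print("stopped:", exc)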

# 2) Stream (as-ready), safe exceptions, return (arg, res_or_exc)
for arg, res in rmap.imap(
    range(10), keep_order=False, safe_exceptions=True, ret_args=True, timeout=2.0
):
    print(arg, "->", res)

# 3) Collect to list
lst = rmap.map(range(1000), timeout=2.0, safe_exceptions=True)
```
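
With `safe_exceptions=True` and `ret_args=True`, failures arrive as exception objects next to normal results, so the caller has to separate them. A minimal sketch of that post-processing step follows. It uses a hand-written list in place of real `RayMap` output, and the `split_results` helper is hypothetical, not part of ray-map. It also assumes the arguments are hashable.

```python
def split_results(pairs):
    """Partition (arg, res_or_exc) pairs into successes and failures."""
    ok, failed = {}, {}
    for arg, res in pairs:
        # Exception objects are ordinary values here; nothing is raised.
        if isinstance(res, Exception):
            failed[arg] = res
        else:
            ok[arg] = res
    return ok, failed


# Stand-in for list(rmap.imap(..., safe_exceptions=True, ret_args=True)).
pairs = [(2, 4), (3, ValueError("oops")), (4, 16)]
ok, failed = split_results(pairs)
print(ok)              # successes keyed by argument
print(sorted(failed))  # arguments that may need a retry
```

The failed arguments can then be logged or resubmitted without rerunning the items that succeeded.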

---

## ⚙️ ray.init / ray.shutdown — how to run Ray

`RayMap` supports **lazy initialization** of Ray: if Ray is **not** initialized when you start computing, `RayMap` will call `ray.init()` for you (local, no dashboard) with the runtime env you pass to `RayMap`.

You can also **manage Ray yourself**. Typical patterns:

### A) Manage Ray yourself (recommended in apps/tests/CI)

```python
import os, ray

# Make your codebase visible to workers. Usually the repository root.
ray.init(runtime_env={"working_dir": os.getcwd()}, include_dashboard=False)

# ... use RayMap normally

ray.shutdown()  # ← you are responsible for shutting down Ray explicitly
```

**Notes**

* If Ray is already initialized, `RayMap` will **not** call `ray.init()` again — it just prepares remote functions.
* Prefer to set `runtime_env` explicitly (see **working_dir** below).
* Call `ray.shutdown()` yourself when the program finishes or tests are done.

### B) Let RayMap do it lazily

If you don’t call `ray.init()` yourself, the first call to `imap`/`map` will:

* start a local Ray instance with the `runtime_env` taken from `RayMap(..., runtime_env=...)` if provided, otherwise with a minimal default;
* auto-tune `max_pending` to `CPU + 16` batches if it is set to `-1`;
* disable the dashboard and suppress logs by default.

You still need to shut Ray down yourself if you want a clean exit:

```python
import ray
ray.shutdown()
```

### What arguments can be passed to `ray.init()` via RayMap

`RayMap` forwards a subset of options to Ray:

* `address`: connect to a cluster, e.g. `"ray://host:port"` (requires `ray[default]`/`ray[client]`). If omitted, a local instance is started.
* `password`: passed to Ray as `_redis_password`, for legacy clusters that use Redis authentication.
* `runtime_env`: the **runtime environment** for your job, see below.
* `remote_options`: forwarded to `.options(...)` of the remote function (e.g., `{"num_cpus": 0.5, "resources": {...}}`).

### 📂 About `working_dir` (runtime_env)

`working_dir` defines what source files are shipped to workers. That’s critical when your function `fn` is defined in your project code (or even inside `tests/`).

* **When you manage Ray yourself**: set it explicitly

  ```python
  ray.init(runtime_env={"working_dir": "."})  # or os.getcwd()
  ```
* **When RayMap initializes lazily**: it uses the `runtime_env` you passed to `RayMap`. If you didn’t pass any, use `RayMap(fn, runtime_env={"working_dir": "."})` to avoid import errors on workers.

> If your test functions live in `tests/`, either:
>
> * include `tests/` in the runtime env (e.g. via `py_modules: ["tests"]`), **or**
> * move helpers to `src/test_helpers.py` and import from there in tests.
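
One way to keep the `runtime_env` consistent between a self-managed `ray.init(...)` and a lazily initialized `RayMap(...)` is to build the dictionary in one place. The sketch below assumes a project root that contains the extra module directories. `build_runtime_env` is a hypothetical helper, not part of ray-map or Ray; only the `working_dir` and `py_modules` keys come from the text above.

```python
from pathlib import Path


def build_runtime_env(project_root=".", extra_modules=()):
    """Return a runtime_env dict with absolute paths (hypothetical helper)."""
    root = Path(project_root).resolve()
    env = {"working_dir": str(root)}
    modules = []
    for name in extra_modules:
        path = root / name
        # Fail early on the driver instead of with an import error on a worker.
        if not path.is_dir():
            raise FileNotFoundError(f"module directory not found: {path}")
        modules.append(str(path))
    if modules:
        env["py_modules"] = modules
    return env
```

The result can be passed either as `ray.init(runtime_env=build_runtime_env(".", ["tests"]))` or as `RayMap(fn, runtime_env=build_runtime_env(".", ["tests"]))`.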

### `ray.shutdown()`

`ray.shutdown()` **is not** called by `RayMap`. You should call it yourself when you want to cleanly stop Ray (end of script, end of test session, etc.). In pytest, use a `session`-scoped fixture that starts Ray once and calls `ray.shutdown()` in teardown.
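
Because shutdown is the caller's responsibility, it can help to tie `ray.init()` and `ray.shutdown()` to a single scope. The following is a minimal sketch of such a scope as a context manager. `ray_session` is a hypothetical name, and the helper only uses `ray.is_initialized()`, `ray.init()`, and `ray.shutdown()`.

```python
import contextlib


@contextlib.contextmanager
def ray_session(**init_kwargs):
    """Start Ray if needed and shut down only what this block started."""
    import ray  # imported lazily so the helper can be defined without Ray

    started_here = not ray.is_initialized()
    if started_here:
        init_kwargs.setdefault("include_dashboard", False)
        ray.init(**init_kwargs)
    try:
        yield
    finally:
        # Leave a Ray instance that someone else started untouched.
        if started_here:
            ray.shutdown()
```

Usage would look like `with ray_session(runtime_env={"working_dir": "."}): ...`, with the `RayMap` calls inside the block. In pytest, the same body could be wrapped in a `session`-scoped fixture.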

---

## 🧭 API Reference

```python
RayMap.imap(iterable, *, timeout=None, safe_exceptions=False, keep_order=True, ret_args=False) -> iterator
RayMap.map(iterable,  *, timeout=None, safe_exceptions=False, keep_order=True, ret_args=False) -> list

# Async variants
RayMap.imap_async(...) -> async iterator
RayMap.map_async(...)  -> list
```

* `timeout`: per-item timeout in seconds, enforced on the worker via a `ThreadPoolExecutor`
* `safe_exceptions=True`: return exception objects as results instead of raising (no crash)
* `keep_order=True`: preserve input order (1:1); `False` → yield results as they become ready
* `ret_args=True`: yield `(arg, res_or_exc)` instead of just `res_or_exc`
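
To illustrate what a thread-based per-item timeout looks like in general, here is a standard-library sketch. It is not ray-map's actual implementation, and `call_with_timeout` is a hypothetical name. Note the inherent limitation of this technique: a Python thread cannot be killed, so a timed-out call keeps running in the background until it returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, arg, timeout):
    """Run fn(arg) in a helper thread; raise TimeoutError if it is too slow."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, arg)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"item {arg!r} exceeded {timeout}s") from None
    finally:
        # Do not wait for the worker thread: a stuck call cannot be cancelled.
        pool.shutdown(wait=False)


print(call_with_timeout(lambda x: x * x, 7, timeout=1.0))  # 49
```

Combined with `safe_exceptions=True`, a timeout of this kind would surface as an exception object for the affected item rather than stopping the run.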

---

## 💾 Checkpointing

* Stores `(key, arg, result_or_exc)` to `checkpoint_path`.
* On restart, previously computed results are yielded first; then Ray resumes the rest.
* Planned: an optional flag to skip storing exceptions.
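
The replay-then-resume behaviour can be sketched with the standard library alone. The on-disk format below (one pickled `(key, arg, result)` record appended per item, with `repr(arg)` as the key) is an assumption made for illustration, not ray-map's documented file format, and `load_checkpoint` and `checkpointed_imap` are hypothetical names.

```python
import os
import pickle


def load_checkpoint(path):
    """Read all (key, arg, result) records appended to the file so far."""
    done = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            while True:
                try:
                    key, arg, res = pickle.load(f)
                except EOFError:
                    break
                done[key] = (arg, res)
    return done


def checkpointed_imap(fn, args, path):
    """Yield (arg, result) pairs: replayed records first, then new work."""
    args = list(args)
    done = load_checkpoint(path)
    for arg in args:
        if repr(arg) in done:
            yield done[repr(arg)]
    with open(path, "ab") as f:
        for arg in args:
            key = repr(arg)
            if key not in done:
                res = fn(arg)
                pickle.dump((key, arg, res), f)
                f.flush()  # hand the record to the OS before continuing
                yield arg, res
```

Running the generator twice against the same file calls `fn` only for arguments that have no record yet.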

---

## 🧪 Examples

See `examples/` directory for quickstarts and CI-ready snippets (including pytest fixtures for `ray.init`/`ray.shutdown`).

---

## 🛠️ Ray configuration & environments (details)

* Local vs remote cluster; choosing `batch_size` / `max_pending`.
* Shipping code to workers via `runtime_env` (`working_dir`, `py_modules`).
* Error callback and tuple/dict arg calling conventions.
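
For intuition about `batch_size`, the sketch below shows how an input stream can be grouped into fixed-size batches, assuming each batch is dispatched as one unit of work. `chunked` is a hypothetical helper and not necessarily how ray-map groups items internally.

```python
from itertools import islice


def chunked(iterable, batch_size):
    """Yield lists of up to batch_size items, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


print(list(chunked(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Under that assumption, ten items with `batch_size=4` cost three dispatches instead of ten, at the price of waiting for a whole batch before any of its results come back.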

---

## 📐 Performance tips

* Keep `batch_size` modest (8–64). Too small → overhead; too big → latency/memory.
* Start with `max_pending=-1` and reduce if memory use spikes.
* For lower latency, consider `keep_order=False`.
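
The interaction between `max_pending` and `keep_order` can be illustrated with a thread pool standing in for Ray. This is a conceptual sketch under that substitution, not ray-map's code: `bounded_imap` is a hypothetical function that never keeps more than `max_pending` submissions in flight and pulls new input only when a result has been consumed.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def _pop_one(pending, keep_order):
    """Return one finished result, oldest first or whichever is ready."""
    if keep_order:
        return pending.popleft().result()
    done, _ = wait(pending, return_when=FIRST_COMPLETED)
    future = next(iter(done))
    pending.remove(future)
    return future.result()


def bounded_imap(fn, items, max_pending=4, keep_order=True):
    """Lazily map fn over items with at most max_pending calls in flight."""
    with ThreadPoolExecutor(max_workers=max_pending) as pool:
        pending = deque()
        for item in items:
            pending.append(pool.submit(fn, item))
            if len(pending) >= max_pending:
                # Back-pressure: drain one result before reading more input.
                yield _pop_one(pending, keep_order)
        while pending:
            yield _pop_one(pending, keep_order)


print(list(bounded_imap(lambda x: x * x, range(5), max_pending=2)))  # [0, 1, 4, 9, 16]
```

In ordered mode a slow item at the head of the queue delays everything behind it, which is why `keep_order=False` tends to lower latency.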

---

## 🧪 Testing

Provide a pytest `conftest.py` that starts Ray once per session with a proper `runtime_env` and shuts it down in teardown. See the “ray.init / ray.shutdown” section above.

---

## 📄 License

MIT — see `LICENSE`.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "ray-map",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "ray, parallel, map, distributed-computing, async",
    "author": null,
    "author_email": "Your Name <your.email@example.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2025 Tovarnov Mikhail\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in\n        all copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n        THE SOFTWARE.\n        ",
    "summary": "Efficient Ray-powered map/imap with backpressure, checkpointing, timeouts, and async",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/TovarnovM/ray_map",
        "Issues": "https://github.com/TovarnovM/ray_map/issues",
        "Source": "https://github.com/TovarnovM/ray_map"
    },
    "split_keywords": [
        "ray",
        " parallel",
        " map",
        " distributed-computing",
        " async"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9950ee3b00741aa8635772a0f77500c9b4ed215f3e2916a253cddaa59b5e457d",
                "md5": "e5c02983c823178a42ecbec2121b40e6",
                "sha256": "fc21ad78736a5ad4d631fc163c2d3152553d16e165cc835b0806b88d6e75da88"
            },
            "downloads": -1,
            "filename": "ray_map-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e5c02983c823178a42ecbec2121b40e6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 11295,
            "upload_time": "2025-10-25T13:53:40",
            "upload_time_iso_8601": "2025-10-25T13:53:40.954780Z",
            "url": "https://files.pythonhosted.org/packages/99/50/ee3b00741aa8635772a0f77500c9b4ed215f3e2916a253cddaa59b5e457d/ray_map-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-25 13:53:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TovarnovM",
    "github_project": "ray_map",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ray-map"
}
