octorun

Name: octorun
Version: 0.2.1
Summary: A command-line tool for distributed parallel execution across multiple GPUs
Upload time: 2025-10-25 20:26:36
Requires Python: >=3.10
License: MIT
Keywords: cli, deep-learning, distributed, gpu, parallel
<div align="center">

# 🐙 OctoRun

**Distributed Parallel Execution Made Simple**

*A powerful command-line tool for running Python scripts across multiple GPUs with intelligent task management and monitoring*

[![PyPI version](https://img.shields.io/pypi/v/octorun.svg)](https://pypi.org/project/octorun/)
[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![CUDA](https://img.shields.io/badge/CUDA-supported-green.svg)](https://developer.nvidia.com/cuda-downloads)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://github.com/HarborYuan/OctoRun/actions)

---

</div>

## 📋 Overview

**OctoRun** is designed to help you run computationally intensive Python scripts across multiple GPUs efficiently. It automatically manages GPU allocation, chunks your workload, handles failures with retry mechanisms, and provides comprehensive monitoring and logging.

## ✨ Key Features

- 🔍 **Automatic GPU Detection**: Automatically detects and utilizes available GPUs
- 🧩 **Intelligent Chunk Management**: Divides work into chunks and distributes across GPUs
- 🔄 **Failure Recovery**: Automatic retry mechanism for failed chunks
- 📊 **Comprehensive Logging**: Detailed logging for monitoring and debugging
- ⚙️ **Flexible Configuration**: JSON-based configuration with CLI overrides
- 🎯 **Kwargs Support**: Pass custom arguments to your scripts via config or CLI
- 💾 **Memory Monitoring**: Monitor GPU memory usage and thresholds
- 🔒 **Lock Management**: Prevent duplicate processing of chunks

## 🚀 Installation

You can install OctoRun using `pip` or `uv`.

### Via pip
```bash
pip install octorun
```

### Via uv
```bash
# Install globally
uv tool install octorun

# Install in your project
uv add octorun
```

### Optional extras
- Benchmark tooling: `pip install "octorun[benchmark]"` (installs PyTorch with CUDA support)

## ⚡ Quick Start

1.  **Create Configuration**:
    ```bash
    octorun save_config --script ./your_script.py
    ```

2.  **Run Your Script**:
    ```bash
    octorun run
    ```

3.  **Monitor GPUs**:
    ```bash
    octorun list_gpus -d
    ```

## 🎮 Commands

### `run` (alias: `r`)

Run your script with the specified configuration.

```bash
octorun run --config config.json [--kwargs '{"key": "value"}']
```

### `save_config` (alias: `s`)

Generate a default configuration file.

```bash
octorun save_config --script ./your_script.py
```

### `list_gpus` (alias: `l`)

List available GPUs and their current usage.

```bash
octorun list_gpus [--detailed]
```

The `--detailed` flag (`-d` for short) provides a more comprehensive view of GPU stats, including memory usage, temperature, and running processes.

### `benchmark` (alias: `b`)

Run a benchmark to determine the optimal number of parallel processes for your GPUs.

```bash
octorun benchmark
```

This command runs a series of tests to help you configure the `gpus` parameter in your `config.json` for the best performance. It requires the optional benchmark extra (`pip install "octorun[benchmark]"`) so that PyTorch is available.

## ⚙️ Configuration

OctoRun uses a `config.json` file for configuration. You can generate a default one with `octorun save_config`.

| Option             | Description                                  | Default        |
| ------------------ | -------------------------------------------- | -------------- |
| `script_path`      | Path to your Python script                   | (required)     |
| `gpus`             | "auto" or list of GPU IDs                    | "auto"         |
| `total_chunks`     | Number of chunks to divide work into         | 128            |
| `log_dir`          | Directory for log files                      | "./logs"       |
| `chunk_lock_dir`   | Directory for chunk lock files               | "./logs/locks" |
| `monitor_interval` | Monitoring interval in seconds               | 60             |
| `restart_failed`   | Whether to restart failed processes          | false          |
| `max_retries`      | Maximum retries for failed chunks            | 3              |
| `memory_threshold` | Memory threshold percentage                  | 90             |
| `kwargs`           | Custom arguments to pass to your script      | {}             |
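
For reference, a hypothetical `config.json` assembled from the defaults in the table above might look like this (the `script_path` value is an illustrative assumption):

```json
{
  "script_path": "./your_script.py",
  "gpus": "auto",
  "total_chunks": 128,
  "log_dir": "./logs",
  "chunk_lock_dir": "./logs/locks",
  "monitor_interval": 60,
  "restart_failed": false,
  "max_retries": 3,
  "memory_threshold": 90,
  "kwargs": {}
}
```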

## 🎯 Using Kwargs

You can pass custom arguments to your script via the `kwargs` object in your `config.json` or directly through the CLI.

**CLI kwargs will override config file kwargs.**

```bash
octorun run --kwargs '{"batch_size": 128, "learning_rate": 0.005}'
```
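
How kwargs reach your script is suggested by the argparse-based example in the next section: a reasonable assumption is that each key becomes a `--key value` flag on the command line. Under that assumption, the invocation above would effectively launch something like:

```bash
# Hypothetical expansion of the kwargs above for one worker
# (the gpu_id/chunk_id values shown are illustrative)
python your_script.py --gpu_id 0 --chunk_id 0 --total_chunks 128 \
    --batch_size 128 --learning_rate 0.005
```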

## 🔧 Script Implementation

Your script must accept the following arguments:

-   `--gpu_id`: GPU device ID (int)
-   `--chunk_id`: Current chunk number (int)
-   `--total_chunks`: Total number of chunks (int)

Here is an example of how to structure your script:

```python
import argparse
import torch

def main():
    parser = argparse.ArgumentParser()
    
    # Required OctoRun arguments
    parser.add_argument('--gpu_id', type=int, required=True)
    parser.add_argument('--chunk_id', type=int, required=True)
    parser.add_argument('--total_chunks', type=int, required=True)
    
    # Your custom arguments
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--model_type', type=str, default='default')
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--output_dir', type=str, default='./output')
    
    args = parser.parse_args()
    
    # Set the GPU device
    if torch.cuda.is_available():
        torch.cuda.set_device(args.gpu_id)
        print(f"Using GPU {args.gpu_id}")
    
    print(f"Processing chunk {args.chunk_id}/{args.total_chunks}")
    
    # Your logic here

if __name__ == "__main__":
    main()
```
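
OctoRun tells your script *which* chunk it owns, but not *how* to partition the data; that part is up to you. A minimal sketch of contiguous slicing, assuming 0-based chunk IDs (`items` is a placeholder for your own dataset):

```python
def select_chunk(items, chunk_id, total_chunks):
    """Return the contiguous slice of `items` assigned to this chunk.

    Splits as evenly as possible: chunk sizes differ by at most one.
    """
    n = len(items)
    start = chunk_id * n // total_chunks
    end = (chunk_id + 1) * n // total_chunks
    return items[start:end]

# Example: 10 items split into 3 chunks -> sizes 3, 3, 4
print(select_chunk(list(range(10)), 1, 3))  # [3, 4, 5]
```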

## 🤝 Contributing

Contributions are welcome! Please fork the repository, create a feature branch, and submit a pull request.

## 📄 License

This project is licensed under the **MIT License**.

            
