# distry-py

* **Version:** 0.2.1
* **Summary:** Distributed task execution framework
* **Author:** Carlo Moro
* **License:** MIT
* **Requires Python:** >=3.8
* **Keywords:** distributed, computing, parallel, processing
* **Homepage:** https://github.com/cnmoro/distry
* **Uploaded:** 2025-10-24 23:49:47
<img src="https://raw.githubusercontent.com/cnmoro/distry/refs/heads/master/assets/logo.svg" alt="Logo" width="220" />

## Distry

A distributed task execution framework: scale your Python functions across multiple workers.

## Features

* **Zero-config setup** - Auto-detects and installs dependencies
* **Simple API** - Just `client.map(func, inputs)`
* **Fault-tolerant** - Handles worker failures gracefully
* **Automatic job batching** - Large jobs are split automatically to fit worker RAM limits
* **Package management** - Installs required packages on workers
* **Global indexing** - Results returned in input order
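
A minimal sketch of the batching idea, assuming a known per-item memory footprint (the helper name and sizing heuristic below are hypothetical, not Distry's actual logic):

```python
def plan_batches(inputs, per_item_bytes, max_ram_bytes):
    """Split inputs into batches whose estimated footprint stays under max_ram_bytes."""
    batch_size = max(1, max_ram_bytes // per_item_bytes)
    return [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]

# Ten 1 MiB items against a 4 MiB worker limit -> batches of at most four items
batches = plan_batches(list(range(10)), per_item_bytes=2**20, max_ram_bytes=4 * 2**20)
print([len(b) for b in batches])  # [4, 4, 2]
```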

## Installation

```plaintext
# Client only (for task submission)
pip install "distry-py[client]"

# Worker only (for task execution)
pip install "distry-py[worker]"

# Full installation
pip install "distry-py[all]"
```

(The quotes keep shells like zsh from expanding the square brackets.)

## Quick Start

### 1. Start Workers

```plaintext
# Terminal 1 - Worker 1
distry-worker --port 8001

# Terminal 2 - Worker 2 (with RAM limit)
distry-worker --port 8002 --max-ram 2g
```

### 2. Run Tasks

```python
from distry import Client

# Connect to workers
client = Client(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

# Define function (any Python function works!)
import numpy as np

def process_data(x):
    return np.mean([x, x**2, x**3])

# Process inputs in parallel
results = client.map(process_data, [1, 2, 3, 4, 5])

print(results)
# [1.0, 4.666..., 13.0, 28.0, 51.666...]

client.close()
```

### 2b. Using the Decorator (for single function calls)

For simpler cases where you want to execute a single function call on a worker, you can use the `@distry` decorator.

```python
from distry import register_workers, distry
import numpy as np

# Connect to workers
register_workers(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

@distry
def process_data(x, power=2):
    return np.mean([x, x**power])

# Process a single input on a randomly selected worker
result = process_data(10)
print(result)
# 55.0

# With keyword arguments
result_power_3 = process_data(10, power=3)
print(result_power_3)
# 505.0
```

### 3. Advanced Usage

```python
from distry import Client

client = Client(worker_urls)

# Custom packages (optional - auto-detection works too)
def scipy_func(x):
    from scipy.special import factorial
    return float(factorial(x))

results = client.map(
    scipy_func,
    [1, 2, 3, 4],
    required_packages=['scipy'],  # Manual specification
    max_workers=2
)

# Results with error handling
def risky_func(x):
    if x == 3:
        raise ValueError("Oops!")
    return x * 2

results = client.map(risky_func, [1, 2, 3, 4])  # [2, 4, None, 8]

client.close()
```
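
The `required_packages` auto-detection can be approximated by walking the function's source for import statements. A hedged sketch of the idea (operating on source text here; the real detector presumably extracts the function's source first, and its heuristics may differ):

```python
import ast

def detect_imports(source):
    """Collect top-level module names from import statements in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found)

func_source = """
def scipy_func(x):
    from scipy.special import factorial
    import numpy as np
    return float(factorial(x))
"""
print(detect_imports(func_source))  # ['numpy', 'scipy']
```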

## API Reference

### Client

```python
from distry import Client

client = Client(worker_urls, max_concurrent_jobs=10)

# Map function across inputs
results = client.map(
    func,           # Any Python function
    inputs,         # List of inputs
    max_workers=4,  # Limit concurrent workers
    timeout=60,     # Timeout per input, in seconds
    required_packages=None  # Auto-detected
)

# Cluster status
status = client.get_cluster_status()

client.close()
```

### Worker

```python
from distry import WorkerServer

# Programmatic worker
server = WorkerServer(host="0.0.0.0", port=8000)
server.run()

# Or use CLI
# distry-worker --host 0.0.0.0 --port 8000
```

## CLI

```plaintext
# Start worker
distry-worker --help
distry-worker --host 0.0.0.0 --port 8000 --max-ram 4g

# The client will automatically split large jobs into batches
# to fit the worker's RAM limit.

# View worker endpoints
# GET /health
# GET /status
# GET /installed_packages
```
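
The endpoints can be probed with any HTTP client. Below is a self-contained sketch that stands up a stub worker exposing `/health` and queries it; the JSON payload shape is an assumption, not Distry's actual schema:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubWorker(BaseHTTPRequestHandler):
    """A fake worker exposing /health, for illustration only."""
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), StubWorker)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/health") as resp:
    health = json.loads(resp.read())
server.shutdown()

print(health)  # {'status': 'ok'}
```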

## What Happens Under the Hood?

1. **Function Analysis**: Auto-detects imports from your function
2. **Package Installation**: Installs missing packages on workers
3. **Task Distribution**: Splits inputs across available workers
4. **Result Collection**: Gathers results with global indexing
5. **Error Handling**: Failed inputs return `None`, others succeed
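
Steps 3-5 can be mimicked in-process with the standard library; this sketch is a local analogue (Distry fans calls out to HTTP workers instead of threads):

```python
from concurrent.futures import ThreadPoolExecutor

def local_map(func, inputs, max_workers=4):
    """Fan inputs out to an executor, keeping input order and mapping failures to None."""
    def safe(x):
        try:
            return func(x)
        except Exception:
            return None  # step 5: failed inputs become None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, inputs))  # pool.map preserves input order (step 4)

def risky(x):
    if x == 3:
        raise ValueError("Oops!")
    return x * 2

print(local_map(risky, [1, 2, 3, 4]))  # [2, 4, None, 8]
```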

## Use Cases

* **Data Processing**: Apply functions to large datasets
* **ML Inference**: Scale model predictions across workers
* **API Calls**: Parallelize HTTP requests
* **Computational Tasks**: CPU-intensive calculations
* **Batch Processing**: Process files, images, or documents

## Limitations

* Single function per job (no complex workflows)
* 30 s default timeout per input (configurable via the `timeout` argument)
* Synchronous function execution on workers
* Basic package management (no virtual environments)
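
A per-input timeout like the one above can be enforced client-side with futures; a sketch under that assumption (Distry's actual mechanism, timing out calls to remote workers, may differ):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_with_timeout(func, x, timeout=0.2):
    """Return func(x), or None if it does not finish within `timeout` seconds."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func, x)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return None  # treat a timed-out input like a failed one

def slow(x):
    time.sleep(1)
    return x

print(run_with_timeout(lambda x: x * 2, 5))  # 10
print(run_with_timeout(slow, 5))             # None
```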

            
