<img src="https://raw.githubusercontent.com/cnmoro/distry/refs/heads/master/assets/logo.svg" alt="Logo" width="220" />
## Distry
Distributed task execution framework. Scale your Python functions across multiple workers.
## Features
* **Zero-config setup** - Auto-detects and installs dependencies
* **Simple API** - Just `client.map(func, inputs)`
* **Fault-tolerant** - Handles worker failures gracefully
* **Automatic Job Batching** - Large jobs are automatically split to fit worker RAM limits.
* **Package management** - Installs required packages on workers
* **Global indexing** - Results returned in input order
## Installation
```plaintext
# Client only (for task submission)
pip install distry-py[client]

# Worker only (for task execution)
pip install distry-py[worker]

# Full installation
pip install distry-py[all]
```
## Quick Start
### 1. Start Workers
```plaintext
# Terminal 1 - Worker 1
distry-worker --port 8001

# Terminal 2 - Worker 2 (with RAM limit)
distry-worker --port 8002 --max-ram 2g
```
### 2. Run Tasks
```python
from distry import Client

# Connect to workers
client = Client(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

# Define function (any Python function works!)
import numpy as np

def process_data(x):
    return np.mean([x, x**2, x**3])

# Process inputs in parallel
results = client.map(process_data, [1, 2, 3, 4, 5])

print(results)
# [1.0, 4.666..., 13.0, 28.0, 51.666...]

client.close()
```
### 2b. Using the Decorator (for single function calls)
For simpler cases, where you want to execute a single function call on a worker, you can use the `@distry` decorator.
```python
from distry import register_workers, distry
import numpy as np

# Connect to workers
register_workers(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

@distry
def process_data(x, power=2):
    return np.mean([x, x**power])

# Process a single input on a randomly selected worker
result = process_data(10)
print(result)
# 55.0

# With keyword arguments
result_power_3 = process_data(10, power=3)
print(result_power_3)
# 505.0
```
### 3. Advanced Usage
```python
from distry import Client

client = Client(worker_urls)

# Custom packages (optional - auto-detection works too)
def scipy_func(x):
    from scipy.special import factorial
    return float(factorial(x))

results = client.map(
    scipy_func,
    [1, 2, 3, 4],
    required_packages=['scipy'],  # Manual specification
    max_workers=2
)

# Results with error handling
def risky_func(x):
    if x == 3:
        raise ValueError("Oops!")
    return x * 2

results = client.map(risky_func, [1, 2, 3, 4])  # [2, 4, None, 8]

client.close()
```
## API Reference
### Client
```python
from distry import Client

client = Client(worker_urls, max_concurrent_jobs=10)

# Map function across inputs
results = client.map(
    func,                    # Any Python function
    inputs,                  # List of inputs
    max_workers=4,           # Limit concurrent workers
    timeout=60,              # Timeout per input
    required_packages=None   # Auto-detected
)

# Cluster status
status = client.get_cluster_status()

client.close()
```
### Worker
```python
from distry import WorkerServer

# Programmatic worker
server = WorkerServer(host="0.0.0.0", port=8000)
server.run()

# Or use the CLI:
# distry-worker --host 0.0.0.0 --port 8000
```
## CLI
```plaintext
# Start worker
distry-worker --help
distry-worker --host 0.0.0.0 --port 8000 --max-ram 4g

# The client will automatically split large jobs into batches
# to fit the worker's RAM limit.

# Worker endpoints
# GET /health
# GET /status
# GET /installed_packages
```
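The RAM-aware batching above can be sketched with a plain-Python chunker. This is illustrative only: distry's real batching logic and its size-estimation heuristics are internal to the client, and `split_into_batches` is a hypothetical helper, not part of the distry API.

```python
import sys

def split_into_batches(inputs, max_batch_bytes):
    """Greedily pack inputs into batches whose (rough) in-memory size
    stays under max_batch_bytes -- a stand-in for a worker's RAM limit."""
    batches, current, current_size = [], [], 0
    for item in inputs:
        size = sys.getsizeof(item)  # crude per-item size estimate
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch is then small enough to be dispatched to a worker on its own, and concatenating the per-batch results in order reproduces the original input order.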
## What Happens Under the Hood?
1. **Function Analysis**: Auto-detects imports from your function
2. **Package Installation**: Installs missing packages on workers
3. **Task Distribution**: Splits inputs across available workers
4. **Result Collection**: Gathers results with global indexing
5. **Error Handling**: Inputs whose calls raise return `None`; the rest succeed
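The ordering and error-handling guarantees (steps 4 and 5) can be mimicked locally with a small stand-in, which is handy for testing a function before submitting it to a cluster. This is a sketch of the semantics, not distry's actual scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

def local_map(func, inputs, max_workers=4):
    """Mimic client.map semantics locally: results come back in input
    order, and an input whose call raises is replaced with None."""
    def safe_call(x):
        try:
            return func(x)
        except Exception:
            return None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, like distry's global indexing
        return list(pool.map(safe_call, inputs))
```

If `local_map` gives the results you expect, the same function should behave identically under `client.map`, just spread across workers.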
## Use Cases
* **Data Processing**: Apply functions to large datasets
* **ML Inference**: Scale model predictions across workers
* **API Calls**: Parallelize HTTP requests
* **Computational Tasks**: CPU-intensive calculations
* **Batch Processing**: Process files, images, or documents
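For any of these use cases, the function you distribute is ordinary Python. As a hypothetical example of a CPU-bound task, written so it can be sanity-checked locally before being handed to `client.map`:

```python
def count_primes(n):
    """Count primes below n by trial division -- a deliberately
    CPU-bound function you could pass to client.map unchanged."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

# Locally:      count_primes(100) -> 25
# On a cluster: client.map(count_primes, [10_000, 20_000, 30_000])
```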
## Limitations
* Single function per job (no complex workflows)
* 30s timeout per input (configurable)
* Synchronous function execution on workers
* Basic package management (no virtual environments)