job-tqdflex

- **Name:** job-tqdflex
- **Version:** 0.1.0
- **Summary:** Parallel processing with progress bars using joblib and tqdm
- **Author email:** David Araripe <david.araripe17@gmail.com>
- **Upload time:** 2025-08-27 16:45:09
- **Requires Python:** >=3.8
- **License:** CC-BY-SA-4.0
- **Keywords:** parallel, processing, joblib, tqdm, progress, multiprocessing
# job-tqdflex

[![License: CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-sa/4.0/)
[![GitHub Actions](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FDavid-Araripe%2Fjob_tqdflex%2Fbadge%3Fref%3Dmaster&style=flat-square)](https://actions-badge.atrox.dev/David-Araripe/job_tqdflex/goto?ref=master)

A Python library for parallel processing with progress bars, built on joblib (*job*) and tqdm (*tq*), with flexible (*flex*) chunked processing for memory efficiency.

## Features

- **Memory efficient** - supports generators and iterators
- **Context manager support** - automatic cleanup of resources
- **Easy parallel processing** with automatic chunking for optimal performance
- **Error handling** - failures are surfaced with detailed logging
- **Custom logging support** - compatible with loguru and standard Python logging

## Installation

```bash
pip install job-tqdflex
```

## Quick Start

```python
from job_tqdflex import ParallelApplier
import time

def slow_square(x):
    time.sleep(0.1)  # simulate a slow computation
    return x ** 2

data = range(20)

# Create and run parallel applier
applier = ParallelApplier(slow_square, data, n_jobs=4)
results = applier()

print(results)  # [0, 1, 4, 9, 16, 25, ...]
```

## Usage Examples

### Basic Usage

```python
from job_tqdflex import ParallelApplier

def process_item(item):
    # Your processing logic here
    return item * 2

data = [1, 2, 3, 4, 5]
applier = ParallelApplier(process_item, data)
results = applier()
```

### With Additional Arguments

```python
def power_function(base, exponent=2):
    return base ** exponent

data = [1, 2, 3, 4, 5]
applier = ParallelApplier(power_function, data)
results = applier(exponent=3)  # [1, 8, 27, 64, 125]
```

### Using functools.partial for Complex Arguments

```python
from functools import partial

def complex_function(item, multiplier, offset=0):
    return item * multiplier + offset

# Pre-configure the function
configured_func = partial(complex_function, multiplier=3, offset=10)

data = [1, 2, 3, 4, 5]
applier = ParallelApplier(configured_func, data)
results = applier()  # [13, 16, 19, 22, 25]
```

### Working with Generators

```python
def data_generator():
    for i in range(1000):
        yield i

def expensive_computation(x):
    return sum(range(x))

# Works seamlessly with generators
applier = ParallelApplier(expensive_computation, data_generator(), n_jobs=8)
results = applier()
```

### Context Manager Usage

```python
def process_data(item):
    return item ** 2

data = range(100)

# Automatic resource cleanup
with ParallelApplier(process_data, data, n_jobs=4) as applier:
    results = applier()
```

### Different Backends

```python
# For CPU-bound tasks (default)
applier = ParallelApplier(cpu_intensive_func, data, backend="loky")

# For I/O-bound tasks
applier = ParallelApplier(io_bound_func, data, backend="threading")

# For other use cases
applier = ParallelApplier(some_func, data, backend="multiprocessing")
```

### Custom Progress Bar Settings

```python
# Disable progress bar
applier = ParallelApplier(func, data, show_progress=False)

# Custom chunk size for memory management
applier = ParallelApplier(func, large_dataset, chunk_size=100)

# Custom progress bar description (default: "Applying {func_name} to chunks")
applier = ParallelApplier(func, data, custom_desc="Processing...")
```

### Using the Low-Level `tqdm_joblib` Context Manager

```python
import time

from job_tqdflex import tqdm_joblib
from joblib import Parallel, delayed
from tqdm import tqdm

def slow_function(x):
    time.sleep(0.1)
    return x ** 2

# Direct integration with joblib
with tqdm_joblib(tqdm(total=10, desc="Processing")) as progress_bar:
    results = Parallel(n_jobs=4)(delayed(slow_function)(i) for i in range(10))
```

## Configuration Options

### ParallelApplier Parameters

- **`func`**: The function to apply to each item
- **`iterable`**: Input data (list, generator, or any iterable)
- **`show_progress`**: Whether to show progress bars (default: `True`)
- **`n_jobs`**: Number of parallel jobs (default: `8`, use `-1` for all cores)
- **`backend`**: Parallelization backend (`"loky"`, `"threading"`, or `"multiprocessing"`)
- **`chunk_size`**: Size of chunks to process (default: auto-calculated)
- **`custom_desc`**: Custom description for the progress bar (default: `None`, uses `"Applying {func_name} to chunks"`)
- **`logger`**: Optional custom logger instance (supports standard logging and loguru)

### Performance Tips

1. **Choose the right backend**:
   - `"loky"` (default): Best for CPU-bound tasks
   - `"threading"`: Good for I/O-bound tasks
   - `"multiprocessing"`: For CPU-bound tasks with shared memory concerns

2. **Optimize chunk size**:
   - Larger chunks reduce overhead but increase memory usage
   - Smaller chunks provide better load balancing
   - Auto-calculation usually works well

3. **Use generators for large datasets**:
   ```python
   def large_data_generator():
       for i in range(1_000_000):
           yield expensive_data_loader(i)
   
   applier = ParallelApplier(process_func, large_data_generator())
   ```
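The chunk-size trade-off described above can be pictured with a small pure-Python helper. This is an illustrative sketch of what chunking an iterable looks like, not the library's internal implementation:

```python
from itertools import islice

def chunked(iterable, chunk_size):
    """Yield lists of up to chunk_size items; works for generators too."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Larger chunks -> fewer dispatches (less overhead) but more memory per worker;
# smaller chunks -> finer-grained load balancing across workers.
print(list(chunked(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk is what a worker would receive in one dispatch, which is why a progress bar over chunks (rather than items) keeps overhead low.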

## Error Handling

The library provides comprehensive error handling:

```python
def potentially_failing_function(x):
    if x == 42:
        raise ValueError("The answer to everything!")
    return x * 2

try:
    applier = ParallelApplier(potentially_failing_function, range(100))
    results = applier()
except RuntimeError as e:
    print(f"Parallel processing failed: {e}")
```

## Logging

### Standard Python Logging

Enable debug logging to monitor performance:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("joblib_tqdm")

# Your parallel processing code here
```

### Custom Logger Support (including Loguru)

The library supports custom logger instances, including loguru:

```python
# With loguru (if installed)
from loguru import logger as loguru_logger

def process_item(x):
    return x ** 2

data = range(100)

# Use loguru for all internal logging
applier = ParallelApplier(process_item, data, logger=loguru_logger)
results = applier()

# Or with tqdm_joblib context manager
from tqdm import tqdm
with tqdm_joblib(tqdm(total=100, desc="Processing"), logger=loguru_logger) as pbar:
    results = Parallel(n_jobs=4)(delayed(process_item)(i) for i in data)
```

```python
# With standard logging custom logger
import logging

custom_logger = logging.getLogger("my_custom_logger")
custom_logger.setLevel(logging.INFO)

applier = ParallelApplier(process_item, data, logger=custom_logger)
results = applier()
```

**Note**: Loguru is not a required dependency. It's included in the `[dev]` optional dependencies for testing purposes. You can use any logger object that has `debug()` and `error()` methods.
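Since any object with `debug()` and `error()` methods is accepted, a minimal duck-typed logger can be written from scratch. The class below is a hypothetical sketch for illustration, useful e.g. for capturing log messages in tests:

```python
class ListLogger:
    """Minimal logger satisfying the debug()/error() protocol; records messages."""

    def __init__(self):
        self.messages = []

    def debug(self, msg, *args):
        self.messages.append(("debug", msg))

    def error(self, msg, *args):
        self.messages.append(("error", msg))

logger = ListLogger()
logger.debug("chunk dispatched")
logger.error("worker failed")
print(logger.messages)  # [('debug', 'chunk dispatched'), ('error', 'worker failed')]
```

An instance of this class could then be passed as the `logger` argument, e.g. `ParallelApplier(func, data, logger=logger)`, and inspected afterwards via `logger.messages`.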

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the CC BY-SA 4.0 License - see the [LICENSE](LICENSE) file for details.

## Attribution

This project includes code based on the [tqdm_joblib](https://github.com/louisabraham/tqdm_joblib) implementation by Louis Abraham, which is distributed under CC BY-SA 4.0. The original implementation was inspired by a Stack Overflow solution for integrating tqdm with joblib's parallel processing.

## Acknowledgments

- Built on top of the excellent [joblib](https://joblib.readthedocs.io/) library
- Progress bars provided by [tqdm](https://tqdm.github.io/)
- Based on the original [tqdm_joblib](https://github.com/louisabraham/tqdm_joblib) by Louis Abraham
- Inspired by the need for simple parallel processing with progress tracking and custom logging support

## Changelog

### 0.1.0 (2025)
- Initial release
- Basic parallel processing with progress bars
- Support for multiple backends (loky, threading, multiprocessing)
- Generator and iterator support
- Context manager support
- Custom logger support (compatible with loguru and standard logging)
- Comprehensive test suite including loguru integration tests
- Memory efficient chunking with auto-calculated chunk sizes

            
