dejaq

| Field | Value |
|---|---|
| Name | dejaq |
| Version | 0.2.0 |
| Summary | Déjà Queue – A fast multiprocessing queue for Python |
| Homepage | https://github.com/danionella/dejaq |
| Upload time | 2025-02-23 12:04:36 |
| Author | jlab.berlin, Benjamin Judkewitz |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | multiprocessing, queue |
| Requirements | No requirements were recorded. |

![Python Version](https://img.shields.io/badge/python-3.8+-blue)
[![PyPI - Version](https://img.shields.io/pypi/v/dejaq)](https://pypi.org/project/dejaq/)
[![Conda Version](https://img.shields.io/conda/v/conda-forge/dejaq)](https://anaconda.org/conda-forge/dejaq)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub last commit](https://img.shields.io/github/last-commit/danionella/dejaq)

# Déjà Queue

A fast alternative to `multiprocessing.Queue`. It is faster because it uses a shared-memory ring buffer (rather than slow pipes) and [pickle protocol 5 out-of-band data](https://peps.python.org/pep-0574/) to minimize copies. [`dejaq.DejaQueue`](#dejaqdejaqueue) supports any type of [picklable](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled) Python object, including numpy arrays or nested dictionaries with mixed content.
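
To make the out-of-band idea concrete, here is a minimal sketch that uses only the standard-library `pickle` module. It is not dejaq's internal code, and the payload and variable names are made up for illustration. It shows how protocol 5 can keep a large buffer out of the pickle stream, so that only a small metadata stream plus a reference to the original memory needs to be handled:

```python
import pickle

# Hypothetical payload: a large writable buffer standing in for array data.
big = bytearray(b"\x01" * 5_000_000)
item = {"i": 7, "array": pickle.PickleBuffer(big)}

# In-band: the 5 MB payload is copied into the pickle stream itself.
in_band = pickle.dumps(item, protocol=5)

# Out-of-band: the stream only describes the object; the payload is handed
# to the callback as a PickleBuffer instead of being copied into the stream.
buffers = []
stream = pickle.dumps(item, protocol=5, buffer_callback=buffers.append)

# The receiver supplies the buffers again when unpickling.
restored = pickle.loads(stream, buffers=buffers)
view = memoryview(restored["array"])

print(f"in-band stream: {len(in_band)} bytes")
print(f"out-of-band stream: {len(stream)} bytes + {len(buffers)} buffer(s)")
print(f"restored payload: {view.nbytes} bytes")
```

In this sketch the buffers are passed back within the same process. A queue that crosses process boundaries would have to place them somewhere both sides can reach, such as shared memory.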

<img src="https://github.com/user-attachments/assets/00465436-47f8-4b2a-a236-d288ee34df28" width="100%">

The speed advantage of `DejaQueue` becomes substantial for items larger than 1 MB. It enables efficient inter-job communication in big-data processing pipelines, which can be implemented in a few lines of code with [`dejaq.Parallel`](#dejaqparallel).

Auto-generated (minimal) API documentation: https://danionella.github.io/dejaq


## Installation
- `conda install conda-forge::dejaq`

- or, if you prefer pip: `pip install dejaq`

- for development, clone this repository, navigate to the root directory and type `pip install -e .`

## Examples
### `dejaq.DejaQueue`
```python
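import zlib
import numpy as np
from multiprocessing import Process
from dejaq import DejaQueue

def checksum(arr):
    # hash() of bytes is salted per interpreter, so it can differ between
    # spawned processes; CRC32 gives the same value everywhere
    return zlib.crc32(arr.tobytes())

def produce(queue):
    for i in range(10):
        arr = np.random.randn(100,200,300)
        data = dict(array=arr, i=i)
        queue.put(data)
        print(f'produced {type(arr)} {arr.shape} {arr.dtype}; meta: {i}; crc32: {checksum(arr)}\n', flush=True)

def consume(queue, pid):
    while True:
        data = queue.get()
        if data is None:  # sentinel: no more items
            break
        array, i = data['array'], data['i']
        print(f'consumer {pid} consumed {type(array)} {array.shape} {array.dtype}; index: {i}; crc32: {checksum(array)}\n', flush=True)

if __name__ == '__main__':  # needed where processes are spawned (e.g. Windows, macOS)
    queue = DejaQueue(buffer_bytes=100e6)
    producer = Process(target=produce, args=(queue,))
    consumers = [Process(target=consume, args=(queue, pid)) for pid in range(3)]
    for c in consumers:
        c.start()
    producer.start()
    producer.join()
    for _ in consumers:
        queue.put(None)  # one sentinel per consumer so each loop can exit
    for c in consumers:
        c.join()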
```
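
As a rough sizing aid, the arithmetic below estimates how many of the example arrays fit into the 100 MB buffer requested above. It is only a back-of-the-envelope sketch: it ignores any per-item overhead the queue may add, and it makes no claim about how `DejaQueue` behaves when the buffer is full.

```python
import math

# Values taken from the example above; np.random.randn returns float64 (8 bytes per element).
shape = (100, 200, 300)
bytes_per_element = 8
buffer_bytes = int(100e6)

item_bytes = math.prod(shape) * bytes_per_element
items_in_buffer = buffer_bytes // item_bytes

print(f"{item_bytes / 1e6:.0f} MB per array, about {items_in_buffer} arrays per buffer")
```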


### `dejaq.Parallel`
The following examples show how to use `dejaq.Parallel` to parallelize a function or a class, and how to create job pipelines.

Here we execute a function and map iterable inputs across 10 workers. To enable pipelining, the results of each stage are provided as an iterable generator. Use the `.compute()` method to get the final result (note that each stage pre-fetches results from `n_workers` calls, so some of the execution already starts before `.compute()` is called). Results are always ordered.

```python
from time import sleep
from dejaq import Parallel

def slow_function(arg):
    sleep(1.0)
    return arg + 5

input_iterable = range(100)
# keep the original name intact so the one-liner below does not wrap it twice
parallel_function = Parallel(n_workers=10)(slow_function)
stage = parallel_function(input_iterable)
result = stage.compute() # or list(stage)
# or shorter:
result = Parallel(n_workers=10)(slow_function)(input_iterable).compute()
```
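
The semantics described above (lazy stages, up to `n_workers` calls started ahead of time, results in input order) can be sketched with the standard library alone. This is an analogy rather than dejaq's implementation: it uses threads instead of dejaq's workers, and `ordered_prefetch_map` is a hypothetical helper name.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def ordered_prefetch_map(func, iterable, n_workers):
    """Start up to n_workers calls right away and return a generator that
    yields the results in input order, topping up the pool as it goes."""
    items = iter(iterable)
    pool = ThreadPoolExecutor(max_workers=n_workers)
    # Pre-fetch: these calls begin before anyone iterates over the results.
    pending = deque(pool.submit(func, x) for x in islice(items, n_workers))

    def results():
        try:
            while pending:
                # Always wait for the oldest call, which preserves input order.
                result = pending.popleft().result()
                # Submit one new call (if any input is left) per result consumed.
                for x in islice(items, 1):
                    pending.append(pool.submit(func, x))
                yield result
        finally:
            pool.shutdown()

    return results()

def add_five(x):
    time.sleep(0.01)  # simulate a slow call
    return x + 5

stage = ordered_prefetch_map(add_five, range(20), n_workers=4)
result = list(stage)
```

Because each stage is itself an iterable, one stage can be passed as the input of another, which is the same shape as the chained pipelines shown further down.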

You can also use `Parallel` as a function decorator:
```python
@Parallel(n_workers=10)
def slow_function_decorated(arg):
    sleep(1.0)
    return arg + 5

result = slow_function_decorated(input_iterable).compute()
```

Similarly, you can decorate a class. It will be instantiated within a worker. Iterable items will be fed to the `__call__` method. Note how the additional init arguments are provided:
```python
@Parallel(n_workers=1)
class Reader:
    def __init__(self, arg1):
        self.arg1 = arg1
    def __call__(self, item):
        return item + self.arg1

result = Reader(arg1=0.5)(input_iterable).compute()
```

Finally, you can create pipelines of chained jobs. In this example, we have a single-worker producer and consumer, but a parallel processing stage (an example use case is sequentially reading a file, compressing chunks in parallel and then sequentially writing to an output file):
```python
@Parallel(n_workers=1)
class Producer:
    def __init__(self, arg1):
        self.arg1 = arg1
    def __call__(self, item):
        return item + self.arg1

@Parallel(n_workers=10)
class Processor:
    def __init__(self, arg1):
        self.arg1 = arg1
    def __call__(self, arg):
        sleep(1.0) #simulating a slow function
        return arg * self.arg1

@Parallel(n_workers=1)
class Consumer:
    def __init__(self, arg1):
        self.arg1 = arg1
    def __call__(self, arg):
        return arg - self.arg1

input_iterable = range(100)
stage1 = Producer(0.5)(input_iterable)
stage2 = Processor(10.0)(stage1)
stage3 = Consumer(1000)(stage2)
result = stage3.compute()

# or:
result = Consumer(1000)(Processor(10.0)(Producer(0.5)(input_iterable))).compute()
```

## See also
- [ArrayQueues](https://github.com/portugueslab/arrayqueues)
- [joblib.Parallel](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html)
- [Déjà Q](https://en.wikipedia.org/wiki/Deja_Q)
