[![pytest](https://github.com/wimpomp/parfor/actions/workflows/pytest.yml/badge.svg)](https://github.com/wimpomp/parfor/actions/workflows/pytest.yml)
# Parfor
Used to parallelizing for-loops using parfor in Matlab? This package lets you do the same in Python.
Take any serial but parallelizable for-loop and execute it in parallel using easy syntax.
Don't worry about the technical details of the multiprocessing module, race conditions or queues:
parfor handles all that.

Tested on Linux, Windows and OSX with Python 3.10 and 3.12.
## Why is parfor better than just using multiprocessing?
- Easy to use
- Using dill instead of pickle: a lot more objects can be used when parallelizing
- Progress bars are built-in
- Automatically use multithreading instead of multiprocessing when the GIL is disabled
## How it works
This depends on whether the GIL is disabled or not. Running Python without the GIL is an experimental
feature introduced in Python 3.13, and not the default.
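
To see which of the two modes applies to your interpreter, you can query the GIL state yourself. The sketch below is an assumption about how such a check can be written, not a copy of what parfor does internally; `gil_enabled` and `backend` are names made up for this example.

```python
import sys


def gil_enabled() -> bool:
    """Return True when the running interpreter has the GIL enabled."""
    # sys._is_gil_enabled() exists from Python 3.13 on; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


# Hypothetical selection logic mirroring the two cases described in this section.
backend = "multiprocessing" if gil_enabled() else "multithreading"
print(f"GIL enabled: {gil_enabled()}, so work would run using {backend}")
```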
### Python with GIL enabled
The work you want parfor to do is divided over a number of processes. These processes are started by parfor and put
together in a pool. This pool is reused when you want parfor to do more work, or shut down when no new work arrives
within 10 minutes.
A handle to each bit of work is put in a queue from which the workers take work. The objects needed to do the work are
stored in a memory manager in serialized form (using dill) and the manager hands out an object to a worker when the
worker requests it. The manager deletes objects automatically when they are no longer needed.
When the work is done the result is sent back for collection in the main process.
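
The idea of a queue of handles plus a store of serialized objects can be illustrated in a single process. This is only a rough sketch under stated assumptions: `ObjectStore` and its methods are hypothetical names, the standard library `pickle` stands in for dill, and a plain loop stands in for the worker processes. It is not parfor's actual implementation.

```python
import pickle
from queue import Queue


class ObjectStore:
    """Keeps objects in serialized form and drops them once no task needs them."""

    def __init__(self):
        self._data = {}   # handle -> serialized object
        self._users = {}  # handle -> number of tasks that still need the object

    def put(self, handle, obj, users):
        self._data[handle] = pickle.dumps(obj)
        self._users[handle] = users

    def get(self, handle):
        # A worker receives a freshly deserialized copy of the object.
        return pickle.loads(self._data[handle])

    def release(self, handle):
        self._users[handle] -= 1
        if self._users[handle] == 0:
            del self._data[handle], self._users[handle]


store = ObjectStore()
store.put("args", (3,), users=4)

tasks = Queue()
for i in range(4):
    tasks.put((i, "args"))  # only the handle travels with the task

results = {}
while not tasks.empty():  # stand-in for workers taking work from the queue
    i, handle = tasks.get()
    (a,) = store.get(handle)
    results[i] = a * i ** 2
    store.release(handle)

print(results)           # {0: 0, 1: 3, 2: 12, 3: 27}
print(len(store._data))  # 0, the object was deleted after its last use
```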
### Python with GIL disabled
Each bit of work you want parfor to do is given to a new thread. These threads are started by parfor and put together in a pool.
The threads and the pool are not reused; they are closed automatically when done.
When the work is done a message is sent to the main thread to update the status of the pool.
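
A thread-per-task scheme like the one described above can be sketched with the standard library alone. The helper `run_in_threads` is a hypothetical name for this illustration and makes no claim to match parfor's internals.

```python
import threading
from queue import Queue


def run_in_threads(fun, iterable, args=()):
    """Run fun(item, *args) in one new thread per item; return results in input order."""
    items = list(iterable)
    results = [None] * len(items)
    done = Queue()  # workers report finished task indices to the main thread

    def worker(index, item):
        try:
            results[index] = fun(item, *args)
        finally:
            done.put(index)  # tell the main thread this task has finished

    threads = [threading.Thread(target=worker, args=(n, item)) for n, item in enumerate(items)]
    for thread in threads:
        thread.start()
    for _ in threads:
        done.get()  # the main thread could update a progress bar here
    for thread in threads:
        thread.join()  # threads are not reused
    return results


print(run_in_threads(lambda i, a: a * i ** 2, range(5), (3,)))  # [0, 3, 12, 27, 48]
```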
## Installation
`pip install parfor`
## Usage
Parfor decorates a function and returns the result of that function evaluated in parallel for each iteration of
an iterator.
## Requires
tqdm, dill
## Limitations
If you're using Python with the GIL enabled, objects passed to the pool need to be dillable (dill needs to be able to
serialize them). Generators and SwigPyObjects are examples of objects that cannot be used. They can, however, be used as
the iterator argument of parfor, as long as the items they yield are dillable. You might be able to make objects
dillable anyway using `dill.register` or with `__reduce__`, `__getstate__`, etc.
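
As an illustration of the `__getstate__` route, the sketch below makes a class that wraps a generator serializable by leaving the generator out of the saved state and rebuilding it on load. The class `Squares` is hypothetical, and the example uses the standard library `pickle`; dill extends pickle, so the same hooks should apply there, but that is an assumption this example does not test.

```python
import pickle


class Squares:
    """Iterator over square numbers backed by a generator, which cannot be pickled."""

    def __init__(self, stop, position=0):
        self.stop = stop
        self.position = position
        self._gen = (i ** 2 for i in range(position, stop))

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._gen)
        self.position += 1
        return value

    def __getstate__(self):
        # Keep only plain data; the generator itself is left out.
        return {"stop": self.stop, "position": self.position}

    def __setstate__(self, state):
        # Rebuild the generator at the saved position.
        self.__init__(state["stop"], state["position"])


squares = Squares(5)
next(squares)  # consume the first value, 0
clone = pickle.loads(pickle.dumps(squares))
print(list(clone))  # [1, 4, 9, 16]
```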
## Arguments
Arguments to the functions `parfor.parfor`, `parfor.pmap` and `parfor.gmap`.

### Required:
    fun:      function taking arguments: iteration from iterable, other arguments defined in args & kwargs
    iterable: iterable or iterator from which an item is given to fun as a first argument

### Optional:
    args:          tuple with other unnamed arguments to fun
    kwargs:        dict with other named arguments to fun
    total:         give the length of the iterator in cases where len(iterator) results in an error
    desc:          string with description of the progress bar
    bar:           bool enable progress bar,
                       or a callback function taking the number of passed iterations as an argument
    serial:        execute in series instead of parallel if True, None (default): let pmap decide
    length:        deprecated alias for total
    n_processes:   number of processes to use,
                       the parallel pool will be restarted if the current pool does not have the right number of processes
    yield_ordered: return the result in the same order as the iterable
    yield_index:   return the index of the result too
    **bar_kwargs:  keyword arguments for tqdm.tqdm

### Return
    list with results from applying the function 'fun' to each iteration of the iterable / iterator
    (gmap: a generator yielding these results)
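
The calling convention described above (each item is passed as the first argument, followed by `args` and `kwargs`) can be made concrete with a serial stand-in. `serial_map` is a hypothetical helper written for this illustration; it is not part of parfor and does nothing in parallel.

```python
def serial_map(fun, iterable, args=(), kwargs=None):
    """Serial stand-in: call fun(item, *args, **kwargs) for every item."""
    kwargs = kwargs or {}
    return [fun(item, *args, **kwargs) for item in iterable]


def fun(i, a, offset=0):
    return a * i ** 2 + offset


print(serial_map(fun, range(4), args=(3,), kwargs={"offset": 1}))  # [1, 4, 13, 28]
```

Going by the argument names listed above, the parallel counterpart would presumably be `pmap(fun, range(4), args=(3,), kwargs={"offset": 1})`.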
## Examples
### Normal serial for loop
    <<
    from time import sleep

    a = 3
    fun = []
    for i in range(10):
        sleep(1)
        fun.append(a * i ** 2)
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
### Using parfor to parallelize
    <<
    from time import sleep
    from parfor import parfor
    @parfor(range(10), (3,))
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

    <<
    @parfor(range(10), (3,), bar=False)
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
### Using parfor in a script/module/.py-file
Parfor should never be executed during the import phase of a .py-file. To prevent that from happening,
use the `if __name__ == '__main__':` structure:

    <<
    from time import sleep
    from parfor import parfor

    if __name__ == '__main__':
        @parfor(range(10), (3,))
        def fun(i, a):
            sleep(1)
            return a * i ** 2
        print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
or:

    <<
    from time import sleep
    from parfor import parfor

    def my_fun(*args, **kwargs):
        @parfor(range(10), (3,))
        def fun(i, a):
            sleep(1)
            return a * i ** 2
        return fun

    if __name__ == '__main__':
        print(my_fun())

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
### If you hate decorators not returning a function
`pmap` maps a function over an iterable like `map` does, but in parallel, and returns a list with the results.

    <<
    from parfor import pmap
    from time import sleep
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(pmap(fun, range(10), (3,)))

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
### Using generators
If iterables like lists and tuples are too big to fit in memory, use generators instead.
Since generators don't have a predefined length, you can optionally give parfor the length as the `total` argument.

    <<
    import numpy as np
    from parfor import parfor
    # imagereader is a placeholder for any sized collection of images
    c = (im for im in imagereader)
    @parfor(c, total=len(imagereader))
    def fun(im):
        return np.mean(im)

    >> [list with means of the images]
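
The reason `total` has to be supplied separately is that a generator does not support `len`. A minimal demonstration, with `square_numbers` as a made-up generator function:

```python
def square_numbers(n):
    """Yield the first n square numbers one at a time."""
    for i in range(n):
        yield i ** 2


items = square_numbers(10)
try:
    total = len(items)  # generators have no length
except TypeError:
    total = 10  # known from how the generator was built

print(total)  # 10
```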
# Extras
## `pmap`
The function that parfor is built on. It is used similarly to `map` and returns a list with the results.
## `gmap`
Same as pmap, but returns a generator. Useful to use the result as soon as it's generated.
## `Chunks`
Split a long iterator into bite-sized chunks to parallelize.
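
The interface of `Chunks` is not documented here, so the following only sketches the general idea of chunking with the standard library. `chunked` is a hypothetical helper, not parfor's `Chunks`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield consecutive lists of at most `size` items from iterable."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each chunk can then be handed to one parallel task, which reduces the per-item overhead when individual iterations are cheap.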
## `ParPool`
Lower-level access to parallel execution. Submit tasks and request their results at any time
(although to avoid breaking causality, submit first, then request). Different tasks can use different
functions and function arguments.
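
The submit-first, request-later pattern can be shown with the standard library's `concurrent.futures`. This is an analogy only: it does not use or reproduce the `ParPool` interface, and `square` and `add` are made-up task functions.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x ** 2


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit first: each task can use its own function and arguments.
    tasks = {"square": pool.submit(square, 7), "add": pool.submit(add, 2, 3)}
    # Request afterwards, in any order.
    total = tasks["add"].result()
    squared = tasks["square"].result()

print(squared, total)  # 49 5
```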
Raw data
{
"_id": null,
"home_page": "https://github.com/wimpomp/parfor",
"name": "parfor",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "parfor, concurrency, multiprocessing, parallel",
"author": "Wim Pomp",
"author_email": "wimpomp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/62/33/5c72bcba3250445b2e294a79e1a0a29afa881a56e1fb0baaf1a9efc821dd/parfor-2024.11.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "A package to mimic the use of parfor as done in Matlab.",
"version": "2024.11.1",
"project_urls": {
"Homepage": "https://github.com/wimpomp/parfor",
"Repository": "https://github.com/wimpomp/parfor"
},
"split_keywords": [
"parfor",
" concurrency",
" multiprocessing",
" parallel"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "eeae9a76344e6acdc8f31bbf99839c78c6bc3b34d2ad1009fd0169f4bc1e6c97",
"md5": "a6040b8c3acf374a9f47df83f91c429d",
"sha256": "b4dbba2deafe829df0cef58a30cce0edcdbe82c7a9dbcfa493c4dade48fc97e1"
},
"downloads": -1,
"filename": "parfor-2024.11.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a6040b8c3acf374a9f47df83f91c429d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 27439,
"upload_time": "2024-11-05T14:06:13",
"upload_time_iso_8601": "2024-11-05T14:06:13.706386Z",
"url": "https://files.pythonhosted.org/packages/ee/ae/9a76344e6acdc8f31bbf99839c78c6bc3b34d2ad1009fd0169f4bc1e6c97/parfor-2024.11.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "62335c72bcba3250445b2e294a79e1a0a29afa881a56e1fb0baaf1a9efc821dd",
"md5": "760e80b737d09da793fdefa459c4c49f",
"sha256": "f480333ac70465bbc4608e67197c3bd9d0b556346d4130dabc8c7e6b1bca3f82"
},
"downloads": -1,
"filename": "parfor-2024.11.1.tar.gz",
"has_sig": false,
"md5_digest": "760e80b737d09da793fdefa459c4c49f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 26585,
"upload_time": "2024-11-05T14:06:15",
"upload_time_iso_8601": "2024-11-05T14:06:15.269264Z",
"url": "https://files.pythonhosted.org/packages/62/33/5c72bcba3250445b2e294a79e1a0a29afa881a56e1fb0baaf1a9efc821dd/parfor-2024.11.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-05 14:06:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wimpomp",
"github_project": "parfor",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "parfor"
}