| Name | easy-multiprocess |
| Version | 1.0.1 |
| Summary | Use all your cores with no extra code |
| Author | Pranay |
| License | MIT License |
| Upload time | 2024-06-30 20:38:45 |
| Requirements | none recorded |
# easy_multiprocess
```easy_multiprocess``` is a package that makes multiprocessing your code extremely simple.
___
#### Multiprocess your code with just 1 line!
```python
# Before:
def func1(x):
    # some heavy computing
    ...

a = [func1(i) for i in range(16)]

# After:
@parallelize
def func1(x):
    # some heavy computing
    ...

a = [func1(i) for i in range(16)]  # all calls run in parallel on 16 cores
```
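For intuition, here is a minimal sketch of how a decorator like `@parallelize` could be built on top of `concurrent.futures`. This is an illustration only, not the package's actual implementation; `easy_multiprocess`'s shared-pool handling and `FutureResult` unwrapping are more involved.

```python
from concurrent.futures import ProcessPoolExecutor

_pool = None  # lazily created, shared process pool

def parallelize(func):
    """Sketch: run each call of `func` in a worker process, returning a Future."""
    def wrapper(*args, **kwargs):
        global _pool
        if _pool is None:
            _pool = ProcessPoolExecutor()
        return _pool.submit(func, *args, **kwargs)
    return wrapper

def square(x):  # module-level, so it can be pickled by reference
    return x * x

if __name__ == "__main__":
    # applied manually here so the name `square` stays picklable
    # under the spawn start method
    fast_square = parallelize(square)
    futures = [fast_square(i) for i in range(4)]
    print([f.result() for f in futures])  # [0, 1, 4, 9]
```

Unlike the real package, this sketch returns plain `Future` objects, so the caller must call `.result()` explicitly.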
---
#### Other multiprocess libraries force a specific coding style/syntax.
Below is the same code from above, except using `concurrent.futures`:
```python
from concurrent.futures import ProcessPoolExecutor

def func1(x):
    # some heavy computation
    ...

# concurrent.futures
with ProcessPoolExecutor() as pool:
    a = list(pool.map(func1, range(16)))
```
---
#### Other multiprocess libraries don't use all cores (when multiple operations occur).
On a 16-core machine, let's see how long the following two approaches take:
```python
# Our machine has 16 cores
# func1, func2, ... each take 10 seconds

# 1: concurrent.futures (pool is a ProcessPoolExecutor):
a = list(pool.map(func1, range(4)))
b = list(pool.map(func2, range(4)))
c = list(pool.map(func3, range(4)))
d = list(pool.map(func4, range(4)))
# elapsed time = 40s (each map blocks, using only 4 of the 16 cores at a time)

# 2: easy_multiprocess:
a = [func1(i) for i in range(4)]
b = [func2(i) for i in range(4)]
c = [func3(i) for i in range(4)]
d = [func4(i) for i in range(4)]
# elapsed time = 10s (all 16 calls run at once)
```
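The gap comes from each blocking `pool.map` call occupying only 4 workers at a time. Submitting every call up front keeps all workers busy. A small runnable demonstration of that pattern with the standard library, using `time.sleep` as a stand-in for the 10-second functions:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def work(x):
    time.sleep(0.2)  # stand-in for a 10-second computation
    return x * x

if __name__ == "__main__":
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=16) as pool:
        # submit all 16 calls at once instead of four blocking map() calls
        futures = [pool.submit(work, i) for i in range(16)]
        results = [f.result() for f in futures]
    print(results[:4], f"elapsed ~{time.perf_counter() - start:.1f}s")
```

All 16 sleeps overlap, so the wall time is roughly one task's duration plus pool-startup overhead.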
---
#### Parallelize simple code
You can even use ```easy_multiprocess``` for simple code (that needs parallelizing):
```python
# func1, func2... each take 10 seconds
a = func1(0)
b = func2(1)
c = func3(2)
d = func4(3)
print(a, b, c, d)
# elapsed time = 10s
```
---
#### Non Embarrasingly Parallel Code
It even works for the non-[embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) case (but might be [suboptimal](#limitations)):
```python
# func1, func2, ... each take 10 seconds
a = func1(0)
b = func2(a)  # depends on a, so it must wait for a
c = func3(2)  # c/d are independent, but currently still wait for b
d = func4(3)
print(a, b, c, d)
# elapsed time = 30s
```
`easy_multiprocess` implicitly uses a DAG computation graph for this (other libraries have similar mechanisms, such as [Ray's DAG](https://docs.ray.io/en/latest/ray-core/ray-dag.html)). See [Limitations](#limitations) for where this doesn't work.
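A toy version of that dependency handling: before running a task, resolve any `Future` arguments, so downstream calls can consume upstream results transparently. This sketch uses threads to sidestep pickling and is only an illustration of the idea, not `easy_multiprocess`'s actual `FutureResult` machinery.

```python
from concurrent.futures import Future, ThreadPoolExecutor

def submit_resolving(pool, func, *args):
    """Submit func, first replacing any Future argument with its result."""
    def run():
        resolved = [a.result() if isinstance(a, Future) else a for a in args]
        return func(*resolved)
    return pool.submit(run)

def add_one(x):
    return x + 1

def times_ten(x):
    return x * 10

if __name__ == "__main__":
    with ThreadPoolExecutor() as pool:
        a = submit_resolving(pool, add_one, 1)    # a -> 2
        b = submit_resolving(pool, times_ten, a)  # waits on a, b -> 20
        print(b.result())  # 20
```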
---
#### User Installation
On Mac/Linux/Unix-like:
```
pip install easy_multiprocess
```
(Windows not currently supported)
---
#### Developer Installation
```
git clone <this_repo>
cd easymultiprocess
pip install -e .
```
Then, run tests:
```
python -m unittest tests.test
```
---
#### Author Notes
I built ```easy_multiprocess``` simply to learn how to build a Python package.
It's built on top of ```concurrent.futures``` rather than from the ground up on OS-level primitives, since that would have taken over 10x as much time and code. **This means it has MANY limitations.**
#### Limitations:
- ```is``` comparisons aren't supported for ```FutureResult``` objects because of Python's identity semantics. For ```is``` operations involving the output of any ```@parallelize```-d function, use ```==``` instead, or call ```.result()``` first (as with ```future``` objects from other multiprocessing libraries). Supporting ```is``` directly would require an inefficient custom fork of the Python interpreter, which would defeat the purpose of user-friendliness.
- Standard IO streams are not guaranteed to work correctly.
- The non-embarrassingly parallel case is implemented suboptimally (the [example](#non-embarrasingly-parallel-code) above should take 20s in the ideal case); this can be improved in the future.
- Requires copy-on-write, so it only works on Mac/Linux/Unix-like systems (those with the `fork` start method).
General limitations of all common Python multiprocessing libraries:
- Closure variables cannot be created or updated once processes are set up (for the standard library's ```concurrent.futures```, this happens on first submission to the executor). You can work around this by calling ```ProcessPoolManager.cleanup``` and then ```get_executor``` again. (TODO: add code sample)
- Args must be ```pickle```-able (other cases can also work, e.g. if the library uses ```dill``` or another serialization method).
- If you have lots of state, creating new processes can be expensive (copy-on-write is not guaranteed).
- Program correctness is not guaranteed when external-state race conditions exist (e.g. parallel processes writing to/reading from the same file).
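The `pickle` constraint above can be seen directly with the standard library: module-level functions pickle by reference and cross the process boundary fine, while lambdas do not. A short sketch, independent of `easy_multiprocess`:

```python
import pickle
from concurrent.futures import ProcessPoolExecutor

def double(x):  # module-level: picklable by reference
    return 2 * x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(pool.submit(double, 21).result())  # 42
    try:
        pickle.dumps(lambda x: x)  # lambdas have no importable name
    except (pickle.PicklingError, AttributeError):
        print("lambda is not picklable")
```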
Other Notes:
- The ```@parallelize``` decorator will send off the code it wraps to another process
- ```parallelize``` sounds more intuitive (and cooler), but ```concurrent``` is technically "correct". If you want, you can use ```@concurrent``` instead
Future improvements:
- Avoid `pickle`-ing arguments. Instead, wrap the function and turn its `args` into closure variables (so copy-on-write would apply). Then even a large arg (such as a large ML model) would neither delay the subprocess nor need to be copied over to it.
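A sketch of what that improvement could look like with the standard library: the large object lives at module level in the parent, and under the `fork` start method the child inherits it via copy-on-write instead of receiving it as a pickled argument. Hypothetical illustration; these names are not from `easy_multiprocess`.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

big_data = list(range(100_000))  # stands in for a large ML model

def lookup(i):
    # reads the inherited global; nothing large is pickled across the pipe
    return big_data[i]

if __name__ == "__main__":
    # prefer fork so children share big_data copy-on-write;
    # fall back to the platform default elsewhere
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        print(list(pool.map(lookup, [0, 1, 99_999])))  # [0, 1, 99999]
```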