pychunkbuffers


Name: pychunkbuffers
Version: 1.0.4
Home page: https://github.com/AdityaIyer2k7/pychunkbuffers
Summary: An open-source python library for writing large amounts of data to buffers via chunks
Upload time: 2023-04-26 10:08:36
Author: AdityaIyer2k7
Requires Python: >=3.6
Keywords: python 3, threading, thread, chunking, buffers, pychunk, pybuffer
Requirements: none recorded
# pychunkbuffers
An open-source python library for writing large amounts of data to buffers via chunks.

## Description
This repository contains the source code for the `pychunkbuffers` library. I came up with the idea for this library while making my other project [AdityaIyer2k7/image-file-hider](https://github.com/AdityaIyer2k7/image-file-hider). In that project, I often had to write large amounts of data (hundreds of megabytes) to lists and buffers. Doing this byte-by-byte took a lot of time, so I came up with chunking as a solution.

Basically, let us say we have a `for` loop that has to run 10^8 times, and each iteration writes a value to a list. In a chunked implementation, you would pre-allocate this list like this:
```py
LIST = [0]*10**8  # pre-allocate all 10**8 slots up front
```
and then create a function that goes from index `a` to `b` and updates those values of the list, like this:
```py
def func(startidx, endidx):
  for i in range(startidx, endidx):
    LIST[i] = SOMEVALUE
```
However, if we run `func(0, 10**8)`, we are still running 10^8 iterations in sequence. Instead, we can run parts like `func(0, 10000)`, `func(10000, 20000)` and so on simultaneously on threads. With this library, we can simply use the line
```py
run_chunked(func, 10000, 0, 10**8) # Where 10000 is our chunk size, while 0 and 10**8 are our bounds
```
Now, we would like to check when all chunks have completed their tasks. The library implements this using a completion status list. The `run_chunked` function returns a list of boolean values which are all `False` when the chunks start. Whenever a chunk finishes its task, that specific chunk's status is set to `True` in the list. If we want to wait for all the chunks to finish, we can use a line like this:
```py
while not all(STATUS): pass
```
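To make the mechanism concrete, here is a minimal sketch of what such a chunked runner could look like on top of the standard `threading` module. `run_chunked_sketch` is a hypothetical illustration following the same argument order as the call above, not the library's actual code:
```py
import threading

def run_chunked_sketch(func, chunk_size, start, end):
    """Run func(lo, hi) over [start, end) in chunk_size pieces, one thread per chunk."""
    chunk_starts = list(range(start, end, chunk_size))
    status = [False] * len(chunk_starts)

    def worker(idx, lo):
        hi = min(lo + chunk_size, end)
        func(lo, hi)        # process this chunk's slice of the work
        status[idx] = True  # flip this chunk's completion flag

    for idx, lo in enumerate(chunk_starts):
        threading.Thread(target=worker, args=(idx, lo), daemon=True).start()

    return status  # the caller polls this list to see which chunks are done
```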
Example implementation:
```py
# Task: write the squares of the numbers 1 to 10**8 (inclusive)
from pychunkbuffers import run_chunked  # import path assumed; adjust to the library's actual layout
squares = [0]*10**8
CHUNKSIZE = 10**5
def func(startidx, endidx):
  for i in range(startidx, endidx):
    squares[i] = (i+1)**2
status = run_chunked(func, CHUNKSIZE, 0, len(squares))
while not all(status): pass
print("Done")
print(squares[:100])
```
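One small caveat about the wait loop above: `while not all(status): pass` keeps the main thread busy-waiting. A gentler variant, which assumes nothing about the library beyond the status list it returns, is to sleep briefly between polls:
```py
import time

# Poll the per-chunk completion flags without spinning a core at full speed
while not all(status):
    time.sleep(0.01)
```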

            
