pdp


Name: pdp
Version: 0.3.0
home_page: https://github.com/Egor-Krivov/pdp
Summary: Build fast data processing pipelines easily
upload_time: 2018-09-20 13:15:41
docs_url: None
author: Egor-Krivov
license: MIT
keywords: pipeline parallel thread data processing augmentation
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

========================
Pipeline Data Processing
========================

Why?
----
Many tasks in machine learning, deep learning and other fields require complex data processing that takes a lot of time. Ideally, this processing should run in parallel with the main process, preparing data for use (by a neural net, for instance). PDP provides a simple interface for organizing a data processing pipeline out of simple blocks that cover the most typical needs.

Use cases
--------------
* Neural net training, where you need to train the net, load data from disk and augment it. PDP lets the user do all of these at the same time without using the *threading* module directly.
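
To make the idea concrete, here is a minimal sketch of a threaded pipeline built only on the standard library. It is not PDP's API: the names ``pipeline``, ``run_stage``, ``load`` and ``augment`` are hypothetical and exist only for this illustration. Each stage runs in its own thread and passes items to the next stage through a bounded queue, so loading, augmentation and consumption can overlap.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and put the result into outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def pipeline(source, *funcs, buffer_size=4):
    """Yield items from source after passing them through funcs.

    One thread feeds the first queue and one thread runs each function.
    Bounded queues keep a fast stage from running far ahead of a slow one.
    """
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]

    def feed():
        for item in source:
            queues[0].put(item)
        queues[0].put(_STOP)

    threads = [threading.Thread(target=feed, daemon=True)]
    for func, inbox, outbox in zip(funcs, queues, queues[1:]):
        threads.append(
            threading.Thread(target=run_stage, args=(func, inbox, outbox), daemon=True)
        )
    for t in threads:
        t.start()

    # The caller (for example, a training loop) consumes from the last queue.
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        yield item
    for t in threads:
        t.join()


def load(i):
    """Hypothetical stand-in for reading a sample from disk."""
    return list(range(i))


def augment(sample):
    """Hypothetical stand-in for data augmentation."""
    return [v * 2 for v in sample]


batches = list(pipeline(range(4), load, augment))
print(batches)
```

Because every stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered it. A real pipeline would usually also need error handling inside the stages, which this sketch leaves out.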

Examples
--------
Examples are in the *examples* folder of the repository.

Is it fast? 
-----------
Speed and parallel execution are top priorities. Right now pipeline stages exchange data between threads rather than processes, because passing data between threads is more memory- and CPU-efficient. Python's threads are limited by the GIL, but the GIL is released during blocking IO and inside many numpy operations, so it has little effect on IO-bound tasks and numpy-heavy code. Since data augmentation is likely to be done mostly with numpy operations, performance should not be significantly affected by the GIL.
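
The claim about IO-bound work can be checked with a small standard-library sketch. It does not use PDP; ``fake_read`` is a hypothetical stand-in for a blocking disk or network read, with ``time.sleep`` playing the role of the blocking call. Sleeping releases the GIL just as real blocking IO does, so the threaded run should take roughly as long as one read instead of four.

```python
import threading
import time


def fake_read(delay):
    """Hypothetical stand-in for a blocking disk or network read."""
    time.sleep(delay)  # the GIL is released while the thread sleeps


def timed(run):
    """Return the wall-clock time in seconds taken by run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def sequential(n, delay):
    """Perform n reads one after another in the calling thread."""
    for _ in range(n):
        fake_read(delay)


def threaded(n, delay):
    """Perform n reads at the same time, one thread per read."""
    threads = [threading.Thread(target=fake_read, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


seq_time = timed(lambda: sequential(4, 0.1))
thr_time = timed(lambda: threaded(4, 0.1))
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Pure-Python CPU-bound work would not show the same speedup, because only one thread can execute Python bytecode at a time.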

Installation
------------
:code:`pip install pdp`

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Egor-Krivov/pdp",
    "name": "pdp",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "pipeline parallel thread data processing augmentation",
    "author": "Egor-Krivov",
    "author_email": "e.a.krivov@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e4/f9/5e4886980fd2a86013055142f9f3c9f94d3205495a0603a55d5eae32ac9d/pdp-0.3.0.tar.gz",
    "platform": "",
    "description": "=======================\nPipline Data Processing\n=======================\n\nWhy?\n----\nMany tasks in machine learning, deep learning and other fields require complex data processing that takes a lot of time. Ideally, this processing should run in parallel to the main process, preparing data for usage (by neural net, for instance). PDP provide simple interface to organize pipeline of data processing with simple blocks that satisfy most typical needs.\n\nUse cases\n--------------\n* Neural Net training, where you need a way to train net, load data from the disk and augment it. PDP allows user to do all these things at the same time without need to use *threading* module directly.\n\nExamples\n--------\nAre in repository in *examples* folder\n\nIs it fast? \n-----------\nSpeed and parallel execution is a top priority. Right now threads are used to exchange information between pipline stages, because it's memory and CPU efficient to exchange data between threads and not processes. Python's threads are flawed by GIL, but it doesn't affect performance for IO-bound tasks and for numpy operations. Since all operations for data augmentations are likely to be done in numpy operations, performance will not be significantly affected by GIL.\n\nInstallation\n------------\n:code:`pip install pdp`\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Build fast data processing pipelines easily",
    "version": "0.3.0",
    "project_urls": {
        "Download": "https://github.com/Egor-Krivov/pdp/archive/v0.3.0.tar.gz",
        "Homepage": "https://github.com/Egor-Krivov/pdp"
    },
    "split_keywords": [
        "pipeline",
        "parallel",
        "thread",
        "data",
        "processing",
        "augmentation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e4f95e4886980fd2a86013055142f9f3c9f94d3205495a0603a55d5eae32ac9d",
                "md5": "63bcd72870c1619effd5d5f371a1dcd2",
                "sha256": "248b9e714efbaccd3d1d7f70a67a315aad8e54aecd9756320aadd1a951cf5171"
            },
            "downloads": -1,
            "filename": "pdp-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "63bcd72870c1619effd5d5f371a1dcd2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5223,
            "upload_time": "2018-09-20T13:15:41",
            "upload_time_iso_8601": "2018-09-20T13:15:41.006384Z",
            "url": "https://files.pythonhosted.org/packages/e4/f9/5e4886980fd2a86013055142f9f3c9f94d3205495a0603a55d5eae32ac9d/pdp-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2018-09-20 13:15:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Egor-Krivov",
    "github_project": "pdp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pdp"
}
        