:Name: django-crunch
:Version: 0.1.12
:Home page: https://github.com/rbturnbull/django-crunch
:Summary: A data processing orchestration tool.
:Upload time: 2023-01-23 12:33:28
:Author: Robert Turnbull, Mar Quiroga and Simon Mutch
:Requires Python: >=3.8,<=3.11
:License: Apache-2.0
:Keywords: django, data processing

================================================================
django-crunch
================================================================

.. image:: https://raw.githubusercontent.com/rbturnbull/django-crunch/main/docs/images/crunch-banner.svg

.. start-badges

|testing badge| |coverage badge| |docs badge| |black badge|

.. |testing badge| image:: https://github.com/rbturnbull/django-crunch/actions/workflows/testing.yml/badge.svg
    :target: https://github.com/rbturnbull/django-crunch/actions

.. |docs badge| image:: https://github.com/rbturnbull/django-crunch/actions/workflows/docs.yml/badge.svg
    :target: https://rbturnbull.github.io/django-crunch
    
.. |black badge| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/psf/black
    
.. |coverage badge| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/rbturnbull/d83b00666fad82df59a814083a09d1c1/raw/coverage-badge.json
    :target: https://rbturnbull.github.io/django-crunch/coverage/
    
.. end-badges


.. start-quickstart

A data processing orchestration tool.
Crunch lets you visualize datasets, orchestrate and manage their processing online, and present the results to the world.

Description
===========

Crunch coordinates three components. The first is a web application in the cloud. 
Crunch includes a Django app that can be added to any website built with the Django framework. 
The website lets users create what we call ``datasets``. 
Each dataset corresponds to one collection of files and metadata that is designed to be run through the workflow. 
Each dataset has its own page on the website, which displays all the information about it.

The second component is data storage. 
The main use case for Crunch is when your datasets are so numerous and so large that you cannot fit them all onto the disk where you are doing your computation. 
With crunch, the data can be stored in any of the media storage options available with Django. 
These can be Amazon S3, Google Cloud or many other storage options. 
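
As a rough illustration of pointing Django's media storage at Amazon S3, the settings below assume the third-party ``django-storages`` package with its S3 backend. The bucket name and the ``CRUNCH_BUCKET`` environment variable are hypothetical. Django 4.2 and later configure the same thing through the ``STORAGES`` setting instead.

.. code-block:: python

    import os

    # Assumption: django-storages is installed and provides this S3 backend.
    DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

    # Hypothetical bucket name, read from the environment so it is not hard-coded.
    AWS_STORAGE_BUCKET_NAME = os.environ.get("CRUNCH_BUCKET", "my-crunch-bucket")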

The third component is the client, which runs wherever you are performing the computation. 
This could be a high-performance computing environment, a virtual machine in the cloud or just your own laptop. 
The user simply runs the ``crunch`` command-line tool. 
The command-line tool communicates with the website to find out which dataset should be processed next, 
then copies that dataset from storage and saves it locally. 
It then processes the dataset using a predefined workflow provided by the user. 
When processing is finished, it copies the resulting files back to the storage. 
At each stage the user can see the status on the website interface. 
If you run the ``crunch loop`` command, the client keeps looping through the datasets until they are all finished. 
You can run as many clients in parallel as you have computing resources.
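
The fetch, process and upload cycle described above can be sketched in plain Python. This is only an in-memory illustration of the control flow, not the crunch client or its API: ``Dataset``, ``FakeSite``, ``workflow`` and ``loop`` are hypothetical names invented for this example, and the "storage" is just a dictionary.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional


    @dataclass
    class Dataset:
        """Hypothetical stand-in for a dataset record on the website."""
        name: str
        status: str = "pending"  # pending -> processing -> completed
        files: Dict[str, str] = field(default_factory=dict)


    class FakeSite:
        """In-memory stand-in for the website and storage (not the crunch API)."""

        def __init__(self, datasets: List[Dataset]):
            self.datasets = datasets

        def next_dataset(self) -> Optional[Dataset]:
            # Ask the "website" which dataset should be processed next.
            for dataset in self.datasets:
                if dataset.status == "pending":
                    dataset.status = "processing"
                    return dataset
            return None


    def workflow(local_files: Dict[str, str]) -> Dict[str, str]:
        """User-provided processing step; here it just upper-cases each file."""
        return {f"{name}.out": text.upper() for name, text in local_files.items()}


    def loop(site: FakeSite) -> int:
        """Process datasets one at a time until none are pending."""
        processed = 0
        while (dataset := site.next_dataset()) is not None:
            local_files = dict(dataset.files)  # copy from storage to local disk
            results = workflow(local_files)    # run the workflow locally
            dataset.files.update(results)      # copy the results back to storage
            dataset.status = "completed"       # report the status to the website
            processed += 1
        return processed


    site = FakeSite([
        Dataset("a", files={"raw.txt": "abc"}),
        Dataset("b", files={"raw.txt": "xyz"}),
    ])
    count = loop(site)

Because each call to ``next_dataset`` marks the dataset as ``processing`` before returning it, several such loops could share one site without picking the same dataset twice, which is the property that lets clients run in parallel.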

Once each dataset is processed, the resulting files can be accessed via the website. 
The permissions for the website can be set dynamically: you can restrict access 
to just your team while the data is being processed and, once the results are ready, open access to the public.


Installation
==================================

The crunch app for a Django website and the command-line client are installed with pip:

.. code-block:: bash

    pip install git+https://github.com/rbturnbull/django-crunch


Install the Crunch app in your Django website project by adding it to the settings:

.. code-block:: python

    INSTALLED_APPS += [
        "crunch",
    ]

Then add the urls to your main urls.py:

.. code-block:: python

    from django.urls import include, path

    urlpatterns += [
        # Mount the crunch URLs under the "crunch/" prefix.
        path('crunch/', include('crunch.django.app.urls')),
    ]

The path ``crunch/`` can be changed to be whatever you choose.

Usage
==================================

Create projects, datasets, items and attributes on the website using the HTML interface, the crunch command-line client or the Python API.

Upload the initial data for each dataset to the storage as needed, either through the Crunch HTML interface or directly to the dataset's folder on the storage.

Then process each dataset with the crunch client at the location where you are performing your computation. All datasets can be processed with a single command:

.. code-block:: bash

    crunch loop

.. end-quickstart

Credits
==================================

.. start-credits

Robert Turnbull, Mar Quiroga and Simon Mutch from the Melbourne Data Analytics Platform.

Publication and citation details to follow.

.. end-credits

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/rbturnbull/django-crunch",
    "name": "django-crunch",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<=3.11",
    "maintainer_email": "",
    "keywords": "django,data processing",
    "author": "Robert Turnbull, Mar Quiroga and Simon Mutch",
    "author_email": "robert.turnbull@unimelb.edu.au",
    "download_url": "https://files.pythonhosted.org/packages/be/f3/75f50389d4988e7392159273169c377b8ac08ad56cba2b622f38c1919696/django_crunch-0.1.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "A data processing orcestration tool.",
    "version": "0.1.12",
    "split_keywords": [
        "django",
        "data processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "625285a377f65994176ab0d6bd9339b844dc94f5e7ab0d05b21cb1cf33b2ed55",
                "md5": "1a3c79604bacbb33c8738f74c9e2c6d0",
                "sha256": "a380b407d4601f7502b772cfc795f9ba224788f629dcc3dc33718d6add514a19"
            },
            "downloads": -1,
            "filename": "django_crunch-0.1.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1a3c79604bacbb33c8738f74c9e2c6d0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<=3.11",
            "size": 49459,
            "upload_time": "2023-01-23T12:33:25",
            "upload_time_iso_8601": "2023-01-23T12:33:25.793779Z",
            "url": "https://files.pythonhosted.org/packages/62/52/85a377f65994176ab0d6bd9339b844dc94f5e7ab0d05b21cb1cf33b2ed55/django_crunch-0.1.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bef375f50389d4988e7392159273169c377b8ac08ad56cba2b622f38c1919696",
                "md5": "944dfe0911683ecb888faeb33a1f4cc1",
                "sha256": "a16ca77c52f9973dfc369c9e053d2ca669658c753c9052635bf68b2178ba904b"
            },
            "downloads": -1,
            "filename": "django_crunch-0.1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "944dfe0911683ecb888faeb33a1f4cc1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<=3.11",
            "size": 37908,
            "upload_time": "2023-01-23T12:33:28",
            "upload_time_iso_8601": "2023-01-23T12:33:28.328737Z",
            "url": "https://files.pythonhosted.org/packages/be/f3/75f50389d4988e7392159273169c377b8ac08ad56cba2b622f38c1919696/django_crunch-0.1.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-23 12:33:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "rbturnbull",
    "github_project": "django-crunch",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "django-crunch"
}
        