cluster-pack


- Name: cluster-pack
- Version: 0.3.8
- Home page: https://github.com/criteo/cluster-pack
- Summary: A library on top of either pex or conda-pack to make your Python code easily available on a cluster
- Upload time: 2024-09-16 08:25:55
- Maintainer: Criteo
- Requires Python: >=3.7
- Keywords: hadoop, distributed, cluster, s3, hdfs

# cluster-pack

cluster-pack is a library on top of either [pex][pex] or [conda-pack][conda-pack] to make your Python code easily available on a cluster.

Its goal is to make your prod/dev Python code & libraries easily available on any cluster. cluster-pack supports HDFS/S3 as distributed storage.

The first examples use [Skein][skein] (a simple library for deploying applications on Apache YARN) and [PySpark](https://spark.apache.org/docs/latest/quick-start.html) with HDFS storage. We intend to add more examples for other applications (like [Dask](https://dask.org/), [Ray](https://ray.readthedocs.io/en/latest/index.html)) and S3 storage.

An introductory blog post can be found [here](https://medium.com/criteo-labs/open-sourcing-cluster-pack-700f46c139a).

![cluster-pack](https://github.com/criteo/cluster-pack/blob/master/cluster_pack.png?raw=true)

## Installation

### Install with Pip

```bash
$ pip install cluster-pack
```

### Install from source

```bash
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
```

## Prerequisites

cluster-pack supports Python ≥3.7.

## Features

- Ships a package with all the dependencies from your current virtual environment or your conda environment

- Stores metadata for an environment

- Supports "under development" mode by taking advantage of pip's [editable installs mode][editable_installs_mode]: all editable requirements are uploaded every time, making local changes directly visible on the cluster

- Interactive (Jupyter notebook) mode

- Provides config helpers to directly use the uploaded zip file inside your application

- Launches jobs from jobs by propagating all artifacts
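
The metadata and "under development" features above both depend on knowing which packages are installed and which of them were installed in editable mode. The sketch below is not cluster-pack's implementation. It is a minimal standard-library illustration, assuming Python 3.8+ (for `importlib.metadata`) and a pip version that writes the PEP 610 `direct_url.json` record for editable installs; older `setup.py develop` style installs would not be detected. The names `is_editable_record` and `describe_environment` are hypothetical.

```python
import json
from importlib import metadata  # available in the standard library from Python 3.8
from typing import Optional


def is_editable_record(raw: Optional[str]) -> bool:
    """Return True if a PEP 610 direct_url.json payload marks an editable install."""
    if raw is None:
        return False
    try:
        info = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    dir_info = info.get("dir_info")
    return isinstance(dir_info, dict) and dir_info.get("editable") is True


def describe_environment() -> dict:
    """Build a small metadata record for the current Python environment."""
    packages = {}
    editable = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with missing or broken metadata
        packages[name] = dist.metadata.get("Version")
        if is_editable_record(dist.read_text("direct_url.json")):
            editable.append(name)
    return {"packages": packages, "editable": sorted(editable)}


if __name__ == "__main__":
    # Print the record as JSON, e.g. to store it next to a packaged environment.
    print(json.dumps(describe_environment(), indent=2, sort_keys=True))
```

A record like this could be stored alongside an uploaded archive so that a later run can tell whether the environment has changed; whether cluster-pack stores the same fields is not stated here.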


## Basic examples with [skein][skein]

1) [Interactive mode](https://github.com/criteo/cluster-pack/blob/master/examples/interactive-mode/README.md)

2) [Self shipping project](https://github.com/criteo/cluster-pack/blob/master/examples/skein-project/README.md)


## Basic examples with [PySpark](https://spark.apache.org/docs/latest/quick-start.html)

1) [PySpark with HDFS on Yarn](https://github.com/criteo/cluster-pack/blob/master/examples/spark/spark_example.py)

2) [Docker with PySpark on S3](https://github.com/criteo/cluster-pack/blob/master/examples/spark-with-S3/README.md)

[pex]: https://github.com/pantsbuild/pex
[conda-pack]: https://github.com/conda/conda-pack
[editable_installs_mode]: https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs
[skein]: https://jcrist.github.io/skein/



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/criteo/cluster-pack",
    "name": "cluster-pack",
    "maintainer": "Criteo",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "github@criteo.com",
    "keywords": "hadoop distributed cluster S3 HDFS",
    "author": null,
    "author_email": null,
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library on top of either pex or conda-packto make your Python code easily available on a cluster",
    "version": "0.3.8",
    "project_urls": {
        "Homepage": "https://github.com/criteo/cluster-pack"
    },
    "split_keywords": [
        "hadoop",
        "distributed",
        "cluster",
        "s3",
        "hdfs"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "20ae6b08d9eb0407faa3031c92732f0786c22d8bf5b1d3a4fcdabd6422051e4d",
                "md5": "1017f09dcad33a36efd052d56e5774bc",
                "sha256": "a348c71a19cb438547370d07f3c19abbb02a73da7399ad0174cd8a2ab0f7a376"
            },
            "downloads": -1,
            "filename": "cluster_pack-0.3.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1017f09dcad33a36efd052d56e5774bc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 33777,
            "upload_time": "2024-09-16T08:25:55",
            "upload_time_iso_8601": "2024-09-16T08:25:55.798126Z",
            "url": "https://files.pythonhosted.org/packages/20/ae/6b08d9eb0407faa3031c92732f0786c22d8bf5b1d3a4fcdabd6422051e4d/cluster_pack-0.3.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-16 08:25:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "criteo",
    "github_project": "cluster-pack",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "cluster-pack"
}
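
The `digests` entry in the raw data above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path in the usage note is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path

# sha256 digest copied from the "digests" entry of the raw data above.
EXPECTED_SHA256 = "a348c71a19cb438547370d07f3c19abbb02a73da7399ad0174cd8a2ab0f7a376"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's sha256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected(Path("cluster_pack-0.3.8-py3-none-any.whl"))` should return `True` for an intact copy of the wheel listed above, assuming it was saved under that name in the current directory.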
        