cluster-pack


- Name: cluster-pack
- Version: 0.3.5
- Home page: https://github.com/criteo/cluster-pack
- Summary: A library on top of either pex or conda-pack to make your Python code easily available on a cluster
- Upload time: 2024-03-13 14:47:12
- Maintainer: Criteo
- Requires Python: >=3.6
- Keywords: hadoop, distributed, cluster, s3, hdfs
- Requirements: cloudpickle, pex, conda-pack, pip, pyarrow, fire, types-setuptools, wheel-filename

# cluster-pack

cluster-pack is a library on top of either [pex][pex] or [conda-pack][conda-pack] to make your Python code easily available on a cluster.

Its goal is to make your prod/dev Python code and libraries easily available on any cluster. cluster-pack supports HDFS and S3 as distributed storage.

The first examples use [Skein][skein] (a simple library for deploying applications on Apache YARN) and [PySpark](https://spark.apache.org/docs/latest/quick-start.html) with HDFS storage. We intend to add more examples for other applications (like [Dask](https://dask.org/), [Ray](https://ray.readthedocs.io/en/latest/index.html)) and S3 storage.

An introductory blog post can be found [here](https://medium.com/criteo-labs/open-sourcing-cluster-pack-700f46c139a).

![cluster-pack](https://github.com/criteo/cluster-pack/blob/master/cluster_pack.png?raw=true)

## Installation

### Install with Pip

```bash
$ pip install cluster-pack
```

### Install from source

```bash
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
```

## Prerequisites

cluster-pack supports Python ≥3.6.
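
A quick preflight check can confirm both the interpreter version and that the package is importable after installation. This is only a minimal sketch using the standard library; the helper names `python_is_supported` and `module_is_installed` are made up for illustration, and `cluster_pack` is assumed to be the import name (it matches the wheel file name).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def module_is_installed(module_name="cluster_pack"):
    """Return True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python supported:", python_is_supported())
print("cluster_pack importable:", module_is_installed())
```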

## Features

- Ships a package with all the dependencies from your current virtual environment or your conda environment

- Stores metadata for an environment

- Supports "under development" mode by taking advantage of pip's [editable installs mode][editable_installs_mode]: all editable requirements are uploaded every time, making local changes directly visible on the cluster

- Interactive (Jupyter notebook) mode

- Provides config helpers to directly use the uploaded zip file inside your application

- Launches jobs from within jobs by propagating all artifacts
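
To make the ideas of environment metadata and editable requirements more concrete, here is a rough sketch built only on the standard library. It is not cluster-pack's implementation: `describe_environment` and `is_editable` are hypothetical helpers, and the editable check assumes the installer wrote a PEP 610 `direct_url.json` file, which recent pip versions do but older ones may not.

```python
import json
import sys
from importlib import metadata  # available from Python 3.8


def is_editable(dist):
    """Best-effort check for the PEP 610 marker of an editable install."""
    raw = dist.read_text("direct_url.json")
    if not raw:
        return False
    try:
        return bool(json.loads(raw).get("dir_info", {}).get("editable", False))
    except (ValueError, AttributeError):
        # Malformed or unexpected content: treat as a regular install.
        return False


def describe_environment():
    """Collect a JSON-serializable snapshot of the current environment."""
    packages = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name:
            packages[name] = {
                "version": dist.version,
                "editable": is_editable(dist),
            }
    return {"python": "%d.%d" % sys.version_info[:2], "packages": packages}


snapshot = describe_environment()
editable = sorted(n for n, info in snapshot["packages"].items() if info["editable"])
print("packages:", len(snapshot["packages"]), "editable:", editable)
```

A snapshot like this could be stored next to an uploaded archive so that a later run can tell whether the environment has changed; whether cluster-pack records the same fields is not stated here.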


## Basic examples with [skein][skein]

1) [Interactive mode](https://github.com/criteo/cluster-pack/blob/master/examples/interactive-mode/README.md)

2) [Self shipping project](https://github.com/criteo/cluster-pack/blob/master/examples/skein-project/README.md)


## Basic examples with [PySpark](https://spark.apache.org/docs/latest/quick-start.html)

1) [PySpark with HDFS on Yarn](https://github.com/criteo/cluster-pack/blob/master/examples/spark/spark_example.py)

2) [Docker with PySpark on S3](https://github.com/criteo/cluster-pack/blob/master/examples/spark-with-S3/README.md)

[pex]: https://github.com/pantsbuild/pex
[conda-pack]: https://github.com/conda/conda-pack
[editable_installs_mode]: https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs
[skein]: https://jcrist.github.io/skein/



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/criteo/cluster-pack",
    "name": "cluster-pack",
    "maintainer": "Criteo",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "github@criteo.com",
    "keywords": "hadoop distributed cluster S3 HDFS",
    "author": "",
    "author_email": "",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A library on top of either pex or conda-pack to make your Python code easily available on a cluster",
    "version": "0.3.5",
    "project_urls": {
        "Homepage": "https://github.com/criteo/cluster-pack"
    },
    "split_keywords": [
        "hadoop",
        "distributed",
        "cluster",
        "s3",
        "hdfs"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cabaa508d0cbe4eabadcaa3b344274a3718ad5aaaf7e45fa07594db98d51ec5b",
                "md5": "4b71ba15a0f951d859dd30407c9d25e2",
                "sha256": "ad43804656d0127261737bdf4f5be40d482e6f68de18d1ad9e9c852b1e0e02ba"
            },
            "downloads": -1,
            "filename": "cluster_pack-0.3.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4b71ba15a0f951d859dd30407c9d25e2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 33377,
            "upload_time": "2024-03-13T14:47:12",
            "upload_time_iso_8601": "2024-03-13T14:47:12.392395Z",
            "url": "https://files.pythonhosted.org/packages/ca/ba/a508d0cbe4eabadcaa3b344274a3718ad5aaaf7e45fa07594db98d51ec5b/cluster_pack-0.3.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-13 14:47:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "criteo",
    "github_project": "cluster-pack",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "cloudpickle",
            "specs": []
        },
        {
            "name": "pex",
            "specs": [
                [
                    "==",
                    "2.1.137"
                ]
            ]
        },
        {
            "name": "conda-pack",
            "specs": []
        },
        {
            "name": "pip",
            "specs": [
                [
                    ">=",
                    "18.1"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": []
        },
        {
            "name": "fire",
            "specs": []
        },
        {
            "name": "types-setuptools",
            "specs": []
        },
        {
            "name": "wheel-filename",
            "specs": []
        }
    ],
    "tox": true,
    "lcname": "cluster-pack"
}
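
The raw data above lends itself to two small checks: rendering the `requirements` entries as pip-style requirement strings, and comparing a downloaded file against the published `sha256` digest. The sketch below assumes the JSON layout shown above; `format_requirements` and `matches_sha256` are illustrative names, not part of cluster-pack or of any PyPI client.

```python
import hashlib


def format_requirements(requirements):
    """Turn [{'name': ..., 'specs': [[op, version], ...]}, ...] into pip-style strings."""
    lines = []
    for req in requirements:
        spec = ",".join(op + version for op, version in req["specs"])
        lines.append(req["name"] + spec)
    return lines


def matches_sha256(data, expected_hex):
    """Compare the SHA-256 of downloaded bytes with a published hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


# Excerpt of the "requirements" entries from the raw data.
requirements = [
    {"name": "cloudpickle", "specs": []},
    {"name": "pex", "specs": [["==", "2.1.137"]]},
    {"name": "pip", "specs": [[">=", "18.1"]]},
]
print(format_requirements(requirements))
# prints ['cloudpickle', 'pex==2.1.137', 'pip>=18.1']
```

In practice, `matches_sha256` would be called with the bytes of the downloaded wheel and the `sha256` value from the matching `urls` entry.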
        