# cluster-pack
cluster-pack is a library on top of either [pex][pex] or [conda-pack][conda-pack] to make your Python code easily available on a cluster.
Its goal is to make your prod/dev Python code & libraries easily available on any cluster. cluster-pack supports HDFS and S3 as distributed storage.
The first examples use [Skein][skein] (a simple library for deploying applications on Apache YARN) and [PySpark](https://spark.apache.org/docs/latest/quick-start.html) with HDFS storage. We intend to add more examples for other applications (like [Dask](https://dask.org/), [Ray](https://ray.readthedocs.io/en/latest/index.html)) and S3 storage.
An introductory blog post can be found [here](https://medium.com/criteo-labs/open-sourcing-cluster-pack-700f46c139a).
![cluster-pack](https://github.com/criteo/cluster-pack/blob/master/cluster_pack.png?raw=true)
## Installation
### Install with Pip
```bash
$ pip install cluster-pack
```
### Install from source
```bash
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
```
## Prerequisites
cluster-pack supports Python ≥3.8.
## Features
- Ships a package with all the dependencies from your current virtual environment or your conda environment
- Stores metadata for an environment
- Supports "under development" mode by taking advantage of pip's [editable installs mode][editable_installs_mode]: all editable requirements are re-uploaded every time, so local changes are directly visible on the cluster
- Interactive (Jupyter notebook) mode
- Provides config helpers to directly use the uploaded zip file inside your application
- Launches jobs from within jobs by propagating all artifacts
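
To make the "stores metadata" and "editable installs" features more concrete, here is a minimal sketch of how an environment description could be collected with the standard library alone. It is an illustration under stated assumptions, not cluster-pack's actual implementation or metadata format: the function names `is_editable` and `describe_environment` and the output keys are hypothetical, and editable detection relies on the `direct_url.json` file that recent pip versions write (PEP 610), so older editable installs may not be detected.

```python
import json
import sys
from importlib import metadata


def is_editable(dist: metadata.Distribution) -> bool:
    """Return True if pip recorded this distribution as editable (PEP 610)."""
    raw = dist.read_text("direct_url.json")
    if raw is None:
        # No direct_url.json: a regular install from an index.
        return False
    try:
        info = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    dir_info = info.get("dir_info")
    return isinstance(dir_info, dict) and bool(dir_info.get("editable", False))


def describe_environment() -> dict:
    """Collect the interpreter version and the installed distributions.

    Hypothetical helper: the returned keys are chosen for this example only.
    """
    packages = {}
    editable = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name is None:
            # Skip broken or partial installs that carry no name.
            continue
        packages[name] = dist.version
        if is_editable(dist):
            editable.append(name)
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
        "editable_packages": sorted(editable),
    }


if __name__ == "__main__":
    # Print the description as JSON, e.g. to store it next to a packaged environment.
    print(json.dumps(describe_environment(), indent=2))
```

A description like this could be stored alongside a packaged environment so that a later run can tell whether the environment changed, and which packages would need to be shipped again because they are installed in editable mode.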
## Basic examples with [skein][skein]
1) [Interactive mode](https://github.com/criteo/cluster-pack/blob/master/examples/interactive-mode/README.md)
2) [Self shipping project](https://github.com/criteo/cluster-pack/blob/master/examples/skein-project/README.md)
## Basic examples with [PySpark](https://spark.apache.org/docs/latest/quick-start.html)
1) [PySpark with HDFS on Yarn](https://github.com/criteo/cluster-pack/blob/master/examples/spark/spark_example.py)
2) [Docker with PySpark on S3](https://github.com/criteo/cluster-pack/blob/master/examples/spark-with-S3/README.md)
[pex]: https://github.com/pantsbuild/pex
[conda-pack]: https://github.com/conda/conda-pack
[editable_installs_mode]: https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs
[skein]: https://jcrist.github.io/skein/