rainbow-scheduler


Name: rainbow-scheduler
Version: 0.0.16
Home page: https://github.com/converged-computing/rainbow/tree/main/python/v1
Summary: Python gRPC functions for the Rainbow Scheduler
Upload time: 2024-05-11 17:25:56
Maintainer: Vanessasaurus
Author: Vanessasaurus
License: MIT
Keywords: multi-cluster, scheduler
Requirements: none recorded

# rainbow (python)

> 🌈️ Where keebler elves and schedulers live, somewhere in the clouds, and with marshmallows

![https://github.com/converged-computing/rainbow/raw/main/docs/img/rainbow.png](https://github.com/converged-computing/rainbow/raw/main/docs/img/rainbow.png)

This package provides the Python gRPC client bindings for the rainbow scheduler prototype. To learn more about rainbow, visit [https://github.com/converged-computing/rainbow](https://github.com/converged-computing/rainbow).

## Example

Assuming that you can run the server with Go, start it first (for example, from the root of the repository linked above; a container will be provided soon):

```bash
make server
```
```console
go run cmd/server/server.go
2024/02/12 19:38:58 creating 🌈️ server...
2024/02/12 19:38:58 ✨️ creating rainbow.db...
2024/02/12 19:38:58    rainbow.db file created
2024/02/12 19:38:58    create cluster table...
2024/02/12 19:38:58    cluster table created
2024/02/12 19:38:58    create jobs table...
2024/02/12 19:38:58    jobs table created
2024/02/12 19:38:58 starting scheduler server: rainbow v0.1.0-draft
2024/02/12 19:38:58 server listening: [::]:50051
```

### Register

Next, register a cluster, this time using the Python bindings (client) in this package. The core bindings live in [rainbow/client.py](rainbow/client.py), but here we run a custom command from [examples](examples). Assuming you've installed everything into a venv:

```bash
python -m venv env
source env/bin/activate
pip install -e .
```

The command below registers the cluster and saves the secret to a new configuration file.
If you provide an existing configuration file, it will be used or updated.

```bash
python ./examples/flux/register.py keebler --config-path ./rainbow-config.yaml
```
```console
Saving rainbow config to ./rainbow-config.yaml
🤫️ The token you will need to submit jobs to this cluster is rainbow
🔐️ The secret you will need to accept jobs is 649598a9-e77b-4aa3-ab46-bfbbc5e2d606
```
Try running it again: registration fails, because a cluster cannot be registered twice. You can, however, register clusters under other names. A "cluster" can be an actual cluster, a Flux instance, or any entity that can accept jobs. The script also accepts arguments (see `register.py --help`):

```console
python ./examples/flux/register.py --help

🌈️ Rainbow scheduler register

options:
  -h, --help            show this help message and exit
  --cluster CLUSTER     cluster name to register
  --host HOST           host of rainbow cluster
  --secret SECRET       Rainbow cluster registration secret
  --config-path CONFIG_PATH
                        Path to rainbow configuration file to write or use
  --cluster-nodes CLUSTER_NODES
                        Nodes to provide for registration
```
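The one-registration-per-name rule can be illustrated with a toy in-memory registry. This is a hypothetical sketch, not rainbow's server code; `ClusterRegistry` is an illustrative name, and the only assumption carried over from the output above is that each cluster receives a UUID4-style secret.

```python
import uuid


class ClusterRegistry:
    """Toy in-memory registry (hypothetical; not the rainbow server's code)."""

    def __init__(self):
        self.clusters = {}

    def register(self, name, nodes=None):
        # A cluster name can only be registered once.
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} is already registered")
        # The secret printed above looks like a UUID4, so generate one here.
        secret = str(uuid.uuid4())
        self.clusters[name] = {"secret": secret, "nodes": nodes}
        return secret


registry = ClusterRegistry()
secret = registry.register("keebler")
try:
    registry.register("keebler")  # second registration is rejected
except ValueError as err:
    print(err)
```

Raising on a duplicate name keeps the secret handed out on the first registration authoritative; a second caller cannot silently obtain a new one.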

### Register Subsystem

Next, register a subsystem. As with cluster registration, the path to the subsystem nodes is set as a default,
and the subsystem name (`--subsystem`) defaults to "io". This assumes you've registered your cluster and have the cluster secret
in your `./rainbow-config.yaml`.

```bash
python ./examples/flux/register-subsystem.py keebler --config-path ./rainbow-config.yaml
```
```console
status: REGISTER_SUCCESS
```

In the server window you'll see the subsystem added:

```console
...
2024/03/09 14:21:50 📝️ received subsystem register: keebler
2024/03/09 14:21:50 Preparing to load 6 nodes and 30 edges
2024/03/09 14:21:50 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
{
 "keebler": {
  "Name": "keebler",
  "Counts": {
   "io": 1,
   "mtl1unit": 1,
   "mtl2unit": 1,
   "mtl3unit": 1,
   "nvme": 1,
   "shm": 1
  }
 }
}
```
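The summary the server prints is a per-type tally of the subsystem's vertices. As a rough illustration only (not rainbow's Go implementation), the six resource types in the log can be counted as below. The node record format with `id` and `type` fields is hypothetical, since the file that `register-subsystem.py` sends is not shown here.

```python
import json
from collections import Counter

# Hypothetical node records for the "io" subsystem; field names are assumed.
io_nodes = [
    {"id": "io0", "type": "io"},
    {"id": "mtl1", "type": "mtl1unit"},
    {"id": "mtl2", "type": "mtl2unit"},
    {"id": "mtl3", "type": "mtl3unit"},
    {"id": "nvme0", "type": "nvme"},
    {"id": "shm0", "type": "shm"},
]


def summarize_subsystem(cluster, nodes):
    """Count vertices per resource type, shaped like the server's summary."""
    counts = Counter(node["type"] for node in nodes)
    return {cluster: {"Name": cluster, "Counts": dict(sorted(counts.items()))}}


print(json.dumps(summarize_subsystem("keebler", io_nodes), indent=1))
```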

### Update State

Eventually, clusters will likely send back state when they accept jobs; for now, there is a separate endpoint for a one-off request to update the state. You can test it here:

```bash
python ./examples/flux/update-state.py keebler --config-path ./rainbow-config.yaml
```
```console
status: UPDATE_STATE_SUCCESS
```

In the server terminal (depending on your level of logging) you'll see the state update.

```console
2024/04/05 18:45:16 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
Metrics for subsystem io{
 "io": 1,
 "mtl1unit": 1,
 "mtl2unit": 1,
 "mtl3unit": 1,
 "nvme": 1,
 "shm": 1
}
2024/04/05 18:47:18 📝️ received state update: keebler
Updating state cost-per-node to 12
Updating state max-jobs to 100
```

Note that the path to the state metadata file is provided as a default to keep the demo simple.
This state metadata is passed to the selection algorithm, which can use it as needed when choosing
a final cluster.
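How the selection algorithm actually uses this metadata is not described here. A minimal sketch of one possible policy, assuming a hypothetical `running-jobs` field alongside the `cost-per-node` and `max-jobs` values from the log: skip clusters at capacity, then pick the cheapest remaining match. `select_cluster` is an illustrative name, not part of rainbow.

```python
# Hypothetical per-cluster state. "cost-per-node" and "max-jobs" mirror the
# server log above; "running-jobs" is an assumed extra field.
cluster_state = {
    "keebler": {"cost-per-node": 12, "max-jobs": 100, "running-jobs": 3},
    "other-cluster": {"cost-per-node": 20, "max-jobs": 10, "running-jobs": 10},
}


def select_cluster(matches, state, nodes):
    """Pick the cheapest matched cluster that still has job capacity."""
    candidates = []
    for name in matches:
        meta = state.get(name, {})
        # Skip clusters already at (or over) their job limit.
        if meta.get("running-jobs", 0) >= meta.get("max-jobs", float("inf")):
            continue
        cost = meta.get("cost-per-node", 0) * nodes
        candidates.append((cost, name))
    if not candidates:
        return None
    # Lowest total cost wins; ties are broken by cluster name.
    return min(candidates)[1]


print(select_cluster(["keebler", "other-cluster"], cluster_state, nodes=2))
```

Returning `None` when nothing qualifies leaves it to the caller to decide whether to queue or reject the job.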

### Submit Job (Simple)

Now let's submit a job to our faux cluster. We need to provide the token we received above. Remember that this is a two-stage process:

1. Query the graph database for one or more cluster matches.
2. Send that request to rainbow.
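The two stages above can be sketched with plain Python data. Everything here is hypothetical: `graph_db` stands in for the graph database, `match_clusters` for the matching query, and `fake_send` for the gRPC submit call that the rainbow client makes.

```python
# Hypothetical stand-in for the graph database: free resources per cluster.
graph_db = {
    "keebler": {"node": 4, "core": 64},
    "tiny": {"node": 1, "core": 8},
}


def match_clusters(graph, request):
    """Stage 1: return clusters with enough of every requested resource type."""
    return sorted(
        name
        for name, free in graph.items()
        if all(free.get(kind, 0) >= count for kind, count in request.items())
    )


def submit(graph, request, send):
    """Run both stages so the caller only sees a single submit."""
    matches = match_clusters(graph, request)
    if not matches:
        raise RuntimeError("no cluster can satisfy the request")
    # Stage 2: hand the request and its matches to rainbow.
    return send(request, matches)


def fake_send(request, matches):
    # Placeholder for the real gRPC call; just echoes what would be sent.
    return {"clusters": matches, "status": "SUBMIT_SUCCESS"}


print(submit(graph_db, {"node": 2, "core": 8}, fake_send))
```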

The client handles both stages, so as a user you only deal with a single submit. We provide basic arguments for
the job here, but you can provide other arguments too:

```console
python ./examples/flux/submit-job.py --help

🌈️ Rainbow scheduler submit

positional arguments:
  command               Command to submit

options:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        config path with cluster names
  --host HOST           host of rainbow cluster
  --token TOKEN         Cluster token for permission to submit jobs
  --nodes NODES         Nodes for job (defaults to 1)
```

Then submit! Remember that you need to have registered first, and that you must provide the path to your cluster config.

```bash
python examples/flux/submit-job.py --config-path ./rainbow-config.yaml --nodes 1 echo hello world
```
```console
{
    "version": 1,
    "resources": [
        {
            "type": "node",
            "count": 1,
            "with": [
                {
                    "type": "slot",
                    "count": 1,
                    "label": "echo",
                    "with": [
                        {
                            "type": "core",
                            "count": 1
                        }
                    ]
                }
            ]
        }
    ],
    "tasks": [
        {
            "command": [
                "echo",
                "hello",
                "world"
            ],
            "slot": "echo",
            "count": {
                "per_slot": 1
            }
        }
    ],
    "attributes": {}
}
clusters: "keebler"
status: RESULT_TYPE_SUCCESS

status: SUBMIT_SUCCESS
```
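The printed jobspec follows a simple pattern derived from the command. A minimal sketch, assuming the slot is labeled with the executable name as in the output above; for `nodes=1` it builds the same structure, while the mapping for larger node counts is an assumption. `simple_jobspec` is a hypothetical helper, not part of the rainbow client.

```python
import json


def simple_jobspec(command, nodes=1):
    """Build a Flux-style jobspec dict shaped like the one printed above."""
    label = command[0]  # the example output labels the slot "echo"
    return {
        "version": 1,
        "resources": [
            {
                "type": "node",
                "count": nodes,
                "with": [
                    {
                        "type": "slot",
                        "count": 1,
                        "label": label,
                        "with": [{"type": "core", "count": 1}],
                    }
                ],
            }
        ],
        "tasks": [
            {
                "command": list(command),
                "slot": label,
                "count": {"per_slot": 1},
            }
        ],
        "attributes": {},
    }


print(json.dumps(simple_jobspec(["echo", "hello", "world"], nodes=1), indent=4))
```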

### Submit Jobspec

We can also submit a jobspec directly, which is an advanced use case. It works largely the same way, except that the jobspec is loaded
directly from a YAML file.

```console
python examples/flux/submit-jobspec.py --help

🌈️ Rainbow scheduler submit

positional arguments:
  jobspec               Jobspec path to submit

options:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        config path with cluster metadata
```

Then submit the jobspec. The output largely looks the same as for the simple submit, so most of it is cut here; the only difference is the entry point for the job definition.

```console
python examples/flux/submit-jobspec.py --config-path ./rainbow-config.yaml ../../docs/examples/scheduler/jobspec-io.yaml
clusters: "keebler"
status: RESULT_TYPE_SUCCESS

status: SUBMIT_SUCCESS
```
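Because a hand-written jobspec skips the client's builder, a quick shape check before submitting can catch mistakes early. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), and treating the four top-level keys seen in the jobspec printed earlier as required, which is an assumption. `check_jobspec` is a hypothetical helper.

```python
# Top-level keys seen in the jobspec printed by submit-job.py above.
REQUIRED_KEYS = ("version", "resources", "tasks", "attributes")


def check_jobspec(spec):
    """Return a list of problems found in a parsed jobspec (empty if none)."""
    if not isinstance(spec, dict):
        return ["jobspec must be a mapping"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    # The example jobspec uses version 1; anything else is flagged.
    if "version" in spec and spec["version"] != 1:
        problems.append(f"unsupported version: {spec['version']!r}")
    for key in ("resources", "tasks"):
        if key in spec and not (isinstance(spec[key], list) and spec[key]):
            problems.append(f"{key} must be a non-empty list")
    return problems


print(check_jobspec({"version": 1, "resources": [{"type": "node"}], "attributes": {}}))
```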

### Receive Jobs

After we submit jobs, rainbow assigns them to a cluster. In this dummy example, jobs are assigned to the same cluster we registered (keebler), so we can also use "keebler" to receive them. Here is what that looks like:

```console
python ./examples/flux/receive-jobs.py --help

🌈️ Rainbow scheduler receive jobs

options:
  -h, --help            show this help message and exit
  --max-jobs MAX_JOBS   Maximum jobs to request (unset defaults to all)
  --config-path CONFIG_PATH
                        config path with cluster metadata
```

And then request and accept jobs:

```console
python examples/flux/receive-jobs.py --config-path ./rainbow-config.yaml
Status: REQUEST_JOBS_SUCCESS
Received 1 jobs to accept...
```

If this were running in Flux, we would now be able to run the job. The response above has also told rainbow that you accepted it, so rainbow deletes its record of the job.
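The request-then-accept flow, with `--max-jobs` unset meaning all jobs, can be modeled with a toy queue. This is a hypothetical sketch of the semantics described above, not rainbow's implementation: `JobQueue` is an illustrative name, requesting does not remove jobs, and accepting deletes their records.

```python
class JobQueue:
    """Hypothetical per-cluster queue illustrating request-then-accept."""

    def __init__(self):
        self.pending = {}  # cluster name -> list of job ids, oldest first

    def submit(self, cluster, job_id):
        self.pending.setdefault(cluster, []).append(job_id)

    def request(self, cluster, max_jobs=None):
        # Like --max-jobs: None means "all pending jobs".
        queue = self.pending.get(cluster, [])
        return list(queue if max_jobs is None else queue[:max_jobs])

    def accept(self, cluster, job_ids):
        # Once accepted, the jobs' records are deleted from the queue.
        accepted = set(job_ids)
        remaining = [j for j in self.pending.get(cluster, []) if j not in accepted]
        self.pending[cluster] = remaining


queue = JobQueue()
for job_id in (1, 2, 3):
    queue.submit("keebler", job_id)
jobs = queue.request("keebler", max_jobs=2)
queue.accept("keebler", jobs)
print(queue.request("keebler"))  # [3]
```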


## License

HPCIC DevTools is distributed under the terms of the MIT license.
All new contributions must be made under this license.

See [LICENSE](https://github.com/converged-computing/cloud-select/blob/main/LICENSE),
[COPYRIGHT](https://github.com/converged-computing/cloud-select/blob/main/COPYRIGHT), and
[NOTICE](https://github.com/converged-computing/cloud-select/blob/main/NOTICE) for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE- 842614

Raw data

{
    "_id": null,
    "home_page": "https://github.com/converged-computing/rainbow/tree/main/python/v1",
    "name": "rainbow-scheduler",
    "maintainer": "Vanessasaurus",
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "multi-cluster, scheduler",
    "author": "Vanessasaurus",
    "author_email": "vsoch@users.noreply.github.com",
    "download_url": "https://files.pythonhosted.org/packages/5a/75/db05abebbf9f1c2f5376cc23b612995664e3302f6c5b939852346743ec9e/rainbow-scheduler-0.0.16.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python gRPC functions for the Rainbow Scheduler",
    "version": "0.0.16",
    "project_urls": {
        "Homepage": "https://github.com/converged-computing/rainbow/tree/main/python/v1"
    },
    "split_keywords": [
        "multi-cluster",
        " scheduler"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c06187a243027e31b2b0e71327f68dcfc7a0a0e94a24f782e27067c3e27f074f",
                "md5": "8df807246d9df115c4b6244802c60665",
                "sha256": "651894c0c9314d779d5aed800bd6cb00191c836a4698289d662373a543252e48"
            },
            "downloads": -1,
            "filename": "rainbow_scheduler-0.0.16-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8df807246d9df115c4b6244802c60665",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 27520,
            "upload_time": "2024-05-11T17:25:55",
            "upload_time_iso_8601": "2024-05-11T17:25:55.431101Z",
            "url": "https://files.pythonhosted.org/packages/c0/61/87a243027e31b2b0e71327f68dcfc7a0a0e94a24f782e27067c3e27f074f/rainbow_scheduler-0.0.16-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a75db05abebbf9f1c2f5376cc23b612995664e3302f6c5b939852346743ec9e",
                "md5": "8d550303ecb96ba8003363f2f4374fd6",
                "sha256": "4f0aed3aa5f12316857f819b6a4576bd9789bf70405d8a027a0c57931236442a"
            },
            "downloads": -1,
            "filename": "rainbow-scheduler-0.0.16.tar.gz",
            "has_sig": false,
            "md5_digest": "8d550303ecb96ba8003363f2f4374fd6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 19443,
            "upload_time": "2024-05-11T17:25:56",
            "upload_time_iso_8601": "2024-05-11T17:25:56.533464Z",
            "url": "https://files.pythonhosted.org/packages/5a/75/db05abebbf9f1c2f5376cc23b612995664e3302f6c5b939852346743ec9e/rainbow-scheduler-0.0.16.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-11 17:25:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "converged-computing",
    "github_project": "rainbow",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rainbow-scheduler"
}