xgenius


- **Name:** xgenius
- **Version:** 0.3.0
- **Home page:** https://github.com/roger-creus/xgenius
- **Summary:** A tool for managing cluster jobs and configurations
- **Upload time:** 2024-08-24 18:09:53
- **Author:** Roger Creus Castanyer
- **License:** not specified
- **Requirements:** none recorded

# xgenius 🚀

`xgenius` is a command-line tool for managing remote jobs and containerized experiments across multiple clusters. It simplifies the process of building Docker images, converting them to Singularity format, and submitting jobs to clusters using SLURM.

## Prerequisites 🛠️

- You have a working Dockerfile.
- Singularity is installed on your local machine.
- Docker is installed on your local machine.
- `docker login` works.
- You have access to the clusters you want to run experiments on.
- Your project code is cloned on each of those clusters.
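A quick way to confirm that the local tools from the list above are on your `PATH` is sketched below. This is a minimal Python sketch, not part of xgenius; the helper name `missing_tools` is hypothetical, and it only checks that the executables exist. It does not verify that `docker login` succeeded or that you can reach the clusters.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that are not found on PATH (hypothetical helper)."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Local prerequisites from the list above.
    missing = missing_tools(["docker", "singularity"])
    if missing:
        print("Missing on PATH:", ", ".join(missing))
    else:
        print("docker and singularity found")
```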

## Local Set-up 🧩

### Installation 🔧

```bash
pip install xgenius
```

### (Optional) Build Singularity Container from Dockerfile 🐳

```bash
xgenius-build-image --dockerfile=/path/to/Dockerfile \
--name=<output_image_name> \
--tag=<tag> \
--registry=<your_docker_username>
```
Here `--dockerfile` must be the **absolute** path to your Dockerfile.

This command builds a Docker image, pushes it to your Docker registry, and then pulls it to your local machine as a Singularity image. The image is saved in the current directory as `<output_image_name>.sif` (the `.sif` extension is added automatically).
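For orientation, the three steps described above presumably correspond to the standard Docker and Singularity commands built below. This Python sketch only assembles the argument lists and does not run them. It is an assumption about what the steps look like, not xgenius's actual implementation, and `build_image_commands` is a hypothetical name. It also assumes a Docker Hub-style image reference `<registry>/<name>:<tag>` and uses the Dockerfile's directory as the build context.

```python
import os


def build_image_commands(dockerfile, name, tag, registry):
    """Return argv lists for docker build, docker push and singularity pull (hypothetical helper)."""
    image_ref = f"{registry}/{name}:{tag}"
    # Assumption: the build context is the directory containing the Dockerfile.
    context = os.path.dirname(dockerfile)
    return [
        ["docker", "build", "-t", image_ref, "-f", dockerfile, context],
        ["docker", "push", image_ref],
        # `singularity pull <file>.sif docker://<ref>` converts the pushed image to SIF.
        ["singularity", "pull", f"{name}.sif", f"docker://{image_ref}"],
    ]


if __name__ == "__main__":
    for argv in build_image_commands("/home/me/project/Dockerfile", "myimage", "latest", "myuser"):
        print(" ".join(argv))
```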

### Define Environment Variable 🌍

First, define the environment variable for the path where SLURM template files will be saved:

```bash
export XGENIUS_TEMPLATES_DIR=/path/to/your/templates
```

**Recommendation:** `export XGENIUS_TEMPLATES_DIR=<your_project_path>/slurm_templates`

**Recommendation:** Use a Conda environment and set:

```bash
conda env config vars set XGENIUS_TEMPLATES_DIR=/path/to/your/templates
```
This way you can have a different `XGENIUS_TEMPLATES_DIR` for each environment/project. Reactivate the environment (`conda activate <env_name>`) for the change to take effect.

Otherwise, add the `export` line to your `~/.bashrc` or `~/.zshrc` to make it permanent.
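Tools that read this variable typically resolve it at run time. The minimal Python sketch below shows one way to do that and to list the available templates. It is not xgenius's code: the helper `list_templates` is hypothetical, and the `*.sbatch` pattern is an assumption based on the `sbatch_template` value in the `cluster_config.json` example.

```python
import os
from pathlib import Path


def list_templates(env_var="XGENIUS_TEMPLATES_DIR"):
    """Return sorted template file names from the directory named by `env_var` (hypothetical)."""
    templates_dir = os.environ.get(env_var)
    if not templates_dir:
        raise RuntimeError(f"{env_var} is not set; export it before running xgenius commands")
    path = Path(templates_dir)
    if not path.is_dir():
        raise RuntimeError(f"{env_var}={templates_dir} is not a directory")
    # Assumption: SLURM templates use the .sbatch extension.
    return sorted(p.name for p in path.glob("*.sbatch"))
```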

### Set-Up Cluster Configuration 🏗️

Run:

```bash
xgenius-setup-clusters
```

Follow the prompts to configure your cluster settings. You can add as many clusters as you want. When you are finished, answer `done`; otherwise the config file won't be saved!

This creates `cluster_config.json` in the current directory.

### Set-Up Run Configuration ⚙️

Pass the `cluster_config.json` file path to the following command to create `run_config.json`:

```bash
xgenius-setup-runs path/to/cluster_config.json
```

This creates `run_config.json` with placeholder values. The placeholders for each cluster are derived from the SLURM template assigned to that cluster in `cluster_config.json`. Replace them with your actual values before submitting jobs.
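To illustrate how placeholder keys can be derived from a template, the Python sketch below scans an sbatch template for placeholders. The `{{KEY}}` syntax is an assumption for illustration only; check the templates in `XGENIUS_TEMPLATES_DIR` for the syntax xgenius actually uses. The function `template_placeholders` and the sample template are hypothetical.

```python
import re

# Assumption: placeholders look like {{UPPER_CASE_NAME}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z][A-Z0-9_]*)\}\}")


def template_placeholders(template_text):
    """Return the unique placeholder names in a template, in order of first appearance."""
    seen = []
    for name in PLACEHOLDER.findall(template_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical template, loosely modelled on the run_config.json keys in this README.
EXAMPLE_TEMPLATE = """#!/bin/bash
#SBATCH --partition={{PARTITION}}
#SBATCH --gres=gpu:{{NUM_GPUS}}
#SBATCH --time={{TIME}}
#SBATCH --output={{OUTPUT_FILE}}
module load {{MODULES_TO_LOAD}}
{{SINGULARITY_COMMAND}} exec --nv {{IMAGE_NAME}} {{COMMAND}}
"""

if __name__ == "__main__":
    # A run_config.json entry could start from these keys with placeholder values.
    print({name: f"<{name.lower()}>" for name in template_placeholders(EXAMPLE_TEMPLATE)})
```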

You are now all set up! Let's run some experiments remotely!

**Recommendation:** Keep `cluster_config.json` and `run_config.json` in your project directory. Then you won't need to pass their paths to any of the commands below.

## Running Experiments 🧪

1. Push your Singularity image to the clusters you want:
    ```bash
    xgenius push-image \
    --image=path/to/singularity_image.sif \
    --clusters=cluster1,cluster2,cluster3
    ```

2. Submit your jobs with:
    ```bash
    xgenius submit_jobs \
    --cluster=cluster1 \
    --run_command="python test.py" \
    --pull_repos
    ```

    Note: The `--pull_repos` flag is optional. It pulls the latest changes from your GitHub repository before the jobs run. Always include it if your code is in a GitHub repository, so that the jobs run your most recently pushed code!

Done! Your jobs are now submitted to the cluster! 🎉
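Conceptually, submitting a job presumably means filling the cluster's SLURM template with that cluster's `run_config.json` values, with the run command supplying `COMMAND`, and passing the result to `sbatch`. The Python sketch below shows only the filling step, under an assumed `{{KEY}}` placeholder syntax. It is not xgenius's implementation, and `render_job_script` is a hypothetical name. The sketch fails loudly on a missing key so that a script with unfilled placeholders is never produced.

```python
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z][A-Z0-9_]*)\}\}")  # assumed placeholder syntax


def render_job_script(template_text, run_config, cluster, run_command):
    """Fill a template from run_config[cluster], letting run_command override COMMAND."""
    values = dict(run_config[cluster])
    values["COMMAND"] = run_command

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"{key} is missing from run_config for {cluster}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template_text)


if __name__ == "__main__":
    template = "#SBATCH --time={{TIME}}\n{{SINGULARITY_COMMAND}} exec {{IMAGE_NAME}} {{COMMAND}}\n"
    config = {"cluster1": {"TIME": "23:59:00", "SINGULARITY_COMMAND": "singularity",
                           "IMAGE_NAME": "image.sif", "COMMAND": "python test.py"}}
    print(render_job_script(template, config, "cluster1", "python test.py --seed=1"))
```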

## Batch Jobs

You can also submit several jobs at once by listing them in a JSON file, one object per job:

```json
[
    {
        "command": "python test.py --test-arg1=1 --test-arg2=2",
        "cluster": "cluster1"
    },
    {
        "command": "python test.py --test-arg1=5 --test-arg2=10",
        "cluster": "cluster2"
    }
]
```

Then run:

```bash
xgenius-batch-submit --batch-file=/path/to/batch_job.json --pull-repos
```
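Writing the batch file by hand quickly becomes tedious for parameter sweeps. The Python sketch below generates one entry per argument combination in the `command`/`cluster` format shown above and assigns clusters round-robin. The script name and argument names come from the example. The helper `make_batch` is hypothetical, and round-robin is only one possible assignment policy.

```python
import itertools
import json


def make_batch(script, grid, clusters):
    """Build batch entries for every combination in `grid`, cycling through `clusters`."""
    keys = list(grid)
    cluster_cycle = itertools.cycle(clusters)
    entries = []
    for values in itertools.product(*(grid[k] for k in keys)):
        args = " ".join(f"--{k}={v}" for k, v in zip(keys, values))
        entries.append({"command": f"python {script} {args}", "cluster": next(cluster_cycle)})
    return entries


if __name__ == "__main__":
    batch = make_batch(
        "test.py",
        {"test-arg1": [1, 5], "test-arg2": [2, 10]},
        ["cluster1", "cluster2"],
    )
    # json.dumps always emits valid JSON (no trailing commas).
    print(json.dumps(batch, indent=4))
```

You could save the output with, for example, `python make_batch.py > batch_job.json` (the script name is arbitrary) and pass that file to `--batch-file`.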

## Utility Commands 🛠️

Check the status of your jobs on all clusters listed in `cluster_config.json`:

```bash
xgenius-check-jobs
```

Cancel all of your jobs on all clusters listed in `cluster_config.json`:

```bash
xgenius-cancel-jobs
```

Pull the results of your jobs from all clusters listed in `cluster_config.json`:

```bash
xgenius-pull-results
```

Remove the output folder on your clusters (useful before running a new batch of experiments):

```bash
xgenius-remove-results
```
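If you want to inspect a cluster by hand, the standard SLURM commands are `squeue -u <user>` (list your jobs) and `scancel -u <user>` (cancel all of your jobs). The Python sketch below builds those SSH commands for every entry in `cluster_config.json`. It assumes each `cluster_name` is also a host you can reach with `ssh`, for example an alias in `~/.ssh/config`. That assumption is not documented here. `ssh_commands` is a hypothetical helper and does not describe how the xgenius utilities are implemented.

```python
def ssh_commands(clusters, slurm_cmd):
    """Return one ssh argv per cluster that runs `slurm_cmd -u <username>` remotely."""
    commands = []
    for cluster in clusters:
        user = cluster["username"]
        host = f"{user}@{cluster['cluster_name']}"  # assumption: cluster_name is an SSH host
        commands.append(["ssh", host, slurm_cmd, "-u", user])
    return commands


if __name__ == "__main__":
    # In practice, load this list with json.load from cluster_config.json.
    clusters = [
        {"cluster_name": "cluster1", "username": "alice"},
        {"cluster_name": "cluster2", "username": "alice"},
    ]
    for argv in ssh_commands(clusters, "squeue"):  # or "scancel" to cancel all your jobs
        print(" ".join(argv))
```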

## Examples 📝

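These files are created by `xgenius-setup-clusters` and `xgenius-setup-runs`. The comments in the examples below are explanatory only. JSON does not allow comments, so remove them if you copy these snippets.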

### `cluster_config.json`

```jsonc
[
    {
        "cluster_name": "cluster1",
        "username": "<your_username>",
        "image_path": "<cluster1_scratch_folder>", // where the Singularity image is stored on the cluster
        "project_path": "/path/to/project/code/in/cluster", // where your code lives on the cluster; same as CODE_DIR_IN_CLUSTER in run_config.json
        "sbatch_template": "slurm_partition_template.sbatch" // SLURM template for this cluster; see the templates in XGENIUS_TEMPLATES_DIR
    },
    {
        "cluster_name": "cluster2",
        "username": "<your_username>",
        "image_path": "<cluster2_scratch_folder>",
        "project_path": "/path/to/project/code/in/cluster",
        "sbatch_template": "slurm_partition_template.sbatch"
    }
]
```
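Because the file is plain JSON, it is easy to sanity-check before you run any commands. The Python sketch below verifies that each entry has the five keys shown above and that cluster names are unique. The key list is taken from this example, and xgenius itself may require more or fewer fields. `check_cluster_config` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"cluster_name", "username", "image_path", "project_path", "sbatch_template"}


def check_cluster_config(clusters):
    """Return a list of problems found in a parsed cluster_config.json (empty if none)."""
    problems = []
    for i, cluster in enumerate(clusters):
        missing = REQUIRED_KEYS - set(cluster)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    names = [c.get("cluster_name") for c in clusters]
    duplicates = sorted({n for n in names if n is not None and names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate cluster_name values: {duplicates}")
    return problems


if __name__ == "__main__":
    text = '[{"cluster_name": "cluster1", "username": "me"}]'
    for problem in check_cluster_config(json.loads(text)):
        print(problem)
```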

### `run_config.json`

```jsonc
{
    "cluster1": {
        "SINGULARITY_COMMAND": "singularity", // or "apptainer", depending on the cluster
        "NUM_GPUS": "1",
        "IMAGE_NAME": "<your_singularity_image_name>.sif",
        "PARTITION": "<partition_name>",
        "CODE_DIR_IN_CLUSTER": "/path/to/project/code/in/cluster",
        "OUTPUT_DIR_IN_CONTAINER": "/path/to/output/dir/in/container", // the directory your code writes output to
        "TIME": "23:59:00", // job time limit (HH:MM:SS)
        "MODULES_TO_LOAD": "singularity", // or "apptainer", depending on the cluster, plus any other modules
        "MEM": "12G", // example memory (RAM) per CPU
        "OUTPUT_DIR_IN_CLUSTER": "/path/to/cluster/scratch/runs", // your outputs are saved here; it is bound to OUTPUT_DIR_IN_CONTAINER (see the SLURM templates)
        "COMMAND": "python test.py", // the command you want to run
        "NUM_CPUS": "12", // example number of CPUs
        "OUTPUT_FILE": "/path/to/cluster/scratch/slurm-%j.out" // the job's log file; SLURM replaces %j with the job ID
    }
}
```
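Two mistakes are easy to make when filling in `run_config.json`: leaving a `<...>` placeholder in place, and letting `CODE_DIR_IN_CLUSTER` drift from the cluster's `project_path` (the `cluster_config.json` example says they should match). The Python sketch below checks both. It also checks that `TIME` has the `HH:MM:SS` shape used in this example. SLURM accepts other time formats as well, so treat that check as a local convention. `check_run_config` is a hypothetical helper, not part of xgenius.

```python
import re

TIME_HMS = re.compile(r"^\d+:[0-5]\d:[0-5]\d$")  # HH:MM:SS, as in the example above


def check_run_config(run_config, clusters):
    """Return problems in a parsed run_config.json, cross-checked against cluster_config.json."""
    problems = []
    project_paths = {c["cluster_name"]: c["project_path"] for c in clusters}
    for cluster, values in run_config.items():
        for key, value in values.items():
            # Heuristic: any <...> span is treated as an unfilled placeholder.
            if re.search(r"<[^>]+>", str(value)):
                problems.append(f"{cluster}.{key} still contains a placeholder: {value}")
        time = values.get("TIME")
        if time is not None and not TIME_HMS.match(time):
            problems.append(f"{cluster}.TIME is not HH:MM:SS: {time}")
        expected = project_paths.get(cluster)
        if expected is not None and values.get("CODE_DIR_IN_CLUSTER") != expected:
            problems.append(f"{cluster}.CODE_DIR_IN_CLUSTER does not match project_path {expected}")
    return problems
```

Load both files with `json.load` and print the returned list. An empty list means none of these checks fired.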

            
