clearml-agent


- **Name:** clearml-agent
- **Version:** 1.8.0
- **Home page:** https://github.com/allegroai/clearml-agent
- **Summary:** ClearML Agent - Auto-Magical DevOps for Deep Learning
- **Upload time:** 2024-04-02 13:43:21
- **Author:** Allegroai
- **License:** Apache License 2.0
- **Keywords:** clearml trains devops machine deep learning agent automation hpc cluster

<div align="center">

<img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_agent_logo.png?raw=true" width="250px">

**ClearML Agent - MLOps/LLMOps made easy  
MLOps/LLMOps scheduler & orchestration solution supporting Linux, macOS and Windows**

[![GitHub license](https://img.shields.io/github/license/allegroai/clearml-agent.svg)](https://img.shields.io/github/license/allegroai/clearml-agent.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)](https://img.shields.io/pypi/pyversions/clearml-agent.svg)
[![PyPI version shields.io](https://img.shields.io/pypi/v/clearml-agent.svg)](https://img.shields.io/pypi/v/clearml-agent.svg)
[![PyPI Downloads](https://pepy.tech/badge/clearml-agent/month)](https://pypi.org/project/clearml-agent/)
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)

`🌟 ClearML is open-source - Leave a star to support the project! 🌟`

</div>

---

### ClearML-Agent

#### *Formerly known as Trains Agent*

* Run jobs (experiments) on any local or cloud-based resource
* Implement optimized resource utilization policies
* Deploy execution environments with zero effort, using either virtualenv or fully containerized Docker
* Launch-and-Forget service containers
* [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)
* [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)
* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)

It is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.

**Full Automation in 5 steps**

1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server)
   or [free tier hosting](https://app.clear.ml)
2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine:
   on-premises / cloud / ...)
3. Create a [job](https://clear.ml/docs/latest/docs/apps/clearml_task) or
   add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines of code
4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or
   automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes:  :beer:

"All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"

**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server)
or [Free tier Hosting](https://app.clear.ml)
<a href="https://app.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>

### Simple, Flexible Experiment Orchestration

**The ClearML Agent was built to address the DL/ML R&D DevOps needs:**

* Easily add & remove machines from the cluster
* Reuse machines without the need for any dedicated containers or images
* **Combine GPU resources across any cloud and on-prem**
* **No need for yaml / json / template configuration of any kind**
* **User friendly UI**
* Manageable resource allocation that can be used by researchers and engineers
* Flexible and controllable scheduler with priority support
* Automatic instance spinning in the cloud

**Using the ClearML Agent, you can now set up a dynamic cluster with \*epsilon DevOps**

*epsilon - Because we are :triangular_ruler: and nothing is really zero work

### Kubernetes Integration (Optional)

We think Kubernetes is awesome, but it is not a must to get started with remote execution agents and cluster management.
We designed `clearml-agent` so you can run both bare-metal and on top of Kubernetes, in any combination that fits your environment.

You can find the Dockerfiles in the [docker folder](./docker) and the Helm chart in https://github.com/allegroai/clearml-helm-charts

#### Benefits of integrating an existing Kubernetes cluster with ClearML

- ClearML-Agent adds the missing scheduling capabilities to your Kubernetes cluster
- Users do not need to have direct Kubernetes access!
- Easy learning curve with UI and CLI requiring no DevOps knowledge from end users
- Unlike other solutions, ClearML-Agents work in tandem with other customers of your Kubernetes cluster 
- Allows for more flexible automation from code, building pipelines and visibility
- A programmatic interface for easy CI/CD workflows, enabling GitOps to trigger jobs inside your cluster
- Seamless integration with the ClearML ML/DL/GenAI experiment manager
- Web UI for customization, scheduling & prioritization of jobs
- **Enterprise Features**: RBAC, vault, multi-tenancy, scheduler, quota management, fractional GPU support 

**Run the agent in Kubernetes Glue mode and map ClearML jobs directly to K8s jobs:**
- Use the [ClearML Agent Helm Chart](https://github.com/allegroai/clearml-helm-charts/tree/main/charts/clearml-agent) to spin up an agent pod acting as a controller
  - Or run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on
    a Kubernetes CPU node
- The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a Kubernetes job (based on the
  provided YAML template)
- Inside each pod, the clearml-agent installs the job (experiment) environment, then spins up and monitors the
  experiment's process, fully visible in the ClearML UI
- Benefit: Kubernetes has a full view of all running jobs in the system
- **Enterprise Features**
  - Full scheduler features added on top of Kubernetes, with quota/over-quota management, priorities and order.
  - Fractional GPU support, allowing multiple isolated containers to share the same GPU with a memory/compute limit per container

### SLURM (Optional)

Slurm integration is available; see the [documentation](https://clear.ml/docs/latest/docs/clearml_agent/#slurm) for further details.

### Using the ClearML Agent

**Full scale HPC with a click of a button**

The ClearML Agent is a job scheduler that listens on one or more job queues, pulls jobs, sets up each job's environment,
executes the job and monitors its progress.

Any 'Draft' experiment can be scheduled for execution by a ClearML agent.

A previously run experiment can be put into 'Draft' state by either of two methods:

* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any
  results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new 
  'Draft' experiment with the same configuration as the original experiment.

An experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in
the ClearML UI and selecting the execution queue.

See [creating an experiment and enqueuing it for execution](#from-scratch).

Once an experiment is enqueued, it will be picked up and executed by a ClearML Agent monitoring this queue.

The ClearML UI Workers & Queues page provides ongoing execution information:

- Workers Tab: Monitor your cluster
    - Review available resources
    - Monitor machine statistics (CPU / GPU / Disk / Network)
- Queues Tab:
    - Control the scheduling order of jobs
    - Cancel or abort job execution
    - Move jobs between execution queues

#### What The ClearML Agent Actually Does

The ClearML Agent executes experiments using the following process:

- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
    - Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION
      environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a
  code crash, catch the error and signal the experiment has failed)
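
The steps above can be sketched as an ordered plan of commands. This is only an illustration of the flow: the function `plan_execution`, its parameters, the directory names, and the exact commands are hypothetical and are not taken from the agent's implementation.

```python
from typing import List, Optional


def plan_execution(repo_url: str, commit: str, entry_point: str,
                   requirements: List[str],
                   docker_image: Optional[str] = None) -> List[List[str]]:
    """Return the commands a worker might run for one job, in order (hypothetical)."""
    steps: List[List[str]] = []
    if docker_image:
        # Docker mode: the environment is the selected image
        steps.append(["docker", "pull", docker_image])
    else:
        # Default mode: a fresh virtual environment for the job
        steps.append(["python3", "-m", "venv", "venv"])
    # Fetch the code at the exact commit recorded for the experiment
    steps.append(["git", "clone", repo_url, "code"])
    steps.append(["git", "-C", "code", "checkout", commit])
    # Install the packages listed for the experiment, if any
    if requirements:
        steps.append(["pip", "install", *requirements])
    # Run the entry point; a real agent also streams stdout/stderr and watches for aborts
    steps.append(["python", entry_point])
    return steps


if __name__ == "__main__":
    # "abc1234" and "train.py" are placeholder values
    for step in plan_execution("https://github.com/allegroai/clearml.git", "abc1234",
                               "train.py", ["numpy"]):
        print(" ".join(step))
```

The sketch only builds the plan; it deliberately runs nothing, so it can be inspected safely on any machine.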

#### System Design & Flow

<img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png?raw=true" width="100%" alt="clearml-architecture">

#### Installing the ClearML Agent

```bash
pip install clearml-agent
```

#### ClearML Agent Usage Examples

The full interface and capabilities are listed by:

```bash
clearml-agent --help
clearml-agent daemon --help
```

#### Configuring the ClearML Agent

```bash
clearml-agent init
```

Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default
ClearML Agent cache folder is `~/.clearml`.

See full details in your configuration file at `~/clearml.conf`.

Note: The **ClearML Agent** extends the **ClearML** configuration file `~/clearml.conf`.
They are designed to share the same configuration file, see example [here](docs/clearml.conf)

#### Running the ClearML Agent

For debugging and experimentation, start the ClearML Agent in `foreground` mode, where all the output is printed to the screen:

```bash
clearml-agent daemon --queue default --foreground
```

In actual service mode, all stdout is stored automatically in a temporary file (no need to pipe).
Note: with the `--detached` flag, the *clearml-agent* runs in the background:

```bash
clearml-agent daemon --detached --queue default
```

GPU allocation is controlled via the standard OS environment variable `NVIDIA_VISIBLE_DEVICES` or the `--gpus` flag (or
disabled with `--cpu-only`).

If no flag is set and the `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPUs are allocated to
the `clearml-agent`. <br>
If the `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES="none"`, no GPU is allocated to
the `clearml-agent`.
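
The allocation rules above can be written as a small decision function. This is a sketch of the rules as stated here, not the agent's code: `resolve_gpus` is a hypothetical name, and the precedence of an explicit `--gpus` value over an existing `NVIDIA_VISIBLE_DEVICES` value is an assumption that the text does not spell out.

```python
import os
from typing import Mapping, Optional


def resolve_gpus(gpus_flag: Optional[str] = None, cpu_only: bool = False,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return "none", "all", or a comma-separated GPU list (hypothetical helper)."""
    env = os.environ if env is None else env
    visible = env.get("NVIDIA_VISIBLE_DEVICES")
    # --cpu-only or NVIDIA_VISIBLE_DEVICES="none": no GPU is allocated
    if cpu_only or visible == "none":
        return "none"
    # Assumption: an explicit --gpus value wins over the environment variable
    if gpus_flag is not None:
        return gpus_flag
    if visible is not None:
        return visible
    # No flag and no variable: all GPUs are allocated
    return "all"
```

For example, `resolve_gpus(env={})` yields `"all"`, while `resolve_gpus(cpu_only=True, env={})` yields `"none"`.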

Example: spin two agents, one per GPU on the same machine:

Notice: with `--detached` flag, the *clearml-agent* will run in the background

```bash
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
```

Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent:

```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
```

##### Starting the ClearML Agent in docker mode

For debugging and experimentation, start the ClearML Agent in `foreground` mode, where all the output is printed to the screen:

```bash
clearml-agent daemon --queue default --docker --foreground
```

For actual service mode, all the stdout will be stored automatically into a file (no need to pipe).
Notice: with `--detached` flag, the *clearml-agent* will run in the background

```bash
clearml-agent daemon --detached --queue default --docker
```

Example: spin two agents, one per GPU on the same machine, with default `nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04`
docker:

```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
clearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
```

Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent, with default 
`nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04` docker:

```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
```
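
When a machine has many GPUs, the command lines shown above can be generated rather than typed. The helper below is a minimal sketch; `agent_commands` is a hypothetical name, and it only uses the `daemon` flags that appear in the examples above (`--detached`, `--gpus`, `--queue`, `--docker`).

```python
import shlex
from typing import List, Optional, Sequence


def agent_commands(gpu_groups: Sequence[Sequence[int]], queue: str,
                   docker_image: Optional[str] = None) -> List[str]:
    """Build one `clearml-agent daemon` command line per GPU group."""
    commands = []
    for group in gpu_groups:
        argv = ["clearml-agent", "daemon", "--detached",
                "--gpus", ",".join(str(gpu) for gpu in group),
                "--queue", queue]
        if docker_image:
            # Docker mode with an explicit default image
            argv += ["--docker", docker_image]
        # shlex.join quotes arguments where the shell would need it
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for command in agent_commands([[0, 1], [2, 3]], "dual_gpu"):
        print(command)
```

The function only returns strings, so the output can be reviewed before it is pasted into a shell or passed to a process launcher.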

##### Starting the ClearML Agent - Priority Queues

Priority queues are also supported. Example use case:

High priority queue: `important_jobs`, low priority queue: `default`

```bash
clearml-agent daemon --queue important_jobs default
```

The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, and only if it is empty, the agent 
will try to pull from the `default` queue.
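
The pull order described above can be modeled in a few lines. This is a toy, in-memory model for illustration only: `pull_next` and the job names are hypothetical, and real queues live on the ClearML Server.

```python
from collections import deque
from typing import Deque, Dict, Optional, Sequence


def pull_next(queues: Dict[str, Deque[str]],
              priority_order: Sequence[str]) -> Optional[str]:
    """Return the next job, always draining higher-priority queues first."""
    for name in priority_order:
        if queues[name]:
            # First non-empty queue in priority order wins
            return queues[name].popleft()
    # Every queue is empty
    return None


if __name__ == "__main__":
    queues = {"important_jobs": deque(["urgent"]), "default": deque(["routine"])}
    order = ["important_jobs", "default"]
    print(pull_next(queues, order))
    print(pull_next(queues, order))
```

Because the check restarts from the first queue on every pull, a job that arrives in `important_jobs` later is still served before anything remaining in `default`.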

Adding queues, managing job order within a queue, and moving jobs between queues, is available using the Web UI, see
example on our [free server](https://app.clear.ml/workers-and-queues/queues)

##### Stopping the ClearML Agent

To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop`
appended. For example, to stop the first of the two single-GPU docker agents started on the same machine above:

```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 --stop
```

### How do I create an experiment on the ClearML Server? <a name="from-scratch"></a>

* Integrate [ClearML](https://github.com/allegroai/clearml) with your code
* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
* As your code is running, **ClearML** creates an experiment logging all the necessary execution information:
    - Git repository link and commit ID (or an entire Jupyter notebook)
    - Git diff (we’re not saying you never commit and push, but still...)
    - Python packages used by your code (including specific versions used)
    - Hyperparameters
    - Input artifacts

  You now have a 'template' of your experiment with everything required for automated execution

* In the ClearML UI, right-click on the experiment and select 'clone'. A copy of your experiment will be created.
* You now have a new draft experiment cloned from your original experiment, feel free to edit it
    - Change the hyperparameters
    - Switch to the latest code base of the repository
    - Update package versions
    - Select a specific docker image to run in (see docker execution mode section)
    - Or simply change nothing to run the same experiment again...
* Schedule the newly created experiment for execution: right-click the experiment and select 'enqueue'
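
The clone, edit and enqueue steps can also be scripted with the ClearML Python package instead of the UI. The sketch below follows the pattern used in the ClearML automation examples (`Task.get_task`, `Task.clone`, `get_parameters` / `set_parameters`, `Task.enqueue`); the function name `clone_and_enqueue`, the task ID, and the parameter key are hypothetical placeholders, and the code assumes a configured `clearml.conf` with access to your server.

```python
from typing import Any, Dict


def clone_and_enqueue(template_task_id: str, queue_name: str,
                      overrides: Dict[str, Any]) -> str:
    """Clone a template experiment, override hyperparameters, and enqueue the draft."""
    # Imported lazily so this module still loads where clearml is not installed
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    draft = Task.clone(source_task=template, name=f"{template.name} (clone)")
    # Read the cloned parameters, apply the overrides, and write them back
    params = draft.get_parameters()
    params.update(overrides)
    draft.set_parameters(params)
    # Hand the draft to whichever agent is listening on the queue
    Task.enqueue(task=draft, queue_name=queue_name)
    return draft.id


if __name__ == "__main__":
    # Placeholder ID and parameter name: replace with values from your own project
    new_id = clone_and_enqueue("<template-task-id>", "default",
                               {"General/learning_rate": 0.01})
    print(new_id)
```

The template itself is created by running your code once with ClearML integrated, as described in the first steps of this section.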

### ClearML-Agent Services Mode <a name="services"></a>

ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that
previously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks)
for different use cases: 
* Auto-scaler service (spinning instances when the need arises and the budget allows)
* Controllers (Implementing pipelines and more sophisticated DevOps logic)
* Optimizer (such as Hyperparameter Optimization or sweeping)
* Application (such as interactive Bokeh apps for increased data transparency)

ClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by
ClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities.
Currently, clearml-agent in services mode supports a CPU-only configuration. ClearML-Agent services mode can be launched
alongside GPU agents.

```bash
clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
```

**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.

### AutoML and Orchestration Pipelines <a name="automl-pipes"></a>

The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the
ClearML package.

Sample AutoML & Orchestration examples can be found in the
ClearML [examples/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.

AutoML examples:

- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
    - In order to create an experiment-template in the system, this code must be executed once manually
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
    - This example will create multiple copies of the Keras experiment-template, with different hyperparameter
      combinations
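
To make the random-search idea concrete, the sampling step can be sketched as below. This is not the code of the linked example: `sample_combinations` and the hyperparameter names are hypothetical, and each sampled dictionary would then be applied to a cloned copy of the experiment template before it is enqueued.

```python
import random
from typing import Any, Dict, List, Sequence


def sample_combinations(space: Dict[str, Sequence[Any]], count: int,
                        seed: int = 0) -> List[Dict[str, Any]]:
    """Draw `count` random hyperparameter combinations from a discrete search space."""
    # A seeded generator keeps the search reproducible
    rng = random.Random(seed)
    return [
        {name: rng.choice(list(values)) for name, values in space.items()}
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical search space
    space = {"batch_size": [32, 64, 128], "learning_rate": [0.1, 0.01, 0.001]}
    for combination in sample_combinations(space, count=3):
        print(combination)
```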

Experiment Pipeline examples:

- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
    - This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
    - In order to create an experiment-template in the system, this code must be executed once manually

### License

Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0.html) for more information)



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/allegroai/clearml-agent",
    "name": "clearml-agent",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "clearml trains devops machine deep learning agent automation hpc cluster",
    "author": "Allegroai",
    "author_email": "clearml@allegro.ai",
    "download_url": null,
    "platform": null,
    "description": "<div align=\"center\">\n\n<img src=\"https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_agent_logo.png?raw=true\" width=\"250px\">\n\n**ClearML Agent - MLOps/LLMOps made easy  \nMLOps/LLMOps scheduler & orchestration solution supporting Linux, macOS and Windows**\n\n[![GitHub license](https://img.shields.io/github/license/allegroai/clearml-agent.svg)](https://img.shields.io/github/license/allegroai/clearml-agent.svg)\n[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)](https://img.shields.io/pypi/pyversions/clearml-agent.svg)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/clearml-agent.svg)](https://img.shields.io/pypi/v/clearml-agent.svg)\n[![PyPI Downloads](https://pepy.tech/badge/clearml-agent/month)](https://pypi.org/project/clearml-agent/)\n[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)\n\n`\ud83c\udf1f ClearML is open-source - Leave a star to support the project! \ud83c\udf1f`\n\n</div>\n\n---\n\n### ClearML-Agent\n\n#### *Formerly known as Trains Agent*\n\n* Run jobs (experiments) on any local or cloud based resource\n* Implement optimized resource utilization policies\n* Deploy execution environments with either virtualenv or fully docker containerized with zero effort\n* Launch-and-Forget service containers\n* [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)\n* [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)\n* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)\n\nIt is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.\n\n**Full Automation in 5 steps**\n\n1. 
ClearML Server [self-hosted](https://github.com/allegroai/clearml-server)\n   or [free tier hosting](https://app.clear.ml)\n2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine:\n   on-premises / cloud / ...)\n3. Create a [job](https://clear.ml/docs/latest/docs/apps/clearml_task) or\n   add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines of code\n4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or\n   automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))\n5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes:  :beer:\n\n\"All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that\"\n\n**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server)\nor [Free tier Hosting](https://app.clear.ml)\n<a href=\"https://app.clear.ml\"><img src=\"https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true\" width=\"100%\"></a>\n\n### Simple, Flexible Experiment Orchestration\n\n**The ClearML Agent was built to address the DL/ML R&D DevOps needs:**\n\n* Easily add & remove machines from the cluster\n* Reuse machines without the need for any dedicated containers or images\n* **Combine GPU resources across any cloud and on-prem**\n* **No need for yaml / json / template configuration of any kind**\n* **User friendly UI**\n* Manageable resource allocation that can be used by researchers and engineers\n* Flexible and controllable scheduler with priority support\n* Automatic instance spinning in the cloud\n\n**Using the ClearML Agent, you can now set up a dynamic cluster with \\*epsilon DevOps**\n\n*epsilon - Because we are :triangular_ruler: and nothing is really zero work\n\n### Kubernetes Integration (Optional)\n\nWe think Kubernetes is awesome, but it is not a must to get started with remote execution 
agents and cluster management.\nWe designed `clearml-agent` so you can run both bare-metal and on top of Kubernetes, in any combination that fits your environment.\n\nYou can find the Dockerfiles in the [docker folder](./docker) and the helm Chart in https://github.com/allegroai/clearml-helm-charts\n\n#### Benefits of integrating existing Kubernetes cluster with ClearML\n\n- ClearML-Agent adds the missing scheduling capabilities to your Kubernetes cluster\n- Users do not need to have direct Kubernetes access!\n- Easy learning curve with UI and CLI requiring no DevOps knowledge from end users\n- Unlike other solutions, ClearML-Agents work in tandem with other customers of your Kubernetes cluster \n- Allows for more flexible automation from code, building pipelines and visibility\n- A programmatic interface for easy CI/CD workflows, enabling GitOps to trigger jobs inside your cluster\n- Seamless integration with the ClearML ML/DL/GenAI experiment manager\n- Web UI for customization, scheduling & prioritization of jobs\n- **Enterprise Features**: RBAC, vault, multi-tenancy, scheduler, quota management, fractional GPU support \n\n**Run the agent in Kubernetes Glue mode an map ClearML jobs directly to K8s jobs:**\n- Use the [ClearML Agent Helm Chart](https://github.com/allegroai/clearml-helm-charts/tree/main/charts/clearml-agent) to spin an agent pod acting as a controller\n  - Or run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on\n    a Kubernetes cpu node\n- The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a Kubernetes job (based on provided\n  yaml template)\n- Inside each pod the clearml-agent will install the job (experiment) environment and spin and monitor the\n  experiment's process, fully visible in the clearml UI\n- Benefits: Kubernetes full view of all running jobs in the system\n- **Enterprise Features**\n  - Full scheduler features added on Top of Kubernetes, 
with quota/over-quota management, priorities and order.\n  - Fractional GPU support, allowing multiple isolated containers sharing the same GPU with memory/compute limit per container \n\n### SLURM (Optional)\n\nYes! Slurm integration is available, check the [documentation](https://clear.ml/docs/latest/docs/clearml_agent/#slurm) for further details \n\n### Using the ClearML Agent\n\n**Full scale HPC with a click of a button**\n\nThe ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the\njob and monitors its progress.\n\nAny 'Draft' experiment can be scheduled for execution by a ClearML agent.\n\nA previously run experiment can be put into 'Draft' state by either of two methods:\n\n* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any\n  results and artifacts the previous run had created.\n* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new \n  'Draft' experiment with the same configuration as the original experiment.\n\nAn experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in\nthe ClearML UI and selecting the execution queue.\n\nSee [creating an experiment and enqueuing it for execution](#from-scratch).\n\nOnce an experiment is enqueued, it will be picked up and executed by a ClearML Agent monitoring this queue.\n\nThe ClearML UI Workers & Queues page provides ongoing execution information:\n\n- Workers Tab: Monitor you cluster\n    - Review available resources\n    - Monitor machines statistics (CPU / GPU / Disk / Network)\n- Queues Tab:\n    - Control the scheduling order of jobs\n    - Cancel or abort job execution\n    - Move jobs between execution queues\n\n#### What The ClearML Agent Actually Does\n\nThe ClearML Agent executes experiments using the following process:\n\n- Create a new virtual environment (or 
launch the selected docker image)\n- Clone the code into the virtual-environment (or inside the docker)\n- Install python packages based on the package requirements listed for the experiment\n    - Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION\n      environment variable of the machine\n- Execute the code, while monitoring the process\n- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging\n- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a\n  code crash, catch the error and signal the experiment has failed)\n\n#### System Design & Flow\n\n<img src=\"https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png\" width=\"100%\" alt=\"clearml-architecture\">\n\n#### Installing the ClearML Agent\n\n```bash\npip install clearml-agent\n```\n\n#### ClearML Agent Usage Examples\n\nFull Interface and capabilities are available with\n\n```bash\nclearml-agent --help\nclearml-agent daemon --help\n```\n\n#### Configuring the ClearML Agent\n\n```bash\nclearml-agent init\n```\n\nNote: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. 
The default\nClearML Agent cache folder is `~/.clearml`.\n\nSee full details in your configuration file at `~/clearml.conf`.\n\nNote: The **ClearML Agent** extends the **ClearML** configuration file `~/clearml.conf`.\nThey are designed to share the same configuration file, see example [here](docs/clearml.conf)\n\n#### Running the ClearML Agent\n\nFor debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen:\n\n```bash\nclearml-agent daemon --queue default --foreground\n```\n\nFor actual service mode, all the stdout will be stored automatically into a temporary file (no need to pipe).\nNotice: with `--detached` flag, the *clearml-agent* will be running in the background\n\n```bash\nclearml-agent daemon --detached --queue default\n```\n\nGPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled\nwith `--cpu-only`).\n\nIf no flag is set, and `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPUs will be allocated for\nthe `clearml-agent`. 
<br>\nIf `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES=\"none\"`, no gpu will be allocated for\nthe `clearml-agent`.\n\nExample: spin two agents, one per GPU on the same machine:\n\nNotice: with `--detached` flag, the *clearml-agent* will run in the background\n\n```bash\nclearml-agent daemon --detached --gpus 0 --queue default\nclearml-agent daemon --detached --gpus 1 --queue default\n```\n\nExample: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent\n\n```bash\nclearml-agent daemon --detached --gpus 0,1 --queue dual_gpu\nclearml-agent daemon --detached --gpus 2,3 --queue dual_gpu\n```\n\n##### Starting the ClearML Agent in docker mode\n\nFor debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen\n\n```bash\nclearml-agent daemon --queue default --docker --foreground\n```\n\nFor actual service mode, all the stdout will be stored automatically into a file (no need to pipe).\nNotice: with `--detached` flag, the *clearml-agent* will run in the background\n\n```bash\nclearml-agent daemon --detached --queue default --docker\n```\n\nExample: spin two agents, one per gpu on the same machine, with default `nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04`\ndocker:\n\n```bash\nclearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\nclearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\n```\n\nExample: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent, with default \n`nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04` docker:\n\n```bash\nclearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\nclearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\n```\n\n##### Starting the ClearML Agent - Priority Queues\n\nPriority Queues are also 
supported, example use case:\n\nHigh priority queue: `important_jobs`, low priority queue: `default`\n\n```bash\nclearml-agent daemon --queue important_jobs default\n```\n\nThe **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, and only if it is empty, the agent \nwill try to pull from the `default` queue.\n\nAdding queues, managing job order within a queue, and moving jobs between queues, is available using the Web UI, see\nexample on our [free server](https://app.clear.ml/workers-and-queues/queues)\n\n##### Stopping the ClearML Agent\n\nTo stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop`\nappended. For example, to stop the first of the above shown same machine, single gpu agents:\n\n```bash\nclearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 --stop\n```\n\n### How do I create an experiment on the ClearML Server? <a name=\"from-scratch\"></a>\n\n* Integrate [ClearML](https://github.com/allegroai/clearml) with your code\n* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)\n* As your code is running, **ClearML** creates an experiment logging all the necessary execution information:\n    - Git repository link and commit ID (or an entire jupyter notebook)\n    - Git diff (we\u2019re not saying you never commit and push, but still...)\n    - Python packages used by your code (including specific versions used)\n    - Hyperparameters\n    - Input artifacts\n\n  You now have a 'template' of your experiment with everything required for automated execution\n\n* In the ClearML UI, right-click on the experiment and select 'clone'. 
A copy of your experiment will be created.\n* You now have a new draft experiment cloned from your original experiment, feel free to edit it\n    - Change the hyperparameters\n    - Switch to the latest code base of the repository\n    - Update package versions\n    - Select a specific docker image to run in (see docker execution mode section)\n    - Or simply change nothing to run the same experiment again...\n* Schedule the newly created experiment for execution: right-click the experiment and select 'enqueue'\n\n### ClearML-Agent Services Mode <a name=\"services\"></a>\n\nClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that\npreviously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks)\nfor different use cases: \n* Auto-scaler service (spinning instances when the need arises and the budget allows)\n* Controllers (Implementing pipelines and more sophisticated DevOps logic)\n* Optimizer (such as Hyperparameter Optimization or sweeping)\n* Application (such as interactive Bokeh apps for increased data transparency)\n\nClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by\nClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities.\nCurrently, clearml-agent in services-mode supports CPU only configuration. 
ClearML-Agent services mode can be launched\nalongside GPU agents.\n\n```bash\nclearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only\n```\n\n**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.\n\n### AutoML and Orchestration Pipelines <a name=\"automl-pipes\"></a>\n\nThe ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the\nClearML package.\n\nSample AutoML & Orchestration examples can be found in the\nClearML [examples/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.\n\nAutoML examples:\n\n- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)\n    - To create an experiment-template in the system, this code must be executed once manually\n- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)\n    - This example will create multiple copies of the Keras experiment-template, with different hyperparameter\n      combinations\n\nExperiment Pipeline examples:\n\n- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)\n    - This example will \"process data\", and once done, will launch a copy of the 'second step' experiment-template\n- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)\n    - To create an experiment-template in the system, this code must be executed once manually\n\n### License\n\nApache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0.html) for more information)\n\n\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "ClearML Agent - Auto-Magical DevOps for Deep Learning",
    "version": "1.8.0",
    "project_urls": {
        "Homepage": "https://github.com/allegroai/clearml-agent"
    },
    "split_keywords": [
        "clearml",
        "trains",
        "devops",
        "machine",
        "deep",
        "learning",
        "agent",
        "automation",
        "hpc",
        "cluster"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "57f2accac22453ec3436ffa1f7017ae6abf663ce805385b13b581921ebf09f21",
                "md5": "8a89eda3e0f44a4e15031d9a67ae6452",
                "sha256": "8a0d616eaa74def72af579e35adb7aedc55077f3cab79b8fb5dbf687b5de7296"
            },
            "downloads": -1,
            "filename": "clearml_agent-1.8.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8a89eda3e0f44a4e15031d9a67ae6452",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 429866,
            "upload_time": "2024-04-02T13:43:21",
            "upload_time_iso_8601": "2024-04-02T13:43:21.764776Z",
            "url": "https://files.pythonhosted.org/packages/57/f2/accac22453ec3436ffa1f7017ae6abf663ce805385b13b581921ebf09f21/clearml_agent-1.8.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-02 13:43:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "allegroai",
    "github_project": "clearml-agent",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "clearml-agent"
}
        