optimum-benchmark


Name: optimum-benchmark
Version: 0.2.1 (PyPI)
Home page: https://github.com/huggingface/optimum-benchmark
Summary: Optimum-Benchmark is a unified multi-backend utility for benchmarking Transformers, Timm, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Upload time: 2024-05-17 09:25:34
Author: HuggingFace Inc. Special Ops Team
License: Apache
Keywords: benchmark, transformers, quantization, pruning, optimization, training, inference, onnx, onnx runtime, intel, habana, graphcore, neural compressor, ipex, ipu, hpu, llm-swarm, py-txi, vllm, auto-gptq, autoawq, sentence-transformers, bitsandbytes, codecarbon, flash-attn, deepspeed, diffusers, timm, peft
            <p align="center"><img src="https://raw.githubusercontent.com/huggingface/optimum-benchmark/main/logo.png" alt="Optimum-Benchmark Logo" width="350" style="max-width: 100%;" /></p>
<p align="center"><q>All benchmarks are wrong, some will cost you less than others.</q></p>
<h1 align="center">Optimum-Benchmark ๐Ÿ‹๏ธ</h1>

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)
[![PyPI - Version](https://img.shields.io/pypi/v/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)
[![PyPI - Implementation](https://img.shields.io/pypi/implementation/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)
[![PyPI - Format](https://img.shields.io/pypi/format/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)
[![PyPI - License](https://img.shields.io/pypi/l/optimum-benchmark)](https://pypi.org/project/optimum-benchmark/)

Optimum-Benchmark is a unified [multi-backend & multi-device](#backends--devices-) utility for benchmarking [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers), [PEFT](https://github.com/huggingface/peft), [TIMM](https://github.com/huggingface/pytorch-image-models) and [Optimum](https://github.com/huggingface/optimum) libraries, along with all their supported [optimizations & quantization schemes](#backends--devices-), for [inference & training](#scenarios-), in [distributed & non-distributed settings](#launchers-), in the most correct, efficient and scalable way possible.

*News* 📰

- PyPI package is now available for installation: `pip install optimum-benchmark` 🎉 [check it out](https://pypi.org/project/optimum-benchmark/)!
- Hosted 4 minimal Docker images (`cpu`, `cuda`, `rocm`, `cuda-ort`) in [packages](https://github.com/huggingface/optimum-benchmark/pkgs/container/optimum-benchmark) for testing, benchmarking and reproducibility 🐳
- Added vLLM backend for benchmarking [vLLM](https://github.com/vllm-project/vllm)'s inference engine 🚀
- Hosted the codebase of the [LLM-Perf Leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) 🥇
- Added Py-TXI backend for benchmarking [Py-TXI](https://github.com/IlyasMoutawwakil/py-txi/tree/main) 🚀
- Introduced a Python API for running isolated benchmarks from the comfort of your Python scripts 🐍
- Simplified the CLI interface for running benchmarks using the Hydra CLI 🧪

*Motivations* 🎯

- HuggingFace hardware partners wanting to know how their hardware performs compared to other hardware on the same models.
- HuggingFace ecosystem users wanting to know how their chosen model performs in terms of latency, throughput, memory usage, energy consumption, etc., compared to another model.
- Benchmarking hardware- & backend-specific optimizations & quantization schemes that can be applied to models to improve their computational/memory/energy efficiency.

&#160;

> [!NOTE]
> Optimum-Benchmark is a work in progress and is not yet ready for production use, but we're working hard to make it so. Please keep an eye on the project and help us improve it and make it more useful for the community.

&#160;

## CI Status 🚦

Optimum-Benchmark is continuously and intensively tested on a variety of devices, backends, scenarios and launchers to ensure its stability, with over 300 tests running on every PR (you can request more tests if you want to).

### API 📈

[![API_CPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cpu.yaml)
[![API_CUDA](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cuda.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cuda.yaml)
[![API_MISC](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_misc.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_misc.yaml)
[![API_ROCM](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_rocm.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_rocm.yaml)

### CLI 📈

[![CLI_CPU_NEURAL_COMPRESSOR](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_neural_compressor.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_neural_compressor.yaml)
[![CLI_CPU_ONNXRUNTIME](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_onnxruntime.yaml)
[![CLI_CPU_OPENVINO](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_openvino.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_openvino.yaml)
[![CLI_CPU_PYTORCH](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_pytorch.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_pytorch.yaml)
[![CLI_CPU_PY_TXI](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_py_txi.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_py_txi.yaml)
[![CLI_CUDA_ONNXRUNTIME](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_onnxruntime.yaml)
[![CLI_CUDA_VLLM](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_vllm.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_vllm.yaml)
[![CLI_CUDA_PYTORCH_MULTI_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch_multi_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch_multi_gpu.yaml)
[![CLI_CUDA_PYTORCH_SINGLE_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch_single_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch_single_gpu.yaml)
[![CLI_CUDA_TENSORRT_LLM](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_tensorrt_llm.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_tensorrt_llm.yaml)
[![CLI_CUDA_TORCH_ORT_MULTI_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort_multi_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort_multi_gpu.yaml)
[![CLI_CUDA_TORCH_ORT_SINGLE_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort_single_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort_single_gpu.yaml)
[![CLI_MISC](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_misc.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_misc.yaml)
[![CLI_ROCM_PYTORCH_MULTI_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch_multi_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch_multi_gpu.yaml)
[![CLI_ROCM_PYTORCH_SINGLE_GPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch_single_gpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch_single_gpu.yaml)

## Quickstart 🚀

### Installation 📥

You can install the latest released version of `optimum-benchmark` from PyPI:

```bash
pip install optimum-benchmark
```

or you can install the latest version from the main branch on GitHub:

```bash
pip install git+https://github.com/huggingface/optimum-benchmark.git
```

or if you want to tinker with the code, you can clone the repository and install it in editable mode:

```bash
git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .
```

<details>
    <summary>Advanced install options</summary>

Depending on the backends you want to use, you can install `optimum-benchmark` with the following extras:

- PyTorch (default): `pip install optimum-benchmark`
- OpenVINO: `pip install optimum-benchmark[openvino]`
- Torch-ORT: `pip install optimum-benchmark[torch-ort]`
- OnnxRuntime: `pip install optimum-benchmark[onnxruntime]`
- TensorRT-LLM: `pip install optimum-benchmark[tensorrt-llm]`
- OnnxRuntime-GPU: `pip install optimum-benchmark[onnxruntime-gpu]`
- Neural Compressor: `pip install optimum-benchmark[neural-compressor]`
- Py-TXI: `pip install optimum-benchmark[py-txi]`
- vLLM: `pip install optimum-benchmark[vllm]`

We also support the following extra dependencies:

- autoawq
- auto-gptq
- sentence-transformers
- bitsandbytes
- codecarbon
- flash-attn
- deepspeed
- diffusers
- timm
- peft
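
Extras can also be combined in a single install, e.g. `pip install optimum-benchmark[onnxruntime-gpu,py-txi]`; the extra dependencies listed above are assumed to be installable the same way (e.g. `pip install optimum-benchmark[bitsandbytes]`).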

</details>

### Running benchmarks using the Python API 🧪

You can run benchmarks from the Python API, using the `Benchmark` class and its `launch` method. It takes a `BenchmarkConfig` object as input, runs the benchmark in an isolated process and returns a `BenchmarkReport` object containing the benchmark results.

Here's an example of how to run an isolated benchmark using the `pytorch` backend, `torchrun` launcher and `inference` scenario with latency and memory tracking enabled.

```python
from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO", handlers=["console"])

if __name__ == "__main__":
    launcher_config = TorchrunConfig(nproc_per_node=2)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)

    # log the benchmark in terminal
    benchmark_report.log() # or print(benchmark_report)

    # convert artifacts to a dictionary or dataframe
    benchmark_config.to_dict() # or benchmark_config.to_dataframe()

    # save artifacts to disk as json or csv files
    benchmark_report.save_csv("benchmark_report.csv") # or benchmark_report.save_json("benchmark_report.json")

    # push artifacts to the hub
    benchmark_config.push_to_hub("IlyasMoutawwakil/pytorch_gpt2") # or benchmark_report.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or merge them into a single artifact
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("benchmark.json") # or benchmark.save_csv("benchmark.csv")
    benchmark.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # load artifacts from the hub
    benchmark = Benchmark.from_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or load them from disk
    benchmark = Benchmark.load_json("benchmark.json") # or Benchmark.load_csv("benchmark.csv")
```

If you're using VS Code, you can hover over the configuration classes to see the available parameters and their descriptions. You can also see the available parameters in the [Features](#features-) section below.

### Running benchmarks using the Hydra CLI 🧪

You can also run a benchmark using the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for [`hydra`](https://hydra.cc/). `--config-dir` is the directory where the configuration files are stored and `--config-name` is the name of the configuration file without its `.yaml` extension.

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert
```

This will run the benchmark using the configuration in [`examples/pytorch_bert.yaml`](examples/pytorch_bert.yaml) and store the results in `runs/pytorch_bert`.

The resulting files are:

- `benchmark_config.json`, which contains the configuration used for the benchmark, including the backend, launcher, scenario and the environment in which the benchmark was run.
- `benchmark_report.json`, which contains a full report of the benchmark's results, like latency measurements, memory usage, energy consumption, etc.
- `benchmark.json`, which contains both the report and the configuration in a single file.
- `benchmark.log`, which contains the logs of the benchmark run.
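
Since `benchmark.json` bundles the configuration and the report, a CLI run can be inspected later from Python. Here's a minimal sketch; the file path is assumed from the `runs/pytorch_bert` output directory above, and the `config`/`report` attributes are assumed to mirror the `Benchmark(config=..., report=...)` constructor shown earlier:

```python
from optimum_benchmark import Benchmark

# load the merged artifact produced by the CLI run above
# (path assumed from the run directory layout described in this section)
benchmark = Benchmark.load_json("runs/pytorch_bert/benchmark.json")

# inspect the configuration and the measurements
# (attribute names assumed from the Benchmark(config=..., report=...) constructor)
print(benchmark.config)
benchmark.report.log()
```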

<details>
<summary>Advanced CLI options</summary>

#### Configuration overrides 🎛️

It's easy to override the defaults of an existing configuration file from the command line. For example, to run the same benchmark on a different model and device, you can use the following command:

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda
```

#### Configuration sweeps 🧹

You can easily run configuration sweeps using the `--multirun` option. By default, configurations are executed serially, but other execution modes are supported through Hydra's launcher plugins (e.g. `hydra/launcher=joblib`).

```bash
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
```

### Configuration structure 📁

You can create custom and more complex configuration files following these [examples](https://github.com/IlyasMoutawwakil/optimum-benchmark-examples). They are heavily commented to help you understand the structure of the configuration files.

</details>

## Features 🎨

`optimum-benchmark` allows you to run benchmarks with minimal configuration. A benchmark is defined by three main components:

- The launcher to use (e.g. `process`)
- The scenario to follow (e.g. `training`)
- The backend to run on (e.g. `onnxruntime`)

### Launchers 🚀

- [x] Process launcher (`launcher=process`); launches the benchmark in an isolated process.
- [x] Torchrun launcher (`launcher=torchrun`); launches the benchmark in multiple processes using `torch.distributed`.
- [x] Inline launcher (`launcher=inline`); not recommended for benchmarking, only for debugging purposes.

<details>
<summary>General Launcher features 🧰</summary>

- [x] Assert GPU device isolation for NVIDIA & AMD GPUs (`launcher.device_isolation=true`). This feature makes sure no processes other than the benchmark are running on the targeted GPU devices. Especially useful when running benchmarks on shared resources.
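
As a rough sketch of how this maps to the Python API, assuming `ProcessConfig` is exported alongside `TorchrunConfig` and accepts the same flag as the `launcher.device_isolation` CLI key:

```python
from optimum_benchmark import ProcessConfig  # assumed export, analogous to TorchrunConfig

# process launcher sketch with GPU device isolation enforced
# (the device_isolation field name is assumed to mirror the CLI key above)
launcher_config = ProcessConfig(device_isolation=True)
```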

</details>

### Scenarios 🏋️

- [x] Training scenario (`scenario=training`) which benchmarks the model using the trainer class with a randomly generated dataset.
- [x] Inference scenario (`scenario=inference`) which benchmarks the model's inference method (forward/call/generate) with randomly generated inputs.

<details>
<summary>Inference scenario features 🧰</summary>

- [x] Memory tracking (`scenario.memory=true`)
- [x] Energy and efficiency tracking (`scenario.energy=true`)
- [x] Latency and throughput tracking (`scenario.latency=true`)
- [x] Warm up runs before inference (`scenario.warmup_runs=20`)
- [x] Input shapes control (e.g. `scenario.input_shapes.sequence_length=128`)
- [x] Forward, Call and Generate kwargs (e.g. for an LLM `scenario.generate_kwargs.max_new_tokens=100`, for a diffusion model `scenario.call_kwargs.num_images_per_prompt=4`)
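
As a reference, here's a sketch of how these options could be combined through the Python API; the keyword names follow the CLI keys above and should be treated as assumptions rather than the exact `InferenceConfig` signature:

```python
from optimum_benchmark import InferenceConfig

# inference scenario sketch: tracking, warm-up and input/generation control
# (field names assumed to mirror the scenario.* CLI keys listed above)
scenario_config = InferenceConfig(
    memory=True,
    energy=True,
    latency=True,
    warmup_runs=20,
    input_shapes={"sequence_length": 128},
    generate_kwargs={"max_new_tokens": 100},
)
```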

See [InferenceConfig](optimum_benchmark/scenarios/inference/config.py) for more information.

</details>

<details>
<summary>Training scenario features 🧰</summary>

- [x] Memory tracking (`scenario.memory=true`)
- [x] Energy and efficiency tracking (`scenario.energy=true`)
- [x] Latency and throughput tracking (`scenario.latency=true`)
- [x] Warm up steps before training (`scenario.warmup_steps=20`)
- [x] Dataset shapes control (e.g. `scenario.dataset_shapes.sequence_length=128`)
- [x] Training arguments control (e.g. `scenario.training_args.per_device_train_batch_size=4`)
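
Similarly, a sketch of the training scenario through the Python API (again, the keyword names follow the CLI keys above and are assumptions, not the exact `TrainingConfig` signature):

```python
from optimum_benchmark import TrainingConfig  # assumed export, analogous to InferenceConfig

# training scenario sketch
# (field names assumed to mirror the scenario.* CLI keys listed above)
scenario_config = TrainingConfig(
    memory=True,
    latency=True,
    warmup_steps=20,
    dataset_shapes={"sequence_length": 128},
    training_args={"per_device_train_batch_size": 4},
)
```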

See [TrainingConfig](optimum_benchmark/scenarios/training/config.py) for more information.

</details>

### Backends & Devices 📱

- [x] PyTorch backend for CPU (`backend=pytorch`, `backend.device=cpu`)
- [x] PyTorch backend for CUDA (`backend=pytorch`, `backend.device=cuda`, `backend.device_ids=0,1`)
- [ ] PyTorch backend for Habana Gaudi Processor (`backend=pytorch`, `backend.device=hpu`, `backend.device_ids=0,1`)
- [x] OnnxRuntime backend for CPUExecutionProvider (`backend=onnxruntime`, `backend.device=cpu`)
- [x] OnnxRuntime backend for CUDAExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`)
- [x] OnnxRuntime backend for ROCMExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`, `backend.provider=ROCMExecutionProvider`)
- [x] OnnxRuntime backend for TensorrtExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`, `backend.provider=TensorrtExecutionProvider`)
- [x] Py-TXI backend for CPU and GPU (`backend=py-txi`, `backend.device=cpu` or `backend.device=cuda`)
- [x] Neural Compressor backend for CPU (`backend=neural-compressor`, `backend.device=cpu`)
- [x] TensorRT-LLM backend for CUDA (`backend=tensorrt-llm`, `backend.device=cuda`)
- [x] Torch-ORT backend for CUDA (`backend=torch-ort`, `backend.device=cuda`)
- [x] OpenVINO backend for CPU (`backend=openvino`, `backend.device=cpu`)
- [x] OpenVINO backend for GPU (`backend=openvino`, `backend.device=gpu`)
- [x] vLLM backend for CUDA (`backend=vllm`, `backend.device=cuda`)
- [x] vLLM backend for ROCm (`backend=vllm`, `backend.device=rocm`)
- [x] vLLM backend for CPU (`backend=vllm`, `backend.device=cpu`)
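
The same keys can be used as Hydra CLI overrides; for example, a hypothetical `optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend=onnxruntime backend.device=cpu` would switch the example benchmark from PyTorch to the OnnxRuntime backend on CPU, assuming the `backend` group is overridable from the command line like `launcher` and `scenario`.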

<details>
<summary>General backend features 🧰</summary>

- [x] Device selection (`backend.device=cuda`), can be `cpu`, `cuda`, `mps`, etc.
- [x] Device ids selection (`backend.device_ids=0,1`), can be a list of device ids to run the benchmark on multiple devices.
- [x] Model selection (`backend.model=gpt2`), can be a model id from the HuggingFace model hub or an **absolute path** to a model folder.
- [x] "No weights" feature, to benchmark models without downloading their weights, using randomly initialized weights (`backend.no_weights=true`)

</details>

<details>
<summary>Backend specific features 🧰</summary>

For more information on the features of each backend, you can check their respective configuration files:

- [VLLMConfig](optimum_benchmark/backends/vllm/config.py)
- [OVConfig](optimum_benchmark/backends/openvino/config.py)
- [PyTXIConfig](optimum_benchmark/backends/py_txi/config.py)
- [PyTorchConfig](optimum_benchmark/backends/pytorch/config.py)
- [ORTConfig](optimum_benchmark/backends/onnxruntime/config.py)
- [TorchORTConfig](optimum_benchmark/backends/torch_ort/config.py)
- [LLMSwarmConfig](optimum_benchmark/backends/llm_swarm/config.py)
- [TRTLLMConfig](optimum_benchmark/backends/tensorrt_llm/config.py)
- [INCConfig](optimum_benchmark/backends/neural_compressor/config.py)

</details>

## Contributing 🤝

Contributions are welcome! And we're happy to help you get started. Feel free to open an issue or a pull request.
Things that we'd like to see:

- More backends (TensorFlow, TFLite, JAX, etc.).
- More tests (for optimizations and quantization schemes).
- More hardware support (Habana Gaudi Processor (HPU), Apple M series, etc).
- Task evaluators for the most common tasks (would be great for output regression).

To get started, you can check the [CONTRIBUTING.md](CONTRIBUTING.md) file.

            
