[![ONNX Runtime](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml/badge.svg)](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml)
# Hugging Face Optimum
🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.
## Installation
🤗 Optimum can be installed using `pip` as follows:
```bash
python -m pip install optimum
```
If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:
| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview) | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]` |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[openvino]` |
| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview) | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia` |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade --upgrade-strategy eager optimum[amd]` |
| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index) | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]` |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index) | `pip install --upgrade --upgrade-strategy eager optimum[habana]` |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index) | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]` |
The `--upgrade --upgrade-strategy eager` options are needed to ensure that the different packages are upgraded to the latest possible version.
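To see which optional backends are already importable in the current environment, a quick check along these lines can help. This is a minimal sketch using only the standard library; the mapping from extras to top-level module names is an assumption made for illustration and is not an official list.

```python
from importlib.util import find_spec

# Assumed mapping from some of the extras in the table above to the top-level
# module each one is expected to make importable (illustrative, not official).
EXTRA_TO_MODULE = {
    "onnxruntime": "onnxruntime",
    "openvino": "openvino",
    "neural-compressor": "neural_compressor",
}


def installed_backends(extra_to_module=EXTRA_TO_MODULE):
    """Return {extra: True/False} depending on whether its module can be found."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_to_module.items()
    }


for extra, available in installed_backends().items():
    status = "installed" if available else f"missing (try optimum[{extra}])"
    print(f"{extra:18s} {status}")
```

Only top-level module names are looked up, so nothing is imported and a missing backend is simply reported as missing.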
To install from source:
```bash
python -m pip install git+https://github.com/huggingface/optimum.git
```
For the accelerator-specific features, prefix the Git URL in the above command with `optimum[accelerator_type]@`:
```bash
python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```
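The requirement string used for a source install follows a simple pattern: the package name with its extras, an `@`, then the Git URL. The helper below is a hypothetical sketch (the function name `source_requirement` is not part of 🤗 Optimum) that composes such a string, for example when generating a requirements file.

```python
REPO_URL = "git+https://github.com/huggingface/optimum.git"


def source_requirement(extras=(), repo_url=REPO_URL):
    """Build a pip requirement that installs optimum from Git with optional extras."""
    if not extras:
        # Without extras, the bare Git URL is enough.
        return repo_url
    return f"optimum[{','.join(extras)}]@{repo_url}"


print(source_requirement())
print(source_requirement(["onnxruntime"]))
```

When passing the result to `pip` from a shell such as zsh, quote it so the square brackets are not interpreted as a glob pattern.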
## Accelerated Inference
🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:
- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
- TensorFlow Lite
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)
- AWS Inferentia 2 / Inferentia 1, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/models)
- NVIDIA TensorRT-LLM, more details [here](https://huggingface.co/blog/optimum-nvidia)
The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and from the command line.
### Features summary
| Features | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
| Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
| FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
| Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
| Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
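When choosing a backend, it can be handy to query the table above programmatically. The sketch below transcribes the table into a plain dictionary (`True` for a check mark, `False` for N/A); the helper name `backends_supporting` is hypothetical and not part of 🤗 Optimum.

```python
# Feature support transcribed from the summary table above.
# Column order matches BACKENDS; True = check mark, False = N/A.
BACKENDS = ["ONNX Runtime", "Neural Compressor", "OpenVINO", "TensorFlow Lite"]
FEATURES = {
    "Graph optimization":                 [True,  False, True,  False],
    "Post-training dynamic quantization": [True,  True,  False, True],
    "Post-training static quantization":  [True,  True,  True,  True],
    "Quantization Aware Training (QAT)":  [False, True,  True,  False],
    "FP16 (half precision)":              [True,  False, True,  True],
    "Pruning":                            [False, True,  True,  False],
    "Knowledge Distillation":             [False, True,  True,  False],
}


def backends_supporting(*features):
    """Return the backends that support every requested feature."""
    return [
        backend
        for index, backend in enumerate(BACKENDS)
        if all(FEATURES[feature][index] for feature in features)
    ]


print(backends_supporting("Graph optimization", "FP16 (half precision)"))
print(backends_supporting("Pruning"))
```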
### OpenVINO
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install --upgrade --upgrade-strategy eager optimum[openvino]
```
It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:
```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```
Adding `--weight-format int8` quantizes the weights to `int8`; see the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more details. To quantize both weights and activations, see the [static quantization guide](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.
```diff
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("He's a dreadful magician.")
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
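Converting with `export=True` on every load repeats the conversion each time. One way to avoid that, sketched below, is to convert once, save the result with `save_pretrained`, and reload from the local directory afterwards. The function names `export_and_save` and `load_saved` are hypothetical helpers for illustration, and the imports are done lazily so the snippet can be loaded even where `optimum[openvino]` is not installed.

```python
def export_and_save(model_id, output_dir):
    """Convert a PyTorch checkpoint to OpenVINO once and save it locally."""
    # Imported lazily so this file can be loaded without optimum-intel installed.
    from optimum.intel import OVModelForSequenceClassification
    from transformers import AutoTokenizer

    model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir


def load_saved(output_dir):
    """Reload the already-converted model; `export=True` is not needed here."""
    from optimum.intel import OVModelForSequenceClassification
    from transformers import AutoTokenizer

    model = OVModelForSequenceClassification.from_pretrained(output_dir)
    tokenizer = AutoTokenizer.from_pretrained(output_dir)
    return model, tokenizer
```

For example, `export_and_save("distilbert-base-uncased-finetuned-sst-2-english", "distilbert_sst2_ov")` followed by `load_saved("distilbert_sst2_ov")` should give a model that can be passed to `pipeline` as in the snippet above.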
### Neural Compressor
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
```
Dynamic quantization can be applied on your model:
```bash
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
```
To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 Hub:
```python
from optimum.intel import INCModelForSequenceClassification
model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).
### ONNX + ONNX Runtime
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install optimum[exporters,onnxruntime]
```
It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:
```bash
optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
```
The model can then be quantized using `onnxruntime`:
```bash
optimum-cli onnxruntime quantize \
--avx512 \
--onnx_model roberta_base_qa_onnx \
-o quantized_roberta_base_qa_onnx
```
These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
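The export and quantization commands above can also be chained from a script. The following is a minimal sketch that builds both calls as argument lists, using only the flags shown above, and runs them in order with `subprocess`. The helper names `onnx_export_and_quantize_commands` and `run_all` are hypothetical, not part of 🤗 Optimum.

```python
import shlex
import shutil
import subprocess


def onnx_export_and_quantize_commands(model_id, export_dir, quantized_dir):
    """Return the two CLI calls shown above as argument lists, in order."""
    export_cmd = [
        "optimum-cli", "export", "onnx",
        "-m", model_id,
        "--optimize", "O2",
        export_dir,
    ]
    quantize_cmd = [
        "optimum-cli", "onnxruntime", "quantize",
        "--avx512",
        "--onnx_model", export_dir,
        "-o", quantized_dir,
    ]
    return [export_cmd, quantize_cmd]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    if shutil.which("optimum-cli") is None:
        raise RuntimeError("optimum-cli was not found on PATH")
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)


commands = onnx_export_and_quantize_commands(
    "deepset/roberta-base-squad2",
    "roberta_base_qa_onnx",
    "quantized_roberta_base_qa_onnx",
)
# Only print the commands here; call run_all(commands) to execute them.
for cmd in commands:
    print(shlex.join(cmd))
```

Passing argument lists rather than a single shell string avoids quoting problems with model IDs or paths that contain spaces.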
#### Run the exported model using ONNX Runtime
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
```diff
- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline
model_id = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's Optimum?"
context = "Optimum is an awesome library everyone should use!"
results = qa_pipe(question=question, context=context)
```
For more details on running ONNX models with the `ORTModelForXXX` classes, see the [documentation](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
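The question-answering pipeline returns a dictionary with `score`, `start`, `end` and `answer` keys, where `start` and `end` are character offsets into the context. The sketch below shows one way to read such a result. The helper `describe_answer`, its `min_score` threshold, and the sample values are all made up for illustration and are not the output of any model.

```python
def describe_answer(result, context, min_score=0.5):
    """Summarise a question-answering result dictionary.

    `min_score` is an arbitrary illustrative threshold, not a library default.
    """
    # The answer is expected to be the slice of the context between the offsets.
    span = context[result["start"]:result["end"]]
    return {
        "answer": result["answer"],
        "span_matches": span == result["answer"],
        "confident": result["score"] >= min_score,
    }


context = "Optimum is an awesome library everyone should use!"
# Hand-written stand-in for `results`; the values are invented for illustration.
sample = {"score": 0.9, "start": 11, "end": 29, "answer": "an awesome library"}
print(describe_answer(sample, context))
```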
### TensorFlow Lite
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install optimum[exporters-tf]
```
Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:
```bash
optimum-cli export tflite \
-m deepset/roberta-base-squad2 \
--sequence_length 384 \
--quantize int8-dynamic roberta_tflite_model
```
## Accelerated training
🤗 Optimum provides wrappers around the original 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
We support many providers:
- Habana's Gaudi processors
- AWS Trainium instances (see the [distributed training guide](https://huggingface.co/docs/optimum-neuron/en/guides/distributed_training))
- ONNX Runtime (optimized for GPUs)
### Habana
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install --upgrade --upgrade-strategy eager optimum[habana]
```
```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments
# Download a pretrained model from the Hub
model = AutoModelForXxx.from_pretrained("bert-base-uncased")
# Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
output_dir="path/to/save/folder/",
+ use_habana=True,
+ use_lazy_mode=True,
+ gaudi_config_name="Habana/bert-base-uncased",
...
)
# Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
...
)
# Use Habana Gaudi processor for training!
trainer.train()
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
### ONNX Runtime
```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
# Download a pretrained model from the Hub
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
output_dir="path/to/save/folder/",
optim="adamw_ort_fused",
...
)
# Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
...
)
# Use ONNX Runtime for training!
trainer.train()
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
### Quanto
[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend.
You can quantize a model either using the Python API or the `optimum-cli`.
```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4
model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
```
The quantized model can be saved using `save_pretrained`:
```python
qmodel.save_pretrained('./Llama-3.1-8B-quantized')
```
It can later be reloaded using `from_pretrained`:
```python
from optimum.quanto import QuantizedModelForCausalLM
qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
```
You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
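To build intuition for what 4-bit weight quantization such as `weights=qint4` does, here is a plain-Python sketch of symmetric per-tensor quantization to the signed 4-bit range [-8, 7]. It illustrates the general idea only and is not Quanto's implementation, which may use different scaling and grouping; the weight values are made up.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization of floats to integers in [-8, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.42, -1.30, 0.07, 0.88]  # made-up values for illustration
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as a small integer plus one shared scale, and the rounding error per weight stays within half a quantization step (`scale / 2`).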