optimum

Name: optimum
Version: 1.23.3
Home page: https://github.com/huggingface/optimum
Summary: Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.
Upload time: 2024-10-29 17:43:32
Author: HuggingFace Inc. Special Ops Team
Requires Python: >=3.7.0
License: Apache
Keywords: transformers, quantization, pruning, optimization, training, inference, onnx, onnx runtime, intel, habana, graphcore, neural compressor, ipu, hpu

[![ONNX Runtime](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml/badge.svg)](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml)

# Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

## Installation

🤗 Optimum can be installed using `pip` as follows:

```bash
python -m pip install optimum
```

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

| Accelerator                                                                                                            | Installation                                                      |
|:-----------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview)                                               | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]`      |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index)                                             | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index)                                                            | `pip install --upgrade --upgrade-strategy eager optimum[openvino]`         |
| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview)                                     | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia`          |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index)                                    | `pip install --upgrade --upgrade-strategy eager optimum[amd]`              |
| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index)                                          | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]`          |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index)                                       | `pip install --upgrade --upgrade-strategy eager optimum[habana]`           |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index)                                                         | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]`          |

The `--upgrade --upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

```bash
python -m pip install git+https://github.com/huggingface/optimum.git
```

For the accelerator-specific features, append `optimum[accelerator_type]` to the above command:

```bash
python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```

## Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:

- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
- TensorFlow Lite
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)
- AWS Inferentia 2 / Inferentia 1, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/models)
- NVIDIA TensorRT-LLM, more details [here](https://huggingface.co/blog/optimum-nvidia)

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and from the command line.
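
The ONNX exporter, for example, can also be driven from Python. Below is a minimal sketch, assuming the `main_export` helper exposed by `optimum.exporters.onnx` in recent releases (the model name and output directory are illustrative):

```python
# Minimal sketch of a programmatic ONNX export, assuming the `main_export`
# helper from optimum.exporters.onnx is available in your optimum version.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="distilbert-base-uncased-finetuned-sst-2-english",
    output="distilbert_sst2_onnx",  # illustrative output directory
    task="text-classification",     # can usually be left to "auto"
)
```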

### Features summary

| Features                           | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| Graph optimization                 | :heavy_check_mark: | N/A                | :heavy_check_mark: | N/A                |
| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A                | :heavy_check_mark: |
| Post-training static quantization  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Quantization Aware Training (QAT)  | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |
| FP16 (half precision)              | :heavy_check_mark: | N/A                | :heavy_check_mark: | :heavy_check_mark: |
| Pruning                            | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |
| Knowledge Distillation             | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |


### OpenVINO

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[openvino]
```

It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```

If you add `--weight-format int8`, the weights will be quantized to `int8`; check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more details. To apply quantization to both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
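
The weight-only quantization can also be requested from Python when loading the model. Below is a minimal sketch, assuming the `load_in_8bit` flag supported by `optimum-intel`'s `OVModel` classes (the output directory name is illustrative):

```python
# Minimal sketch: convert a PyTorch checkpoint to OpenVINO with int8 weights,
# assuming optimum-intel supports the load_in_8bit flag shown here.
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,        # convert the PyTorch checkpoint on the fly
    load_in_8bit=True,  # int8 weight-only quantization, like --weight-format int8
)
model.save_pretrained("distilbert_sst2_ov_int8")  # illustrative directory name
```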

To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.

```diff
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
  from transformers import AutoTokenizer, pipeline

  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
  results = classifier("He's a dreadful magician.")
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).

### Neural Compressor

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
```

Dynamic quantization can be applied to your model:

```bash
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
```
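
If you prefer the Python API, a rough equivalent is sketched below, assuming `INCQuantizer` from `optimum-intel` and `PostTrainingQuantConfig` from `neural-compressor`:

```python
# Hedged sketch of post-training dynamic quantization via the Python API,
# assuming optimum-intel's INCQuantizer and neural-compressor's PostTrainingQuantConfig.
from transformers import AutoModelForQuestionAnswering
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

# Dynamic quantization does not require a calibration dataset.
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=quantization_config,
    save_directory="./quantized_distilbert",
)
```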

To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 Hub, you can do as follows:
```python
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).

### ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters,onnxruntime]
```

It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:

```bash
optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
```

The model can then be quantized using `onnxruntime`:

```bash
optimum-cli onnxruntime quantize \
  --avx512 \
  --onnx_model roberta_base_qa_onnx \
  -o quantized_roberta_base_qa_onnx
```

These commands export `deepset/roberta-base-squad2`, perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
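
The quantization step can also be scripted. Below is a minimal sketch, assuming the `ORTQuantizer` and `AutoQuantizationConfig` APIs from `optimum.onnxruntime` (`roberta_base_qa_onnx` being the directory produced by the export command above):

```python
# Hedged sketch: quantize the exported ONNX model from Python with the
# avx512 dynamic-quantization configuration, mirroring the CLI commands above.
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

quantizer = ORTQuantizer.from_pretrained("roberta_base_qa_onnx")
quantizer.quantize(
    save_dir="quantized_roberta_base_qa_onnx",
    quantization_config=qconfig,
)
```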

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

#### Run the exported model using ONNX Runtime

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model seamlessly with [ONNX Runtime](https://onnxruntime.ai/) as the backend:

```diff
- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
  from transformers import AutoTokenizer, pipeline

  model_id = "deepset/roberta-base-squad2"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
  qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
  question = "What's Optimum?"
  context = "Optimum is an awesome library everyone should use!"
  results = qa_pipe(question=question, context=context)
```

More details on how to run ONNX models with the `ORTModelForXXX` classes can be found [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).

### TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters-tf]
```

Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:

```bash
optimum-cli export tflite \
  -m deepset/roberta-base-squad2 \
  --sequence_length 384  \
  --quantize int8-dynamic roberta_tflite_model
```

## Accelerated Training

🤗 Optimum provides wrappers around the original 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
We support many providers:

- Habana's Gaudi processors
- AWS Trainium instances, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/distributed_training)
- ONNX Runtime (optimized for GPUs)

### Habana

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[habana]
```

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForXxx.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
      output_dir="path/to/save/folder/",
+     use_habana=True,
+     use_lazy_mode=True,
+     gaudi_config_name="Habana/bert-base-uncased",
      ...
  )

  # Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use Habana Gaudi processor for training!
  trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).

### ONNX Runtime

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
      output_dir="path/to/save/folder/",
      optim="adamw_ort_fused",
      ...
  )

  # Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use ONNX Runtime for training!
  trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).


### Quanto

[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend.

You can quantize a model using either the Python API or the `optimum-cli`.

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
```

The quantized model can be saved using `save_pretrained`:

```python
qmodel.save_pretrained('./Llama-3.1-8B-quantized')
```

It can later be reloaded using `from_pretrained`:

```python
from optimum.quanto import QuantizedModelForCausalLM

qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
```

You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.

            
