# FMS Model Optimizer

![Lint](https://github.com/foundation-model-stack/fms-model-optimizer/actions/workflows/lint.yml/badge.svg?branch=main)
![Tests](https://github.com/foundation-model-stack/fms-model-optimizer/actions/workflows/test.yml/badge.svg?branch=main)
![Build](https://github.com/foundation-model-stack/fms-model-optimizer/actions/workflows/pypi.yml/badge.svg?branch=main)
[![Minimum Python Version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
![Release](https://img.shields.io/github/v/release/foundation-model-stack/fms-model-optimizer)
![License](https://img.shields.io/github/license/foundation-model-stack/fms-model-optimizer)


## Introduction

FMS Model Optimizer is a framework for developing reduced-precision neural network models. It supports [quantization](https://www.ibm.com/think/topics/quantization) techniques such as [quantization-aware training (QAT)](https://arxiv.org/abs/2407.11062) and [post-training quantization (PTQ)](https://arxiv.org/abs/2102.05426), along with several other optimization techniques, on popular deep learning workloads.
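
As background, the core QAT idea referenced above can be illustrated in a few lines of plain PyTorch (a conceptual sketch, not this framework's API):

```python
# Conceptual illustration of QAT: "fake quantization" in the forward pass
# with a straight-through estimator, so the model trains under quantization
# noise while gradients still flow. Not fms-mo's API.
import torch

def fake_quant(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses x_q, backward passes through x
    return x + (x_q - x).detach()
```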

## Highlights

- **Python API to enable model quantization:** Adding a few lines of code performs module-level and/or function-level operation replacement (see the sketch after this list).
- **Robust:** Verified for INT8 and INT4 quantization on key vision, speech, NLP, object-detection, and LLM workloads.
- **Flexible:** Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer parameter-group settings.
- **State-of-the-art INT and FP quantization techniques** for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
- **Supports key compute-intensive operations** such as Conv2d, Linear, LSTM, MM, and BMM.
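
A minimal sketch of this workflow is below. The entry points (`qconfig_init`, `qmodel_prep`) follow the project's examples; consult the [examples](./examples/) directory for the exact signatures in your installed version.

```python
# A minimal sketch, assuming the qconfig_init/qmodel_prep entry points
# shown in the project's examples; exact signatures may differ by version.
import torch
from fms_mo import qconfig_init, qmodel_prep

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
qcfg = qconfig_init()                   # start from a default quantization config
sample_input = torch.randn(1, 16)       # example input used to trace the model
qmodel_prep(model, sample_input, qcfg)  # replaces supported modules/functions
                                        # with quantized counterparts in place
```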

## Supported Models

| Model | GPTQ | FP8 | PTQ | QAT |
|---|------|-----|-----|-----|
| Granite      |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
| Llama        |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
| Mixtral      |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
| BERT/RoBERTa |:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:   |

**Note**: Direct QAT on LLMs is not recommended.

## Getting Started

### Requirements

1. **🐧 Linux system with Nvidia GPU (V100/A100/H100)**
2. Python 3.10 to Python 3.12
3. CUDA >=12

*Optional packages based on optimization functionality required:*

- **GPTQ** is a popular compression method for LLMs: 
    - [gptqmodel](https://pypi.org/project/gptqmodel/) or build from [source](https://github.com/ModelCloud/GPTQModel)
- If you want to experiment with **INT8** deployment in [QAT](./examples/QAT_INT8/) and [PTQ](./examples/PTQ_INT8/) examples:
    - Nvidia GPU with compute capability >= 8.0 (A100 family or higher)
    - Option 1:
        - [Ninja](https://ninja-build.org/)
        - Clone the [CUTLASS](https://github.com/NVIDIA/cutlass) repository
        - `PyTorch 2.3.1` (newer versions cause issues with the custom CUDA kernel used in these examples)
    - Option 2:
        - Use the included Triton kernel. Note that this kernel is currently not faster than FP16.
- **FP8** is a reduced precision format like **INT8**:
    - Nvidia A100 family or higher
    - [llm-compressor](https://github.com/vllm-project/llm-compressor)
- To enable the compute-graph plotting function (mostly for troubleshooting purposes):
    - [matplotlib](https://matplotlib.org/)
    - [graphviz](https://graphviz.org/)
    - [pygraphviz](https://pygraphviz.github.io/)

> [!NOTE]
> PyTorch version should be < 2.4 if you would like to experiment with deployment using the external INT8 kernel.
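
A quick runtime check of this constraint (a hypothetical convenience snippet, not part of the package) might look like:

```python
# Hypothetical sanity check: the external INT8 kernel examples above
# expect torch < 2.4 (and 2.3.1 specifically for the CUTLASS path).
import torch
from packaging.version import Version

if Version(torch.__version__) >= Version("2.4"):
    raise RuntimeError("Use PyTorch < 2.4 for the external INT8 kernel examples")
```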

### Installation

We recommend using a Python virtual environment with Python 3.10 to 3.12 (see the requirements above). Here is how to set up a virtual environment using [Python venv](https://docs.python.org/3/library/venv.html):

```shell
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
```

> [!TIP]
> If you use [pyenv](https://github.com/pyenv/pyenv), [Conda Miniforge](https://github.com/conda-forge/miniforge) or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not `venv`.

There are two ways to install FMS Model Optimizer:

#### From Release

To install from a release ([PyPI package](https://pypi.org/project/fms-model-optimizer/)):

```shell
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
pip install fms-model-optimizer
```

#### From Source

To install from source (GitHub repository):

```shell
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
git clone https://github.com/foundation-model-stack/fms-model-optimizer
cd fms-model-optimizer
pip install -e .
```

#### Optional Dependencies

The following optional dependencies are available:
- `fp8`: `llmcompressor` package for FP8 quantization
- `gptq`: `GPTQModel` package for W4A16 quantization
- `mx`: `microxcaling` package for MX quantization
- `opt`: Shortcut for `fp8`, `gptq`, and `mx` installs
- `aiu`: `ibm-fms` package for AIU model deployment
- `torchvision`: `torchvision` package for image-recognition training and inference
- `triton`: `triton` package for matrix multiplication kernels
- `examples`: Dependencies needed for examples
- `visualize`: Dependencies for visualizing models and performance data
- `test`: Dependencies needed for unit testing
- `dev`: Dependencies needed for development

To install optional dependencies, append a comma-separated list of these names, enclosed in brackets, to the `pip install` commands above. The examples below install the `fp8` (`llm-compressor`) and `torchvision` extras with FMS Model Optimizer:

```shell
# From a release install:
pip install fms-model-optimizer[fp8,torchvision]

# From a source install (run inside the cloned repository):
pip install -e .[fp8,torchvision]
```
If you have already installed FMS Model Optimizer, then only the optional packages will be installed.

### Try It Out!

To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources which demonstrate how to use the framework with different quantization techniques:

- Jupyter notebook tutorials (a recommended starting point):
    - [Quantization tutorial](tutorials/quantization_tutorial.ipynb):
        - Visualizes a random Gaussian tensor step by step through the quantization process
        - Builds a quantizer and a quantized convolution module based on this process
- [Python script examples](./examples/)
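
If you would rather see the core idea in a few lines first, the following plain-PyTorch sketch mirrors what the tutorial visualizes (a conceptual illustration, not the tutorial's exact code):

```python
# Symmetric INT8 quantization of a random Gaussian tensor, then
# dequantization to inspect the rounding error. Plain PyTorch only.
import torch

x = torch.randn(512)                    # random Gaussian tensor
qmax = 2 ** (8 - 1) - 1                 # 127 for symmetric INT8
scale = x.abs().max() / qmax            # per-tensor scale
x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
x_dq = x_int * scale                    # dequantize back to float
print(f"max abs quantization error: {(x - x_dq).abs().max():.5f}")
```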

## Docs

Dive into the [design document](./docs/fms_mo_design.md) to get a better understanding of the
framework motivation and concepts.

## Contributing

Check out our [contributing guide](CONTRIBUTING.md) to learn how to contribute.

            
