onnx-neural-compressor

Name: onnx-neural-compressor
Version: 1.0
Summary: Repository of Neural Compressor ORT
Author: Intel AIPT Team
Requires-Python: >=3.8.0
License: Apache 2.0
Keywords: quantization
Upload time: 2024-07-31 16:36:14
            <div align="center">

Neural Compressor
===========================
<h3> An open-source Python library supporting popular model compression techniques for ONNX</h3>

[![python](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/onnx/neural-compressor)
[![version](https://img.shields.io/badge/release-1.0-green)](https://github.com/onnx/neural-compressor/releases)
[![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/onnx/neural-compressor/blob/master/LICENSE)


---
<div align="left">

Neural Compressor provides popular model compression techniques inherited from [Intel Neural Compressor](https://github.com/intel/neural-compressor), focused on ONNX model quantization through [ONNX Runtime](https://onnxruntime.ai/), such as SmoothQuant and weight-only quantization. In particular, the tool offers the key features, typical examples, and open collaborations listed below:

* Support for a wide range of Intel hardware, such as [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html) and Intel AI PCs

* Validation of popular LLMs such as [Llama2](./examples/nlp/huggingface_model/text_generation/llama/) and broad models such as [BERT-base](./examples/nlp/onnx_model_zoo/bert-squad/) and [ResNet50](./examples/image_recognition/onnx_model_zoo/resnet50/) from popular model hubs such as [Hugging Face](https://huggingface.co/) and the [ONNX Model Zoo](https://github.com/onnx/models#models), leveraging automatic [accuracy-driven](./docs/design.md#workflow) quantization strategies

* Collaboration with software platforms such as [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystems such as [Hugging Face](https://huggingface.co/blog/intel), [ONNX](https://github.com/onnx/models#models), and [ONNX Runtime](https://github.com/microsoft/onnxruntime)

## Installation

### Install from source
```bash
git clone https://github.com/onnx/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .
```

> **Note**:
> Further installation methods can be found under [Installation Guide](./docs/installation_guide.md).
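If you plan to modify the source, an editable install is a common alternative; this is standard pip behavior rather than a project-specific instruction:

```bash
# Install in editable (development) mode from the repository root.
pip install -e .
```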

## Getting Started

Setting up the environment:
```bash
pip install onnx-neural-compressor "onnxruntime>=1.17.0" onnx
```
After successfully installing these packages, try your first quantization program.
> **Note**: Please install from source before the formal PyPI release.
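As a quick sanity check (not part of the original instructions), you can confirm the package is visible to Python via the standard library:

```bash
python -c "import importlib.metadata; print(importlib.metadata.version('onnx-neural-compressor'))"
```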

### Weight-Only Quantization (LLMs)
The following example demonstrates weight-only quantization on LLMs. When multiple devices are available, a device is selected automatically for efficiency.

Run the example:
```python
import onnx

from onnx_neural_compressor.quantization import matmul_nbits_quantizer

# Load the FP32 model to quantize; the path is illustrative.
model = onnx.load("/path/to/model.onnx")

algo_config = matmul_nbits_quantizer.RTNWeightOnlyQuantConfig()
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    n_bits=4,  # quantize weights to 4 bits
    block_size=32,  # group size used for per-block scales
    is_symmetric=True,  # symmetric weight quantization
    algo_config=algo_config,  # round-to-nearest (RTN) algorithm
)
quant.process()
best_model = quant.model
```
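To persist the result, here is a minimal sketch, assuming `quant.model` yields a plain `onnx.ModelProto` (large models stored with external data may need a different save path; check the project docs):

```python
# Save the quantized model; the output path is illustrative.
onnx.save(best_model, "model_int4.onnx")
```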

### Static Quantization

```python
from onnx_neural_compressor import data_reader
from onnx_neural_compressor.quantization import config, quantize


class DataReader(data_reader.CalibrationDataReader):
    def __init__(self):
        self.encoded_list = []
        # Append calibration samples (dicts of input name -> tensor) to self.encoded_list.

        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        # Return the next calibration sample, or None when exhausted.
        return next(self.iter_next, None)

    def rewind(self):
        # Restart iteration over the calibration samples.
        self.iter_next = iter(self.encoded_list)


# Use a name that does not shadow the imported `data_reader` module.
calibration_reader = DataReader()
qconfig = config.StaticQuantConfig(calibration_data_reader=calibration_reader)
# Input and output model paths are illustrative.
model = "/path/to/model.onnx"
output_model_path = "/path/to/model_int8.onnx"
quantize(model, output_model_path, qconfig)
```
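For concreteness, here is a hypothetical reader that feeds a few random tensors for a model with a single input; the input name, shape, and random data are illustrative only, and real calibration should use representative samples:

```python
import numpy as np

from onnx_neural_compressor import data_reader


class RandomDataReader(data_reader.CalibrationDataReader):
    """Feeds a few random tensors for calibration (illustrative only)."""

    def __init__(self, input_name="input", shape=(1, 3, 224, 224), num_samples=8):
        # Each calibration sample maps input names to numpy arrays.
        self.encoded_list = [
            {input_name: np.random.rand(*shape).astype(np.float32)}
            for _ in range(num_samples)
        ]
        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        return next(self.iter_next, None)

    def rewind(self):
        self.iter_next = iter(self.encoded_list)
```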

## Documentation

<table class="docutils">
  <thead>
  <tr>
    <th colspan="8">Overview</th>
  </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="3" align="center"><a href="./docs/design.md#architecture">Architecture</a></td>
      <td colspan="3" align="center"><a href="./docs/design.md#workflow">Workflow</a></td>
      <td colspan="3" align="center"><a href="./examples/">Examples</a></td>
    </tr>
  </tbody>
  <thead>
    <tr>
      <th colspan="8">Feature</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="4" align="center"><a href="./docs/quantization.md">Quantization</a></td>
      <td colspan="4" align="center"><a href="./docs/smooth_quant.md">SmoothQuant</a></td>
    </tr>
    <tr>
      <td colspan="4" align="center"><a href="./docs/quantization_weight_only.md">Weight-Only Quantization (INT8/INT4)</a></td>
      <td colspan="4" align="center"><a href="./docs/quantization_layer_wise.md">Layer-Wise Quantization</a></td>
    </tr>
  </tbody>
</table>



## Additional Content

* [Contribution Guidelines](./docs/source/CONTRIBUTING.md)
* [Security Policy](SECURITY.md)

## Communication
- [GitHub Issues](https://github.com/onnx/neural-compressor/issues): mainly for bug reports, new feature requests, asking questions, etc.
- [Email](mailto:inc.maintainers@intel.com): reach out by email to propose research ideas on model compression techniques or to discuss collaborations.



            
