<div align="center">
Neural Compressor
===========================
<h3> An open-source Python library supporting popular model compression techniques for ONNX</h3>
[![python](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/onnx/neural-compressor)
[![version](https://img.shields.io/badge/release-1.0-green)](https://github.com/onnx/neural-compressor/releases)
[![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/onnx/neural-compressor/blob/master/LICENSE)
---
<div align="left">
Neural Compressor provides popular model compression techniques, inherited from [Intel Neural Compressor](https://github.com/intel/neural-compressor) but focused on ONNX model quantization (such as SmoothQuant and weight-only quantization) through [ONNX Runtime](https://onnxruntime.ai/). In particular, the tool provides the following key features, typical examples, and open collaborations:
* Support a wide range of Intel hardware, such as [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html) and Intel AI PCs
* Validate popular LLMs such as [LLama2](./examples/nlp/huggingface_model/text_generation/llama/) as well as broad models such as [BERT-base](./examples/nlp/onnx_model_zoo/bert-squad/) and [ResNet50](./examples/image_recognition/onnx_model_zoo/resnet50/) from popular model hubs such as [Hugging Face](https://huggingface.co/) and the [ONNX Model Zoo](https://github.com/onnx/models#models), by leveraging automatic [accuracy-driven](./docs/design.md#workflow) quantization strategies
* Collaborate with software platforms such as [Microsoft Olive](https://github.com/microsoft/Olive), and with the open AI ecosystem, including [Hugging Face](https://huggingface.co/blog/intel), [ONNX](https://github.com/onnx/models#models), and [ONNX Runtime](https://github.com/microsoft/onnxruntime)
## Installation
### Install from source
```Shell
git clone https://github.com/onnx/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .
```
> **Note**:
> Further installation methods can be found under [Installation Guide](./docs/installation_guide.md).
## Getting Started
Setting up the environment:
```bash
pip install onnx-neural-compressor "onnxruntime>=1.17.0" onnx
```
After successfully installing these packages, try your first quantization program.
> **Note**: please install from source until the formal PyPI release is available.
### Weight-Only Quantization (LLMs)
The following example demonstrates weight-only quantization on LLMs. When multiple devices are available, the most efficient device is selected automatically.
Run the example:
```python
from onnx_neural_compressor.quantization import matmul_nbits_quantizer

# `model` is the FP32 ONNX model to quantize: a path to the .onnx file
# or an onnx.ModelProto.
algo_config = matmul_nbits_quantizer.RTNWeightOnlyQuantConfig()
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    n_bits=4,
    block_size=32,
    is_symmetric=True,
    algo_config=algo_config,
)
quant.process()
best_model = quant.model
```
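To persist the quantized model, here is a minimal sketch assuming `best_model` is an `onnx.ModelProto`; the output file names are illustrative:
```python
import onnx

# Assumption: best_model is an onnx.ModelProto produced by quant.process().
onnx.save(best_model, "model_int4.onnx")

# LLM-sized models often exceed the 2 GB protobuf limit; in that case,
# store the weights as external data alongside the .onnx file.
onnx.save(
    best_model,
    "model_int4.onnx",
    save_as_external_data=True,
    location="model_int4.onnx.data",
)
```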
### Static Quantization
```python
from onnx_neural_compressor import data_reader
from onnx_neural_compressor.quantization import config, quantize


class DataReader(data_reader.CalibrationDataReader):
    def __init__(self):
        self.encoded_list = []
        # Append calibration samples (dicts mapping model input names to
        # numpy arrays) to self.encoded_list here.
        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        return next(self.iter_next, None)

    def rewind(self):
        self.iter_next = iter(self.encoded_list)


# Name the instance so it does not shadow the imported data_reader module.
calibration_data_reader = DataReader()
qconfig = config.StaticQuantConfig(calibration_data_reader=calibration_data_reader)
# `model` is the FP32 ONNX model (path or onnx.ModelProto);
# `output_model_path` is where the quantized model is written.
quantize(model, output_model_path, qconfig)
```
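For a concrete (if simplistic) illustration, the sketch below fills the reader with random tensors shaped like the model's first input. The model path and the use of random calibration data are assumptions for demonstration only; real calibration should use representative samples:
```python
import numpy as np
import onnxruntime as ort

from onnx_neural_compressor import data_reader


class RandomDataReader(data_reader.CalibrationDataReader):
    """Illustrative reader that yields a few random calibration batches."""

    def __init__(self, model_path, num_samples=8):
        session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        inp = session.get_inputs()[0]
        # Replace symbolic dimensions (e.g. a dynamic batch size) with 1.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        self.encoded_list = [
            {inp.name: np.random.rand(*shape).astype(np.float32)}
            for _ in range(num_samples)
        ]
        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        return next(self.iter_next, None)

    def rewind(self):
        self.iter_next = iter(self.encoded_list)
```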
## Documentation
<table class="docutils">
<thead>
<tr>
<th colspan="8">Overview</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="3" align="center"><a href="./docs/design.md#architecture">Architecture</a></td>
<td colspan="3" align="center"><a href="./docs/design.md#workflow">Workflow</a></td>
<td colspan="3" align="center"><a href="./examples/">Examples</a></td>
</tr>
</tbody>
<thead>
<tr>
<th colspan="8">Feature</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="4" align="center"><a href="./docs/quantization.md">Quantization</a></td>
<td colspan="4" align="center"><a href="./docs/smooth_quant.md">SmoothQuant</td>
<tr>
<td colspan="4" align="center"><a href="./docs/quantization_weight_only.md">Weight-Only Quantization (INT8/INT4) </td>
</td>
<td colspan="4" align="center"><a href="./docs/quantization_layer_wise.md">Layer-Wise Quantization </td>
</tr>
</tbody>
</table>
## Additional Content
* [Contribution Guidelines](./docs/source/CONTRIBUTING.md)
* [Security Policy](SECURITY.md)
## Communication
- [GitHub Issues](https://github.com/onnx/neural-compressor/issues): primarily for bug reports, feature requests, and questions.
- [Email](mailto:inc.maintainers@intel.com): reach out by email to propose research ideas on model compression techniques or to discuss collaborations.