auto-round


Name: auto-round
Version: 0.1
Home page: https://github.com/intel/auto-round
Summary: Repository of AutoRound: Advanced Weight-Only Quantization Algorithm for LLMs
Upload time: 2024-03-08 07:52:40
Author: Intel AIPT Team
Requires Python: >=3.7.0
License: Apache 2.0
Keywords: quantization, auto-around, LLM, SignRound
<div align="center">

AutoRound
===========================
<h3> Advanced Weight-Only Quantization Algorithm for LLMs</h3>

[![python](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/intel/auto-round)
[![version](https://img.shields.io/badge/release-0.1-green)](https://github.com/intel/auto-round)
[![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/intel/auto-round/blob/main/LICENSE)
---
<div align="left">

AutoRound is an advanced weight-only quantization algorithm for low-bit LLM inference. It is tailored for a wide range of models and consistently delivers noticeable improvements, often significantly outperforming SignRound, at the cost of additional tuning time for quantization.

## Prerequisites
- Python 3.9 or higher

## Installation
### Build from Source
```bash
pip install -r requirements.txt
python setup.py install
```
### Install from PyPI
```bash
pip install auto-round
```
## Usage of Tuning

### On CPU / Gaudi2 / GPU

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tuning_device = "cuda:0"  ## or "cpu", "hpu"
dtype = "auto" if tuning_device != "hpu" else torch.bfloat16
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

from auto_round import AutoRound

bits, group_size, sym = 4, 128, False  ## 4-bit asymmetric quantization with group size 128
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym, device=tuning_device)
autoround.quantize()  ## run the tuning and quantization process
output_dir = "./tmp_autoround"
autoround.save_quantized(output_dir)  ## save the quantized model to disk
```



## Model inference
Please run the tuning code above first to produce the quantized model.



### Intel CPU
```python
# Please save the quantized model in 'itrex' format first, then refer to the ITREX tutorial for more details on inference with the INT4 model.
# (https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/llm/runtime/neural_speed)
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
from transformers import AutoTokenizer

quantized_model_path = "./tmp_autoround"
group_size, sym = 128, False  ## must match the settings used during tuning above
scheme = "sym" if sym else "asym"
woq_config = WeightOnlyQuantConfig(
    group_size=group_size, scheme=scheme, use_autoround=True
)  ## only supports 4 bits currently
prompt = "There is a girl who likes adventure,"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_path, quantization_config=woq_config, trust_remote_code=True, device="cpu"
)
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```


### GPU
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_path = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path, use_fast=True)
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```

<details>
  <summary>Detailed Hyperparameters</summary>

- `model`: The PyTorch model to be quantized.
            
- `tokenizer`: An optional tokenizer for processing input data. If none is provided, a dataloader must be supplied.
  
- `bits (int)`: Number of bits for quantization (default is 4).
  
- `group_size (int)`: Size of the quantization group (default is 128).

- `sym (bool)`: Whether to use symmetric quantization.
  
- `use_quant_input (bool)`: Whether to use the output of the previous quantized block as the input for the current block (default is True).
  
- `enable_minmax_tuning (bool)`: Whether to enable weight min-max tuning (default is True).
  
- `iters (int)`: Number of tuning iterations (default is 200).
  
- `lr (float)`: The learning rate for the rounding values (default is None; it will be set to 1.0/iters automatically).
  
- `minmax_lr (float)`: The learning rate for min-max tuning (default is None; it will be set to lr automatically).
  
- `n_samples (int)`: Number of samples for tuning (default is 512).
  
- `seqlen (int)`: Sequence length of the tuning data (default is 2048).
  
- `batch_size (int)`: Batch size for training (default is 8).

- `scale_dtype (str)`: The data type of the quantization scale (default is "float32"); different kernels support different choices.
  
- `amp (bool)`: Whether to use automatic mixed precision (default is True).
  
- `n_blocks (int)`: Number of blocks packed together and tuned as one unit (default is 1).
  
- `gradient_accumulate_steps (int)`: Number of gradient accumulation steps (default is 1).
  
- `low_gpu_mem_usage (bool)`: Whether to save GPU memory at the cost of a little tuning time (default is True).
  
- `dataset (str)`: The default dataset name for tuning (default is "NeelNanda/pile-10k").
  
- `dataset_split (str)`: The split of the dataset to be used for tuning (default is "train").
  
- `dataloader`: The dataloader for tuning data.
  
- `weight_config (dict)`: Configuration for weight quantization (default is an empty dictionary), mainly for mixed bits or mixed precision.
  
- `device`: The device to be used for tuning. The default is set to 'auto', allowing for automatic detection.

</details>
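
As an illustration of the hyperparameters above, the following is a minimal sketch of a more fully specified `AutoRound` call, reusing the `model` and `tokenizer` loaded in the tuning example. The keyword names come from the list above; the values are placeholders rather than recommended settings.

```python
from auto_round import AutoRound

# Illustrative sketch: keyword names follow the hyperparameter list above,
# the values are placeholders. `model` and `tokenizer` come from the tuning
# example earlier in this document.
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,                     # quantization bit width
    group_size=128,             # quantization group size
    sym=False,                  # asymmetric quantization
    iters=200,                  # number of tuning iterations
    n_samples=512,              # number of calibration samples
    seqlen=2048,                # sequence length of the tuning data
    batch_size=8,               # tuning batch size
    enable_minmax_tuning=True,  # tune weight min-max values
    use_quant_input=True,       # feed each quantized block's output to the next block
    low_gpu_mem_usage=True,     # save GPU memory at a small tuning-time cost
    device="auto",              # auto-detect the tuning device
)
autoround.quantize()
```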


## Support List

| Model                    | Supported                                                                                                                                                                                                                                                          |
|--------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Intel/neural-chat-7b-v3-3 | [HF-int4-model](https://huggingface.co/Intel/neural-chat-7b-v3-3-int4-inc), [accuracy](./docs/neural-chat-7b-v3-3-acc.md), [recipe](./examples/language-modeling/scripts/neural-chat-7b-v3-3.sh), [example](./examples/language-modeling/)                         |
| Intel/neural-chat-7b-v3-1 | [HF-int4-model](https://huggingface.co/Intel/neural-chat-7b-v3-1-int4-inc), [accuracy](./docs/neural-chat-7b-v3-1-acc.md), [recipe](./examples/language-modeling/scripts/neural-chat-7b-v3-1.sh), [example](./examples/language-modeling/)                         |
| mistralai/Mistral-7B-v0.1 | [HF-int4-model](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc), [accuracy](./docs/Mistral-7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mistral-7B-v0.1.sh), [example](./examples/language-modeling/)                                     |
| google/gemma-7b          | [HF-int4-model](https://huggingface.co/Intel/gemma-7b-int4-inc) under review, [accuracy](./docs/gemma-7b-acc.md), [recipe](./examples/language-modeling/scripts/gemma-7b.sh),  [example](./examples/language-modeling/)                                            |
| google/gemma-7b-it       | [HF-int4-model](https://huggingface.co/Intel/gemma-7b-it-int4-inc) under review, [accuracy](./docs/gemma-7b-it-acc.md), [recipe](./examples/language-modeling/scripts/gemma-7b-it.sh), [example](./examples/language-modeling/)                                    |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | [HF-int4-model](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc) under review, [accuracy](./docs/Mixtral-8x7B-Instruct-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-Instruct-v0.1.sh), [example](./examples/language-modeling/) |
| mistralai/Mixtral-8x7B-v0.1 | [HF-int4-model](https://huggingface.co/Intel/Mixtral-8x7B-v0.1-int4-inc) under review, [accuracy](./docs/Mixtral-8x7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-v0.1.sh), [example](./examples/language-modeling/)                  |
| microsoft/phi-2          | [HF-int4-model](https://huggingface.co/Intel/phi-2-int4-inc) under review, [accuracy](./docs/phi-2-acc.md), [recipe](./examples/language-modeling/scripts/phi-2.sh), [example](./examples/language-modeling/)                                                      |
| meta-llama/Llama-2-7b-chat-hf | [accuracy](./docs/Llama-2-7b-chat-hf-acc.md), [recipe](./examples/language-modeling/scripts/Llama-2-7b-chat-hf.sh), [example](./examples/language-modeling/)                                                                                                                    |
| Salesforce/codegen25-7b-multi | [example](./examples/code-generation)                                                                                                                                                                                                                              |
| EleutherAI/gpt-j-6b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| huggyllama/llama-7b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| meta-llama/Llama-2-7b-hf | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| facebook/opt-6.7b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| tiiuae/falcon-7b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| mosaicml/mpt-7b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| bigscience/bloom-7b1 | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| baichuan-inc/Baichuan-7B | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| Qwen/Qwen-7B | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| THUDM/chatglm3-6b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| MBZUAI/LaMini-GPT-124M | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| EleutherAI/gpt-neo-125m | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| databricks/dolly-v2-3b | [example](./examples/language-modeling/)                                                                                                                                                                                                                           |
| stabilityai/stablelm-base-alpha-3b | [example](./examples/language-modeling/)                                                                                                                                                                                           |




## Comparison with other methods

We provide a [comprehensive analysis](docs/acc.md) comparing our method with other approaches in our accuracy data section. Notably, our approach outperformed GPTQ with a score of 30/32 and AWQ with a score of 27/32 across LLaMA-v1/LLaMA-v2/Mistral-7B at W4G-1, W4G128, W3G128, and W2G128, with comparable tuning costs.

## Tips
1. Consider increasing the number of tuning steps to achieve better results, albeit with increased tuning time.

2. Setting `use_quant_input` to False has been observed to occasionally yield improved results.

3. Setting `minmax_lr` to 2.0/iters has been observed to occasionally yield improved results.
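
The tips above map directly onto `AutoRound` arguments. Below is a minimal sketch, reusing the `model` and `tokenizer` from the tuning example; whether these settings actually help is model-dependent, so treat them only as a starting point for experimentation.

```python
from auto_round import AutoRound

# Minimal sketch of the tips above (reuses `model` and `tokenizer` from the
# tuning example). Whether these settings help is model-dependent.
iters = 400                 # tip 1: more tuning steps, at the cost of more tuning time
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=False,
    iters=iters,
    use_quant_input=False,  # tip 2: occasionally yields improved results
    minmax_lr=2.0 / iters,  # tip 3: occasionally yields improved results
)
autoround.quantize()
```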

## Reference
If you find SignRound useful for your research, please cite our paper:
```bibtex
@article{cheng2023optimize,
  title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
```

            
