# FMS Acceleration for Accelerated PeFT Techniques
Currently, only LoRA-related techniques are supported, with more in the pipeline to be added.
## Plugins
Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[autogptq](./src/fms_acceleration_peft/framework_plugin_autogptq.py) | Loads 4-bit GPTQ-LoRA with a GPTQ-quantized base model | AutoGPTQ | ✅ | ✅ | ✅
[bnb](./src/fms_acceleration_peft/framework_plugin_bnb.py) | Loads 4-bit QLoRA with quantized bitsandbytes `Linear4bit` layers | Hugging Face<br>bitsandbytes | ✅ | ✅ | ✅
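
The last three columns refer to the framework hooks each plugin implements: custom model loading, model/PEFT-config augmentation, and trainer callbacks. Below is a minimal sketch of how these hooks are typically driven from the base `fms-acceleration` framework; the entry point and hook names (`AccelerationFramework`, `model_loader`, `augmentation`, `get_callbacks_and_ready_for_train`), the config file path, and the model name are illustrative assumptions rather than a verbatim API reference.

```
# Minimal sketch of driving a plugin through the base fms-acceleration framework.
# NOTE: entry point, hook names and signatures are assumed for illustration;
# consult the base fms-acceleration package for the exact API.
from transformers import TrainingArguments
from peft import LoraConfig
from fms_acceleration import AccelerationFramework  # assumed entry point

# the YAML config selects and configures a plugin (e.g. bnb or autogptq)
framework = AccelerationFramework("framework_config.yaml")  # hypothetical path

# Loading: the plugin takes over model loading (e.g. 4-bit quantized base)
model = framework.model_loader("mistralai/Mistral-7B-v0.1")  # placeholder model

# Augmentation: the plugin adjusts the model / PEFT config before training
train_args = TrainingArguments(output_dir="./out", bf16=True)
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model, (peft_config,) = framework.augmentation(
    model, train_args, modifiable_args=(peft_config,)
)

# Callbacks: extra callbacks the plugin registers with the HF Trainer
callbacks = framework.get_callbacks_and_ready_for_train(model)
```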
### Key Points
- Fixes an upcasting issue (which caused slowdowns) in the `bnb` plugin, originally discovered by the authors of [Unsloth](https://unsloth.ai/blog/mistral-benchmark). **NOTE**: as per our benchmarks, we recommend using *mixed precision* with 4-bit quantization for better performance.
- The `bnb` plugin is properly configured to work with FSDP, following [this guide](https://huggingface.co/docs/bitsandbytes/main/en/fsdp_qlora); see the sketch after this list.
- `triton_v2` kernels are not yet properly integrated into Hugging Face Optimum.
- `triton_v2` kernels are [the only 4-bit kernels that work for training](https://github.com/AutoGPTQ/AutoGPTQ/issues/633).
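
To illustrate the mixed-precision and FSDP points above, here is a minimal sketch of the underlying Hugging Face / bitsandbytes settings involved (the `bnb` plugin wires this up internally; the model name and output directory are placeholders): setting `bnb_4bit_quant_storage` lets FSDP flatten and shard the 4-bit weights, and `bf16=True` enables the recommended mixed precision.

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization; storing the quantized weights in bf16 is what allows
# FSDP to flatten and shard them (see the linked FSDP-QLoRA guide)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # placeholder model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# mixed precision (bf16), as recommended above when training with 4-bit quant
train_args = TrainingArguments(output_dir="./out", bf16=True)
```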
## GPTQ-LoRA's AutoGPTQ - Current Implementation vs Legacy Implementation
GPTQ-LoRA depends on an AutoGPTQ backend to run. There are two backend options:
1. Current Implementation
    - This is a local subset extracted from [ModelCloud's](https://github.com/ModelCloud/GPTQModel) refactored fork.
    - It removes redundant code to simplify the build and installation of the plugin.
2. Legacy Implementation
    - This requires building the package from the official AutoGPTQ repository.
    - To replicate this implementation, follow the installation below.
    - The legacy implementation of GPTQ-LoRA uses an external AutoGPTQ package, so you must ensure that this specific commit is installed:
      ```
      pip install git+https://github.com/AutoGPTQ/AutoGPTQ.git@ea829c7bbe83561c2b1de26795b6592992373ef7
      ```
    - When constructing the plugin, set `use_external_lib: True` in the configuration object that is passed to it (otherwise the plugin defaults to the local AutoGPTQ package):
      ```
      peft:
        quantization:
          auto_gptq:
            kernel: triton_v2
            from_quantized: True
            use_external_lib: True
      ```
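
For comparison, the current (local) implementation is the default: simply omit `use_external_lib` (or set it to `False`), and the plugin uses the bundled AutoGPTQ subset.

```
peft:
  quantization:
    auto_gptq:
      kernel: triton_v2
      from_quantized: True
      # use_external_lib not set -> the bundled local implementation is used
```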
## Known Issues
<!--
- Models with sliding windows (e.g., Mistral, Mixtral) will have [memory and throughput issues](https://github.com/huggingface/transformers/issues/30461).
-->
- GPTQ-LoRA is sometimes observed to have `nan` gradient norms at the beginning of training, but training otherwise proceeds well.