# FMS Acceleration for Accelerated PeFT Techniques
Currently, only LoRA-related techniques are supported, with more in the pipeline to be added.
## Plugins
Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[autogptq](./src/fms_acceleration_peft/framework_plugin_autogptq.py) | Loads 4-bit GPTQ-LoRA with a GPTQ-quantized base model | AutoGPTQ | ✅ | ✅ | ✅
[bnb](./src/fms_acceleration_peft/framework_plugin_bnb.py) | Loads 4-bit QLoRA with quantized bitsandbytes `Linear4bit` layers | Hugging Face<br>bitsandbytes | ✅ | ✅ | ✅
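
The last three columns refer to the framework hooks each plugin implements: custom model loading, model/PEFT-config augmentation, and trainer callbacks. Below is a minimal sketch of how these hooks are typically driven from the base `fms-acceleration` framework; the entry point and hook names (`AccelerationFramework`, `model_loader`, `augmentation`, `get_callbacks_and_ready_for_train`), the config file path, and the model name are illustrative assumptions rather than a verbatim API reference.

```
# Minimal sketch of driving a plugin through the base fms-acceleration framework.
# NOTE: entry point, hook names and signatures are assumed for illustration;
# consult the base fms-acceleration package for the exact API.
from transformers import TrainingArguments
from peft import LoraConfig
from fms_acceleration import AccelerationFramework  # assumed entry point

# the YAML config selects and configures a plugin (e.g. bnb or autogptq)
framework = AccelerationFramework("framework_config.yaml")  # hypothetical path

# Loading: the plugin takes over model loading (e.g. 4-bit quantized base)
model = framework.model_loader("mistralai/Mistral-7B-v0.1")  # placeholder model

# Augmentation: the plugin adjusts the model / PEFT config before training
train_args = TrainingArguments(output_dir="./out", bf16=True)
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model, (peft_config,) = framework.augmentation(
    model, train_args, modifiable_args=(peft_config,)
)

# Callbacks: extra callbacks the plugin registers with the HF Trainer
callbacks = framework.get_callbacks_and_ready_for_train(model)
```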
### Key Points
- Fixes an upcasting issue (which caused slowdowns) in the `bnb` plugin, originally discovered by the authors of [Unsloth](https://unsloth.ai/blog/mistral-benchmark). **NOTE**: as per our benchmarks, we recommend using *mixed precision* with 4-bit quantization for better performance.
- The `bnb` plugin is properly configured to work with FSDP, following [this guide](https://huggingface.co/docs/bitsandbytes/main/en/fsdp_qlora); see the sketch after this list.
- `triton_v2` kernels are not yet properly integrated into Hugging Face Optimum.
- `triton_v2` kernels are [the only 4-bit kernels that work for training](https://github.com/AutoGPTQ/AutoGPTQ/issues/633).
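
To illustrate the mixed-precision and FSDP points above, here is a minimal sketch of the underlying Hugging Face / bitsandbytes settings involved (the `bnb` plugin wires this up internally; the model name and output directory are placeholders): setting `bnb_4bit_quant_storage` lets FSDP flatten and shard the 4-bit weights, and `bf16=True` enables the recommended mixed precision.

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization; storing the quantized weights in bf16 is what allows
# FSDP to flatten and shard them (see the linked FSDP-QLoRA guide)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # placeholder model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# mixed precision (bf16), as recommended above when training with 4-bit quant
train_args = TrainingArguments(output_dir="./out", bf16=True)
```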
## GPTQ-LoRA's AutoGPTQ - Current Implementation vs Legacy Implementation
GPTQ-LoRA depends on an AutoGPTQ backend to run. There are two backend options:
1. Current Implementation
    - This is a local subset extracted from [ModelCloud's](https://github.com/ModelCloud/GPTQModel) refactored fork.
    - It removes redundant code to simplify the build and installation of the plugin.
2. Legacy Implementation
    - This requires building the package from the official AutoGPTQ repository.
    - To replicate this implementation, follow the installation below.
    - The legacy implementation of GPTQ-LoRA uses an external AutoGPTQ package, so you must ensure that this specific commit is installed:
      ```
      pip install git+https://github.com/AutoGPTQ/AutoGPTQ.git@ea829c7bbe83561c2b1de26795b6592992373ef7
      ```
    - When constructing the plugin, set `use_external_lib: True` in the configuration object that is passed to it (otherwise the plugin defaults to the local AutoGPTQ package):
      ```
      peft:
        quantization:
          auto_gptq:
            kernel: triton_v2
            from_quantized: True
            use_external_lib: True
      ```
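
For comparison, the current (local) implementation is the default: simply omit `use_external_lib` (or set it to `False`), and the plugin uses the bundled AutoGPTQ subset.

```
peft:
  quantization:
    auto_gptq:
      kernel: triton_v2
      from_quantized: True
      # use_external_lib not set -> the bundled local implementation is used
```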
## Known Issues
<!--
- Models with sliding windows (e.g., Mistral, Mixtral) will have [memory and throughput issues](https://github.com/huggingface/transformers/issues/30461).
-->
- GPTQ-LoRA is sometimes observed to have `nan` gradient norms at the beginning of training, but training otherwise proceeds well.