fms-acceleration-foak

Name: fms-acceleration-foak
Version: 0.3.0
Summary: FMS Acceleration using Fused Operations and Kernels
Author email: Fabian Lim <flim@sg.ibm.com>, Aaron Chew <aaron.chew1@ibm.com>
Upload time: 2024-09-16 06:41:09
Requires Python: ~=3.9
License: Apache-2.0
Keywords: acceleration, fms-hf-tuning, fused-ops, triton
# FMS Acceleration for Fused Operations and Kernels

This library contains fused operations and custom kernels, and will be expanded over time. It currently contains the following:


1. Fused operations and kernels extracted from [unsloth](#extracted-code-from-unsloth). 
    - Low-Rank Adapter Fused Operations
    - Fast RoPE Triton Kernels
    - Fast RMS LayerNorm Triton Kernels
    - Fast Cross Entropy Triton Kernels

## Plugins

Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[fast_quantized_peft](./src/fms_acceleration_foak/framework_plugin_fast_quantized_peft.py) | LoRA fused ops, fast cross-entropy, fast RMS LayerNorm, fast RoPE | Contains extracted code | | ✅ |
[fast_kernels](./src/fms_acceleration_foak/framework_plugin_fast_kernels.py) | Enhanced version of `fast_quantized_peft` that also works for full fine-tuning and non-quantized PEFT | Contains extracted code | | ✅ |
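Both plugins are enabled through an fms-acceleration framework configuration rather than called directly. The sketch below only illustrates the kind of switches involved; the key names are hypothetical placeholders, not the actual schema, so consult the sample configurations shipped with fms-acceleration for the real format.

```python
# Purely illustrative sketch of a framework configuration that turns on the
# FOAK kernels. Key names are hypothetical placeholders, not the actual schema;
# see the fms-acceleration sample configurations for the real format.
foak_config = {
    "training": {
        "fused_ops_and_kernels": {
            "fused_lora": True,            # low-rank adapter fused operations
            "fast_loss": True,             # fast cross-entropy Triton kernel
            "fast_rms_layernorm": True,    # fast RMS LayerNorm Triton kernel
            "fast_rope_embeddings": True,  # fast RoPE Triton kernel
        }
    }
}
```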

### Supported DataType Settings
**Compatibility Matrix with Mixed Precision**

torch_dtype | Mixed Precision | Full-FT-FOAK | PEFT-FOAK | QPEFT-FOAK
-- | -- | -- | -- | --
FLOAT16 | - | ✗ Not Allowed | ✗ | ✗
FLOAT16 | FP16 | ValueError: <br>Attempting to <br>unscale FP16 gradients. <br>[See here](https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/troubleshooting.md) | **Compatible** | **Compatible**
BFLOAT16 | - | ✗ | ✗ | ✗
BFLOAT16 | BF16 | **Compatible** | **Compatible** | [Less Performant](https://github.com/foundation-model-stack/fms-acceleration/issues/84)
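For reference, the compatible BFLOAT16 row corresponds to loading the model weights in bfloat16 and enabling BF16 mixed precision in the trainer. A minimal sketch with Hugging Face transformers (the model name is just an example of a supported architecture):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# torch_dtype = BFLOAT16: load the model weights in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example only; any supported causal LM
    torch_dtype=torch.bfloat16,
)

# Mixed Precision = BF16: the combination marked "Compatible" above for
# Full-FT-FOAK and PEFT-FOAK. Use fp16=True (with float16 weights) only for
# the PEFT/QPEFT columns, since full-FT raises the FP16 unscale ValueError.
training_args = TrainingArguments(
    output_dir="./output",
    bf16=True,
)
```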

### Code Extracted from Unsloth


Notes on the extraction of code from [unsloth](https://github.com/unslothai/unsloth):
- While unsloth is [released under Apache 2.0](https://github.com/unslothai/unsloth/blob/main/LICENSE), there are comments indicating some exceptions strewn throughout the code base, see [an example here](https://github.com/unslothai/unsloth/blob/ec19e61c854dcf9104386fa63fc6c4f2944d4f35/unsloth/models/llama.py#L1140-L1143).
    ```
    it would require a commercial license if used to run on more than 4 GPUs ...
    ```
- These exceptions appear to be located around the trainer improvements, see [another example here](https://github.com/unslothai/unsloth/blob/ec19e61c854dcf9104386fa63fc6c4f2944d4f35/unsloth/models/llama.py#L1177-L1183).
- These exceptions first appear around the [Feb 2024 Release](https://github.com/unslothai/unsloth/commit/3e4c5a323c16bbda2c92212b790073c4e99c2a55); any code that appears in any file where such exceptions occur **is not extracted**.
- In its place we take a different approach: model patching, as opposed to unsloth's approach of rewriting the model. Our implementation is **completely rewritten from scratch** (an illustrative patching sketch follows the extraction table below).
- We have also enabled dropout on the LoRA fused operations.
- All extracted code appears before the Feb 2024 Release. 
- In the table below we record what was extracted, and the exact commit from which it was taken.

Path | Description | Extracted From  | Modifications | Date
--|--|--|--|--
[fused_ops/unsloth_lora](./src/fms_acceleration_foak/fused_ops/unsloth_lora) | QLoRA fast dequant, activation kernels | `unsloth/main` @ [1ecc0185](https://github.com/unslothai/unsloth/commit/1ecc0185a5759c7a0c95dfc96aceea5023cebdfc) |  | 28 Jan 2024
[fused_ops/unsloth_lora/bnb](./src/fms_acceleration_foak/fused_ops/unsloth_lora/bnb) | BNB fast lora | `unsloth/main` @ [1ecc0185](https://github.com/unslothai/unsloth/commit/1ecc0185a5759c7a0c95dfc96aceea5023cebdfc) | `fast_lora.py` | 28 Jan 2024
[fused_ops/unsloth_lora/gptq](./src/fms_acceleration_foak/fused_ops/unsloth_lora/gptq) | GPTQ fast dequant (triton_v2) | `jeromeku/main` @ [2839d39](https://github.com/jeromeku/unsloth/commit/2839d390ef3bb318904289bfb9a7751a782c4e44) | `fast_lora.py`<br>`triton/layers.py` | 6 Feb 2024
[kernels/unsloth](./src/fms_acceleration_foak/kernels/unsloth) | Fast RMS, RoPE, CrossEnt kernels | `unsloth/main` @ [1ecc0185](https://github.com/unslothai/unsloth/commit/1ecc0185a5759c7a0c95dfc96aceea5023cebdfc) | `cross_entropy_loss.py`<br>`rms_layernorm.py` | 28 Jan 2024
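To make the model-patching note above concrete, the sketch below swaps the forward of `LlamaRMSNorm` for a drop-in replacement, which is the general idea behind patching rather than rewriting model classes. The replacement is plain PyTorch for readability; the actual plugin dispatches to Triton kernels, and nothing below is part of this package's API.

```python
import torch
from transformers.models.llama.modeling_llama import LlamaRMSNorm

def patched_rmsnorm_forward(self, hidden_states):
    # Drop-in replacement for LlamaRMSNorm.forward. A real plugin would call
    # a fast Triton kernel here; plain PyTorch is shown for readability.
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
    return self.weight * hidden_states.to(input_dtype)

# Patch the class in place: every LlamaRMSNorm in an already-instantiated model
# now routes through the replacement, with no rewrite of the model definition.
LlamaRMSNorm.forward = patched_rmsnorm_forward
```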

### Supported Models

Model | norm | pos emb | cross-ent | fused_lora 
--|--|--|--|--
`LlamaForCausalLM` | ✅  | ✅ | ✅  | ✅ 
`MistralForCausalLM` | ✅  | ✅ | ✅  | ✅ 
`MixtralForCausalLM` | ✅  | ✅ | ✅  | ✅ 
`GPTBigCodeForCausalLM` | ❌  | ❌ | ✅  | ❌ 
<!-- `GraniteForCausalLM` | ✅  | ✅ | ✅  | ✅  -->

## Known Issues

- Mixed precision (`--fp16` or `--bf16`) should be used with `fast_lora`.
- `fast_lora` has issues with FSDP V1 when using the `peft` style of FSDP wrapping.
    * This is because the adapters' forward functions are bypassed in the fused ops.
    * For AutoGPTQ/QLoRA this is addressed by distributing the adapters using DDP, so they are unsharded in time for the fused ops.
- `fast_rope_embeddings` does not support `position_ids`; they are currently ignored, which can produce incorrect results.
            
