# FMS Acceleration for Attention And Distributed Packing Plugin
This library contains plugins to accelerate finetuning with the following optimizations:
1. Padding-Free Flash Attention Computation
2. Multipack Distributed Sampling
## Plugins
Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[padding_free](./src/fms_acceleration_aadp/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | | ✅ |
[multipack sampler](./src/fms_acceleration_aadp/framework_plugin_multipack.py) | Multipack Distributed Sampling | numba | | ✅ |
## Native Transformers Support from v4.44.0
Transformers natively supports padding-free from v4.44.0 ([see here](https://github.com/huggingface/transformers/pull/31629)). The padding-free plugin will use the transformers library if it is compatible; otherwise, if `transformers < v4.44.0`, the plugin falls back to an internal implementation.
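
For illustration, a minimal sketch of the native route (assuming `transformers >= v4.44.0` and a model that supports flash attention; the model name is only an example):

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,  # added in transformers v4.44.0
)

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any flash-attn-capable model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2",  # padding-free relies on flash attention
)

# The collator concatenates all examples in a batch into a single sequence
# and emits `position_ids`, so no padding tokens are needed.
data_collator = DataCollatorWithFlattening()
```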
## Native TRL Support for PaddingFree with DataCollatorForCompletionOnlyLM from v0.10.1
With TRL >= v0.10.1, users can run padding-free on untokenized data. The flattening of inputs and the addition of `position_ids` to the batch
are carried out inside `DataCollatorForCompletionOnlyLM` when the keyword `padding_free` is passed to the collator. The plugin uses the TRL library if it is compatible;
otherwise, if `trl < v0.10.1`, the plugin falls back to an internal implementation.
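
A minimal sketch of the TRL route (assuming `trl >= v0.10.1`; the response template is dataset-specific and only an example):

```python
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

# With `padding_free=True`, the collator flattens the batch into a single
# sequence and adds `position_ids`, in addition to masking the loss on
# everything before the response template.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:",  # example only; depends on your prompt format
    tokenizer=tokenizer,
    padding_free=True,
)
```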
If a user passes in a pretokenized dataset, the plugin will still use `DataCollatorForFlattening` in the `collate_fn`.
## Running Benchmarks
To reproduce the benchmarks, run the following commands:
Reproduce [Padding Free on A100 80GB](scripts/benchmarks/refs_orca/a100_80gb_pf.csv):
`tox -e run-benches -- "1 2" "4 8" benchmark_outputs scenarios-orca.yaml "none"`

Reproduce [MultiPack on A100 80GB](scripts/benchmarks/refs_orca/a100_80gb_mp.csv):
`tox -e run-benches -- "2 4 8" "16 32 64" benchmark_outputs scenarios-orca.yaml "padding-free"`
## Known Issues
### Currently Only Supports Multipack with Padding-Free
The multipack plugin currently requires the padding-free plugin to be enabled as well.
This may change in the future if there is demand for multipack to work standalone, without padding-free.