## Tilearn.llm User Guide
### 1. Transformers Large Model Training Acceleration
#### 1.1 CUDA Kernel (Using LLaMA as an Example)
TILEARN.LLM depends on the following TILEARN.OPS versions:
- Image v1.6.7, transformers v4.31.0 -> use TILEARN.OPS 0.2.1.167
- Image v1.7.2, transformers v4.39.3 -> use TILEARN.OPS 0.2.1.172
Supported GPUs: Ampere, Ada, or Hopper (e.g., A100, A800, H100, H800)
Dependencies: pytorch >= 2.0.0
The current version is fully compatible with the Hugging Face interface; no extra steps are required.
To use the CUDA kernels, modify your code as follows:
```python3
### TILEARN.LLM
from tilearn.llm.transformers import LlamaForCausalLM
### Model interface is identical to standard Hugging Face
model = LlamaForCausalLM.from_pretrained(...)
```
Alternatively, use the AutoModelForCausalLM interface:
```python3
### TILEARN.LLM
from tilearn.llm.transformers import AutoModelForCausalLM
### Model interface is identical to standard Hugging Face
model = AutoModelForCausalLM.from_pretrained(...)
```
Notes:
1. Because baichuan1 13B and baichuan2 13B conflict with each other, tilearn.llm.transformers.AutoModelForCausalLM currently defaults to baichuan1 13B. To use baichuan2 13B instead, set the following environment variable in your training launch script: export TILEARN_LLM_BAICHUAN_13B=2
```shell
### TILEARN_LLM_BAICHUAN_13B=2 enables the baichuan2 13B model
export TILEARN_LLM_BAICHUAN_13B=2
```
2. Models currently supported by the acceleration:
```python3
# llama
from tilearn.llm.transformers.models.llama.modeling_llama import LlamaForCausalLM
# bloom
from tilearn.llm.transformers.models.bloom.modeling_bloom import BloomForCausalLM
# baichuan1
from tilearn.llm.transformers.models.baichuan.baichuan1_13B.modeling_baichuan import BaichuanForCausalLM
from tilearn.llm.transformers.models.baichuan.baichuan1_7B.modeling_baichuan import BaiChuanForCausalLM
# baichuan2
# baichuan2 uses TILEARN.LLM by default; no extra settings are needed
# To use xformers alone instead, install xformers and set the environment variable TIACC_TRAINING_CUDA_KERNEL=2
from tilearn.llm.transformers.models.baichuan.baichuan2_7B.modeling_baichuan import BaichuanForCausalLM
from tilearn.llm.transformers.models.baichuan.baichuan2_13B.modeling_baichuan import BaichuanForCausalLM
# aquila2
from tilearn.llm.transformers.models.aquila.aquila2.modeling_aquila import AquilaForCausalLM
```
#### 1.2 Torch Compile (Experimental)
Applicable scenario: Hugging Face transformers + Trainer models
Automatic compilation optimization. Add the following line to main.py to enable it. This feature is still experimental.
```python
import tilearn.llm.compile
```
Manual CUDA operators combined with automatic compilation optimization are currently supported. To disable the manual CUDA operators, set the following environment variable:
```bash
export TILEARN_COMPILE_MODELPATCH=0
```
#### 1.3 Static Zero
Applicable scenario: switching between different DeepSpeed optimization states such as ZeRO-1, ZeRO-2, ZeRO-3, offload, and int8
Modify the launch script as follows:
```shell
### TILEARN STATIC ZERO
### Open: TIACC_TRAINING_STATIC_ZERO='O2'
### Supported levels: 'O2' / 'O2.5' / 'O3' / 'O3.5' / 'O3_Q8' (in progress)
### Close: TIACC_TRAINING_STATIC_ZERO='None'
export TIACC_TRAINING_STATIC_ZERO='None' #'O2'
```
Modify your code as follows:
```python3
from transformers import HfArgumentParser
from tilearn.llm.transformers import TrainingArguments
### Interface is identical to standard Hugging Face
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
```
#### 1.4 Acceleration Results
[TILEARN-LLM large model training acceleration benchmarks](https://doc.weixin.qq.com/sheet/e3_AMgA0QZ_ACcjF99orweSwKk9RITj7?scode=AJEAIQdfAAoOnhk9M1AMgA0QZ_ACc&tab=pl3grp)
### 2. General Training Acceleration Features
The communication acceleration is provided through a drop-in-compatible replacement for native DDP, so it can be used without modifying existing code. Data IO optimization and adaptive FP16 are provided as simple wrapper functions/classes; only a few lines of code need to be added.
#### 2.1 DDP Distributed Training Communication Optimization (PyTorch + Multi-Node Multi-GPU DDP)
Applicable scope: multi-node multi-GPU
Launch the training script in a way compatible with native DDP; no training-code changes are needed. Example launch command:
In the launch script start.sh, replace torchrun with tiaccrun; the interface is fully identical to PyTorch torchrun.
```bash
export NODE_NUM=1
export INDEX=0
export GPU_NUM_PER_NODE=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=23458
tiaccrun \
--nnodes $NODE_NUM \
--node_rank $INDEX \
--nproc_per_node $GPU_NUM_PER_NODE \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
xxx.py
tilearnrun \
--nnodes $NODE_NUM \
--node_rank $INDEX \
--nproc_per_node $GPU_NUM_PER_NODE \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
xxx.py
```
Measured results of the DDP communication optimization:
(The speedup only shows up in multi-node multi-GPU scenarios; single-node multi-GPU performance is the same as native DDP.)
| Hardware | Model | GPUs | Native DDP (examples/sec per V100) | TI-ACC communication optimization (examples/sec per V100) |
| --------- |--------- |--------|------|--------------------------------|
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 1 (single node) | 227 | 227 |
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 8 (single node) | 215 | 215 |
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 16 (two nodes) | 116 | 158.6 |
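For reference, the per-GPU speedup in the two-node row above works out to about 1.37x:

```python
# Per-GPU throughput (examples/sec per V100) from the two-node row of the table
native_ddp = 116.0
tiacc = 158.6
speedup = tiacc / native_ddp
print(f"TI-ACC speedup: {speedup:.2f}x")
```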
#### 2.2 TIACC Optimizers (PyTorch)
Applicable scope: single-node single-GPU, single-node multi-GPU, multi-node multi-GPU
```python
import torch
from tilearn.llm.torch.optimizers import FusedSGD
from tilearn.llm.torch.optimizers import FusedAdam
from tilearn.llm.torch.optimizers import FusedLAMB
from tilearn.llm.torch.optimizers import FusedAdagrad

nelem = 1
tensor = torch.rand(nelem, dtype=torch.float, device="cuda")

param = []
param.append(torch.nn.Parameter(tensor.clone()))

sgd_options = {"lr": .25, "momentum": .125}

# Each fused optimizer is instantiated here for illustration; use only one in practice
optimizer = FusedSGD(param, **sgd_options)
optimizer = FusedAdam(param)
optimizer = FusedLAMB(param)
optimizer = FusedAdagrad(param)
```
FusedSGD interface:
```python
class FusedSGD(Optimizer):
def __init__(self, params, lr=required, momentum=0,
dampening=0, weight_decay=0, nesterov=False)
```
FusedAdam interface:
```python
class FusedAdam(Optimizer):
def __init__(self, params, lr=1e-3, bias_correction=True,
betas=(0.9, 0.999), eps=1e-8, adam_w_mode=True,
weight_decay=0., amsgrad=False)
```
FusedLAMB interface:
```python
class FusedLAMB(Optimizer):
def __init__(self, params, lr=1e-3, bias_correction=True,
betas=(0.9, 0.999), eps=1e-6, weight_decay=0.01,
amsgrad=False, adam_w_mode=True,
max_grad_norm=1.0):
```
FusedAdagrad interface:
```python
class FusedAdagrad(Optimizer):
def __init__(self, params, lr=1e-2, eps=1e-10,
weight_decay=0., adagrad_w_mode=False):
```
#### 2.3 CPU Affinity Optimization
Applicable scope: single-node 8-GPU and multi-node multi-GPU; must be used together with tiaccrun or torchrun
```python
from tilearn.llm.cpu_affinity import cpu_affinity

def main():
    # Bind this process to CPUs close to its assigned GPU
    cpu_affinity()

main()
```
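For intuition, this is what CPU pinning looks like with the standard library alone. This is a Linux-only illustration of the mechanism, not tilearn's implementation, which selects GPU-topology-aware cores automatically:

```python
import os

# Linux-only illustration: query and restrict this process's CPU set.
# tilearn.llm's cpu_affinity() performs a GPU-aware version of this.
before = os.sched_getaffinity(0)        # CPUs this process may currently run on
os.sched_setaffinity(0, {min(before)})  # pin the process to a single core
pinned = os.sched_getaffinity(0)
os.sched_setaffinity(0, before)         # restore the original affinity mask
print(len(pinned))
```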
#### 2.4 Adaptive Mixed Precision Optimization (PyTorch)
Applicable scope: when loss does not converge or model quality drops after enabling torch AMP, use the tiacc_training AMP interface to improve model quality
```python
import torch
from tilearn.llm.torch.adapt_amp import MixedPrecision_TrainingPolicy

def main():
    # Use the standard gradient scaler together with the adaptive policy
    scaler = torch.cuda.amp.GradScaler()
    # Instantiate the tiacc adaptive mixed-precision policy
    schedulePolicy = "TimeSchedulePolicy"
    policy = MixedPrecision_TrainingPolicy(
        policy=schedulePolicy,
        start_time=0, end_time=40)
    # Decide for each epoch whether mixed precision should be enabled
    for epoch in range(0, 51):
        mixed_precision = policy.enable_mixed_precision(epoch,
                                                        scaler=scaler)
        print(mixed_precision)
        #with torch.cuda.amp.autocast(enabled=mixed_precision):
        #    outputs = model(inputs)
        #    loss = criterion(outputs, targets)
        #scaler.scale(loss).backward()
        #scaler.step(optimizer)
        #scaler.update()

main()
```
##### 1) MixedPrecision_TrainingPolicy Class Interface
Instantiates an adaptive policy for automatic mixed precision during training. Available policies: time-based mixed precision, time/learning-rate-based mixed precision, and loss-based mixed precision.
Initialization parameters:
| Required | Description | Example | Default |
| -------- |--------------------------------------------------------------------------------------------------------------| ------ | -------- |
| Yes | Adaptive mixed-precision policy. 0: time-based, for general adaptive cases; 1: time/learning-rate-based, for abnormal loss fluctuation during a specific training phase; 2: loss-based, for loss dropping too fast or too slowly. | 0 | None |
| No | Start time for adaptive mixed precision; 10 is generally recommended. Required for policies 0 and 1, optional for policy 2. | 10 | 10 |
| No | End time for adaptive mixed precision; the last epoch is generally recommended. Required for policies 0 and 1, optional for policy 2. | 1000 | None |
| No | Hold time for policy 1, during which a single decision (on or off) is kept. The duration of the abnormal loss fluctuation is generally recommended. Required for policy 1, optional for policies 0 and 2. | 20 | None |
| No | Interval for policy 2; default 1000, i.e., policy 2 triggers every 1000 epochs. Required for policy 2, optional for policies 0 and 1. | 1000 | 1000 |
| No | Hold time after policy 2 triggers at each interval_time; default 100. With interval_time of 1000, policy 2 is active during epochs 1000-1100, 2000-2100, and so on. Required for policy 2, optional for policies 0 and 1. | 100 | 100 |
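The interval/hold scheduling described in the table can be sketched in plain Python. This is an illustration of the windowing logic only, under the semantics stated above, not tilearn's actual implementation:

```python
def time_window_enabled(epoch, start_time=10, end_time=1000):
    # Policy 0 sketch: mixed precision active inside [start_time, end_time]
    return start_time <= epoch <= end_time

def interval_window_enabled(epoch, interval_time=1000, hold_time=100):
    # Policy 2 sketch: active during [k*interval_time, k*interval_time + hold_time)
    # for k >= 1, i.e. epochs 1000-1099, 2000-2099, ...
    k = epoch // interval_time
    return k >= 1 and epoch < k * interval_time + hold_time

print(interval_window_enabled(1050))  # inside the 1000-1100 window
print(interval_window_enabled(1200))  # between windows
```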
The policy object:
| Object | Type | Description |
|---|---|---|
| policy | MixedPrecision_TrainingPolicy class | Instantiated adaptive mixed-precision policy for training |
##### 2) Adaptive Mixed Precision: the enable_mixed_precision Method
A method of MixedPrecision_TrainingPolicy that decides, from its inputs, whether automatic mixed precision should be enabled for the current epoch.
Input parameters:
| Parameter | Type | Required | Description | Example | Default |
|-----|-----| ----- | ----- | ----- | -----|
| epoch | INT | Yes | The current epoch | 20 | None |
| scaler | torch.cuda.amp.GradScaler | Yes | Gradient scaler instance | scaler | None |
| lr | float | No | Learning rate of the current epoch | 0.01 | None |
| loss | float | No | Loss value of the previous epoch | 0.1 | None |
Output:
| Output | Type | Description |
|-------- | ---------| ---------|
| mixed_precision | BOOL | Whether automatic mixed precision should be enabled for the current epoch: TRUE if yes, FALSE otherwise. |