tilearn-llm

Name: tilearn-llm
Version: 0.8.7
Summary: TILEARN for LLM
Upload time: 2024-04-19 02:35:24
Requires-Python: >=3.8
License: Copyright (c) 2023 - 2023 TENCENT CORPORATION. All rights reserved.
Keywords: pytorch, llm
## Tilearn.llm Usage Guide

### 1. Training acceleration for transformers LLMs

#### 1.1 CUDA Kernel (LLaMA as an example)

Supported GPUs: Ampere, Ada, or Hopper (e.g., A100, A800, H100, H800)

Dependencies: pytorch >= 2.0.0

The current version is fully compatible with the huggingface interface; no extra steps are required.

To use the CUDA kernels, modify your code as follows:

```python3
### TILEARN.LLM
from tilearn.llm.transformers import LlamaForCausalLM

### The model interface is identical to standard huggingface
model = LlamaForCausalLM.from_pretrained(...)
```

Alternatively, use the AutoModelForCausalLM interface:

```python3
### TILEARN.LLM
from tilearn.llm.transformers import AutoModelForCausalLM

### The model interface is identical to standard huggingface
model = AutoModelForCausalLM.from_pretrained(...)
```

Notes:

1. Because baichuan1 13B and baichuan2 13B conflict with each other, tilearn.llm.transformers.AutoModelForCausalLM currently defaults to baichuan1 13B. To use baichuan2 13B instead, set the following environment variable in your training launch script: export TILEARN_LLM_BAICHUAN_13B=2
```shell
### TILEARN_LLM_BAICHUAN_13B=2 selects the baichuan2 model
export TILEARN_LLM_BAICHUAN_13B=2
```

2. Models currently supported by the acceleration:

```python3
# llama
from tilearn.llm.transformers.models.llama.modeling_llama import LlamaForCausalLM

# bloom
from tilearn.llm.transformers.models.bloom.modeling_bloom import BloomForCausalLM

# baichuan1
from tilearn.llm.transformers.models.baichuan.baichuan1_13B.modeling_baichuan import BaichuanForCausalLM
from tilearn.llm.transformers.models.baichuan.baichuan1_7B.modeling_baichuan import BaiChuanForCausalLM

# baichuan2
# TILEARN.LLM is used by default; no settings required
# To use xformers alone, install xformers and set the environment variable TIACC_TRAINING_CUDA_KERNEL=2
from tilearn.llm.transformers.models.baichuan.baichuan2_7B.modeling_baichuan import BaichuanForCausalLM
from tilearn.llm.transformers.models.baichuan.baichuan2_13B.modeling_baichuan import BaichuanForCausalLM

# aquila2
from tilearn.llm.transformers.models.aquila.aquila2.modeling_aquila import AquilaForCausalLM
```

#### 1.2 torch compile - experimental

Applicable scenario: huggingface transformers + Trainer models

Automatic compilation optimization. Add the following line to main.py to enable it; this feature is still experimental.


```python
import tilearn.llm.compile

```

Hand-written CUDA operators combined with automatic compilation optimization are currently supported. To disable the hand-written CUDA operators, set the following environment variable:

```bash
export TILEARN_COMPILE_MODELPATCH=0
```

#### 1.3 Static Zero

Applicable scenario: switching between deepspeed optimization states such as zero1, zero2, zero3, offload, and int8

Modify the launch script as follows:

```shell
### TILEARN STATIC ZERO
### Open: TIACC_TRAINING_STATIC_ZERO='O2'
### Supported: 'O2' / 'O2.5' / 'O3' / 'O3.5' / 'O3_Q8' (in progress)
### Close: TIACC_TRAINING_STATIC_ZERO='None'
export TIACC_TRAINING_STATIC_ZERO='None' #'O2'
```
Modify the code as follows:
```python3
from transformers import HfArgumentParser
from tilearn.llm.transformers import TrainingArguments
	
### The interface is identical to standard huggingface
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
```

#### 1.4 Speedup results

[TILEARN-LLM large-model training acceleration benchmarks](https://doc.weixin.qq.com/sheet/e3_AMgA0QZ_ACcjF99orweSwKk9RITj7?scode=AJEAIQdfAAoOnhk9M1AMgA0QZ_ACc&tab=pl3grp)


### 2. General training acceleration features
Communication acceleration is provided through a tool compatible with native DDP, so it can be used without any changes to existing code. Data-IO optimization and adaptive FP16 are provided as simple wrapper functions/classes; only a few added lines of code are needed.

#### 2.1 DDP distributed training communication optimization (PyTorch, multi-node multi-GPU DDP)
Applicable scope: multi-node multi-GPU
Launch the training script in a way compatible with native DDP; no training-code changes are needed. Example launch commands:
In the launch script start.sh, replace torchrun with tiaccrun; the interface is identical to pytorch torchrun.

```bash
export NODE_NUM=1
export INDEX=0
export GPU_NUM_PER_NODE=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=23458

tiaccrun \
    --nnodes $NODE_NUM \
    --node_rank $INDEX \
    --nproc_per_node $GPU_NUM_PER_NODE \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    xxx.py

tilearnrun \
    --nnodes $NODE_NUM \
    --node_rank $INDEX \
    --nproc_per_node $GPU_NUM_PER_NODE \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    xxx.py
```
Measured results of the DDP communication optimization:
(The speedup only shows up in multi-node multi-GPU scenarios; single-node multi-GPU performance is identical to native DDP.)

| Hardware | Model | GPUs | Native DDP (examples/sec per V100) | TI-ACC communication optimization (examples/sec per V100) |
| --------- |--------- |--------|------|--------------------------------|
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 1 (single node) | 227 | 227 |
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 8 (single node) | 215 | 215 |
| Tencent Cloud GN10Xp.20XLARGE320 | resnext50_32x4d | 16 (two nodes) | 116 | 158.6 |
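
For context, the two-node row works out to roughly a 1.37x per-GPU throughput gain (numbers taken directly from the table):

```python
# Per-V100 throughput from the 16-GPU (two-node) row of the table above
native_ddp = 116.0   # examples/sec per V100, native DDP
ti_acc = 158.6       # examples/sec per V100, with TI-ACC communication optimization

speedup = ti_acc / native_ddp
print(f"speedup: {speedup:.2f}x")  # speedup: 1.37x
```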


#### 2.2 TIACC optimizers (PyTorch)
Applicable scope: single-node single-GPU, single-node multi-GPU, multi-node multi-GPU
```python
import torch

from tilearn.llm.torch.optimizers import FusedSGD
from tilearn.llm.torch.optimizers import FusedAdam
from tilearn.llm.torch.optimizers import FusedLAMB
from tilearn.llm.torch.optimizers import FusedAdagrad

nelem = 1
tensor = torch.rand(nelem, dtype=torch.float, device="cuda")

param = []
param.append(torch.nn.Parameter(tensor.clone()))

sgd_options = {"lr": .25, "momentum": .125}

optimizer = FusedSGD(param, **sgd_options)
optimizer = FusedAdam(param)
optimizer = FusedLAMB(param)
optimizer = FusedAdagrad(param)
```

FusedSGD interface
```python
class FusedSGD(Optimizer):
    def __init__(self, params, lr=required, momentum=0, 
                 dampening=0, weight_decay=0, nesterov=False)
```

FusedAdam interface
```python
class FusedAdam(Optimizer):
    def __init__(self, params, lr=1e-3, bias_correction=True,
                 betas=(0.9, 0.999), eps=1e-8, adam_w_mode=True,
                 weight_decay=0., amsgrad=False)
```

FusedLAMB interface
```python
class FusedLAMB(Optimizer):
    def __init__(self, params, lr=1e-3, bias_correction=True,
                 betas=(0.9, 0.999), eps=1e-6, weight_decay=0.01,
                 amsgrad=False, adam_w_mode=True,
                 max_grad_norm=1.0):
```

FusedAdagrad interface
```python
class FusedAdagrad(Optimizer):
    def __init__(self, params, lr=1e-2, eps=1e-10,
                 weight_decay=0., adagrad_w_mode=False):
```
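
The fused optimizers are intended as drop-in replacements for their torch.optim counterparts: the speedup comes from fusing the element-wise update into fewer CUDA kernels, not from changing the math. As a plain-Python sketch (not the fused kernel), the FusedSGD parameters above map onto the standard SGD-with-momentum update:

```python
def sgd_step(p, grad, buf, lr, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """One scalar SGD-with-momentum step, mirroring the FusedSGD signature.

    Returns (updated_param, updated_momentum_buffer)."""
    if weight_decay != 0.0:
        grad = grad + weight_decay * p  # classic L2 penalty folded into the gradient
    if momentum != 0.0:
        buf = momentum * buf + (1.0 - dampening) * grad
        grad = (grad + momentum * buf) if nesterov else buf
    return p - lr * grad, buf

# First step with the sgd_options used above (lr=0.25, momentum=0.125)
p, buf = sgd_step(1.0, 0.5, 0.0, lr=0.25, momentum=0.125)
print(p, buf)  # 0.875 0.5
```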


#### 2.3 CPU affinity optimization
Applicable scope: single node with 8 GPUs, or multi-node multi-GPU; must be used together with tiaccrun or torchrun
```python
from tilearn.llm.cpu_affinity import cpu_affinity

def main():
    cpu_affinity()
    
main()
```
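
tilearn does not document what cpu_affinity does internally. A plausible sketch, assuming it pins each local rank to an equal contiguous slice of the machine's cores (the real implementation may instead follow the NUMA/PCIe topology), is:

```python
def cpus_for_rank(local_rank: int, ranks_per_node: int, total_cpus: int) -> set:
    """Assign each local rank an equal contiguous slice of CPU cores.

    Hypothetical illustration only; tilearn.llm.cpu_affinity may differ."""
    per_rank = total_cpus // ranks_per_node
    start = local_rank * per_rank
    return set(range(start, start + per_rank))

# e.g. rank 1 of 8 on a 96-core machine would be pinned to cores 12..23
mask = cpus_for_rank(1, 8, 96)
print(min(mask), max(mask))  # 12 23
# Applying the mask on Linux would be: os.sched_setaffinity(0, mask)
```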

#### 2.4 Adaptive automatic mixed precision (PyTorch)
Applicable scope: when the loss fails to converge or model quality drops after enabling torch amp, use the tiacc_training amp interface to recover model quality
```python
import torch
from tilearn.llm.torch.adapt_amp import MixedPrecision_TrainingPolicy

def main():
    # use tiacc adaptive mixed precision
    scaler = torch.cuda.amp.GradScaler()
    # instantiate the tiacc adaptive mixed-precision policy object
    schedulePolicy = "TimeSchedulePolicy"
    policy = MixedPrecision_TrainingPolicy(
            policy=schedulePolicy,
            start_time=0, end_time=40)
    # decide from the inputs whether mixed precision should be enabled for the current epoch
    for epoch in range(0, 51):
        mixed_precision = policy.enable_mixed_precision(epoch,
                          scaler=scaler)

        print(mixed_precision)
        #with amp.autocast(enabled=mixed_precision):
        #    outputs = model(inputs)
        #    loss = criterion(outputs, targets)

        #scaler.scale(loss).backward()
        #scaler.step(optimizer)
        #scaler.update()


main()
```
##### 1) MixedPrecision_TrainingPolicy class interface

Instantiates the adaptive policy for automatic mixed precision during training. The available policies are: time-based mixed precision, time + learning-rate mixed precision, and loss-based mixed precision.

Initialization parameters:

| Required | Description | Example | Default |
| -------- |--------------------------------------------------------------------------------------------------------------| ------ | -------- |
| Yes | Adaptive mixed-precision policy. 0: time-based, for the general case; 1: time + learning-rate, for when the loss fluctuates abnormally during some phase of training; 2: loss-based, for when the loss drops too quickly or too slowly. | 0 | none |
| No | Start time (epoch) for adaptive mixed precision; 10 is generally recommended. Required for policies 0 and 1, optional for policy 2. | 10 | 10 |
| No | End time (epoch) for adaptive mixed precision; the last epoch is generally recommended. Required for policies 0 and 1, optional for policy 2. | 1000 | None |
| No | Hold time for policy 1, during which a single setting (enabled or disabled) is kept. Generally set to the duration of the abnormal loss fluctuation. Required for policy 1, optional for policies 0 and 2. | 20 | None |
| No | Interval for policy 2; the default of 1000 means policy 2 triggers every 1000 epochs. Required for policy 2, not needed for policies 0 and 1. | 1000 | 1000 |
| No | Hold time after policy 2 triggers at each interval_time interval; the default of 100 means that with interval_time=1000, policy 2 is active during epochs 1000-1100, 2000-2100, and so on. Required for policy 2, not needed for policies 0 and 1. | 100 | 100 |

The policy object:

| Object | Type | Description |
|---|---|---|
| policy | MixedPrecision_TrainingPolicy | Instantiated adaptive policy for automatic mixed precision during training |

##### 2) enable_mixed_precision method

A method of MixedPrecision_TrainingPolicy; decides from its inputs whether automatic mixed precision should be enabled for the current epoch.
Input parameters:

| Parameter | Type | Required | Description | Example | Default |
|-----|-----| -----	| -----	| -----	| -----|
| epoch | INT | Yes | the current epoch | 20 | none |
| scaler | torch.cuda.amp.GradScaler | Yes | gradient-scaler instance | scaler | none |
| lr | float | No | the learning rate of the current epoch | 0.01 | None |
| loss | float | No | the loss of the previous epoch | 0.1 | None |

Output:

| Output | Type | Description |
|-------- | ---------| ---------|
| mixed_precision | BOOL | Whether automatic mixed precision should be enabled for the current epoch: TRUE if so, otherwise FALSE. |
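
Taken together, the tables reduce the time-based policies to window tests on the epoch index. A hypothetical pure-Python rendering of that logic (parameter names follow the tables above; the real class additionally consults the GradScaler, lr, and loss):

```python
def amp_enabled(epoch, policy, start_time=10, end_time=None,
                interval_time=1000, interval_hold_time=100):
    """Illustrative window logic only; not the tilearn implementation.

    policy 0: enable AMP from start_time through end_time.
    policy 2: enable AMP for interval_hold_time epochs every interval_time epochs."""
    if policy == 0:
        return epoch >= start_time and (end_time is None or epoch <= end_time)
    if policy == 2:
        return epoch >= interval_time and epoch % interval_time < interval_hold_time
    raise NotImplementedError("policy 1 also depends on the lr/loss history")

print(amp_enabled(5, policy=0))      # False: before start_time
print(amp_enabled(1050, policy=2))   # True: inside the 1000-1100 window
print(amp_enabled(1150, policy=2))   # False: past the hold window
```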



            
