| Field | Value |
| --- | --- |
| Name | l0n0lacl |
| Version | 1.0.3 |
| Summary | For calling operators written in AscendC |
| Author | l0n0l |
| Requires Python | <4,>=3.7 |
| Keywords | acl, ascendc, operator, operator development |
| Upload time | 2024-11-16 02:14:48 |
# 1 Overview
Running an operator during AscendC operator development is fairly involved. To simplify this, this package turns operator execution into functions that can be called directly from Python.
# 2 Installation
```
pip install l0n0lacl
```
# 3 Running an operator
## 3.1 Source the CANN environment first. For example, on my machine:
```
source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh
```
## 3.2 Install the custom operator package
```
bash custom_opp_xxx_aarch64.run
```
## 3.3 Create an operator runner
```python
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
```
## 3.4 Call the operator
### 3.4.1 Argument order
After the operator project is built, code is generated under the project directory:
the `aclnnXXXGetWorkspaceSize` function can be found in `${op_project_dir}/build_out/autogen/aclnn_xxx.h`. Taking Gelu as an example:
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
const aclTensor *x,
const aclTensor *out,
uint64_t *workspaceSize,
aclOpExecutor **executor);
```
The parameters are `x`, `out`, `workspaceSize`, and `executor`; `workspaceSize` and `executor` can be ignored.
* an `aclTensor*` corresponds to a `numpy.ndarray`
* for the other types, see the <a href="https://docs.python.org/zh-cn/3/library/ctypes.html#fundamental-data-types">ctypes fundamental data types</a>
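As a rough illustration of how arguments line up with ctypes fundamental types (the variable names below are invented for the example, not part of the library):

```python
import ctypes
import numpy as np

# Illustrative mapping of aclnn-style scalar arguments to ctypes types.
exclusive_flag = ctypes.c_bool(True)   # C bool     -> ctypes.c_bool
num_rows = ctypes.c_int64(2)           # C int64_t  -> ctypes.c_int64

# An aclTensor* argument is backed by an ndarray; its data pointer is
# a generic void* on the C side.
x = np.zeros((2, 3), dtype=np.float32)
x_ptr = ctypes.c_void_p(x.ctypes.data)

print(exclusive_flag.value, num_rows.value, x_ptr.value is not None)
```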
### 3.4.2 Calling the operator
```python
import torch
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = torch.float
shape = [10, 10]  # any shape works; the original snippet left this undefined
x = torch.empty(shape, dtype=target_dtype).uniform_(-1, 1)
y = torch.empty(shape, dtype=target_dtype).zero_()
out = ascendc_gelu(x.numpy(), y.numpy()).to_cpu()
print(out)
```
# 4 API reference
## 4.1 AclNDTensor
```python
class AclNDTensor:
def __init__(self, np_array: np.ndarray):
pass
def to_cpu(self):
pass
```
A bridge between a numpy ndarray and an Ascend ND tensor.
### 4.1.1 `__init__`
* `np_array`: the numpy array
### 4.1.2 `to_cpu`
Copies the computation result from the NPU back to the CPU.
## 4.2 OpRunner
```python
class OpRunner:
def __init__(self, name, op_path_prefix='customize', op_path=None, device_id=0) -> None:
pass
def __call__(self, *args, outCout=1, argtypes=None, stream=None) -> Union[AclNDTensor, List[AclNDTensor]]:
pass
def sync_stream(self)->None:
pass
```
### 4.2.1 `__init__`
* `name`: the operator name
* `op_path_prefix`: the value of **vendor_name** in the operator project's **CMakePresets.json**. Defaults to `customize` and can usually be omitted:
```json
"vendor_name": {
"type": "STRING",
"value": "customize"
},
```
* `op_path`: absolute path of the operator's `libcust_opapi.so` library. Usually omitted.
* `device_id`: the device ID. Defaults to `0`.
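How `OpRunner` actually locates the library is internal to the package; the following is only a sketch of how the two path-related parameters could interact, under an assumed install layout (the `vendors/<name>/op_api/lib` structure is a guess modeled on typical CANN installs, not taken from this package's source):

```python
import os

def resolve_op_path(op_path=None, op_path_prefix="customize"):
    # Illustrative only: an explicit absolute path wins; otherwise build
    # one under the installed custom-op vendor directory (the layout
    # below is an assumption, not verified against the package).
    if op_path is not None:
        return op_path
    base = os.environ.get("ASCEND_OPP_PATH", "/usr/local/Ascend/opp")
    return os.path.join(base, "vendors", op_path_prefix,
                        "op_api", "lib", "libcust_opapi.so")

print(resolve_op_path(op_path="/tmp/libcust_opapi.so"))
print(resolve_op_path().endswith("libcust_opapi.so"))
```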
### 4.2.2 `__call__`
* `args`: the arguments passed to `aclnnXXXGetWorkspaceSize`, excluding `workspaceSize` and `executor`
* `outCout`: the number of operator outputs. If it is `1`, a single `AclNDTensor` is returned; if it is greater than 1, a `List[AclNDTensor]` is returned
* `argtypes`: the `ctypes` types of the `aclnnXXXGetWorkspaceSize` parameters. For particularly complex operators, you can specify the types manually if a call misbehaves.
For example (**for illustration only; automatic inference normally works without it, but you can specify the types yourself when a call misbehaves**), for:
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnCumsumGetWorkspaceSize(
const aclTensor *x,
const aclTensor *axis,
bool exclusiveOptional,
bool reverseOptional,
const aclTensor *out,
uint64_t *workspaceSize,
aclOpExecutor **executor);
```
```python
import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
y = np.ones_like(golden, golden.dtype) * 123
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p,  # x
    ctypes.c_void_p,  # axis
    ctypes.c_bool,    # exclusiveOptional
    ctypes.c_bool,    # reverseOptional
    ctypes.c_void_p,  # out
    ctypes.c_void_p,  # workspaceSize
    ctypes.c_void_p,  # executor
]).to_cpu()
print(y)
```
* `stream`: in a multi-stream setup, you can pass your own stream.
For example:
```python
import numpy as np
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = np.float32
shape = [10, 10]
x = np.random.uniform(-10, 10, shape).astype(target_dtype)
y = np.zeros_like(x, dtype=target_dtype)
with AclStream(0) as stream:
out = ascendc_gelu(x, y, stream=stream).to_cpu()
print(out)
```
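The `outCout` convention above can be sketched in plain Python (an illustration of the documented behavior, not the package's actual implementation):

```python
def wrap_outputs(outputs, out_count=1):
    # Mirrors the documented convention: one output is returned bare,
    # several outputs as a list.
    return outputs[0] if out_count == 1 else list(outputs[:out_count])

print(wrap_outputs(["y"]))            # a single bare output
print(wrap_outputs(["y0", "y1"], 2))  # a list of outputs
```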
### 4.2.3 `sync_stream`
Synchronizes the stream.
## 4.3 verify_result
Adapted from: https://gitee.com/ascend/samples/blob/master/operator/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/scripts/verify_result.py
```python
def verify_result(real_result:numpy.ndarray, golden:numpy.ndarray):
pass
```
Checks whether the precision is acceptable:
* float16: relative error within 1e-3 (one part in a thousand)
* float32: relative error within 1e-4 (one part in ten thousand)
* int16, int32, int8: exact match (zero tolerance)
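These thresholds amount to a relative-error check; a minimal numpy sketch (not the actual `verify_result` code) looks like this:

```python
import numpy as np

# Relative-error thresholds from the documentation; integer dtypes
# require an exact match. This is a sketch, not the real verify_result.
TOLERANCE = {np.float16: 1e-3, np.float32: 1e-4}

def precision_ok(real: np.ndarray, golden: np.ndarray) -> bool:
    tol = TOLERANCE.get(real.dtype.type)
    if tol is None:  # int8 / int16 / int32: zero tolerance
        return bool(np.array_equal(real, golden))
    golden64 = golden.astype(np.float64)
    rel = np.abs(real.astype(np.float64) - golden64)
    rel /= np.maximum(np.abs(golden64), np.finfo(np.float64).tiny)
    return bool(np.all(rel <= tol))

golden = np.linspace(1.0, 2.0, 8, dtype=np.float32)
print(precision_ok(golden * np.float32(1 + 5e-5), golden))  # within 1e-4
print(precision_ok(golden * np.float32(1 + 5e-3), golden))  # exceeds 1e-4
```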
## 4.4 AclArray
```python
class AclArray:
def __init__(self, np_array: np.ndarray):
pass
```
Example:
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnEyeGetWorkspaceSize(
aclTensor *yRef,
int64_t numRows,
int64_t numColumnsOptional,
const aclIntArray *batchShapeOptional,
int64_t dtypeOptional,
uint64_t *workspaceSize,
aclOpExecutor **executor);
```
```python
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_fn = OpRunner("Eye")
for target_dtype in [np.float16, np.float32]:
    numRows = 2
    numColumnsOptional = 3
    dtypeOptional = 0
    shape = [numRows * numColumnsOptional]
    for value_range in [(-1, 1), (1, 10), (-1000, 1000)]:
        y = np.zeros(shape, dtype=target_dtype)
        batchShape = AclArray(np.array([1, 2, 3], dtype=np.int64))
        output = ascendc_fn(y, numRows, numColumnsOptional, batchShape, dtypeOptional, outCout=5)
        output[0].to_cpu()
        golden = tf.eye(numRows, numColumnsOptional)
        print(y)
        print(golden)
        print(value_range)
        verify_result(y, golden.numpy().reshape(shape))
```
Raw data
{
"_id": null,
"home_page": null,
"name": "l0n0lacl",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.7",
"maintainer_email": null,
"keywords": "acl, ascendc, \u7b97\u5b50, \u7b97\u5b50\u5f00\u53d1",
"author": "l0n0l",
"author_email": "1038352856@qq.com",
"download_url": "https://files.pythonhosted.org/packages/c9/07/d843d5d76fe7feb77933d2dd4b5200f7caa04ead57df86734b72def7902e/l0n0lacl-1.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "\u7528\u4e8e\u8c03\u7528ascendc\u7f16\u5199\u7684\u7b97\u5b50",
"version": "1.0.3",
"project_urls": null,
"split_keywords": [
"acl",
" ascendc",
" \u7b97\u5b50",
" \u7b97\u5b50\u5f00\u53d1"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4e06d44e7ac145b9f7ce196e85e2cde984a5fc5d56eca6c3300b924458715215",
"md5": "d9b5189353e1141e6c823c430daf9ff1",
"sha256": "beca9bfefbe4130bb4436d5a4cc8ec999fbc74876def1e72ea38a46016845fa6"
},
"downloads": -1,
"filename": "l0n0lacl-1.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d9b5189353e1141e6c823c430daf9ff1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.7",
"size": 11652,
"upload_time": "2024-11-16T02:14:46",
"upload_time_iso_8601": "2024-11-16T02:14:46.578074Z",
"url": "https://files.pythonhosted.org/packages/4e/06/d44e7ac145b9f7ce196e85e2cde984a5fc5d56eca6c3300b924458715215/l0n0lacl-1.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c907d843d5d76fe7feb77933d2dd4b5200f7caa04ead57df86734b72def7902e",
"md5": "3f25c1ac6c419997d52f9f7e239d2b82",
"sha256": "2c282daf8d23b734651c6059a2ea22d0d9a0c33ed2bdc6aa144b75d12327b7b2"
},
"downloads": -1,
"filename": "l0n0lacl-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "3f25c1ac6c419997d52f9f7e239d2b82",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.7",
"size": 14557,
"upload_time": "2024-11-16T02:14:48",
"upload_time_iso_8601": "2024-11-16T02:14:48.549657Z",
"url": "https://files.pythonhosted.org/packages/c9/07/d843d5d76fe7feb77933d2dd4b5200f7caa04ead57df86734b72def7902e/l0n0lacl-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-16 02:14:48",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "l0n0lacl"
}