l0n0lacl


* Name: l0n0lacl
* Version: 1.0.5
* Summary: Call operators written in Ascend C from Python
* Upload time: 2024-12-24 02:43:45
* Author: l0n0l
* Requires Python: <4,>=3.7
* Keywords: acl, ascendc, operator, operator development

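# 1 Overview
Running an operator during Ascend C operator development is fairly involved. To simplify it, this package turns running an operator into a function that can be called directly from Python.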

# 2 Installation
```
pip install l0n0lacl
```

# 3 Running an operator: example
## 3.1 Switch to the CANN environment first; for example, on my machine:
```
source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh
```
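
Before creating a runner it can help to confirm that the environment script was actually sourced in the current shell. The sketch below is only a sanity check and rests on an assumption: that `set_env.sh` exports `ASCEND_OPP_PATH`. That is an assumption about the CANN toolkit, not something l0n0lacl documents, and `find_opp_path` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def find_opp_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return ASCEND_OPP_PATH from the given environment, or None if unset or empty.

    Assumption: the CANN set_env.sh script exports ASCEND_OPP_PATH.
    """
    env = os.environ if env is None else env
    return env.get("ASCEND_OPP_PATH") or None


if __name__ == "__main__":
    opp_path = find_opp_path()
    if opp_path is None:
        print("ASCEND_OPP_PATH is not set; source set_env.sh first")
    else:
        print(f"CANN operator package path: {opp_path}")
```
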
## 3.2 Install the custom operator package
```
bash custom_opp_xxx_aarch64.run
```
## 3.3 Create an operator runner
```python
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
```

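## 3.4 Call the operator
### 3.4.1 Check the argument order first
Building the operator project generates code. In the operator project directory,
`${operator_dir}/build_out/autogen/aclnn_xxx.h` contains the `aclnnXXXGetWorkspaceSize` function. Taking Gelu as an example: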
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
```
The parameters are `x`, `out`, `workspaceSize`, and `executor`. You can ignore `workspaceSize` and `executor`; you do not pass them yourself.
* `aclTensor*` maps to `numpy.ndarray`
* For other types, see: <a href = "https://docs.python.org/zh-cn/3/library/ctypes.html#fundamental-data-types">ctypes types</a>
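
To make the type mapping above concrete, here is a minimal sketch that turns the C parameter types of a generated signature into a list of `ctypes` types. The table `C_TO_CTYPES` and the helper `ctype_for` are hypothetical names used for illustration; they are not part of l0n0lacl, and the package's own automatic inference may work differently. Every pointer parameter is treated as an opaque address (`ctypes.c_void_p`), which matches the `argtypes` example in section 4.2.2.

```python
import ctypes

# Hypothetical lookup table: scalar C parameter type -> ctypes type.
C_TO_CTYPES = {
    "bool": ctypes.c_bool,
    "int64_t": ctypes.c_int64,
    "uint64_t": ctypes.c_uint64,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def ctype_for(c_type: str):
    """Map a C parameter type string to a ctypes type (illustrative only)."""
    normalized = c_type.replace("const", "").replace(" ", "")
    if normalized.endswith("*"):
        # aclTensor*, aclIntArray*, uint64_t*, aclOpExecutor** are all
        # passed as opaque addresses.
        return ctypes.c_void_p
    return C_TO_CTYPES[normalized]


# Parameter types of aclnnGeluGetWorkspaceSize, in declaration order.
gelu_params = ["const aclTensor *", "const aclTensor *",
               "uint64_t *", "aclOpExecutor **"]
gelu_argtypes = [ctype_for(p) for p in gelu_params]
print(gelu_argtypes)
```
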
### 3.4.2 Call the operator
```python
import torch
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = torch.float
shape = [10, 10]  # example shape
x = torch.empty(shape, dtype=target_dtype).uniform_(-1, 1)
y = torch.empty(shape, dtype=target_dtype).zero_()
out = ascendc_gelu(x.numpy(), y.numpy()).to_cpu()
print(out)
```

# 4 API reference
## 4.1 AclNDTensor
```python
class AclNDTensor:
    def __init__(self, np_array: np.ndarray):
        pass
    def to_cpu(self):
        pass
```
A bridge between a numpy ndarray and an Ascend ND tensor.
### 4.1.1 `__init__`
* `np_array`: the numpy array to wrap
### 4.1.2 `to_cpu`
Copies the computation result from the NPU back to the CPU.
## 4.2 OpRunner
```python
class OpRunner:
    def __init__(self, name, op_path_prefix='customize', op_path=None, device_id=0) -> None:
        pass
    def __call__(self, *args, outCout=1, argtypes=None, stream=None) -> Union[AclNDTensor, List[AclNDTensor]]:
        pass
    def sync_stream(self)->None:
        pass
```
### 4.2.1 `__init__`
* `name`: the operator name
* `op_path_prefix`: the value of **vendor_name** in the operator project's **CMakePresets.json** file. Defaults to `customize` and can be omitted
```json
"vendor_name": {
    "type": "STRING",
    "value": "customize"
},
```
* `op_path`: absolute path to the operator's `libcust_opapi.so` library. Normally left unset.
* `device_id`: device ID. Defaults to `0`
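
If the vendor name is not the default, it can be read from **CMakePresets.json** rather than typed by hand. The sketch below assumes the usual CMake presets layout, where `vendor_name` sits under `configurePresets[*].cacheVariables`; `read_vendor_name` is a hypothetical helper, not part of l0n0lacl, and the JSON text is a made-up example.

```python
import json


def read_vendor_name(presets_text: str, default: str = "customize") -> str:
    """Return the vendor_name cache variable from CMakePresets.json text.

    Assumes vendor_name lives under configurePresets[*].cacheVariables,
    either as a plain string or as an object with a "value" field.
    """
    presets = json.loads(presets_text)
    for preset in presets.get("configurePresets", []):
        entry = preset.get("cacheVariables", {}).get("vendor_name")
        if isinstance(entry, dict) and "value" in entry:
            return entry["value"]
        if isinstance(entry, str):
            return entry
    return default


# Made-up presets content for illustration.
example = """
{
  "version": 1,
  "configurePresets": [
    {
      "name": "default",
      "cacheVariables": {
        "vendor_name": {"type": "STRING", "value": "my_vendor"}
      }
    }
  ]
}
"""
vendor = read_vendor_name(example)
print(vendor)
```

The returned string is what would be passed as `op_path_prefix`, for example `OpRunner("Gelu", op_path_prefix=vendor)`.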

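### 4.2.2 `__call__`
* `args`: the arguments passed to `aclnnXXXGetWorkspaceSize`, excluding `workspaceSize` and `executor`
* `outCout`: the number of operator outputs. If it is `1`, a single `AclNDTensor` is returned. If it is greater than 1, a `List[AclNDTensor]` is returned
* `argtypes`: the `ctypes` argument types of `aclnnXXXGetWorkspaceSize`. For particularly complex operators, if the call misbehaves, you can specify the types manually.
For example (**for illustration only: normally you can omit it and automatic inference is enough. Specify it yourself only when the call fails**), given: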
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnCumsumGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *axis,
    bool exclusiveOptional,
    bool reverseOptional,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
```

```python
import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
# Reference result computed on the CPU with TensorFlow
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
y = np.ones_like(golden, golden.dtype) * 123
# Optional: pass argtypes explicitly to the operator call
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p, # x
    ctypes.c_void_p, # axis
    ctypes.c_bool,   # exclusiveOptional
    ctypes.c_bool,   # reverseOptional
    ctypes.c_void_p, # out
    ctypes.c_void_p, # workspaceSize
    ctypes.c_void_p, # executor
]).to_cpu()
print(y)
```
* `stream`: when working with multiple streams, you can pass the stream explicitly.
For example:
```python
import numpy as np
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = np.float32
shape = [10, 10]
x = np.random.uniform(-10, 10, shape).astype(target_dtype)
y = np.zeros_like(x, dtype=target_dtype)
with AclStream(0) as stream:
    out = ascendc_gelu(x, y, stream=stream).to_cpu()
print(out)
```
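
The Cumsum example in this section passes `exclusive` and `reverse` flags and compares against `tf.cumsum`. For readers unfamiliar with those flags, the following pure-Python sketch reproduces the `tf.cumsum` semantics on a 1-D list. `cumsum_1d` is a hypothetical helper for illustration; it says nothing about how the Ascend C kernel is implemented.

```python
from typing import List


def cumsum_1d(values: List[float], exclusive: bool = False,
              reverse: bool = False) -> List[float]:
    """Cumulative sum of a 1-D list with tf.cumsum-style flags."""
    seq = list(reversed(values)) if reverse else list(values)
    out = []
    running = 0.0
    for v in seq:
        if exclusive:
            out.append(running)  # sum of the elements strictly before v
            running += v
        else:
            running += v
            out.append(running)  # sum up to and including v
    return list(reversed(out)) if reverse else out


print(cumsum_1d([1, 2, 3]))
print(cumsum_1d([1, 2, 3], exclusive=True))
print(cumsum_1d([1, 2, 3], reverse=True))
print(cumsum_1d([1, 2, 3], exclusive=True, reverse=True))
```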

### 4.2.3 `sync_stream`
Synchronizes the stream.

## 4.3 verify_result
Adapted from: https://gitee.com/ascend/samples/blob/master/operator/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/scripts/verify_result.py
```python
def verify_result(real_result:numpy.ndarray, golden:numpy.ndarray):
    pass
```
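Checks whether the result matches the golden data within the tolerance for its dtype:
* float16: 1e-3 (one in a thousand)
* float32: 1e-4 (one in ten thousand)
* int16, int32, int8: 0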
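
The tolerances above can be illustrated with a small element-wise check. This is a sketch under stated assumptions, not the actual `verify_result` code: it works on plain Python lists so it runs without NumPy or an NPU, it applies each tolerance as both a relative and an absolute bound, and the names `TOLERANCE` and `within_tolerance` are hypothetical. The real function may differ, for example by tolerating a small fraction of mismatched elements.

```python
import math
from typing import Sequence

# Tolerances from the list above, keyed by dtype name.
TOLERANCE = {
    "float16": 1e-3,
    "float32": 1e-4,
    "int8": 0.0,
    "int16": 0.0,
    "int32": 0.0,
}


def within_tolerance(real: Sequence[float], golden: Sequence[float],
                     dtype_name: str) -> bool:
    """Return True if every element of real is close enough to golden."""
    tol = TOLERANCE[dtype_name]
    if len(real) != len(golden):
        return False
    # Assumption: the tolerance is used as both rel_tol and abs_tol.
    return all(math.isclose(r, g, rel_tol=tol, abs_tol=tol)
               for r, g in zip(real, golden))


print(within_tolerance([1.0005], [1.0], "float16"))
print(within_tolerance([1.0005], [1.0], "float32"))
```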

## 4.4 AclArray
```python
class AclArray:
    def __init__(self, np_array: np.ndarray):
        pass
```
Example:
```c++
__attribute__((visibility("default")))
aclnnStatus aclnnEyeGetWorkspaceSize(
    aclTensor *yRef,
    int64_t numRows,
    int64_t numColumnsOptional,
    const aclIntArray *batchShapeOptional,
    int64_t dtypeOptional,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
```

```python
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_fn = OpRunner("Eye")
for i, target_dtype in enumerate([np.float16, np.float32]):
    numRows = 2
    numColumnsOptional = 3
    batchShapeOptional = 0
    dtypeOptional = 0
    shape = [numRows * numColumnsOptional]
    for value_range in [(-1, 1), (1, 10), (-1000, 1000)]:
        y = np.zeros(shape, dtype=target_dtype)
        batchShape = AclArray(np.array([1, 2, 3], dtype=np.int64))
        output = ascendc_fn(y, numRows, numColumnsOptional, batchShape, 0, outCout=5)
        output[0].to_cpu()
        golden = tf.eye(numRows, numColumnsOptional)
        print(y)
        print(golden)
        print(value_range)
        verify_result(y, golden.numpy().reshape(shape))
```

            
