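# Probing: A Performance and Stability Diagnostic Tool for AI Applications
Probing is a performance and stability diagnostic tool designed for AI applications. It targets the debugging and optimization problems of large-scale, distributed, long-running heterogeneous AI workloads such as LLM training and inference. By injecting a probe into the target process, it can collect more detailed performance data or modify the target's behavior at runtime.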
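## Key Features
Probing's main capabilities are:
- Debugging:
  - Inspect the target process's call stack, Python objects, Torch tensors and modules, and more;
  - Remote debugging: attach VSCode to the target process over the DAP protocol;
- Performance profiling:
  - Sample C/C++ code and generate flame graphs;
  - Use Torch's profiling support to analyze model performance;
- Remote control:
  - An HTTP interface for fetching data and controlling execution of the target process;
  - Remote injection of arbitrary Python code into the target process.

Unlike other debugging and diagnostic tools, `probing` is plug-and-play: it can attach to the target process at any time, without interrupting or restarting it and without code changes.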
## Quick Start
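### Probe Injection
`probing` collects data from and controls the target process through a probe. There are two ways to inject it:
1. **Inject from the command line**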
```shell
probing <pid> inject [OPTIONS]
```
Options: `-P, --pprof` enables profiling; `-c, --crash` enables crash handling; `-l, --listen <ADDRESS>` listens for remote connections on the given address.
2. **Inject from code**
```python
import probing
probing.init(listen="127.0.0.1:9922")
```
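As a rough illustration of the command-line route, the sketch below assembles the `probing <pid> inject` invocation from Python. The helper names `inject_argv` and `inject` are hypothetical; only the flags `-P`, `-c`, and `-l` come from the option list above, and actually running the command assumes the `probing` executable is on `PATH`.

```python
import subprocess
from typing import List, Optional


def inject_argv(pid: int, pprof: bool = False, crash: bool = False,
                listen: Optional[str] = None) -> List[str]:
    """Build the argument list for `probing <pid> inject [OPTIONS]`.

    Hypothetical helper: it only mirrors the flags documented above.
    """
    argv = ["probing", str(pid), "inject"]
    if pprof:
        argv.append("-P")        # enable profiling
    if crash:
        argv.append("-c")        # enable crash handling
    if listen is not None:
        argv += ["-l", listen]   # listen for remote connections
    return argv


def inject(pid: int, **options) -> int:
    """Run the injection command and return its exit code.

    Assumes the `probing` executable is installed and on PATH.
    """
    return subprocess.run(inject_argv(pid, **options)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(inject_argv(1234, pprof=True, listen="127.0.0.1:9922")))
```

Building the argument list separately from running it keeps the command easy to log or review before anything touches a live process.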
### Command Line and REPL
`probing` drives the probe through a set of commands that fetch data or perform specific actions. The command-line help is shown below:
```
Probing CLI - A performance and stability diagnostic tool for AI applications
Usage: probing [OPTIONS] <TARGET> [COMMAND]
Commands:
inject Inject into the target process [aliases: inj, i]
panel Interactive visualizer in terminal [aliases: pnl, console]
repl Repl debugging shell
enable Enable features (`-h, --help` to see full feature list)
disable Disable features (see `-h, --help` above)
show Display informations from the target process (see `-h, --help` above)
backtrace Show the backtrace of the target process or thread [aliases: bt]
eval Evaluate code in the target process
help Print this message or the help of the given subcommand(s)
Arguments:
<TARGET> target process, PID (e.g., 1234) or `Name` (e.g., "chrome.exe") for local process, and <ip>:<port> for remote process
Options:
-v, --verbose Enable verbose mode
--ptrace Send ctrl commands via ptrace
-h, --help Print help
```
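Of these, `enable`, `disable`, `show`, `backtrace`, and `eval` are the main control commands:
- enable: turn on a feature. Available features:
  - pprof: profiling;
  - dap: DAP remote debugging;
  - remote: remote control over TCP;
  - catch-crash: the crash handler;
- disable: turn off a feature (same feature list as above);
- show: display information about the target process:
  - memory: memory information;
  - threads: thread information;
  - objects: Python objects;
  - tensors: PyTorch tensors;
  - modules: PyTorch modules;
  - plt: the Procedure Linkage Table (PLT);
- backtrace: capture the call stack of the target process;
- eval: inject code into the target process and execute it.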

These commands can be sent from the command line or from the REPL.
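
The control commands listed above can be assembled programmatically. The following is a minimal sketch, assuming the feature or topic name follows the subcommand as a plain word (for example `probing 1234 show memory`); the help text implies this but does not spell it out, so check `probing <TARGET> <COMMAND> --help` before relying on it. The helper `control_argv` is hypothetical, and `eval` is left out because its argument format is not documented here.

```python
from typing import List, Optional

# Feature and topic names copied from the lists above.
FEATURES = {"pprof", "dap", "remote", "catch-crash"}
SHOW_TOPICS = {"memory", "threads", "objects", "tensors", "modules", "plt"}


def control_argv(target: str, command: str, name: Optional[str] = None) -> List[str]:
    """Build `probing <TARGET> <COMMAND> [NAME]`, rejecting unknown names.

    Assumption: NAME is passed as a positional word after the subcommand.
    """
    if command == "backtrace":
        if name is not None:
            raise ValueError("this sketch passes no extra argument to backtrace")
        return ["probing", target, "backtrace"]
    if command in ("enable", "disable"):
        allowed = FEATURES
    elif command == "show":
        allowed = SHOW_TOPICS
    else:
        raise ValueError(f"unsupported command: {command}")
    if name not in allowed:
        raise ValueError(f"{command} expects one of {sorted(allowed)}, got {name!r}")
    return ["probing", target, command, name]


if __name__ == "__main__":
    # TARGET may be a PID or, for a remote process, <ip>:<port>.
    print(control_argv("1234", "show", "memory"))
    print(control_argv("10.0.0.1:9922", "enable", "pprof"))
```

Validating names against the documented lists catches typos such as `profinling` before the command ever reaches a running job.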
### Web Panel and Console Panel
The features of `probing` are also available through a web interface. For example:
```shell
probing <pid> inject -l 127.0.0.1:1234
```
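After that, open `http://127.0.0.1:1234` in a browser to use the features described above. If a browser is not available, an interactive panel can be opened in the terminal instead: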
```shell
probing <pid> panel
```
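Before opening the browser, it can help to confirm that something is actually listening on the address passed to `-l`. The sketch below derives the panel URL from the listen address and performs a plain TCP connection check. The function names are hypothetical, and the check only shows that the port accepts connections, not that the panel itself is healthy.

```python
import socket
from typing import Tuple


def split_listen_address(address: str) -> Tuple[str, int]:
    """Split a `host:port` string such as "127.0.0.1:1234"."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def panel_url(address: str) -> str:
    """Return the browser URL for a given listen address."""
    host, port = split_listen_address(address)
    return f"http://{host}:{port}"


def is_listening(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address succeeds."""
    host, port = split_listen_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    addr = "127.0.0.1:1234"
    state = "reachable" if is_listening(addr) else "not reachable"
    print(f"{panel_url(addr)} is {state}")
```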
## Installing probing
### Binary Installation
`probing` can be installed with pip:
```sh
pip install probing
```
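To verify the installation, one option is to check for both the Python module and the command-line executable. This is a small sketch that assumes the package exposes a module named `probing` (as the `import probing` example suggests) and an executable named `probing` (as the CLI examples suggest); the function name is hypothetical.

```python
import importlib.util
import shutil
from typing import Dict


def probing_install_status() -> Dict[str, bool]:
    """Report whether the `probing` module and executable can be found."""
    return {
        # True if `import probing` would find a module.
        "module": importlib.util.find_spec("probing") is not None,
        # True if a `probing` executable is on PATH.
        "cli": shutil.which("probing") is not None,
    }


if __name__ == "__main__":
    for part, found in probing_install_status().items():
        print(f"{part}: {'found' if found else 'missing'}")
```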
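### Building from Source
Building `probing` requires the `trunk` tool, which can be installed with the command below. Skip this step if it is already installed: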
```shell
cargo install trunk
```
Once the build environment is ready, run `make` to build:
```shell
make
```
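### Development Mode
For ease of use, probing bundles its Python scripts and web app into libprobing.so. Repackaging after every code change during development would slow iteration considerably, so building the pieces manually is recommended: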
```shell
# Continuously rebuild the web app (trunk watch keeps running until interrupted)
cd app
trunk watch --filehash false -d dist/

# Build probing and libprobing
cargo b -p probing-cli
cargo b
```
In debug mode, `probing` automatically loads the web app from the dist directory and the Python scripts from src/, so no repackaging is needed.
Raw data
{
"_id": null,
"home_page": "https://github.com/reiase/probing",
"name": "probing",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "debug, performance, python",
"author": "reiase <reiase@gmail.com>",
"author_email": "reiase <reiase@gmail.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": "Performance and Stability Diagnostic Tool for AI Applications",
"version": "0.1.6",
"project_urls": {
"Homepage": "https://github.com/reiase/probing",
"Source Code": "https://github.com/reiase/probing"
},
"split_keywords": [
"debug",
" performance",
" python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "030dccd6b7e3eb33817872a8e3ca72f02841ce12bcfc490f97fea8834851a8e8",
"md5": "965eb6c2c69bc91ca318c5ffe4e22c7b",
"sha256": "38ee84614dc9d61c358c25c68e23dd4220b90e36ccad97c21b0b1c957f6225d4"
},
"downloads": -1,
"filename": "probing-0.1.6-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "965eb6c2c69bc91ca318c5ffe4e22c7b",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 1193571,
"upload_time": "2024-08-06T05:46:30",
"upload_time_iso_8601": "2024-08-06T05:46:30.806219Z",
"url": "https://files.pythonhosted.org/packages/03/0d/ccd6b7e3eb33817872a8e3ca72f02841ce12bcfc490f97fea8834851a8e8/probing-0.1.6-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-06 05:46:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "reiase",
"github_project": "probing",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "probing"
}