| Field | Value |
|---|---|
| Name | net-prof |
| Version | 0.1.10 |
| home_page | None |
| Summary | Network Profiler for the HPE Cassini Cray NIC |
| upload_time | 2025-07-15 14:21:56 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | None |
| keywords | network, profiler, cxi, cassini |
| requirements | No requirements were recorded. |
# net_prof
net_prof is a network profiler library for the HPE Cray Cassini Network Interface Card (NIC). It collects, analyzes, and visualizes network counter events on a compute node. It helps you compare a successful workload that runs without network issues against an unsuccessful one that fails because of a network issue, and diagnose the difference. net-prof summary reports help you understand, analyze, and optimize network bandwidth usage for any type of communication (ping, point-to-point, send-receive, MPI collectives, or PyTorch CCL collectives) by helping pinpoint why a communication API is not reaching its theoretical peak performance.
## To Install
```
pip install net_prof
```
## Functions
```
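collect(input_directory, "output.json")  # snapshot the NIC counters under input_directory into a JSON file
summarize(before, after)                  # summarize the change between the before and after snapshot files
dump(summary)                             # dump the summary report
dump_html(summary, "output.html")         # write the summary as an HTML report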
```
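Conceptually, a before/after profile comes down to per-counter differences between two snapshots. The sketch below illustrates only that idea; it is not net_prof's implementation. It assumes each snapshot has already been loaded as a flat `{counter_name: value}` dictionary of monotonically increasing counters, which may not match net_prof's actual JSON layout. The helper names and counter names are hypothetical.

```python
from typing import Dict


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - start for name, start in before.items() if name in after}


def changed_counters(deltas: Dict[str, int]) -> Dict[str, int]:
    """Keep only the counters that moved during the profiled region."""
    return {name: d for name, d in deltas.items() if d != 0}


# Illustrative counter names, not real Cassini counter names.
before = {"rx_ok": 100, "tx_ok": 250, "rx_drop": 5}
after = {"rx_ok": 160, "tx_ok": 250, "rx_drop": 5}
print(changed_counters(counter_deltas(before, after)))  # {'rx_ok': 60}
```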
### Multi-NIC example
```
import net_prof
net_prof.collect("../sys/class/cxi", "/path/to/file/before.json")
# Run the workload you want to profile here, between the two snapshots.
net_prof.collect("../sys/class/cxi", "/path/to/file/after.json")
summary = net_prof.summarize("/path/to/file/before.json", "/path/to/file/after.json")
net_prof.dump(summary)
net_prof.dump_html(summary, "/path/to/file/report.html")
```
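The collect, run, collect, summarize pattern can be wrapped in a context manager so the two snapshots always bracket the workload. This is a minimal sketch: `profile_region` is a hypothetical helper, not part of net_prof, and it takes the collect and summarize functions as arguments so it can be tried without a Cassini NIC.

```python
import os
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


@contextmanager
def profile_region(
    collect: Callable[[str, str], Any],
    summarize: Callable[[str, str], Any],
    cxi_dir: str,
    out_dir: str,
    label: str,
) -> Iterator[Dict[str, Any]]:
    """Take counter snapshots before and after the with-block, then summarize.

    The summary is stored under the "summary" key of the yielded dict.
    """
    before = os.path.join(out_dir, f"before_{label}.json")
    after = os.path.join(out_dir, f"after_{label}.json")
    result: Dict[str, Any] = {}
    collect(cxi_dir, before)
    try:
        yield result
    finally:
        # The "after" snapshot is taken even if the workload raised,
        # so a failed run can still be summarized.
        collect(cxi_dir, after)
        result["summary"] = summarize(before, after)
```

With net_prof installed, you would run the workload inside `with profile_region(net_prof.collect, net_prof.summarize, "../sys/class/cxi", "/path/to/file", "run1") as region:` and then call `net_prof.dump(region["summary"])`. Taking the "after" snapshot in `finally` is a deliberate choice: failed runs are often the ones you most want to inspect.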
### Instructions for single-NIC collection
To collect from a single NIC, pass that NIC's /telemetry/ directory; to collect from all NICs, pass the /cxi/ directory.
For example, instead of passing the ../sys/class/cxi/ directory:
```
net_prof.collect("../sys/class/cxi", os.path.join(script_dir, "before.json"))
```
pass the full path to a specific NIC's telemetry directory:
```
net_prof.collect("../sys/class/cxi/cxi0/device/telemetry", os.path.join(script_dir, "before.json"))
```
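To collect each NIC separately, you can first discover the per-NIC telemetry directories. A minimal sketch, assuming the `<cxi_root>/<nic>/device/telemetry` layout shown above; `telemetry_dirs` is a hypothetical helper, not part of net_prof.

```python
from pathlib import Path
from typing import Dict


def telemetry_dirs(cxi_root: str = "/sys/class/cxi") -> Dict[str, str]:
    """Map each NIC name (e.g. "cxi0") to its telemetry directory.

    Assumes the <cxi_root>/<nic>/device/telemetry layout; entries without
    a telemetry directory are skipped, and a missing root yields {}.
    """
    root = Path(cxi_root)
    if not root.is_dir():
        return {}
    found: Dict[str, str] = {}
    for nic in sorted(root.iterdir()):  # sorted by path, e.g. cxi0, cxi1, ...
        telemetry = nic / "device" / "telemetry"
        if telemetry.is_dir():
            found[nic.name] = str(telemetry)
    return found
```

Each path can then be passed to `net_prof.collect`, for example `net_prof.collect(path, os.path.join(script_dir, f"before_{nic}.json"))` inside a loop over `telemetry_dirs("../sys/class/cxi").items()`.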
### Test used on Aurora:
```
import os
import net_prof
target_host = "x4306c7s2b0n0.hostmgmt2306.cm.aurora.alcf.anl.gov"
out_dir = "/lus/flare/projects/datascience/kaushik/network/net-prof-tests/ping-test"
before = os.path.join(out_dir, "before.json")
after = os.path.join(out_dir, "after.json")
net_prof.collect("/sys/class/cxi/", before)
os.system(f"ping -c 4 {target_host}")
net_prof.collect("/sys/class/cxi/", after)
summary = net_prof.summarize(before, after)
net_prof.dump(summary)
net_prof.dump_html(summary, os.path.join(out_dir, "net_prof_report.html"))
```
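On Unix, `os.system` returns an encoded wait status, which makes it awkward to record whether the ping itself succeeded. A sketch using `subprocess.run` instead; `run_workload` is a hypothetical helper, and passing the command as a list also avoids shell parsing of the host name.

```python
import subprocess
from typing import List


def run_workload(cmd: List[str]) -> int:
    """Run cmd (output goes to the terminal) and return its exit code.

    The command is passed as a list, so no shell is involved, and the
    return value is the plain exit code rather than a wait status.
    """
    completed = subprocess.run(cmd, check=False)
    return completed.returncode
```

For example, call `status = run_workload(["ping", "-c", "4", target_host])` between the two `collect` calls. With the common Linux (iputils) `ping`, 0 means replies were received and a nonzero code means none were received or another error occurred, which is a useful label to store next to each report.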
### PyTorch example:
```
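# Launch with torchrun, which sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT.
import os
import torch
import torch.distributed as dist
import net_prof
net_prof.collect("../sys/class/cxi", "/path/to/file/before.json")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))  # one GPU per process
dist.init_process_group(backend="nccl") # or gloo
x = torch.tensor([1.0], device="cuda")
dist.all_reduce(x, op=dist.ReduceOp.SUM)
torch.cuda.synchronize()  # CUDA collectives run asynchronously; wait before the "after" snapshot
net_prof.collect("../sys/class/cxi", "/path/to/file/after.json")
summary = net_prof.summarize("/path/to/file/before.json", "/path/to/file/after.json")
net_prof.dump(summary)
net_prof.dump_html(summary, "/path/to/file/report.html")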
```
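When the script runs under torchrun, every rank executes the `collect` calls, so several processes on the same node write the same snapshot files. One option, sketched here under the assumption that one snapshot per node is enough (NIC counters in sysfs are shared by all processes on a node), is to let only local rank 0 collect and to include the host name in the file name. `node_snapshot_path` is a hypothetical helper; `LOCAL_RANK` is the variable torchrun sets.

```python
import os
import socket
from typing import Optional


def node_snapshot_path(out_dir: str, stage: str) -> Optional[str]:
    """Return a per-node snapshot path on local rank 0, or None elsewhere.

    LOCAL_RANK is set by torchrun; if it is absent, the process is treated
    as local rank 0 (for example, a plain single-process run).
    """
    if int(os.environ.get("LOCAL_RANK", "0")) != 0:
        return None
    # Include the host name so snapshots from different nodes do not clash.
    return os.path.join(out_dir, f"{stage}_{socket.gethostname()}.json")
```

For example, compute `path = node_snapshot_path("/path/to/file", "before")` and call `net_prof.collect("../sys/class/cxi", path)` only when `path` is not `None`.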
### Healthy vs. Faulty Ping Test:
net_prof lets you contrast a good run with a bad run to pinpoint which NIC counters change.
```
# Simulated "Healthy Node"
import net_prof, os
target = "good-node"
net_prof.collect("/sys/class/cxi", "before_healthy.json")
os.system(f"ping -c 4 {target}")
net_prof.collect("/sys/class/cxi", "after_healthy.json")
net_prof.dump_html(net_prof.summarize("before_healthy.json", "after_healthy.json"),
"report_healthy.html")
```
```
# Simulated "Faulty Node"
import net_prof, os
target = "bad-node" # simulate issue (e.g., firewall drop)
net_prof.collect("/sys/class/cxi", "before_faulty.json")
os.system(f"ping -c 4 {target}") # expect high loss / timeout
net_prof.collect("/sys/class/cxi", "after_faulty.json")
net_prof.dump_html(net_prof.summarize("before_faulty.json", "after_faulty.json"),
"report_faulty.html")
```
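After both runs, the useful question is which counters moved differently. A minimal sketch, assuming you have reduced each run to per-counter deltas in a flat `{counter_name: delta}` dictionary; net_prof's summary format may differ, and `rank_counter_differences` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def rank_counter_differences(
    healthy: Dict[str, int], faulty: Dict[str, int], top_n: int = 10
) -> List[Tuple[str, int, int]]:
    """Rank counters by how differently they moved in two runs.

    Counters missing from one run are treated as 0 there. Returns
    (name, healthy_delta, faulty_delta) tuples, largest gap first.
    """
    names = set(healthy) | set(faulty)
    rows = [(n, healthy.get(n, 0), faulty.get(n, 0)) for n in names]
    # Sort by absolute gap, then by name for a deterministic order.
    rows.sort(key=lambda r: (-abs(r[2] - r[1]), r[0]))
    return rows[:top_n]
```

Counters at the top of the list are the first candidates to look up in the HPE Cassini performance counter documentation.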
### Notes
- We are aware that a ping failure is not necessarily caused by CXI or the NICs; memory, network switches, or failing hardware elsewhere can also be responsible. Even so, this tool helps you gain insight into the network.
- A compare() function should be developed. It would let a user compare an "idle" test with a "real" test and visualize the changes between them.

It could work like this:
```
# NOTE: This sketches a capability planned for net_prof; compare() does NOT exist yet.
import os
import time
import net_prof
target = "good-node"  # placeholder host name
net_prof.collect("/sys/class/cxi", "before_idle.json")
time.sleep(5)  # idle baseline: do effectively nothing
net_prof.collect("/sys/class/cxi", "after_idle.json")
idle_test = net_prof.summarize("before_idle.json", "after_idle.json")
net_prof.collect("/sys/class/cxi", "before_ping.json")
os.system(f"ping -c 4 {target}")
net_prof.collect("/sys/class/cxi", "after_ping.json")
ping_test = net_prof.summarize("before_ping.json", "after_ping.json")
compare(idle_test, ping_test, "report_idle_vs_ping.html")  # hypothetical, not implemented
```
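Until compare() exists, a rough stand-in can render two runs side by side. This sketch assumes flat `{counter_name: delta}` dictionaries for each run; `compare_html` is a hypothetical name, and its output is a bare HTML table, not net_prof's report format.

```python
import html
from typing import Dict


def compare_html(baseline: Dict[str, int], test_run: Dict[str, int], title: str = "baseline vs. test") -> str:
    """Render the counters whose deltas differ between two runs as an HTML table."""
    rows = []
    for name in sorted(set(baseline) | set(test_run)):
        b, t = baseline.get(name, 0), test_run.get(name, 0)
        if b != t:  # skip counters that moved identically in both runs
            rows.append(
                f"<tr><td>{html.escape(name)}</td><td>{b}</td>"
                f"<td>{t}</td><td>{t - b:+d}</td></tr>"
            )
    header = "<tr><th>counter</th><th>baseline</th><th>test</th><th>change</th></tr>"
    body = "".join(rows)
    return (
        f"<html><body><h1>{html.escape(title)}</h1>"
        f"<table>{header}{body}</table></body></html>"
    )
```

For example, write `compare_html(idle_deltas, ping_deltas, "idle vs. ping")` to a file opened with `open("report_idle_vs_ping.html", "w")`.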
## References
- https://cpe.ext.hpe.com/docs/latest/getting_started/HPE-Cassini-Performance-Counters.html
- https://github.com/argonne-lcf/net_prof
- https://pypi.org/project/net_prof/
Raw data
{
"_id": null,
"home_page": null,
"name": "net-prof",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "network, profiler, cxi, cassini",
"author": null,
"author_email": "Anthony Cardia <acardia@protonmail.com>, Kaushik Velusamy <kaushikvelusamy@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/d2/16/968a13c07aa5d876715809c592dcb3c12df0f615b6cafc5af48d370cc12c/net_prof-0.1.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Network Profiler for the HPE Cassini Cray NIC",
"version": "0.1.10",
"project_urls": {
"Homepage": "https://github.com/argonne-lcf/net_prof"
},
"split_keywords": [
"network",
" profiler",
" cxi",
" cassini"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "37b755ed170decaadab56065f8fea0465729989117692e52573bb19f84f33ed6",
"md5": "87e1edcd4f7dd569bc73b682ada52f7a",
"sha256": "3a65879592ce736603d8b579030cd127e23c0009baf23195a5f571f6dd3ae508"
},
"downloads": -1,
"filename": "net_prof-0.1.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "87e1edcd4f7dd569bc73b682ada52f7a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 25679,
"upload_time": "2025-07-15T14:21:54",
"upload_time_iso_8601": "2025-07-15T14:21:54.081170Z",
"url": "https://files.pythonhosted.org/packages/37/b7/55ed170decaadab56065f8fea0465729989117692e52573bb19f84f33ed6/net_prof-0.1.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d216968a13c07aa5d876715809c592dcb3c12df0f615b6cafc5af48d370cc12c",
"md5": "1fd41aa813e8c64f579bff4d62421ae6",
"sha256": "6165c8e8704e8ed6c32c0d1051bccb1a790e5c9474ea9401610e9983aaac72f2"
},
"downloads": -1,
"filename": "net_prof-0.1.10.tar.gz",
"has_sig": false,
"md5_digest": "1fd41aa813e8c64f579bff4d62421ae6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 26990,
"upload_time": "2025-07-15T14:21:56",
"upload_time_iso_8601": "2025-07-15T14:21:56.294241Z",
"url": "https://files.pythonhosted.org/packages/d2/16/968a13c07aa5d876715809c592dcb3c12df0f615b6cafc5af48d370cc12c/net_prof-0.1.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 14:21:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "argonne-lcf",
"github_project": "net_prof",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "net-prof"
}