Alibabacloud Aes Sysom Pai Diagnosis
v0.0.1
Perform SysOM deep diagnosis on Alibaba Cloud PAI products (EAS / DLC) to identify the root causes of instance-level issues. Use when users report:
- EAS instance anomalies, GPU OOM (out of memory), or GPU memory out-of-bounds errors
- Slow first-token latency or uneven request scheduling across model service instances
- OOM (out of memory), insufficient memory, or processes being killed
- Abnormally high system load, high IO latency, network jitter, or packet loss
- Instance crashes, unexpected restarts, or kernel oops
- DLC training job hangs, communication timeouts, or per-step throughput degradation
- Any issue related to EAS instance health, DLC job health, or underlying compute resource performance
License
MIT-0
Runtime dependencies
No special dependencies
Install command
Official: npx clawhub@latest install alibabacloud-aes-sysom-pai-diagnosis
Accelerated mirror: npx clawhub@latest install alibabacloud-aes-sysom-pai-diagnosis --registry https://cn.longxiaskill.com (mirror available)