Sglang Amd Bench

v0.1.0

Benchmark sglang serving performance on AMD Instinct GPUs (MI355X, MI300X, MI308X) with various parallel configurations (TP, DP, EP). Covers throughput/laten...

by @alexsun07 (Alex Sun)

Data & API

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install sglang-amd-bench

Mirror: npx clawhub@latest install sglang-amd-bench --registry https://cn.longxiaskill.com

Localization notes

Sglang Amd Bench installation command: openclaw skills install sglang-amd-bench

Skill documentation

SGLang AMD Benchmark

Benchmark sglang LLM serving on AMD Instinct GPUs across parallel configurations (TP/DP/EP) and workload shapes (ISL/OSL/Concurrency). This skill runs in mix mode (non-disaggregated): prefill and decode happen on the same GPUs. It produces a performance baseline and suggests config-level optimizations.

Run Rules (non-negotiable)

These rules apply to every benchmark run in this skill. (A profiling-stage-separation rule exists in the broader sglang-run guidance but is intentionally omitted here, since this skill does not profile.)

Rule 1: Do NOT modify the sglang/aiter/mori environment

Never run pip install, pip uninstall, pip install --upgrade, or any equivalent reinstall command for sglang, aiter, mori, flydsl, or any related kernel/runtime package, even if a workload fails or imports look broken. The user's environments are hand-tuned dev installs (typically pip install -e .); a naive reinstall will silently overwrite local patches and destroy hours of work.

If the environment looks broken (missing module, version mismatch, ABI error, import crash), stop and report the symptom to the user. Let the user decide whether to reinstall.

What you CAN do without asking:

- Inspect versions: pip show sglang, python -c "import sglang; print(sglang.__file__)"
- Read source files in the editable install
- Set environment variables for the run
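The read-only inspection above can also be done from Python without importing the package, which helps when the import itself crashes. This is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper name, not part of this skill, and it assumes the editable install is discoverable through the normal import machinery.

```python
from importlib import metadata, util


def describe_install(package):
    """Report the installed version and on-disk location of a package.

    Read-only: it queries package metadata and the import system's finders,
    but never imports, installs, or modifies anything.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # no distribution metadata under this name

    spec = util.find_spec(package)  # locates the module without importing it
    location = spec.origin if spec is not None else None
    return {"package": package, "version": version, "location": location}


# Example (commented out so the sketch has no side effects):
# for name in ("sglang", "aiter", "mori"):
#     print(describe_install(name))
```

If `version` or `location` comes back as `None`, that is a symptom to report to the user, not a reason to reinstall.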

What you MUST ask before doing:

- pip install / pip uninstall / pip install -U for any package above
- git checkout / git pull inside the editable source directories
- Modifying files inside sglang/, aiter/, mori/ source trees

Rule 2: Always preserve server logs when launching an sglang server

Whenever you start an sglang server, redirect stdout+stderr to a real file. Never let server output go only to the terminal or to /dev/null. The Bash tool's run_in_background: true buffer is not a substitute; still redirect to a file.

In this skill, serve.sh writes to $LOG_DIR/server_.log automatically. That is what satisfies this rule, and what wait_for_server.py (Rule 3) reads.
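For cases where a server is started from Python rather than through serve.sh, the redirect rule can be sketched as follows. `launch_with_log` is a hypothetical helper, and the command passed to it is whatever launch command you already use; nothing here reproduces serve.sh itself.

```python
import subprocess
from pathlib import Path


def launch_with_log(cmd, log_path):
    """Start cmd in the background with stdout and stderr appended to log_path.

    cmd is a list of arguments (a stand-in for the real server command).
    Returns the Popen handle so the caller can later wait on or kill it.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log_file:
        # stderr is merged into stdout so both streams land in the same file.
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    # The child holds its own copy of the file descriptor, so closing ours is safe.
    return proc
```

Appending ("ab") rather than truncating keeps earlier output if the same path is reused, which errs on the side of preserving logs.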

Rule 3: Wait for the server with the bundled monitor, don't blind-sleep

After launching an sglang server, startup typically takes a few minutes (model load, weight shard, kernel warmup, graph capture; AITER may JIT-compile CK kernels for several minutes on first launch). Do not sleep 300 and hope. Use the bundled monitor, which polls the log and returns the moment the outcome is known:

    # After 3-0 deploys it, the script lives at /sgl-workspace/wait_for_server.py inside the container.
    python3 /sgl-workspace/wait_for_server.py "$SERVER_LOG"
    # exit codes:
    #   0 READY   - saw "The server is fired up and ready to roll"
    #   1 CRASHED - saw "Traceback"
    #   2 HUNG    - log's last line + line count unchanged for >5 min
    #   3 TIMEOUT - overall timeout (default 30 min) exceeded
    #   4 ERROR   - log file unreadable / never appeared

Source lives at scripts/wait_for_server.py in this skill's directory; 3-0 copies it to /sgl-workspace/ alongside serve.sh / bench.sh. Detection logic:

- Success: substring The server is fired up and ready to roll appears.
- Crash: substring Traceback appears.
- Hang: each poll records (line_count, last_non_empty_line) of the log; if unchanged for ≥5 minutes (--stall-seconds), the run is treated as failed.

Tunable flags: --success, --failure, --stall-seconds, --overall-timeout, --poll-seconds. Bump --stall-seconds consciously if a specific config genuinely has long quiet periods (e.g. very large weight downloads, prolonged AITER JIT).

On CRASHED / HUNG / TIMEOUT / ERROR: stop and report the log tail to the user; do NOT silently restart.
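The detection logic described in Rule 3 can be sketched as a small polling loop. This is an illustration written from the description above, not the source of the bundled wait_for_server.py. The function names, the order in which the success and crash markers are checked, and the choice to report ERROR only when the log was never readable before the overall timeout are all assumptions.

```python
import time
from pathlib import Path

READY, CRASHED, HUNG, TIMEOUT, ERROR = 0, 1, 2, 3, 4
SUCCESS_MARK = "The server is fired up and ready to roll"
FAILURE_MARK = "Traceback"


def snapshot(text):
    """Return (line_count, last_non_empty_line), the pair used for stall detection."""
    lines = text.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    return len(lines), (non_empty[-1] if non_empty else "")


def wait_for_outcome(log_path, stall_seconds=300, overall_timeout=1800,
                     poll_seconds=5, clock=time.monotonic, sleep=time.sleep):
    """Poll log_path until an outcome is known and return its exit code."""
    start = clock()
    last_snap, last_change = None, start
    ever_read = False
    while True:
        now = clock()
        try:
            text = Path(log_path).read_text(errors="replace")
            ever_read = True
        except OSError:
            text = None  # log missing or unreadable on this poll

        if text is not None:
            if SUCCESS_MARK in text:   # assumption: success is checked first
                return READY
            if FAILURE_MARK in text:
                return CRASHED
            snap = snapshot(text)
            if snap != last_snap:
                last_snap, last_change = snap, now   # log made progress
            elif now - last_change >= stall_seconds:
                return HUNG

        if now - start >= overall_timeout:
            return TIMEOUT if ever_read else ERROR
        sleep(poll_seconds)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock instead of waiting in real time.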

Important Notes

- This skill covers mix mode only (no PD-disaggregation). Prefill and decode run on the same GPUs.
- serve.sh sets SGLANG_USE_AITER=1 automatically. bench.sh sets PYTHONPATH for sglang's benchmark module automatically. No need to set these manually.
- Use dummy weights by default (LOAD_DUMMY=1). Dummy weights are sufficient for benchmarking throughput, latency, and parallel config comparison; real weights produce the same performance characteristics. Only use LOAD_DUMMY=0 if the user explicitly asks for real weights. Real weights take much longer to load (10+ minutes for large models) and are rarely needed for config benchmarking.
- --random-range-ratio 1.0 ensures exact ISL/OSL lengths (no variation) for reproducible benchmarks.
- bench.sh uses num_prompts = concurrency * 2; the script handles this automatically.
- Between configs, fully kill the sglang server and wait for GPU memory to be freed before relaunching.
- If a benchmark run fails or hangs, check GPU memory usage with rocm-smi and server health with the /health endpoint.

Key Metrics

Every benchmark collects these metrics per (ISL, OSL, Concurrency) combination:

Metric | Unit | Description
TTFT | ms | Time To First Token: latency from request to first token
TPOT | ms | Time Per Output Token: average inter-token latency
Input throughput | tok/s | Input tokens processed per second acros
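To make the two latency definitions concrete, here is a minimal sketch that computes TTFT and TPOT for a single request from token arrival timestamps. The function names and the timestamp inputs (in seconds) are hypothetical; sglang's benchmark module reports its own values and may differ in detail, for example in how it aggregates across requests.

```python
def ttft_ms(request_time_s, token_times_s):
    """Time To First Token: delay from sending the request to the first token, in ms."""
    return (token_times_s[0] - request_time_s) * 1000.0


def tpot_ms(token_times_s):
    """Time Per Output Token: average gap between consecutive output tokens, in ms.

    With n tokens there are n - 1 gaps, so the first token (covered by TTFT)
    is excluded from the average.
    """
    n = len(token_times_s)
    if n < 2:
        return 0.0  # a single token has no inter-token gap
    return (token_times_s[-1] - token_times_s[0]) / (n - 1) * 1000.0


# Example: request sent at t=0.0 s, four tokens arriving 0.25 s apart from t=0.5 s.
# ttft_ms(0.0, [0.5, 0.75, 1.0, 1.25]) gives 500.0
# tpot_ms([0.5, 0.75, 1.0, 1.25]) gives 250.0
```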
