tilegym-improve-cutile-kernel-perf
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and IR-level debugging. Use when asked to "optimize cutile kernel", "improve kernel perf", "tune cutile performance", "make kernel faster", or to iteratively benchmark and refine a cuTile GPU kernel in the TileGym project.
Runtime dependencies
No special dependencies
Install command
Official: npx clawhub@latest install tilegym-improve-cutile-kernel-perf
Mirror (accelerated): npx clawhub@latest install tilegym-improve-cutile-kernel-perf --registry https://cn.longxiaskill.com (mirror sync in progress)