Runtime dependencies
Version
Initial release of advanced-evaluation, a comprehensive skill for building robust LLM evaluation systems.
- Provides actionable guidance for implementing LLM-as-judge in automated pipelines.
- Explains evaluation methods: direct scoring vs. pairwise comparison, with reliability and bias considerations.
- Details systemic LLM biases (e.g., position, length, self-enhancement) and mitigation strategies.
- Outlines metric selection frameworks for different evaluation tasks.
- Supplies prompt templates and protocols for direct scoring, pairwise comparison, and rubric creation.
- Offers practical patterns for evaluation pipeline design and rubric adaptation by domain.
Install command
Localization notes
Advanced Evaluation (LLM evaluation) installation instructions. Install command: npx clawhub@latest install advanced-evaluation