Experiment Designer
v2.1.1
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical significance in mind.
Security Scan
OpenClaw
Safe
High confidence: The skill's files and runtime instructions are consistent with an experiment-design helper: it contains documentation and a local sample-size script, asks for no credentials, installs nothing, and does not attempt unexpected access.
Evaluation Recommendations
This skill appears to be what it claims: documentation plus a local Python sample-size calculator. Before using: (1) review sample_size_calculator.py to ensure its assumptions (two-proportion A/B, equal group sizes, interpretation of relative vs absolute MDE) match your experiment; (2) validate results against another calculator or statistical package when stakes are high; and (3) remember this tool does not handle sequential monitoring, multiple comparisons, or continuous-metric power analysis.
✓ Purpose & Capabilities
Name/description (experiment design, hypothesis writing, sample-size estimation) match the included materials: two reference docs and a local sample-size calculator script. No unrelated credentials, binaries, or config paths are requested.
✓ Instruction Scope
SKILL.md stays on-topic (hypothesis format, metrics, sample-size estimation, ICE prioritization, stopping rules). The instructions only reference local files included in the package and show how to run the local Python script; they do not direct the agent to read unrelated files or transmit data externally.
✓ Installation Mechanism
No install spec is present (instruction-only skill with one local script). Nothing is downloaded or extracted from external URLs and no packages are installed automatically.
✓ Credential Requirements
The skill requires no environment variables, no credentials, and no config paths. All functionality is local and proportional to the stated purpose.
✓ Persistence & Permissions
always is false and the skill is user-invocable. It does not request persistent system-wide changes or elevated privileges.
Security is layered; review the code before running.
Runtime Dependencies
No special dependencies
Versions
latest: v2.1.1 · 2026/3/11
v2.1.1: optimization, reference splits
● Harmless
Install Command
Official: npx clawhub@latest install experiment-designer
Mirror (CN): npx clawhub@latest install experiment-designer --registry https://cn.clawhub-mirror.com
Skill Documentation
Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.
When To Use
Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions
Core Workflow
- Write hypothesis in If/Then/Because format:
  - If we change [intervention]
  - Then [metric] will change by [expected direction/magnitude]
  - Because [behavioral mechanism]
- Define metrics before running the test
  - Primary metric: the single decision metric
  - Guardrail metrics: quality/risk protection
  - Secondary metrics: diagnostics only
- Estimate sample size
  - Baseline conversion or baseline mean
  - Minimum detectable effect (MDE)
  - Significance level (alpha) and power
  Use:
  python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
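As a cross-check on the script's output, the standard two-proportion formula can be sketched in a few lines. This is an independent sketch assuming equal group sizes and a two-sided z-test, not the bundled calculator itself:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate n per variant for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for power=0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance of both arms
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 12% baseline, +2pp absolute MDE, matching the CLI example above
print(sample_size_per_variant(0.12, 0.02))
```

Results should roughly match the calculator; a large discrepancy suggests different assumptions (pooled variance, continuity correction, or a one-sided test).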
- Prioritize experiments with ICE
  - Impact: potential upside
  - Confidence: evidence quality
  - Ease: cost/speed/complexity
  ICE Score = (Impact × Confidence × Ease) / 10
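The scoring rule above is simple enough to automate for a backlog; a minimal sketch (the experiment names and ratings are hypothetical):

```python
def ice_score(impact, confidence, ease):
    """ICE Score = (Impact * Confidence * Ease) / 10, with inputs on a 1-10 scale."""
    for v in (impact, confidence, ease):
        if not 1 <= v <= 10:
            raise ValueError("ICE inputs are expected on a 1-10 scale")
    return impact * confidence * ease / 10

# Hypothetical backlog, ranked highest score first
backlog = {"new-onboarding-flow": (8, 6, 4), "cta-copy-test": (4, 7, 9)}
ranked = sorted(backlog, key=lambda k: ice_score(*backlog[k]), reverse=True)
for name in ranked:
    print(name, ice_score(*backlog[name]))
```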
- Launch with stopping rules
  - Decide fixed sample size or fixed duration in advance
  - Avoid repeated peeking without a proper sequential method
  - Monitor guardrails continuously
- Interpret results
  - Statistical significance is not business significance
  - Compare point estimate + confidence interval to decision threshold
  - Investigate novelty effects and segment heterogeneity
Hypothesis Quality Checklist
- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition
Common Experiment Pitfalls
- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context
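Of these pitfalls, sample ratio mismatch is mechanically checkable from assignment counts. A minimal sketch for a planned 50/50 split using a two-sided z-test (the counts and the 0.001 alert threshold are illustrative conventions, not part of this skill):

```python
from math import erf, sqrt

def srm_p_value(n_control, n_treatment):
    """Two-sided p-value that the observed split matches a planned 50/50 ratio."""
    n = n_control + n_treatment
    z = (n_control - n / 2) / sqrt(n * 0.25)  # binomial normal approximation
    # standard-normal two-sided p-value via the error function
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

p = srm_p_value(10000, 10500)  # hypothetical drifted 50/50 assignment
if p < 0.001:
    print(f"possible SRM (p = {p:.1e}): audit assignment and logging first")
```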
Statistical Interpretation Guardrails
- A p-value below alpha is evidence against the null hypothesis, not guaranteed truth.
- A confidence interval that crosses zero (no effect) means the directional claim is uncertain.
- Wide intervals imply low precision even when the result is significant.
- Use practical-significance thresholds tied to business impact.
See:
references/experiment-playbook.md
references/statistics-reference.md
Tooling
scripts/sample_size_calculator.py
Computes required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power
Example:
python3 scripts/sample_size_calculator.py \
--baseline-rate 0.10 \
--mde 0.015 \
--mde-type absolute \
--alpha 0.05 \
--power 0.8
Data source: ClawHub ↗ · Chinese localization: 龙虾技能库