Experiment Designer
v2.1.1
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical significance in mind.
Security Scan
OpenClaw
Safe
High confidence: The skill's files and runtime instructions are consistent with an experiment-design helper: it contains documentation and a local sample-size script, asks for no credentials, installs nothing, and does not attempt unexpected access.
Evaluation Recommendations
This skill appears to be what it claims: documentation plus a local Python sample-size calculator. Before using: (1) review sample_size_calculator.py to ensure its assumptions (two-proportion A/B, equal group sizes, interpretation of relative vs absolute MDE) match your experiment; (2) validate results against another calculator or statistical package when stakes are high; and (3) remember this tool does not handle sequential monitoring, multiple comparisons, or continuous-metric power analysis.
✓ Purpose & Capabilities
Name/description (experiment design, hypothesis writing, sample-size estimation) match the included materials: two reference docs and a local sample-size calculator script. No unrelated credentials, binaries, or config paths are requested.
✓ Instruction Scope
SKILL.md stays on-topic (hypothesis format, metrics, sample-size estimation, ICE prioritization, stopping rules). The instructions only reference local files included in the package and show how to run the local Python script; they do not direct the agent to read unrelated files or transmit data externally.
✓ Installation Mechanism
No install spec is present (instruction-only skill with one local script). Nothing is downloaded or extracted from external URLs and no packages are installed automatically.
✓ Credential Requirements
The skill requires no environment variables, no credentials, and no config paths. All functionality is local and proportional to the stated purpose.
✓ Persistence & Permissions
always is false and the skill is user-invocable. It does not request persistent system-wide changes or elevated privileges.
Security is layered; review the code before running.
Runtime Dependencies
No special dependencies
Versions
latest: v2.1.1 · 2026/3/11
v2.1.1: optimization, reference splits
● Harmless
Install Command
Official: npx clawhub@latest install experiment-designer
Mirror (CN): npx clawhub@latest install experiment-designer --registry https://cn.clawhub-mirror.com
Skill Documentation
Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.
When To Use
Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions
Core Workflow
- Write hypothesis in If/Then/Because format:
  - If we change [intervention]
  - Then [metric] will change by [expected direction/magnitude]
  - Because [behavioral mechanism]
- Define metrics before running the test
  - Primary metric: the single decision metric
  - Guardrail metrics: quality/risk protection
  - Secondary metrics: diagnostics only
- Estimate sample size
  - Baseline conversion or baseline mean
  - Minimum detectable effect (MDE)
  - Significance level (alpha) and power
  Use:
  python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
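As a cross-check on the script's output, the standard two-proportion formula can be sketched in a few lines. This is an independent sketch assuming equal group sizes and a two-sided z-test, not the bundled calculator itself:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate n per variant for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for power=0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance of both arms
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 12% baseline, +2pp absolute MDE, matching the CLI example above
print(sample_size_per_variant(0.12, 0.02))
```

Results should roughly match the calculator; a large discrepancy suggests different assumptions (pooled variance, continuity correction, or a one-sided test).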
- Prioritize experiments with ICE
  - Impact: potential upside
  - Confidence: evidence quality
  - Ease: cost/speed/complexity
  ICE Score = (Impact × Confidence × Ease) / 10
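The scoring rule above is simple enough to automate for a backlog; a minimal sketch (the experiment names and ratings are hypothetical):

```python
def ice_score(impact, confidence, ease):
    """ICE Score = (Impact * Confidence * Ease) / 10, with inputs on a 1-10 scale."""
    for v in (impact, confidence, ease):
        if not 1 <= v <= 10:
            raise ValueError("ICE inputs are expected on a 1-10 scale")
    return impact * confidence * ease / 10

# Hypothetical backlog, ranked highest score first
backlog = {"new-onboarding-flow": (8, 6, 4), "cta-copy-test": (4, 7, 9)}
ranked = sorted(backlog, key=lambda k: ice_score(*backlog[k]), reverse=True)
for name in ranked:
    print(name, ice_score(*backlog[name]))
```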
- Launch with stopping rules
  - Decide fixed sample size or fixed duration in advance
  - Avoid repeated peeking without a proper sequential method
  - Monitor guardrails continuously
- Interpret results
  - Statistical significance is not business significance
  - Compare point estimate + confidence interval to decision threshold
  - Investigate novelty effects and segment heterogeneity
Hypothesis Quality Checklist
- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition
Common Experiment Pitfalls
- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context
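Of these pitfalls, sample ratio mismatch is mechanically checkable from assignment counts. A minimal sketch for a planned 50/50 split using a two-sided z-test (the counts and the 0.001 alert threshold are illustrative conventions, not part of this skill):

```python
from math import erf, sqrt

def srm_p_value(n_control, n_treatment):
    """Two-sided p-value that the observed split matches a planned 50/50 ratio."""
    n = n_control + n_treatment
    z = (n_control - n / 2) / sqrt(n * 0.25)  # binomial normal approximation
    # standard-normal two-sided p-value via the error function
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

p = srm_p_value(10000, 10500)  # hypothetical drifted 50/50 assignment
if p < 0.001:
    print(f"possible SRM (p = {p:.1e}): audit assignment and logging first")
```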
Statistical Interpretation Guardrails
- A p-value below alpha is evidence against the null hypothesis, not guaranteed truth.
- A confidence interval that crosses zero (no effect) means the directional claim is uncertain.
- Wide intervals imply low precision even when the result is significant.
- Use practical-significance thresholds tied to business impact.
See:
references/experiment-playbook.md
references/statistics-reference.md
Tooling
scripts/sample_size_calculator.py
Computes required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power
Example:
python3 scripts/sample_size_calculator.py \
--baseline-rate 0.10 \
--mde 0.015 \
--mde-type absolute \
--alpha 0.05 \
--power 0.8
Data source: ClawHub ↗ · Chinese localization: 龙虾技能库