Gen Test Plan: Automatically Generate End-to-End Test Plans

Name: Gen Test Plan (Automatically Generate End-to-End Test Plans)
Author: Kevin Anderson

v1.0.0

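Analyzes the repository, detects the tech stack, traces changes to user-facing entry points, and generates an executable end-to-end (E2E) YAML test plan that mimics a real human QA testing workflow.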

by @anderskev (Kevin Anderson)·MIT-0

Tags: testing tools, code generation, automation, developer tools, data analysis

License

MIT-0

Last updated

2026/4/11

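Security scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

This skill is meant to generate end-to-end test plans, but its runtime instructions assume access to many local tools, credentials, and services that it does not declare, which may lead to inconsistencies. Understand this before use.

Assessment recommendations

Although this skill appears to be a legitimate E2E test plan generator, it assumes it can read the repository, discover ports, build and run binaries, access databases, and use tools such as psql and agent-browser, without declaring any required binaries or environment variables. Before use: 1) ask the author to list the required CLIs and environment variables in the manifest; 2) inspect generated test plans in an isolated environment; 3) make sure the required credentials are provided; 4) if you need stronger assurance, ask the author to add a minimal `requires` section. ...

Detailed analysis

⚠ Purpose and capability

The name and description match the intent of SKILL.md, but the instructions require interacting with undeclared build artifacts, databases, local servers, and the `agent-browser` CLI, so the mapping between capabilities and requirements is incomplete.

⚠ Instruction scope

SKILL.md contains many concrete runtime operations (git commands, grep, starting servers, curl, psql, agent-browser, docker-compose, building binaries, querying DATABASE_URL, and an example that references ANTHROPIC_API_KEY). These operations are within the scope of E2E testing, but they read and act on undeclared local system state and credentials and may have real side effects when executed. The instructions make no attempt to restrict or sandbox these operations.

✓ Install mechanism

Instruction-only skill with no install spec and no code files, so supply-chain and install risk is low. The install step does not download or write anything.

⚠ Credential requirements

The documentation explicitly references environment values and external tools (for example DATABASE_URL, ANTHROPIC_API_KEY, psql, agent-browser) and expects the agent to interact with databases and services, yet the manifest declares zero required environment variables or binaries. This mismatch may lead to sensitive credentials being used unexpectedly at runtime, or to missing prerequisites.

✓ Persistence and permissions

always:false and disable-model-invocation:true reduce autonomy risk (the model will not invoke this skill on its own). The skill does not request persistent agent-wide permissions and does not modify other skills. No elevated "always" permission is requested.

Security is layered; review the code before running.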

License terms (MIT-0): free to use, modify, and redistribute; no attribution required.

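Runtime dependencies

No special dependencies

Version

latest v1.0.0 (2026/4/11)

Initial release of Gen Test Plan: generates E2E YAML test plans by analyzing code and repository changes, focusing on real user behavior. Features include analyzing repository changes, detecting the tech stack, tracing changes to user-facing entry points, generating prioritized test cases, and producing an executable YAML test plan.

Install command

Official: npx clawhub@latest install gen-test-plan

Mirror: npx clawhub@latest install gen-test-plan --registry https://cn.clawhub-mirror.com

Skill documentation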

Analyze the repository's tech stack, branch changes vs default, and generate an executable YAML test plan focused on user-facing impact.

This is an E2E test plan — not an automated test wrapper. The generated plan will be executed by an autonomous agent acting exactly as a human QA tester would: launching real binaries, hitting real endpoints, interacting with real databases, and verifying real observable behavior.

Critical Rule: No Automated Test Duplication

NEVER generate test steps that re-run the project's existing automated test suite. This means:

No cargo test, pytest, npm test, go test, mix test, or equivalent commands as test steps
No wrapping unit/integration test modules in a test case
No "run the tests and check they pass" — that's CI's job, not QA's

If you find yourself writing a test step that invokes the project's test runner, stop and rethink. Ask: "What would a human tester do to verify this feature works?" The answer is never "run the unit tests."

What E2E test steps look like:

Build the binary and run it with real arguments, check stdout/stderr/exit code
Start a server and hit it with curl
Run a CLI command that writes to a real database, then query the database to verify
Launch the TUI and verify it renders (via screenshot or process lifecycle)
Chain multiple commands that exercise a full user workflow end-to-end
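
As a rough illustration of the first bullet above, the sketch below runs a command as a real process and records what a tester would observe: exit code, stdout, and stderr. The names `run_step` and `StepResult` are hypothetical, and the Python interpreter stands in for a project's built binary.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    exit_code: int
    stdout: str
    stderr: str


def run_step(argv: list[str], timeout: float = 30.0) -> StepResult:
    """Run one command as a real process and capture its observable output."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return StepResult(proc.returncode, proc.stdout, proc.stderr)


# Stand-in for a built binary (assumption): the Python interpreter itself.
result = run_step([sys.executable, "-c", "print('mytool 1.0.0')"])
assert result.exit_code == 0
assert "1.0.0" in result.stdout
```

A real plan would pass the path of the built artifact and its actual arguments instead of the stand-in command.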

Arguments

--base : Base branch to diff against (default: main)
Path: Target directory (default: current working directory)
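
If these two arguments were handled by a small wrapper script, the parsing might look like the sketch below. The program name `gen-test-plan` and the function `parse_args` are assumptions for illustration; the skill itself does not specify how arguments are parsed.

```python
import argparse
from pathlib import Path


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the --base option and the optional target path."""
    parser = argparse.ArgumentParser(prog="gen-test-plan")  # hypothetical name
    parser.add_argument("--base", default="main", help="base branch to diff against")
    parser.add_argument(
        "path", nargs="?", type=Path, default=Path("."), help="target directory"
    )
    return parser.parse_args(argv)
```

With no arguments this yields `base="main"` and the current directory, matching the defaults listed above.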

Step 1: Gather Repository Context

# Get current branch
git rev-parse --abbrev-ref HEAD
# Get default base branch (try origin/main, then origin/master)
git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master"
# Get changed files vs base
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
# Get commit messages for context
git log --oneline $(git merge-base HEAD origin/main)..HEAD

Capture:

current_branch: Branch name
base_branch: Default branch to compare against
changed_files: List of modified files
commit_messages: What the PR is about
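
The four captured values can be collected programmatically. The following is a minimal sketch that assumes git is on PATH and that the base branch exists on the `origin` remote; `gather_context` and the injectable `run` parameter are hypothetical names, the latter added so the function can be exercised without a repository.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def git(args: list[str]) -> str:
    """Run a git command and return its trimmed stdout (raises on failure)."""
    completed = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


def gather_context(base: str = "main", run: Runner = git) -> dict:
    """Collect branch, changed files, and commit messages relative to origin/<base>."""
    merge_base = run(["merge-base", "HEAD", f"origin/{base}"])
    rev_range = f"{merge_base}..HEAD"
    return {
        "current_branch": run(["rev-parse", "--abbrev-ref", "HEAD"]),
        "base_branch": base,
        "changed_files": run(["diff", "--name-only", rev_range]).splitlines(),
        "commit_messages": run(["log", "--oneline", rev_range]).splitlines(),
    }
```

Computing the merge base once and reusing it keeps the file list and the commit list consistent with each other.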

Step 2: Detect Tech Stack

See references/stack-discovery.md for stack detection commands, entrypoint discovery, port discovery, and trace rules.

Step 3: Discover User-Facing Entry Points

A "user-facing entry point" is anything a human interacts with: CLI subcommands, HTTP endpoints, UI routes, TUI screens, gRPC services, database migrations, or configuration files that affect runtime behavior.

CLI Applications (Rust/clap, Python/argparse/click, Go/cobra)

# Rust (clap) — look for Subcommand derives and command enums
grep -rn "Subcommand\|#\[command\]" --include="*.rs" | head -20
# Python (click/typer/argparse)
grep -rn "@click.command\|@app.command\|add_parser\|add_subparser" --include="*.py" | head -20
# Go (cobra)
grep -rn "cobra.Command\|AddCommand" --include="*.go" | head -20

Build a map of:

CLI subcommands: command name + description + file:line

Required arguments and flags per subcommand

Environment variables the binary reads (grep for env, std::env::var, os.Getenv, os.environ)

HTTP/API Services

Python (FastAPI/Flask):

grep -rn "@app\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
grep -rn "@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20

Node.js (Express/Fastify):

grep -rn "app\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
grep -rn "router\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20

Rust (axum/actix/rocket):

grep -rn "Router::new\|\.route(\|#\[get\]\|#\[post\]\|HttpServer" --include="*.rs" | head -20

Go (net/http, gin, chi):

grep -rn "http.HandleFunc\|r.GET\|r.POST\|router.Get\|router.Post" --include="*.go" | head -20

Elixir (Phoenix):

grep -rn "get \"/\|post \"/\|pipe_through\|live \"/\|scope \"/\"" --include="*.ex" | head -20
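
The grep output above still has to be turned into a method, path, and file:line record for each endpoint. The sketch below does this for the FastAPI/Flask decorator style only; `find_endpoints`, `Endpoint`, and the regular expression are illustrative assumptions and will miss routes registered in other ways.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Matches decorators such as @app.get("/users") or @router.post('/items').
ROUTE_RE = re.compile(
    r"@(?:app|router)\.(get|post|put|delete|patch)\(\s*[\"']([^\"']+)[\"']"
)


class Endpoint(NamedTuple):
    method: str
    path: str
    location: str  # "file:line"


def find_endpoints(root: Path) -> list[Endpoint]:
    """Scan *.py files under root for route decorators."""
    endpoints = []
    for source in sorted(root.rglob("*.py")):
        text = source.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = ROUTE_RE.search(line)
            if match:
                endpoints.append(
                    Endpoint(match.group(1).upper(), match.group(2), f"{source}:{lineno}")
                )
    return endpoints
```

Keeping the file:line location on each record makes it possible to cite the exact definition in a test's context field later.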

Browser UI Routes

grep -rn "createBrowserRouter\|

Database and Migrations

# SQL migrations
ls migrations/ db/migrate/ priv/repo/migrations/ 2>/dev/null
# Schema files
ls schema.sql schema.prisma 2>/dev/null

Build a consolidated map of:


CLI subcommands: name + args + file:line
API endpoints: method + path + file:line
UI routes: path + component + file:line
Database migrations: filename + what they create/alter
Configuration: env vars and config files that affect behavior

Step 4: Trace Changes to Entry Points

For each changed file, determine if it affects user-facing functionality:

Direct entry point change: the file contains route definitions
Import chain analysis: find what imports the changed file and trace up to entry points
Architecture-aware tracing: read the project's CLAUDE.md, README, or architecture docs to understand data flow and module relationships, rather than relying solely on grep
Document the trace path in the test context

Import Chain Analysis by Ecosystem

# Rust: use/mod/crate references and workspace deps
grep -rn "use.\|mod " --include="*.rs"
grep -rn "" --include="Cargo.toml"
# Python: from/import
grep -rn "from.\|import." --include="*.py"
# TypeScript/JavaScript: import/require
grep -rn "from.\|require." --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"
# Elixir: alias/import/use
grep -rn "alias.\|import.\|use." --include="*.ex" --include="*.exs"
# Go: package references
grep -rn "\." --include="*.go"

If the ecosystem is not covered above, or grep results are inconclusive, read the project's CLAUDE.md, README, or architecture docs to understand the module graph and trace the data flow from changed files to user-facing entry points.
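
Once import relationships have been collected, tracing a changed file up to entry points amounts to a breadth-first search over the reversed import graph. The sketch below assumes the graph is already available as a dictionary; the function name `trace_to_entry_points` and the module names in the usage example are hypothetical.

```python
from collections import deque


def trace_to_entry_points(
    changed: str,
    imports: dict[str, set[str]],
    entry_points: set[str],
) -> dict[str, list[str]]:
    """Return {entry_point: path from the changed module up to that entry point}.

    imports[m] is the set of modules that module m imports.
    """
    # Invert the graph: for each module, record who imports it.
    imported_by: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            imported_by.setdefault(dep, set()).add(module)

    paths: dict[str, list[str]] = {}
    queue = deque([[changed]])
    seen = {changed}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in entry_points:
            paths[current] = path
        for parent in sorted(imported_by.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return paths


graph = {
    "routes.chat": {"services.llm"},
    "services.llm": {"providers.claude"},
    "routes.settings": {"providers.claude"},
}
traces = trace_to_entry_points(
    "providers.claude", graph, {"routes.chat", "routes.settings"}
)
```

Each returned path can be copied into the test's context field as the documented trace from the changed file to the entry point.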

Classify Affected Entry Points

After identifying all affected entry points, classify each one:

| Category | Description | Examples | Priority |
|---|---|---|---|
| Core functionality | Entry points where the feature does its actual work for the end user | Chat endpoint, API action, data processing pipeline, generation flow | High (test first) |
| Configuration/admin | Entry points where the feature is set up, toggled, or configured | Settings page, admin dashboard, preference toggles, dropdown selections | Lower (test after core) |

Classification rules:

Ask: "If a user wanted to use this feature (not configure it), which entry point would they interact with?" That entry point is core functionality.
A settings page that adds a new dropdown option is configuration; the endpoint that actually uses that option is core functionality.
The same changed file (e.g., a new provider module) may affect both a settings page and a functional endpoint; both must be traced.

Requirement: At least one test must target a core functionality entry point before generating configuration/admin tests. If no core functionality entry point can be identified, explicitly document why and flag this for manual review.

Output:

For each affected entry point, document:

Which changed files affect it
The import/dependency chain
Classification: Core functionality or Configuration/admin
Why this entry point needs testing
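
The classification and the core-first requirement can be captured in a small data structure. This is a sketch under assumed names (`Category`, `AffectedEntryPoint`, `order_for_testing`), not a format the skill prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CORE = "core functionality"
    CONFIG = "configuration/admin"


@dataclass
class AffectedEntryPoint:
    name: str
    category: Category
    changed_files: list[str]
    reason: str


def order_for_testing(
    points: list[AffectedEntryPoint],
) -> tuple[list[AffectedEntryPoint], bool]:
    """Sort core entry points first; the flag is True when manual review is needed."""
    # sorted() is stable, so the original order is kept within each category.
    ordered = sorted(points, key=lambda p: p.category is not Category.CORE)
    needs_manual_review = not any(p.category is Category.CORE for p in points)
    return ordered, needs_manual_review
```

When the flag comes back True, no core functionality entry point was identified, and the reason should be documented as the requirement above describes.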

Step 5: Generate Test Cases

See references/test-case-generation.md for the detailed API/browser templates, prioritization rules, and test-case guidelines.

Step 6: Write YAML Test Plan

Create the test plan file:

mkdir -p docs/testing

Write to docs/testing/test-plan.yaml:

version: 1
metadata:
  branch: 
  base: 
  generated: 
  changes_summary: |
    
setup:
  stack:
    - type: 
      package_manager: 
  prerequisites:
    # Services or infrastructure the tests need running
    - name: 
      check: 
  build:
    # Commands to build the project artifacts (binaries, assets, etc.)
    - 
  services:
    # Long-running processes to start before tests (servers, watchers, etc.)
    # Omit if the project is a CLI tool or library with no server component
    - command: 
      health_check:
        url: http://localhost:/health
        timeout: 30
  env:
    # Environment variables needed by tests (use ${VAR} for secrets)
    DATABASE_URL: "${DATABASE_URL}"
tests:
  # CLI test example — run the built binary with real arguments:
  - id: TC-01
    name: 
    context: |
      
    steps:
      - run: 
      - run: 
    expected: |
      
  # API test example:
  - id: TC-02
    name: 
    context: |
      
    steps:
      - action: curl
        method: GET
        url: http://localhost:/
    expected: |
      
  # Database verification example:
  - id: TC-03
    name: 
    context: |
      
    steps:
      - run: 
      - run: psql "${DATABASE_URL}" -c "SELECT ... FROM ... WHERE ..."
    expected: |
      
  # Browser test example (always use agent-browser CLI commands):
  - id: TC-04
    name: 
    context: |
      
    steps:
      - run: agent-browser open http://localhost:/
      - run: agent-browser snapshot -i
      - run: agent-browser click @
      - run: agent-browser snapshot -i
      - run: agent-browser screenshot evidence/tc-04.png
    expected: |
      
    evidence:
      screenshot: evidence/tc-04.png
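
One plausible reading of the health_check entry under services is to poll the URL until it answers or the timeout elapses. The sketch below illustrates that idea only; it is not the executing agent's actual behavior, and `wait_until_healthy`, `http_ok`, and the polling interval are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    url: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll url until the probe succeeds or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The probe is injectable so the polling logic can be checked without starting a server.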

Step 7: Report Summary

After generating the test plan:

## Test Plan Generated

File: docs/testing/test-plan.yaml
Branch:  → 

Detected Stack

| Component | Type | Port |
|---|---|---|

Tests Generated

| ID | Name | Type | Affected By |
|---|---|---|---|
| TC-01 | | curl/browser | |

Entry Point Coverage

Covered:  entry points with tests
Unchanged:  entry points not affected by this PR

Next Steps

Review the generated test plan at docs/testing/test-plan.yaml
Adjust test values and expectations as needed
Run tests with:

/beagle-testing:run-test-plan
   
Step 8: Verification

Before completing:

# Verify file was created
ls -la docs/testing/test-plan.yaml
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" && echo "Valid YAML"
# Check required fields
grep -E "^version:|^metadata:|^setup:|^tests:" docs/testing/test-plan.yaml

Verification Checklist:

[ ] Test plan file created at docs/testing/test-plan.yaml
[ ] YAML is syntactically valid
[ ] At least one test case generated
[ ] Setup commands match detected stack
[ ] Each test has id, name, steps, and expected fields
[ ] No automated test duplication: Grep every run: and command: step in the plan for test runner invocations (cargo test, pytest, npm test, go test, mix test, jest, vitest, mocha, etc.). If ANY step invokes the project's test runner, the plan fails verification. Remove those steps and replace them with real E2E actions.
[ ] Behavioral coverage: At least one test exercises the primary behavioral change described in changes_summary. Re-read the changes_summary and commit messages — if they describe a capability (e.g., "adds Claude Code as a new LLM provider") but no test invokes that capability (e.g., sends a message through the provider), the plan fails verification. Add the missing core functionality test before completing.
[ ] No config-only plans: If all tests target configuration/admin entry points and zero tests target core functionality entry points, the plan is incomplete. Go back to Step 4, identify the core functionality entry points, and add tests for them.
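
Several checklist items are mechanical and could be checked in code once the plan has been parsed into a dictionary (for example with yaml.safe_load, as in the validation command above). The following sketch covers required fields and test-runner detection only; `verify_plan` and the assumption that every step is a mapping are illustrative.

```python
import re

# Test runners named in the checklist; extend the list as needed.
TEST_RUNNER_RE = re.compile(
    r"\b(cargo test|pytest|npm test|go test|mix test|jest|vitest|mocha)\b"
)
REQUIRED_TEST_FIELDS = ("id", "name", "steps", "expected")


def verify_plan(plan: dict) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    for key in ("version", "metadata", "setup", "tests"):
        if key not in plan:
            problems.append(f"missing top-level field: {key}")
    tests = plan.get("tests") or []
    if not tests:
        problems.append("no test cases generated")
    for test in tests:
        label = test.get("id", "<no id>")
        for field in REQUIRED_TEST_FIELDS:
            if not test.get(field):
                problems.append(f"{label}: missing {field}")
        for step in test.get("steps") or []:
            command = step.get("run") or step.get("command") or ""
            if TEST_RUNNER_RE.search(command):
                problems.append(f"{label}: step invokes a test runner: {command}")
    return problems
```

The behavioral-coverage and config-only items still need judgment about what the change does, so they are left out of this check.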

Rules

E2E only: every test step must exercise the real built artifact (binary, server, UI) as a human would. Never wrap automated test suites.
Always create docs/testing/ directory if it doesn't exist
Generate at least one test per affected entry point
Include context explaining why each test matters (trace from changes)
Use natural language for the expected field (the agent will interpret it)
CLI projects: Test steps should invoke the actual binary with real arguments and verify stdout, stderr, exit codes, and side effects (files created, database rows written, processes spawned)
Server projects: Start the server in setup, test via curl/agent-browser
Library-only projects with no binary or server: If the change is purely internal library code with no user-facing entry point (no CLI, no server, no UI), state this explicitly and generate tests that exercise the library through its public API via a small driver script — not by running the test suite
Default to conservative port detection (8000 for API, 5173/3000 for frontend)
Browser automation steps MUST use agent-browser CLI commands (e.g., agent-browser open, agent-browser snapshot -i, agent-browser click @ref) — never use abstract action syntax
Always run agent-browser snapshot -i before interacting with elements and after navigation or DOM changes
Use agent-browser screenshot  to capture evidence for browser tests
Use ${ENV_VAR} syntax for secrets, never hardcode credentials
If no user-facing changes detected, explain why and suggest manual verification
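
To show how the ${ENV_VAR} rule keeps credentials out of the plan file, the sketch below resolves placeholders only at execution time and fails loudly when a variable is unset. `expand_secrets` is a hypothetical helper; the skill does not state how the executing agent performs substitution.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_secrets(command: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} placeholders from the environment.

    Raises KeyError when a referenced variable is unset. Note that
    string.Template also raises ValueError on a stray "$", so commands
    containing shell constructs such as $(...) need separate handling.
    """
    values = os.environ if env is None else env
    return Template(command).substitute(values)


resolved = expand_secrets(
    'psql "${DATABASE_URL}" -c "SELECT 1"',
    {"DATABASE_URL": "postgres://localhost/app"},  # placeholder value
)
```

Raising on a missing variable, rather than substituting an empty string, surfaces an absent prerequisite before the step runs.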


Data source: ClawHub · Chinese localization: 龙虾技能库



  
    