skills/integration-testing-discipline/SKILL.md
4 principles for E2E testing discipline — observe first, fix in batches, expect long durations, check container state directly. NO FIXES DURING OBSERVATION RUNS.
OBSERVE FIRST. FIX IN BATCHES. NEVER FIX DURING ACTIVE E2E RUNS.
When running integration tests or E2E validation, your job is to CAPTURE ALL failures first, then fix them systematically. Making code changes during a running test invalidates that test run.
DO: Let the E2E run complete (or fail with a real error), capture ALL failure points, then fix everything as a coordinated batch.
DON'T: See one failure, fix it immediately, and continue the same E2E run.
Why: Code changes during E2E runs invalidate the running test. You lose the ability to trust that run's results.
Example:
❌ WRONG:
1. E2E run finds validation error
2. Fix validation immediately
3. Continue same E2E run
4. Find auth error
5. Fix auth immediately
6. Continue same E2E run
→ Result: Can't trust this run's success
✅ RIGHT:
1. E2E run finds validation error — RECORD IT
2. E2E run finds auth error — RECORD IT
3. E2E run completes — STOP OBSERVING
4. Fix validation AND auth as a batch
5. Start fresh E2E run to validate fixes
→ Result: Clean validation of coordinated fixes
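The "RECORD IT" steps above can be sketched as a small shell helper that appends each observed failure to a log instead of touching code. This is a minimal sketch, not part of the skill itself: the names `record_failure`, `list_failures`, and the `FAILURE_LOG` file are hypothetical and chosen only for illustration.

```shell
# Hypothetical log location; override FAILURE_LOG to put it elsewhere.
FAILURE_LOG="${FAILURE_LOG:-e2e-failures.log}"

# Append one observed failure as a tab-separated line: timestamp, label, detail.
# $1 = short label (e.g. "validation"), $2 = free-form detail
record_failure() {
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$FAILURE_LOG"
}

# Print the distinct failure labels recorded so far, one per line.
list_failures() {
    [ -f "$FAILURE_LOG" ] || return 0
    cut -f2 "$FAILURE_LOG" | sort -u
}
```

During the observation run you would call `record_failure validation "..."` and `record_failure auth "..."` as failures appear, then read `list_failures` once the run has finished to plan the batch of fixes.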
DO: Check for actual error signals: non-zero exit codes, error messages in logs, process death, hung processes.
DON'T: Declare "stuck" or "failed" based on wall clock time alone.
Expected Durations:
Example:
❌ WRONG:
"Process has been running 20 minutes with no output — it's stuck"
✅ RIGHT:
"Process has been running 20 minutes. Checking:
- Exit code: none yet (process still running)
- Error logs: none
- New files: tracker.json updated 30s ago
- Container: process alive, making progress
→ Status: WORKING, not stuck"
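The checks in the example above can be approximated with a short shell function that looks at real signals (is the process alive, did a progress file change recently) rather than elapsed time. This is a sketch under assumptions: `check_progress` is a hypothetical name, the PID and progress file are supplied by the caller, and `find -mmin` is available in GNU and BSD `find` but is not part of POSIX.

```shell
# Report whether a process is alive and whether its progress file changed recently.
# $1 = PID of the long-running process
# $2 = file the process is expected to update (e.g. tracker.json)
# $3 = maximum age in minutes before the file counts as stale (default 5)
check_progress() {
    pid="$1"
    progress_file="$2"
    max_age_min="${3:-5}"

    # kill -0 sends no signal; it only tests whether the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "DEAD: process $pid is not running"
        return 1
    fi

    # find -mmin -N matches files modified less than N minutes ago.
    if [ -n "$(find "$progress_file" -mmin "-$max_age_min" 2>/dev/null)" ]; then
        echo "WORKING: $progress_file updated within $max_age_min min"
    else
        echo "QUIET: alive, but $progress_file not updated in $max_age_min min; check logs"
    fi
}
```

A `QUIET` result is a prompt to look at logs and container state, not a verdict that the run is stuck.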
DO: Use docker exec to check the container's internal state directly when monitoring seems inconsistent.
DON'T: Trust API status or monitor reports when they contradict expected behavior.
Why: Monitor APIs may lag behind container reality. Always verify directly.
Example:
❌ WRONG:
"API shows no progress for 10 minutes — declaring failure"
✅ RIGHT:
"API shows no progress. Checking container directly:
$ docker exec container-name ls -la /workspace/
$ docker exec container-name cat tracker.json
$ docker exec container-name ps aux
→ Found: New files created 2 min ago, process active
→ Status: API lag, not failure"
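The three commands in the example can be bundled into one function so the direct check is a single call. This is only a sketch: `inspect_container` is a hypothetical name, the `/workspace` directory and `tracker.json` path are taken from the example above and may differ in your setup, and `ps` may be missing from minimal images.

```shell
# Run the direct checks from the example against one container.
# $1 = container name or ID
# $2 = workspace directory inside the container (assumed default: /workspace)
# $3 = tracker file path inside the container (assumed default: tracker.json)
inspect_container() {
    container="$1"
    workspace="${2:-/workspace}"
    tracker="${3:-tracker.json}"

    echo "== files in $workspace =="
    docker exec "$container" ls -la "$workspace/"

    echo "== $tracker =="
    docker exec "$container" cat "$tracker"

    echo "== processes =="
    docker exec "$container" ps aux
}
```

Usage would look like `inspect_container container-name`. Compare the file timestamps and process list with what the API reports before concluding anything.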
DO: Let each run discover its full set of failures, then address them in dependency order.
DON'T: Fix errors one-by-one across multiple runs when you could batch them.
Why: Many failures have dependency relationships. Fixing in random order creates thrashing.
Example:
❌ WRONG:
Run 1: Find validation error → fix → new run
Run 2: Find auth error → fix → new run
Run 3: Find storage error → fix → new run
→ Result: 3 full E2E cycles for related issues
✅ RIGHT:
Run 1: Find validation + auth + storage errors → RECORD ALL
Fix: Address validation first (dependency), then auth, then storage
Run 2: Validate all fixes together
→ Result: 1 observation run + 1 validation run
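One way to turn recorded failures into a fix order is the standard `tsort` utility, which topologically sorts "A before B" pairs. The sketch below is illustrative only: `fix_order` is a hypothetical wrapper, and the dependency pairs are an assumption modeled on the example (validation before auth, auth before storage). In a real run you would write the pairs from what you observed.

```shell
# Read "A B" pairs on stdin, meaning "fix A before B",
# and print the labels in an order that respects every pair.
fix_order() {
    tsort
}

# Assumed dependencies, mirroring the example above.
printf '%s\n' 'validation auth' 'auth storage' | fix_order
# prints:
# validation
# auth
# storage
```

If the pairs contain a cycle, `tsort` reports it on stderr, which is itself a useful signal that the failures cannot be fixed strictly one after another.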
Use this discipline for ANY integration testing scenario:
Use ESPECIALLY when under time pressure: the temptation to "quick fix" during observation runs is highest when deadlines loom, but disciplined observation is the fastest route to a result you can trust.
If you catch yourself thinking:
ALL of these mean: STOP fixing. Return to pure observation mode.