microsoft/integration-testing-discipline

Name: integration-testing-discipline
Author: microsoft

skills/integration-testing-discipline/SKILL.md

npx skillsauth add microsoft/amplifier-bundle-superpowers integration-testing-discipline

Integration Testing Discipline

The Core Principle

OBSERVE FIRST. FIX IN BATCHES. NEVER FIX DURING ACTIVE E2E RUNS.

When running integration tests or E2E validation, your job is to CAPTURE ALL failures first, then fix them systematically. Making code changes during a running test invalidates that test run.

The Four Principles

Principle 1: Don't Fix During Observation Runs

DO: Let the E2E run complete (or fail with a real error), capture ALL failure points, then fix everything as a coordinated batch.

DON'T: See one failure, fix it immediately, and continue the same E2E run.

Why: Code changes during E2E runs invalidate the running test. You lose the ability to trust that run's results.

Example:

❌ WRONG:
1. E2E run finds validation error
2. Fix validation immediately  
3. Continue same E2E run
4. Find auth error
5. Fix auth immediately
6. Continue same E2E run
→ Result: Can't trust this run's success

✅ RIGHT:
1. E2E run finds validation error — RECORD IT
2. E2E run finds auth error — RECORD IT  
3. E2E run completes — STOP OBSERVING
4. Fix validation AND auth as a batch
5. Start fresh E2E run to validate fixes
→ Result: Clean validation of coordinated fixes
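The RIGHT sequence above amounts to keeping an append-only failure log while the run is active and handing out fix work only after the run has ended. A minimal Python sketch of that idea follows. The `ObservationRun` class, its method names, and the failure labels are hypothetical illustrations, not part of this skill or of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class ObservationRun:
    """Hypothetical helper: collects failures while an E2E run is active."""
    active: bool = True
    failures: list[str] = field(default_factory=list)

    def record(self, failure: str) -> None:
        # During observation we only write down what failed.
        self.failures.append(failure)

    def finish(self) -> None:
        # Call once the E2E run has completed or stopped on a real error.
        self.active = False

    def batch_to_fix(self) -> list[str]:
        # Refuse to hand out fix work while the run is still going.
        if self.active:
            raise RuntimeError("run still active: keep observing, do not fix")
        return list(self.failures)


run = ObservationRun()
run.record("validation error")
run.record("auth error")
run.finish()
print(run.batch_to_fix())  # ['validation error', 'auth error']
```

Calling `batch_to_fix()` before `finish()` raises, which is the point of the sketch: the fix list only becomes available once observation is over.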

Principle 2: Long-Running Processes Are Normal

DO: Check for actual error signals: non-zero exit codes, error messages in logs, process death, or a process that has stopped making any observable progress.

DON'T: Declare "stuck" or "failed" based on wall clock time alone.

Expected Durations:

Container setup: 60-90 seconds
Simple spec (1 endpoint): ~13 minutes
Medium spec (4 CRUD endpoints): ~25 minutes
Complex spec (8+ endpoints): ~40 minutes
Each convergence iteration: 5-8 minutes

Example:

❌ WRONG:
"Process has been running 20 minutes with no output — it's stuck"

✅ RIGHT:
"Process has been running 20 minutes. Checking:
- Exit code: none yet (process still running)
- Error logs: none
- New files: tracker.json updated 30s ago
- Container: process alive, making progress
→ Status: WORKING, not stuck"
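The check in the RIGHT example can be sketched as a small Python function that looks only at error signals and recent activity, and deliberately takes no total elapsed time as input. The function name `classify`, the status labels, and the 600-second `stall_threshold` default are assumptions made for illustration. An `exit_code` of `None` means the process is still running, which matches what `subprocess.Popen.poll()` returns for a live process.

```python
from typing import Optional


def classify(exit_code: Optional[int], error_lines: int,
             seconds_since_progress: float,
             stall_threshold: float = 600.0) -> str:
    """Classify a long-running job from error signals, not wall clock time.

    stall_threshold is an assumed value, not taken from the skill.
    """
    if exit_code is not None:
        # The process has ended; only now does an exit code exist.
        return "FINISHED" if exit_code == 0 else "FAILED"
    if error_lines > 0:
        return "ERROR_SIGNAL"  # record it; the run is still going
    if seconds_since_progress > stall_threshold:
        return "INVESTIGATE"  # no recent activity: inspect before judging
    return "WORKING"


# Still running, no errors logged, tracker.json touched 30 seconds ago.
print(classify(exit_code=None, error_lines=0, seconds_since_progress=30))  # WORKING
```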

Principle 3: Check Container State Directly

DO: Use docker exec to check the container's internal state directly when monitoring seems inconsistent.

DON'T: Trust API status or monitor reports when they contradict expected behavior.

Why: Monitor APIs may lag behind container reality. Always verify directly.

Example:

❌ WRONG:
"API shows no progress for 10 minutes — declaring failure"

✅ RIGHT:
"API shows no progress. Checking container directly:
$ docker exec container-name ls -la /workspace/
$ docker exec container-name cat tracker.json  
$ docker exec container-name ps aux
→ Found: New files created 2 min ago, process active
→ Status: API lag, not failure"
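The three `docker exec` checks above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names `docker_exec_cmd` and `inspect_container` are hypothetical, `container-name` is the placeholder from the example, and the paths are copied from the example rather than from any real container layout. Only the standard `subprocess` module and the `docker exec` command are used.

```python
import subprocess


def docker_exec_cmd(container: str, *args: str) -> list[str]:
    """Build the argv for `docker exec <container> <args...>`."""
    return ["docker", "exec", container, *args]


def inspect_container(container: str) -> dict[str, str]:
    """Run the three checks from the example and collect their output."""
    checks = {
        "files": ("ls", "-la", "/workspace/"),
        "tracker": ("cat", "tracker.json"),
        "processes": ("ps", "aux"),
    }
    report = {}
    for name, args in checks.items():
        result = subprocess.run(docker_exec_cmd(container, *args),
                                capture_output=True, text=True, timeout=30)
        # Keep stderr when a check fails so the failure itself is recorded.
        report[name] = result.stdout if result.returncode == 0 else result.stderr
    return report


print(docker_exec_cmd("container-name", "ps", "aux"))
# ['docker', 'exec', 'container-name', 'ps', 'aux']
```

`inspect_container("container-name")` needs a running Docker daemon and an existing container, so it is not called here.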

Principle 4: One Run, All Errors

DO: Let each run discover its full set of failures, then address them in dependency order.

DON'T: Fix errors one-by-one across multiple runs when you could batch them.

Why: Many failures have dependency relationships. Fixing in random order creates thrashing.

Example:

❌ WRONG:
Run 1: Find validation error → fix → new run
Run 2: Find auth error → fix → new run  
Run 3: Find storage error → fix → new run
→ Result: 3 full E2E cycles for related issues

✅ RIGHT:
Run 1: Find validation + auth + storage errors → RECORD ALL
Fix: Address validation first (dependency), then auth, then storage
Run 2: Validate all fixes together
→ Result: 1 observation run + 1 validation run
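"Dependency order" can be made concrete with a topological sort over the recorded failures. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The dependency edges are assumed for illustration: the example only states that validation is fixed first, then auth, then storage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each failure maps to the failures that must be fixed before it.
# These edges are assumptions chosen to match the example's order.
depends_on = {
    "validation": set(),
    "auth": {"validation"},
    "storage": {"auth"},
}

# static_order() yields every node after all of its prerequisites.
fix_order = list(TopologicalSorter(depends_on).static_order())
print(fix_order)  # ['validation', 'auth', 'storage']
```

If two recorded failures depended on each other, `static_order()` would raise `graphlib.CycleError`, which is a useful signal that they need to be fixed together as one change.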

When to Apply

Use this discipline for ANY integration testing scenario:

E2E recipe validation
Multi-service integration tests
Container-based testing workflows
Cross-repo change validation
Production deployment validation

Use ESPECIALLY when under time pressure — the temptation to "quick fix" during observation runs is highest when deadlines loom, but disciplined observation is fastest.

Red Flags — Return to Observation Mode

If you catch yourself thinking:

"Just fix this one thing and continue the run"
"Quick patch while the E2E is running"
"I'll fix this now and the rest later"
"This failure is blocking — fix immediately"
"We're so close, just one more fix"

ALL of these mean: STOP fixing. Return to pure observation mode.

Impact

Disciplined approach: 1 observation run + 1 validation run = ~50 minutes total
Fix-during-run approach: 6+ interrupted runs = 3+ hours total
Success rate: 95% (disciplined) vs 60% (fix-during-run)
Stress level: manageable (disciplined) vs high (fix-during-run)

4 principles for E2E testing discipline — observe first, fix in batches, expect long durations, check container state directly. NO FIXES DURING OBSERVATION RUNS.

3 stars

testing

Updated Apr 20, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add microsoft/amplifier-bundle-superpowers integration-testing-discipline

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 20, 2026, 6:18 AM (9.0s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/microsoft/amplifier-bundle-superpowers.git

# Copy into Claude Code skills folder (global)
cp -r amplifier-bundle-superpowers/skills/integration-testing-discipline ~/.claude/skills/

Repository: microsoft/amplifier-bundle-superpowers (3 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT