.kiro/skills/qe-agentic-quality-engineering/SKILL.md
AI agents as force multipliers for quality work. Core skill for all 19 QE agents using PACT principles.
npx skillsauth add proffesor-for-testing/agentic-qe qe-agentic-quality-engineering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
<default_to_action> When implementing agentic QE or coordinating agents:
- Task tool with agent type
- aqe/learning/* namespace

Quick Agent Selection:
- qe-test-generator
- qe-coverage-analyzer
- qe-quality-gate
- qe-security-scanner
- qe-performance-tester
- qe-fleet-commander

Critical Success Factors:
| Principle | Agent Behavior | Human Role |
|-----------|---------------|------------|
| Proactive | Analyze pre-merge, predict risk | Set guardrails |
| Autonomous | Execute tests, fix flaky tests | Review critical |
| Collaborative | Multi-agent coordination | Provide context |
| Targeted | Risk-based prioritization | Define risk areas |

| Category | Agents | Primary Use |
|----------|--------|-------------|
| Core Testing (5) | test-generator, test-executor, coverage-analyzer, quality-gate, quality-analyzer | Daily testing |
| Performance/Security (2) | performance-tester, security-scanner | Non-functional |
| Strategic (3) | requirements-validator, production-intelligence, fleet-commander | Planning |
| Advanced (4) | regression-risk-analyzer, test-data-architect, api-contract-validator, flaky-test-hunter | Specialized |
| Visual/Chaos (2) | visual-tester, chaos-engineer | Edge cases |
| Deployment (1) | deployment-readiness | Release |
| Analysis (1) | code-complexity | Maintainability |
Hierarchical: fleet-commander → [generators] → [executors] → quality-gate
Mesh: test-gen ↔ coverage ↔ quality (peer decisions)
Sequential: risk-analyzer → test-gen → executor → coverage → gate
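The Sequential pattern above can be read as a chain of handoffs between adjacent stages. The sketch below is only an illustration of that reading: the stage names are copied from the Sequential line, while `StageHandoff`, `toHandoffs`, and `describeHandoffs` are hypothetical helpers that are not part of the `aqe` tooling.

```typescript
// One directed handoff between two adjacent stages (hypothetical type).
type StageHandoff = { from: string; to: string };

// Stage names copied from the Sequential line above.
const SEQUENTIAL_STAGES: readonly string[] = [
  "risk-analyzer",
  "test-gen",
  "executor",
  "coverage",
  "gate",
];

// Turn an ordered list of stages into explicit handoff edges.
function toHandoffs(stages: readonly string[]): StageHandoff[] {
  const handoffs: StageHandoff[] = [];
  let previous: string | undefined;
  for (const stage of stages) {
    if (previous !== undefined) {
      handoffs.push({ from: previous, to: stage });
    }
    previous = stage;
  }
  return handoffs;
}

// Render each handoff in the arrow notation used by the topology lines.
function describeHandoffs(handoffs: StageHandoff[]): string[] {
  return handoffs.map((h) => `${h.from} → ${h.to}`);
}
```

A five-stage chain yields four handoffs, so each agent only needs to know its immediate successor.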
- ✅ 10x deployment frequency with same/better quality
- ✅ Coverage gaps detected in real-time
- ✅ Bugs caught pre-production
- ❌ Agents acting without human oversight on critical decisions
- ❌ Deploying all 19 agents at once (start with 1-2)
| Stage | Approach | Limitation |
|-------|----------|------------|
| Traditional | Manual everything | Human bottleneck |
| Automation | Scripts + fixed scenarios | Needs orchestration |
| Agentic | AI agents + human judgment | Requires trust-building |
Core Premise: Agents amplify human expertise for 10x scale.
1. Intelligent Test Generation
// Agent analyzes code change, generates targeted tests
const tests = await qeTestGenerator.generate(prDiff);
// → Happy path, edge cases, error handling tests
2. Pattern Detection - Scan logs, find anomalies, correlate errors
3. Adaptive Strategy - Adjust test focus based on risk signals
4. Root Cause Analysis - Link failures to code changes, suggest fixes
aqe/test-plan/* - Test planning decisions
aqe/coverage/* - Coverage analysis results
aqe/quality/* - Quality metrics and gates
aqe/learning/* - Patterns and Q-values
aqe/coordination/* - Cross-agent state
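To keep keys consistent with the namespaces listed above, an agent wrapper could build and classify keys with a small helper. This is a minimal sketch under stated assumptions: the five namespace strings come from the list above, but `buildKey` and `namespaceOf` are hypothetical names and do not belong to the `aqe` CLI.

```typescript
// Namespaces copied from the list above.
const AQE_NAMESPACES = [
  "aqe/test-plan",
  "aqe/coverage",
  "aqe/quality",
  "aqe/learning",
  "aqe/coordination",
] as const;

type AqeNamespace = (typeof AQE_NAMESPACES)[number];

// Hypothetical helper: join a known namespace and one or more path parts.
function buildKey(namespace: AqeNamespace, ...parts: string[]): string {
  if (parts.length === 0 || parts.some((part) => part.trim() === "")) {
    throw new Error("a key needs at least one non-empty part after the namespace");
  }
  return [namespace, ...parts].join("/");
}

// Hypothetical helper: find which known namespace a key belongs to, if any.
function namespaceOf(key: string): AqeNamespace | undefined {
  return AQE_NAMESPACES.find((ns) => key.startsWith(`${ns}/`));
}
```

For instance, `buildKey("aqe/test-plan", "pr-123")` produces the same key shape as the store command shown later in this skill.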
CRITICAL: Always use aqe memory store with persist: true for learnings.
1. Store data to persistent memory:
# Store test plan decisions (persisted to .agentic-qe/memory.db)
aqe memory store \
--key "aqe/test-plan/pr-123" \
--namespace "aqe/test-plan" \
--value '{"prNumber":123,"riskLevel":"medium","requiredCoverage":85,"testTypes":["unit","integration"]}' \
--ttl 604800 \
--json
2. Retrieve prior learnings before task:
# Query patterns before starting test generation
aqe memory search \
--pattern "aqe/learning/patterns/test-generation/*" \
--namespace "aqe/learning" \
--json
3. Store coverage analysis results:
aqe memory store \
--key "aqe/coverage/auth-module" \
--namespace "aqe/coverage" \
--value '{"moduleId":"auth-module","currentCoverage":78,"gaps":["error-handling","edge-cases"],"priority":"high"}' \
--ttl 1209600 \
--json
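The `--ttl` values in the commands above are in seconds: 604800 is 7 days and 1209600 is 14 days. The sketch below shows one way a script might assemble the same argument list; the flags are copied from the commands above, while `MemoryStoreRequest` and `buildStoreArgs` are hypothetical names, and the function only builds arguments without invoking the CLI.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Hypothetical request shape mirroring the flags used in the commands above.
interface MemoryStoreRequest {
  key: string;
  namespace: string;
  value: unknown;
  ttlDays?: number;
}

// Build the argument list for `aqe memory store`; nothing is executed here.
function buildStoreArgs(request: MemoryStoreRequest): string[] {
  const args: string[] = [
    "memory",
    "store",
    "--key",
    request.key,
    "--namespace",
    request.namespace,
    "--value",
    JSON.stringify(request.value),
  ];
  if (request.ttlDays !== undefined) {
    // The TTL is passed in seconds, as in the examples above.
    args.push("--ttl", String(request.ttlDays * SECONDS_PER_DAY));
  }
  args.push("--json");
  return args;
}
```

Passing the arguments as an array, rather than concatenating one shell string, avoids quoting problems with the JSON value.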
For coordinated multi-agent tasks, use the STATUS → PROGRESS → COMPLETE pattern:
# PHASE 1: STATUS - Task starting
aqe memory store \
--key "aqe/coordination/task-123/status" \
--namespace "aqe/coordination" \
--value '{"status":"running","agent":"qe-test-generator"}' \
--json
# PHASE 2: PROGRESS - Intermediate updates
aqe memory store \
--key "aqe/coordination/task-123/progress" \
--namespace "aqe/coordination" \
--value '{"progress":50,"action":"generating-unit-tests","testsGenerated":25}' \
--json
# PHASE 3: COMPLETE - Task finished
aqe memory store \
--key "aqe/coordination/task-123/complete" \
--namespace "aqe/coordination" \
--value '{"status":"complete","result":"success","testsGenerated":47,"coverageAchieved":92.3}' \
--json
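A consumer of these keys needs a way to tell how far a task has progressed. The following sketch illustrates one possible reading of the three-phase pattern, using an in-memory map as a stand-in for the persistent store; `InMemoryCoordinationStore`, `phaseKey`, and `latestPhase` are hypothetical names chosen for this example.

```typescript
// Phase names follow the STATUS → PROGRESS → COMPLETE pattern above.
type TaskPhase = "status" | "progress" | "complete";

// Hypothetical in-memory stand-in for the memory store; values are kept as JSON text.
class InMemoryCoordinationStore {
  private readonly entries = new Map<string, string>();

  store(key: string, value: unknown): void {
    this.entries.set(key, JSON.stringify(value));
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

// Key layout copied from the commands above: aqe/coordination/<task>/<phase>.
function phaseKey(taskId: string, phase: TaskPhase): string {
  return `aqe/coordination/${taskId}/${phase}`;
}

// Report the most advanced phase that has been written for a task, if any.
function latestPhase(
  store: InMemoryCoordinationStore,
  taskId: string,
): TaskPhase | undefined {
  const newestFirst: TaskPhase[] = ["complete", "progress", "status"];
  return newestFirst.find((phase) => store.has(phaseKey(taskId, phase)));
}
```

Because each phase is written under its own key, earlier phases stay readable after the task completes.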
| Event | Trigger | Subscribers |
|-------|---------|-------------|
| test:generated | New tests created | executor, coverage |
| coverage:gap | Gap detected | test-generator |
| quality:decision | Gate evaluated | fleet-commander |
| security:finding | Vulnerability found | quality-gate |
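The event table above amounts to a routing map from event name to subscribing agents. As a rough sketch, assuming a simple in-process bus (the real transport is not described in this skill), the routing could look like this; `QeEventBus`, `QeHandler`, and `EVENT_SUBSCRIBERS` are hypothetical names.

```typescript
// Event names and subscribers copied from the table above.
type QeEvent =
  | "test:generated"
  | "coverage:gap"
  | "quality:decision"
  | "security:finding";

const EVENT_SUBSCRIBERS: Record<QeEvent, readonly string[]> = {
  "test:generated": ["executor", "coverage"],
  "coverage:gap": ["test-generator"],
  "quality:decision": ["fleet-commander"],
  "security:finding": ["quality-gate"],
};

type QeHandler = (event: QeEvent, payload: unknown) => void;

// Hypothetical in-process bus: delivers an event to registered subscribers only.
class QeEventBus {
  private readonly handlers = new Map<string, QeHandler>();

  register(agent: string, handler: QeHandler): void {
    this.handlers.set(agent, handler);
  }

  // Returns the names of the agents that were actually notified.
  publish(event: QeEvent, payload: unknown): string[] {
    const notified: string[] = [];
    for (const agent of EVENT_SUBSCRIBERS[event]) {
      const handler = this.handlers.get(agent);
      if (handler !== undefined) {
        handler(event, payload);
        notified.push(agent);
      }
    }
    return notified;
  }
}
```

Subscribers that have not registered a handler are skipped, so a partially deployed fleet still receives the events it can act on.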
// 1. Risk analysis
const risks = await Task("Analyze PR", prDiff, "qe-regression-risk-analyzer");
// 2. Generate tests for risks
const tests = await Task("Generate tests", risks, "qe-test-generator");
// 3. Execute + analyze
const results = await Task("Run tests", tests, "qe-test-executor");
const coverage = await Task("Check coverage", results, "qe-coverage-analyzer");
// 4. Quality decision
const decision = await Task("Evaluate", {results, coverage}, "qe-quality-gate");
// → GO/NO-GO with rationale
| Phase | Duration | Goal | Agent(s) |
|-------|----------|------|----------|
| Experiment | Weeks 1-4 | Validate one use case | 1 agent |
| Integrate | Months 2-3 | CI/CD pipeline | 3-4 agents |
| Scale | Months 4-6 | Multiple use cases | 8+ agents |
| Evolve | Ongoing | Continuous learning | Full fleet |
# Week 1: Deploy single agent
aqe agent spawn qe-test-generator
# Weeks 2-3: Generate tests for 10 PRs
# Track: bugs found, test quality, review time
# Week 4: Measure impact
aqe agent metrics qe-test-generator
# → Tests: 150, Bugs: 12, Time saved: 8h
| Do | Don't |
|----|-------|
| Start with one agent, one use case | Deploy every agent at once |
| Build feedback loops early | Deploy and forget |
| Human reviews agent output | Auto-merge without review |
| Measure bugs caught, time saved | Track vanity metrics (test count) |
| Build trust gradually | Give full autonomy immediately |
Month 1: Agent suggests → Human decides
Month 2: Agent acts → Human reviews after
Month 3: Agent autonomous on low-risk
Month 4: Agent handles critical with oversight
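The progression above can be expressed as a small policy function. This sketch makes two assumptions that are not stated in the source: risk is reduced to just "low" or "critical", and in Month 3 anything that is not low-risk stays on the Month 2 mode. The names `RiskLevel`, `AutonomyMode`, and `autonomyMode` are hypothetical.

```typescript
type RiskLevel = "low" | "critical";

type AutonomyMode =
  | "suggest-only" // Month 1: agent suggests, human decides
  | "act-then-review" // Month 2: agent acts, human reviews after
  | "autonomous" // Month 3+: agent autonomous on low-risk work
  | "act-with-oversight"; // Month 4+: agent handles critical work with oversight

// Map the rollout month (1-based) and the risk of a change to an autonomy mode.
function autonomyMode(month: number, risk: RiskLevel): AutonomyMode {
  if (!Number.isInteger(month) || month < 1) {
    throw new RangeError("month must be a positive integer");
  }
  if (month === 1) return "suggest-only";
  if (month === 2) return "act-then-review";
  if (risk === "low") return "autonomous";
  // Assumption: critical work in Month 3 keeps the Month 2 mode.
  return month === 3 ? "act-then-review" : "act-with-oversight";
}
```

Encoding the schedule in one place makes it easy to review, and to slow down if trust is not building as planned.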
coordination:
  topology: hierarchical
  commander: qe-fleet-commander
  memory_namespace: aqe/coordination
  blackboard_topic: qe-fleet
preload_skills:
  - agentic-quality-engineering  # Always (this skill)
  - risk-based-testing           # For prioritization
  - quality-metrics              # For measurement
agent_assignments:
  qe-test-generator: [api-testing-patterns, tdd-london-chicago]
  qe-coverage-analyzer: [quality-metrics, risk-based-testing]
  qe-security-scanner: [security-testing, risk-based-testing]
  qe-performance-tester: [performance-testing]
- holistic-testing-pact - PACT principles deep dive
- risk-based-testing - Prioritize agent focus
- quality-metrics - Measure agent effectiveness
- api-testing-patterns, security-testing, performance-testing - Specialized testing
- .claude/agents/
- aqe agent --help
- aqe fleet status

Success Metric: Deploy 10x more frequently with same or better quality through intelligent agent collaboration.