# QA & Test Engineering Command Center

Complete quality assurance system: from test strategy to automation frameworks, coverage analysis, and release readiness. Works for any stack, any team size.

## When to Use

- Planning test strategy for a new feature or project
- Writing unit, integration, or E2E tests
- Reviewing test quality and coverage gaps
- Setting up test automation and CI/CD quality gates
- Performance testing and load analysis
- Security testing checklist
- Bug triage and defect
npx skillsauth add openclaw/skills 1kalin/afrexai-qa-engine

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Before writing any tests, define the strategy:
# test-strategy.yaml
project: "[name]"
scope: "[feature/module/full product]"
risk_level: high | medium | low
stack:
  language: "[TypeScript/Python/Java/Go]"
  framework: "[React/Express/Django/Spring]"
  test_runner: "[Jest/Vitest/pytest/JUnit/Go test]"
  e2e_tool: "[Playwright/Cypress/Selenium]"
# What are we testing?
test_scope:
  - area: "[e.g., Auth module]"
    risk: high
    test_types: [unit, integration, e2e]
    priority: 1
  - area: "[e.g., Settings page]"
    risk: low
    test_types: [unit]
    priority: 3
# What's NOT in scope (and why)
exclusions:
  - "[e.g., Third-party widget, covered by vendor]"
# Quality targets
targets:
  line_coverage: 80
  branch_coverage: 70
  critical_path_coverage: 100
  max_flaky_rate: 2%
  max_test_duration_unit: 10ms
  max_test_duration_integration: 500ms
  max_test_duration_e2e: 30s
Not everything needs the same testing depth. Use the risk matrix:
| Risk Level | Unit Tests | Integration | E2E | Manual/Exploratory |
|-----------|-----------|-------------|-----|-------------------|
| Critical (payments, auth, data loss) | 95%+ coverage | Full API coverage | Happy + error paths | Exploratory session |
| High (core features, user-facing) | 85%+ coverage | Key integrations | Happy path | Spot check |
| Medium (secondary features) | 70%+ coverage | Critical paths only | Smoke only | On release |
| Low (admin, internal tools) | 50%+ coverage | None | None | None |
Follow the pyramid — not the ice cream cone:
/ E2E \ ← Few (5-10%) — slow, expensive, brittle
/ Integr. \ ← Some (15-25%) — API contracts, DB queries
/ Unit \ ← Many (65-80%) — fast, isolated, cheap
Anti-pattern: Ice cream cone (mostly E2E, few unit tests) = slow CI, flaky builds, expensive maintenance.
Decision rule: Can this be tested at a lower level? → Test it there.
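As a rough illustration, the pyramid proportions can be checked against a suite's actual test counts. This is a minimal sketch: the function name `pyramid_report` and the input shape are hypothetical, and the ranges are simply the percentages from the diagram above.

```python
# Target share of the suite per layer, taken from the pyramid above.
PYRAMID_RANGES = {
    "unit": (0.65, 0.80),
    "integration": (0.15, 0.25),
    "e2e": (0.05, 0.10),
}


def pyramid_report(counts: dict[str, int]) -> dict[str, str]:
    """Label each layer 'ok', 'too few', or 'too many' relative to the pyramid."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tests counted")
    report = {}
    for layer, (low, high) in PYRAMID_RANGES.items():
        share = counts.get(layer, 0) / total
        if share < low:
            report[layer] = "too few"
        elif share > high:
            report[layer] = "too many"
        else:
            report[layer] = "ok"
    return report
```

A suite of 100 unit, 200 integration, and 700 E2E tests (the ice cream cone) would be reported as too few unit tests and too many E2E tests.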
Every unit test follows AAA (Arrange-Act-Assert):
1. ARRANGE — Set up test data, mocks, state
2. ACT — Call the function/method under test
3. ASSERT — Verify the output matches expectations
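The three steps might look like this in a pytest-style test. This is only a sketch: `calculate_total` is a hypothetical function under test, written here so the example is self-contained, and amounts are kept in integer cents to avoid floating-point noise.

```python
def calculate_total(prices_cents: list[int], tax_percent: int = 0) -> int:
    """Hypothetical function under test: sum prices and add tax, in cents."""
    subtotal = sum(prices_cents)
    return subtotal + subtotal * tax_percent // 100


def test_calculate_total_applies_tax_to_subtotal():
    # ARRANGE: set up the input data
    prices_cents = [1000, 550]
    tax_percent = 10

    # ACT: call the function under test exactly once
    total = calculate_total(prices_cents, tax_percent)

    # ASSERT: verify the output matches expectations
    assert total == 1705


def test_calculate_total_returns_zero_for_empty_cart():
    # ARRANGE
    prices_cents = []

    # ACT
    total = calculate_total(prices_cents)

    # ASSERT
    assert total == 0
```

Keeping the three phases visually separate makes it easy to see when a test does too much in one of them.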
For each function/method, verify:
Mock these:
DO NOT mock these:
Mock rule of thumb: If removing the mock would make the test hit the network, file system, or database → mock it. Otherwise → don't.
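One way to apply the rule of thumb, sketched with Python's standard `unittest.mock`. The names `catalog_client`, `get_price_cents`, `discounted_price`, and `apply_discount` are hypothetical and exist only for this illustration: the client stands in for a network call and is mocked, while the pure discount logic runs for real.

```python
from unittest.mock import Mock


def apply_discount(price_cents: int, percent: int) -> int:
    """Pure logic with no I/O, so tests call it directly without a mock."""
    return price_cents - price_cents * percent // 100


def discounted_price(product_id: str, percent: int, catalog_client) -> int:
    """Fetch a price through catalog_client (a network boundary), then discount it."""
    price_cents = catalog_client.get_price_cents(product_id)
    return apply_discount(price_cents, percent)


def test_discounted_price_uses_catalog_price():
    # The catalog client would hit the network, so it is mocked.
    catalog_client = Mock()
    catalog_client.get_price_cents.return_value = 2000

    # apply_discount is NOT mocked: it is plain in-process logic.
    result = discounted_price("sku-1", 25, catalog_client)

    assert result == 1500
    catalog_client.get_price_cents.assert_called_once_with("sku-1")
```

Passing the client in as a parameter is a design choice that keeps the boundary explicit and avoids patching module internals.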
Use the pattern: [unit] [scenario] [expected result]
Examples:
- calculateTotal returns 0 for empty cart
- validateEmail throws for missing @ symbol
- parseDate handles ISO 8601 with timezone offset

Metrics that matter:

| Metric | Target | Why |
|--------|--------|-----|
| Line coverage | 80%+ | Basic completeness |
| Branch coverage | 70%+ | Catches missed if/else paths |
| Function coverage | 90%+ | Ensures all functions are tested |
| Critical path coverage | 100% | Business-critical code fully verified |
Coverage traps to avoid:
Integration tests verify that components work TOGETHER:
Pattern 1: API Contract Testing
1. Start test server (or use supertest/httptest)
2. Send HTTP request with specific payload
3. Assert: status code, response body shape, headers
4. Assert: database state changed correctly
5. Assert: side effects triggered (emails, events)
Pattern 2: Database Integration
1. Start test database (SQLite in-memory or test container)
2. Run migrations
3. Seed test data
4. Execute query/operation
5. Assert: data matches expectations
6. Teardown (truncate or rollback transaction)
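The six database steps above can be sketched with Python's built-in `sqlite3` and an in-memory database. The `orders` schema and the `mark_paid` operation are hypothetical, chosen only to give the test something to exercise.

```python
import sqlite3

# Hypothetical schema standing in for the project's real migrations.
MIGRATION = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


def mark_paid(conn: sqlite3.Connection, order_id: int) -> None:
    """Hypothetical operation under test."""
    conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))


def test_mark_paid_updates_only_target_order():
    conn = sqlite3.connect(":memory:")  # 1. start test database
    try:
        conn.executescript(MIGRATION)  # 2. run migrations
        conn.executemany(  # 3. seed test data
            "INSERT INTO orders (id, customer) VALUES (?, ?)",
            [(1, "alice"), (2, "bob")],
        )
        mark_paid(conn, 1)  # 4. execute the operation
        rows = conn.execute(
            "SELECT id, status FROM orders ORDER BY id"
        ).fetchall()
        assert rows == [(1, "paid"), (2, "pending")]  # 5. assert
    finally:
        conn.close()  # 6. teardown: the in-memory database is discarded
```

Asserting on the untouched row as well as the updated one catches an overly broad `WHERE` clause.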
Pattern 3: External Service
1. Record real API response (VCR/nock/wiremock)
2. Replay recorded response in tests
3. Assert: your code handles the response correctly
4. Also test: timeout, 500 error, malformed response
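The record-and-replay idea can be shown without any recording library. The sketch below swaps in a hand-rolled stub in place of VCR/nock/wiremock, and everything in it (`fetch_rate`, the `transport` callable, the recorded payload) is hypothetical. It replays a canned response and also covers the timeout, 500, and malformed cases.

```python
import json

# A response body captured once from the real service (illustrative data).
RECORDED_RESPONSE = '{"rate": "1.25", "currency": "USD"}'


class RateUnavailable(Exception):
    """Raised when the exchange-rate service cannot be used."""


def fetch_rate(transport) -> float:
    """Code under test. transport() returns (status, body) or raises TimeoutError."""
    try:
        status, body = transport()
    except TimeoutError as exc:
        raise RateUnavailable("timeout") from exc
    if status >= 500:
        raise RateUnavailable(f"server error {status}")
    try:
        return float(json.loads(body)["rate"])
    except (ValueError, KeyError, TypeError) as exc:
        raise RateUnavailable("malformed response") from exc


def replay(status: int, body: str):
    """Build a transport that replays a canned response instead of using the network."""
    return lambda: (status, body)


def timing_out():
    """Transport that simulates a request timeout."""
    raise TimeoutError("simulated timeout")


def fails_with_rate_unavailable(transport) -> bool:
    """Helper: True if fetch_rate rejects the given transport cleanly."""
    try:
        fetch_rate(transport)
    except RateUnavailable:
        return True
    return False
```

In a real suite the `replay` helper would be replaced by whichever recording tool the project already uses; the failure cases stay the same.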
E2E tests verify complete user journeys. They're expensive — be strategic:
Test these E2E:
DON'T test these E2E:
test_name: "[User journey name]"
preconditions:
  - "[User is logged in]"
  - "[Product exists in catalog]"
steps:
  - action: "Navigate to /products"
    verify: "Product list is visible"
  - action: "Click 'Add to Cart' on Product A"
    verify: "Cart badge shows 1"
  - action: "Click 'Checkout'"
    verify: "Checkout form displayed"
  - action: "Fill payment details and submit"
    verify: "Order confirmation page with order ID"
postconditions:
  - "Order exists in database with status 'paid'"
  - "Confirmation email sent"
max_duration: 30s
Flaky tests are the #1 CI killer. Handle them:
Flaky Test Triage:
Flaky rate target: < 2% of total test runs
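A flaky rate can be estimated from CI history. The sketch below rests on a stated assumption: a test counts as flaky on a commit when the same code produced both a pass and a fail, and the rate is those failures divided by all runs. The record shape and the name `flaky_report` are hypothetical.

```python
from collections import defaultdict


def flaky_report(runs: list[tuple[str, str, bool]]) -> tuple[set[str], float]:
    """runs holds (test_name, commit_sha, passed). Returns (flaky tests, flaky rate).

    Assumption: a test is flaky on a commit when identical code produced both
    a pass and a fail. The rate counts those failures against all runs.
    """
    outcomes = defaultdict(list)
    for name, commit, passed in runs:
        outcomes[(name, commit)].append(passed)

    flaky_tests = set()
    flaky_failures = 0
    for (name, _commit), results in outcomes.items():
        if any(results) and not all(results):  # mixed outcomes on one commit
            flaky_tests.add(name)
            flaky_failures += results.count(False)

    rate = flaky_failures / len(runs) if runs else 0.0
    return flaky_tests, rate
```

A test that fails on every run of a commit is treated as a genuine failure, not as flaky, so it does not inflate the rate.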
| Type | Purpose | When |
|------|---------|------|
| Load test | Normal traffic handling | Before every release |
| Stress test | Find breaking point | Quarterly or before scaling |
| Spike test | Sudden traffic burst | Before marketing campaigns |
| Soak test | Memory leaks over time | Monthly or after major changes |
| Capacity test | Max users/throughput | Planning infrastructure |
test_name: "[API/Page] Load Test"
target: "[URL or endpoint]"
baseline:
  p50_response: "[current p50 ms]"
  p95_response: "[current p95 ms]"
  p99_response: "[current p99 ms]"
  error_rate: "[current %]"
scenarios:
  - name: "Normal load"
    vus: 50 # virtual users
    duration: 5m
    ramp_up: 30s
    thresholds:
      p95_response: "< 500ms"
      error_rate: "< 1%"
  - name: "Peak load"
    vus: 200
    duration: 10m
    ramp_up: 1m
    thresholds:
      p95_response: "< 2000ms"
      error_rate: "< 5%"
  - name: "Stress test"
    vus: 500
    duration: 5m
    ramp_up: 2m
    # Find the breaking point: no thresholds, observe
Track these per endpoint:
| Metric | Green | Yellow | Red |
|--------|-------|--------|-----|
| p50 response | < 200ms | 200-500ms | > 500ms |
| p95 response | < 500ms | 500ms-2s | > 2s |
| p99 response | < 1s | 1-5s | > 5s |
| Error rate | < 0.1% | 0.1-1% | > 1% |
| Throughput | > baseline | 80-100% baseline | < 80% |
| CPU usage | < 60% | 60-80% | > 80% |
| Memory usage | < 70% | 70-85% | > 85% |
| DB query time | < 50ms avg | 50-200ms | > 200ms |
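To make the latency rows concrete, here is a minimal sketch that computes p50/p95/p99 from raw samples and maps them to the Green/Yellow/Red bands in the table. It assumes the nearest-rank percentile definition (load tools may interpolate differently), and the function names are hypothetical.

```python
def percentile(samples_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct% of n
    return ordered[rank - 1]


# (green upper bound, yellow upper bound) in milliseconds, from the table above.
LATENCY_BANDS_MS = {50: (200, 500), 95: (500, 2000), 99: (1000, 5000)}


def classify_latency(samples_ms: list[float]) -> dict[str, str]:
    """Return a green/yellow/red status for p50, p95, and p99."""
    status = {}
    for pct, (green_max, yellow_max) in LATENCY_BANDS_MS.items():
        value = percentile(samples_ms, pct)
        if value < green_max:
            status[f"p{pct}"] = "green"
        elif value <= yellow_max:
            status[f"p{pct}"] = "yellow"
        else:
            status[f"p{pct}"] = "red"
    return status
```

With 90 samples at 100 ms, 8 at 800 ms, and 2 at 6000 ms, this reports p50 green, p95 yellow, and p99 red, which shows how a healthy median can hide a slow tail.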
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| Slow API response | N+1 queries | Batch/join queries |
| Memory climbing | Object retention | Profile heap, fix leaks |
| Timeout spikes | Connection pool exhaustion | Increase pool, add queuing |
| Slow page load | Large bundle | Code split, lazy load |
| DB bottleneck | Missing index | Add index on WHERE/JOIN columns |
| High CPU | Synchronous compute | Move to worker/queue |
Run through these for every feature/release:
Authentication & Authorization:
Input Validation:
Data Protection:
Infrastructure:
| # | Vulnerability | Test For |
|---|--------------|----------|
| A01 | Broken Access Control | Access other users' resources, bypass role checks |
| A02 | Cryptographic Failures | Weak hashing, plaintext secrets, expired certs |
| A03 | Injection | SQL, XSS, command, LDAP injection |
| A04 | Insecure Design | Business logic flaws, missing rate limits |
| A05 | Security Misconfiguration | Default creds, verbose errors, open ports |
| A06 | Vulnerable Components | Outdated deps with known CVEs |
| A07 | Authentication Failures | Brute force, weak passwords, session fixation |
| A08 | Data Integrity Failures | Unsigned updates, CI/CD pipeline injection |
| A09 | Logging Failures | Missing audit logs, no alerting on breaches |
| A10 | SSRF | Internal network access via user-controlled URLs |
bug_id: "[auto or manual]"
title: "[Short description of the bug]"
severity: P0-critical | P1-high | P2-medium | P3-low
reporter: "[name]"
date: "[YYYY-MM-DD]"
environment:
  os: "[OS + version]"
  browser: "[Browser + version]"
  app_version: "[version/commit]"
steps_to_reproduce:
  - "[Step 1]"
  - "[Step 2]"
  - "[Step 3]"
expected_result: "[What should happen]"
actual_result: "[What actually happens]"
frequency: "always | intermittent | once"
screenshots: "[links]"
logs: "[relevant log output]"
| Level | Definition | SLA | Example |
|-------|-----------|-----|---------|
| P0 Critical | System down, data loss, security breach | Fix in 4 hours | Payment processing broken |
| P1 High | Major feature broken, no workaround | Fix in 24 hours | Users can't login |
| P2 Medium | Feature broken with workaround | Fix this sprint | Search returns wrong results sometimes |
| P3 Low | Minor issue, cosmetic | Fix when convenient | Button alignment off by 2px |
1. Review all new bugs (unassigned)
2. For each bug:
a. Reproduce — can you trigger it?
b. Classify severity (P0-P3)
c. Estimate fix effort (S/M/L)
d. Assign to owner + sprint
e. Link to related bugs/stories
3. Review P0/P1 bugs from last week — are they fixed?
4. Close bugs that can't be reproduced (after 2 attempts)
5. Update metrics dashboard
Track weekly:
| Metric | Formula | Target |
|--------|---------|--------|
| Bug escape rate | Bugs found in prod / total bugs | < 10% |
| Mean time to fix (P0) | Avg hours from report to deploy | < 8 hours |
| Mean time to fix (P1) | Avg hours from report to deploy | < 48 hours |
| Bug reopen rate | Reopened bugs / closed bugs | < 5% |
| Test escape analysis | Bugs that SHOULD have been caught | Track & reduce |
| Open bug count | Total open by severity | Trending down |
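The two ratio metrics in the table reduce to simple divisions. The following sketch computes them and compares each against its target; the function names and the dictionary keys are hypothetical, and a zero denominator is treated as a rate of 0 by assumption.

```python
def bug_escape_rate(found_in_prod: int, total_bugs: int) -> float:
    """Bugs found in prod / total bugs. Returns 0.0 when there are no bugs."""
    return found_in_prod / total_bugs if total_bugs else 0.0


def bug_reopen_rate(reopened: int, closed: int) -> float:
    """Reopened bugs / closed bugs. Returns 0.0 when nothing was closed."""
    return reopened / closed if closed else 0.0


def weekly_bug_status(
    found_in_prod: int, total_bugs: int, reopened: int, closed: int
) -> dict[str, bool]:
    """Compare both rates against the targets in the table (< 10% and < 5%)."""
    return {
        "escape_rate_ok": bug_escape_rate(found_in_prod, total_bugs) < 0.10,
        "reopen_rate_ok": bug_reopen_rate(reopened, closed) < 0.05,
    }
```

For example, 3 production bugs out of 40 total gives an escape rate of 7.5%, inside the target, while 5 out of 40 gives 12.5% and misses it.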
Before shipping to production:
Code Quality:
Coverage & Quality Gates:
Performance:
Security:
Operational Readiness:
Score 0-100 across 5 dimensions:
| Dimension | Weight | Scoring |
|-----------|--------|---------|
| Test coverage | 25% | 100 if targets met, -10 per gap area |
| Bug status | 25% | 100 if 0 P0/P1, -20 per open P0, -10 per P1 |
| Performance | 20% | 100 if all green, -15 per yellow, -30 per red |
| Security | 20% | 100 if clean, -25 per critical, -15 per high |
| Operational | 10% | 100 if checklist complete, -20 per missing item |
Ship threshold: ≥ 80 overall, no dimension below 60
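The scoring rules and ship threshold can be worked through in code. This is a sketch under one labeled assumption: dimension scores are floored at 0, which the table does not specify. The parameter and function names are hypothetical.

```python
# Dimension weights in percent, from the scoring table above.
WEIGHTS = {
    "coverage": 25,
    "bugs": 25,
    "performance": 20,
    "security": 20,
    "operational": 10,
}


def dimension_scores(
    coverage_gaps: int = 0,
    open_p0: int = 0,
    open_p1: int = 0,
    perf_yellow: int = 0,
    perf_red: int = 0,
    sec_critical: int = 0,
    sec_high: int = 0,
    missing_ops_items: int = 0,
) -> dict[str, int]:
    """Apply the per-dimension deductions from the table."""
    raw = {
        "coverage": 100 - 10 * coverage_gaps,
        "bugs": 100 - 20 * open_p0 - 10 * open_p1,
        "performance": 100 - 15 * perf_yellow - 30 * perf_red,
        "security": 100 - 25 * sec_critical - 15 * sec_high,
        "operational": 100 - 20 * missing_ops_items,
    }
    # Assumption: scores are floored at 0 (the table does not say).
    return {name: max(0, score) for name, score in raw.items()}


def release_readiness(scores: dict[str, int]) -> tuple[float, bool]:
    """Return (weighted overall score, ship decision)."""
    overall = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    ship = overall >= 80 and min(scores.values()) >= 60
    return overall, ship
```

One coverage gap, one open P1, and one yellow performance metric score 92.0 and ship. Three open P0 bugs still average 85.0 overall, but the bug dimension drops to 40, so the per-dimension floor blocks the release.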
Configure these gates in your CI pipeline:
# Quality gate configuration
gates:
  - name: "Lint"
    stage: pre-commit
    command: "npm run lint"
    blocking: true
  - name: "Unit Tests"
    stage: commit
    command: "npm test -- --coverage"
    blocking: true
    thresholds:
      pass_rate: 100%
      coverage_line: 80%
      coverage_branch: 70%
  - name: "Integration Tests"
    stage: merge
    command: "npm run test:integration"
    blocking: true
    thresholds:
      pass_rate: 100%
  - name: "Security Scan"
    stage: merge
    command: "npm audit --audit-level=high"
    blocking: true
  - name: "E2E Smoke"
    stage: staging
    command: "npm run test:e2e:smoke"
    blocking: true
    thresholds:
      pass_rate: 100%
  - name: "Performance"
    stage: staging
    command: "npm run test:perf"
    blocking: false # Alert only
    thresholds:
      p95_regression: 20%
Rate your team 1-5:
| Level | Description | Characteristics |
|-------|------------|-----------------|
| 1 - Manual | All testing is manual | No automation, long release cycles |
| 2 - Reactive | Some unit tests, no CI | Tests written after bugs, not before |
| 3 - Structured | Test pyramid, CI pipeline | Unit + integration, automated on push |
| 4 - Proactive | Full automation, quality gates | E2E + perf + security in pipeline, TDD |
| 5 - Optimized | Self-healing, predictive | Flaky auto-quarantine, AI-assisted testing, continuous deployment |
review_date: "[YYYY-MM-DD]"
metrics:
  total_tests: 0
  pass_rate_7d: "0%"
  flaky_tests: 0
  flaky_rate: "0%"
  avg_suite_duration: "0s"
  coverage_line: "0%"
  coverage_branch: "0%"
actions:
  quarantined: [] # Tests moved to flaky suite
  deleted: [] # Tests removed (obsolete/unfixable)
  fixed: [] # Flaky tests fixed this week
  added: [] # New tests added
trends:
  coverage_delta: "+0%" # vs last week
  flaky_delta: "+0" # vs last week
  duration_delta: "+0s" # vs last week
notes: ""
| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| Sleeping tests | sleep(2000) instead of waiting | Use explicit waits/polling |
| Test interdependence | Test B relies on Test A's state | Isolate — each test sets up its own state |
| Assertionless tests | Test runs code but doesn't assert | Add meaningful assertions |
| Brittle selectors | CSS selectors that break on redesign | Use data-testid or aria roles |
| God test | One test verifying 20 things | Split into focused tests |
| Mock overload | Everything mocked, nothing real tested | Only mock external boundaries |
| Hardcoded data | Tests break when seed data changes | Use factories/builders |
| Ignoring test output | "It passed, ship it" | Review WHY it passed — is the assertion meaningful? |
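For the "Sleeping tests" row, an explicit wait can be as small as a polling helper. The sketch below uses only the standard library; `wait_until` and its default timings are hypothetical, and most E2E tools ship their own equivalent that should be preferred when available.

```python
import time


def wait_until(condition, timeout_s: float = 5.0, interval_s: float = 0.05) -> bool:
    """Poll condition() until it returns truthy or timeout_s elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test would then write `assert wait_until(lambda: job.is_done())` (with `job` being whatever object it is waiting on) instead of sleeping for a fixed two seconds. The test finishes as soon as the condition holds and fails with a clear timeout when it never does.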
Tell the agent: