Design and execute load/stress tests with k6, establish SLO baselines, and identify bottlenecks.
Expertise: k6 load/stress/soak tests, SLO baselines, threshold gates, bottleneck identification, CI integration.
Load test → Validate behavior at expected production traffic (steady state)
Stress test → Find breaking point; gradually increase load until failure
Soak test → Detect memory leaks / degradation over time (run 1-8 hours)
Spike test → Simulate sudden traffic burst (10x normal in seconds)
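The four test types differ mainly in the shape of their load profile. As a rough sketch, the helper below maps each type to a k6-style `stages` array. The function name `stagesFor`, the durations, and the multipliers are illustrative assumptions and should be tuned per service.

```javascript
// Hypothetical helper: builds a k6-style `stages` array for each test type.
// All durations and multipliers are assumptions for illustration only.
function stagesFor(type, normalVUs) {
  switch (type) {
    case 'load': // steady state at expected production traffic
      return [
        { duration: '2m', target: normalVUs },
        { duration: '10m', target: normalVUs },
        { duration: '2m', target: 0 },
      ];
    case 'stress': // step the load up until something breaks
      return [1, 2, 3, 4]
        .map((step) => ({ duration: '5m', target: normalVUs * step }))
        .concat([{ duration: '2m', target: 0 }]);
    case 'soak': // hold normal load for hours to expose leaks
      return [
        { duration: '5m', target: normalVUs },
        { duration: '4h', target: normalVUs },
        { duration: '5m', target: 0 },
      ];
    case 'spike': // jump to 10x normal within seconds
      return [
        { duration: '1m', target: normalVUs },
        { duration: '10s', target: normalVUs * 10 },
        { duration: '1m', target: normalVUs * 10 },
        { duration: '10s', target: normalVUs },
        { duration: '1m', target: 0 },
      ];
    default:
      throw new Error(`unknown test type: ${type}`);
  }
}
```

In a k6 script this could feed `options.stages`, for example `stages: stagesFor(__ENV.TEST_TYPE || 'load', 50)`, where `TEST_TYPE` is a hypothetical environment variable.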
// tests/performance/create-order.k6.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Counter } from 'k6/metrics';

// Custom metrics; `true` marks the Trend as a time value so k6 reports it in ms
const orderCreationDuration = new Trend('order_creation_duration', true);
const failedOrders = new Counter('failed_orders');

export const options = {
  stages: [
    { duration: '2m', target: 50 },  // Ramp up
    { duration: '5m', target: 50 },  // Steady state
    { duration: '2m', target: 200 }, // Stress
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    // These are your SLO gates; k6 exits non-zero (failing CI) if any is breached
    http_req_duration: ['p(95)<500', 'p(99)<2000'],
    http_req_failed: ['rate<0.01'], // < 1% errors
    order_creation_duration: ['p(99)<3000'],
  },
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:8000';

export function setup() {
  // Create the test auth token once, before the load test starts
  const res = http.post(`${BASE_URL}/auth/token`, JSON.stringify({
    username: '[email protected]', password: __ENV.TEST_PASSWORD,
  }), { headers: { 'Content-Type': 'application/json' } });
  // Abort early instead of load testing with an undefined token
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`Auth failed with status ${res.status}; aborting load test`);
  }
  return { token: res.json('access_token') };
}

export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };
  const res = http.post(
    `${BASE_URL}/api/v1/orders`,
    JSON.stringify({ items: [{ product_id: 'prod_123', quantity: 1 }] }),
    { headers },
  );
  // Use k6's own request timing rather than wall-clock deltas around the call
  orderCreationDuration.add(res.timings.duration);
  const ok = check(res, {
    'status is 201': (r) => r.status === 201,
    // Only parse the body on success; r.json() throws on non-JSON error pages
    'has order id': (r) => r.status === 201 && r.json('id') !== undefined,
  });
  if (!ok) failedOrders.add(1);
  sleep(1); // Think time between requests
}
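To make the threshold semantics concrete, the sketch below evaluates gates such as `p(95)<500` against raw latency samples. It uses a nearest-rank percentile; k6 computes percentiles internally and may interpolate differently, so treat this as an illustration of the idea and not of k6's implementation. The names `percentile` and `breachedGates` and the gate shape `{ p, maxMs }` are assumptions.

```javascript
// Nearest-rank percentile over raw latency samples in milliseconds.
// Illustrative only: k6's internal percentile calculation may differ.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic first to avoid floating-point surprises in the rank
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Returns the gates that are breached. A gate { p: 95, maxMs: 500 }
// mirrors the k6 expression 'p(95)<500', so a value equal to maxMs is a breach.
function breachedGates(samples, gates) {
  return gates.filter(({ p, maxMs }) => percentile(samples, p) >= maxMs);
}
```

A run passes only when `breachedGates` returns an empty array, which is the same all-or-nothing behavior the `thresholds` block gives you in CI.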
# 1. Run baseline on a known-good release
mkdir -p results
BASELINE=results/baseline-$(date +%Y%m%d).json
k6 run --env BASE_URL=https://staging.myapp.com \
  --summary-trend-stats "avg,p(50),p(95),p(99)" \
  --summary-export "$BASELINE" \
  tests/performance/create-order.k6.js
# 2. Extract key metrics from the exported summary
#    (k6 has no `stats` subcommand, and `--out json` writes raw data points, not aggregates)
jq '{
  p50: .metrics.http_req_duration["p(50)"],
  p95: .metrics.http_req_duration["p(95)"],
  p99: .metrics.http_req_duration["p(99)"],
  error_rate: .metrics.http_req_failed.value
}' "$BASELINE"
# 3. Store in performance-baselines.json
# 4. On each release candidate: compare vs. baseline
# - p99 regression > 20% → block deploy
# - p99 regression 10–20% → warning + required justification
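The comparison in step 4 can be sketched as a small function. The 10% and 20% cut-offs come from the policy above; the function name `gateOnP99` and the return shape are assumptions for illustration.

```javascript
// Sketch of the release gate: compare a candidate's p99 against the stored baseline.
// Cut-offs follow the policy above; names and return shape are hypothetical.
function gateOnP99(baselineP99, candidateP99) {
  if (!(baselineP99 > 0)) throw new Error('baseline p99 must be a positive number');
  const regressionPct = ((candidateP99 - baselineP99) / baselineP99) * 100;
  let verdict = 'pass';
  if (regressionPct > 20) verdict = 'block';      // block deploy
  else if (regressionPct >= 10) verdict = 'warn'; // warning + required justification
  return { regressionPct: Math.round(regressionPct * 10) / 10, verdict };
}
```

A CI step could call this with the two p99 values read from `performance-baselines.json` and the current run, and exit non-zero when the verdict is `block`.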
# .github/workflows/perf.yml
performance:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run k6 load test
      uses: grafana/[email protected]
      with:
        filename: tests/performance/create-order.k6.js
      env:
        BASE_URL: ${{ secrets.STAGING_URL }}
        TEST_PASSWORD: ${{ secrets.LOADTEST_PASSWORD }}
    - name: Upload results
      uses: actions/upload-artifact@v4
      with:
        name: k6-results
        path: results/
After a test run showing latency regression, investigate in this order:
-- 1. Check application metrics during the test
--    CPU saturated (> 80% sustained)? → scale vertically or optimize the hot path
--    Memory growing steadily? → potential leak
--    Goroutines / threads spiking? → connection pool or lock contention
-- 2. Check the database during the load test (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;
-- High mean_exec_time on a simple query → often a missing index; confirm with EXPLAIN ANALYZE
-- 3. Check the connection pool
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
-- Many 'idle in transaction' → long transactions not being released
-- Total sessions close to max_connections → pool or server limit saturated; review pool size before raising max_connections
-- 4. Check for lock contention
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity WHERE wait_event_type = 'Lock';
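The reading of the connection-pool query in step 3 can be sketched as a helper that turns the `state` counts into triage hints. The function name `connectionHints` and the 25% and 90% cut-offs are assumptions chosen for illustration, not PostgreSQL guidance.

```javascript
// Hypothetical helper: interprets rows from
//   SELECT count(*), state FROM pg_stat_activity GROUP BY state;
// The 25% and 90% cut-offs are illustrative assumptions.
function connectionHints(rows, maxConnections) {
  const byState = {};
  let total = 0;
  for (const { state, count } of rows) {
    const n = Number(count); // some drivers return count(*) as a string
    byState[state || 'unknown'] = n;
    total += n;
  }
  const hints = [];
  const idleInTxn = byState['idle in transaction'] || 0;
  if (total > 0 && idleInTxn / total > 0.25) {
    hints.push('many idle-in-transaction sessions: look for transactions held open across slow work');
  }
  if (total / maxConnections > 0.9) {
    hints.push('sessions close to max_connections: review pool sizing before raising the limit');
  }
  return hints;
}
```

An empty result means neither heuristic fired, which points the investigation toward the query-level checks in steps 2 and 4.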
## Performance Test Report — [Date] — [Endpoint/Flow]
### Configuration
- Tool: k6 | Duration: 11 min | Peak VUs: 200
- Target: POST /api/v1/orders | Environment: staging
### Results vs. SLO
| Metric | Baseline | This run | SLO | Status |
|--------|----------|----------|-----|--------|
| p50 latency | 45ms | 52ms | < 200ms | ✅ |
| p95 latency | 120ms | 310ms | < 500ms | ✅ |
| p99 latency | 280ms | 890ms | < 2000ms | ❌ within SLO, but +218% vs. baseline (> 20% blocks deploy) |
| Error rate | 0.02% | 0.08% | < 1% | ✅ |
### Root cause of p99 regression
N+1 query in OrderRepository.list_items() — loading items one by one inside a loop.
### Remediation
Add joinedload() to the query. Estimated p99 improvement: ~400ms.