plugins/pm-engineering/skills/load-testing-plan/SKILL.md
Write a load and performance testing plan for a service. Use when asked to create a performance test plan, write load testing documentation, define stress or soak test scenarios, or set performance regression gates for CI. Produces a complete test plan document with scenario definitions, k6/Locust script skeleton, threshold table, result interpretation guide, and CI integration steps.
npx skillsauth add mohitagw15856/pm-claude-skills load-testing-planInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Produce a complete load and performance testing plan for a service — covering test objectives, scenario definitions, tooling configuration, success thresholds, and CI integration. A good load testing plan eliminates ambiguity about what "performance is acceptable" means, so engineers can run tests and get a pass/fail answer without having to interpret raw numbers themselves.
Ask for these if not already provided:
Author: [Name] | Team: [Team name] Date: [Date] | Review cycle: Before each major release and quarterly Testing tool: [k6 / Locust / JMeter / Gatling] Test environment: [Environment name and URL]
What we are testing: [Service name] handles [describe function — e.g. "user authentication requests from the mobile and web clients"]. This plan validates that the service meets its SLOs under expected and elevated traffic conditions.
In scope:
Out of scope:
Every scenario has explicit pass/fail thresholds. A test run FAILS if any threshold is breached.
| Metric | Baseline scenario | Stress scenario | Spike scenario | Soak scenario | |---|---|---|---|---| | p50 latency | < [X] ms | < [X × 1.5] ms | < [X × 2] ms | < [X] ms | | p95 latency | < [Y] ms | < [Y × 1.5] ms | < [Y × 2] ms | < [Y] ms | | p99 latency | < [Z] ms | < [Z × 2] ms | < [Z × 3] ms | < [Z] ms | | Error rate | < [0.1]% | < [1]% | < [2]% | < [0.1]% | | Throughput | ≥ [N] RPS | ≥ [N × 3] RPS | N/A | ≥ [N] RPS | | Failed requests | 0 (5xx) | < [threshold] | < [threshold] | 0 (5xx) |
SLO reference: These thresholds are derived from the service SLOs — p99 < [Z ms], error rate < [0.1]%, availability [99.9]%.
Baseline traffic (current production):
Simulated user behaviour:
Purpose: Confirm the service performs acceptably under normal production load. Duration: 10 minutes Load profile: Ramp to [N] RPS over 2 minutes, hold for 8 minutes. Concurrency: [N] virtual users
Pass criteria: All thresholds in the Baseline column of the targets table above.
Purpose: Find the breaking point — how much load can the service handle before SLOs are breached? Duration: 20–30 minutes Load profile: Ramp from [N] RPS (baseline) to [N × 5] RPS in 5-minute steps. Hold each step for 5 minutes. Stop at first SLO breach. Concurrency: Scales with RPS target
What to record:
Purpose: Simulate a sudden traffic surge (flash sale, viral event, bot attack). Duration: 15 minutes Load profile: Hold at [N] RPS (baseline) for 3 minutes, spike to [N × 10] RPS instantly, hold for 5 minutes, drop back to baseline for 7 minutes.
What to record:
Purpose: Detect memory leaks, connection pool exhaustion, and slow degradation over time. Duration: 4–8 hours (run overnight) Load profile: Steady [N × 1.5] RPS (50% above baseline) for entire duration.
What to watch:
| Component | Requirement | Notes | |---|---|---| | Service under test | Isolated from production | [N] replicas, matching prod resource limits | | Database | Separate instance with production-scale data | Seed script in section 7 | | Cache (Redis/Memcached) | Empty at test start | Ensures cold-start conditions are tested | | Load generator | Separate from service under test | [N] vCPUs, [N] GB RAM minimum | | Network | Low-latency path to service | Do not run generator on same host |
Before every test run, ensure the environment has:
# Seed test users (needed for authenticated endpoint tests)
[seed command or script path — e.g. python scripts/seed_load_test_users.py --count 10000]
# Seed test data for read endpoints
[seed command — e.g. ./scripts/seed_products.sh --count 50000]
# Verify seed completed
[verification command — e.g. psql $DB_URL -c "SELECT COUNT(*) FROM users WHERE load_test=true"]
Test data rules:
load_test=true for easy cleanup[cleanup command]import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('error_rate');
const endpointLatency = new Trend('endpoint_latency', true);
// Test configuration — override per scenario
export const options = {
scenarios: {
baseline: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: [BASELINE_VUS] },
{ duration: '8m', target: [BASELINE_VUS] },
{ duration: '1m', target: 0 },
],
},
},
thresholds: {
http_req_duration: [
'p(95)<[Y_MS]',
'p(99)<[Z_MS]',
],
error_rate: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
// Auth helper — get token once per VU
export function setup() {
const loginRes = http.post('[BASE_URL]/auth/login', JSON.stringify({
username: `load_test_user_${Math.floor(Math.random() * 10000)}@example.com`,
password: '[LOAD_TEST_PASSWORD]',
}), { headers: { 'Content-Type': 'application/json' } });
check(loginRes, { 'login ok': (r) => r.status === 200 });
return { token: loginRes.json('access_token') };
}
export default function (data) {
const headers = {
Authorization: `Bearer ${data.token}`,
'Content-Type': 'application/json',
};
// Endpoint 1: [Description]
const res1 = http.get('[BASE_URL]/[endpoint-1]', { headers });
check(res1, {
'[endpoint-1] status 200': (r) => r.status === 200,
'[endpoint-1] latency < [X]ms': (r) => r.timings.duration < [X],
});
errorRate.add(res1.status >= 400);
endpointLatency.add(res1.timings.duration, { endpoint: '[endpoint-1]' });
sleep(Math.random() * [THINK_TIME_MAX] + [THINK_TIME_MIN]);
// Endpoint 2: [Description]
const res2 = http.post('[BASE_URL]/[endpoint-2]',
JSON.stringify({ [key]: '[value]' }),
{ headers }
);
check(res2, {
'[endpoint-2] status 201': (r) => r.status === 201,
});
errorRate.add(res2.status >= 400);
}
from locust import HttpUser, task, between
import random
class [ServiceName]User(HttpUser):
wait_time = between([THINK_TIME_MIN], [THINK_TIME_MAX])
token = None
def on_start(self):
"""Called once per simulated user — authenticate."""
user_id = random.randint(1, 10000)
response = self.client.post("/auth/login", json={
"username": f"load_test_user_{user_id}@example.com",
"password": "[LOAD_TEST_PASSWORD]",
})
self.token = response.json()["access_token"]
self.headers = {"Authorization": f"Bearer {self.token}"}
@task([WEIGHT_1]) # Weight = relative frequency
def [endpoint_1_task](self):
"""[Endpoint 1 description]"""
with self.client.get(
"/[endpoint-1]",
headers=self.headers,
catch_response=True
) as response:
if response.elapsed.total_seconds() > [LATENCY_THRESHOLD]:
response.failure(f"Too slow: {response.elapsed.total_seconds()}s")
@task([WEIGHT_2])
def [endpoint_2_task](self):
"""[Endpoint 2 description]"""
self.client.post(
"/[endpoint-2]",
json={"[key]": "[value]"},
headers=self.headers,
)
# k6 — run baseline scenario
k6 run --env BASE_URL=https://[test-env-url] scripts/load_test.js
# k6 — run stress scenario with output to InfluxDB
k6 run --out influxdb=http://[influxdb-host]:8086/k6 \
--env SCENARIO=stress \
scripts/load_test.js
# Locust — headless run
locust -f locustfile.py \
--headless \
--users [N] \
--spawn-rate [N] \
--run-time 10m \
--host https://[test-env-url] \
--csv=results/[run-id]
# Locust — web UI (interactive)
locust -f locustfile.py --host https://[test-env-url]
Capture all of the following during every test run. Missing any of these makes result comparison unreliable.
| Metric | Source | Why it matters | |---|---|---| | p50, p95, p99, p999 latency per endpoint | Load tool | SLO validation | | Error rate (4xx, 5xx) per endpoint | Load tool | SLO validation | | Requests/sec (throughput) | Load tool | Capacity baseline | | CPU utilisation (%) | Infra monitoring | Saturation signal | | Memory utilisation (%) | Infra monitoring | Leak detection | | GC pause time / frequency | JVM/Go metrics | Latency spike root cause | | DB connection pool: active/idle/waiting | DB metrics | Pool exhaustion detection | | DB query latency (p99) | DB metrics | Downstream bottleneck | | Cache hit rate | Cache metrics | Miss storm detection | | Pod/instance count (if autoscaling) | Infra | Scaling behaviour | | Network in/out bytes | Infra | Bandwidth saturation |
After each test run, work through this analysis in order:
Step 1 — Pass/fail check Compare all captured metrics against the thresholds in Section 2. Record pass/fail per scenario.
Step 2 — Latency distribution Plot the full latency histogram, not just percentiles. A bimodal distribution (two humps) indicates two distinct code paths — investigate the slow hump.
Step 3 — Error correlation If errors occurred, correlate them with:
Step 4 — Saturation analysis Graph CPU, memory, and connection pool over time. If any resource reached 80%+ of capacity, it is a candidate bottleneck — even if SLOs passed this run.
Step 5 — Compare to baseline run Every run should be compared to the previous run. A 10% regression in p99 latency warrants investigation even if it is still within SLO.
Regression classification:
| Change | Classification | Action | |---|---|---| | p99 within 5% of previous run | Green — no regression | No action | | p99 5–15% worse than previous | Yellow — watch | Investigate before next release | | p99 >15% worse than previous | Red — regression | Block release, file ticket | | Error rate increased vs previous | Red — regression | Block release | | SLO threshold breached | Critical | Block release, page on-call |
Add load tests as a gated step in the release pipeline. Run the baseline scenario on every release candidate; run all scenarios weekly.
# Example: GitHub Actions step (adapt for your CI platform)
load-test:
runs-on: ubuntu-latest
needs: [deploy-staging]
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v3
- name: Install k6
run: |
curl -s https://dl.k6.io/key.gpg | sudo apt-key add -
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6
- name: Seed test data
run: [seed command]
- name: Run baseline load test
run: |
k6 run \
--env BASE_URL=${{ secrets.LOAD_TEST_ENV_URL }} \
--out json=results.json \
scripts/load_test.js
env:
LOAD_TEST_ENV_URL: ${{ secrets.LOAD_TEST_ENV_URL }}
- name: Check thresholds
run: |
# k6 exits with non-zero if any threshold fails — this step fails the build
echo "k6 threshold check complete"
- name: Upload results
uses: actions/upload-artifact@v3
if: always()
with:
name: load-test-results-${{ github.run_id }}
path: results.json
- name: Cleanup test data
if: always()
run: [cleanup command]
CI gates summary:
development
Analyse competitor moves and translate them into strategic implications for your product roadmap. Use when a competitor announces a new feature, pricing change, partnership, or strategic shift, or when producing a periodic competitive intelligence report. Produces a categorised signal analysis with reactive-vs-proactive assessment, threat ratings, specific roadmap implications, and recommended responses with owners.
development
Build a community management playbook for a brand's social media channels. Use when asked to create guidelines for managing comments, DMs, and community interactions, define a moderation policy, or build response frameworks for social media community managers. Produces a complete playbook with response templates, escalation paths, moderation rules, and tone guidelines.
development
Activate a 4-stage coding discipline framework that forces Claude to plan before coding, isolate changes on a branch, write tests first, and self-review output twice before presenting it. Use when starting a complex coding task, when past Claude sessions produced broken first drafts, or when you want to prevent rework cycles. Produces a confirmed written plan, isolated feature branch, test-first implementation, and a double-reviewed output with a correctness and code-quality checklist.
development
Optimize an article for Answer Engine Optimization (AEO) — restructuring content so AI engines like ChatGPT, Perplexity, and Claude can extract, quote, and cite it. Rewrites headings as questions, drops 50-80 word answer capsules, audits paragraph length, and flags trust signals. Use when asked to AEO-optimize, make content AI-readable, improve AI citation chances, or adapt an article for answer engines.