skills/load-testing-plan/SKILL.md
Write a load and performance testing plan for a service. Use when asked to create a performance test plan, write load testing documentation, define stress or soak test scenarios, or set performance regression gates for CI. Produces a complete test plan document with scenario definitions, k6/Locust script skeleton, threshold table, result interpretation guide, and CI integration steps.
Produce a complete load and performance testing plan for a service — covering test objectives, scenario definitions, tooling configuration, success thresholds, and CI integration. A good load testing plan eliminates ambiguity about what "performance is acceptable" means, so engineers can run tests and get a pass/fail answer without having to interpret raw numbers themselves.
Ask for these if not already provided:
Author: [Name] | Team: [Team name]
Date: [Date] | Review cycle: Before each major release and quarterly
Testing tool: [k6 / Locust / JMeter / Gatling]
Test environment: [Environment name and URL]
What we are testing: [Service name] handles [describe function — e.g. "user authentication requests from the mobile and web clients"]. This plan validates that the service meets its SLOs under expected and elevated traffic conditions.
In scope:
Out of scope:
Every scenario has explicit pass/fail thresholds. A test run FAILS if any threshold is breached.
| Metric | Baseline scenario | Stress scenario | Spike scenario | Soak scenario |
|---|---|---|---|---|
| p50 latency | < [X] ms | < [X × 1.5] ms | < [X × 2] ms | < [X] ms |
| p95 latency | < [Y] ms | < [Y × 1.5] ms | < [Y × 2] ms | < [Y] ms |
| p99 latency | < [Z] ms | < [Z × 2] ms | < [Z × 3] ms | < [Z] ms |
| Error rate | < [0.1]% | < [1]% | < [2]% | < [0.1]% |
| Throughput | ≥ [N] RPS | ≥ [N × 3] RPS | N/A | ≥ [N] RPS |
| Failed requests | 0 (5xx) | < [threshold] | < [threshold] | 0 (5xx) |
SLO reference: These thresholds are derived from the service SLOs — p99 < [Z ms], error rate < [0.1]%, availability [99.9]%.
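The multipliers in the targets table can be applied mechanically once the baseline values are known. The following is a minimal Python sketch, not part of the original template: `SCENARIO_FACTORS` and `derive_thresholds` are hypothetical names, and the error-rate percentages (0.1, 1, 2) are simply the bracketed example values from the table, to be replaced with your own.

```python
from typing import Dict, Optional, Tuple

# Per scenario: (p50 factor, p95 factor, p99 factor, max error %, throughput factor).
# A throughput factor of None mirrors the "N/A" cell in the table.
SCENARIO_FACTORS: Dict[str, Tuple[float, float, float, float, Optional[float]]] = {
    "baseline": (1.0, 1.0, 1.0, 0.1, 1.0),
    "stress": (1.5, 1.5, 2.0, 1.0, 3.0),
    "spike": (2.0, 2.0, 3.0, 2.0, None),
    "soak": (1.0, 1.0, 1.0, 0.1, 1.0),
}


def derive_thresholds(x_ms: float, y_ms: float, z_ms: float, n_rps: float) -> dict:
    """Expand baseline p50/p95/p99 (X, Y, Z) and throughput (N) into per-scenario limits."""
    table = {}
    for scenario, (f50, f95, f99, err_pct, f_rps) in SCENARIO_FACTORS.items():
        table[scenario] = {
            "p50_ms": x_ms * f50,
            "p95_ms": y_ms * f95,
            "p99_ms": z_ms * f99,
            "max_error_pct": err_pct,
            "min_rps": None if f_rps is None else n_rps * f_rps,
        }
    return table


if __name__ == "__main__":
    # Illustrative inputs only: X=100 ms, Y=250 ms, Z=500 ms, N=200 RPS.
    for name, limits in derive_thresholds(100, 250, 500, 200).items():
        print(name, limits)
```

Keeping the factors in one table makes it harder for the plan document and the test scripts to drift apart.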
Baseline traffic (current production):
Simulated user behaviour:
Purpose: Confirm the service performs acceptably under normal production load.
Duration: 10 minutes
Load profile: Ramp to [N] RPS over 2 minutes, hold for 8 minutes.
Concurrency: [N] virtual users
Pass criteria: All thresholds in the Baseline column of the targets table above.
Purpose: Find the breaking point: how much load can the service handle before SLOs are breached?
Duration: 20–30 minutes
Load profile: Ramp from [N] RPS (baseline) to [N × 5] RPS in 5-minute steps. Hold each step for 5 minutes. Stop at first SLO breach.
Concurrency: Scales with RPS target
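The stepped ramp and the "stop at first SLO breach" rule can be sketched as follows. This is an illustration under assumptions, not the template's own tooling: it assumes the steps sit at whole multiples of baseline (1×, 2×, ... 5×), which gives 25 minutes in total, and it checks only p99 latency. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def stress_steps(baseline_rps: int, max_multiplier: int = 5,
                 hold_minutes: int = 5) -> List[Tuple[int, int]]:
    """Return (target RPS, hold minutes) for each step from 1x to max_multiplier x baseline."""
    return [(baseline_rps * m, hold_minutes) for m in range(1, max_multiplier + 1)]


def find_breaking_point(observations: List[Tuple[int, float]],
                        p99_limit_ms: float) -> Tuple[Optional[int], Optional[int]]:
    """Given (RPS, observed p99 ms) per step in ramp order, return
    (last RPS that passed, first RPS that breached). Either may be None."""
    last_ok = None
    for rps, p99_ms in observations:
        if p99_ms >= p99_limit_ms:  # the targets table requires p99 < limit
            return last_ok, rps
        last_ok = rps
    return last_ok, None


if __name__ == "__main__":
    print(stress_steps(100))
    # Illustrative observations, not measured data.
    print(find_breaking_point([(100, 180.0), (200, 210.0), (300, 320.0)], 300.0))
```

Recording both the last passing and the first breaching step brackets the breaking point, which is usually more useful than a single number.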
What to record:
Purpose: Simulate a sudden traffic surge (flash sale, viral event, bot attack).
Duration: 15 minutes
Load profile: Hold at [N] RPS (baseline) for 3 minutes, spike to [N × 10] RPS instantly, hold for 5 minutes, drop back to baseline for 7 minutes.
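The spike profile is a simple step function of elapsed time. The sketch below expresses it as a target-rate lookup, which could feed a custom load shape in whichever tool is used. `spike_target_rps` and the constant names are hypothetical; the durations come from the load profile above.

```python
PRE_SPIKE_S = 3 * 60      # hold at baseline
SPIKE_HOLD_S = 5 * 60     # hold at spike level
RECOVERY_S = 7 * 60       # back at baseline
TOTAL_S = PRE_SPIKE_S + SPIKE_HOLD_S + RECOVERY_S  # 15 minutes


def spike_target_rps(elapsed_s: float, baseline_rps: int, spike_multiplier: int = 10) -> int:
    """Return the target request rate at a given point in the spike scenario."""
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    if elapsed_s < PRE_SPIKE_S:
        return baseline_rps
    if elapsed_s < PRE_SPIKE_S + SPIKE_HOLD_S:
        return baseline_rps * spike_multiplier  # instant jump, no ramp
    if elapsed_s < TOTAL_S:
        return baseline_rps
    return 0  # scenario finished


if __name__ == "__main__":
    for t in (0, 179, 180, 479, 480, 899, 900):
        print(t, spike_target_rps(t, baseline_rps=50))
```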
What to record:
Purpose: Detect memory leaks, connection pool exhaustion, and slow degradation over time.
Duration: 4–8 hours (run overnight)
Load profile: Steady [N × 1.5] RPS (50% above baseline) for entire duration.
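One way to turn "slow degradation over time" into a number is to fit a straight line to memory samples taken during the soak run and look at the slope. The sketch below is a minimal least-squares fit, assuming samples are exported from infra monitoring as (hours elapsed, memory in MB) pairs. The function names and the growth limit are hypothetical and need tuning per service.

```python
from typing import List, Tuple


def growth_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of value against time for (hours, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def looks_like_leak(samples: List[Tuple[float, float]], max_growth_per_hour: float) -> bool:
    """Flag the run if memory grows faster than the allowed rate (an assumed limit)."""
    return growth_per_hour(samples) > max_growth_per_hour


if __name__ == "__main__":
    # Illustrative samples only: memory climbing 10 MB per hour.
    memory_mb = [(0.0, 500.0), (1.0, 510.0), (2.0, 520.0), (3.0, 530.0)]
    print(growth_per_hour(memory_mb), looks_like_leak(memory_mb, max_growth_per_hour=5.0))
```

A steady positive slope under constant load is only a hint; caches warming up can produce the same shape early in a run, so it helps to discard the first part of the soak before fitting.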
What to watch:
| Component | Requirement | Notes |
|---|---|---|
| Service under test | Isolated from production | [N] replicas, matching prod resource limits |
| Database | Separate instance with production-scale data | Seed script in section 7 |
| Cache (Redis/Memcached) | Empty at test start | Ensures cold-start conditions are tested |
| Load generator | Separate from service under test | [N] vCPUs, [N] GB RAM minimum |
| Network | Low-latency path to service | Do not run generator on same host |
Before every test run, ensure the environment has:
# Seed test users (needed for authenticated endpoint tests)
[seed command or script path — e.g. python scripts/seed_load_test_users.py --count 10000]
# Seed test data for read endpoints
[seed command — e.g. ./scripts/seed_products.sh --count 50000]
# Verify seed completed
[verification command — e.g. psql $DB_URL -c "SELECT COUNT(*) FROM users WHERE load_test=true"]
Test data rules:
load_test=true for easy cleanup
[cleanup command]
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('error_rate');
const endpointLatency = new Trend('endpoint_latency', true); // true = values are durations

// Test configuration: override per scenario
export const options = {
  scenarios: {
    baseline: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: [BASELINE_VUS] }, // ramp up
        { duration: '8m', target: [BASELINE_VUS] }, // hold
        { duration: '1m', target: 0 },              // ramp down
      ],
    },
  },
  thresholds: {
    http_req_duration: [
      'p(95)<[Y_MS]',
      'p(99)<[Z_MS]',
    ],
    // Rates are fractions: 0.01 = 1%. Use 0.001 for a 0.1% error-rate target.
    error_rate: ['rate<0.01'],
    http_req_failed: ['rate<0.01'],
  },
};

// Auth helper: setup() runs once per test run, so all VUs share this token
export function setup() {
  const loginRes = http.post('[BASE_URL]/auth/login', JSON.stringify({
    username: `load_test_user_${Math.floor(Math.random() * 10000)}@example.com`,
    password: '[LOAD_TEST_PASSWORD]',
  }), { headers: { 'Content-Type': 'application/json' } });
  check(loginRes, { 'login ok': (r) => r.status === 200 });
  return { token: loginRes.json('access_token') };
}

export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Endpoint 1: [Description]
  const res1 = http.get('[BASE_URL]/[endpoint-1]', { headers });
  check(res1, {
    '[endpoint-1] status 200': (r) => r.status === 200,
    '[endpoint-1] latency < [X]ms': (r) => r.timings.duration < [X],
  });
  // status 0 means the request never completed (connection error or timeout)
  errorRate.add(res1.status === 0 || res1.status >= 400);
  endpointLatency.add(res1.timings.duration, { endpoint: '[endpoint-1]' });

  // Think time: uniform between THINK_TIME_MIN and THINK_TIME_MAX seconds
  sleep([THINK_TIME_MIN] + Math.random() * ([THINK_TIME_MAX] - [THINK_TIME_MIN]));

  // Endpoint 2: [Description]
  const res2 = http.post('[BASE_URL]/[endpoint-2]',
    JSON.stringify({ [key]: '[value]' }),
    { headers }
  );
  check(res2, {
    '[endpoint-2] status 201': (r) => r.status === 201,
  });
  errorRate.add(res2.status === 0 || res2.status >= 400);
}
from locust import HttpUser, task, between
import random


class [ServiceName]User(HttpUser):
    wait_time = between([THINK_TIME_MIN], [THINK_TIME_MAX])
    token = None

    def on_start(self):
        """Called once per simulated user: authenticate."""
        user_id = random.randint(1, 10000)
        response = self.client.post("/auth/login", json={
            "username": f"load_test_user_{user_id}@example.com",
            "password": "[LOAD_TEST_PASSWORD]",
        })
        self.token = response.json()["access_token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task([WEIGHT_1])  # Weight = relative frequency
    def [endpoint_1_task](self):
        """[Endpoint 1 description]"""
        with self.client.get(
            "/[endpoint-1]",
            headers=self.headers,
            catch_response=True
        ) as response:
            # LATENCY_THRESHOLD is in seconds
            if response.elapsed.total_seconds() > [LATENCY_THRESHOLD]:
                response.failure(f"Too slow: {response.elapsed.total_seconds()}s")

    @task([WEIGHT_2])
    def [endpoint_2_task](self):
        """[Endpoint 2 description]"""
        self.client.post(
            "/[endpoint-2]",
            json={"[key]": "[value]"},
            headers=self.headers,
        )
# k6 — run baseline scenario
k6 run --env BASE_URL=https://[test-env-url] scripts/load_test.js
# k6 — run stress scenario with output to InfluxDB
k6 run --out influxdb=http://[influxdb-host]:8086/k6 \
--env SCENARIO=stress \
scripts/load_test.js
# Locust — headless run
locust -f locustfile.py \
--headless \
--users [N] \
--spawn-rate [N] \
--run-time 10m \
--host https://[test-env-url] \
--csv=results/[run-id]
# Locust — web UI (interactive)
locust -f locustfile.py --host https://[test-env-url]
Capture all of the following during every test run. Missing any of these makes result comparison unreliable.
| Metric | Source | Why it matters |
|---|---|---|
| p50, p95, p99, p999 latency per endpoint | Load tool | SLO validation |
| Error rate (4xx, 5xx) per endpoint | Load tool | SLO validation |
| Requests/sec (throughput) | Load tool | Capacity baseline |
| CPU utilisation (%) | Infra monitoring | Saturation signal |
| Memory utilisation (%) | Infra monitoring | Leak detection |
| GC pause time / frequency | JVM/Go metrics | Latency spike root cause |
| DB connection pool: active/idle/waiting | DB metrics | Pool exhaustion detection |
| DB query latency (p99) | DB metrics | Downstream bottleneck |
| Cache hit rate | Cache metrics | Miss storm detection |
| Pod/instance count (if autoscaling) | Infra | Scaling behaviour |
| Network in/out bytes | Infra | Bandwidth saturation |
After each test run, work through this analysis in order:
Step 1: Pass/fail check
Compare all captured metrics against the thresholds in Section 2. Record pass/fail per scenario.
Step 2: Latency distribution
Plot the full latency histogram, not just percentiles. A bimodal distribution (two humps) often indicates two distinct code paths; investigate the slow hump.
Step 3: Error correlation
If errors occurred, correlate them with:
Step 4: Saturation analysis
Graph CPU, memory, and connection pool usage over time. Any resource that reached 80% or more of its capacity is a candidate bottleneck, even if SLOs passed this run.
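The 80% rule in the saturation step is easy to automate once peak utilisation per resource is known. The sketch below assumes peaks are expressed as a fraction of capacity (0.0 to 1.0) and collected from infra monitoring. `bottleneck_candidates` and the resource names are hypothetical.

```python
from typing import Dict, List


def bottleneck_candidates(peak_utilisation: Dict[str, float], limit: float = 0.80) -> List[str]:
    """Return the resources whose peak utilisation reached the limit, sorted by name."""
    return sorted(name for name, peak in peak_utilisation.items() if peak >= limit)


if __name__ == "__main__":
    # Illustrative peaks only, as fractions of capacity.
    peaks = {"cpu": 0.72, "memory": 0.85, "db_connection_pool": 0.80}
    print(bottleneck_candidates(peaks))
```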
Step 5: Compare to baseline run
Every run should be compared to the previous run. A 10% regression in p99 latency warrants investigation even if it is still within SLO.
Regression classification:
| Change | Classification | Action |
|---|---|---|
| p99 within 5% of previous run | Green: no regression | No action |
| p99 5–15% worse than previous | Yellow: watch | Investigate before next release |
| p99 >15% worse than previous | Red: regression | Block release, file ticket |
| Error rate increased vs previous | Red: regression | Block release |
| SLO threshold breached | Critical | Block release, page on-call |
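The classification table translates directly into a small decision function. The sketch below is one possible reading of it, with assumptions stated plainly: the most severe matching row wins, exactly 5% counts as green and exactly 15% as yellow, and the SLO check covers p99 latency only. `classify_run` is a hypothetical name.

```python
def classify_run(prev_p99_ms: float, cur_p99_ms: float,
                 prev_error_rate: float, cur_error_rate: float,
                 slo_p99_ms: float) -> str:
    """Classify a run against the previous one: 'green', 'yellow', 'red' or 'critical'."""
    if prev_p99_ms <= 0:
        raise ValueError("prev_p99_ms must be positive")
    if cur_p99_ms >= slo_p99_ms:          # SLO threshold breached
        return "critical"
    if cur_error_rate > prev_error_rate:  # any increase in error rate
        return "red"
    change_pct = (cur_p99_ms - prev_p99_ms) / prev_p99_ms * 100.0
    if change_pct > 15.0:
        return "red"
    if change_pct > 5.0:
        return "yellow"
    return "green"                        # within 5%, or faster than before


if __name__ == "__main__":
    # Illustrative values only: p99 went from 200 ms to 220 ms, error rate unchanged.
    print(classify_run(200.0, 220.0, 0.001, 0.001, slo_p99_ms=500.0))
```

In CI, a "red" or "critical" result would map to a non-zero exit code so the release gate fails.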
Add load tests as a gated step in the release pipeline. Run the baseline scenario on every release candidate; run all scenarios weekly.
# Example: GitHub Actions job (adapt for your CI platform)
load-test:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  if: github.ref == 'refs/heads/main'
  steps:
    - uses: actions/checkout@v4
    - name: Install k6
      run: |
        curl -s https://dl.k6.io/key.gpg | sudo apt-key add -
        echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
        sudo apt-get update && sudo apt-get install -y k6
    - name: Seed test data
      run: [seed command]
    - name: Run baseline load test
      run: |
        k6 run \
          --env BASE_URL=${{ secrets.LOAD_TEST_ENV_URL }} \
          --out json=results.json \
          scripts/load_test.js
      env:
        LOAD_TEST_ENV_URL: ${{ secrets.LOAD_TEST_ENV_URL }}
    - name: Check thresholds
      run: |
        # k6 exits non-zero if any threshold fails, so a breach fails the
        # previous step (and the build) before this step is reached
        echo "k6 threshold check complete"
    - name: Upload results
      uses: actions/upload-artifact@v4
      if: always()
      with:
        name: load-test-results-${{ github.run_id }}
        path: results.json
    - name: Cleanup test data
      if: always()
      run: [cleanup command]
CI gates summary: