Load Testing Plan Skill

Produce a complete load and performance testing plan for a service — covering test objectives, scenario definitions, tooling configuration, success thresholds, and CI integration. A good load testing plan eliminates ambiguity about what "performance is acceptable" means, so engineers can run tests and get a pass/fail answer without having to interpret raw numbers themselves.

Required Inputs

Ask for these if not already provided:

Service name and key endpoints — which endpoints are under test (path, method, typical request/response shape)
Current traffic baseline — current requests/sec, p50/p99 latency, error rate under normal load
Peak traffic expectations — expected peak RPS (e.g. 10× baseline for flash sales, or seasonality peak)
SLO targets — latency SLOs (p99 < X ms), error rate SLO (< Y%), availability target
Preferred testing tool — k6, Locust, JMeter, Gatling, or no preference
Test environment availability — dedicated load test environment, staging, or production (with traffic shaping)

Output Format

Load Testing Plan: [Service Name]

Author: [Name] | Team: [Team name] Date: [Date] | Review cycle: Before each major release and quarterly Testing tool: [k6 / Locust / JMeter / Gatling] Test environment: [Environment name and URL]

1. Objectives and Scope

What we are testing: [Service name] handles [describe function — e.g. "user authentication requests from the mobile and web clients"]. This plan validates that the service meets its SLOs under expected and elevated traffic conditions.

In scope:

[Endpoint 1: METHOD /path — description]
[Endpoint 2: METHOD /path — description]
[Endpoint 3: METHOD /path — description]

Out of scope:

[Any endpoints explicitly excluded and why — e.g. "admin APIs — low traffic, excluded from load test"]
[Third-party integrations that cannot be load-tested — mock them instead]

2. Performance Targets (Success Criteria)

Every scenario has explicit pass/fail thresholds. A test run FAILS if any threshold is breached.

| Metric | Baseline scenario | Stress scenario | Spike scenario | Soak scenario | |---|---|---|---|---| | p50 latency | < [X] ms | < [X × 1.5] ms | < [X × 2] ms | < [X] ms | | p95 latency | < [Y] ms | < [Y × 1.5] ms | < [Y × 2] ms | < [Y] ms | | p99 latency | < [Z] ms | < [Z × 2] ms | < [Z × 3] ms | < [Z] ms | | Error rate | < [0.1]% | < [1]% | < [2]% | < [0.1]% | | Throughput | ≥ [N] RPS | ≥ [N × 3] RPS | N/A | ≥ [N] RPS | | Failed requests | 0 (5xx) | < [threshold] | < [threshold] | 0 (5xx) |

SLO reference: These thresholds are derived from the service SLOs — p99 < [Z ms], error rate < [0.1]%, availability [99.9]%.

3. Traffic Model

Baseline traffic (current production):

Average RPS: [N] req/sec
Peak RPS (observed): [N] req/sec
Request distribution by endpoint:
- [Endpoint 1]: [X]% of traffic
- [Endpoint 2]: [Y]% of traffic
- [Endpoint 3]: [Z]% of traffic

Simulated user behaviour:

Think time between requests: [X–Y] seconds (randomised)
Session duration: [N] minutes average
Authenticated vs anonymous ratio: [X]%/[Y]%
Geographic distribution: [Region 1 X]%, [Region 2 Y]%

4. Test Scenarios

Scenario 1: Baseline (Steady-State)

Purpose: Confirm the service performs acceptably under normal production load. Duration: 10 minutes Load profile: Ramp to [N] RPS over 2 minutes, hold for 8 minutes. Concurrency: [N] virtual users

Pass criteria: All thresholds in the Baseline column of the targets table above.

Scenario 2: Stress Test

Purpose: Find the breaking point — how much load can the service handle before SLOs are breached? Duration: 20–30 minutes Load profile: Ramp from [N] RPS (baseline) to [N × 5] RPS in 5-minute steps. Hold each step for 5 minutes. Stop at first SLO breach. Concurrency: Scales with RPS target

What to record:

RPS at which p99 latency first exceeds SLO
RPS at which error rate first exceeds SLO
Whether the service recovers when load drops back to baseline

Scenario 3: Spike Test

Purpose: Simulate a sudden traffic surge (flash sale, viral event, bot attack). Duration: 15 minutes Load profile: Hold at [N] RPS (baseline) for 3 minutes, spike to [N × 10] RPS instantly, hold for 5 minutes, drop back to baseline for 7 minutes.

What to record:

Latency during spike and recovery
Whether the service sheds load gracefully (rate limiting, queue depth)
Time to recover to baseline latency after spike ends

Scenario 4: Soak / Endurance Test

Purpose: Detect memory leaks, connection pool exhaustion, and slow degradation over time. Duration: 4–8 hours (run overnight) Load profile: Steady [N × 1.5] RPS (50% above baseline) for entire duration.

What to watch:

Memory usage trend over time (should not grow unboundedly)
Error rate trend (should be flat, not creeping up)
GC pause frequency (JVM/Go services)
Database connection pool utilisation
p99 latency trend (should not creep up over hours)

5. Test Environment Requirements

Infrastructure

| Component | Requirement | Notes | |---|---|---| | Service under test | Isolated from production | [N] replicas, matching prod resource limits | | Database | Separate instance with production-scale data | Seed script in section 7 | | Cache (Redis/Memcached) | Empty at test start | Ensures cold-start conditions are tested | | Load generator | Separate from service under test | [N] vCPUs, [N] GB RAM minimum | | Network | Low-latency path to service | Do not run generator on same host |

Data Seeding

Before every test run, ensure the environment has:

# Seed test users (needed for authenticated endpoint tests)
[seed command or script path — e.g. python scripts/seed_load_test_users.py --count 10000]

# Seed test data for read endpoints
[seed command — e.g. ./scripts/seed_products.sh --count 50000]

# Verify seed completed
[verification command — e.g. psql $DB_URL -c "SELECT COUNT(*) FROM users WHERE load_test=true"]

Test data rules:

Never use real production user data in load tests
Tag all test-generated records with load_test=true for easy cleanup
Run cleanup after each test: [cleanup command]

6. Tooling Setup

k6 Script Skeleton

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('error_rate');
const endpointLatency = new Trend('endpoint_latency', true);

// Test configuration — override per scenario
export const options = {
  scenarios: {
    baseline: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: [BASELINE_VUS] },
        { duration: '8m', target: [BASELINE_VUS] },
        { duration: '1m', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: [
      'p(95)<[Y_MS]',
      'p(99)<[Z_MS]',
    ],
    error_rate: ['rate<0.01'],
    http_req_failed: ['rate<0.01'],
  },
};

// Auth helper — get token once per VU
export function setup() {
  const loginRes = http.post('[BASE_URL]/auth/login', JSON.stringify({
    username: `load_test_user_${Math.floor(Math.random() * 10000)}@example.com`,
    password: '[LOAD_TEST_PASSWORD]',
  }), { headers: { 'Content-Type': 'application/json' } });

  check(loginRes, { 'login ok': (r) => r.status === 200 });
  return { token: loginRes.json('access_token') };
}

export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Endpoint 1: [Description]
  const res1 = http.get('[BASE_URL]/[endpoint-1]', { headers });
  check(res1, {
    '[endpoint-1] status 200': (r) => r.status === 200,
    '[endpoint-1] latency < [X]ms': (r) => r.timings.duration < [X],
  });
  errorRate.add(res1.status >= 400);
  endpointLatency.add(res1.timings.duration, { endpoint: '[endpoint-1]' });

  sleep(Math.random() * [THINK_TIME_MAX] + [THINK_TIME_MIN]);

  // Endpoint 2: [Description]
  const res2 = http.post('[BASE_URL]/[endpoint-2]',
    JSON.stringify({ [key]: '[value]' }),
    { headers }
  );
  check(res2, {
    '[endpoint-2] status 201': (r) => r.status === 201,
  });
  errorRate.add(res2.status >= 400);
}

Locust Script Skeleton (alternative)

from locust import HttpUser, task, between
import random

class [ServiceName]User(HttpUser):
    wait_time = between([THINK_TIME_MIN], [THINK_TIME_MAX])
    token = None

    def on_start(self):
        """Called once per simulated user — authenticate."""
        user_id = random.randint(1, 10000)
        response = self.client.post("/auth/login", json={
            "username": f"load_test_user_{user_id}@example.com",
            "password": "[LOAD_TEST_PASSWORD]",
        })
        self.token = response.json()["access_token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task([WEIGHT_1])  # Weight = relative frequency
    def [endpoint_1_task](self):
        """[Endpoint 1 description]"""
        with self.client.get(
            "/[endpoint-1]",
            headers=self.headers,
            catch_response=True
        ) as response:
            if response.elapsed.total_seconds() > [LATENCY_THRESHOLD]:
                response.failure(f"Too slow: {response.elapsed.total_seconds()}s")

    @task([WEIGHT_2])
    def [endpoint_2_task](self):
        """[Endpoint 2 description]"""
        self.client.post(
            "/[endpoint-2]",
            json={"[key]": "[value]"},
            headers=self.headers,
        )

Running Tests

# k6 — run baseline scenario
k6 run --env BASE_URL=https://[test-env-url] scripts/load_test.js

# k6 — run stress scenario with output to InfluxDB
k6 run --out influxdb=http://[influxdb-host]:8086/k6 \
  --env SCENARIO=stress \
  scripts/load_test.js

# Locust — headless run
locust -f locustfile.py \
  --headless \
  --users [N] \
  --spawn-rate [N] \
  --run-time 10m \
  --host https://[test-env-url] \
  --csv=results/[run-id]

# Locust — web UI (interactive)
locust -f locustfile.py --host https://[test-env-url]

7. Metrics to Capture

Capture all of the following during every test run. Missing any of these makes result comparison unreliable.

| Metric | Source | Why it matters | |---|---|---| | p50, p95, p99, p999 latency per endpoint | Load tool | SLO validation | | Error rate (4xx, 5xx) per endpoint | Load tool | SLO validation | | Requests/sec (throughput) | Load tool | Capacity baseline | | CPU utilisation (%) | Infra monitoring | Saturation signal | | Memory utilisation (%) | Infra monitoring | Leak detection | | GC pause time / frequency | JVM/Go metrics | Latency spike root cause | | DB connection pool: active/idle/waiting | DB metrics | Pool exhaustion detection | | DB query latency (p99) | DB metrics | Downstream bottleneck | | Cache hit rate | Cache metrics | Miss storm detection | | Pod/instance count (if autoscaling) | Infra | Scaling behaviour | | Network in/out bytes | Infra | Bandwidth saturation |

8. Result Analysis Framework

After each test run, work through this analysis in order:

Step 1 — Pass/fail check Compare all captured metrics against the thresholds in Section 2. Record pass/fail per scenario.

Step 2 — Latency distribution Plot the full latency histogram, not just percentiles. A bimodal distribution (two humps) indicates two distinct code paths — investigate the slow hump.

Step 3 — Error correlation If errors occurred, correlate them with:

Time of occurrence (was it during ramp-up, steady state, or spike?)
Specific endpoint (is it one endpoint or all?)
Infrastructure events (CPU spike, OOM, DB connection exhaustion?)

Step 4 — Saturation analysis Graph CPU, memory, and connection pool over time. If any resource reached 80%+ of capacity, it is a candidate bottleneck — even if SLOs passed this run.

Step 5 — Compare to baseline run Every run should be compared to the previous run. A 10% regression in p99 latency warrants investigation even if it is still within SLO.

Regression classification:

| Change | Classification | Action | |---|---|---| | p99 within 5% of previous run | Green — no regression | No action | | p99 5–15% worse than previous | Yellow — watch | Investigate before next release | | p99 >15% worse than previous | Red — regression | Block release, file ticket | | Error rate increased vs previous | Red — regression | Block release | | SLO threshold breached | Critical | Block release, page on-call |

9. CI Integration

Add load tests as a gated step in the release pipeline. Run the baseline scenario on every release candidate; run all scenarios weekly.

# Example: GitHub Actions step (adapt for your CI platform)
load-test:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  if: github.ref == 'refs/heads/main'
  steps:
    - uses: actions/checkout@v3

    - name: Install k6
      run: |
        curl -s https://dl.k6.io/key.gpg | sudo apt-key add -
        echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
        sudo apt-get update && sudo apt-get install k6

    - name: Seed test data
      run: [seed command]

    - name: Run baseline load test
      run: |
        k6 run \
          --env BASE_URL=${{ secrets.LOAD_TEST_ENV_URL }} \
          --out json=results.json \
          scripts/load_test.js
      env:
        LOAD_TEST_ENV_URL: ${{ secrets.LOAD_TEST_ENV_URL }}

    - name: Check thresholds
      run: |
        # k6 exits with non-zero if any threshold fails — this step fails the build
        echo "k6 threshold check complete"

    - name: Upload results
      uses: actions/upload-artifact@v3
      if: always()
      with:
        name: load-test-results-${{ github.run_id }}
        path: results.json

    - name: Cleanup test data
      if: always()
      run: [cleanup command]

CI gates summary:

Baseline scenario runs on every release to staging
Full scenario suite (stress, spike, soak) runs weekly on a schedule
Any threshold failure blocks promotion to production
Results are archived for trend analysis

Quality Checks

[ ] All key endpoints are covered by at least one test scenario — no production endpoint is untested
[ ] Thresholds are derived from actual SLO targets, not guesses
[ ] Test data seeding is scripted and reproducible — tests do not rely on pre-existing environment state
[ ] The load generator runs on separate infrastructure from the service under test
[ ] CI integration blocks promotion on threshold failure — not just records results
[ ] Soak test has been run at least once to establish a memory and connection pool baseline
[ ] Results comparison to previous run is part of the analysis — not just absolute pass/fail

Anti-Patterns

[ ] Do not set thresholds without grounding them in actual SLO targets or production baselines — arbitrary numbers produce meaningless pass/fail results
[ ] Do not run the load generator on the same host as the service under test — this contaminates both the test results and the service metrics
[ ] Do not use production user data in load test seeding — all test data must be synthetic, tagged, and cleaned up after each run
[ ] Do not skip the soak test on first deployment — only a soak test reveals slow memory leaks and connection pool exhaustion that short tests miss
[ ] Do not treat a passing baseline test as evidence the service handles spikes — baseline, stress, spike, and soak scenarios test fundamentally different failure modes

Load Testing Plan Skill

Required Inputs

Ask for these if not already provided:

Service name and key endpoints — which endpoints are under test (path, method, typical request/response shape)
Current traffic baseline — current requests/sec, p50/p99 latency, error rate under normal load
Peak traffic expectations — expected peak RPS (e.g. 10× baseline for flash sales, or seasonality peak)
SLO targets — latency SLOs (p99 < X ms), error rate SLO (< Y%), availability target
Preferred testing tool — k6, Locust, JMeter, Gatling, or no preference
Test environment availability — dedicated load test environment, staging, or production (with traffic shaping)

Output Format

Load Testing Plan: [Service Name]

1. Objectives and Scope

In scope:

[Endpoint 1: METHOD /path — description]
[Endpoint 2: METHOD /path — description]
[Endpoint 3: METHOD /path — description]

Out of scope:

[Any endpoints explicitly excluded and why — e.g. "admin APIs — low traffic, excluded from load test"]
[Third-party integrations that cannot be load-tested — mock them instead]

2. Performance Targets (Success Criteria)

Every scenario has explicit pass/fail thresholds. A test run FAILS if any threshold is breached.

SLO reference: These thresholds are derived from the service SLOs — p99 < [Z ms], error rate < [0.1]%, availability [99.9]%.

3. Traffic Model

Baseline traffic (current production):

Average RPS: [N] req/sec
Peak RPS (observed): [N] req/sec
Request distribution by endpoint:
- [Endpoint 1]: [X]% of traffic
- [Endpoint 2]: [Y]% of traffic
- [Endpoint 3]: [Z]% of traffic

Simulated user behaviour:

Think time between requests: [X–Y] seconds (randomised)
Session duration: [N] minutes average
Authenticated vs anonymous ratio: [X]%/[Y]%
Geographic distribution: [Region 1 X]%, [Region 2 Y]%

4. Test Scenarios

Scenario 1: Baseline (Steady-State)

Pass criteria: All thresholds in the Baseline column of the targets table above.

Scenario 2: Stress Test

What to record:

RPS at which p99 latency first exceeds SLO
RPS at which error rate first exceeds SLO
Whether the service recovers when load drops back to baseline

Scenario 3: Spike Test

What to record:

Latency during spike and recovery
Whether the service sheds load gracefully (rate limiting, queue depth)
Time to recover to baseline latency after spike ends

Scenario 4: Soak / Endurance Test

What to watch:

Memory usage trend over time (should not grow unboundedly)
Error rate trend (should be flat, not creeping up)
GC pause frequency (JVM/Go services)
Database connection pool utilisation
p99 latency trend (should not creep up over hours)

5. Test Environment Requirements

Infrastructure

Data Seeding

Before every test run, ensure the environment has:

# Seed test users (needed for authenticated endpoint tests)
[seed command or script path — e.g. python scripts/seed_load_test_users.py --count 10000]

# Seed test data for read endpoints
[seed command — e.g. ./scripts/seed_products.sh --count 50000]

# Verify seed completed
[verification command — e.g. psql $DB_URL -c "SELECT COUNT(*) FROM users WHERE load_test=true"]

Test data rules:

Never use real production user data in load tests
Tag all test-generated records with load_test=true for easy cleanup
Run cleanup after each test: [cleanup command]

6. Tooling Setup

k6 Script Skeleton

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('error_rate');
const endpointLatency = new Trend('endpoint_latency', true);

// Test configuration — override per scenario
export const options = {
  scenarios: {
    baseline: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: [BASELINE_VUS] },
        { duration: '8m', target: [BASELINE_VUS] },
        { duration: '1m', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: [
      'p(95)<[Y_MS]',
      'p(99)<[Z_MS]',
    ],
    error_rate: ['rate<0.01'],
    http_req_failed: ['rate<0.01'],
  },
};

// Auth helper — get token once per VU
export function setup() {
  const loginRes = http.post('[BASE_URL]/auth/login', JSON.stringify({
    username: `load_test_user_${Math.floor(Math.random() * 10000)}@example.com`,
    password: '[LOAD_TEST_PASSWORD]',
  }), { headers: { 'Content-Type': 'application/json' } });

  check(loginRes, { 'login ok': (r) => r.status === 200 });
  return { token: loginRes.json('access_token') };
}

export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Endpoint 1: [Description]
  const res1 = http.get('[BASE_URL]/[endpoint-1]', { headers });
  check(res1, {
    '[endpoint-1] status 200': (r) => r.status === 200,
    '[endpoint-1] latency < [X]ms': (r) => r.timings.duration < [X],
  });
  errorRate.add(res1.status >= 400);
  endpointLatency.add(res1.timings.duration, { endpoint: '[endpoint-1]' });

  sleep(Math.random() * [THINK_TIME_MAX] + [THINK_TIME_MIN]);

  // Endpoint 2: [Description]
  const res2 = http.post('[BASE_URL]/[endpoint-2]',
    JSON.stringify({ [key]: '[value]' }),
    { headers }
  );
  check(res2, {
    '[endpoint-2] status 201': (r) => r.status === 201,
  });
  errorRate.add(res2.status >= 400);
}

Locust Script Skeleton (alternative)

from locust import HttpUser, task, between
import random

class [ServiceName]User(HttpUser):
    wait_time = between([THINK_TIME_MIN], [THINK_TIME_MAX])
    token = None

    def on_start(self):
        """Called once per simulated user — authenticate."""
        user_id = random.randint(1, 10000)
        response = self.client.post("/auth/login", json={
            "username": f"load_test_user_{user_id}@example.com",
            "password": "[LOAD_TEST_PASSWORD]",
        })
        self.token = response.json()["access_token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task([WEIGHT_1])  # Weight = relative frequency
    def [endpoint_1_task](self):
        """[Endpoint 1 description]"""
        with self.client.get(
            "/[endpoint-1]",
            headers=self.headers,
            catch_response=True
        ) as response:
            if response.elapsed.total_seconds() > [LATENCY_THRESHOLD]:
                response.failure(f"Too slow: {response.elapsed.total_seconds()}s")

    @task([WEIGHT_2])
    def [endpoint_2_task](self):
        """[Endpoint 2 description]"""
        self.client.post(
            "/[endpoint-2]",
            json={"[key]": "[value]"},
            headers=self.headers,
        )

Running Tests

# k6 — run baseline scenario
k6 run --env BASE_URL=https://[test-env-url] scripts/load_test.js

# k6 — run stress scenario with output to InfluxDB
k6 run --out influxdb=http://[influxdb-host]:8086/k6 \
  --env SCENARIO=stress \
  scripts/load_test.js

# Locust — headless run
locust -f locustfile.py \
  --headless \
  --users [N] \
  --spawn-rate [N] \
  --run-time 10m \
  --host https://[test-env-url] \
  --csv=results/[run-id]

# Locust — web UI (interactive)
locust -f locustfile.py --host https://[test-env-url]

7. Metrics to Capture

Capture all of the following during every test run. Missing any of these makes result comparison unreliable.

8. Result Analysis Framework

After each test run, work through this analysis in order:

Step 1 — Pass/fail check Compare all captured metrics against the thresholds in Section 2. Record pass/fail per scenario.

Step 2 — Latency distribution Plot the full latency histogram, not just percentiles. A bimodal distribution (two humps) indicates two distinct code paths — investigate the slow hump.

Step 3 — Error correlation If errors occurred, correlate them with:

Time of occurrence (was it during ramp-up, steady state, or spike?)
Specific endpoint (is it one endpoint or all?)
Infrastructure events (CPU spike, OOM, DB connection exhaustion?)

Step 4 — Saturation analysis Graph CPU, memory, and connection pool over time. If any resource reached 80%+ of capacity, it is a candidate bottleneck — even if SLOs passed this run.

Step 5 — Compare to baseline run Every run should be compared to the previous run. A 10% regression in p99 latency warrants investigation even if it is still within SLO.

Regression classification:

9. CI Integration

Add load tests as a gated step in the release pipeline. Run the baseline scenario on every release candidate; run all scenarios weekly.

# Example: GitHub Actions step (adapt for your CI platform)
load-test:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  if: github.ref == 'refs/heads/main'
  steps:
    - uses: actions/checkout@v3

    - name: Install k6
      run: |
        curl -s https://dl.k6.io/key.gpg | sudo apt-key add -
        echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
        sudo apt-get update && sudo apt-get install k6

    - name: Seed test data
      run: [seed command]

    - name: Run baseline load test
      run: |
        k6 run \
          --env BASE_URL=${{ secrets.LOAD_TEST_ENV_URL }} \
          --out json=results.json \
          scripts/load_test.js
      env:
        LOAD_TEST_ENV_URL: ${{ secrets.LOAD_TEST_ENV_URL }}

    - name: Check thresholds
      run: |
        # k6 exits with non-zero if any threshold fails — this step fails the build
        echo "k6 threshold check complete"

    - name: Upload results
      uses: actions/upload-artifact@v3
      if: always()
      with:
        name: load-test-results-${{ github.run_id }}
        path: results.json

    - name: Cleanup test data
      if: always()
      run: [cleanup command]

CI gates summary:

Baseline scenario runs on every release to staging
Full scenario suite (stress, spike, soak) runs weekly on a schedule
Any threshold failure blocks promotion to production
Results are archived for trend analysis

Quality Checks

[ ] All key endpoints are covered by at least one test scenario — no production endpoint is untested
[ ] Thresholds are derived from actual SLO targets, not guesses
[ ] Test data seeding is scripted and reproducible — tests do not rely on pre-existing environment state
[ ] The load generator runs on separate infrastructure from the service under test
[ ] CI integration blocks promotion on threshold failure — not just records results
[ ] Soak test has been run at least once to establish a memory and connection pool baseline
[ ] Results comparison to previous run is part of the analysis — not just absolute pass/fail

Anti-Patterns

[ ] Do not set thresholds without grounding them in actual SLO targets or production baselines — arbitrary numbers produce meaningless pass/fail results
[ ] Do not run the load generator on the same host as the service under test — this contaminates both the test results and the service metrics
[ ] Do not use production user data in load test seeding — all test data must be synthetic, tagged, and cleaned up after each run
[ ] Do not skip the soak test on first deployment — only a soak test reveals slow memory leaks and connection pool exhaustion that short tests miss
[ ] Do not treat a passing baseline test as evidence the service handles spikes — baseline, stress, spike, and soak scenarios test fundamentally different failure modes

Adoption

mohitagw15856/load-testing-plan

$ install --global

Security Scan Results

SKILL.md

Load Testing Plan Skill

Required Inputs

Output Format

Load Testing Plan: [Service Name]

1. Objectives and Scope

2. Performance Targets (Success Criteria)

3. Traffic Model

4. Test Scenarios

Scenario 1: Baseline (Steady-State)

Scenario 2: Stress Test

Scenario 3: Spike Test

Scenario 4: Soak / Endurance Test

5. Test Environment Requirements

Infrastructure

Data Seeding

6. Tooling Setup

k6 Script Skeleton

Locust Script Skeleton (alternative)

Running Tests

7. Metrics to Capture

8. Result Analysis Framework

9. CI Integration

Quality Checks

Anti-Patterns

Related Skills

mohitagw15856/win-loss-analysis

mohitagw15856/which-skill

mohitagw15856/vuln-triage

mohitagw15856/voice-of-customer-program

mohitagw15856/load-testing-plan

$ install --global

Security Scan Results

SKILL.md

Load Testing Plan Skill

Required Inputs

Output Format

Load Testing Plan: [Service Name]

1. Objectives and Scope

2. Performance Targets (Success Criteria)

3. Traffic Model

4. Test Scenarios

Scenario 1: Baseline (Steady-State)

Scenario 2: Stress Test

Scenario 3: Spike Test

Scenario 4: Soak / Endurance Test

5. Test Environment Requirements

Infrastructure

Data Seeding

6. Tooling Setup

k6 Script Skeleton

Locust Script Skeleton (alternative)

Running Tests

7. Metrics to Capture

8. Result Analysis Framework

9. CI Integration

Quality Checks

Anti-Patterns

Related Skills

mohitagw15856/win-loss-analysis

mohitagw15856/which-skill

mohitagw15856/vuln-triage

mohitagw15856/voice-of-customer-program