.claude/skills/e2e-test/SKILL.md
---
name: e2e-test
description: Run end-to-end tests for easy-db-lab. Automatically detects what to test based on code changes in the current branch, or allows manual specification of test scope. Use when validating changes, running CI tests, or verifying full system functionality. Runs in background, reports results, and automatically debugs failures.
allowed-tools: Bash, Read, Grep, Glob, Task
argument-hint: [--cassandra|--clickhouse|--opensearch|--spark|--all] [--clean] [--start-step <N>]
dis
Run comprehensive end-to-end tests for easy-db-lab with intelligent test scope detection.
User-provided arguments: $ARGUMENTS
If no arguments are provided, detect what to test based on code changes in the current branch.
If --clean is present in arguments, pass --clean to bin/end-to-end-test.
If --start-step <N> is present in arguments, pass it through to bin/end-to-end-test. This skips steps 1 through N-1 and starts execution at step N. Useful for resuming after a failure or skipping infrastructure setup when a cluster is already running. When --start-step is provided, skip the cluster existence check (Step 1) — a running cluster is assumed.
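The argument handling described above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `passthrough_flags` and `should_skip_cluster_check` are hypothetical helper names, and the sketch assumes the arguments arrive as separate words.

```shell
# Sketch only. passthrough_flags and should_skip_cluster_check are
# hypothetical helper names, not part of easy-db-lab.

# Print the flags that should be forwarded to bin/end-to-end-test, one per line.
passthrough_flags() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --clean)
        echo "--clean"
        ;;
      --start-step)
        # Forward the flag together with its step number, if one was given.
        if [ "$#" -ge 2 ]; then
          echo "--start-step $2"
          shift
        fi
        ;;
    esac
    shift
  done
}

# Succeed when the cluster existence check (Step 1) should be skipped,
# which is the case when either --clean or --start-step was passed.
should_skip_cluster_check() {
  case " $* " in
    *" --clean "*|*" --start-step "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   passthrough_flags --cassandra --start-step 21
#   should_skip_cluster_check --cassandra || echo "run the status check first"
```

Scope flags such as --cassandra are deliberately ignored here; they are chosen separately, either from explicit arguments or from the changed files.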
Current branch: !git branch --show-current
Recent commits on this branch: !git log --oneline main..HEAD 2>/dev/null | head -5 || echo "On main branch or no commits yet"
CRITICAL: Check if an e2e test cluster is already running before starting new tests.
SKIP THIS STEP if --clean was provided. The --clean flag tears down any existing environment, so there is no risk of duplicate infrastructure and no need to check status.
Only perform this check if --clean was NOT provided.
IMPORTANT: Use easy-db-lab status as the sole source of truth for whether infrastructure is running. Do NOT check for state.json — it is unreliable and may exist even after a cluster has been torn down.
# Always check actual infrastructure status — never gate on state.json
status_output=$(easy-db-lab status 2>&1)
if echo "$status_output" | grep -q "Infrastructure: UP"; then
  echo "Infrastructure is UP - active cluster detected"
  echo "CLUSTER_EXISTS=true"
else
  echo "Infrastructure is not UP - safe to proceed with new test"
  echo "CLUSTER_EXISTS=false"
fi
Detection Logic:
- easy-db-lab status shows Infrastructure: UP → Active cluster (do NOT proceed)
- easy-db-lab status shows Infrastructure: DOWN → Cluster is down (proceed with tests)
- easy-db-lab status errors or shows no cluster → No cluster (proceed with tests)

If active cluster exists (CLUSTER_EXISTS=true):
Do NOT re-run the full test. Instead:
Report cluster status:
easy-db-lab status
Check if tests already completed:
Options for user:
- Debug this cluster: /debug-environment
- Tear down and retest: easy-db-lab down --yes, then re-invoke the skill
- Do NOT start a new cluster. This would create duplicate infrastructure and waste resources.
Message to user:
An e2e test cluster is already running in this directory.
Cluster: <name>
Status: <from status command>
Options:
1. Debug this cluster: /debug-environment
2. Resume from specific step: bin/end-to-end-test --start-step <N> --<flags> --no-teardown
3. Tear down and retest: easy-db-lab down --yes, then re-invoke /e2e-test
4. Keep for manual investigation
What would you like to do?
Exit this skill if cluster exists - do not proceed to Step 2.
Only if NO cluster exists - determine what to test.
If the user provided explicit flags (--cassandra, --clickhouse, etc.), use those.
If no flags provided, analyze code changes to determine what to test:
Files changed in current branch (excluding main): !git diff --name-only main...HEAD 2>/dev/null | head -30 || echo "No changes from main"
Based on changed files, determine which test flags to enable:
Cassandra Testing (--cassandra) - Enable if changes affect:
- src/main/kotlin/**/cassandra/**
- src/main/kotlin/**/commands/Cassandra*
- packer/cassandra/**
- src/main/resources/**/cassandra/**

ClickHouse Testing (--clickhouse) - Enable if changes affect:
- src/main/kotlin/**/clickhouse/**
- src/main/kotlin/**/commands/Clickhouse*
- src/main/resources/**/clickhouse/**

OpenSearch Testing (--opensearch) - Enable if changes affect:
- src/main/kotlin/**/opensearch/**
- src/main/kotlin/**/services/aws/OpenSearch*

Spark Testing (--spark) - Enable if changes affect:
- spark/** directory
- src/main/kotlin/**/spark/**
- src/main/kotlin/**/commands/Spark*

Core Infrastructure Testing - Enable Cassandra by default if changes affect:
- src/main/kotlin/**/configuration/** (K8s manifests)
- src/main/kotlin/**/kubernetes/**
- src/main/kotlin/**/providers/** (AWS providers)
- src/main/kotlin/**/commands/Init* or cluster initialization
- packer/base/** (base provisioning)

If on main branch or no detectable changes:
- --cassandra (fastest, covers core functionality)

If many subsystems changed:
- --all to test everything

IMPORTANT: This skill runs tests in non-interactive mode using --no-teardown flag.
Benefits:
The script will:
This allows the skill to analyze failures and invoke debugging automatically.
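Returning to the scope-detection rules listed earlier (changed paths mapped to test flags), one possible way to express them is a small shell function. This is a sketch under stated assumptions: `detect_test_flags` and `MANY_THRESHOLD` are hypothetical names, and treating three or more affected subsystems as "many" is an assumption, since the rules do not give a number.

```shell
# Sketch only. detect_test_flags and MANY_THRESHOLD are hypothetical names.
# Reads changed file paths on stdin (one per line) and prints test flags.
# Note: in a case pattern "*" also matches "/", so "kotlin/*/cassandra/*"
# approximates the "**" globs but needs at least one directory in between.
detect_test_flags() {
  local cassandra=0 clickhouse=0 opensearch=0 spark=0 count flags path
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      src/main/kotlin/*/cassandra/*|src/main/kotlin/*/commands/Cassandra*) cassandra=1 ;;
      packer/cassandra/*|src/main/resources/*/cassandra/*) cassandra=1 ;;
    esac
    case "$path" in
      src/main/kotlin/*/clickhouse/*|src/main/kotlin/*/commands/Clickhouse*) clickhouse=1 ;;
      src/main/resources/*/clickhouse/*) clickhouse=1 ;;
    esac
    case "$path" in
      src/main/kotlin/*/opensearch/*|src/main/kotlin/*/services/aws/OpenSearch*) opensearch=1 ;;
    esac
    case "$path" in
      spark/*|src/main/kotlin/*/spark/*|src/main/kotlin/*/commands/Spark*) spark=1 ;;
    esac
    # Core infrastructure changes enable Cassandra by default.
    case "$path" in
      src/main/kotlin/*/configuration/*|src/main/kotlin/*/kubernetes/*) cassandra=1 ;;
      src/main/kotlin/*/providers/*|src/main/kotlin/*/commands/Init*|packer/base/*) cassandra=1 ;;
    esac
  done

  # Assumption: "many subsystems" means MANY_THRESHOLD or more (default 3).
  count=$((cassandra + clickhouse + opensearch + spark))
  if [ "$count" -ge "${MANY_THRESHOLD:-3}" ]; then
    echo "--all"
    return 0
  fi

  flags=""
  if [ "$cassandra" -eq 1 ]; then flags="$flags --cassandra"; fi
  if [ "$clickhouse" -eq 1 ]; then flags="$flags --clickhouse"; fi
  if [ "$opensearch" -eq 1 ]; then flags="$flags --opensearch"; fi
  if [ "$spark" -eq 1 ]; then flags="$flags --spark"; fi
  # No detectable changes: fall back to the fastest scope.
  if [ -z "$flags" ]; then flags=" --cassandra"; fi
  echo "${flags# }"
}

# Example usage:
#   git diff --name-only main...HEAD | detect_test_flags
```

Each subsystem gets its own `case` statement so that a single path can enable more than one flag.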
The end-to-end test script handles everything automatically:
- Builds the project (./gradlew shadowJar installDist)
- init --clean flag

You don't need to:
The script does all of this. Just run it.
CRITICAL SAFETY RULE:
🚫 NEVER EXECUTE rm COMMANDS. EVER.
This is an ABSOLUTE RULE with no exceptions:
- Never use rm to clean up files
- Never use rm -rf for anything

Why:
- Cleanup is handled by the --clean flag

If you think files need cleanup:
NO EXCEPTIONS. NEVER USE rm.
CRITICAL: Do NOT use the Agent tool to delegate test execution.
Delegating to a subagent traps all test output inside the subagent until the test completes (20+ minutes), making real-time progress reporting impossible. Always run the test directly in the main agent using the Bash tool with run_in_background: true.
Use the Bash tool with run_in_background: true:
bin/end-to-end-test --<flags> --no-teardown 2>&1
The Bash tool result will include an output_file path. Save this path — you will use it to monitor progress in Step 6.
IMMEDIATELY after launching the background Bash:
- Save the output_file path from the Bash tool result
- Set last_reported_step = 0

Always use --no-teardown flag:
# Basic Cassandra test
bin/end-to-end-test --cassandra --no-teardown
# Full test suite
bin/end-to-end-test --all --no-teardown
# Spark + Cassandra
bin/end-to-end-test --spark --cassandra --no-teardown
# ClickHouse only
bin/end-to-end-test --clickhouse --no-teardown
# With custom instance type
EASY_DB_LAB_INSTANCE_TYPE=c5d.4xlarge bin/end-to-end-test --cassandra --no-teardown
# Build AMI first (slow)
bin/end-to-end-test --build --cassandra --no-teardown
# Clean old environment files before starting
bin/end-to-end-test --clean --cassandra --no-teardown
# Skip infra setup, start from step 21 (e.g. Spark steps on existing cluster)
bin/end-to-end-test --start-step 21 --spark --cassandra --no-teardown
Using --start-step:
- Step numbers come from the --list-steps output
- Run bin/end-to-end-test --list-steps to see step numbers
- When --start-step is provided, skip the cluster existence check

Why --no-teardown?

- EASY_DB_LAB_INSTANCE_TYPE - Override default instance type (default: c5d.2xlarge)
- AWS_PROFILE - AWS profile to use (default: sandbox-admin in script)
- EASY_DB_LAB_E2E_AUTO_TEARDOWN - If available, enables non-interactive teardown

Test Duration:
- --cassandra only: ~15-20 minutes
- --clickhouse only: ~10-15 minutes
- --spark --cassandra: ~25-35 minutes
- --opensearch: +10-30 minutes (OpenSearch domain creation is slow)
- --all: ~45-60 minutes
- --build: +30-45 minutes (packer AMI build)

Test Workspace:
- state.json, kubeconfig, sshConfig in project root

Cleanup:
Cost Considerations:
The test runs as a background Bash. You push updates to the user by running a short-lived poller in background, re-launching it on every notification.
After launching the test, immediately launch this as a background Bash:
sleep 10 && grep "^Step\|^FAILED\|step(s) FAILED\|All tests passed" <output_file>
In EVERY response you send — whether triggered by a poll notification OR a user message — your FIRST tool call MUST be to re-launch the poller (unless the test has already finished):
sleep 10 && grep "^Step\|^FAILED\|step(s) FAILED\|All tests passed" <output_file>
Do this BEFORE reading results, BEFORE responding to the user, BEFORE anything else. This is what keeps the loop alive. If you skip this even once, updates stop.
Read the output and write a full cumulative progress report showing every step seen so far. This lets the user see the full picture without scrolling back:
Progress: Step N/TOTAL
✅ Step 1/TOTAL: <step-name>
✅ Step 2/TOTAL: <step-name>
⏭️ Step 3/TOTAL: <step-name> - Skipped
❌ Step 4/TOTAL: <step-name> - FAILED
✅ Step 5/TOTAL: <step-name>
🔄 Step 6/TOTAL: <step-name> - Running...
Use:
Stop re-launching when the output contains All tests passed or step(s) FAILED.
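The stop condition above can be checked mechanically. The following is a minimal sketch, assuming the two completion markers appear verbatim in the output file; `run_state` is a hypothetical helper name.

```shell
# Sketch only. run_state is a hypothetical helper name.
# Prints FAIL, PASS, or RUNNING based on the completion markers in the
# output file given as the first argument.
run_state() {
  if grep -F -q 'step(s) FAILED' "$1"; then
    echo "FAIL"
  elif grep -F -q 'All tests passed' "$1"; then
    echo "PASS"
  else
    # Neither marker seen yet: the poller should be re-launched.
    echo "RUNNING"
  fi
}

# Example usage:
#   [ "$(run_state "$output_file")" = "RUNNING" ] && echo "re-launch the poller"
```

`grep -F` treats the markers as fixed strings, so the parentheses in `step(s)` need no escaping.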
Example Monitoring Flow:
[System notifies: new output available]
→ Check output immediately
→ Find: Steps 6, 7, 8 completed since last check
→ Report to user (full cumulative list):
"Progress: Step 8/40
✅ Step 1/40: Build project
✅ Step 2/40: Check version
✅ Step 3/40: Build packer image
✅ Step 4/40: Set IAM policies
✅ Step 5/40: Initialize cluster
✅ Step 6/40: Setup kubectl
✅ Step 7/40: Wait for K3s
✅ Step 8/40: Verify cluster
🔄 Step 9/40: Verify VPC tags - Running..."
[System notifies: new output available]
→ Check output immediately
→ Find: Step 9 complete, Step 10 started
→ Report to user (full cumulative list):
"Progress: Step 9/40
✅ Step 1/40: Build project
...
✅ Step 9/40: Verify VPC tags
🔄 Step 10/40: List hosts - Running..."
Concrete Execution Timeline:
Message 1: [You start the test]
"Starting e2e tests with --spark. Running in background.
Will report each step as it completes."
→ [System notification arrives: new output]
Message 2: [Full progress report after steps 1-2]
"Progress: Step 2/40
✅ Step 1/40: Build project
✅ Step 2/40: Check version
🔄 Step 3/40: Build packer image - Running..."
→ [System notification arrives: new output]
Message 3: [Full progress report after steps 3-5]
"Progress: Step 5/40
✅ Step 1/40: Build project
✅ Step 2/40: Check version
⏭️ Step 3/40: Build packer image - Skipped
✅ Step 4/40: Set IAM policies
✅ Step 5/40: Initialize cluster
🔄 Step 6/40: Setup kubectl - Running..."
...and so on until test completes
The pattern is simple:
1. Initial Status (when test starts):
Starting end-to-end tests with --spark flag
Test scope: Spark + Cassandra + Core infrastructure
Estimated duration: 25-35 minutes
Running in background, will report each step as it progresses
2. After Every Step — Full Cumulative Report:
After each notification, output the complete list of all steps seen so far:
Progress: Step 5/40
✅ Step 1/40: Build project
✅ Step 2/40: Check version
⏭️ Step 3/40: Build packer image - Skipped
✅ Step 4/40: Set IAM policies
✅ Step 5/40: Initialize cluster
🔄 Step 6/40: Setup SSH config - Running...
This lets the user see all progress at a glance without scrolling back.
3. Failures (shown inline in the cumulative list):
Progress: Step 23/40
✅ Step 1/40: Build project
...
✅ Step 22/40: Deploy VictoriaMetrics
❌ Step 23/40: Test VictoriaMetrics - FAILED: pod not responding on port 8428
🔄 Step 24/40: ... - Running...
Checking logs for details...
4. Long-Running Steps: If a step takes more than 5 minutes, provide an interim update in the same cumulative format, marking the running step with its elapsed time:
🔄 Step 15/40: Wait for Cassandra - Running (5 minutes elapsed)
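The cumulative report format shown above could be generated from the test output along these lines. This is a sketch, not the skill's actual logic: `render_progress` is a hypothetical helper, and it assumes step lines look like `Step N/TOTAL: <name>` and failures are logged as `FAILED: Step N`. How skipped steps appear in the log is not specified here, so the sketch does not mark them.

```shell
# Sketch only. render_progress is a hypothetical helper name.
# Assumes "Step N/TOTAL: <name>" step lines and "FAILED: Step N" failure lines.
render_progress() {
  local file="$1" steps last line num body="" done_label="none" finished=0 nl
  nl='
'
  steps=$(grep -E '^Step [0-9]+/[0-9]+: ' "$file") || return 0
  last=$(printf '%s\n' "$steps" | tail -n 1)
  if grep -Eq 'All tests passed|step\(s\) FAILED' "$file"; then
    finished=1
  fi
  while IFS= read -r line; do
    # Extract N from "Step N/TOTAL: name".
    num=${line#Step }
    num=${num%%/*}
    if grep -Eq "^FAILED: Step ${num}([^0-9]|\$)" "$file"; then
      body="${body}❌ ${line} - FAILED${nl}"
      done_label=${line%%:*}
    elif [ "$line" = "$last" ] && [ "$finished" -eq 0 ]; then
      # The newest step is still running while the run is unfinished.
      body="${body}🔄 ${line} - Running...${nl}"
    else
      body="${body}✅ ${line}${nl}"
      done_label=${line%%:*}
    fi
  done <<EOF
$steps
EOF
  printf 'Progress: %s\n%s' "$done_label" "$body"
}

# Example usage:
#   render_progress "$output_file"
```

The header reports the last completed step rather than the running one, matching the examples above.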
Before sending ANY message to the user, ask yourself:
If you realize you've missed steps:
Example Recovery:
[You realize you last reported step 5, but output shows steps 6-10 exist]
Your next message (full cumulative list):
"Progress: Step 10/40
✅ Step 1/40: Build project
✅ Step 2/40: Check version
...
✅ Step 5/40: Initialize cluster
✅ Step 6/40: Setup kubectl
✅ Step 7/40: Wait for K3s
✅ Step 8/40: Verify cluster
✅ Step 9/40: Verify VPC tags
✅ Step 10/40: List hosts
🔄 Step 11/40: ... - Running..."
Prevention: After sending each message, immediately look for the NEXT system notification.
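Detecting missed steps amounts to comparing last_reported_step with the highest step number present in the output. A minimal sketch, assuming step lines start with `Step N/TOTAL`; `highest_step_seen` and `unreported_steps` are hypothetical helper names.

```shell
# Sketch only. highest_step_seen and unreported_steps are hypothetical names.

# Print the highest step number found in the output file (0 if none).
highest_step_seen() {
  local highest
  highest=$(sed -n -E 's/^Step ([0-9]+)\/[0-9]+.*/\1/p' "$1" | sort -n | tail -n 1)
  echo "${highest:-0}"
}

# Print every step number after last_reported_step ($1) that already
# appears in the output file ($2), one per line.
unreported_steps() {
  local n highest
  highest=$(highest_step_seen "$2")
  n=$(($1 + 1))
  while [ "$n" -le "$highest" ]; do
    echo "$n"
    n=$((n + 1))
  done
}

# Example usage:
#   unreported_steps "$last_reported_step" "$output_file"
```

With last_reported_step at 5 and steps 1 through 10 in the output, this prints 6 through 10, which is the recovery case described above.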
Look for these patterns in the output:
- Step N/TOTAL: <name> → New step starting (report immediately)
- === followed by success message → Step complete
- FAILED: Step N → Step failed (report immediately)
- ========================================== → Step boundary
- All tests passed successfully → All done (success)
- N step(s) FAILED → Test run complete (failure)

ABSOLUTE RULE: Report EVERY step. No exceptions. No batching. No delays.
Failure Pattern 1: "I started the test and then did nothing"
Failure Pattern 2: "I only checked when the user asked"
Failure Pattern 3: "I batched multiple steps together"
Failure Pattern 4: "I didn't realize the test was progressing"
Failure Pattern 5: "I reported some steps but skipped others"
How to Verify You're Monitoring Correctly:
After each message you send, ask yourself:
If any answer is NO, you are failing to monitor correctly.
Output will show:
==========================================
=== All tests passed successfully ===
==========================================
=== --no-teardown specified: skipping teardown ===
Cluster remains running for inspection
Action: Report success to user and ask if they want to tear down the cluster.
Output will show:
==========================================
=== N step(s) FAILED ===
==========================================
=== --no-teardown specified: skipping teardown ===
Cluster remains running for inspection
Exit code will be 1.
The failure log includes:
AUTO-DEBUG on failure:
Invoke the /debug-environment skill to investigate the live cluster. Report:
If a step fails and you want to continue from that point:
# List all steps to find the number
bin/end-to-end-test --list-steps
# Resume from step 15 (after fixing the issue)
bin/end-to-end-test --start-step 15 --cassandra
Pause before specific steps to inspect state:
# Pause before step 10 and step 15
bin/end-to-end-test --break 10,15 --cassandra
# The script will pause and wait for Enter before continuing
After test completion (and debugging if failed), provide a comprehensive summary:
Summary Format:
End-to-End Test Results
=======================
Branch: <branch-name>
Test Scope: <flags used>
Duration: <total time>
Result: <PASS/FAIL>
All Steps:
✅ Step 1/N: <step-name>
✅ Step 2/N: <step-name>
❌ Step 3/N: <step-name> - <brief reason if known>
✅ Step 4/N: <step-name>
... (all steps, every one)
Steps Executed: <total>
Steps Passed: <count>
Steps Failed: <count>
Root Cause (from auto-debug):
<Summary of findings from debug-environment skill>
Recommended Fixes:
1. <Fix from debug analysis>
2. <Fix from debug analysis>
Environment:
- Cluster: <name>
- Region: <region>
- Instance Type: <type>
- Test Directory: <path>
- Cluster Status: RUNNING (not torn down)
Next Steps:
<Recommendations based on results>
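The three step counters in the summary could be derived from the test output roughly as follows. This is a sketch under the same log-format assumptions (`Step N/TOTAL: <name>` lines and `FAILED: Step N` lines); `summarize_counts` is a hypothetical helper, and skipped steps are counted as executed because the log format for skips is not specified here.

```shell
# Sketch only. summarize_counts is a hypothetical helper name.
# Counts step lines and failure lines in the output file ($1).
summarize_counts() {
  local total failed
  # grep -c exits non-zero when the count is 0, so guard with "|| true".
  total=$(grep -cE '^Step [0-9]+/[0-9]+: ' "$1" || true)
  failed=$(grep -cE '^FAILED: Step [0-9]+' "$1" || true)
  echo "Steps Executed: $total"
  echo "Steps Passed: $((total - failed))"
  echo "Steps Failed: $failed"
}

# Example usage:
#   summarize_counts "$output_file"
```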
If all tests passed:
Tear down cluster to avoid AWS charges:
cd /Users/jhaddad/dev/easy-db-lab
source env.sh
easy-db-lab down --yes
Commit changes if on a feature branch
Create pull request if ready
If tests failed:
Review debug findings - The debug-environment skill has identified issues
Choose action:
If fixing code:
# Make fixes based on debug recommendations
# Rebuild
./gradlew clean shadowJar
# Resume from failed step (if possible)
bin/end-to-end-test --start-step <N> --<database> --no-teardown
# Or run full test again
bin/end-to-end-test --<database> --no-teardown
If manual investigation needed:
# SSH to nodes
ssh -F sshConfig control
ssh -F sshConfig db-0
# Check K8s resources
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get pods -A
kubectl logs <pod> -n <namespace>
# Use easy-db-lab commands
easy-db-lab status
easy-db-lab logs query --since 1h
Always tear down when done:
easy-db-lab down --yes
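To avoid tearing down something that is not there, the teardown can be gated on the same status check used at the start of the skill. A minimal sketch: `teardown_if_up` and the `EDL_BIN` override variable are hypothetical, introduced only so the command can be swapped out for testing.

```shell
# Sketch only. teardown_if_up and EDL_BIN are hypothetical names.
# EDL_BIN lets the command be replaced, e.g. by a stub in tests.
EDL_BIN="${EDL_BIN:-easy-db-lab}"

# Tear the cluster down only when status reports the infrastructure as UP.
teardown_if_up() {
  if "$EDL_BIN" status 2>&1 | grep -q "Infrastructure: UP"; then
    "$EDL_BIN" down --yes
  else
    echo "Infrastructure is not UP; nothing to tear down"
  fi
}

# Example usage:
#   teardown_if_up
```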
IMPORTANT: The cluster continues running and accumulating AWS charges until torn down.
If not actively investigating:
Current approximate costs (per hour):
# Test only what changed in your branch
# (detected automatically)
/e2e-test
# Override detection - test specific systems
/e2e-test --cassandra --clickhouse
# Full regression test
/e2e-test --all
# Build new AMI first (for packer changes)
/e2e-test --build --cassandra
# Clean old environment files before starting
/e2e-test --clean --cassandra
- ./gradlew test
- ./gradlew detekt ktlintCheck
- /e2e-test (auto-detects scope)
- /debug-environment if needed

For CI/CD pipelines, the test script should support:
# Non-interactive mode (when available)
EASY_DB_LAB_E2E_AUTO_TEARDOWN=1 bin/end-to-end-test --cassandra
# Or use expect/timeout for current version
echo "yes" | bin/end-to-end-test --cassandra
- bin/end-to-end-test
- bin/end-to-end-test --list-steps
- /debug-environment skill for investigation
- docs/ directory
- CLAUDE.md files

Based on the user's request:
Be clear about: