.claude/skills/testing-validator/SKILL.md
Comprehensive testing validation for Claude Code skills through functional testing, example validation, integration testing, regression testing, and edge case testing. Task-based testing operations with automated example execution, manual scenario testing, and test reporting. Use when testing skill functionality, validating examples execute correctly, ensuring integration works, preventing regressions, or conducting comprehensive functional quality assurance.
npx skillsauth add adaptationio/skrillz testing-validator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
testing-validator provides comprehensive functional testing for Claude Code skills, validating that skills actually work correctly in practice through systematic testing operations.
Purpose: Functional validation - ensure skills work correctly, not just look good
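The 5 Testing Operations:
1. Functional Testing
2. Example Validation
3. Integration Testing
4. Regression Testing
5. Edge Case Testing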
Complement to review-multi:
Key Benefits:
Use testing-validator when:
Operation 1: Functional Testing
Purpose: Validate that core skill functionality works as intended
When to Use This Operation:
Automation Level: 30% automated (script checks), 70% manual (scenario execution)
Process:
Select Test Scenarios
Execute Scenarios
Validate Outputs
Check Error Handling
Assess Functionality
Validation Checklist:
Test Results:
Outputs:
Time Estimate: 30-90 minutes
Example:
Functional Testing: skill-researcher
====================================
Test Scenarios:
1. Primary: Research GitHub API integration patterns
2. Common: Research for skill development planning
3. Edge: Research with no results found
Scenario 1: GitHub API Integration Research
- Executed: Operation 2 (GitHub Repository Research)
- Result: ✅ SUCCESS
- Time: 25 minutes
- Output: Found 5 repositories, extracted patterns
- Functionality: Achieved purpose (research complete)
Scenario 2: Skill Development Research
- Executed: All 5 operations (Web, GitHub, Docs, Synthesis)
- Result: ✅ SUCCESS
- Time: 60 minutes
- Output: Research synthesis with 4 sources, 3 patterns
- Functionality: Fully achieved purpose
Scenario 3: No Results Edge Case
- Executed: Web search for obscure topic
- Result: ✅ HANDLED
- Time: 10 minutes
- Output: "No results found" with guidance to adjust search
- Error Handling: Good (helpful message, suggests alternatives)
Overall Functional Test: ✅ PASS
- All scenarios succeeded
- Functionality complete
- Error handling works
- Achieves stated purpose
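The scenario results above can be recorded and rolled up programmatically. The following is a minimal sketch, not part of the skill's own scripts: `ScenarioResult` and `overall_functional_result` are hypothetical names, and treating an empty scenario list as a failure is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScenarioResult:
    """One manually executed scenario (hypothetical record type)."""
    name: str
    kind: str        # "primary", "common", or "edge"
    succeeded: bool
    minutes: int


def overall_functional_result(results: List[ScenarioResult]) -> str:
    """Return "PASS" only if every recorded scenario succeeded."""
    if not results:
        return "FAIL"  # assumption: having tested nothing is not a pass
    return "PASS" if all(r.succeeded for r in results) else "FAIL"


# Values taken from the skill-researcher example above.
results = [
    ScenarioResult("GitHub API integration research", "primary", True, 25),
    ScenarioResult("Skill development research", "common", True, 60),
    ScenarioResult("No results found", "edge", True, 10),
]

print(overall_functional_result(results))
print(sum(r.minutes for r in results), "minutes total")
```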
Operation 2: Example Validation
Purpose: Verify that all code and command examples in the skill documentation execute correctly
When to Use This Operation:
Automation Level: 80% automated (example extraction and execution)
Process:
Extract All Examples
Categorize Examples
Execute Examples Automatically
python3 scripts/validate-examples.py /path/to/skill
Manual Validation (for non-automatable):
Generate Example Report
Validation Checklist:
Test Results:
Outputs:
Time Estimate: 20-45 minutes (mostly automated)
Example:
Example Validation: review-multi
=================================
Extraction Results:
- Total examples: 18
- Executable (bash): 12
- Executable (python): 3
- Informational (YAML): 3
Automated Execution:
Bash Examples (12 total):
✅ PASS: python3 scripts/validate-structure.py <path> (3 instances)
✅ PASS: python3 scripts/check-patterns.py <path>
✅ PASS: python3 scripts/generate-review-report.py <file>
✅ PASS: python3 scripts/review-runner.py <path>
⚠️ WARNING: Example uses placeholder <path> - works with substitution
- Success Rate: 12/12 (100%)
Python Examples (3 total):
✅ PASS: All 3 syntax-valid, execute correctly
- Success Rate: 3/3 (100%)
Manual Validation (3 YAML examples):
✅ PASS: All YAML examples valid syntax
✅ PASS: Frontmatter examples follow standards
Overall Example Validation: ✅ PASS
- Success Rate: 100% (18/18 examples work)
- Minor Note: Some examples use placeholders (acceptable with clear notes)
Recommendation: Examples excellent, all functional
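The extraction and categorization steps can be sketched as follows. This is an illustration only and may differ from what `validate-examples.py` actually does: the function names, the regular expression, and the set of languages treated as executable are all assumptions.

```python
import re
from collections import Counter

# Built indirectly so this sample can itself sit inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(
    rf"^{FENCE}([\w+-]*)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

# Assumption: these fence languages count as executable examples.
EXECUTABLE = {"bash", "sh", "shell", "python"}


def extract_examples(markdown):
    """Return (language, code) pairs for every fenced block in the text."""
    return [
        (m.group(1).lower() or "text", m.group(2))
        for m in FENCE_RE.finditer(markdown)
    ]


def categorize(examples):
    """Count executable versus informational examples."""
    counts = Counter()
    for lang, _ in examples:
        counts["executable" if lang in EXECUTABLE else "informational"] += 1
    return counts


sample = "\n".join([
    "Intro text",
    FENCE + "bash",
    "python3 scripts/validate-examples.py /path/to/skill",
    FENCE,
    "More text",
    FENCE + "yaml",
    "name: testing-validator",
    FENCE,
])

examples = extract_examples(sample)
print(categorize(examples))
```

Executing the extracted blocks is deliberately left out of the sketch; running arbitrary documentation snippets should happen in a sandbox or throwaway directory.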
Operation 3: Integration Testing
Purpose: Test that skills work correctly with other skills, especially in workflows and compositions
When to Use This Operation:
Automation Level: 20% automated (dependency checking), 80% manual (actual integration testing)
Process:
Identify Integration Points
Test Skill Dependencies
Test Workflow Compositions
Test Integration Examples
Assess Integration Quality
Validation Checklist:
Test Results:
Outputs:
Time Estimate: 30-90 minutes (varies by integration complexity, N/A for standalone skills)
Example:
Integration Testing: development-workflow
==========================================
Integration Type: Workflow Composition (5 component skills)
Dependencies Identified:
1. skill-researcher (Step 1)
2. planning-architect (Step 2)
3. task-development (Step 3, optional)
4. prompt-builder (Step 4)
5. todo-management (Step 5)
Integration Test Execution:
Step 1 → Step 2 Integration:
- Input to Step 2: research-synthesis.md from Step 1
- Test: Create research synthesis, feed to planning-architect
- Result: ✅ PASS (planning-architect correctly uses research findings)
- Data Flow: Smooth (outputs match expected inputs)
Step 2 → Step 3 Integration:
- Input to Step 3: skill-architecture-plan.md from Step 2
- Test: Create architecture plan, feed to task-development
- Result: ✅ PASS (task-development breaks down plan correctly)
- Data Flow: Smooth
Step 3 → Step 4 Integration:
- Input to Step 4: task-breakdown.md from Step 3
- Test: Create task breakdown, feed to prompt-builder
- Result: ✅ PASS (prompt-builder creates prompts for tasks)
- Data Flow: Smooth
Step 4 → Step 5 Integration:
- Input to Step 5: prompts-collection.md from Step 4
- Test: Create prompts, feed to todo-management
- Result: ✅ PASS (todo-management creates todos from tasks)
- Data Flow: Smooth
Workflow Execution Test:
- Executed: Complete workflow (all 5 steps)
- Result: ✅ SUCCESS (produced complete skill planning artifacts)
- Time: 4.5 hours (as documented)
- Quality: High (artifacts complete and usable)
Overall Integration Test: ✅ PASS
- All 5 integrations work smoothly
- Data flows correctly between steps
- Workflow achieves stated purpose
- No integration issues found
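The step-to-step data-flow checks above lend themselves to a small automated pre-check. The sketch below assumes each workflow step can be described by the artifact it consumes and the artifact it produces; the `Step` type and `find_handoff_mismatches` function are hypothetical, while the artifact names come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical description of one workflow step."""
    skill: str
    consumes: Optional[str]
    produces: Optional[str]


def find_handoff_mismatches(steps: List[Step]) -> List[str]:
    """List adjacent step pairs whose output does not match the next input."""
    mismatches = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.produces != downstream.consumes:
            mismatches.append(f"{upstream.skill} -> {downstream.skill}")
    return mismatches


workflow = [
    Step("skill-researcher", None, "research-synthesis.md"),
    Step("planning-architect", "research-synthesis.md", "skill-architecture-plan.md"),
    Step("task-development", "skill-architecture-plan.md", "task-breakdown.md"),
    Step("prompt-builder", "task-breakdown.md", "prompts-collection.md"),
    Step("todo-management", "prompts-collection.md", None),
]

print(find_handoff_mismatches(workflow) or "All handoffs line up")
```

A check like this only confirms that artifact names line up. Whether the downstream skill actually uses the content correctly still needs the manual testing described above.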
Operation 4: Regression Testing
Purpose: Ensure updates don't break existing functionality
When to Use This Operation:
Automation Level: 60% automated (comparison, example re-execution), 40% manual
Process:
Establish Baseline
Apply Changes
Re-Run Tests
Compare Results
Identify Regressions
Validation Checklist:
Test Results:
Outputs:
Time Estimate: 30-60 minutes
Example:
Regression Testing: planning-architect (after Quick Ref addition)
==================================================================
Baseline (Before Quick Reference):
- Structure validation: 5/5 (PASS)
- Example count: 8 examples
- All examples: Execute successfully
- Scenarios tested: 2 scenarios (both PASS)
Changes Applied:
- Added Quick Reference section (96 lines)
- Added tables, checklists, decision tree
Re-Run Tests (After Quick Reference):
- Structure validation: 5/5 (PASS) ✅ No regression
- Example count: 8 examples ✅ No change
- All examples: Execute successfully ✅ No regression
- Scenarios tested: 2 scenarios (both PASS) ✅ No regression
- NEW: Quick Reference detected ✅ Improvement
Comparison:
✅ All baseline tests still pass (no regressions)
✅ New functionality added (Quick Reference)
✅ Quality maintained (5/5 score)
Overall Regression Test: ✅ PASS (No Regressions)
Additional: ✅ IMPROVEMENT (Quick Reference added)
Recommendation: Changes safe to deploy
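The before/after comparison can be sketched as a dictionary diff. This assumes test results are stored as a mapping from check name to pass/fail; the check names and both function names are hypothetical, chosen to mirror the planning-architect example above.

```python
def find_regressions(baseline, rerun):
    """Checks that passed in the baseline but now fail or are missing."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not rerun.get(name, False)
    )


def find_additions(baseline, rerun):
    """Checks present in the re-run that the baseline did not have."""
    return sorted(name for name in rerun if name not in baseline)


baseline = {
    "structure_validation": True,
    "examples_execute": True,
    "scenario_1": True,
    "scenario_2": True,
}
# After the change: same checks, plus one new check for the added section.
rerun = dict(baseline, quick_reference_detected=True)

regressions = find_regressions(baseline, rerun)
print("PASS (No Regressions)" if not regressions else f"FAIL: {regressions}")
print("New checks:", find_additions(baseline, rerun))
```

Treating a check that disappears from the re-run as a regression is a design choice: a test that silently stops running is as risky as one that fails.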
Operation 5: Edge Case Testing
Purpose: Test that the skill handles unusual scenarios, boundary conditions, and edge cases correctly
When to Use This Operation:
Automation Level: 30% automated (known edge case checks), 70% manual (scenario thinking)
Process:
Identify Edge Cases
Design Edge Case Tests
Execute Edge Case Tests
Evaluate Handling
Document Edge Case Behavior
Validation Checklist:
Test Results:
Outputs:
Time Estimate: 30-90 minutes
Example:
Edge Case Testing: todo-management
===================================
Edge Cases Identified:
1. Empty task list (initialize with 0 tasks)
2. Single task (minimal usage)
3. 100+ tasks (maximum usage)
4. Starting non-existent task
5. Completing already completed task
Edge Case Tests:
Test 1: Empty Task List
- Scenario: Initialize with empty list
- Execution: todo-management Operation 1 with 0 tasks
- Result: ✅ PASS (handles gracefully, shows empty state)
- Error: None
Test 2: Single Task
- Scenario: List with 1 task only
- Execution: Complete workflow on 1 task
- Result: ✅ PASS (works correctly, minimal case handled)
Test 3: 100 Tasks
- Scenario: Large task list
- Execution: Report progress on 100-task list
- Result: ✅ PASS (handles large lists, performance acceptable)
- Note: Report generation ~5 seconds (good)
Test 4: Non-Existent Task
- Scenario: Start task #999 (doesn't exist)
- Execution: Operation 2 (Start Task 999)
- Result: ✅ PASS (clear error: "Task 999 not found")
- Error Handling: Excellent (specific error message)
Test 5: Double Complete
- Scenario: Complete task #5 twice
- Execution: Operation 3 twice on same task
- Result: ✅ PASS (second attempt shows "Already completed")
- Error Handling: Good (informative message)
Overall Edge Case Test: ✅ PASS
- All critical edge cases handled correctly
- Error messages clear and helpful
- No crashes or undefined behavior
- Graceful handling of unusual scenarios
Recommendation: Edge case handling excellent
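For skills backed by code, the five edge cases above translate directly into assertions. The sketch below uses a hypothetical in-memory `TaskList` as a stand-in; it is not the todo-management skill itself, and only the two error messages are taken from the example above.

```python
class TaskList:
    """Hypothetical in-memory stand-in, not the todo-management skill."""

    def __init__(self, titles):
        # Task IDs start at 1, matching the "task #5" style used above.
        self.status = {i: "pending" for i, _ in enumerate(titles, start=1)}

    def start(self, task_id):
        if task_id not in self.status:
            return f"Task {task_id} not found"
        self.status[task_id] = "in_progress"
        return "Started"

    def complete(self, task_id):
        if task_id not in self.status:
            return f"Task {task_id} not found"
        if self.status[task_id] == "completed":
            return "Already completed"
        self.status[task_id] = "completed"
        return "Completed"

    def progress(self):
        done = sum(1 for s in self.status.values() if s == "completed")
        return f"{done}/{len(self.status)} completed"


def double_complete():
    """Complete task #5 twice and return the second response."""
    tasks = TaskList(["write docs"] * 5)
    tasks.complete(5)
    return tasks.complete(5)


edge_cases = {
    "empty list": TaskList([]).progress() == "0/0 completed",
    "single task": TaskList(["only task"]).complete(1) == "Completed",
    "large list": TaskList(["t"] * 150).progress() == "0/150 completed",
    "non-existent task": TaskList(["a"]).start(999) == "Task 999 not found",
    "double complete": double_complete() == "Already completed",
}

for name, handled in edge_cases.items():
    print(("PASS" if handled else "FAIL") + ": " + name)
```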
Mode: Comprehensive
Purpose: Complete functional validation across all 5 operations
When to Use:
Process:
Time Estimate: 2-4 hours
Output: Complete test report with PASS/FAIL for deployment
Mode: Quick Check
Purpose: Fast functional validation (examples only)
When to Use:
Process:
Time Estimate: 15-30 minutes (automated)
Output: Example validation results
Mode: Custom
Purpose: Select specific operations based on needs
When to Use:
Process:
Practice: Run Quick Check during development, Comprehensive before deployment
Rationale: Early testing catches issues before they compound
Application: Quick Check daily, Comprehensive pre-deploy
Practice: Use automated example validation (validate-examples.py)
Rationale: 80% automated, fast, catches broken examples instantly
Application: Run after any example changes
Practice: Use actual use cases for functional testing
Rationale: Real scenarios reveal issues documentation review misses
Application: Test scenarios from "When to Use" section
Practice: Always run regression tests after skill changes
Rationale: Prevents breaking existing functionality with improvements
Application: Before/after comparison for all updates
Practice: Save test reports for comparison over time
Rationale: Track testing trends, identify patterns
Application: Generate test report for each comprehensive test
Practice: Don't deploy with broken examples
Rationale: Broken examples destroy user confidence
Application: Example validation must PASS before deploy
Symptom: Users report broken examples after deployment
Cause: Not testing examples before release
Fix: Run Operation 2 (Example Validation) before every deployment
Prevention: Make example validation mandatory in deployment checklist
Symptom: Skills break with unusual inputs or edge cases
Cause: Not testing edge cases
Fix: Run Operation 5 (Edge Case Testing)
Prevention: Include edge case testing in comprehensive mode
Symptom: Updates break previously working functionality
Cause: Not testing before/after updates
Fix: Run Operation 4 (Regression Testing) after changes
Prevention: Make regression testing mandatory for all updates
Symptom: Workflow skills break when actually composing other skills
Cause: Testing skills individually, not integrated
Fix: Run Operation 3 (Integration Testing) for workflow skills
Prevention: Always test integrations for workflow/composition skills
Symptom: Testing takes too long, often skipped
Cause: Not using automation
Fix: Use validate-examples.py for automated example checking
Prevention: Automate where possible (examples, scripts, structure)
| Operation | Focus | Automation | Time | Pass Criteria |
|-----------|-------|------------|------|---------------|
| Functional | Core functionality works | 30% | 30-90m | Scenarios succeed |
| Example Validation | Examples execute correctly | 80% | 20-45m | ≥90% examples work |
| Integration | Skills work together | 20% | 30-90m | Integrations smooth |
| Regression | Updates don't break functionality | 60% | 30-60m | No new failures |
| Edge Case | Handles unusual scenarios | 30% | 30-90m | Critical edge cases handled |

| Mode | Time | Operations | Use Case |
|------|------|------------|----------|
| Comprehensive | 2-4h | All 5 operations | Pre-deployment, certification |
| Quick Check | 15-30m | Example validation only | During development |
| Custom | Variable | Selected operations | Targeted testing |

| Result | Meaning | Action |
|--------|---------|--------|
| PASS | All tests successful | Deploy with confidence |
| PARTIAL | Some issues, not critical | Fix issues, re-test, then deploy |
| FAIL | Critical issues | Fix before deployment |
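
The result table and the ≥90% example threshold can be expressed as two small functions. This is a sketch under assumptions: the function names are hypothetical, and the document does not say how an issue is judged critical, so that flag is supplied by the tester.

```python
def classify(issues):
    """Map a list of (description, is_critical) issues to a result label."""
    if not issues:
        return "PASS"
    if any(is_critical for _, is_critical in issues):
        return "FAIL"
    return "PARTIAL"


def examples_meet_threshold(passed, total, threshold=0.90):
    """True when at least `threshold` of the examples work."""
    return total > 0 and passed / total >= threshold


print(classify([]))
print(classify([("Example uses placeholder <path>", False)]))
print(classify([("Workflow step crashes", True)]))
print(examples_meet_threshold(18, 18))
```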
Use Both for Complete Validation:
review-multi (quality) + testing-validator (functionality) = Complete Validation
review-multi: Is it good? (structure, content, patterns, usability)
testing-validator: Does it work? (functional, examples, integration)
Together: Ready to deploy? (quality + functionality validated)
# Validate all examples automatically
python3 scripts/validate-examples.py /path/to/skill
# Run comprehensive test suite
python3 scripts/test-runner.py /path/to/skill --mode comprehensive
# Generate test report
python3 scripts/generate-test-report.py test-results.json --output report.md
testing-validator ensures skills work correctly through comprehensive functional testing, example validation, integration testing, regression testing, and edge case validation.