skills/test-standards/SKILL.md
Test quality validation detecting mesa-optimization, happy-path bias, vacuous assertions, and error-swallowing anti-patterns. Use when reviewing test files for quality issues, evaluating test meaningfulness, or ensuring tests validate behavior rather than passing trivially. Supports JavaScript, TypeScript, and Python test frameworks.
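Validates test quality through automated scanning and LLM-guided heuristics to detect mesa-optimization, happy-path bias, vacuous assertions, error swallowing, and async tests missing expect.assertions(n) protection.

Execute test-standards during QA validation: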
Run the scanner script first for fast, deterministic detection:

Scan specific test files:
```bash
python scripts/test_quality_scanner.py tests/auth.test.ts tests/api.test.ts --format json
```

Scan a test directory:
```bash
python scripts/test_quality_scanner.py ./tests --format json
```

Save the report to a file:
```bash
python scripts/test_quality_scanner.py ./tests --output test-quality-report.json
```

Exclude patterns:
```bash
python scripts/test_quality_scanner.py ./tests --exclude node_modules .snapshots --format json
```
Parse JSON output and evaluate findings:
Finding Structure:
```json
{
  "category": "no_assertions|tautological_assertion|vacuous_check|mock_only_validation|...",
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "file": "tests/auth.test.ts",
  "test_name": "validates user login",
  "line": 42,
  "pattern": "regex pattern or description",
  "context": "relevant code snippet",
  "message": "human-readable description"
}
```
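The following minimal sketch loads one finding into a typed object. It assumes the scanner emits exactly the keys shown above. The `Finding` class and the `parse_finding` helper are hypothetical illustrations, not part of scripts/test_quality_scanner.py. The sample values are also illustrative.

```python
import json
from dataclasses import dataclass, fields

# Severity values listed in the finding structure above.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


@dataclass(frozen=True)
class Finding:
    """Hypothetical typed view of one scanner finding."""

    category: str
    severity: str
    file: str
    test_name: str
    line: int
    pattern: str
    context: str
    message: str

    @property
    def location(self) -> str:
        # Same "file:line" form the QA report template uses.
        return f"{self.file}:{self.line}"


def parse_finding(raw: dict) -> Finding:
    """Build a Finding from one JSON object, rejecting unknown severities."""
    if raw.get("severity") not in SEVERITIES:
        raise ValueError(f"unknown severity: {raw.get('severity')!r}")
    # Copy only the documented keys; any extra keys are ignored.
    return Finding(**{f.name: raw[f.name] for f in fields(Finding)})


# Illustrative input shaped like the finding structure above.
sample = json.loads(
    """{
  "category": "no_assertions",
  "severity": "CRITICAL",
  "file": "tests/auth.test.ts",
  "test_name": "validates user login",
  "line": 42,
  "pattern": "no expect() call",
  "context": "await login(user);",
  "message": "Test executes code but makes no assertions"
}"""
)
finding = parse_finding(sample)
print(finding.location)  # tests/auth.test.ts:42
```

Rejecting unknown severities early keeps a malformed report from silently landing in the wrong section of the QA report.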
Report Structure:
```json
{
  "files_scanned": 15,
  "total_tests": 128,
  "total_findings": 12,
  "findings_by_severity": {
    "CRITICAL": 2,
    "HIGH": 5,
    "MEDIUM": 4,
    "LOW": 1
  },
  "findings": [...],
  "test_stats": [
    {
      "file": "tests/auth.test.ts",
      "total_tests": 25,
      "tests_with_assertions": 23,
      "tests_without_assertions": 2,
      "async_tests_without_protection": 3,
      "error_tests": 5,
      "success_tests": 18,
      "avg_assertions_per_test": 3.2
    }
  ],
  "summary": {
    "no_assertions": 2,
    "tautological_assertions": 1,
    "vacuous_checks": 3,
    "mock_only_validation": 2,
    "missing_assertion_count": 3,
    "error_swallowing": 1,
    "low_assertion_density": 0,
    "happy_path_bias": 1
  }
}
```
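The percentages in the QA report template below appear to be simple ratios over test_stats. For tests/auth.test.ts, 2 of 25 tests without assertions is 8%, 3 of 25 unprotected async tests is 12%, and 5 of 25 error tests is 20%. The minimal sketch below assumes the report layout above. It buckets findings by severity and derives those percentages. The helper names are hypothetical, and the findings list in the usage data is illustrative.

```python
from collections import defaultdict

SEVERITY_ORDER = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def group_by_severity(report: dict) -> dict:
    """Bucket findings by severity, most severe first, skipping empty buckets."""
    buckets = defaultdict(list)
    for finding in report["findings"]:
        buckets[finding["severity"]].append(finding)
    return {sev: buckets[sev] for sev in SEVERITY_ORDER if buckets[sev]}


def pct(part: int, whole: int) -> str:
    """Whole-number percentage, or 'n/a' when there is nothing to divide by."""
    return f"{round(100 * part / whole)}%" if whole else "n/a"


def file_ratios(stats: dict) -> dict:
    """Derive the per-file percentages reported under Test Statistics."""
    total = stats["total_tests"]
    return {
        "without_assertions": pct(stats["tests_without_assertions"], total),
        "async_without_protection": pct(stats["async_tests_without_protection"], total),
        "error_test_coverage": pct(stats["error_tests"], total),
    }


# test_stats numbers copied from the example report; findings are illustrative.
report = {
    "findings": [
        {"severity": "HIGH", "test_name": "logs in"},
        {"severity": "CRITICAL", "test_name": "validates user login"},
    ],
    "test_stats": [
        {"file": "tests/auth.test.ts", "total_tests": 25,
         "tests_without_assertions": 2, "async_tests_without_protection": 3,
         "error_tests": 5},
    ],
}
print(list(group_by_severity(report)))  # ['CRITICAL', 'HIGH']
print(file_ratios(report["test_stats"][0]))
# {'without_assertions': '8%', 'async_without_protection': '12%', 'error_test_coverage': '20%'}
```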
Severity Guidelines:
After automated scanning, apply judgment for patterns the script cannot detect:
Compare test changes in git diff to detect semantic weakening:
Indicators of Weakening:
Example - Weakened Test:
```javascript
// Before (git diff -)
expect(response.data).toMatchObject({
  id: expect.any(String),
  status: 'active',
  count: expect.any(Number)
});

// After (git diff +)
expect(response.data).toBeDefined(); // ❌ Lost specificity
```
Action: Flag as HIGH severity and request that the Action Agent restore the specific assertions.
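The rough sketch below shows how this kind of weakening could be spotted mechanically, assuming a unified diff as input. It counts value-pinning matchers on removed lines and on added lines. It flags the diff when specific matchers disappear while existence-only matchers appear. The matcher lists and the `weakened_assertions` name are hypothetical and far from exhaustive, and the file path in the sample diff is illustrative. Treat a hit as a prompt for the heuristic review, not a verdict.

```python
import re

# Matchers that pin down an actual value (hypothetical, non-exhaustive list).
SPECIFIC = re.compile(
    r"\.(?:toEqual|toStrictEqual|toMatchObject|toBe|toHaveBeenCalledWith|toThrow)\("
)
# Matchers that only check that something exists.
VACUOUS = re.compile(r"\.(?:toBeDefined|toBeTruthy|not\.toBeNull|not\.toBeUndefined)\(\)")


def weakened_assertions(diff: str) -> bool:
    """True if a diff removes specific matchers and adds existence-only ones."""
    removed_specific = added_specific = added_vacuous = 0
    for line in diff.splitlines():
        if line.startswith(("--- ", "+++ ")):
            continue  # file headers, not content
        if line.startswith("-"):
            removed_specific += len(SPECIFIC.findall(line))
        elif line.startswith("+"):
            added_specific += len(SPECIFIC.findall(line))
            added_vacuous += len(VACUOUS.findall(line))
    return removed_specific > added_specific and added_vacuous > 0


diff = """\
--- a/tests/api.test.ts
+++ b/tests/api.test.ts
-    expect(response.data).toMatchObject({
-      id: expect.any(String),
-      status: 'active',
-      count: expect.any(Number)
-    });
+    expect(response.data).toBeDefined();
"""
print(weakened_assertions(diff))  # True
```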
Verify test changes align with issue scope:
Legitimate Changes:
Suspicious Changes:
Action: Request clarification or justification in scratch notes.
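One way to make the scope check concrete, sketched under assumptions, has two steps. First, list the test files the branch changed with `git diff --name-only`. Second, flag any that fall outside the path prefixes the issue is expected to touch. The base branch `main`, the test-file naming rules, and all helper names are hypothetical, and the usage paths are illustrative. An out-of-scope hit is a reason to ask for justification, not proof of a problem.

```python
import os
import subprocess


def is_test_file(path: str) -> bool:
    """Hypothetical naming rules covering common JS/TS and Python test files."""
    name = os.path.basename(path)
    return (
        ".test." in name
        or ".spec." in name
        or name.startswith("test_")
        or name.endswith("_test.py")
    )


def changed_test_files(base: str = "main") -> list:
    """Test files changed between `base` and HEAD (requires a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if is_test_file(p)]


def out_of_scope(changed: list, scope_prefixes: list) -> list:
    """Changed test files that match none of the in-scope path prefixes."""
    return [p for p in changed if not any(p.startswith(s) for s in scope_prefixes)]


# Illustrative paths; in practice pass changed_test_files() instead.
changed = ["tests/auth.test.ts", "tests/billing.test.ts", "src/auth.ts"]
tests_only = [p for p in changed if is_test_file(p)]
print(out_of_scope(tests_only, ["tests/auth"]))  # ['tests/billing.test.ts']
```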
Identify functions/features lacking tests:
Review Approach:
Action: Request additional tests for uncovered critical functionality.
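For Python code, a crude first pass at finding gaps is to list public top-level functions whose names never appear in the test source. This is a hypothetical name-matching heuristic built on the standard `ast` module, not real coverage measurement. A name that appears only in an import still counts as tested, so the heuristic narrows where to look rather than proving anything. The `untested_functions` name and the sample sources are illustrative.

```python
import ast
import re


def untested_functions(module_source: str, test_source: str) -> list:
    """Public top-level functions never mentioned by name in the tests."""
    tree = ast.parse(module_source)
    public = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]
    # Whole-word match, so "login" is not satisfied by "test_login_page".
    return [
        name for name in public
        if not re.search(rf"\b{re.escape(name)}\b", test_source)
    ]


module = '''
def login(user):
    return user

def logout(user):
    return None

def _hash(pw):
    return pw
'''
tests = '''
from auth import login

def test_login():
    assert login("a") == "a"
'''
print(untested_functions(module, tests))  # ['logout']
```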
Combine automated findings with heuristic review:
```markdown
## Test Standards Validation for [ISSUE-ID]

### Automated Scan Summary
- Test Files Scanned: X
- Total Tests: Y
- Total Findings: Z
- CRITICAL: A findings
- HIGH: B findings
- MEDIUM: C findings

### Critical Findings (BLOCK)
1. [test_name in file:line] - No assertions found
2. [test_name in file:line] - Tautological assertion: expect(true).toBe(true)

### High Priority Findings (FIX REQUIRED)
1. [test_name in file:line] - Only validates mocks, not behavior
2. [test_name in file:line] - Catch block swallows errors without assertions

### Medium Priority Findings (REVIEW)
1. [file:0] - Happy-path bias: 15 success tests, 0 error tests
2. [test_name in file:line] - Async try/catch missing expect.assertions(n)

### Heuristic Review
- **Assertion Quality**: PASS/WARN/FAIL with examples
- **Coverage Completeness**: PASS/WARN/FAIL with gaps identified
- **Scope Alignment**: PASS/WARN with details

### Test Statistics
- Average assertions per test: 3.2
- Tests without assertions: 2 (8%)
- Async tests without protection: 3 (12%)
- Error test coverage: 20%

### Recommendation
[APPROVED | CHANGES REQUIRED | BLOCKED]

### Action Items
[Specific fixes needed with file:line references]
```
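The template's section labels suggest how findings map to the recommendation: CRITICAL findings block, HIGH findings require fixes, and MEDIUM findings call for review. The minimal sketch below fills in the scan summary and the recommendation from a report. It assumes that mapping, and it also assumes MEDIUM and LOW findings alone do not prevent approval. `render_summary` and `recommend` are hypothetical helpers.

```python
def recommend(by_severity: dict) -> str:
    """Map severity counts to the template's recommendation values (assumed mapping)."""
    if by_severity.get("CRITICAL", 0):
        return "BLOCKED"
    if by_severity.get("HIGH", 0):
        return "CHANGES REQUIRED"
    return "APPROVED"  # assumption: MEDIUM/LOW findings are review notes only


def render_summary(report: dict) -> str:
    """Fill the 'Automated Scan Summary' and 'Recommendation' sections."""
    sev = report["findings_by_severity"]
    lines = [
        "### Automated Scan Summary",
        f"- Test Files Scanned: {report['files_scanned']}",
        f"- Total Tests: {report['total_tests']}",
        f"- Total Findings: {report['total_findings']}",
    ]
    for level in ("CRITICAL", "HIGH", "MEDIUM"):
        lines.append(f"- {level}: {sev.get(level, 0)} findings")
    lines += ["### Recommendation", recommend(sev)]
    return "\n".join(lines)


# Totals copied from the example report structure above.
report = {
    "files_scanned": 15,
    "total_tests": 128,
    "total_findings": 12,
    "findings_by_severity": {"CRITICAL": 2, "HIGH": 5, "MEDIUM": 4, "LOW": 1},
}
print(render_summary(report))
```

With the example numbers, the two CRITICAL findings make the recommendation BLOCKED.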
Detection categories (keys as they appear in the report summary):
- no_assertions: Test executes code but makes no assertions. Always indicates a problem.
- tautological_assertions: Assertion is always true regardless of implementation (e.g., expect(true).toBe(true)).
- vacuous_checks: Only checks that a property exists without validating its value (e.g., only .toBeDefined()).
- mock_only_validation: Test only verifies that mocks were called and does not validate actual behavior.
- missing_assertion_count: Async test with try/catch is missing expect.assertions(n) protection.
- error_swallowing: Catch block swallows errors without assertions, so the test always passes.
- low_assertion_density: Test has suspiciously few assertions for its length (may be superficial).
- happy_path_bias: File has only success-path tests and no error-handling tests.
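To make four of these categories concrete, here is a regex sketch that classifies a single JavaScript/TypeScript test body. It is a hypothetical illustration, not the logic in scripts/test_quality_scanner.py. Regexes cannot parse code, so it misses unusual formatting and treats only an empty catch block as error swallowing. The `classify_test_body` name and the sample bodies are illustrative.

```python
import re

ASSERTION = re.compile(r"\bexpect\s*\(")
# expect(<literal>).toBe(<same literal>) can never fail.
TAUTOLOGY = re.compile(
    r"""expect\(\s*(true|false|\d+|'[^']*'|"[^"]*")\s*\)\.toBe\(\s*\1\s*\)"""
)
EMPTY_CATCH = re.compile(r"\bcatch\s*(?:\([^)]*\))?\s*\{\s*\}")


def classify_test_body(body: str) -> list:
    """Return the category keys this test body appears to violate."""
    found = []
    if not ASSERTION.search(body):
        found.append("no_assertions")
    if TAUTOLOGY.search(body):
        found.append("tautological_assertions")
    if EMPTY_CATCH.search(body):
        found.append("error_swallowing")
    is_async_try = "await" in body and re.search(r"\btry\s*\{", body)
    if is_async_try and "expect.assertions(" not in body:
        found.append("missing_assertion_count")
    return found


swallowed = "try {\n  await login('bad');\n} catch (e) {\n}\n"
protected = (
    "expect.assertions(1);\n"
    "try {\n  await login('bad');\n} catch (e) {\n"
    "  expect(e.message).toMatch('invalid');\n}\n"
)
print(classify_test_body(swallowed))
# ['no_assertions', 'error_swallowing', 'missing_assertion_count']
print(classify_test_body(protected))  # []
```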
Execute test-standards at Step 3: Change Review (Diff) or Step 7: Tests (Right-Sized) of the QA workflow:
For detailed examples and patterns, read references/test-quality-patterns.md:
Scripts:
- scripts/test_quality_scanner.py - Automated test quality detection

References:
- references/test-quality-patterns.md - Detailed patterns and examples