skills/code-validation/SKILL.md
Automated code validation for diff review, change hygiene, and red flag detection. Use when reviewing git diffs, PRs, or changed files for test disabling patterns, secrets, path portability issues, security flags, and large deletions. Supports Python, TypeScript, JavaScript, HTML, and CSS.
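Validates code changes through automated scanning and LLM-guided heuristics to detect test-disabling patterns, exposed secrets, path portability issues, security flags, large deletions, and dependency changes.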
Execute code-validation as part of QA validation protocol:
Run scripts first for deterministic, fast checks:
For Git Diffs (comparing branches):
# Compare feature branch against main
python scripts/diff_analyzer.py --base main --format json
# Compare specific commit range
python scripts/diff_analyzer.py --range HEAD~5..HEAD --format json
# Save report to file
python scripts/diff_analyzer.py --base main --output validation-report.json
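When the analyzer is driven from another script, the invocations above can be wrapped roughly as follows. This is a sketch only: the helper names `build_diff_analyzer_command` and `run_diff_analyzer` are hypothetical, and it assumes that `--format json` prints the report to stdout, which is not stated explicitly here.

```python
import json
import subprocess


def build_diff_analyzer_command(base=None, commit_range=None):
    """Build the argv for scripts/diff_analyzer.py using only the flags shown above."""
    if (base is None) == (commit_range is None):
        raise ValueError("pass exactly one of base or commit_range")
    command = ["python", "scripts/diff_analyzer.py"]
    if base is not None:
        command += ["--base", base]
    else:
        command += ["--range", commit_range]
    return command + ["--format", "json"]


def run_diff_analyzer(base=None, commit_range=None):
    """Run the analyzer and parse its report (assumes JSON arrives on stdout)."""
    command = build_diff_analyzer_command(base, commit_range)
    # check=False: the script's exit-code convention is not documented here,
    # so a non-zero exit is not treated as a failure by itself.
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return json.loads(completed.stdout)
```

For example, `build_diff_analyzer_command(base="main")` yields the same arguments as the first command above.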
For Static File Analysis (staged changes or specific files):
# Scan specific files
python scripts/static_analyzer.py src/app.py src/utils.py --format json
# Scan entire directory
python scripts/static_analyzer.py ./src --format json
# Exclude patterns
python scripts/static_analyzer.py ./src --exclude node_modules .git --format json
# Save report
python scripts/static_analyzer.py ./src --output scan-report.json
Parse JSON output and evaluate findings:
Finding Structure:
{
  "category": "test_disabling|secret_exposure|path_portability|security_flags|large_deletion|dependency_change",
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "file": "path/to/file.py",
  "line": 42,
  "pattern": "regex pattern matched",
  "context": "actual line content",
  "message": "human-readable description"
}
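As a rough illustration of the evaluation step, the sketch below loads a report and groups its findings by severity. The field names follow the finding structure above; the function names and the assumption that findings sit under a top-level `findings` key are illustrative.

```python
import json
from collections import defaultdict

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def group_by_severity(report_text):
    """Return {severity: [findings]}, ordered from CRITICAL down to LOW."""
    report = json.loads(report_text)
    grouped = defaultdict(list)
    for finding in report.get("findings", []):
        # A finding without a severity is treated as LOW (an assumption).
        grouped[finding.get("severity", "LOW")].append(finding)
    # Severities outside SEVERITY_ORDER are left out of the result.
    return {sev: grouped[sev] for sev in SEVERITY_ORDER if grouped[sev]}


def format_finding(finding):
    """Render one finding as a file:line reference with its message."""
    return f'{finding["file"]}:{finding["line"]} [{finding["severity"]}] {finding["message"]}'
```

Grouping first makes it easy to list CRITICAL items ahead of everything else when writing up the review.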
Severity Guidelines:
After automated scanning, apply human judgment for:
Scripts cannot detect semantic changes. Manually review:
Red Flags:
// Before
expect(response.data).toMatchObject({
  id: expect.any(String),
  status: 'active',
  count: expect.any(Number)
});
// After - WEAKENED
expect(response.data).toBeDefined(); // ❌ Lost specificity
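A script cannot judge whether an assertion change is justified, but it can surface candidates for the manual review described here. The sketch below is a hypothetical triage helper, not part of the shipped scripts: it flags added weak matchers in any diff hunk that also removes a more specific one. The matcher lists are assumptions and far from complete.

```python
import re

# Matchers treated as "specific" and "weak" here are illustrative choices.
STRONG = re.compile(r"\.(toMatchObject|toEqual|toStrictEqual|toBe)\(")
WEAK = re.compile(r"\.(toBeDefined|toBeTruthy|not\.toBeNull)\(")


def weakened_assertion_candidates(diff_text):
    """Return added weak-assertion lines from hunks that also drop a strong one."""
    removed_strong = False
    candidates = []
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            removed_strong = False  # a new hunk starts; reset state
        elif line.startswith("-") and not line.startswith("---"):
            if STRONG.search(line):
                removed_strong = True
        elif line.startswith("+") and not line.startswith("+++"):
            if removed_strong and WEAK.search(line):
                candidates.append(line[1:].strip())
    return candidates
```

Anything it returns still needs a human decision; an empty result does not mean the assertions are sound.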
Evaluate if exception handling is justified:
When Acceptable:
// Top-level boundary
app.use((err, req, res, next) => {
  logger.error(err);
  res.status(500).json({ error: 'Internal error' });
});
Red Flag:
// Business logic swallowing errors
try {
  await processPayment(data);
} catch (e) {
  // ❌ Silent failure, no logging
}
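The silent-failure case is mechanical enough to pre-screen. As a minimal sketch, assuming JavaScript or TypeScript source and a hypothetical helper name, the regex below finds `catch` blocks whose body holds nothing but whitespace and `//` comments. It does not parse the language, so block comments and nested braces are out of scope.

```python
import re

# catch, optional binding, then a body containing only whitespace or // comments
EMPTY_CATCH = re.compile(r"catch\s*(\([^)]*\))?\s*\{\s*(//[^\n]*\s*)*\}")


def find_silent_catches(source):
    """Return the 1-based line numbers where an effectively empty catch starts."""
    return [source.count("\n", 0, m.start()) + 1 for m in EMPTY_CATCH.finditer(source)]
```

A hit is a prompt to ask whether the error should be logged or rethrown; whether a handler sits at a legitimate boundary remains a judgment call.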
Assess if changes align with issue scope:
Legitimate:
Scope Creep:
Verify changes match current production architecture:
- .project-context.md

Combine automated findings with heuristic review:
## Code Validation Results for [ISSUE-ID]
### Automated Scan Summary
- Files Changed: X
- Total Findings: Y
- CRITICAL: Z findings
- HIGH: A findings
- MEDIUM: B findings
### Critical Findings (BLOCK)
[List CRITICAL severity findings with file:line references]
### High Priority Findings (FIX REQUIRED)
[List HIGH severity findings]
### Heuristic Review
- Test Assertion Quality: [PASS/FAIL with specifics]
- Exception Handling: [PASS/WARN/FAIL with examples]
- Scope Alignment: [PASS/WARN/FAIL with details]
- Architecture Compliance: [PASS/FAIL with ADR references]
### Recommendation
[APPROVED | CHANGES REQUIRED | BLOCKED]
### Action Items
[Specific fixes needed with file:line references]
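The automated half of this template can be filled in from the report's `files_changed`, `total_findings`, and `findings_by_severity` fields. The sketch below is one possible rendering. The recommendation rule (any CRITICAL blocks, any HIGH requires changes) is an assumption inferred from the section labels, and the function names are hypothetical.

```python
def recommend(findings_by_severity):
    """Map severity counts to a recommendation (thresholds are an assumption)."""
    if findings_by_severity.get("CRITICAL", 0) > 0:
        return "BLOCKED"
    if findings_by_severity.get("HIGH", 0) > 0:
        return "CHANGES REQUIRED"
    return "APPROVED"


def render_scan_summary(issue_id, report):
    """Render the summary and recommendation sections of the template."""
    counts = report["findings_by_severity"]
    lines = [
        f"## Code Validation Results for {issue_id}",
        "### Automated Scan Summary",
        f"- Files Changed: {report['files_changed']}",
        f"- Total Findings: {report['total_findings']}",
    ]
    for severity in ("CRITICAL", "HIGH", "MEDIUM"):
        lines.append(f"- {severity}: {counts.get(severity, 0)} findings")
    lines += ["### Recommendation", recommend(counts)]
    return "\n".join(lines)
```

The heuristic review and action items still have to be written by hand, and a reviewer may override the computed recommendation.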
Both scripts output JSON with this structure:
{
  "commit_range": "main..HEAD",
  "files_scanned": 42,
  "files_changed": 15,
  "total_findings": 8,
  "findings_by_severity": {
    "CRITICAL": 1,
    "HIGH": 3,
    "MEDIUM": 4,
    "LOW": 0
  },
  "findings": [
    {
      "category": "secret_exposure",
      "severity": "CRITICAL",
      "file": "src/config.py",
      "line": 12,
      "pattern": "...",
      "context": "API_KEY = 'sk_live_abc123...'",
      "message": "Potential hardcoded secret"
    }
  ],
  "summary": {
    "test_disabling": 2,
    "secret_exposure": 1,
    "path_portability": 3,
    "security_flags": 1,
    "dependency_changes": 1,
    "large_deletions": 0
  }
}
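In the example above, both the severity counts (1 + 3 + 4 + 0) and the category counts (2 + 1 + 3 + 1 + 1 + 0) add up to `total_findings` (8). Assuming that relationship always holds, which the example suggests but does not guarantee, a consumer could sanity-check a report before trusting it. The function name is hypothetical.

```python
def check_report_totals(report):
    """Return a list of inconsistencies; the list is empty when totals agree."""
    problems = []
    total = report["total_findings"]
    by_severity = sum(report["findings_by_severity"].values())
    if by_severity != total:
        problems.append(f"severity counts sum to {by_severity}, expected {total}")
    by_category = sum(report["summary"].values())
    if by_category != total:
        problems.append(f"category counts sum to {by_category}, expected {total}")
    return problems
```

The length of the `findings` array is deliberately not checked, since the example lists a single finding against a total of eight.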
Patterns indicating tests were disabled rather than fixed:
- .skip(), .only(), .todo()
- xit(), xdescribe(), fit(), fdescribe()
- @pytest.skip, @unittest.skip

Action: Require Action Agent to fix tests or justify with comment
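A line-based scan for these patterns might look like the sketch below. The regexes are an approximation written for illustration and are not taken from `scripts/static_analyzer.py`. The lookbehind on the bare-call pattern is there so that method calls such as `model.fit(...)` are not mistaken for a focused test, and `@pytest.mark.skip` is matched as well as `@pytest.skip`.

```python
import re

TEST_DISABLING_PATTERNS = [
    re.compile(r"\.(skip|only|todo)\("),
    # Bare calls only: not preceded by a word character or a dot.
    re.compile(r"(?<![\w.])(xit|xdescribe|fit|fdescribe)\("),
    re.compile(r"@pytest\.(mark\.)?skip|@unittest\.skip"),
]


def scan_test_disabling(path, text):
    """Return one finding dict per line that matches a test-disabling pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in TEST_DISABLING_PATTERNS:
            if pattern.search(line):
                findings.append({
                    "category": "test_disabling",
                    "file": path,
                    "line": number,
                    "pattern": pattern.pattern,
                    "context": line.strip(),
                })
                break  # one finding per line is enough
    return findings
```

Severity is left out of the finding on purpose, since how this category maps to a severity level is not spelled out here.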
Hardcoded credentials or API keys:
Action: BLOCK merge, require environment variables
User-specific paths that won't work for other developers:
- /Users/username/
- /home/username/
- C:\Users\username\
- ~/Desktop, ~/Documents

Action: BLOCK if in documentation, require repo-relative paths
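These path shapes translate fairly directly into regular expressions. The following is a sketch under assumptions: the patterns accept any user name and any drive letter, which is broader than the literal examples listed, and `find_user_paths` is a hypothetical name.

```python
import re

USER_PATH_PATTERNS = [
    re.compile(r"/Users/[^/\s]+/"),                 # macOS home directories
    re.compile(r"/home/[^/\s]+/"),                  # Linux home directories
    re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+\\"),    # Windows profiles
    re.compile(r"~/(?:Desktop|Documents)\b"),       # tilde shortcuts
]


def find_user_paths(text):
    """Return (line_number, matched_text) for the first user path on each line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in USER_PATH_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((number, match.group(0)))
                break
    return hits
```

Repo-relative paths such as `./docs/setup.md` pass through untouched, which is the form the action above asks for.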
Commands that weaken security:
- --no-verify, --insecure, -k
- chmod 777
- StrictHostKeyChecking no
- --allow-root

Action: Require justification comment or removal
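Short flags such as `-k` are the awkward part of matching this list, because a naive substring search also fires on `-kxf` or `--keep`. The sketch below, with hypothetical names and patterns written for illustration, requires each flag to stand alone as a whitespace-delimited token. Accepting `StrictHostKeyChecking=no` and `chmod -R 777` alongside the listed forms is an added assumption.

```python
import re

SECURITY_FLAG_PATTERNS = {
    "--no-verify": re.compile(r"(?<!\S)--no-verify(?!\S)"),
    "--insecure": re.compile(r"(?<!\S)--insecure(?!\S)"),
    "-k": re.compile(r"(?<!\S)-k(?!\S)"),
    "chmod 777": re.compile(r"\bchmod\s+(?:-R\s+)?777\b"),
    "StrictHostKeyChecking no": re.compile(r"StrictHostKeyChecking[=\s]+no\b"),
    "--allow-root": re.compile(r"(?<!\S)--allow-root(?!\S)"),
}


def security_flags_in(line):
    """Return the labels of all security-weakening flags found in one line."""
    return [label for label, pattern in SECURITY_FLAG_PATTERNS.items() if pattern.search(line)]
```

Token matching still cannot tell `curl -k` from `sort -k 2`, so hits on `-k` in particular need the justification check described in the action above.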
Files with >100 lines removed:
Action: Manual review to verify deletions are intentional
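One way to apply the 100-line threshold is to read the output of `git diff --numstat`, which prints added lines, deleted lines, and the path, separated by tabs, with `-` in both count columns for binary files. The sketch below parses that text. Whether `scripts/diff_analyzer.py` works this way is not known, and the function name is hypothetical.

```python
def large_deletions(numstat_output, threshold=100):
    """Return (path, deleted_lines) for files with more than `threshold` deletions."""
    flagged = []
    for row in numstat_output.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        _added, deleted, path = parts
        if deleted == "-":
            continue  # binary file: git reports no line counts
        if int(deleted) > threshold:
            flagged.append((path, int(deleted)))
    return flagged
```

The comparison is strict, so a file with exactly 100 removed lines is not flagged, matching the ">100" wording above.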
New imports or package additions:
Action: Verify in package.json/requirements.txt, run security audit
Execute code-validation at Step 3: Change Review (Diff) in QA workflow:
If CRITICAL or multiple HIGH findings:
- scripts/diff_analyzer.py - Analyzes git diffs for red flags
- scripts/static_analyzer.py - Scans files without git context