skills/post-verify/SKILL.md
Use after code modification to discover/create tests, run them, assess coverage, and fix failures. Supports multi-angle defect prediction and semantic audit via parallel agents.
npx skillsauth add till-crazy-tears-us-apart/claude-code-engineering-suite post-verifyInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Post-implementation verification skill. Discovers existing tests, creates temporary tests when none exist, runs them, evaluates coverage and assertion quality, and drives a fix loop until all tests pass.
Supports three effort levels for multi-angle analysis:
Relationship to TDD: TDD operates before implementation (RED→GREEN→REFACTOR). This skill operates after implementation to verify completed changes. They are complementary, not overlapping.
Path Convention: All paths below are relative to
~/.claude/. UseRead("~/.claude/skills/post-verify/...")to access them — they are NOT in the project working directory.
| File | Purpose |
| :--- | :--- |
| skills/post-verify/frameworks.json | Test framework detection rules (Phase 2). User-extensible. |
| skills/post-verify/anti_patterns.json | Assertion anti-pattern detection rules (Phase 6.1). User-extensible. |
| skills/post-verify/schemas/prediction_scenario.json | Output schema for defect prediction agents (Phase 2.5). |
| skills/post-verify/schemas/audit_finding.json | Output schema for semantic audit agents (Phase 6.2). |
| skills/post-verify/prompts/predict_input_boundary.md | Prediction Angle A prompt template. |
| skills/post-verify/prompts/predict_error_path.md | Prediction Angle B prompt template. |
| skills/post-verify/prompts/predict_state_interaction.md | Prediction Angle C prompt template (high only). |
| skills/post-verify/prompts/audit_coverage_gap.md | Audit Angle A prompt template. |
| skills/post-verify/prompts/audit_assertion_strength.md | Audit Angle B prompt template (high only). |
| skills/post-verify/prompts/audit_test_isolation.md | Audit Angle C prompt template (high only). |
| skills/post-verify/templates/test_python.py.j2 | Jinja2 template for temporary Python tests (Phase 3). |
| skills/post-verify/templates/test_javascript.js.j2 | Jinja2 template for temporary JavaScript tests (Phase 3). |
| skills/post-verify/templates/test_go.go.j2 | Jinja2 template for temporary Go tests (Phase 3). |
| skills/post-verify/templates/report.md.j2 | Jinja2 template for the final report (Phase 8). |
| skills/post-verify/render.py | Template rendering helper. Uses Jinja2 when available, falls back to str.format(). |
render.py attempts import jinja2. If unavailable, all templates are rendered via built-in string formatting (functionally equivalent, but templates are not externally editable in fallback mode). Jinja2 can be installed via install.py (optional step).
| Environment Variable | Default | Description |
| :--- | :--- | :--- |
| POST_VERIFY_MAX_RETRIES | -1 (unlimited) | Maximum test-fix iterations. -1 = no limit. Positive integer = hard cap. |
| POST_VERIFY_EFFORT | medium | Fallback effort level when not specified as argument. |
/post-verify [effort] [target_files_or_functions...]
low, medium, or high (case-insensitive): use it as effort level, remaining args are targets.POST_VERIFY_EFFORT env var (default medium), all args are targets.| Effort | Prediction Angles | Semantic Audit Angles | Total Agents | | :--- | :--- | :--- | :--- | | low | 0 (skip Phase 2.5) | 0 (regex only in Phase 6) | 0 | | medium | 2 (A: Input Boundary, B: Error Path) | 1 (A: Coverage Gap) | 3 | | high | 3 (A + B + C: State/Interaction) | 3 (A + B: Assertion Strength + C: Test Isolation) | 6 |
Final reports are saved to .claude/temp_test/report_{timestamp}.md in the project directory via render.save_report(). This directory is created automatically if absent.
Goal: Determine what changed and what needs testing.
git diff --name-only HEAD (or git diff --cached --name-only if staged). If not a git repo, ask the user to specify targets via AskUserQuestion.change_set list: [{file, symbol, type}].Output: Print the change set as a summary table to the user.
Goal: Find existing tests that cover the change set.
Load detection rules from frameworks.json. Each entry defines:
indicators: file existence or file-content checks, evaluated in priority order.run_command / coverage_command: command templates with {test_files}, {module}, {package}, {test_pattern} placeholders.test_file_pattern / test_dir_patterns: glob patterns for locating test files.Execute indicator checks for each framework entry in priority order. Stop at the first match.
If no framework is detected and no test directories exist → proceed to Phase 2.5 (or Phase 3 if effort=low).
For each item in change_set:
{symbol → [test_file:test_function]}.Output: Print a coverage mapping table. Flag uncovered symbols.
Trigger: effort level is medium or high. Skip entirely when effort is low.
Goal: Use parallel agents to identify failure scenarios from multiple independent perspectives, driving targeted test generation in Phase 3.
Gather the following context for agent prompts:
git diff HEAD output for the target files (or full source if no prior version).change_set..claude/logic_index.json exists, run impact.py and include the upstream/downstream summary.Read the prompt templates from ~/.claude/skills/post-verify/prompts/:
| Effort | Agents Launched (in parallel) |
| :--- | :--- |
| medium | predict_input_boundary.md (Angle A) + predict_error_path.md (Angle B) |
| high | A + B + predict_state_interaction.md (Angle C) |
For each angle, construct the Agent call:
Agent({
description: "Post-verify: [angle name]",
prompt: "[prompt template content]\n\n---\n\n## Provided Context\n\n### Diff\n```\n{diff}\n```\n\n### Source\n```\n{source}\n```\n\n### Existing Tests\n{test_names}\n\n### Caller/Callee Context\n{impact_summary}"
})
Launch all agents in parallel (single message, multiple Agent tool calls).
symbol + similar scenario text (>80% token overlap), keep the one with higher priority.scenario_list for Phase 3 consumption.Output: Print the merged scenario list as a table:
| # | Symbol | Category | Scenario | Priority | | :--- | :--- | :--- | :--- | :--- |
Trigger: Any symbol in change_set has no existing test coverage, OR scenario_list is non-empty.
When Phase 2.5 produced scenarios:
scenario_list, generate a test targeting that specific failure condition.test_{symbol}_{category_short}_{scenario_slug}
category_short: boundary, errpath, statescenario_slug: 2-3 word kebab-case summarytrigger_input as the test's input setup.expected_behavior to derive the assertion.priority=high scenarios are generated first.| Condition | Location | Cleanup |
| :--- | :--- | :--- |
| Module importable without project-specific setup | /tmp/_postverify_{timestamp}/ | Delete entire directory after Phase 7 |
| Module requires project structure (relative imports, config files, fixtures) | Project directory: _temp_postverify_test_{timestamp}.py | Delete file after Phase 7 |
Use render.render_template() to generate test files from the appropriate language template (test_python.py.j2, test_javascript.js.j2, test_go.go.j2). Populate the context dict with:
{
"module_name": "...",
"imports": ["import ...", ...],
"test_cases": [
{
"name": "function_name_scenario_expected",
"description": "...",
"body_lines": ["result = func(arg)", "assert result == expected"],
"is_async": False
}
]
}
If the target language has no matching template, generate tests directly via LLM (no template).
Temporary tests MUST satisfy:
test_{function_name}_{scenario}_{expected_outcome}
Example: test_load_policy_missing_env_var_returns_always
frameworks.json or constructed for temp tests).When a test fails, determine the fault location before attempting any fix. Do NOT default to blaming the implementation.
Triage decision tree:
Test fails
├─ Is the assertion itself correct?
│ ├─ NO → Test defect. Fix the test.
│ └─ YES ↓
├─ Does the test setup match real-world conditions?
│ ├─ NO → Test defect. Fix setup/fixtures.
│ └─ YES ↓
├─ Is the test importing/calling the correct symbol?
│ ├─ NO → Test defect. Fix import/call.
│ └─ YES ↓
└─ Implementation defect. Fix the code.
Rule: When triage is ambiguous, report both hypotheses to the user via AskUserQuestion and let them decide. Do NOT guess.
When scenario_list exists: for each scenario-driven test that fails (revealing a real implementation issue), increment prediction_confirmed counter. This feeds into Phase 8 accuracy metrics.
iteration = 0
max_retries = POST_VERIFY_MAX_RETRIES (from env, default -1)
LOOP:
IF max_retries >= 0 AND iteration >= max_retries:
HALT. Report: "Reached maximum retry limit ({max_retries}). {N} tests still failing."
EXIT with failure summary.
1. Triage failure (Section 4.2).
2. Propose fix via AskUserQuestion:
- Show: failing test name, error message, triage conclusion (test defect vs. code defect).
- Options: "Apply fix" / "Skip this failure" / "Abort verification".
3. IF user approves:
- Apply fix (Edit tool).
- Re-run tests.
- IF all pass → break LOOP.
- ELSE → iteration += 1, continue LOOP.
4. IF user skips → continue to next failure.
5. IF user aborts → EXIT immediately, skip to Phase 7 cleanup.
Trigger: All tests pass (or user chose to skip remaining failures).
coverage_command from frameworks.json (e.g., pytest --cov={module} --cov-branch --cov-report=term-missing).Print a table:
| Symbol | Branches | Covered | Coverage | Status |
| :--- | :--- | :--- | :--- | :--- |
| load_policy | 6 | 5 | 83% | PASS |
| inject_all | 10 | 7 | 70% | FAIL |
If any symbol is below 80%:
--cov-report=term-missing or static analysis).Goal: Verify that passing tests are testing meaningful behavior, not trivially true.
Load detection rules from anti_patterns.json. Each entry defines:
id, name, severity (critical / warning / info)regex: list of patterns to match against test sourcenegative_regex (optional): patterns that, if present, negate the matchdetection: "ast_scan" for rules requiring structural analysis (no regex shortcut)languages: applicable language filterFor each relevant test file:
Grep.ast_scan rules by reading the test file and analyzing structure.negative_regex: flag only if positive matches exist AND no negative matches exist.Trigger: effort level is medium or high. Skip when low.
Read prompt templates from ~/.claude/skills/post-verify/prompts/:
| Effort | Agents Launched (in parallel) |
| :--- | :--- |
| medium | audit_coverage_gap.md (Angle A) |
| high | A + audit_assertion_strength.md (Angle B) + audit_test_isolation.md (Angle C) |
For each audit angle, construct the Agent call:
Agent({
description: "Post-verify audit: [angle name]",
prompt: "[prompt template content]\n\n---\n\n## Provided Context\n\n### Test Source\n```\n{test_code}\n```\n\n### Implementation Source\n```\n{impl_code}\n```\n\n### Test Results\n{pass_fail_summary}\n\n### Change Set\n{change_set_table}"
})
Launch all audit agents in parallel.
Print findings table. Critical findings from either layer MUST be fixed (following the same AskUser fix loop as Phase 4). Warnings are reported but not blocking.
/tmp/_postverify_{timestamp}/ — rm -rf_temp_postverify_test_{timestamp}.py in project directory — rm -f.pyc / __pycache__ generated by temp tests in /tmp.ls the temp paths to confirm deletion.Use render.save_report() to generate and persist the report to .claude/temp_test/report_{timestamp}.md.
Populate the context dict:
{
"project_name": "...",
"effort_level": "medium",
"change_set": [{"file": "...", "symbol": "...", "type": "..."}],
"coverage_map": [{"symbol": "...", "existing_count": N, "temp_count": N}],
"prediction_scenarios": [
{"symbol": "...", "scenario": "...", "category": "...", "priority": "...", "test_result": "PASS|FAIL"}
],
"prediction_accuracy": {"total": N, "confirmed_by_test": M},
"test_results": [{"name": "...", "status": "PASS/FAIL", "duration": "..."}],
"passed": N,
"total": N,
"fix_iterations": N,
"coverage_data": [{"symbol": "...", "branches": N, "covered": N, "percent": N, "status": "PASS/FAIL"}],
"audit_findings": [{"id": "...", "pattern_name": "...", "severity": "...", "file": "...", "line": N, "source": "regex|semantic"}],
"semantic_findings": [{"test_name": "...", "issue": "...", "category": "...", "severity": "...", "evidence": "...", "suggestion": "..."}],
"final_status": "PASS / FAIL"
}
Also print a condensed summary to stdout:
Post-Verify Complete
====================
Effort: {effort_level}
Change Set: {N} symbols across {M} files
Prediction: {scenarios} scenarios identified, {confirmed}/{scenarios} confirmed by tests
Tests: {existing} existing, {created} temporary (cleaned)
Results: {passed}/{total} passed | Fix iterations: {iterations}
Branch Coverage: {min}% - {max}% (threshold: 80%)
Audit: {critical} critical, {warning} warnings ({regex_count} regex + {semantic_count} semantic)
Status: PASS / FAIL
Report: .claude/temp_test/report_{timestamp}.md
AskUserQuestion.low, the skill behaves identically to the pre-v2 version (no agents, no prediction, regex-only audit).data-ai
Deep repository analysis with multi-agent parallel perspectives. Requires /init + /remy-index as prerequisites. Produces structured research reports.
tools
Analyze CI/CD failure logs to diagnose build, test, and gate failures. Supports GitHub Actions (gh CLI), local log files, and pasted logs. Produces evidence packets for /remy-patch.
development
Generate persistent unit tests for existing or stub code. Supports post-hoc testing (default) and TDD mode (--tdd). Multi-angle agent analysis at medium/high effort levels.
tools
--- name: remy-secure description: Review branch changes for security vulnerabilities. Multi-stage: regex pre-scan, parallel agents, and false-positive filtering. allowed-tools: Read, Grep, Glob, Bash, AskUserQuestion, Agent argument-hint: "[low|medium|high] [diff_range (optional, e.g. HEAD~3...HEAD)]" disable-model-invocation: true --- # Security Audit Protocol Security-focused review of code changes on the current branch. Identifies exploitable vulnerabilities with high confidence (≥ 8/10),