.claude/skills/verify-sprint/SKILL.md
Spec-fidelity verification tracing requirements through code.
npx skillsauth add leogodin217/leos_claude_starter verify-sprint
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Spec-fidelity review. Walk the spec's algorithms and requirements step-by-step, verify the code implements each one correctly. Find behavioral bugs that mechanical checks miss.
Run this after /review-sprint passes. That command checks surface properties (coverage, linting, dead code). This command checks behavioral correctness: does the code actually do what the spec says?
Related commands: /review-sprint, /arch-review, /review-tests.
This is the review that asks: "If I follow the spec's algorithm with a pencil, does the code do the same thing at every step?"
- CLAUDE.md: Principles (especially #7, #8)
- docs/architecture/pending/*.md: the architecture doc
- docs/sprints/current/spec.md: Contracts and phase breakdown
- git diff --name-only main..HEAD -- '*.py' ':!tests/' ':!demos/': changed Python files, excluding tests and demos

Load the architecture doc FIRST. Read it completely before touching any code. The spec is the oracle.
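The changed-file listing from the git diff invocation above can also be scripted. This is a minimal sketch, not part of the skill itself: the helper names `changed_source_command` and `changed_source_files` are hypothetical, and the second one assumes it runs inside a git repository that has a `main` branch.

```python
import subprocess


def changed_source_command(base: str = "main") -> list[str]:
    """Build the git command that lists changed Python files,
    excluding anything under tests/ and demos/."""
    return [
        "git", "diff", "--name-only", f"{base}..HEAD",
        "--", "*.py", ":!tests/", ":!demos/",
    ]


def changed_source_files(base: str = "main") -> list[str]:
    """Run the command in the current repository (assumption: cwd is the
    repo root) and return one path per changed file."""
    result = subprocess.run(
        changed_source_command(base),
        capture_output=True,
        text=True,
        check=True,  # raise if git exits non-zero
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Passing the arguments as a list avoids shell quoting, so the pathspecs need no surrounding quotes.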
Code navigation: Use LSP tools for all code tracing — find_definition to jump to implementations, find_references to find usages, get_incoming_calls to trace call chains, get_hover for type info. Do not Grep for def foo or class Bar. Reserve Grep for pattern searches only.
Read the architecture doc and sprint spec. Extract every behavioral requirement into a checklist. Categories:
Algorithm steps: Numbered steps in resolution/processing algorithms. Each step is a requirement.
Branching logic: "If X then Y, else Z" — each branch is a requirement.
Timestamp/RNG semantics: Which distribution, what parameters, what order of RNG consumption.
Error conditions: What raises, when, with what message.
Feature interactions: How the new feature interacts with existing features (events, re-entry, mutations, deactivation).
Invariants: Properties stated as "always true" in the spec.
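To illustrate why the order of RNG consumption counts as a requirement, here is a small sketch using only Python's standard `random` module. The two functions are hypothetical and stand in for two implementations that draw the same quantities from the same seed in a different order.

```python
import random


def batch_order(seed: int, n: int) -> tuple[list[float], list[float]]:
    """Draw all gaps first, then all selection values."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(1.0) for _ in range(n)]
    picks = [rng.random() for _ in range(n)]
    return gaps, picks


def interleaved_order(seed: int, n: int) -> tuple[list[float], list[float]]:
    """Draw one gap, then one selection value, and repeat."""
    rng = random.Random(seed)
    gaps, picks = [], []
    for _ in range(n):
        gaps.append(rng.expovariate(1.0))
        picks.append(rng.random())
    return gaps, picks
```

Each function is deterministic for a given seed, yet the two produce different sequences from the second gap onward. If the spec fixes the consumption order, only one of them matches it.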
Write each requirement as a one-line checklist item with a spec citation:
- [ ] Step 6i: Terminal state behaviors fire after transition recorded (architecture doc line N)
- [ ] Dropout behaviors use sequential exponential gaps, not uniform (Behaviors section)
- [ ] Runtime probability sum > 1.0 raises SimulationError (Algorithm step 4)
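If it helps to keep the checklist in a structured form while working, the items above could be modeled roughly as follows. This is a sketch only: the `Requirement` class and its field names are assumptions made for illustration, not something the skill defines.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One behavioral requirement extracted from the spec (hypothetical model)."""
    category: str           # e.g. "Algorithm steps", "Error conditions"
    text: str               # one-line statement of the requirement
    citation: str           # where the spec states it
    verified: bool = False  # flipped once the code has been traced

    def to_markdown(self) -> str:
        """Render the item in the same shape as the checklist lines above."""
        box = "x" if self.verified else " "
        return f"- [{box}] {self.text} ({self.citation})"


item = Requirement(
    category="Error conditions",
    text="Runtime probability sum > 1.0 raises SimulationError",
    citation="Algorithm step 4",
)
print(item.to_markdown())
```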
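For EACH checklist item, locate the code that implements it, compare its behavior against the spec, and record the result with a file:line citation.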
Critical distinction: "Code exists for this feature" is NOT the same as "code correctly implements this feature." A function that handles transitions may still use the wrong selection algorithm.
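A hypothetical illustration of that distinction: both functions below "handle transition selection", but only one honors the weights. The function names and the weights are invented for this sketch; the weighted version uses a plain cumulative-weight walk, which may or may not be what a given spec prescribes.

```python
import random


def pick_uniform(options: dict[str, float], rng: random.Random) -> str:
    """Looks plausible, but ignores the weights: every option is equally likely."""
    return rng.choice(list(options))


def pick_weighted(options: dict[str, float], rng: random.Random) -> str:
    """Cumulative-weight selection: one draw, then walk the running total.
    Assumes options is non-empty and all weights are positive."""
    draw = rng.random() * sum(options.values())
    running = 0.0
    for name, weight in options.items():
        running += weight
        if draw < running:
            return name
    return name  # float round-off guard: fall back to the last option


weights = {"continue": 0.9, "dropout": 0.1}
rng = random.Random(42)
uniform_hits = sum(pick_uniform(weights, rng) == "continue" for _ in range(10_000))
weighted_hits = sum(pick_weighted(weights, rng) == "continue" for _ in range(10_000))
```

Over many draws the uniform version picks "continue" about half the time, while the weighted version picks it about nine times in ten. A test that only checks that the return value is a valid option passes for both.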
For each requirement verified in step 2, check whether a test exercises it. Mark requirements that no test covers as NOT TESTED.
Common gaps:
Look specifically at:
Encapsulation: Does new code use public APIs or reach into private attributes?
Precision: Are numeric conversions lossy? (float→int, timedelta→seconds)
Floating point: Are equality/comparison checks on accumulated floats safe?
Duplication: Is the same logic implemented twice with slight variations? (Often indicates a missed abstraction or a branch that should dispatch to different implementations.)
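The precision and floating-point items above can be made concrete with the standard library alone. The values are chosen for illustration and are not taken from any particular codebase.

```python
import math
from datetime import timedelta

# Precision: timedelta.seconds is only the seconds component of the
# normalized value, so it silently drops days and microseconds.
gap = timedelta(days=1, seconds=30, microseconds=500_000)
component_only = gap.seconds            # 30
full_duration = gap.total_seconds()     # 86430.5
truncated = int(gap.total_seconds())    # 86430, the fractional half second is lost

# Floating point: ten accumulated 0.1 values do not compare equal to 1.0.
total = sum([0.1] * 10)
exact_match = total == 1.0              # False
tolerant_match = math.isclose(total, 1.0)  # True
```

When reviewing a check such as a probability-sum bound, it is worth asking whether the comparison is exact or tolerance-based, and whether the spec says which one it should be.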
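For each test, confirm that it exercises the intended layer and code path, and that its assertions would fail if the behavior were wrong.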
A test that passes by accident (wrong layer, wrong code path, insufficient assertions) is worse than no test — it creates false confidence.
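As a sketch of a test that passes by accident, consider two hypothetical implementations of "sequential gaps" and two checks against them. All names here are invented for illustration.

```python
def event_times_buggy(start: float, gaps: list[float]) -> list[float]:
    """Hypothetical bug: treats each gap as an offset from start
    instead of accumulating gaps sequentially."""
    return [start + gap for gap in gaps]


def event_times(start: float, gaps: list[float]) -> list[float]:
    """Sequential gaps: each timestamp is the previous one plus the next gap."""
    times, current = [], start
    for gap in gaps:
        current += gap
        times.append(current)
    return times


def weak_check(fn) -> bool:
    """Insufficient assertion: only counts the results."""
    return len(fn(0.0, [3.0, 1.0, 2.0])) == 3


def strong_check(fn) -> bool:
    """Asserts the exact values implied by sequential accumulation."""
    return fn(0.0, [3.0, 1.0, 2.0]) == [3.0, 4.0, 6.0]
```

The weak check passes for both implementations, so it gives no evidence about the requirement. The strong check passes only for the sequential version.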
Structure findings as:
# Sprint Verification: [Name]
**Date:** YYYY-MM-DD
**Spec:** [path to architecture doc]
**Sprint:** [path to sprint spec]
## Requirements Checklist
### Algorithm Steps
- [x] Step 1: Description — VERIFIED (file:line)
- [ ] Step 6i: Terminal behaviors — MISSING (code returns before evaluation)
- [x] Step 5: Weighted selection — VERIFIED (file:line)
### Feature Interactions
- [x] Mutations visible to subsequent states — VERIFIED
- [ ] Events frozen at entry tick — NOT TESTED
### Invariants
- [x] Deterministic — VERIFIED (test exists)
- [x] Monotonic timestamps — VERIFIED
## Findings
### Bug: [Title]
**Spec says:** [Quote from spec with section reference]
**Code does:** [What actually happens, with file:line]
**Impact:** [What breaks for educators/students]
**Test gap:** [Why existing tests don't catch this]
### Code Quality: [Title]
**Location:** file:line
**Issue:** [Description]
**Severity:** High / Medium / Low
## Summary
| Category | Total | Verified | Missing | Wrong |
|----------|-------|----------|---------|-------|
| Algorithm steps | N | N | N | N |
| Feature interactions | N | N | N | N |
| Invariants | N | N | N | N |
| Error conditions | N | N | N | N |
**Verdict:** [PASS / ISSUES FOUND]
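The summary table in the report structure above can be tallied mechanically from per-requirement results. This is a sketch under stated assumptions: the `summarize` function is hypothetical, and it assumes each result is a (category, status) pair with status strings VERIFIED, MISSING, or WRONG. Any other status counts toward the total only.

```python
from collections import Counter


def summarize(results: list[tuple[str, str]]) -> list[str]:
    """Turn (category, status) pairs into markdown table rows,
    keeping categories in first-seen order."""
    order: list[str] = []
    counts: dict[str, Counter] = {}
    for category, status in results:
        if category not in counts:
            order.append(category)
            counts[category] = Counter()
        counts[category][status] += 1

    rows = [
        "| Category | Total | Verified | Missing | Wrong |",
        "|----------|-------|----------|---------|-------|",
    ]
    for category in order:
        n = counts[category]
        rows.append(
            f"| {category} | {sum(n.values())} | "
            f"{n['VERIFIED']} | {n['MISSING']} | {n['WRONG']} |"
        )
    return rows


rows = summarize([
    ("Algorithm steps", "VERIFIED"),
    ("Algorithm steps", "MISSING"),
    ("Invariants", "VERIFIED"),
])
print("\n".join(rows))
```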