Evaluate Sprint Spec

Adversarial evaluation of sprint spec before implementation. Run in a new session to ensure fresh eyes.

Purpose

Find problems in the spec that would cause implementation to fail or produce poor results. The evaluator is deliberately adversarial — looking for ways the spec could be misinterpreted, is incomplete, or violates principles.

Context Loading

Load ONLY:

CLAUDE.md — Principles (especially #7 and #8)
docs/sprints/current/spec.md — The spec to evaluate

DO NOT load:

Architecture docs (spec should be self-contained)
Capability docs (evaluating spec as written)
Previous conversation context (that's why new session)
Source files via Read — Do NOT read .py files to understand the codebase. Use LSP tools instead (see below).

Code Verification — LSP Only

When verifying spec claims against the codebase (class names, function signatures, caller counts, line numbers), use only LSP tools. Do NOT read entire source files — this wastes context on code irrelevant to the evaluation.

| Verification need | LSP tool | Example | |---|---|---| | "Does this class/function exist?" | find_definition | Spec says TypeResolver — verify the actual class name | | "Who calls this function?" | find_references | Spec says "only two callers" — verify caller count | | "What's the signature?" | get_hover | Spec says rng param at line 203 — verify | | "What type is this?" | get_hover | Spec references Distribution union — verify it exists | | "Does this symbol exist in the module?" | find_workspace_symbols | Spec says export from __init__.py — verify |

NEVER use Read on source files for this skill. If you catch yourself about to read a .py file, use an LSP tool instead.

Evaluation Checklist

Structural Checks

[ ] Every phase has: Delivers, Demo, Contracts sections
[ ] Every contract has: full signature with types
[ ] Every contract has: docstring with Args, Returns, Raises
[ ] Success criteria are checkboxes (testable)
[ ] Scope lists what's NOT included
[ ] File structure section shows where code goes

Principle Checks

[ ] No default parameters in any signature (Principle #7)
[ ] No = None, = [], = {} in signatures (Principle #7)
[ ] No "Future:", "TODO:", "placeholder", "stub" language (Principle #8)
[ ] No loops/iterations described that "will do X later" (Principle #8)
[ ] No hardcoded domain values in contracts (Principle #2)

Consistency Checks

[ ] All types referenced in contracts exist (in spec or codebase)
[ ] All functions called in demos are defined (in spec or codebase)
[ ] Phase N only uses contracts from phases 1..N (dependency order)
[ ] Stated scope matches contracts (nothing extra, nothing missing)
[ ] Contract names follow existing codebase conventions

Ambiguity Checks

[ ] No weasel words: "appropriate", "as needed", "etc.", "various", "properly"
[ ] No vague verbs: "handle", "process", "manage" without specifics
[ ] Error conditions are specific (not just "raises Exception")
[ ] Return types are concrete (not "suitable value")
[ ] None/empty input behavior is specified
[ ] Edge cases appear in Raises section

Testability Checks

[ ] Each contract has at least one obvious test case
[ ] Demo requirements are specific enough to automate
[ ] Success criteria are measurable (not "works correctly")
[ ] Can write a test that would FAIL if contract is wrong

Architecture Checks

[ ] New code doesn't break existing interfaces
[ ] Module placement matches existing structure
[ ] Naming follows existing conventions
[ ] No circular dependencies introduced

Evaluation Process

Step 1: Read Spec Cold

Read the entire spec without referring to other docs. Note:

What's confusing?
What questions do you have?
What seems underspecified?

Step 2: Run Checklist

Go through each check systematically. For every failure:

Note the location (section, line if possible)
Describe the problem specifically
Suggest a fix

Step 3: Adversarial Questions

For each contract, ask:

"What's the worst reasonable misinterpretation?"
"What input would break this?"
"What if I implemented this lazily/wrong — would tests catch it?"

Step 4: Dependency Trace

Trace through phases in order:

Phase 1: What does it need? (should be nothing or existing code)
Phase 2: What does it need from Phase 1? Is that actually delivered?
Continue for all phases

Step 5: Demo Feasibility

For each demo:

Can I actually write this with the contracts provided?
Does it prove what it claims to prove?
What could pass the demo but still be wrong?

Output Format

# Spec Evaluation: [Sprint Name]

**Verdict: PASS / NEEDS WORK / FAIL**

**Summary:** [One sentence assessment]

---

## Blocking Issues

Issues that MUST be fixed before implementation.

### 1. [Short Title]

**Location:** [Section/line reference]
**Category:** [Structural/Principle/Consistency/Ambiguity/Testability/Architecture]
**Problem:** [Specific description]
**Impact:** [What goes wrong if not fixed]
**Suggested Fix:** [How to fix it]

---

## Warnings

Issues that SHOULD be addressed but aren't blocking.

### 1. [Short Title]

**Location:** [Section/line reference]
**Problem:** [Description]
**Suggestion:** [How to improve]

---

## Notes

Observations that aren't issues but worth considering.

- [Note 1]
- [Note 2]

---

## Checklist Results

| Category | Pass | Fail | Issues |
|----------|------|------|--------|
| Structural | 5 | 1 | Missing file structure |
| Principles | 4 | 0 | — |
| Consistency | 3 | 1 | Undefined type |
| Ambiguity | 3 | 2 | Weasel words |
| Testability | 3 | 0 | — |
| Architecture | 4 | 0 | — |

---

## Verdict Explanation

**PASS:** No blocking issues. Warnings are minor. Spec is ready for implementation.

**NEEDS WORK:** No blocking issues but warnings are significant. Review before proceeding.

**FAIL:** Blocking issues found. Must fix and re-evaluate before implementation.

Verdicts

PASS

Zero blocking issues
Warnings are cosmetic or minor
Checklist mostly green
Confident implementation will succeed

NEEDS WORK

Zero blocking issues
But significant warnings that could cause problems
User should review warnings before proceeding
Implementation might need course correction

FAIL

One or more blocking issues
Spec is incomplete, contradictory, or violates principles
Must fix and run eval-sprint again
Do NOT proceed to implement-sprint

Common Blocking Issues

| Issue | Why It Blocks | |-------|---------------| | Missing contract for stated scope | Implementer won't know what to build | | Undefined type referenced | Code won't compile | | Default parameter in signature | Principle #7 violation | | Phase dependency violation | Phase N can't be built | | Ambiguous return type | Implementer will guess wrong |

Common Warnings

| Issue | Why It's a Warning | |-------|-------------------| | Vague error message | Implementation will work but be unhelpful | | Missing edge case | Might cause bug but not blocking | | Inconsistent naming | Annoying but not fatal | | Demo doesn't fully exercise contract | Partial verification |

After Evaluation

If PASS: Proceed to /implement-sprint
If NEEDS WORK: Review warnings, decide to proceed or fix
If FAIL: Fix blocking issues, run /eval-sprint again

Tips for Evaluators

Be skeptical — Assume the spec has problems until proven otherwise
Read literally — Don't fill in gaps with what you think they meant
Think like a lazy implementer — What's the minimal interpretation?
Think like a malicious implementer — What technically satisfies the spec but is wrong?
Check the edges — Empty lists, None values, zero counts, boundary conditions

Evaluate Sprint Spec

Adversarial evaluation of sprint spec before implementation. Run in a new session to ensure fresh eyes.

Purpose

Context Loading

Load ONLY:

CLAUDE.md — Principles (especially #7 and #8)
docs/sprints/current/spec.md — The spec to evaluate

DO NOT load:

Architecture docs (spec should be self-contained)
Capability docs (evaluating spec as written)
Previous conversation context (that's why new session)
Source files via Read — Do NOT read .py files to understand the codebase. Use LSP tools instead (see below).

Code Verification — LSP Only

NEVER use Read on source files for this skill. If you catch yourself about to read a .py file, use an LSP tool instead.

Evaluation Checklist

Structural Checks

[ ] Every phase has: Delivers, Demo, Contracts sections
[ ] Every contract has: full signature with types
[ ] Every contract has: docstring with Args, Returns, Raises
[ ] Success criteria are checkboxes (testable)
[ ] Scope lists what's NOT included
[ ] File structure section shows where code goes

Principle Checks

[ ] No default parameters in any signature (Principle #7)
[ ] No = None, = [], = {} in signatures (Principle #7)
[ ] No "Future:", "TODO:", "placeholder", "stub" language (Principle #8)
[ ] No loops/iterations described that "will do X later" (Principle #8)
[ ] No hardcoded domain values in contracts (Principle #2)

Consistency Checks

[ ] All types referenced in contracts exist (in spec or codebase)
[ ] All functions called in demos are defined (in spec or codebase)
[ ] Phase N only uses contracts from phases 1..N (dependency order)
[ ] Stated scope matches contracts (nothing extra, nothing missing)
[ ] Contract names follow existing codebase conventions

Ambiguity Checks

[ ] No weasel words: "appropriate", "as needed", "etc.", "various", "properly"
[ ] No vague verbs: "handle", "process", "manage" without specifics
[ ] Error conditions are specific (not just "raises Exception")
[ ] Return types are concrete (not "suitable value")
[ ] None/empty input behavior is specified
[ ] Edge cases appear in Raises section

Testability Checks

[ ] Each contract has at least one obvious test case
[ ] Demo requirements are specific enough to automate
[ ] Success criteria are measurable (not "works correctly")
[ ] Can write a test that would FAIL if contract is wrong

Architecture Checks

[ ] New code doesn't break existing interfaces
[ ] Module placement matches existing structure
[ ] Naming follows existing conventions
[ ] No circular dependencies introduced

Evaluation Process

Step 1: Read Spec Cold

Read the entire spec without referring to other docs. Note:

What's confusing?
What questions do you have?
What seems underspecified?

Step 2: Run Checklist

Go through each check systematically. For every failure:

Note the location (section, line if possible)
Describe the problem specifically
Suggest a fix

Step 3: Adversarial Questions

For each contract, ask:

"What's the worst reasonable misinterpretation?"
"What input would break this?"
"What if I implemented this lazily/wrong — would tests catch it?"

Step 4: Dependency Trace

Trace through phases in order:

Phase 1: What does it need? (should be nothing or existing code)
Phase 2: What does it need from Phase 1? Is that actually delivered?
Continue for all phases

Step 5: Demo Feasibility

For each demo:

Can I actually write this with the contracts provided?
Does it prove what it claims to prove?
What could pass the demo but still be wrong?

Output Format

# Spec Evaluation: [Sprint Name]

**Verdict: PASS / NEEDS WORK / FAIL**

**Summary:** [One sentence assessment]

---

## Blocking Issues

Issues that MUST be fixed before implementation.

### 1. [Short Title]

**Location:** [Section/line reference]
**Category:** [Structural/Principle/Consistency/Ambiguity/Testability/Architecture]
**Problem:** [Specific description]
**Impact:** [What goes wrong if not fixed]
**Suggested Fix:** [How to fix it]

---

## Warnings

Issues that SHOULD be addressed but aren't blocking.

### 1. [Short Title]

**Location:** [Section/line reference]
**Problem:** [Description]
**Suggestion:** [How to improve]

---

## Notes

Observations that aren't issues but worth considering.

- [Note 1]
- [Note 2]

---

## Checklist Results

| Category | Pass | Fail | Issues |
|----------|------|------|--------|
| Structural | 5 | 1 | Missing file structure |
| Principles | 4 | 0 | — |
| Consistency | 3 | 1 | Undefined type |
| Ambiguity | 3 | 2 | Weasel words |
| Testability | 3 | 0 | — |
| Architecture | 4 | 0 | — |

---

## Verdict Explanation

**PASS:** No blocking issues. Warnings are minor. Spec is ready for implementation.

**NEEDS WORK:** No blocking issues but warnings are significant. Review before proceeding.

**FAIL:** Blocking issues found. Must fix and re-evaluate before implementation.

Verdicts

PASS

Zero blocking issues
Warnings are cosmetic or minor
Checklist mostly green
Confident implementation will succeed

NEEDS WORK

Zero blocking issues
But significant warnings that could cause problems
User should review warnings before proceeding
Implementation might need course correction

FAIL

One or more blocking issues
Spec is incomplete, contradictory, or violates principles
Must fix and run eval-sprint again
Do NOT proceed to implement-sprint

Common Blocking Issues

Common Warnings

After Evaluation

If PASS: Proceed to /implement-sprint
If NEEDS WORK: Review warnings, decide to proceed or fix
If FAIL: Fix blocking issues, run /eval-sprint again

Tips for Evaluators

Be skeptical — Assume the spec has problems until proven otherwise
Read literally — Don't fill in gaps with what you think they meant
Think like a lazy implementer — What's the minimal interpretation?
Think like a malicious implementer — What technically satisfies the spec but is wrong?
Check the edges — Empty lists, None values, zero counts, boundary conditions

Adoption

leogodin217/eval-sprint

$ install --global

Security Scan Results

SKILL.md

Evaluate Sprint Spec

Purpose

Context Loading

Code Verification — LSP Only

Evaluation Checklist

Structural Checks

Principle Checks

Consistency Checks

Ambiguity Checks

Testability Checks

Architecture Checks

Evaluation Process

Step 1: Read Spec Cold

Step 2: Run Checklist

Step 3: Adversarial Questions

Step 4: Dependency Trace

Step 5: Demo Feasibility

Output Format

Verdicts

PASS

NEEDS WORK

FAIL

Common Blocking Issues

Common Warnings

After Evaluation

Tips for Evaluators

Related Skills

leogodin217/verify-sprint

leogodin217/session

leogodin217/role-educator

leogodin217/role-architect

leogodin217/eval-sprint

$ install --global

Security Scan Results

SKILL.md

Evaluate Sprint Spec

Purpose

Context Loading

Code Verification — LSP Only

Evaluation Checklist

Structural Checks

Principle Checks

Consistency Checks

Ambiguity Checks

Testability Checks

Architecture Checks

Evaluation Process

Step 1: Read Spec Cold

Step 2: Run Checklist

Step 3: Adversarial Questions

Step 4: Dependency Trace

Step 5: Demo Feasibility

Output Format

Verdicts

PASS

NEEDS WORK

FAIL

Common Blocking Issues

Common Warnings

After Evaluation

Tips for Evaluators

Related Skills

leogodin217/verify-sprint

leogodin217/session

leogodin217/role-educator

leogodin217/role-architect