.claude/skills/edd/SKILL.md
Eval-Driven Development (EDD) Framework v2.87.0 - Define-before-implement pattern with structured evals. Provides workflow: Define specifications → Implement features → Verify against evals. Components: TEMPLATE.md for eval definitions, edd.sh CLI script, /edd skill invocation. Check types: CC- (Capability), BC- (Behavior), NFC- (Non-Functional). Integrates with orchestrator workflow for quality-first development. Keywords: evals, define, implement, verify, capability checks, behavior checks, non-functional checks, template, quality assurance, test-driven, specification. Use when: defining new features with structured evals, implementing with verification requirements, creating quality specifications, TDD-style workflow with evals.
npx skillsauth add alfredolopez80/multi-agent-ralph-loop edd
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Eval-Driven Development is a quality-first development pattern that enforces a define-before-implement workflow with structured evaluations.
Model configuration: ~/.claude/settings.json or CLI/env vars (ANTHROPIC_DEFAULT_*_MODEL env vars).
EDD provides a systematic approach to software development with three phases: define specifications, implement features, and verify against evals.
| Prefix | Type | Purpose |
|--------|------|---------|
| CC- | Capability Checks | Feature capabilities and functionality |
| BC- | Behavior Checks | Expected behaviors and responses |
| NFC- | Non-Functional Checks | Performance, security, maintainability |
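The prefix convention in the table above lends itself to a small validator. The following is a minimal sketch, not part of the EDD framework itself: the function name `classify_check` and the requirement that IDs look like `PREFIX-number` are assumptions made for illustration, based only on the IDs shown in this document.

```python
import re

# Prefixes and type names come from the table above.
# The function name and return shape are illustrative assumptions.
CHECK_TYPES = {
    "CC": "Capability Checks",
    "BC": "Behavior Checks",
    "NFC": "Non-Functional Checks",
}

# Assumed ID shape: a known prefix, a hyphen, then a positive sequence number.
_CHECK_ID = re.compile(r"^(NFC|CC|BC)-(\d+)$")


def classify_check(check_id: str) -> tuple[str, int]:
    """Return (check type name, sequence number) for an ID such as 'NFC-2'."""
    match = _CHECK_ID.match(check_id.strip())
    if match is None:
        raise ValueError(f"Unrecognized check ID: {check_id!r}")
    prefix, number = match.groups()
    return CHECK_TYPES[prefix], int(number)


if __name__ == "__main__":
    for check_id in ("CC-1", "BC-2", "NFC-1"):
        print(check_id, "->", classify_check(check_id))
```

Rejecting unknown prefixes early keeps a typo such as `NCF-1` from silently falling outside all three check types.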
# Invoke EDD workflow
/edd "Define memory-search feature"
# CLI script (if available)
ralph edd define memory-search
ralph edd check memory-search
Each eval definition includes:
# Memory Search Eval
**Status**: DRAFT
**Created**: 2026-01-30
## Capability Checks
- [ ] CC-1: Search across semantic memory
- [ ] CC-2: Support filtering by type
## Behavior Checks
- [ ] BC-1: Returns ranked results
- [ ] BC-2: Handles empty queries gracefully
## Non-Functional Checks
- [ ] NFC-1: Search completes in <2s
- [ ] NFC-2: Memory usage <100MB
## Implementation Notes
- Use parallel search for performance
- Cache frequent queries
## Verification Evidence
- Test results attached
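To track progress against an eval definition like the one above, the checklist lines can be parsed and counted per check type. This is a hedged sketch: it assumes that a ticked box (`- [x]`) means the check has been verified, which follows the usual Markdown task-list convention but is not stated in the template. The function names `parse_checks` and `summarize` are hypothetical.

```python
import re

# Matches lines such as "- [ ] CC-1: Search across semantic memory".
# Assumption: "[x]" marks a verified check; the template only shows "[ ]".
CHECK_LINE = re.compile(r"^- \[( |x|X)\] ((?:NFC|CC|BC)-\d+): (.+)$")


def parse_checks(eval_text: str) -> list[dict]:
    """Extract every checklist item from the text of an eval definition."""
    checks = []
    for line in eval_text.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            mark, check_id, description = match.groups()
            checks.append(
                {"id": check_id, "done": mark.lower() == "x", "description": description}
            )
    return checks


def summarize(checks: list[dict]) -> dict[str, tuple[int, int]]:
    """Map each prefix to (verified count, total count)."""
    summary: dict[str, tuple[int, int]] = {}
    for check in checks:
        prefix = check["id"].split("-")[0]
        done, total = summary.get(prefix, (0, 0))
        summary[prefix] = (done + int(check["done"]), total + 1)
    return summary


# Sample text adapted from the template above, with one box ticked for illustration.
SAMPLE = """\
## Capability Checks
- [x] CC-1: Search across semantic memory
- [ ] CC-2: Support filtering by type
## Non-Functional Checks
- [ ] NFC-1: Search completes in <2s
"""

if __name__ == "__main__":
    print(summarize(parse_checks(SAMPLE)))
```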
EDD integrates with the orchestrator workflow to ensure quality-first development.
The EDD framework now supports swarm mode for parallel evaluation across multiple check types.
When invoked via /edd, the framework automatically spawns a specialized evaluation team:
Task:
  subagent_type: "general-purpose"
  model: "sonnet"
  team_name: "edd-evaluation-team"
  name: "edd-coordinator"
  mode: "delegate"
  run_in_background: true
  prompt: |
    Execute Eval-Driven Development workflow for: $ARGUMENTS
    EDD Pattern:
    1. DEFINE - Create structured eval specifications
    2. DISTRIBUTE - Assign check types to specialists
    3. VERIFY - Validate against eval criteria
    4. CONSOLIDATE - Merge findings from all evaluators
| Role | Purpose | Specialization |
|------|---------|----------------|
| Coordinator | EDD workflow orchestration | Manages eval lifecycle, consolidates findings |
| Teammate 1 | Capability Checks specialist | CC- prefix: feature capabilities and functionality |
| Teammate 2 | Behavior Checks specialist | BC- prefix: expected behaviors and responses |
| Teammate 3 | Non-Functional Checks specialist | NFC- prefix: performance, security, maintainability |
User invokes: /edd "Define memory-search feature"
1. Team "edd-evaluation-team" created
2. Coordinator (edd-coordinator) receives task
3. Three teammates spawned with check-type specializations
4. Eval definition distributed:
   - Teammate 1 → Capability Checks (CC-)
   - Teammate 2 → Behavior Checks (BC-)
   - Teammate 3 → Non-Functional Checks (NFC-)
5. Teammates work in parallel (background execution)
6. Coordinator monitors progress and gathers results
7. Findings consolidated into single eval specification
8. Final eval document returned
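The distribute, work-in-parallel, and consolidate steps above follow a common fan-out/fan-in shape. The sketch below illustrates that shape with Python threads only; it is not how the swarm team is actually spawned, and `draft_checks` and `run_swarm` are hypothetical stand-ins that return placeholder text instead of real eval content.

```python
from concurrent.futures import ThreadPoolExecutor

# One prefix per teammate, in the order used by the workflow above.
PREFIXES = ["CC", "BC", "NFC"]


def draft_checks(prefix: str, feature: str) -> list[str]:
    """Stand-in for one teammate: return placeholder checks for its prefix."""
    return [f"{prefix}-{n}: <{feature} check {n}>" for n in (1, 2)]


def run_swarm(feature: str) -> dict[str, list[str]]:
    """Fan out one job per check type, then merge results (the coordinator role)."""
    with ThreadPoolExecutor(max_workers=len(PREFIXES)) as pool:
        # Distribute: submit all jobs before waiting so they run concurrently.
        futures = {
            prefix: pool.submit(draft_checks, prefix, feature) for prefix in PREFIXES
        }
        # Consolidate: collect results keyed by check type, in a stable order.
        return {prefix: future.result() for prefix, future in futures.items()}


if __name__ == "__main__":
    for prefix, checks in run_swarm("memory-search").items():
        print(prefix, checks)
```

Submitting every job before calling `result()` is what makes the work parallel; collecting inside the submit loop would serialize it.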
Each teammate focuses on their check type:
# Teammate 1: Capability Checks
CC-1: Feature can perform X
CC-2: Feature supports Y configuration
CC-3: Feature integrates with Z system
# Teammate 2: Behavior Checks
BC-1: Feature handles error case A gracefully
BC-2: Feature returns expected response for B
BC-3: Feature maintains state across C
# Teammate 3: Non-Functional Checks
NFC-1: Response time < 100ms
NFC-2: Memory usage < 50MB
NFC-3: Security vulnerability scan passes
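Non-functional checks such as the response-time and memory limits above only become verifiable once they are measured. Here is a minimal sketch using the standard library; `measure` and `check_nfc` are hypothetical helper names, and `tracemalloc` reports only memory allocated through Python's allocator, so treat the memory figure as an approximation.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def check_nfc(func, max_seconds: float, max_bytes: int) -> dict[str, bool]:
    """Compare a single run of func against time and memory thresholds."""
    _, elapsed, peak = measure(func)
    return {"time_ok": elapsed < max_seconds, "memory_ok": peak < max_bytes}


if __name__ == "__main__":
    # Thresholds mirror NFC-1 (< 100ms) and NFC-2 (< 50MB) from the list above.
    report = check_nfc(lambda: sorted(range(10_000)), 0.1, 50 * 1024 * 1024)
    print(report)
```

A single run is noisy; a real check would likely repeat the measurement and compare a median or percentile against the threshold.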
Teammates use the built-in mailbox system:
# Teammate sends finding to coordinator
SendMessage:
  type: "message"
  recipient: "edd-coordinator"
  content: "CC-3 defined: Feature integrates with auth system via OAuth2"
All teammates share a unified task list:
# Location: ~/.claude/tasks/edd-evaluation-team/tasks.json
# Example tasks:
[
  {"id": "1", "subject": "Define Capability Checks", "owner": "teammate-1"},
  {"id": "2", "subject": "Define Behavior Checks", "owner": "teammate-2"},
  {"id": "3", "subject": "Define Non-Functional Checks", "owner": "teammate-3"},
  {"id": "4", "subject": "Consolidate eval specification", "owner": "edd-coordinator"}
]
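A shared task list in this shape is easy to inspect programmatically. The sketch below groups task subjects by owner; it assumes only the three fields shown in the example (`id`, `subject`, `owner`), and the real file may carry more. The function name `tasks_by_owner` and the `"unassigned"` fallback are illustrative choices.

```python
import json
from collections import defaultdict

# Example content copied from the task list above.
TASKS_JSON = """
[
  {"id": "1", "subject": "Define Capability Checks", "owner": "teammate-1"},
  {"id": "2", "subject": "Define Behavior Checks", "owner": "teammate-2"},
  {"id": "3", "subject": "Define Non-Functional Checks", "owner": "teammate-3"},
  {"id": "4", "subject": "Consolidate eval specification", "owner": "edd-coordinator"}
]
"""


def tasks_by_owner(raw: str) -> dict[str, list[str]]:
    """Group task subjects by owner; tasks without an owner go to 'unassigned'."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for task in json.loads(raw):
        grouped[task.get("owner", "unassigned")].append(task["subject"])
    return dict(grouped)


if __name__ == "__main__":
    for owner, subjects in tasks_by_owner(TASKS_JSON).items():
        print(f"{owner}: {', '.join(subjects)}")
```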
To disable swarm mode:
/edd "Define feature X" --no-swarm
# Evals saved to ~/.claude/evals/
ls ~/.claude/evals/
# View last eval
cat ~/.claude/evals/latest.md
Test suite: tests/test_v264_edd_framework.bats (33 tests)
Run tests:
bats tests/test_v264_edd_framework.bats
Additional tests for swarm mode integration:
# Test swarm team creation
tests/edd/test-swarm-team-creation.sh
# Test parallel evaluation
tests/edd/test-parallel-evaluation.sh
Current: Framework defined with swarm mode integration (v2.81.1)
Note: TEMPLATE.md and evals directory structure ready for use
Version: v2.64 | Status: DRAFT | Tests: 33 passing
This skill automatically generates complete reports for traceability:
When this skill completes, the following are generated automatically:
docs/actions/edd/{timestamp}.md
.claude/metadata/actions/edd/{timestamp}.json
Each report includes:
# List all reports for this skill
ls -lt docs/actions/edd/
# View the most recent report
cat "$(ls -t docs/actions/edd/*.md | head -1)"
# Find failed reports
grep -l "Status: FAILED" docs/actions/edd/*.md
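The same report queries can be scripted when the results need further processing. This is a sketch under stated assumptions: it relies only on the `docs/actions/edd/*.md` layout and the `Status: FAILED` marker used in the commands above, and the helper names, demo file names, and the `Status: SUCCESS` line in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def latest_report(report_dir):
    """Return the most recently modified .md report, or None if there are none."""
    reports = list(Path(report_dir).glob("*.md"))
    return max(reports, key=lambda p: p.stat().st_mtime) if reports else None


def failed_reports(report_dir):
    """Return reports whose text contains the 'Status: FAILED' marker."""
    return sorted(
        p
        for p in Path(report_dir).glob("*.md")
        if "Status: FAILED" in p.read_text(encoding="utf-8")
    )


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch runs without real reports.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp, "report-old.md")  # hypothetical file names
        new = Path(tmp, "report-new.md")
        old.write_text("Status: FAILED\n", encoding="utf-8")
        new.write_text("Status: SUCCESS\n", encoding="utf-8")
        os.utime(old, (1_000, 1_000))  # force distinct modification times
        os.utime(new, (2_000, 2_000))
        print(latest_report(tmp).name)
        print([p.name for p in failed_reports(tmp)])
```

For real use, point both functions at `docs/actions/edd/`.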
source .claude/lib/action-report-lib.sh
start_action_report "edd" "Task description"
# ... execution ...
complete_action_report "success" "Summary" "Recommendations"