.claude/skills/end-to-end-orchestrator/SKILL.md
Complete development workflow orchestrator coordinating all multi-ai skills (research → planning → implementation → testing → verification) with quality gates, failure recovery, and state management. Single-command complete workflows from objective to production-ready code. Use when implementing complete features requiring full pipeline, coordinating multiple skills automatically, or executing production-grade development cycles end-to-end.
npx skillsauth add adaptationio/skrillz end-to-end-orchestrator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
end-to-end-orchestrator provides single-command complete development workflows, coordinating all 5 multi-ai skills from research through final verification of production-ready code.
Purpose: Transform "I want feature X" into production-ready code through automated skill coordination
Pattern: Workflow-based (5-stage pipeline with quality gates)
Key Innovation: Automatic orchestration of research → planning → implementation → testing → verification with failure recovery and quality gates
The Complete Pipeline:
Input: Feature description
↓
1. Research (multi-ai-research) [optional]
↓ [Quality Gate: Research complete]
2. Planning (multi-ai-planning)
↓ [Quality Gate: Plan ≥90/100]
3. Implementation (multi-ai-implementation)
↓ [Quality Gate: Tests pass, coverage ≥80%]
4. Testing (multi-ai-testing)
↓ [Quality Gate: Coverage ≥95%, verified]
5. Verification (multi-ai-verification)
↓ [Quality Gate: Score ≥90/100, all layers pass]
Output: Production-ready code
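The pipeline above can be sketched as a loop that runs each stage in order and stops at the first failed quality gate. This is a minimal illustration, not the skill's actual implementation: `STAGES`, `runPipeline`, the `runStage` callback, and the result fields (`complete`, `score`, `coverage`, `testsPass`, `verified`, `allLayersPass`) are hypothetical names chosen for the example. Only the thresholds are taken from the diagram.

```javascript
// Hypothetical stage table; gate thresholds mirror the pipeline diagram.
const STAGES = [
  { name: 'research', optional: true, gate: (r) => r.complete === true },
  { name: 'planning', optional: false, gate: (r) => r.score >= 90 },
  { name: 'implementation', optional: false, gate: (r) => r.testsPass && r.coverage >= 80 },
  { name: 'testing', optional: false, gate: (r) => r.coverage >= 95 && r.verified },
  { name: 'verification', optional: false, gate: (r) => r.score >= 90 && r.allLayersPass },
];

// runStage is an assumed callback that executes one stage and returns its result.
function runPipeline(runStage, { skipResearch = false } = {}) {
  const completed = [];
  for (const stage of STAGES) {
    // Only the optional research stage can be skipped.
    if (stage.optional && skipResearch) continue;
    const result = runStage(stage.name);
    if (!stage.gate(result)) {
      // Stop at the first failed gate so recovery can start from this stage.
      return { status: 'failed', failedStage: stage.name, completed };
    }
    completed.push(stage.name);
  }
  return { status: 'production-ready', completed };
}
```

Returning the list of completed stages alongside the failed one is a design choice for the sketch: it gives a recovery step enough information to resume rather than restart.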
Use end-to-end-orchestrator when:
When NOT to Use:
Purpose: Ground implementation in proven patterns
Process:
Determine if Research Needed:
// Check whether the objective resembles past work
const similarWork = await recallMemory({
  type: 'episodic',
  query: objective
});
// No similar work means an unfamiliar domain, so research is needed;
// otherwise research can be skipped and past learnings reused
const needsResearch = similarWork.length === 0;
Execute Research (if needed):
Use multi-ai-research for "[domain] implementation patterns and best practices"
What It Provides:
Quality Gate: Research Complete:
✅ Research findings documented
✅ Patterns identified (minimum 2)
✅ Best practices extracted (minimum 3)
✅ Quality score ≥95/100
If Fail: Research incomplete → retry research OR proceed without (user decides)
Outputs:
Time: 30-60 minutes (can skip if familiar domain)
Next: Proceed to Stage 2
Purpose: Create agent-executable plan with quality ≥90/100
Process:
Load Research Context (if research done):
let context = "";
if (researchDone) {
  context = await readFile('.analysis/ANALYSIS_FINAL.md');
}
Invoke Planning:
Use multi-ai-planning to create plan for [objective]
${context ? `Research findings available in: .analysis/ANALYSIS_FINAL.md` : ''}
Create comprehensive plan following 6-step workflow.
What It Does:
Quality Gate: Plan Approved:
✅ Plan created
✅ Quality score ≥90/100
✅ All tasks have verification
✅ Dependencies mapped
✅ No circular dependencies
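The "No circular dependencies" check in the gate above can be illustrated with a depth-first search over the task graph. This is a sketch under assumptions: the source does not show the plan schema, so the `dependencies` field on each task and the `findCycle` helper are hypothetical.

```javascript
// Returns the first dependency cycle found as a list of task ids
// (for example ['a', 'b', 'c', 'a']), or null if the graph is acyclic.
// Assumes each task looks like { id, dependencies: [ids] } (hypothetical schema).
function findCycle(tasks) {
  const deps = new Map(tasks.map((t) => [t.id, t.dependencies ?? []]));
  const state = new Map(); // id -> 'visiting' | 'done'
  const path = [];

  function visit(id) {
    if (state.get(id) === 'done') return null;
    if (state.get(id) === 'visiting') {
      // Reached a task that is already on the current path: that is a cycle.
      return [...path.slice(path.indexOf(id)), id];
    }
    state.set(id, 'visiting');
    path.push(id);
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, 'done');
    return null;
  }

  for (const id of deps.keys()) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}
```

A plan would pass this part of the gate when `findCycle(plan.tasks)` returns `null`.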
If Fail (score <90):
Save Plan to Shared State:
# Save for next stage
cp plans/[plan-id]/plan.json .multi-ai-context/plan.json
Outputs:
Time: 1.5-3 hours
Next: Proceed to Stage 3
Purpose: Execute plan with TDD, produce working code
Process:
Load Plan:
const plan = JSON.parse(await readFile('.multi-ai-context/plan.json'));
console.log(`📋 Loaded plan: ${plan.objective}`);
console.log(` Tasks: ${plan.tasks.length}`);
console.log(` Estimated: ${plan.metadata.estimated_total_hours} hours`);
Invoke Implementation:
Use multi-ai-implementation following plan in .multi-ai-context/plan.json
Execute all 6 steps:
1. Explore & gather context
2. Plan architecture (plan already created, refine as needed)
3. Implement incrementally with TDD
4. Coordinate multi-agent (if parallel tasks)
5. Integration & E2E testing
6. Quality verification before commit
Success criteria from plan.
What It Does:
Quality Gate: Implementation Complete:
✅ All plan tasks implemented
✅ All tests passing
✅ Coverage ≥80% (gate), ideally ≥95%
✅ No regressions
✅ Doom loop avoided (< max retries)
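One way to read "Doom loop avoided (< max retries)" is: the same task has not failed with the same error on each of its last N attempts. The sketch below assumes a failure history shaped as a list of `{ taskId, error }` records and a default of 3 retries; both the record shape and the `detectDoomLoop` name are hypothetical, since the source does not specify the format of the failure history.

```javascript
// Returns true when the last `maxRetries` attempts on a task all failed
// with an identical error message (hypothetical history record shape).
function detectDoomLoop(history, taskId, maxRetries = 3) {
  const attempts = history.filter((h) => h.taskId === taskId);
  if (attempts.length < maxRetries) return false;
  const recent = attempts.slice(-maxRetries);
  return recent.every((h) => h.error === recent[0].error);
}
```

Comparing error messages rather than only counting failures is a deliberate choice in this sketch: repeated failures with different errors may still indicate progress, while an identical error suggests the fix attempts are not addressing the cause.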
If Fail:
Save Implementation State:
# Save for next stage
echo '{
"status": "implemented",
"files_changed": [...],
"tests_run": 95,
"tests_passed": 95,
"coverage": 87,
"commits": ["abc123", "def456"]
}' > .multi-ai-context/implementation-status.json
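Once the status file above has been parsed, the Stage 3 gate can be evaluated from its fields. The following is a rough sketch: `checkImplementationGate` is a hypothetical helper, and it covers only the conditions that are visible in the status JSON (status, test counts, coverage), not regressions or retry limits.

```javascript
// Evaluates the parts of the implementation gate that the status object exposes.
// Returns the list of failed checks so a recovery step can report them.
function checkImplementationGate(status, { minCoverage = 80 } = {}) {
  const failures = [];
  if (status.status !== 'implemented') {
    failures.push(`status is "${status.status}", expected "implemented"`);
  }
  if (status.tests_passed !== status.tests_run) {
    failures.push(`${status.tests_run - status.tests_passed} test(s) failing`);
  }
  if (status.coverage < minCoverage) {
    failures.push(`coverage ${status.coverage}% is below ${minCoverage}%`);
  }
  return { passed: failures.length === 0, failures };
}
```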
Outputs:
Time: 3-10 hours (varies by complexity)
Next: Proceed to Stage 4
Purpose: Verify tests are comprehensive and prevent gaming
Process:
Load Implementation Context:
const implStatus = JSON.parse(
  await readFile('.multi-ai-context/implementation-status.json')
);
console.log(`🧪 Testing implementation:`);
console.log(` Files changed: ${implStatus.files_changed.length}`);
console.log(` Current coverage: ${implStatus.coverage}%`);
Invoke Independent Testing:
Use multi-ai-testing independent verification workflow
Verify:
- Tests in: tests/
- Code in: src/
- Specifications in: .multi-ai-context/plan.json
Workflows to execute:
1. Test quality verification (independent agent)
2. Coverage validation (≥95% target)
3. Edge case discovery (AI-powered)
4. Multi-agent ensemble scoring (if critical feature)
Score test quality (0-100).
What It Does:
Quality Gate: Testing Verified:
✅ Test quality score ≥90/100
✅ Coverage ≥95% (target achieved)
✅ Independent verification passed
✅ No test gaming detected
✅ Edge cases covered
If Fail:
Save Testing State:
echo '{
"status": "tested",
"test_quality_score": 92,
"coverage": 96,
"tests_total": 112,
"edge_cases": 23,
"gaming_detected": false
}' > .multi-ai-context/testing-status.json
Outputs:
Time: 1-3 hours
Next: Proceed to Stage 5
Purpose: Final quality assurance before production
Process:
Load All Context:
const plan = JSON.parse(await readFile('.multi-ai-context/plan.json'));
const implStatus = JSON.parse(await readFile('.multi-ai-context/implementation-status.json'));
const testStatus = JSON.parse(await readFile('.multi-ai-context/testing-status.json'));
console.log(`🔍 Final verification:`);
console.log(` Objective: ${plan.objective}`);
console.log(` Implementation: ${implStatus.status}`);
console.log(` Testing: ${testStatus.coverage}% coverage`);
Invoke Multi-Layer Verification:
Use multi-ai-verification for complete quality check
Verify:
- Code: src/
- Tests: tests/
- Plan: .multi-ai-context/plan.json
Execute all 5 layers:
1. Rules-based (linting, types, schema, SAST)
2. Functional (tests, coverage, examples)
3. Visual (if UI: screenshots, a11y)
4. Integration (E2E, API compatibility)
5. Quality scoring (LLM-as-judge, 0-100)
All 5 quality gates must pass.
What It Does:
Quality Gate: Production Ready:
✅ Layer 1 (Rules): PASS
✅ Layer 2 (Functional): PASS, coverage 96%
✅ Layer 3 (Visual): PASS or SKIPPED
✅ Layer 4 (Integration): PASS
✅ Layer 5 (Quality): 92/100 ≥90 ✅
ALL GATES PASSED → PRODUCTION APPROVED
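The production-ready decision above combines the per-layer results with the quality score. A minimal sketch, assuming each layer is reported as `{ name, status }` with status `PASS`, `FAIL`, or `SKIPPED` (a hypothetical shape), and that only the Visual layer may be skipped, as the checklist suggests:

```javascript
// Approves only if every layer passed (Visual may be skipped for non-UI work)
// and the quality score meets the threshold.
function isProductionReady(layers, qualityScore, minScore = 90) {
  const layersOk = layers.every(
    (l) => l.status === 'PASS' || (l.status === 'SKIPPED' && l.name === 'Visual')
  );
  return layersOk && qualityScore >= minScore;
}
```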
If Fail:
Generate Final Report:
# Feature Implementation Complete
**Objective**: [from plan]
## Pipeline Execution Summary
### Stage 1: Research
- Status: ✅ Complete
- Quality: 97/100
- Time: 52 minutes
### Stage 2: Planning
- Status: ✅ Complete
- Quality: 94/100
- Tasks: 23
- Time: 1.8 hours
### Stage 3: Implementation
- Status: ✅ Complete
- Files changed: 15
- Lines added: 847
- Commits: 12
- Time: 6.2 hours
### Stage 4: Testing
- Status: ✅ Complete
- Test quality: 92/100
- Coverage: 96%
- Tests: 112
- Time: 1.5 hours
### Stage 5: Verification
- Status: ✅ Complete
- Quality score: 92/100
- All layers: PASS
- Time: 1.2 hours
## Final Metrics
- **Total Time**: 11.3 hours
- **Quality**: 92/100
- **Coverage**: 96%
- **Status**: ✅ PRODUCTION READY
## Commits
- abc123: feat: Add database schema
- def456: feat: Implement OAuth integration
- [... 10 more ...]
## Next Steps
- Create PR for team review
- Deploy to staging
- Production release
Save to Memory (if agent-memory-system available):
await storeMemory({
  type: 'episodic',
  event: {
    description: `Complete implementation: ${objective}`,
    outcomes: {
      total_time: 11.3,
      quality_score: 92,
      test_coverage: 96,
      stages_completed: 5
    },
    learnings: extractedDuringPipeline
  }
});
Outputs:
Time: 30-90 minutes
Result: ✅ PRODUCTION READY
Stage Fails → Recovery Strategy:
Research Fails:
Planning Fails (score <90):
Implementation Fails:
Testing Fails (coverage <80% or quality <90):
Verification Fails (score <90 or layer fails):
When to Escalate to Human:
Escalation Format:
# ⚠️ ESCALATION REQUIRED
**Stage**: Implementation (Stage 3)
**Failure**: Doom loop detected (3 failed attempts)
## Context
- Objective: Implement user authentication
- Failing Task: 2.2.2 Token generation
- Error: Tests fail with "undefined userId" repeatedly
## Attempts Made
1. Attempt 1: Added userId to payload → Same error
2. Attempt 2: Changed payload structure → Same error
3. Attempt 3: Different JWT library → Same error
## Root Cause Analysis
- Tests expect `user.id` but implementation uses `user.userId`
- Mismatch in data model between test and implementation
- Auto-fix failed 3 times
## Recommended Actions
1. Review test specifications vs. implementation
2. Align data model (user.id vs. user.userId)
3. Manual intervention required
## State Saved
- Checkpoint: checkpoint-003 (before attempts)
- Rollback available: `git checkout checkpoint-003`
- Continue after fix: Resume from Task 2.2.2
From Plan:
const plan = JSON.parse(await readFile('.multi-ai-context/plan.json'));
// The plan identifies groups of tasks that can run in parallel
const parallelGroups = plan.parallel_groups;
// Example:
//   Group 1: Tasks 2.1, 2.2, 2.3 (independent, can execute in parallel)
Pattern:
// Stage 3: Implementation with parallel tasks
const parallelGroup = plan.parallel_groups.find(g => g.group_id === 'pg2');
// Spawn parallel implementation agents
const results = await Promise.all(
  parallelGroup.tasks.map(taskId => {
    // Named taskSpec so it does not shadow the task() agent call below
    const taskSpec = plan.tasks.find(t => t.id === taskId);
    return task({
      description: `Implement ${taskSpec.description}`,
      prompt: `Implement task ${taskSpec.id}: ${taskSpec.description}
Specifications from plan:
${JSON.stringify(taskSpec, null, 2)}
Success criteria:
${taskSpec.verification.success_criteria.join('\n')}
Write implementation and tests.
Report completion status.`
    });
  })
);
// Verify all parallel tasks completed
const allSucceeded = results.every(r => r.status === 'complete');
if (allSucceeded) {
  // Proceed to integration
} else {
  // Handle failures
}
Time Savings: 20-40% faster than sequential execution
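With `Promise.all`, a single rejected task rejects the whole batch and the other results are lost. A possible alternative, sketched here under assumptions, is `Promise.allSettled`, which waits for every task and lets the orchestrator retry only the failed ones. `runParallelGroup` and the `runTask` callback are hypothetical names; `runTask` is assumed to be an async function that resolves to an object with a `status` field, as in the pattern above.

```javascript
// Runs all tasks in a parallel group and partitions them into
// succeeded and failed ids without losing results when one task throws.
async function runParallelGroup(taskIds, runTask) {
  const settled = await Promise.allSettled(taskIds.map((id) => runTask(id)));
  const succeeded = [];
  const failed = [];
  settled.forEach((outcome, i) => {
    const ok = outcome.status === 'fulfilled' && outcome.value.status === 'complete';
    (ok ? succeeded : failed).push(taskIds[i]);
  });
  return { succeeded, failed };
}
```

The `failed` list can then feed whatever recovery strategy the stage uses, while the `succeeded` tasks do not need to be rerun.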
Shared Context Directory: .multi-ai-context/
Standard Files:
.multi-ai-context/
├── research-findings.json # From multi-ai-research
├── plan.json # From multi-ai-planning
├── implementation-status.json # From multi-ai-implementation
├── testing-status.json # From multi-ai-testing
├── verification-report.json # From multi-ai-verification
├── pipeline-state.json # Orchestrator state
└── failure-history.json # For doom loop detection
Benefits:
Real-Time Progress:
{
  "pipeline_id": "pipeline_20250126_1200",
  "objective": "Implement user authentication",
  "started_at": "2025-01-26T12:00:00Z",
  "current_stage": 3,
  "stages": [
    {
      "stage": 1,
      "name": "Research",
      "status": "complete",
      "duration_minutes": 52,
      "quality": 97
    },
    {
      "stage": 2,
      "name": "Planning",
      "status": "complete",
      "duration_minutes": 108,
      "quality": 94
    },
    {
      "stage": 3,
      "name": "Implementation",
      "status": "in_progress",
      "started_at": "2025-01-26T13:48:00Z",
      "tasks_total": 23,
      "tasks_complete": 15,
      "tasks_remaining": 8,
      "percent_complete": 65
    },
    {
      "stage": 4,
      "name": "Testing",
      "status": "pending"
    },
    {
      "stage": 5,
      "name": "Verification",
      "status": "pending"
    }
  ],
  "estimated_completion": "2025-01-26T20:00:00Z",
  "quality_target": 90,
  "current_quality_estimate": 92
}
Query Progress:
# Check the current stage and Stage 3 progress
jq '.current_stage, .stages[2].percent_complete' .multi-ai-context/pipeline-state.json
# Prints 3, then 65 on the next line (Stage 3, 65% complete)
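The same query can be done programmatically on the parsed state object. This sketch looks up the current stage by its `stage` number rather than by array position, and derives the percentage from the task counts; `summarizeProgress` is a hypothetical helper, while the field names come from the pipeline state example.

```javascript
// Builds a one-line progress summary from a parsed pipeline-state object.
function summarizeProgress(state) {
  const stage = state.stages.find((s) => s.stage === state.current_stage);
  if (!stage) return `Stage ${state.current_stage}: unknown`;
  if (stage.status !== 'in_progress' || !stage.tasks_total) {
    return `Stage ${stage.stage} (${stage.name}): ${stage.status}`;
  }
  const percent = Math.round((stage.tasks_complete / stage.tasks_total) * 100);
  const remaining = stage.tasks_total - stage.tasks_complete;
  return `Stage ${stage.stage} (${stage.name}): ${percent}% complete, ${remaining} tasks remaining`;
}
```

For the example state (15 of 23 tasks complete in Stage 3), this returns `Stage 3 (Implementation): 65% complete, 8 tasks remaining`.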
All 5 Stages:
Research → Planning → Implementation → Testing → Verification
Time: 8-20 hours
Quality: Maximum (all gates, ≥90)
Use For: Production features, complex implementations
4 Stages (familiar domains):
Planning → Implementation → Testing → Verification
Time: 6-15 hours
Quality: High (all gates except research)
Use For: Familiar domains, time-sensitive features
Implementation + Basic Verification:
Planning → Implementation → Testing (basic) → Verification (Layers 1-2 only)
Time: 3-8 hours
Quality: Good (essential gates only)
Use For: Internal tools, prototypes
Even for "simple" features - planning quality ≥90 prevents issues
If similar work done before, recall patterns instead of researching
Check .multi-ai-context/pipeline-state.json to track progress
If gate fails, there's a real issue - don't skip fixes
Each stage completion saves state (enables recovery)
Complete understanding of what was built and quality achieved
Coordinates:
Provides:
Before Pipeline:
After Pipeline:
Session Hooks:
Notification Hooks:
| Stage | Skill | Time | Quality Gate | Output |
|-------|-------|------|--------------|--------|
| 1 | multi-ai-research | 30-60m | ≥95/100 | Research findings |
| 2 | multi-ai-planning | 1.5-3h | ≥90/100 | Executable plan |
| 3 | multi-ai-implementation | 3-10h | Tests pass, ≥80% cov | Working code |
| 4 | multi-ai-testing | 1-3h | ≥95% cov, quality ≥90 | Verified tests |
| 5 | multi-ai-verification | 1-3h | ≥90/100, all layers | Production ready |
Total: 8-20 hours → Production-ready feature
| Mode | Stages | Time | Quality | Use For |
|------|--------|------|---------|---------|
| Standard | All 5 | 8-20h | Maximum | Production features |
| Fast | 2-5 (skip research) | 6-15h | High | Familiar domains |
| Quick | 2,3,4,5 (basic) | 3-8h | Good | Internal tools |
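The three modes can be expressed as configuration data that an orchestrator looks up before starting. This is only an illustration: `MODES`, `planRun`, and the field names (`stages`, `basicTesting`, `verificationLayers`) are hypothetical, and the stage and layer numbers follow the mode descriptions in this document.

```javascript
// Hypothetical mode table: which stages run, and how deep verification goes.
const MODES = {
  standard: { stages: [1, 2, 3, 4, 5], basicTesting: false, verificationLayers: [1, 2, 3, 4, 5] },
  fast: { stages: [2, 3, 4, 5], basicTesting: false, verificationLayers: [1, 2, 3, 4, 5] },
  quick: { stages: [2, 3, 4, 5], basicTesting: true, verificationLayers: [1, 2] },
};

// Looks up a mode by name and rejects anything not in the table.
function planRun(mode) {
  if (!Object.prototype.hasOwnProperty.call(MODES, mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return MODES[mode];
}
```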
end-to-end-orchestrator provides complete automation from feature description to production-ready code, coordinating all 5 multi-ai skills with quality gates, failure recovery, and state management - delivering enterprise-grade development workflows in a single command.
For examples, see examples/. For failure recovery, see Failure Recovery section.