.claude/skills/error-recovery-escalation/SKILL.md
5-level error recovery escalation: retry, nudge, replan, fallback, force-done — each with entry criteria and timeout before escalating
npx skillsauth add oimiragieo/agent-studio error-recovery-escalation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Provide a structured, 5-level escalation ladder for recovering from agent errors. Each level has specific entry criteria, an action, and a timeout before escalating to the next level.
Level 1: RETRY — Same action, transient error
Level 2: NUDGE — Adjust parameters, same approach
Level 3: REPLAN — New approach, same goal
Level 4: FALLBACK — Different agent or model
Level 5: FORCE-DONE — Partial results with explanation
Skill({ skill: 'error-recovery-escalation' });
Invoke when:
behavioral-loop-detection)
Level 1: RETRY
Entry criteria:
Action:
Timeout: 3 attempts × backoff = max ~30 seconds total
Escalate to Level 2 when: 3 retries exhausted without success
Example:
Read('.claude/context/data/index.db') → ENOENT
→ Level 1: Retry 3 times with 1s backoff
→ Still fails → escalate to Level 2
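The Level 1 behaviour above might be sketched as follows. This is a minimal illustration, not the skill's implementation: `backoffDelays`, `retryLevel1`, and the result shape are hypothetical names, and the doubling of the delay is an assumption (the text only states a 1s backoff and a ~30 second ceiling).

```javascript
// Sketch only: helper names and the exponential factor are assumptions.

// Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelays(maxRetries, baseMs) {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Re-runs the same action up to `maxRetries` times, waiting before each retry.
// `sleep` is injectable so the helper can be exercised without real waiting.
async function retryLevel1(action, { maxRetries = 3, baseMs = 1000, sleep = defaultSleep } = {}) {
  let lastError = null;
  for (const delay of backoffDelays(maxRetries, baseMs)) {
    await sleep(delay);
    try {
      return { ok: true, value: await action() };
    } catch (err) {
      lastError = err;
    }
  }
  // All retries exhausted: the caller should escalate to Level 2.
  return { ok: false, escalateTo: 2, error: lastError };
}
```

With the defaults, the waits add up to 7 seconds, which stays inside the ~30 second budget stated for this level.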
Level 2: NUDGE
Entry criteria:
Action:
Timeout: 5 minutes total for all nudge variants
Escalate to Level 3 when: All nudge variants fail or timeout expires
Example:
Write('output/result.json', ...) → EPERM
→ Level 2: Try .claude/context/tmp/result.json, then /tmp/result.json
→ All fail → escalate to Level 3
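One way to picture the nudge step is a loop over parameter variants that stops at the first success. The sketch below is illustrative only: `tryVariants`, its result shape, and the `fakeWrite` stand-in are hypothetical, and the attempt is synchronous for brevity.

```javascript
// Sketch only: `tryVariants` and `fakeWrite` are hypothetical helpers.

// Tries each variant in order; gives up when all fail or the budget runs out.
function tryVariants(variants, attempt, { deadlineMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const startedAt = now();
  const errors = [];
  for (const variant of variants) {
    if (now() - startedAt > deadlineMs) break; // 5-minute budget from the text
    try {
      return { ok: true, variant, value: attempt(variant) };
    } catch (err) {
      errors.push({ variant, code: err.code ?? String(err) });
    }
  }
  // No variant worked: the caller should escalate to Level 3.
  return { ok: false, escalateTo: 3, errors };
}

// Stand-in for a write that is only permitted under /tmp (assumption for the demo).
const fakeWrite = (path) => {
  if (!path.startsWith('/tmp/')) {
    throw Object.assign(new Error('permission denied'), { code: 'EPERM' });
  }
  return path;
};

const nudgeResult = tryVariants(
  ['.claude/context/tmp/result.json', '/tmp/result.json'],
  fakeWrite,
);
```

In this demo the second path succeeds, so no escalation is needed; if every variant threw, the result would carry `escalateTo: 3`.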
Level 3: REPLAN
Entry criteria:
Action:
planner agent or invoke plan-generator skill for a new plan
Timeout: 15 minutes for new plan + initial execution
Escalate to Level 4 when: Replan also fails, or planner cannot produce a viable plan
Example:
Auth middleware implementation fails repeatedly
→ Level 3: Replan with different auth library (passport → jose)
→ New plan also fails due to dependency conflict → escalate to Level 4
Level 4: FALLBACK
Entry criteria:
Action:
developer → devops-troubleshooter)
Timeout: 20 minutes for fallback agent execution
Escalate to Level 5 when: Fallback agent also fails or timeout expires
Example:
Developer agent cannot resolve platform-specific build error
→ Level 4: Spawn devops-troubleshooter with full error log
→ Still unresolved after 20min → escalate to Level 5
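The timeouts stated for Levels 1 through 4 could be collected into a single budget table, as in the sketch below. The names `LEVEL_BUDGET_MS`, `budgetExpired`, and `nextLevelAfterFailure` are hypothetical, and treating Level 5 as having no budget is an assumption, since the text gives it no timeout.

```javascript
// Sketch only: names are hypothetical; values come from the stated timeouts.
const LEVEL_BUDGET_MS = {
  1: 30 * 1000,      // retry: ~30 seconds
  2: 5 * 60 * 1000,  // nudge: 5 minutes
  3: 15 * 60 * 1000, // replan: 15 minutes
  4: 20 * 60 * 1000, // fallback: 20 minutes
};

// True when the time spent in `level` has used up its budget.
function budgetExpired(level, enteredAtMs, nowMs) {
  const budget = LEVEL_BUDGET_MS[level];
  if (budget === undefined) return false; // Level 5 (assumption): nothing to time out
  return nowMs - enteredAtMs >= budget;
}

// The ladder only moves upward and stops at Level 5.
function nextLevelAfterFailure(level) {
  return Math.min(level + 1, 5);
}
```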
Level 5: FORCE-DONE
Entry criteria:
Action:
{ partial: true, escalationLevel: 5, explanation: '...' }
Output contract:
{
  "status": "partial",
  "completedSteps": ["step1", "step2"],
  "failedAt": "step3",
  "failureReason": "<specific error>",
  "escalationPath": [1, 2, 3, 4, 5],
  "recommendation": "<what a human should do to finish>"
}
Example:
External API service is down
→ Level 5: Emit partial results (all non-API steps completed)
→ TaskUpdate with partial: true, explanation of API outage
→ Recommend retry when service is restored
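A small builder can keep force-done results aligned with the output contract fields listed for this level. The sketch below is an assumption-laden illustration: `buildForceDoneResult` is a hypothetical helper, and deriving `failedAt` as the first step not yet completed is one possible convention.

```javascript
// Sketch only: `buildForceDoneResult` is a hypothetical helper.
function buildForceDoneResult({ steps, completedSteps, failureReason, escalationPath, recommendation }) {
  // Assumption: the failing step is the first planned step that did not complete.
  const failedAt = steps.find((step) => !completedSteps.includes(step)) ?? null;
  return {
    status: 'partial',
    completedSteps,
    failedAt,
    failureReason,
    escalationPath,
    recommendation,
  };
}

const forceDone = buildForceDoneResult({
  steps: ['step1', 'step2', 'step3'],
  completedSteps: ['step1', 'step2'],
  failureReason: 'External API service is down',
  escalationPath: [1, 2, 3, 4, 5],
  recommendation: 'Retry when the service is restored',
});
```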
| Error Type                     | Entry Level            | Rationale                       |
| ------------------------------ | ---------------------- | ------------------------------- |
| Network timeout                | 1 (retry)              | Transient, safe to retry        |
| Rate limit                     | 1 (retry with backoff) | Transient, wait and retry       |
| File not found                 | 2 (nudge)              | Wrong path, try alternatives    |
| Permission denied              | 2 (nudge)              | Wrong path/permissions          |
| Wrong output format            | 2 (nudge)              | Adjust input parameters         |
| Goal misalignment (judge FAIL) | 3 (replan)             | Need different approach         |
| Agent capability mismatch      | 4 (fallback)           | Need different agent            |
| External service down          | 5 (force-done)         | Cannot resolve programmatically |
| Missing credentials            | 5 (force-done)         | Requires human intervention     |
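The entry-level table lends itself to a simple lookup. In the sketch below, the snake_case keys, the `entryLevelFor` and `classifyErrno` helpers, and the default of Level 1 for unknown error types are all assumptions made for illustration; the errno strings are standard Node.js system error codes.

```javascript
// Sketch only: key names, helpers, and the Level 1 default are assumptions.
const ENTRY_LEVEL = new Map([
  ['network_timeout', 1],
  ['rate_limit', 1],
  ['file_not_found', 2],
  ['permission_denied', 2],
  ['wrong_output_format', 2],
  ['goal_misalignment', 3],
  ['agent_capability_mismatch', 4],
  ['external_service_down', 5],
  ['missing_credentials', 5],
]);

// Unknown error types start at the bottom of the ladder (assumption).
function entryLevelFor(errorType) {
  return ENTRY_LEVEL.get(errorType) ?? 1;
}

// Maps a few common Node.js errno codes onto the table's error types.
const ERRNO_TO_TYPE = new Map([
  ['ETIMEDOUT', 'network_timeout'],
  ['ENOENT', 'file_not_found'],
  ['EACCES', 'permission_denied'],
  ['EPERM', 'permission_denied'],
]);

function classifyErrno(code) {
  return ERRNO_TO_TYPE.get(code) ?? 'unknown';
}
```

For example, `entryLevelFor(classifyErrno('EPERM'))` resolves to Level 2, matching the "Permission denied" row.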
Every level transition MUST be recorded:
// On entering a level
TaskUpdate({
  taskId: '<task-id>',
  status: 'in_progress',
  metadata: {
    recoveryLevel: <1-5>,
    recoveryAction: 'retry|nudge|replan|fallback|force-done',
    errorType: '<classification>',
    previousLevels: [<completed levels>],
    enteredAt: new Date().toISOString(),
  }
});
// On Level 5 completion
TaskUpdate({
  taskId: '<task-id>',
  status: 'completed',
  metadata: {
    partial: true,
    escalationLevel: 5,
    escalationPath: [1, 2, 3, 4, 5],
    completedSteps: ['<step1>', '<step2>'],
    failedAt: '<step>',
    failureReason: '<specific error>',
    recommendation: '<human follow-up action>',
    summary: 'Force-done: partial results emitted after 5-level escalation',
  }
});
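To keep `previousLevels` consistent across transitions, the bookkeeping could live in a small tracker. The sketch below is hypothetical: `createRecoveryTracker` is not part of the skill, and `taskUpdate` is passed in as a plain function because `TaskUpdate` is a tool call in the agent environment rather than something this snippet can import.

```javascript
// Sketch only: `createRecoveryTracker` is a hypothetical helper.
const RECOVERY_ACTIONS = { 1: 'retry', 2: 'nudge', 3: 'replan', 4: 'fallback', 5: 'force-done' };

function createRecoveryTracker(taskId, taskUpdate) {
  const visited = []; // levels entered so far, in order
  return {
    // Records entry into `level` and emits the matching update payload.
    enter(level, errorType) {
      const payload = {
        taskId,
        status: 'in_progress',
        metadata: {
          recoveryLevel: level,
          recoveryAction: RECOVERY_ACTIONS[level],
          errorType,
          previousLevels: [...visited],
          enteredAt: new Date().toISOString(),
        },
      };
      visited.push(level);
      taskUpdate(payload);
      return payload;
    },
    // Copy of the escalation path, e.g. for the Level 5 `escalationPath` field.
    path() {
      return [...visited];
    },
  };
}
```

A caller would invoke `tracker.enter(level, errorType)` on every transition and reuse `tracker.path()` when emitting the final force-done update.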
| Skill | Relationship |
| -------------------------------- | ---------------------------------------------------------------------- |
| behavioral-loop-detection | Feeds escalation triggers (REPLAN at 3, EXPLORE at 5, FORCE-DONE at 8) |
| judge-verification | FAIL verdict triggers escalation at appropriate level |
| plan-generator | Called at Level 3 (replan) to produce new approach |
| verification-before-completion | Gates completion before escalation is declared unnecessary |
| debugging | Provides root cause analysis before Level 3 escalation |
Input validated against schemas/input.schema.json before execution.
Output contract defined in schemas/output.schema.json.
Pre-execution hook: hooks/pre-execute.cjs
Post-execution hook: hooks/post-execute.cjs (emits observability event)
Before starting:
Read .claude/context/memory/learnings.md for known error patterns in this codebase.
Check .claude/context/memory/issues.md for known blockers.
After completing:
.claude/context/memory/issues.md with root cause
.claude/context/memory/learnings.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.