skills/error-recovery/SKILL.md
When a step fails during an agentic task, classify the error (transient, configuration, logic, or permanent), apply the right recovery strategy, and escalate to the user when all strategies are exhausted. Triggers on error messages, exceptions, tracebacks, 'failed', 'not working', 'retry', or when 2 consecutive steps fail.
npx skillsauth add fatih-developer/fth-skills error-recovery
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When an error occurs, stop, think, and try the right recovery strategy. No blind retries — understand the error signal first, then act.
Core principle: Every error carries a signal. Read the signal first, then act.
Classify every error into one of 4 categories. The recovery strategy depends on the category:
Transient: Retrying usually fixes it. Infrastructure or network related.
Configuration: Environment or setup issue. The code is correct but the setup is wrong.
Logic: The code or approach is wrong. Retrying produces the same error.
Permanent: Out of your control and cannot be fixed. External service or permission boundary.
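The four categories can be sketched as a small keyword-based classifier. This is only an illustration: the `ErrorCategory` enum, the `classify` function, and the keyword lists are assumptions made for this example, not part of the skill, and real error messages will need their own signals.

```python
from enum import Enum


class ErrorCategory(Enum):
    TRANSIENT = "Transient"
    CONFIG = "Config"
    LOGIC = "Logic"
    PERMANENT = "Permanent"


# Hypothetical keyword lists; tune them to the errors your tools actually emit.
_SIGNALS = [
    (ErrorCategory.TRANSIENT, ("timeout", "timed out", "connection reset", "429", "503")),
    (ErrorCategory.CONFIG, ("no module named", "no such file", "environment variable")),
    (ErrorCategory.PERMANENT, ("forbidden", "permission denied", "401", "403")),
]


def classify(message: str) -> ErrorCategory:
    """Return the first category whose signal appears in the message.

    Anything unrecognized is treated as a logic error, on the assumption
    that blindly retrying an unknown failure is the riskier default.
    """
    text = message.lower()
    for category, signals in _SIGNALS:
        if any(signal in text for signal in signals):
            return category
    return ErrorCategory.LOGIC
```

Defaulting to the logic category is a design choice for this sketch: it pushes unknown errors toward "change the approach" rather than "retry".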
For transient errors, use exponential backoff:
Attempt 1: Retry immediately
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds -> move on or escalate
Maximum retries: 3 attempts. If all 3 fail → re-evaluate the category.
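A minimal sketch of this schedule, assuming the step is wrapped in a zero-argument callable. The names `backoff_delay` and `retry_transient` are hypothetical. The schedule above lists a fourth step while the cap is stated as 3, so the cap is left as a parameter here and defaults to 3.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    Follows the schedule 0, 2, 4, 8, ... seconds. A server-provided
    `retry_after` value, when present, takes precedence.
    """
    if retry_after is not None:
        return retry_after
    return 0.0 if attempt == 1 else float(2 ** (attempt - 1))


def retry_transient(
    operation: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`; on failure, retry up to `max_retries` times."""
    try:
        return operation()
    except Exception as exc:
        last_error = exc
    for attempt in range(1, max_retries + 1):
        sleep(backoff_delay(attempt))
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    # Every retry failed: hand the last error back so the caller can
    # re-evaluate the category.
    raise last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in normal use the default `time.sleep` applies.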
Rate limit (429) special rule:
If the response includes a Retry-After header, wait that duration before retrying.
Error received
      |
Classify the error
      |
+-------------------------------------+
| Transient? -> Wait & Retry (max 3)  |
| Config?    -> Fix & Continue        |
| Logic?     -> Alternative approach  |
| Permanent? -> Escalation            |
+-------------------------------------+
      |
Every strategy fails -> Escalation
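The flow above can be sketched as a lookup from category to strategy, with escalation as the fallback. The strategy names and the `recover` function are assumptions for illustration. The handlers are hypothetical callables supplied by the caller, and an "escalate" handler is assumed to always be present.

```python
from typing import Callable, Dict, List

# Strategy names mirror the flowchart; the handler callables are hypothetical.
STRATEGY_BY_CATEGORY: Dict[str, str] = {
    "Transient": "wait_and_retry",
    "Config": "fix_and_continue",
    "Logic": "alternative_approach",
    "Permanent": "escalate",
}


def recover(category: str, handlers: Dict[str, Callable[[], bool]]) -> List[str]:
    """Try the strategy for `category`; escalate if it fails or is missing.

    Each handler returns True when the step recovered. The returned list
    records which strategies ran, in order.
    """
    tried: List[str] = []
    strategy = STRATEGY_BY_CATEGORY.get(category, "escalate")
    if strategy != "escalate":
        tried.append(strategy)
        handler = handlers.get(strategy)
        if handler is not None and handler():
            return tried
    # Permanent errors, unknown categories, and failed strategies all end here.
    tried.append("escalate")
    handlers["escalate"]()
    return tried
```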
Escalate to the user when the error is permanent or every recovery strategy has failed. Use this format:
ERROR ESCALATION
================================
Failed step : [step name]
Error : [error message summary]
Category : [Transient / Config / Logic / Permanent]
Tried : [what was attempted — short list]
Result : All strategies exhausted
================================
Options:
A) [Alternative approach suggestion]
B) [Simpler / partial solution]
C) Skip this step, continue
D) Stop the task
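One way to fill in this template programmatically is sketched below. `format_escalation` and its parameters are hypothetical names chosen for the example; the field labels come from the template above.

```python
from string import ascii_uppercase
from typing import Sequence

RULE = "=" * 32


def format_escalation(
    step: str,
    error: str,
    category: str,
    tried: Sequence[str],
    options: Sequence[str],
) -> str:
    """Render the escalation report as plain text, one field per line."""
    lines = [
        "ERROR ESCALATION",
        RULE,
        f"Failed step : {step}",
        f"Error : {error}",
        f"Category : {category}",
        f"Tried : {', '.join(tried)}",
        "Result : All strategies exhausted",
        RULE,
        "Options:",
    ]
    # Label options A), B), C), ... in the order given.
    lines += [f"{letter}) {text}" for letter, text in zip(ascii_uppercase, options)]
    return "\n".join(lines)
```

Keeping the options as a caller-supplied list lets the agent offer only the choices that make sense for the failed step.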
For bulk operations where some items succeed and some fail:
PARTIAL SUCCESS
================================
Successful : N / Total
Failed : M items
================================
Failed items:
- [item]: [reason]
Options:
A) Retry only failed items
B) Continue with successful items, skip failed
C) Cancel all
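A rough sketch of how a bulk operation might collect failures instead of stopping at the first one, then produce the summary above. `run_bulk` and `format_partial_success` are illustrative names, and items are assumed to be strings.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_bulk(
    items: Iterable[str], action: Callable[[str], None]
) -> Tuple[List[str], Dict[str, str]]:
    """Apply `action` to every item, collecting failures instead of stopping."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for item in items:
        try:
            action(item)
            succeeded.append(item)
        except Exception as exc:  # keep going; the reason is reported later
            failed[item] = str(exc)
    return succeeded, failed


def format_partial_success(succeeded: List[str], failed: Dict[str, str]) -> str:
    """Render the partial-success summary with one line per failed item."""
    rule = "=" * 32
    total = len(succeeded) + len(failed)
    lines = [
        "PARTIAL SUCCESS",
        rule,
        f"Successful : {len(succeeded)} / {total}",
        f"Failed : {len(failed)} items",
        rule,
        "Failed items:",
    ]
    lines += [f"- {item}: {reason}" for item, reason in failed.items()]
    return "\n".join(lines)
```

Option A (retry only failed items) then amounts to calling `run_bulk(list(failed), action)` on the failure dictionary's keys.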
Log every error and recovery attempt:
[ERROR LOG]
Step : [step name / number]
Error : [message]
Category : [type]
Attempt 1: [strategy] -> [result]
Attempt 2: [strategy] -> [result]
Result : Recovered / Escalated
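The log format above maps naturally onto a small record type. The following is a sketch under that assumption; `ErrorLogEntry` and its methods are hypothetical and not defined by the skill.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorLogEntry:
    step: str
    error: str
    category: str
    # Each attempt is a (strategy, result) pair, kept in the order tried.
    attempts: List[Tuple[str, str]] = field(default_factory=list)
    recovered: bool = False

    def record(self, strategy: str, result: str) -> None:
        """Append one recovery attempt to the entry."""
        self.attempts.append((strategy, result))

    def render(self) -> str:
        """Format the entry using the field labels from the log template."""
        lines = [
            "[ERROR LOG]",
            f"Step : {self.step}",
            f"Error : {self.error}",
            f"Category : {self.category}",
        ]
        for number, (strategy, result) in enumerate(self.attempts, start=1):
            lines.append(f"Attempt {number}: {strategy} -> {result}")
        lines.append(f"Result : {'Recovered' if self.recovered else 'Escalated'}")
        return "\n".join(lines)
```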
Related skills: checkpoint-guardian (risk assessment before retry), memory-ledger (logs errors and fixes), and agent-reviewer (retrospective analysis).