
lubu-labs/langgraph-error-handling

Name: langgraph-error-handling
Author: lubu-labs

skills/langgraph-error-handling/SKILL.md

npx skillsauth add lubu-labs/langchain-agent-skills langgraph-error-handling


LangGraph Error Handling

Use This Skill For

Adding RetryPolicy to flaky nodes (API, DB, model/tool calls)
Designing LLM recovery loops (Command + error state + retry counters)
Adding human approval/escalation with interrupt() and resume
Handling prebuilt ToolNode failures
Debugging transactional failure behavior in parallel supersteps

Strategy Selection

Use this order:

Transient/infrastructure issue (429, timeout, 5xx, temporary DB lock) -> RetryPolicy
Recoverable by model/tool args correction -> store error in state and route back with Command
Needs user approval or missing info -> interrupt() + resume
Unknown/programming bug -> let it bubble up and debug

| Error Type | Owner | Primary Mechanism |
|---|---|---|
| Transient | System | RetryPolicy |
| LLM-recoverable | LLM | State update + Command(goto=...) |
| User-fixable | Human | interrupt() + Command(resume=...) |
| Unexpected | Developer | Raise/log/debug |
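The strategy order above can be expressed as a small routing function. This is a hypothetical sketch, not the skill's `scripts/classify_error.py`: the `Strategy` enum, the `classify` function, the marker exceptions `ToolArgumentError` and `NeedsHumanInput`, and the exception-to-bucket mapping are all assumptions chosen for illustration.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    RETRY = "RetryPolicy"               # transient: the system retries
    LLM_RECOVERY = "Command(goto=...)"  # the model can correct its own args
    HUMAN = "interrupt()"               # needs approval or missing info
    RAISE = "raise/log/debug"           # unexpected: surface to the developer


class ToolArgumentError(ValueError):
    """Hypothetical marker for bad tool arguments produced by the model."""


class NeedsHumanInput(Exception):
    """Hypothetical marker for errors only a user can resolve."""


def classify(exc: Exception, status_code: Optional[int] = None) -> Strategy:
    """Apply the strategy order: transient first, unknown bugs last."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return Strategy.RETRY
    if status_code is not None and (status_code == 429 or 500 <= status_code < 600):
        return Strategy.RETRY
    if isinstance(exc, ToolArgumentError):
        return Strategy.LLM_RECOVERY
    if isinstance(exc, NeedsHumanInput):
        return Strategy.HUMAN
    return Strategy.RAISE  # let unknown errors bubble up
```

A plain `ValueError` deliberately falls through to `RAISE`: only errors explicitly marked as model-fixable enter the recovery loop.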

For full taxonomy, load references/error-types.md.

Minimal Patterns

1) Retry Transient Failures

from langgraph.types import RetryPolicy

builder.add_node(
    "call_api",
    call_api,
    retry_policy=RetryPolicy(max_attempts=3, initial_interval=1.0),
)

builder.addNode("callApi", callApi, {
  // JS intervals are in milliseconds (Python's initial_interval is in seconds)
  retryPolicy: { maxAttempts: 3, initialInterval: 1000 },
});
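RetryPolicy waits between attempts with exponential backoff. The sketch below computes that delay schedule assuming the formula `min(max_interval, initial_interval * backoff_factor ** (n - 1))` for the n-th retry; the formula, the `backoff_factor=2.0` and `max_interval=128.0` values, and the `backoff_delays` helper are assumptions for illustration. LangGraph's Python RetryPolicy also exposes a `jitter` option, which this sketch omits. Intervals here are in seconds, as for Python's `initial_interval`.

```python
def backoff_delays(
    max_attempts: int = 3,
    initial_interval: float = 1.0,
    backoff_factor: float = 2.0,
    max_interval: float = 128.0,
) -> list:
    """Seconds to wait before each retry (no jitter). max_attempts counts the first call."""
    delays = []
    for retry in range(1, max_attempts):  # one delay per retry, none before the first call
        delays.append(min(max_interval, initial_interval * backoff_factor ** (retry - 1)))
    return delays


print(backoff_delays())  # [1.0, 2.0]
```

With `max_attempts=3` the node runs at most three times (the first call plus two retries), so only two delays appear.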

Notes:

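Python and JS use different default retry filters (for example, Python's default skips common programming errors such as ValueError and TypeError), so check the defaults before relying on them.
Pass an explicit retry_on (Python) / retryOn (JS) filter so that only genuinely transient errors are retried.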
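A targeted filter is just a predicate over the exception. The sketch below is hypothetical: `is_transient` retries only timeouts, connection errors, and errors carrying an HTTP 429 or 5xx status read from an assumed `status_code` attribute (real HTTP client exceptions expose the status in their own way), and `HTTPError` is a stand-in class used only to exercise it.

```python
def is_transient(exc: Exception) -> bool:
    """Return True only for errors worth retrying; everything else fails fast."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    # Assumption: the exception carries a numeric `status_code` attribute.
    status = getattr(exc, "status_code", None)
    if isinstance(status, int):
        return status == 429 or 500 <= status < 600
    return False


class HTTPError(Exception):
    """Hypothetical HTTP error used to exercise the predicate."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
```

In Python, a predicate like this would be passed as `RetryPolicy(max_attempts=3, retry_on=is_transient)`; a 404 or a `ValueError` then fails on the first attempt instead of being retried.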

2) LLM Recovery Loop

Use MessagesState in Python for message state.

from typing import Literal
from typing_extensions import NotRequired
from langgraph.graph import MessagesState
from langgraph.types import Command

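class State(MessagesState):
    error: NotRequired[str]        # last error written by a failing node
    retry_count: NotRequired[int]  # recovery attempts so far

def agent(state: State) -> Command[Literal["tool", "__end__"]]:
    retries = state.get("retry_count", 0)
    if retries >= 3:
        # Terminal branch: stop instead of looping forever.
        return Command(goto="__end__")
    if state.get("error"):
        # Recoverable failure: count the attempt, clear the error, and route back.
        # A real agent would first re-prompt the model with state["error"].
        return Command(update={"retry_count": retries + 1, "error": ""}, goto="tool")
    return Command(goto="tool")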

import { StateGraph, Command, END } from "@langchain/langgraph";

// If a node returns Command in JS, add `ends` on addNode.
builder.addNode("agent", agentNode, { ends: ["tool", END] });
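To see why the retry counter matters, here is a framework-free simulation of the loop: a tool that rejects bad arguments, an "agent" step that corrects them from the stored error, and a hard cap on attempts. Everything here (`run_recovery_loop`, `lookup`, `fix`, the dict-based state) is hypothetical and only stands in for real LangGraph nodes.

```python
from typing import Callable


def run_recovery_loop(
    tool: Callable[[dict], str],
    fix_args: Callable[[dict, str], dict],
    args: dict,
    max_retries: int = 3,
) -> dict:
    """Mimic agent -> tool -> agent routing with error state and a retry counter."""
    state = {"args": args, "error": None, "retry_count": 0, "result": None}
    while True:
        try:
            state["result"] = tool(state["args"])  # "tool" node
            state["error"] = None
            return state  # success: route to END
        except ValueError as exc:
            state["error"] = str(exc)  # store the error in state instead of raising
        if state["retry_count"] >= max_retries:
            return state  # terminal branch: give up
        # "agent" node: use the stored error to correct the arguments
        state["args"] = fix_args(state["args"], state["error"])
        state["retry_count"] += 1


def lookup(args: dict) -> str:
    if not isinstance(args.get("year"), int):
        raise ValueError("year must be an integer")
    return f"report for {args['year']}"


def fix(args: dict, error: str) -> dict:
    # A model would read `error`; here we simply coerce the field.
    return {**args, "year": int(args["year"])}


print(run_recovery_loop(lookup, fix, {"year": "2024"})["result"])  # report for 2024
```

Without the `max_retries` check, a tool that always fails would spin forever; with it, the loop ends with the last error still recorded in state.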

3) Human-In-The-Loop Escalation

from langgraph.types import interrupt, Command

def human_review(state):
    approved = interrupt({
        "question": "Proceed?",
        "payload": state["pending_action"],
    })
    return Command(goto="execute" if approved else "cancel")

# resume
graph.invoke(Command(resume=True), config={"configurable": {"thread_id": "t-1"}})

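import { Command, interrupt } from "@langchain/langgraph";

// interrupt() must be called inside a node, not at module level.
const humanReview = (_state) => {
  const approved = interrupt({ question: "Proceed?" });
  return new Command({ goto: approved ? "execute" : "cancel" });
};

// later: resume on the same thread
await graph.invoke(new Command({ resume: true }), {
  configurable: { thread_id: "t-1" },
});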
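On resume, the interrupted node re-executes from its first line, and this time `interrupt()` returns the resume value instead of pausing. The toy below makes that visible without LangGraph: `Interrupted`, `make_interrupt`, and the `side_effects` log are hypothetical stand-ins, not LangGraph's real pause mechanism.

```python
class Interrupted(Exception):
    """Toy stand-in for the pause signal."""

    def __init__(self, payload):
        super().__init__(payload)
        self.payload = payload


def make_interrupt(resume_values: list):
    """Toy interrupt(): pauses when no resume value exists, otherwise returns it."""
    def interrupt(payload):
        if resume_values:
            return resume_values[0]
        raise Interrupted(payload)
    return interrupt


side_effects = []


def human_review(state: dict, interrupt) -> str:
    side_effects.append("notify reviewer")  # runs on BOTH passes: must be idempotent
    approved = interrupt({"question": "Proceed?", "payload": state["pending_action"]})
    return "execute" if approved else "cancel"


state = {"pending_action": "delete 3 rows"}
resume = []
try:
    human_review(state, make_interrupt(resume))  # first run pauses
except Interrupted as pause:
    print("paused with", pause.payload["question"])  # paused with Proceed?
resume.append(True)  # analogous to invoke(Command(resume=True)) on the same thread_id
print(human_review(state, make_interrupt(resume)))  # execute
print(side_effects)  # ['notify reviewer', 'notify reviewer']
```

The duplicated log entry is the point: anything placed before `interrupt()` runs again on resume, so make it idempotent or move it after the call or into a separate node.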

Requirements:

Compile with a checkpointer for interrupt flows.
Reuse the same thread_id on resume.

For deep HITL patterns, load references/human-escalation.md.

ToolNode Error Handling

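from langgraph.prebuilt import ToolNode

# Choose ONE of these configurations:
# catch every tool exception and return an error ToolMessage to the model
tool_node = ToolNode(tools, handle_tool_errors=True)
# catch every tool exception but return this fixed text instead
tool_node = ToolNode(tools, handle_tool_errors="Please try again.")
# catch only these exception types; any other exception propagates
tool_node = ToolNode(tools, handle_tool_errors=(ValueError, TypeError))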

Pass a callable as handle_tool_errors when you need deterministic, model-friendly error messages for recovery. For broader tool-recovery design, load references/llm-recovery.md.
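`handle_tool_errors` also accepts a callable that turns the exception into the text sent back to the model. A minimal sketch, assuming a hypothetical `format_tool_error` handler whose wording is illustrative; ToolNode is understood to read the handler's parameter annotation to decide which exceptions it catches, so annotating it with `Exception` is intended to catch everything (verify against your installed version).

```python
def format_tool_error(e: Exception) -> str:
    """Deterministic, model-friendly error text for a failed tool call."""
    kind = type(e).__name__
    if isinstance(e, (ValueError, TypeError)):
        hint = "Check the argument names and types against the tool schema, then call the tool again."
    elif isinstance(e, TimeoutError):
        hint = "The tool timed out; retry once with the same arguments."
    else:
        hint = "Do not retry; report the failure to the user."
    return f"Tool error ({kind}): {e}. {hint}"
```

It would be wired in as `ToolNode(tools, handle_tool_errors=format_tool_error)`. Because the message always names the exception type and a next step, the model gets the same recovery instruction for the same failure.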

Critical Behavior (Do Not Skip)

Supersteps are transactional: one failing parallel branch fails the whole superstep state update.
RetryPolicy retries failing branches, not successful siblings.
On resume, the node that called interrupt() re-runs from the start: side effects before the interrupt() call must be idempotent, or be moved after it or into a separate node.
JS Command routing requires ends metadata on addNode(...).
Use explicit retry limits (max_attempts, plus state counters for recovery loops).
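The transactional-superstep and per-branch-retry rules above can be illustrated without LangGraph: run every branch of a step, retry each failing branch on its own, and commit the collected updates only if every branch eventually succeeds. `run_superstep` and `flaky` are a hypothetical toy model of that behavior, not LangGraph's scheduler.

```python
from typing import Callable


def run_superstep(state: dict, branches: dict, max_attempts: int = 1) -> dict:
    """Run parallel branches; commit all updates or none (toy model)."""
    pending = {}
    for name, branch in branches.items():
        for attempt in range(1, max_attempts + 1):
            try:
                pending[name] = branch(state)  # each branch is retried on its own
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # one branch exhausted its retries: nothing is applied
    new_state = dict(state)  # commit only after every branch succeeded
    for update in pending.values():
        new_state.update(update)
    return new_state


def flaky(fails: int, update: dict, calls: list) -> Callable[[dict], dict]:
    """Hypothetical branch that raises TimeoutError `fails` times, then succeeds."""
    def branch(_state: dict) -> dict:
        calls.append(update)
        if len(calls) <= fails:
            raise TimeoutError("transient")
        return update
    return branch


state = {"x": 1}
try:
    run_superstep(state, {"a": flaky(0, {"a": 1}, []), "b": flaky(5, {"b": 2}, [])})
except TimeoutError:
    print("step failed; state unchanged:", state)  # step failed; state unchanged: {'x': 1}

a_calls, b_calls = [], []
print(run_superstep(state, {"a": flaky(0, {"a": 1}, a_calls), "b": flaky(1, {"b": 2}, b_calls)}, max_attempts=3))
# {'x': 1, 'a': 1, 'b': 2}
print(len(a_calls), len(b_calls))  # 1 2
```

Branch `a` runs once while `b` is retried, and when `b` never recovers, `a`'s successful update is not applied either.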

Local Assets In This Skill

Scripts

scripts/classify_error.py: classify exception category and recommended handling
scripts/wrap_with_retry.py: generate boilerplate node wrappers with retry/recovery/escalation options

Run from repo root:

uv run skills/langgraph-error-handling/scripts/classify_error.py TimeoutError --verbose
uv run skills/langgraph-error-handling/scripts/wrap_with_retry.py call_llm --with-llm-recovery

Examples

assets/examples/retry-example/: retry + recovery loop (Python and JS)
assets/examples/human-loop-example/: interrupt/resume approval flow (Python and JS)

Load References On Demand

references/error-types.md: error taxonomy and classification rules
references/retry-strategies.md: retry tuning, backoff, circuit-breaker-style patterns
references/llm-recovery.md: recovery-loop and ToolNode strategies
references/human-escalation.md: human approval, interrupts, and escalation patterns

Common Failure Modes

| Symptom | Root Cause | Fix |
|---|---|---|
| interrupt() fails at runtime | No checkpointer | Compile with a checkpointer |
| Resume starts a new run | Different thread_id | Reuse the same thread_id |
| JS Command route not taken | Missing ends | Add ends to addNode |
| Infinite loop | No termination counter/condition | Add a retry counter + terminal branch |
| Retry never triggers | Exception excluded by the retry filter | Set an explicit retry_on/retryOn |


Implement LangGraph error handling with current v1 patterns. Use when users need to classify failures, add RetryPolicy for transient issues, build LLM recovery loops with Command routing, add human-in-the-loop with interrupt()/resume, handle ToolNode errors, or choose a safe strategy between retry, recovery, and escalation.

88 stars

tools

Updated Apr 6, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

| Scanner | Description | Status | Score |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 6, 2026, 8:43 PM (108.1 s, 15 files scanned)



Install manually


# Clone the repo
git clone https://github.com/lubu-labs/langchain-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r langchain-agent-skills/skills/langgraph-error-handling ~/.claude/skills/


Repository

lubu-labs/langchain-agent-skills

88 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT