curiositech/dag-replay-debugger

Name: dag-replay-debugger
Author: curiositech

skills/dag-replay-debugger/SKILL.md

npx skillsauth add curiositech/windags-skills dag-replay-debugger

DAG Replay Debugger

Time-travel debugging for DAG executions. Inspect any node's full state (inputs, prompt, output, reasoning), replay from any checkpoint with modifications, and compare execution traces.

When to Use

✅ Use for:

Post-mortem analysis of failed or low-quality DAG executions
Inspecting exactly what a node received and produced
Replaying from a checkpoint with modified inputs or skills
Comparing two execution traces to find where they diverged
Understanding WHY a node made a specific decision

❌ NOT for:

Live monitoring of running DAGs (use websocket-streaming)
Automated failure recovery (use dag-mutation-strategist)
Profiling cost/performance (use dag-ops)

Core Capabilities

1. State Inspection

At any node in a completed execution, view:

┌──────────────────────────────────────────────────────┐
│  Node: analyze-codebase (Wave 2)                     │
│  Status: completed ✓  Duration: 4.2s  Cost: $0.028  │
│                                                      │
│  Model: claude-sonnet-4-5                            │
│  Skills loaded: code-review-skill, react-server-...  │
│                                                      │
│  ▸ System Prompt (3,421 tokens)         [Expand]     │
│  ▸ User Message (1,205 tokens)          [Expand]     │
│  ▸ Input from upstream nodes            [Expand]     │
│  ▸ Full output (1,847 tokens)           [Expand]     │
│  ▸ Evaluator scores                     [Expand]     │
│    Self: 0.85  Peer: 0.78  Downstream: accepted      │
│  ▸ Context Store entries used           [Expand]     │
│                                                      │
│  [Replay from here]  [Edit & Replay]  [Compare]      │
└──────────────────────────────────────────────────────┘
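One way to picture what the inspection panel holds is a per-node record. The sketch below is an assumption, not the skill's actual schema: the `NodeState` class and every field name in it are hypothetical, chosen only to mirror the panel above.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical per-node trace record mirroring the inspection panel."""
    node_id: str
    wave: int
    status: str
    duration_s: float
    cost_usd: float
    model: str
    skills: list[str] = field(default_factory=list)
    system_prompt: str = ""
    user_message: str = ""
    upstream_inputs: dict[str, str] = field(default_factory=dict)
    output: str = ""
    evaluator_scores: dict[str, float] = field(default_factory=dict)

    def summary(self) -> str:
        """Build a one-line header similar to the top of the panel."""
        return (f"{self.node_id} (Wave {self.wave}) | {self.status} | "
                f"{self.duration_s:.1f}s | ${self.cost_usd:.3f}")


# Values copied from the example panel; the skill list is shortened
# because the panel itself truncates it.
state = NodeState(
    node_id="analyze-codebase", wave=2, status="completed",
    duration_s=4.2, cost_usd=0.028, model="claude-sonnet-4-5",
    skills=["code-review-skill"],
    evaluator_scores={"self": 0.85, "peer": 0.78},
)
print(state.summary())
```

Keeping prompts, inputs, and outputs on the same record as timing and cost is what makes "inspect exactly what a node received and produced" a single lookup.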

2. Replay from Checkpoint

Pick any completed node and re-execute from that point forward:

Same inputs: Useful for debugging non-deterministic behavior (did the model just get unlucky?)
Modified inputs: Edit the upstream output, then replay to see if downstream behaves differently
Modified skills: Swap in a different skill version, then replay to compare output quality
Modified model: Try the same node with Haiku vs. Sonnet to validate routing decisions

flowchart LR
  A[Select checkpoint node] --> B{Modification?}
  B -->|None| C[Replay from here, same inputs]
  B -->|Edit input| D[Modify upstream output]
  B -->|Swap skill| E[Change skill assignment]
  B -->|Change model| F[Change model tier]
  D --> G[Re-execute node + downstream]
  E --> G
  F --> G
  C --> G
  G --> H[Compare with original execution]
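The "re-execute node + downstream" step can be sketched as a reachability query over the DAG. This is a minimal illustration under stated assumptions: `replay_plan`, the edge map, and the node names are all hypothetical, and the original execution order is assumed to be available from the saved trace.

```python
def replay_plan(order: list[str], edges: dict[str, list[str]],
                checkpoint: str) -> list[str]:
    """Return the checkpoint node and everything downstream of it.

    `order` is the original execution order (a valid topological order).
    `edges` maps each node to the nodes that consume its output.
    Nodes not returned can reuse their recorded outputs.
    """
    affected = {checkpoint}
    stack = [checkpoint]
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    # Filter the original order so the replay stays topologically valid.
    return [node for node in order if node in affected]


# Hypothetical five-node DAG.
order = ["fetch", "analyze", "lint", "plan", "report"]
edges = {
    "fetch": ["analyze", "lint"],
    "analyze": ["plan"],
    "lint": ["report"],
    "plan": ["report"],
}
print(replay_plan(order, edges, "analyze"))
```

Here `fetch` and `lint` are not downstream of `analyze`, so a replay from that checkpoint would leave their recorded outputs untouched.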

3. Execution Diff

Compare two traces side-by-side:

Original Run (2026-02-05 14:32)      │  Replay Run (2026-02-05 14:45)
────────────────────────────────────  │  ──────────────────────────────
Node: analyze-codebase               │  Node: analyze-codebase
Model: sonnet-4.5                    │  Model: haiku-4.5 ← CHANGED
Output: 3 recommendations            │  Output: 2 recommendations ← DIFF
  1. Extract auth module ✓            │    1. Extract auth module ✓
  2. Add error boundaries ✓           │    2. Add error boundaries ✓
  3. Migrate to React Query ✗         │    [missing] ← DIFF
Downstream accepted: yes             │  Downstream accepted: no ← DIFF
Cost: $0.028                         │  Cost: $0.001 ← 96% cheaper

This tells you: Haiku saved money but missed recommendation #3, which caused downstream rejection. The routing decision (Sonnet for this node) was correct.
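The comparison above can be approximated with a field-by-field diff of two node records. The record layout and the helper names `diff_runs` and `cost_saving_pct` are assumptions made for illustration; only the values come from the example.

```python
def diff_runs(original: dict, replay: dict) -> dict:
    """Return {field: (before, after)} for every field that differs."""
    changed = {}
    for key in sorted(original.keys() | replay.keys()):
        before, after = original.get(key), replay.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


def cost_saving_pct(original_cost: float, replay_cost: float) -> float:
    """Relative saving of the replay run, as a percentage."""
    return (original_cost - replay_cost) / original_cost * 100


original = {
    "model": "sonnet-4.5",
    "recommendations": ["Extract auth module", "Add error boundaries",
                        "Migrate to React Query"],
    "downstream_accepted": True,
    "cost_usd": 0.028,
}
replay = {
    "model": "haiku-4.5",
    "recommendations": ["Extract auth module", "Add error boundaries"],
    "downstream_accepted": False,
    "cost_usd": 0.001,
}

changes = diff_runs(original, replay)
missing = [r for r in original["recommendations"]
           if r not in replay["recommendations"]]
saving = cost_saving_pct(original["cost_usd"], replay["cost_usd"])

print(sorted(changes))
print(missing)
print(f"{saving:.0f}% cheaper")
```

The saving works out to (0.028 - 0.001) / 0.028, roughly 96%, which matches the figure in the diff.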

4. Reasoning Trace

For models with extended thinking, inspect the thinking tokens:

[thinking]
The user asked me to analyze this codebase for refactoring opportunities.
Looking at src/auth.ts — it's 450 lines with mixed concerns (auth + validation + session).
This violates single-responsibility. I should recommend extracting...
[/thinking]

This exposes WHY the agent made its decisions, not just what it decided.

Debugging Workflow

flowchart TD
  P[Problem: DAG produced bad output] --> I[Identify which node's output is wrong]
  I --> S[Inspect that node's full state]
  S --> Q{Is the input good?}
  Q -->|Bad input| U[Trace upstream: which node produced bad input?]
  U --> S
  Q -->|Good input, bad output| R{Is the skill appropriate?}
  R -->|Wrong skill| SK[Try different skill via Edit & Replay]
  R -->|Right skill, bad reasoning| M{Model too weak?}
  M -->|Yes| MU[Try stronger model via Edit & Replay]
  M -->|No| PR[Examine prompt: is the skill's process clear enough?]
  PR --> FIX[Improve the skill and re-run]
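The workflow reads as two small routines: walk upstream until a node's inputs are good, then decide what to change at that node. The sketch below assumes a human (or evaluator) has already judged each node's output; `find_faulty_node`, `next_step`, and the node names are hypothetical.

```python
def find_faulty_node(start: str, upstream: dict[str, list[str]],
                     output_ok: dict[str, bool]) -> str:
    """Follow bad inputs upstream until reaching a node whose inputs are good.

    `upstream` maps each node to the nodes it reads from. The loop ends
    because a DAG has no cycles.
    """
    node = start
    while True:
        bad_parents = [p for p in upstream.get(node, []) if not output_ok[p]]
        if not bad_parents:
            return node  # good input, bad output: the fault starts here
        node = bad_parents[0]


def next_step(skill_ok: bool, model_too_weak: bool) -> str:
    """Pick the next experiment for a node with good input and bad output."""
    if not skill_ok:
        return "try a different skill via Edit & Replay"
    if model_too_weak:
        return "try a stronger model via Edit & Replay"
    return "examine the prompt, improve the skill, and re-run"


# Hypothetical linear chain where the bad output first appears at "analyze".
upstream = {"report": ["plan"], "plan": ["analyze"],
            "analyze": ["fetch"], "fetch": []}
output_ok = {"fetch": True, "analyze": False, "plan": False, "report": False}

culprit = find_faulty_node("report", upstream, output_ok)
print(culprit, "->", next_step(skill_ok=True, model_too_weak=True))
```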

Anti-Patterns

Debugging Without Traces

Wrong: Trying to figure out what went wrong without execution traces.
Right: Every DAG execution should save full traces (input, prompt, output, timing, cost per node). Debug from data, not guesses.
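Saving traces does not need much machinery. As a rough sketch, assuming one JSON Lines file per run, each node could append its record as it finishes; the file name, the field names, and the `fetch-repo` node are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_trace(path: Path, record: dict) -> None:
    """Append one node's trace record as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_traces(path: Path) -> list[dict]:
    """Read every trace record back, in the order it was written."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    trace_file = Path(tmp) / "run-001.jsonl"
    append_trace(trace_file, {
        "node": "fetch-repo", "input": "repo URL", "prompt": "...",
        "output": "file list", "duration_s": 1.1, "cost_usd": 0.002,
    })
    append_trace(trace_file, {
        "node": "analyze-codebase", "input": "file list", "prompt": "...",
        "output": "3 recommendations", "duration_s": 4.2, "cost_usd": 0.028,
    })
    traces = load_traces(trace_file)

print([t["node"] for t in traces])
```

Appending one line per node keeps earlier records intact even if a later node crashes mid-run.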

Replaying the Whole DAG

Wrong: Re-running the entire 10-node DAG to test a fix to Node 7.
Right: Replay from Node 7's checkpoint. Nodes 1-6 were fine, so don't re-execute them.

Ignoring the Reasoning Trace

Wrong: Only looking at inputs and outputs, not the thinking process.
Right: If extended thinking is available, inspect it. The reasoning trace often reveals the exact moment the agent went wrong.

development

Updated Apr 4, 2026

npx skillsauth add curiositech/windags-skills dag-replay-debugger

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean | Trivy | Container and dependency vulnerability scanner | 95%
Clean | Semgrep | Static code analysis for vulnerabilities | 95%
Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95%
Skipped | Snyk (dep) | Open source security scanning | 50%
Skipped | Socket.dev | Supply chain security analysis | 50%
Skipped | VirusTotal | Multi-engine malware detection | 50%
Skipped | CrowdStrike | Advanced threat intelligence | 50%
Skipped | OSV-Scanner | Open Source Vulnerability database check | 50%
Skipped | OWASP Dep-Check | | 50%

Last scanned: Apr 4, 2026, 2:06 PM · 39.2s · 1 file scanned

SKILL.md

license: BSL-1.1
name: dag-replay-debugger
description: Time-travel debugging for DAG executions. Inspect agent state at any node, replay decisions with modified inputs, compare execution traces side-by-side, and identify where reasoning diverged. Inspired by LangGraph Studio's state-editing model and Temporal's event history. Activate on "debug DAG", "replay execution", "time travel debug", "inspect node state", "what went wrong at step", "compare runs", "execution diff". NOT for live monitoring (use dag-runtime + websocket-streaming), failure analysis (use dag-ops), or general code debugging.
allowed-tools: Read,Grep,Glob
category: Agent & Orchestration

Install manually

# Clone the repo
git clone https://github.com/curiositech/windags-skills.git

# Copy into Claude Code skills folder (global)
cp -r windags-skills/skills/dag-replay-debugger ~/.claude/skills/

Repository

curiositech/windags-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT