skills/websocket-streaming/SKILL.md
--- --- name: websocket-streaming license: Apache-2.0 description: Implements real-time bidirectional communication between DAG execution engines and visualization dashboards via WebSocket. Covers connection management, typed event protocols, reconnection with backoff, and React hook integration. Activate on "WebSocket", "real-time updates", "live streaming", "execution events", "state streaming", "push notifications". NOT for HTTP REST APIs, server-sent events (SSE), or general networking. allo
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add curiositech/windags-skills skills/websocket-streaming
Real-time bidirectional communication between DAG execution engines and dashboards via typed event protocols and connection state management.
CONNECTING → OPEN → CLOSING → CLOSED
    ↓         ↓        ↓        ↓
   wait    process   queue   reconnect
If connection === 'CONNECTING':
- Buffer outgoing messages in memory queue
- Show "connecting..." indicator
- Timeout after 10s → retry with backoff
If connection === 'OPEN':
- Send queued messages immediately
- Process incoming events via typed dispatch
- Send heartbeat every 30s
If connection === 'CLOSING':
- Stop sending new messages
- Finish processing in-flight events
- Prepare for reconnection
If connection === 'CLOSED':
- Calculate backoff: min(1000 * 2^attempt, 30000)ms
- Increment reconnect attempt counter
- Trigger reconnection after delay
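The CLOSED branch above can be sketched as a small helper. This is a minimal illustration, not the skill's actual implementation: `createReconnector`, its injected `connect` and `schedule` callbacks, and the reset of the attempt counter on open are all assumptions made for the example. Only the backoff formula comes from the text.

```javascript
// Backoff from the CLOSED branch: min(1000 * 2^attempt, 30000) ms.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Hypothetical helper that tracks connection state and reconnect attempts.
// `connect` and `schedule` are injected so the logic runs without a real socket.
function createReconnector(connect, schedule = setTimeout) {
  let attempt = 0;
  let state = 'CLOSED';
  return {
    get state() { return state; },
    get attempt() { return attempt; },
    // Assumption: a successful open resets the attempt counter.
    onOpen() { state = 'OPEN'; attempt = 0; },
    onClosing() { state = 'CLOSING'; },
    onClose() {
      state = 'CLOSED';
      const delay = backoffDelayMs(attempt); // calculate backoff first
      attempt += 1;                          // then increment the counter
      schedule(() => { state = 'CONNECTING'; connect(); }, delay);
      return delay;
    },
  };
}
```

With the default parameters the delays run 1s, 2s, 4s, 8s, 16s and then stay capped at 30s.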
If event.type === 'node_state':
- Update store.nodes[event.node_id].status
- Trigger UI re-render for affected node
- Log metrics if present
If event.type === 'cost_update':
- Update budget display in header
- Show warning if remaining < 20%
- Log cost trajectory for analytics
If event.type === 'human_gate_waiting':
- Show modal with gate presentation
- Enable approval/rejection buttons
- Start timeout countdown (default 5min)
If event.type === 'error':
- Show error toast notification
- Highlight affected node (if node_id present)
- Log to error tracking service
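One way to express the event routing above is a handler table keyed by `event.type`. The sketch below is illustrative only: the `store` shape, the `ui` callback names, and the payload fields `status`, `metrics`, `remaining`, `total`, `timeout_ms`, and `message` are assumptions. The source names only the event types and `node_id`.

```javascript
// Hypothetical dispatcher: `store` and `ui` are supplied by the caller.
function createDispatcher(store, ui) {
  const handlers = {
    node_state(event) {
      const node = store.nodes[event.node_id] || (store.nodes[event.node_id] = {});
      node.status = event.status;
      ui.rerenderNode(event.node_id);
      if (event.metrics) ui.logMetrics(event.node_id, event.metrics);
    },
    cost_update(event) {
      store.budget = { remaining: event.remaining, total: event.total };
      ui.updateBudget(store.budget);
      // Warn when less than 20% of the budget remains.
      if (event.remaining / event.total < 0.2) ui.showWarning('Budget below 20%');
    },
    human_gate_waiting(event) {
      // Default countdown of 5 minutes, as in the decision tree.
      ui.showGateModal(event, event.timeout_ms || 5 * 60 * 1000);
    },
    error(event) {
      ui.showToast(event.message);
      if (event.node_id) ui.highlightNode(event.node_id);
      ui.reportError(event);
    },
  };

  // Returns true when the event was handled, false for unknown types.
  return function dispatch(raw) {
    const event = JSON.parse(raw);
    if (!Object.prototype.hasOwnProperty.call(handlers, event.type)) return false;
    handlers[event.type](event);
    return true;
  };
}
```

In a browser this would typically be wired as `ws.onmessage = (e) => dispatch(e.data)`. Unknown event types are ignored rather than thrown, so a newer server does not break an older dashboard.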
If connectionState === 'OPEN' && bufferQueue.length === 0:
- Send message immediately via ws.send()
If connectionState !== 'OPEN' && message.priority === 'high':
- Add to front of bufferQueue
- Limit high-priority queue to 50 messages
If connectionState !== 'OPEN' && message.priority === 'normal':
- Add to back of bufferQueue
- Drop oldest if queue > 200 messages
If connectionState becomes 'OPEN' && bufferQueue.length > 0:
- Send all queued messages in order
- Clear buffer queue
- Resume normal operation
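The buffering rules above might look like the following outbox. It is a sketch under stated assumptions: `createOutbox` and its return values are hypothetical, a high-priority message beyond the 50-message limit is rejected (the text does not say what should happen), and "drop oldest" is read as dropping the oldest normal-priority message so queued high-priority messages survive.

```javascript
// Hypothetical outbox; `send` is whatever actually writes to the socket.
function createOutbox(send, { maxHigh = 50, maxTotal = 200 } = {}) {
  const queue = [];
  let open = false;

  function flush() {
    while (open && queue.length > 0) send(queue.shift());
  }

  return {
    queue,
    setOpen(isOpen) {
      open = isOpen;
      if (open) flush(); // send everything queued, in order, then resume
    },
    enqueue(message) {
      if (open) {
        flush();
        send(message);
        return 'sent';
      }
      if (message.priority === 'high') {
        const highCount = queue.filter((m) => m.priority === 'high').length;
        if (highCount >= maxHigh) return 'rejected'; // assumption: reject on overflow
        queue.unshift(message); // front of the queue
      } else {
        queue.push(message); // back of the queue
        if (queue.length > maxTotal) {
          const oldestNormal = queue.findIndex((m) => m.priority !== 'high');
          queue.splice(oldestNormal, 1);
        }
      }
      return 'queued';
    },
  };
}
```

Because high-priority messages go to the front with `unshift`, several of them queued during one outage are sent newest first. If their relative order matters, inserting after the last queued high-priority message would preserve it.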
Symptom: Rapid connect/disconnect cycles, exponentially increasing CPU usage
Detection: If reconnect attempts > 5 in 60 seconds
Fix: Implement exponential backoff with max delay (30s), circuit breaker after 10 failures
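The detection rule and circuit breaker for this failure mode could be tracked with a sliding window of failure timestamps. This is a minimal sketch: `createReconnectGuard` is a hypothetical name, and counting consecutive failures (reset on a successful connection) is an assumed reading of "circuit breaker after 10 failures".

```javascript
// Hypothetical guard: call recordFailure() on each failed reconnect attempt
// and recordSuccess() once a connection opens.
function createReconnectGuard({ windowMs = 60000, stormThreshold = 5, maxFailures = 10 } = {}) {
  const attempts = []; // timestamps of recent failed attempts
  let consecutiveFailures = 0;
  return {
    recordFailure(now = Date.now()) {
      attempts.push(now);
      // Keep only attempts inside the sliding window.
      while (attempts.length > 0 && now - attempts[0] >= windowMs) attempts.shift();
      consecutiveFailures += 1;
      return {
        storm: attempts.length > stormThreshold,          // > 5 attempts in 60 seconds
        circuitOpen: consecutiveFailures >= maxFailures,  // stop retrying after 10 failures
      };
    },
    recordSuccess() {
      consecutiveFailures = 0;
    },
  };
}
```

When `circuitOpen` is true the client would stop scheduling reconnects and surface a manual "retry" action instead.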
Symptom: Dashboard becomes unresponsive, memory usage spikes, UI freezes
Detection: If incoming message rate > 100/second or buffer queue > 1000 messages
Fix: Implement message throttling, batch state updates, drop non-critical events
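Batching and dropping under load can be sketched as below. The helper name `createEventBatcher` is hypothetical, and treating `error` and `human_gate_waiting` as the only critical types and coalescing `node_state` events to the latest one per node are assumptions chosen for illustration.

```javascript
// Hypothetical batcher: incoming events are collected with push() and applied
// together when flush() runs, for example from a timer.
function createEventBatcher(applyBatch, { maxPending = 1000 } = {}) {
  const isCritical = (e) => e.type === 'error' || e.type === 'human_gate_waiting';
  let pending = [];
  return {
    push(event) {
      // Under load, drop non-critical events instead of growing without bound.
      if (pending.length >= maxPending && !isCritical(event)) return false;
      pending.push(event);
      return true;
    },
    flush() {
      if (pending.length === 0) return 0;
      const batch = pending;
      pending = [];
      // Coalesce node_state events: only the latest status per node matters.
      const latestByNode = new Map();
      const others = [];
      for (const e of batch) {
        if (e.type === 'node_state') latestByNode.set(e.node_id, e);
        else others.push(e);
      }
      applyBatch([...latestByNode.values(), ...others]);
      return batch.length;
    },
  };
}
```

A dashboard might call `setInterval(() => batcher.flush(), 100)` so that the store updates at most ten times per second regardless of the incoming message rate.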
Symptom: Dashboard shows wrong node statuses, missing execution progress
Detection: If timestamp gap > 30s between disconnect and reconnect
Fix: Send 'resync_request' on reconnect, server responds with full current state
Symptom: Memory usage grows steadily, never decreases, eventual crash
Detection: If WebSocket reference count > 5 for single DAG, or buffer never clears
Fix: Properly close previous WebSocket before creating new one, clear event listeners
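A sketch of the fix for this leak: keep a single owner for the socket and release the old one before creating a new one. `createSocketManager` and the injected `createSocket` factory are hypothetical; in a browser the factory would be something like `(url) => new WebSocket(url)`.

```javascript
// Hypothetical manager that guarantees at most one live socket.
function createSocketManager(createSocket) {
  let current = null;

  function release() {
    if (!current) return;
    // Detach listeners first so the intentional close does not
    // trigger reconnect logic or keep closures alive.
    current.onopen = null;
    current.onmessage = null;
    current.onerror = null;
    current.onclose = null;
    current.close();
    current = null;
  }

  return {
    connect(url, handlers = {}) {
      release(); // close the previous socket before creating a new one
      current = createSocket(url);
      Object.assign(current, handlers); // e.g. { onopen, onmessage, onclose }
      return current;
    },
    disconnect: release,
    get current() { return current; },
  };
}
```

In a React hook, `disconnect` would typically be returned from the effect's cleanup function so that unmounting the dashboard also releases the socket.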
Symptom: Critical events (human gates, errors) never reach dashboard
Detection: If expected event doesn't arrive within timeout window
Fix: Implement message acknowledgment, server retries unacknowledged critical events
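The acknowledgment fix could be tracked on the sending side roughly as follows. Everything here is an illustrative assumption: the `createAckTracker` name, the `id` field on messages, the 5 second timeout, and the limit of 3 retries are not specified in the text.

```javascript
// Hypothetical tracker for critical events awaiting acknowledgment.
function createAckTracker(resend, { timeoutMs = 5000, maxRetries = 3 } = {}) {
  const pending = new Map(); // id -> { message, sentAt, retries }
  return {
    get size() { return pending.size; },
    track(message, now = Date.now()) {
      pending.set(message.id, { message, sentAt: now, retries: 0 });
    },
    // Called when the peer acknowledges a message id.
    ack(id) {
      return pending.delete(id);
    },
    // Called periodically: resends overdue messages, returns ids that were given up on.
    sweep(now = Date.now()) {
      const gaveUp = [];
      for (const [id, entry] of pending) {
        if (now - entry.sentAt < timeoutMs) continue;
        if (entry.retries >= maxRetries) {
          pending.delete(id);
          gaveUp.push(id);
          continue;
        }
        entry.retries += 1;
        entry.sentAt = now;
        resend(entry.message);
      }
      return gaveUp;
    },
  };
}
```

The dashboard side would reply with an acknowledgment message carrying the event id after handling a human gate or error event. Ids returned from `sweep` are candidates for alerting.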
Scenario: User is monitoring DAG execution. Network drops for 45 seconds during critical node processing. Connection recovers.
Step 1 - Connection Loss Detection:
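// WebSocket onclose event fires
ws.onclose = () => {
  setConnectionState('CLOSED');
  // Record when the connection dropped so onopen can measure the gap
  lastDisconnectTime = Date.now();
  // Decision: Network issue or server restart?
  // → Try reconnect (could be temporary network)
  scheduleReconnect();
};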
Step 2 - Buffering Decision:
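// User tries to send human decision during outage
const sendDecision = (decision) => {
  // Include the decision itself so it is not lost while buffered
  const message = {
    type: 'human_decision',
    priority: 'high',
    decision,
    timestamp: Date.now()
  };
  if (connectionState !== 'OPEN') {
    // Decision: Buffer or reject?
    // → Buffer high-priority messages (human decisions)
    bufferQueue.unshift(message);
    showToast("Decision queued - connection lost");
    return;
  }
  // Connection is open: send right away
  ws.send(JSON.stringify(message));
};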
Step 3 - Reconnection with State Gap:
ws.onopen = () => {
  const disconnectDuration = Date.now() - lastDisconnectTime;
  if (disconnectDuration > 30000) {
    // Decision: Full resync or continue?
    // → 45s gap requires full resync
    ws.send(JSON.stringify({ type: 'resync_request', last_seen: lastEventTimestamp }));
    // Server responds with missed events + current state
    // Trade-off: Higher bandwidth but guaranteed consistency
  }
  // Send buffered messages
  flushBufferQueue();
};
What novice misses: Assumes reconnection means everything is fine and doesn't handle the state gap.
What expert catches: Calculates disconnect duration, requests resync for gaps > 30s, preserves critical user actions in buffer.
Don't use WebSocket streaming for:
- polling-pattern-optimizer with REST endpoints instead
- file-transfer-handler with HTTP multipart instead
- sse-event-stream for simpler one-way communication
- api-architect for traditional REST/GraphQL patterns
- database-change-streams for direct DB subscriptions
- auth-flow-manager for login/logout/token refresh
Delegate to other skills:
- message-queue-architect
- websocket-load-balancer
- webrtc-connection-manager