skills/llm-streaming-response-handler/SKILL.md
Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, and error recovery. Handles OpenAI and Anthropic (Claude) streaming APIs. Use for chatbots, AI assistants, and real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.
npx skillsauth add curiositech/windags-skills llm-streaming-response-handler
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.
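✅ Use for: chatbots, AI assistants, and real-time text generation.
❌ NOT for: batch processing, non-streaming APIs, or bidirectional WebSocket chat.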
Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (<50 words)? → Regular fetch
└── Background processing? → Regular fetch
Why SSE over WebSockets for LLM streaming:
Timeline:
| Provider | Streaming Method | Response Format |
|----------|------------------|-----------------|
| OpenAI | SSE | data: {"choices":[{"delta":{"content":"token"}}]} |
| Anthropic (Claude API) | SSE | data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"token"}} |
| Vercel AI SDK | SSE | Normalized across providers |
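Because the OpenAI and Anthropic payload shapes in the table differ, client code usually normalizes them before rendering. The sketch below is one way to do that, assuming the payload has already been JSON-parsed and that only the fields shown in the table matter; `extractToken` and the two type aliases are hypothetical names, and real payloads carry more fields than are modeled here.

```typescript
// Hypothetical shapes, limited to the fields listed in the provider table.
type OpenAIChunk = { choices?: Array<{ delta?: { content?: string | null } }> };
type AnthropicEvent = { type?: string; delta?: { text?: string } };

// Returns the text delta from one parsed SSE payload, or '' if it carries none.
function extractToken(payload: unknown): string {
  if (typeof payload !== 'object' || payload === null) return '';

  // OpenAI-style: choices[0].delta.content
  const fromOpenAI = (payload as OpenAIChunk).choices?.[0]?.delta?.content;
  if (typeof fromOpenAI === 'string') return fromOpenAI;

  // Anthropic-style: only content_block_delta events carry text
  const event = payload as AnthropicEvent;
  if (event.type === 'content_block_delta' && typeof event.delta?.text === 'string') {
    return event.delta.text;
  }
  return '';
}
```

Events that carry no text (for example a stop event) map to an empty string, so the caller can append the result unconditionally.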
Novice thinking: "Collect all tokens, then show complete response"
Problem: Defeats the entire purpose of streaming.
Wrong approach:
// ❌ Waits for entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
Correct approach:
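// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // Holds a partial line when an SSE message spans two chunks
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // Keep the incomplete trailing line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = JSON.parse(line.slice(6));
    if (data.content) {
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}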
Timeline:
Problem: User can't stop generation, wasting tokens and money.
Symptom: "Stop" button doesn't work or doesn't exist.
Correct approach:
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);
const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);
  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });
    // Stream handling...
  } catch (error) {
    // fetch() and reader.read() both reject with an AbortError after abort()
    if (error instanceof Error && error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    } else {
      console.error('Stream failed', error); // Don't swallow real failures silently
    }
  } finally {
    setAbortController(null);
  }
};
const cancelStream = () => {
  abortController?.abort();
};
return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);
Problem: Stream fails mid-response, user sees partial text with no indication of failure.
Correct approach:
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);
try {
  setStreamState('streaming');
  // Streaming logic...
  // Note: fetch() does not reject on HTTP errors, so this logic must throw
  // on a non-OK response (with the status in the message) for the 429 branch to run.
  setStreamState('complete');
} catch (error) {
  setStreamState('error');
  const err = error instanceof Error ? error : new Error(String(error));
  if (err.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (err.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}
// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}
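One detail the error branches above depend on: `fetch` resolves normally on a 429 or 500, so the status has to be turned into a thrown error before the `catch` block can classify it. The following is a minimal sketch of that step, assuming the status is attached to the error object rather than parsed out of the message; `httpError`, `isHttpError`, `assertOk`, and `toUserMessage` are hypothetical helper names, not part of any library.

```typescript
// Hypothetical error shape: a plain Error carrying the HTTP status.
type HttpError = Error & { status: number };

function httpError(status: number): HttpError {
  return Object.assign(new Error(`HTTP ${status}`), { status });
}

function isHttpError(error: unknown): error is HttpError {
  return error instanceof Error && typeof (error as HttpError).status === 'number';
}

// Call right after fetch(): converts a non-OK response into a thrown error.
function assertOk(response: { ok: boolean; status: number }): void {
  if (!response.ok) throw httpError(response.status);
}

// Maps a caught error to the same user-facing messages used above.
function toUserMessage(error: unknown): string {
  if (error instanceof Error && error.name === 'AbortError') return 'Generation stopped';
  if (isHttpError(error) && error.status === 429) {
    return 'Rate limit exceeded. Try again in a moment.';
  }
  return 'Something went wrong. Please retry.';
}
```

In a component this would be used as `assertOk(response)` immediately after the `fetch` call and `setErrorMessage(toUserMessage(error))` inside `catch`. Comparing a numeric status avoids false matches when "429" happens to appear elsewhere in an error message.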
Problem: Streams not cleaned up, causing memory leaks.
Symptom: Browser slows down after multiple requests.
Correct approach:
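// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader<Uint8Array> | null = null;
  let cancelled = false;
  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body!.getReader();
    if (cancelled) {
      // Component unmounted while the request was still in flight
      await reader.cancel();
      return;
    }
    // Streaming...
  };
  streamResponse();
  // Cleanup on unmount (also runs before the effect re-runs for a new prompt)
  return () => {
    cancelled = true;
    reader?.cancel();
  };
}, [prompt]);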
Problem: UI feels frozen between slow tokens.
Correct approach:
// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>
.typing-cursor {
  animation: blink 1s step-end infinite;
}
@keyframes blink {
  50% { opacity: 0; }
}
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // Partial line carried over between chunks
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = JSON.parse(line.slice(6));
      if (data.content) {
        yield data.content;
      }
      if (data.done) {
        return;
      }
    }
  }
}
// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
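Network chunks do not line up with SSE message boundaries, so a single `data:` line can arrive split across two reads. One way to make that handling testable without a network is to isolate it in a small incremental parser. The sketch below assumes each message is a single `data:` line, as in the examples in this document (the full SSE format also allows multi-line data, `event:` and `id:` fields, which are ignored here); `createSSEParser` is a hypothetical name.

```typescript
// Hypothetical helper: incremental parser for single-line `data:` messages.
function createSSEParser() {
  let buffer = ''; // Text received so far that does not yet end in a newline

  return {
    // Feed decoded text; returns the payload of every complete `data:` line.
    feed(text: string): string[] {
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // Last piece may be an incomplete line

      const payloads: string[] = [];
      for (const rawLine of lines) {
        // Tolerate CRLF line endings
        const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
        if (!line.startsWith('data:')) continue;
        const value = line.slice(5);
        // A single leading space after the colon is not part of the payload
        payloads.push(value.startsWith(' ') ? value.slice(1) : value);
      }
      return payloads;
    },
  };
}

// Usage: a message split across two chunks is only emitted once it is complete.
const parser = createSSEParser();
const first = parser.feed('data: {"content":"Hel');
const second = parser.feed('lo"}\n\ndata: {"content":" world"}\n');
console.log(first.length, second.map((p) => JSON.parse(p).content));
```

Inside a read loop, each `decoder.decode(value, { stream: true })` result would be passed to `parser.feed`, and every returned payload handed to `JSON.parse`.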
import { useState, useCallback } from 'react';
interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}
export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);
  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });
      // fetch() resolves on HTTP errors, so surface them explicitly
      if (!response.ok || !response.body) {
        throw new Error(`Request failed with status ${response.status}`);
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';
      let buffer = ''; // Partial line carried over between chunks
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = JSON.parse(line.slice(6));
          if (data.content) {
            accumulated += data.content;
            setContent(accumulated);
            options.onToken?.(data.content);
          }
        }
      }
      options.onComplete?.(accumulated);
    } catch (err) {
      const e = err instanceof Error ? err : new Error(String(err));
      // A user-initiated abort is not an error state
      if (e.name !== 'AbortError') {
        setError(e);
        options.onError?.(e);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);
  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);
  return { content, isStreaming, error, stream, cancel };
}
// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });
  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="typing-cursor">▊</span>}
      </div>
      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
// app/api/chat/route.ts
import { OpenAI } from 'openai';
export const runtime = 'edge'; // Run on the Edge runtime for low-latency streaming
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });
  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;
          if (content) {
            // Each SSE message is a "data:" line terminated by a blank line
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }
        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });
  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
□ AbortController for cancellation
□ Error states with retry capability
□ Typing indicator during generation
□ Cleanup on component unmount
□ Rate limiting on API route
□ Token usage tracking
□ Streaming fallback (if API fails)
□ Accessibility (screen reader announces updates)
□ Mobile-friendly (touch targets for stop button)
□ Network error recovery (auto-retry on disconnect)
□ Max response length enforcement
□ Cost estimation before generation
| Scenario | Use Streaming? |
|----------|----------------|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |
| Feature | SSE | WebSockets | Long Polling |
|---------|-----|-----------|--------------|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |
/references/sse-protocol.md - Server-Sent Events specification details
/references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
/references/error-recovery.md - Stream error handling strategies
scripts/stream_tester.ts - Test SSE endpoints locally
scripts/token_counter.ts - Estimate costs before generation

This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display