LLM Streaming Response Handler

Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.

When to Use

✅ Use for:

Chat interfaces with typing animation
Real-time AI assistants
Code generation with live preview
Document summarization with progressive display
Any UI where users expect immediate feedback from LLMs

❌ NOT for:

Batch document processing (no user watching)
APIs that don't support streaming
WebSocket-based bidirectional chat (use Socket.IO)
Simple request/response (fetch is fine)

Quick Decision Tree

Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (&gt;100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (&lt;50 words)? → Regular fetch
└── Background processing? → Regular fetch

Technology Selection

Server-Sent Events (SSE) - Recommended

Why SSE over WebSockets for LLM streaming:

Simplicity: HTTP-based, works with existing infrastructure
Auto-reconnect: Built-in reconnection logic
Firewall-friendly: Easier than WebSockets through proxies
One-way perfect: LLMs only stream server → client

Timeline:

2015-2020: WebSockets for everything
2020: SSE adoption for streaming APIs
2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
2024: Vercel AI SDK popularizes SSE patterns

Streaming APIs

| Provider | Streaming Method | Response Format | |----------|------------------|-----------------| | OpenAI | SSE | data: {"choices":[{"delta":{"content":"token"}}]} | | Anthropic | SSE | data: {"type":"content_block_delta","delta":{"text":"token"}} | | Claude (API) | SSE | data: {"delta":{"text":"token"}} | | Vercel AI SDK | SSE | Normalized across providers |

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Novice thinking: "Collect all tokens, then show complete response"

Problem: Defeats the entire purpose of streaming.

Wrong approach:

// ❌ Waits for entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done

Correct approach:

// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}

Timeline:

Pre-2023: Many apps buffered entire response
2023+: Token-by-token display expected

Anti-Pattern 2: No Stream Cancellation

Problem: User can't stop generation, wasting tokens and money.

Symptom: "Stop" button doesn't work or doesn't exist.

Correct approach:

// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);

Anti-Pattern 3: No Error Recovery

Problem: Stream fails mid-response, user sees partial text with no indication of failure.

Correct approach:

// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Problem: Streams not cleaned up, causing memory leaks.

Symptom: Browser slows down after multiple requests.

Correct approach:

// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);

Anti-Pattern 5: No Typing Indicator Between Tokens

Problem: UI feels frozen between slow tokens.

Correct approach:

// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>

.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));

        if (data.content) {
          yield data.content;
        }

        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}

Pattern 2: React Hook for Streaming

import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();

      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if (err.name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}

Pattern 3: Server-Side Streaming (Next.js)

// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Required for streaming

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}

Production Checklist

□ AbortController for cancellation
□ Error states with retry capability
□ Typing indicator during generation
□ Cleanup on component unmount
□ Rate limiting on API route
□ Token usage tracking
□ Streaming fallback (if API fails)
□ Accessibility (screen reader announces updates)
□ Mobile-friendly (touch targets for stop button)
□ Network error recovery (auto-retry on disconnect)
□ Max response length enforcement
□ Cost estimation before generation

When to Use vs Avoid

| Scenario | Use Streaming? | |----------|----------------| | Chat interface | ✅ Yes | | Long-form content generation | ✅ Yes | | Code generation with preview | ✅ Yes | | Short completions (<50 words) | ❌ No - regular fetch | | Background jobs | ❌ No - use job queue | | Bidirectional chat | ⚠️ Use WebSockets instead |

Technology Comparison

| Feature | SSE | WebSockets | Long Polling | |---------|-----|-----------|--------------| | Complexity | Low | Medium | High | | Auto-reconnect | ✅ | ❌ | ❌ | | Bidirectional | ❌ | ✅ | ❌ | | Firewall-friendly | ✅ | ⚠️ | ✅ | | Browser support | ✅ All modern | ✅ All modern | ✅ Universal | | LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |

References

/references/sse-protocol.md - Server-Sent Events specification details
/references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
/references/error-recovery.md - Stream error handling strategies

Scripts

scripts/stream_tester.ts - Test SSE endpoints locally
scripts/token_counter.ts - Estimate costs before generation

LLM Streaming Response Handler

Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.

When to Use

✅ Use for:

Chat interfaces with typing animation
Real-time AI assistants
Code generation with live preview
Document summarization with progressive display
Any UI where users expect immediate feedback from LLMs

❌ NOT for:

Batch document processing (no user watching)
APIs that don't support streaming
WebSocket-based bidirectional chat (use Socket.IO)
Simple request/response (fetch is fine)

Quick Decision Tree

Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (&gt;100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (&lt;50 words)? → Regular fetch
└── Background processing? → Regular fetch

Technology Selection

Server-Sent Events (SSE) - Recommended

Why SSE over WebSockets for LLM streaming:

Simplicity: HTTP-based, works with existing infrastructure
Auto-reconnect: Built-in reconnection logic
Firewall-friendly: Easier than WebSockets through proxies
One-way perfect: LLMs only stream server → client

Timeline:

2015-2020: WebSockets for everything
2020: SSE adoption for streaming APIs
2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
2024: Vercel AI SDK popularizes SSE patterns

Streaming APIs

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Novice thinking: "Collect all tokens, then show complete response"

Problem: Defeats the entire purpose of streaming.

Wrong approach:

// ❌ Waits for entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done

Correct approach:

// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}

Timeline:

Pre-2023: Many apps buffered entire response
2023+: Token-by-token display expected

Anti-Pattern 2: No Stream Cancellation

Problem: User can't stop generation, wasting tokens and money.

Symptom: "Stop" button doesn't work or doesn't exist.

Correct approach:

// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);

Anti-Pattern 3: No Error Recovery

Problem: Stream fails mid-response, user sees partial text with no indication of failure.

Correct approach:

// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Problem: Streams not cleaned up, causing memory leaks.

Symptom: Browser slows down after multiple requests.

Correct approach:

// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);

Anti-Pattern 5: No Typing Indicator Between Tokens

Problem: UI feels frozen between slow tokens.

Correct approach:

// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>

.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));

        if (data.content) {
          yield data.content;
        }

        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}

Pattern 2: React Hook for Streaming

import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();

      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if (err.name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}

Pattern 3: Server-Side Streaming (Next.js)

// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Required for streaming

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}

Production Checklist

□ AbortController for cancellation
□ Error states with retry capability
□ Typing indicator during generation
□ Cleanup on component unmount
□ Rate limiting on API route
□ Token usage tracking
□ Streaming fallback (if API fails)
□ Accessibility (screen reader announces updates)
□ Mobile-friendly (touch targets for stop button)
□ Network error recovery (auto-retry on disconnect)
□ Max response length enforcement
□ Cost estimation before generation

When to Use vs Avoid

Technology Comparison

References

/references/sse-protocol.md - Server-Sent Events specification details
/references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
/references/error-recovery.md - Stream error handling strategies

Scripts

scripts/stream_tester.ts - Test SSE endpoints locally
scripts/token_counter.ts - Estimate costs before generation

Adoption

curiositech/llm-streaming-response-handler

$ install --global

Security Scan Results

SKILL.md

LLM Streaming Response Handler

When to Use

Quick Decision Tree

Technology Selection

Server-Sent Events (SSE) - Recommended

Streaming APIs

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Anti-Pattern 2: No Stream Cancellation

Anti-Pattern 3: No Error Recovery

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Anti-Pattern 5: No Typing Indicator Between Tokens

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

Pattern 2: React Hook for Streaming

Pattern 3: Server-Side Streaming (Next.js)

Production Checklist

When to Use vs Avoid

Technology Comparison

References

Scripts

Related Skills

curiositech/revisiting-interview-data-analysing-turn

curiositech/redis-patterns-expert

curiositech/react-server-components-boundary

curiositech/rate-limiting-strategy

curiositech/llm-streaming-response-handler

$ install --global

Security Scan Results

SKILL.md

LLM Streaming Response Handler

When to Use

Quick Decision Tree

Technology Selection

Server-Sent Events (SSE) - Recommended

Streaming APIs

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Anti-Pattern 2: No Stream Cancellation

Anti-Pattern 3: No Error Recovery

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Anti-Pattern 5: No Typing Indicator Between Tokens

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

Pattern 2: React Hook for Streaming

Pattern 3: Server-Side Streaming (Next.js)

Production Checklist

When to Use vs Avoid

Technology Comparison

References

Scripts

Related Skills

curiositech/revisiting-interview-data-analysing-turn

curiositech/redis-patterns-expert

curiositech/react-server-components-boundary

curiositech/rate-limiting-strategy