
# Audio Format Conversion

## Capability

Handles audio encoding and decoding between various formats (mulaw, linear16, PCM) and sample rates, enabling interoperability between Twilio (mulaw 8 kHz) and STT/TTS providers that may use different audio formats.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `audio.decode` | `z.object({ data: z.instanceof(Buffer), fromEncoding: z.enum(['mulaw', 'linear16', 'pcm']), toEncoding: z.enum(['mulaw
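
The implementation behind `audio.decode` is not shown here. As a minimal sketch of the mulaw-to-linear16 direction, assuming standard G.711 mu-law expansion and no resampling, the conversion could look like the following. The function names `muLawDecodeSample` and `muLawToLinear16` are hypothetical and are not part of the tool schema above.

```typescript
// Sketch only: standard G.711 mu-law expansion to 16-bit linear PCM.
// Function names are illustrative assumptions, not this project's API.

const MULAW_BIAS = 0x84; // bias of 132 added by the encoder before compression

/** Decodes a single mu-law byte (0-255) into a signed 16-bit PCM sample. */
function muLawDecodeSample(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -magnitude : magnitude;
}

/** Decodes a buffer of mu-law bytes into linear16 samples at the same sample rate. */
function muLawToLinear16(input: Uint8Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = muLawDecodeSample(input[i]);
  }
  return output;
}
```

A provider that expects 16 kHz input would still need a separate resampling step after this conversion, because the sketch keeps the 8 kHz sample rate of the source audio.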
# Barge-In Handling

## Capability

Detects user speech during TTS playback and immediately interrupts audio output to handle the new utterance, providing a natural conversational experience where users can interrupt the agent mid-sentence.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `bargeIn.enable` | `z.object({ sessionId: z.string(), config: BargeInConfig })` | `{ enabled: boolean }` | 10 RPM |
| `bargeIn.detect` | `z.object({ se
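
How speech is detected during playback is not specified here. One possible approach, sketched below under the assumption of a simple RMS energy threshold over consecutive audio frames, is a small detector that fires an interrupt callback only while TTS audio is playing. `BargeInDetector`, its options, and the threshold values are all hypothetical; the actual `BargeInConfig` shape is not shown in this document.

```typescript
// Sketch only: energy-threshold barge-in detection. All names are assumptions.

interface BargeInOptions {
  energyThreshold: number; // RMS level treated as speech
  minSpeechFrames: number; // consecutive loud frames required before interrupting
}

function rootMeanSquare(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

class BargeInDetector {
  private readonly options: BargeInOptions;
  private readonly onInterrupt: () => void;
  private playing = false;
  private speechFrames = 0;

  constructor(options: BargeInOptions, onInterrupt: () => void) {
    this.options = options;
    this.onInterrupt = onInterrupt;
  }

  startPlayback(): void {
    this.playing = true;
    this.speechFrames = 0;
  }

  stopPlayback(): void {
    this.playing = false;
    this.speechFrames = 0;
  }

  /** Feeds one inbound audio frame; returns true if it triggered an interruption. */
  processFrame(samples: Int16Array): boolean {
    if (!this.playing) return false; // speech outside playback is not a barge-in
    const loud = rootMeanSquare(samples) >= this.options.energyThreshold;
    this.speechFrames = loud ? this.speechFrames + 1 : 0;
    if (this.speechFrames < this.options.minSpeechFrames) return false;
    this.stopPlayback();
    this.onInterrupt();
    return true;
  }
}
```

Requiring several consecutive loud frames trades a little reaction time for fewer false interruptions from clicks or line noise.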
# Conversation History

## Capability

Manages multi-turn conversation context for voice agents, building structured history for MCP requests, handling context window limits, and maintaining coherent dialogue across turns with token-aware truncation.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `history.build` | `z.object({ sessionId: z.string(), maxTurns: z.number().optional() })` | `{ history: Array<{role: string, content: string}>
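
Token-aware truncation can be illustrated with a short sketch. It assumes a rough heuristic of about four characters per token, which is an approximation rather than a real tokenizer, and it keeps the most recent turns that fit the budget. The names `estimateTokens` and `buildHistory` and the `maxTokens` parameter are hypothetical.

```typescript
// Sketch only: keep the newest turns that fit a token budget.
// The 4-characters-per-token estimate is an assumption, not a real tokenizer.

interface Turn {
  role: string;
  content: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildHistory(turns: Turn[], maxTokens: number, maxTurns?: number): Turn[] {
  // Apply the optional turn cap first, keeping the most recent turns.
  const start = maxTurns === undefined ? 0 : Math.max(0, turns.length - maxTurns);
  const candidates = turns.slice(start);

  // Walk backwards from the newest turn until the token budget is exhausted.
  const kept: Turn[] = [];
  let usedTokens = 0;
  for (let i = candidates.length - 1; i >= 0; i--) {
    const cost = estimateTokens(candidates[i].content);
    if (usedTokens + cost > maxTokens) break;
    kept.push(candidates[i]);
    usedTokens += cost;
  }
  return kept.reverse(); // restore chronological order
}
```

Dropping the oldest turns first keeps the immediate dialogue coherent; a system prompt or summary turn would need to be pinned separately, which this sketch does not do.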
# Latency Budget

## Capability

Enforces end-to-end latency budgets for voice interactions, tracking per-stage timing and triggering fallbacks when budgets are exceeded to maintain sub-second response times.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `latency.startTurn` | `z.object({ sessionId: z.string(), turnId: z.string() })` | `{ timerId: string, budget: BudgetConfig }` | 100 RPM |
| `latency.checkStage` | `z.object({ timerId:
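
Per-stage timing against a budget can be sketched as a small timer object. `BudgetConfig` is named above but not defined in this document, so the `StageBudgets` shape below (a total budget plus optional per-stage limits, in milliseconds) is an assumption, as are the `TurnTimer` class and its injectable clock.

```typescript
// Sketch only: per-stage and end-to-end budget checks for one turn.
// StageBudgets is a hypothetical stand-in for the undocumented BudgetConfig.

interface StageBudgets {
  totalMs: number;
  stages: { [stage: string]: number | undefined };
}

interface StageCheck {
  stage: string;
  stageMs: number; // time spent in the stage that just finished
  totalMs: number; // time since the turn started
  withinBudget: boolean;
}

class TurnTimer {
  private readonly budget: StageBudgets;
  private readonly now: () => number;
  private readonly turnStart: number;
  private stageStart: number;

  // The clock is injectable so the timer can be tested deterministically.
  constructor(budget: StageBudgets, now: () => number = Date.now) {
    this.budget = budget;
    this.now = now;
    this.turnStart = now();
    this.stageStart = this.turnStart;
  }

  /** Closes the current stage and reports whether stage and total budgets still hold. */
  checkStage(stage: string): StageCheck {
    const current = this.now();
    const stageMs = current - this.stageStart;
    const totalMs = current - this.turnStart;
    const stageLimit = this.budget.stages[stage];
    const stageOk = stageLimit === undefined || stageMs <= stageLimit;
    const withinBudget = stageOk && totalMs <= this.budget.totalMs;
    this.stageStart = current; // the next stage starts now
    return { stage, stageMs, totalMs, withinBudget };
  }
}
```

A caller would typically trigger its fallback path, such as a filler phrase or a cached response, whenever `withinBudget` comes back `false`.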
# Pipeline Orchestration

## Capability

Coordinates the complete voice interaction pipeline from audio input to audio output, managing the flow between STT, MCP client, and TTS stages with event-driven architecture and latency budget enforcement.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `pipeline.create` | `z.object({ sessionId: z.string(), config: z.object({ stt: z.string(), tts: z.string(), mcp: z.string() }) })` | `{ pipeline
# STT Provider Interface

## Capability

Provides a unified interface for speech-to-text (STT) providers, enabling real-time streaming transcription with interim results, endpoint detection, and automatic reconnection handling.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `stt.connect` | `z.object({ provider: z.enum(['deepgram', 'aws-transcribe', 'google-cloud']), config: z.object({ apiKey: z.string().optional(), sampleRate: z.number
# TTS Provider Interface

## Capability

Provides a unified interface for text-to-speech (TTS) providers, enabling streaming audio synthesis with first-byte latency tracking, voice selection, and output format conversion.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `tts.synthesize` | `z.object({ text: z.string(), config: z.object({ provider: z.string(), voice: z.string().optional(), speed: z.number().optional() }) })` | `{ chunks: A
# MCP Client Integration

## Capability

Connects to any MCP (Model Context Protocol) server to process user utterances, discover available tools, manage conversation history, and receive structured agent responses with timeout and retry handling.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `mcp.connect` | `z.object({ endpoint: z.string().url(), auth: z.object({ type: z.string(), credentials: z.record(z.string()) }).optional() })` |
# Session Management

## Capability

Manages voice call sessions with unique session IDs, conversation history, context preservation across turns, and automatic cleanup on disconnect or timeout.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `session.create` | `z.object({ callSid: z.string(), config: SessionConfig })` | `{ sessionId: string, createdAt: string }` | 100 RPM |
| `session.get` | `z.object({ sessionId: z.string() })` | `{ s
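
The session lifecycle described above (create, look up, clean up on disconnect or timeout) can be sketched with an in-memory store. This is an illustration under several assumptions: the `SessionStore` class, the `sess_N` counter-based IDs, the idle-timeout policy, and the explicit `sweep()` call are all hypothetical, and `SessionConfig` is omitted because its shape is not shown here. A real deployment would more likely use random UUIDs and a scheduled sweep.

```typescript
// Sketch only: in-memory sessions with idle-timeout cleanup. All names are assumptions.

interface Session {
  sessionId: string;
  callSid: string;
  createdAt: string; // ISO 8601 timestamp
  lastActiveAt: number; // epoch milliseconds
}

class SessionStore {
  private readonly sessions = new Map<string, Session>();
  private readonly idleTimeoutMs: number;
  private readonly now: () => number;
  private nextId = 1;

  constructor(idleTimeoutMs: number, now: () => number = Date.now) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.now = now;
  }

  create(callSid: string): Session {
    const timestamp = this.now();
    const session: Session = {
      sessionId: `sess_${this.nextId++}`, // placeholder ID scheme for the sketch
      callSid,
      createdAt: new Date(timestamp).toISOString(),
      lastActiveAt: timestamp,
    };
    this.sessions.set(session.sessionId, session);
    return session;
  }

  /** Returns the session and marks it active, or undefined if it no longer exists. */
  get(sessionId: string): Session | undefined {
    const session = this.sessions.get(sessionId);
    if (session) session.lastActiveAt = this.now();
    return session;
  }

  /** Removes a session on disconnect; returns whether it existed. */
  end(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }

  /** Removes sessions idle longer than the timeout and returns their IDs. */
  sweep(): string[] {
    const cutoff = this.now() - this.idleTimeoutMs;
    const expired: string[] = [];
    this.sessions.forEach((session, id) => {
      if (session.lastActiveAt < cutoff) expired.push(id);
    });
    expired.forEach((id) => this.sessions.delete(id));
    return expired;
  }
}
```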
# Response Sanitization

## Capability

Cleans and transforms MCP agent responses for optimal text-to-speech output, removing SSML/Markdown markup, handling special characters, truncating overly long responses, and ensuring voice-friendly formatting.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `sanitize.forTTS` | `z.object({ text: z.string(), options: SanitizeOptions.optional() })` | `{ sanitized: string, warnings: string[] }` | 100
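
As a rough illustration of the cleanup steps listed above, the sketch below strips tags and common Markdown markers with regular expressions and truncates at a word boundary. It is a simplification: the regexes do not cover all of SSML or Markdown, `SanitizeOptions` is not defined in this document, and the `maxLength` parameter with its default of 500 characters is an assumption. Only the `{ sanitized, warnings }` return shape comes from the table above.

```typescript
// Sketch only: regex-based cleanup for TTS input. Not a full SSML/Markdown parser.

function sanitizeForTTS(
  text: string,
  maxLength: number = 500, // assumed default; not taken from the document
): { sanitized: string; warnings: string[] } {
  const warnings: string[] = [];

  // Remove SSML/HTML-style tags, leaving a space so adjacent words stay separate.
  const withoutTags = text.replace(/<[^>]+>/g, " ");
  if (withoutTags !== text) warnings.push("removed markup tags");

  let out = withoutTags
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [label](url) becomes label
    .replace(/^\s{0,3}#{1,6}\s+/gm, "") // heading markers
    .replace(/^\s*[-*+]\s+/gm, "") // bullet markers
    .replace(/[*_`~]/g, "") // emphasis and code markers (also drops underscores in identifiers)
    .replace(/\s+/g, " ") // collapse whitespace and newlines
    .trim();

  // Truncate at the last word boundary that fits within maxLength.
  if (out.length > maxLength) {
    const cut = out.slice(0, maxLength);
    const lastSpace = cut.lastIndexOf(" ");
    out = (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trim();
    warnings.push("truncated");
  }

  return { sanitized: out, warnings };
}
```

Cutting at a word boundary avoids the TTS engine reading a half word at the end of a long response.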
# Twilio Media Streams

## Capability

Handles Twilio Media Streams WebSocket connections for real-time bidirectional audio communication, parsing inbound messages, encoding outbound audio, and managing call lifecycle events.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `twilio.handleStart` | `z.object({ message: z.object({ event: z.literal('start'), callSid: z.string(), streamSid: z.string(), format: z.string(), tracks: z.array(z.st
# Telephony Lifecycle

## Capability

Manages the complete lifecycle of voice calls from TwiML webhook initiation through call completion, including call connect, transfer, conference, and disconnect handling with proper session cleanup.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `telephony.generateTwiML` | `z.object({ sessionId: z.string(), wsUrl: z.string().url() })` | `{ twiml: string }` | 100 RPM |
| `telephony.handleConnect` |
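
For `telephony.generateTwiML`, the output would plausibly be a TwiML document that connects the call to a bidirectional media stream. The sketch below uses Twilio's `<Connect>`, `<Stream>`, and `<Parameter>` TwiML elements; passing `sessionId` as a custom stream parameter is an assumption about how this project links the stream to a session, and the function names are hypothetical.

```typescript
// Sketch only: builds TwiML that connects a call to a WebSocket media stream.
// Forwarding sessionId via <Parameter> is an assumption, not confirmed by the document.

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function generateTwiML(sessionId: string, wsUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(wsUrl)}">`,
    `      <Parameter name="sessionId" value="${escapeXml(sessionId)}" />`,
    "    </Stream>",
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Escaping attribute values matters here because WebSocket URLs with query strings contain `&`, which would otherwise produce invalid XML.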