# Twilio Media Streams
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add reaatech/voice-agent-kit skills/twilio-media-streams`
## Capability

Handles Twilio Media Streams WebSocket connections for real-time bidirectional audio communication: parsing inbound messages, encoding outbound audio, and managing call lifecycle events.
## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `twilio.handleStart` | `z.object({ message: z.object({ event: z.literal('start'), callSid: z.string(), streamSid: z.string(), format: z.string(), tracks: z.array(z.string()) }) })` | `{ sessionId: string, connected: boolean }` | 100 RPM |
| `twilio.handleMedia` | `z.object({ message: z.object({ event: z.literal('media'), streamSid: z.string(), payload: z.string(), timestamp: z.string() }) })` | `{ audioChunks: AudioChunk[] }` | 1000 RPM |
| `twilio.handleStop` | `z.object({ message: z.object({ event: z.literal('stop'), callSid: z.string(), streamSid: z.string() }) })` | `{ sessionId: string, closed: boolean }` | 100 RPM |
| `twilio.sendAudio` | `z.object({ streamSid: z.string(), audio: z.instanceof(Buffer) })` | `{ sent: boolean, chunkCount: number }` | 1000 RPM |
| `twilio.sendClear` | `z.object({ streamSid: z.string() })` | `{ sent: boolean }` | 100 RPM |
| `twilio.sendMark` | `z.object({ streamSid: z.string(), markId: z.string() })` | `{ sent: boolean, markId: string }` | 100 RPM |
Example `twilio.handleStart` request:

```json
{
  "name": "twilio.handleStart",
  "arguments": {
    "message": {
      "event": "start",
      "callSid": "CAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "format": "audio/x-mulaw",
      "sampleRate": 8000,
      "tracks": ["inbound_audio"]
    }
  }
}
```

Response:

```json
{
  "sessionId": "sess-abc123-def456",
  "connected": true
}
```
Example `twilio.handleMedia` request:

```json
{
  "name": "twilio.handleMedia",
  "arguments": {
    "message": {
      "event": "media",
      "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "payload": "Exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==",
      "timestamp": "1681617600000"
    }
  }
}
```

Response:

```json
{
  "audioChunks": [
    {
      "buffer": "<Buffer 0x7f 0x7f 0x7f ...>",
      "sampleRate": 8000,
      "encoding": "mulaw",
      "channels": 1,
      "timestamp": 1681617600000
    }
  ]
}
```
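The response above carries 8 kHz mu-law bytes, which most downstream consumers (for example an STT provider) need as 16-bit linear PCM. The following is a minimal sketch of standard G.711 mu-law decoding; `decodeMulawByte` and `decodeMulaw` are hypothetical helper names and are not part of the documented `twilio.*` tools.

```typescript
// Hypothetical helpers (not part of the documented twilio.* tools).
// Standard G.711 mu-law expansion to 16-bit linear PCM.
const MULAW_BIAS = 0x84; // 132, the bias added during mu-law encoding

function decodeMulawByte(encoded: number): number {
  const u = ~encoded & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign !== 0 ? 0 - magnitude : magnitude;
}

function decodeMulaw(bytes: Uint8Array): Int16Array {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = decodeMulawByte(bytes[i]);
  }
  return pcm;
}
```

Both `0xff` and `0x7f` decode to zero amplitude, which is why a run of `0x7f` bytes like the one in the sample response represents silence.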
Example `twilio.sendAudio` request:

```json
{
  "name": "twilio.sendAudio",
  "arguments": {
    "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "audio": "<Buffer mulaw 8kHz audio>"
  }
}
```

Response:

```json
{
  "sent": true,
  "chunkCount": 5
}
```
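The `chunkCount` in the response suggests that outbound audio is split into frames before it is sent. How the skill actually chunks audio is not stated here, so the sketch below assumes 20 ms frames of 8 kHz mu-law (160 bytes each); `splitIntoFrames` and the constants are hypothetical names.

```typescript
// Hypothetical helper: the 20 ms frame size is an assumption, not a documented value.
const SAMPLE_RATE_HZ = 8000;
const FRAME_MS = 20;
// mu-law uses one byte per sample, so 20 ms at 8 kHz is 160 bytes.
const FRAME_BYTES = (SAMPLE_RATE_HZ * FRAME_MS) / 1000;

function splitIntoFrames(audio: Uint8Array, frameBytes: number = FRAME_BYTES): Uint8Array[] {
  if (frameBytes <= 0) {
    throw new Error('frameBytes must be positive');
  }
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    // subarray returns a view on the same memory, so no bytes are copied.
    frames.push(audio.subarray(offset, Math.min(offset + frameBytes, audio.length)));
  }
  return frames;
}
```

Under these assumptions, 800 bytes (100 ms) of audio yields five frames, matching the `chunkCount` of 5 in the sample response. The final frame may be shorter than the others when the input is not a multiple of the frame size.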
Example `twilio.sendClear` request:

```json
{
  "name": "twilio.sendClear",
  "arguments": {
    "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }
}
```

Response:

```json
{
  "sent": true
}
```
| Failure | Cause | Recovery |
|---------|-------|----------|
| Invalid message format | Malformed Twilio message | Log error, skip message |
| WebSocket disconnected | Network issue | Attempt reconnection, cleanup session |
| Audio encoding error | Invalid buffer format | Log error, send silence instead |
| Rate limit exceeded | Too many messages | Buffer messages, send in batches |
| Call already ended | Duplicate stop message | Ignore, cleanup if needed |
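The "Call already ended" row implies that stop handling should be idempotent. One way to get that behavior is sketched below with an in-memory registry keyed by `streamSid`; `SessionRegistry`, its method names, and the `sess-N` ID scheme are hypothetical and only illustrate the recovery rule, not the skill's actual session handling.

```typescript
// Hypothetical sketch: an in-memory registry in which a duplicate stop is a no-op.
interface Session {
  sessionId: string;
  callSid: string;
  streamSid: string;
}

class SessionRegistry {
  private sessions: Record<string, Session> = Object.create(null);
  private nextId = 1;

  start(callSid: string, streamSid: string): { sessionId: string; connected: boolean } {
    const existing = this.sessions[streamSid];
    if (existing) {
      // Repeated start for the same stream: reuse the session instead of creating a second one.
      return { sessionId: existing.sessionId, connected: true };
    }
    const session: Session = { sessionId: 'sess-' + this.nextId++, callSid, streamSid };
    this.sessions[streamSid] = session;
    return { sessionId: session.sessionId, connected: true };
  }

  stop(streamSid: string): { sessionId: string | null; closed: boolean } {
    const session = this.sessions[streamSid];
    if (!session) {
      // Duplicate or unknown stop: ignore, nothing left to clean up.
      return { sessionId: null, closed: false };
    }
    delete this.sessions[streamSid];
    return { sessionId: session.sessionId, closed: true };
  }

  activeCount(): number {
    return Object.keys(this.sessions).length;
  }
}
```

The first `stop` for a stream removes the session and reports `closed: true`; any later `stop` for the same `streamSid` returns `closed: false` without throwing.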
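```typescript
// Validate incoming webhook requests (Express-style middleware, assumed from `originalUrl` and `res.status`)
import twilio from 'twilio';
import type { NextFunction, Request, Response } from 'express';

export function validateTwilioSignature(req: Request, res: Response, next: NextFunction) {
  // Twilio signs the full public URL, so rebuild it instead of passing only the path.
  // Behind a proxy or load balancer, use the externally visible protocol and host.
  const url = `${req.protocol}://${req.get('host')}${req.originalUrl}`;

  // validateRequest returns a boolean, not a validator object.
  const isValid = twilio.validateRequest(
    process.env.TWILIO_AUTH_TOKEN ?? '',
    req.get('x-twilio-signature') ?? '',
    url,
    req.body
  );

  if (!isValid) {
    return res.status(403).send('Invalid signature');
  }
  return next();
}
```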
Messages received from Twilio (`start`, `media`, `stop`):

```json
{
  "event": "start",
  "callSid": "CAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "format": "audio/x-mulaw",
  "sampleRate": 8000,
  "tracks": ["inbound_audio"],
  "customParameters": {}
}
```

```json
{
  "event": "media",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "payload": "Exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==",
  "timestamp": "1681617600000"
}
```

```json
{
  "event": "stop",
  "callSid": "CAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
```
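Parsing these inbound messages can be sketched as a discriminated union with a defensive parser that returns `null` for anything malformed, so the caller can log and skip it. The example follows the flattened shapes shown above; if the raw WebSocket frames nest these fields (for example under a `start` or `media` key), a mapping step would be needed first. `InboundMessage` and `parseInbound` are hypothetical names, and the sketch uses plain type checks rather than the Zod schemas from the tools table.

```typescript
// Hypothetical sketch: dependency-free parsing of the flattened inbound shapes.
type InboundMessage =
  | { event: 'start'; callSid: string; streamSid: string; format: string; tracks: string[] }
  | { event: 'media'; streamSid: string; payload: string; timestamp: string }
  | { event: 'stop'; callSid: string; streamSid: string };

function isString(value: unknown): value is string {
  return typeof value === 'string';
}

function parseInbound(raw: string): InboundMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null; // not valid JSON: caller logs and skips
  }
  if (typeof data !== 'object' || data === null) {
    return null;
  }
  const { event, callSid, streamSid, format, tracks, payload, timestamp } =
    data as Record<string, unknown>;
  if (typeof streamSid !== 'string') {
    return null; // every message type carries a streamSid
  }
  if (event === 'start') {
    if (typeof callSid !== 'string' || typeof format !== 'string') return null;
    if (!Array.isArray(tracks) || !tracks.every(isString)) return null;
    return { event: 'start', callSid, streamSid, format, tracks };
  }
  if (event === 'media') {
    if (typeof payload !== 'string' || typeof timestamp !== 'string') return null;
    return { event: 'media', streamSid, payload, timestamp };
  }
  if (event === 'stop') {
    if (typeof callSid !== 'string') return null;
    return { event: 'stop', callSid, streamSid };
  }
  return null; // unknown event type
}
```

Extra fields such as `sampleRate` and `customParameters` are dropped here because they do not appear in the documented input schemas.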
Messages sent to Twilio (`media`, `clear`, `mark`):

```json
{
  "event": "media",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "payload": "Exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=="
}
```

```json
{
  "event": "clear",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
```

```json
{
  "event": "mark",
  "streamSid": "MStreamxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "markId": "mark-001"
}
```
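Serializing the three outbound messages could look like the sketch below. It mirrors the field layout shown above (including the flat `markId` field), which may differ from what is finally written to the socket; the builder names and `toBase64` are hypothetical, and `btoa` is assumed to be available as a global (browsers and recent Node.js versions).

```typescript
// Hypothetical builders that mirror the outbound shapes documented above.
function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // assumes a global btoa
}

// One frame of mu-law audio, base64-encoded into the payload field.
function mediaMessage(streamSid: string, frame: Uint8Array): string {
  return JSON.stringify({ event: 'media', streamSid, payload: toBase64(frame) });
}

// Used by a sender to drop audio that has been queued but not yet played.
function clearMessage(streamSid: string): string {
  return JSON.stringify({ event: 'clear', streamSid });
}

// Used by a sender to label a point in the outbound audio.
function markMessage(streamSid: string, markId: string): string {
  return JSON.stringify({ event: 'mark', streamSid, markId });
}
```

A typical sequence under these assumptions is a series of `mediaMessage` frames followed by a `markMessage`, with `clearMessage` sent when the caller interrupts playback.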
For smooth playback:
| Metric | Type | Description |
|--------|------|-------------|
| twilio.calls.total | Counter | Total calls handled |
| twilio.calls.active | Gauge | Active calls |
| twilio.media.chunks | Counter | Audio chunks processed |
| twilio.media.bytes | Counter | Audio bytes transferred |
| twilio.errors.total | Counter | Twilio errors |
| twilio.websocket.disconnects | Counter | WebSocket disconnections |
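As a rough illustration of how the counters and gauge above relate to call events, the sketch below keeps them in a small in-memory store. `InMemoryMetrics` and the three `record*` functions are hypothetical; a real deployment would more likely forward these names to its existing metrics backend.

```typescript
// Hypothetical in-memory store keyed by the metric names from the table above.
class InMemoryMetrics {
  private values: Record<string, number> = Object.create(null);

  // Counters only ever increase; the gauge uses the same call with a negative delta.
  add(name: string, delta: number = 1): void {
    this.values[name] = this.get(name) + delta;
  }

  get(name: string): number {
    const value = this.values[name];
    return value === undefined ? 0 : value;
  }
}

function recordCallStart(metrics: InMemoryMetrics): void {
  metrics.add('twilio.calls.total');
  metrics.add('twilio.calls.active');
}

function recordMediaChunk(metrics: InMemoryMetrics, byteLength: number): void {
  metrics.add('twilio.media.chunks');
  metrics.add('twilio.media.bytes', byteLength);
}

function recordCallEnd(metrics: InMemoryMetrics): void {
  metrics.add('twilio.calls.active', -1);
}
```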
| Span | Attributes |
|------|------------|
| twilio.call.start | call_sid, stream_sid, format |
| twilio.media.process | stream_sid, chunk_size |
| twilio.call.end | call_sid, duration_ms |
| twilio.audio.send | stream_sid, chunk_count |