skills/tts-provider-interface/SKILL.md
# TTS Provider Interface
## Capability

Provides a unified interface for text-to-speech (TTS) providers, enabling streaming audio synthesis with first-byte latency tracking, voice selection, and output format conversion.
## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `tts.synthesize` | `z.object({ text: z.string(), config: z.object({ provider: z.string(), voice: z.string().optional(), speed: z.number().optional() }) })` | `{ chunks: AudioChunk[], firstByteMs: number }` | 100 RPM |
| `tts.cancel` | `z.object({ synthesisId: z.string() })` | `{ cancelled: boolean }` | 100 RPM |
| `tts.status` | `z.object({ synthesisId: z.string() })` | `{ status: string, progress: number, chunksGenerated: number }` | 60 RPM |
| `tts.benchmark` | `z.object({ provider: z.string(), text: z.string() })` | `{ firstByteMs: number, totalMs: number, chunkCount: number }` | 10 RPM |
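
The `tts.synthesize` input schema above can be approximated without any dependency. The following is a minimal sketch, not the skill's actual validation code: the names `SynthesizeInput`, `SynthesizeConfig`, and `parseSynthesizeInput` are hypothetical, and the hand-written checks only mirror the field types listed in the table.

```typescript
// Hypothetical types mirroring the tts.synthesize input schema from the table.
interface SynthesizeConfig {
  provider: string;
  voice?: string;
  speed?: number;
}

interface SynthesizeInput {
  text: string;
  config: SynthesizeConfig;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the typed input, or throws with a message naming the first bad field.
function parseSynthesizeInput(input: unknown): SynthesizeInput {
  if (!isRecord(input)) throw new Error("input must be an object");
  const { text, config } = input;
  if (typeof text !== "string") throw new Error("text must be a string");
  if (!isRecord(config)) throw new Error("config must be an object");

  const { provider, voice, speed } = config;
  if (typeof provider !== "string") throw new Error("config.provider must be a string");
  if (voice !== undefined && typeof voice !== "string") {
    throw new Error("config.voice must be a string when present");
  }
  if (speed !== undefined && typeof speed !== "number") {
    throw new Error("config.speed must be a number when present");
  }

  // Copy only the known fields so unexpected keys are dropped.
  const parsed: SynthesizeConfig = { provider };
  if (voice !== undefined) parsed.voice = voice;
  if (speed !== undefined) parsed.speed = speed;
  return { text, config: parsed };
}
```

A caller would pass the raw `arguments` object of a tool call to `parseSynthesizeInput` before handing it to a provider adapter.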
Example `tts.synthesize` request:

```json
{
  "name": "tts.synthesize",
  "arguments": {
    "text": "I can help you reset your password. Please hold while I send the reset link.",
    "config": {
      "provider": "deepgram",
      "voice": "aura-asteria-en",
      "speed": 1.0
    }
  }
}
```

Response:

```json
{
  "chunks": [
    { "audio": "<base64-pcm>", "timestamp": 1681617600000, "duration_ms": 50 },
    { "audio": "<base64-pcm>", "timestamp": 1681617600050, "duration_ms": 50 },
    { "audio": "<base64-pcm>", "timestamp": 1681617600100, "duration_ms": 50 }
  ],
  "firstByteMs": 120
}
```
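
A consumer of the `tts.synthesize` response may want to check that the chunks form a continuous stream before playing them. The sketch below is illustrative only: `AudioChunk` is typed from the three fields visible in the example, `summarizeChunks` is a hypothetical helper, and it assumes `timestamp` is the chunk's start time in epoch milliseconds, which the sample values are consistent with but the source does not state.

```typescript
// Shape inferred from the example response; the real AudioChunk type may differ.
interface AudioChunk {
  audio: string;       // base64-encoded audio payload
  timestamp: number;   // assumed: chunk start time, epoch milliseconds
  duration_ms: number; // playback length of this chunk
}

interface ChunkSummary {
  totalDurationMs: number;
  // Indices i where chunk i does not start exactly when chunk i-1 ends.
  discontinuities: number[];
}

function summarizeChunks(chunks: AudioChunk[]): ChunkSummary {
  let totalDurationMs = 0;
  const discontinuities: number[] = [];
  let expectedStart: number | undefined;

  chunks.forEach((chunk, index) => {
    if (expectedStart !== undefined && chunk.timestamp !== expectedStart) {
      discontinuities.push(index);
    }
    totalDurationMs += chunk.duration_ms;
    expectedStart = chunk.timestamp + chunk.duration_ms;
  });

  return { totalDurationMs, discontinuities };
}

// The three chunks from the example response above.
const exampleSummary = summarizeChunks([
  { audio: "<base64-pcm>", timestamp: 1681617600000, duration_ms: 50 },
  { audio: "<base64-pcm>", timestamp: 1681617600050, duration_ms: 50 },
  { audio: "<base64-pcm>", timestamp: 1681617600100, duration_ms: 50 },
]);
// exampleSummary.totalDurationMs is 150 and exampleSummary.discontinuities is empty.
```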
Example `tts.cancel` request:

```json
{
  "name": "tts.cancel",
  "arguments": {
    "synthesisId": "tts-synth-456"
  }
}
```

Response:

```json
{
  "cancelled": true
}
```
Example `tts.benchmark` request:

```json
{
  "name": "tts.benchmark",
  "arguments": {
    "provider": "deepgram",
    "text": "This is a test message to measure latency."
  }
}
```

Response:

```json
{
  "firstByteMs": 145,
  "totalMs": 890,
  "chunkCount": 18
}
```
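
Benchmark results like the one above can be reduced to a couple of derived figures when comparing providers. This is a rough sketch under stated assumptions: `meanChunkIntervalMs` and `fastestByFirstByte` are hypothetical helpers, and the interval calculation assumes `totalMs` is measured from the start of the request and that the first chunk arrives at `firstByteMs`.

```typescript
interface BenchmarkResult {
  firstByteMs: number;
  totalMs: number;
  chunkCount: number;
}

// Average time between consecutive chunks after the first one arrives.
// Returns undefined when there are fewer than two chunks.
function meanChunkIntervalMs(result: BenchmarkResult): number | undefined {
  if (result.chunkCount < 2) return undefined;
  return (result.totalMs - result.firstByteMs) / (result.chunkCount - 1);
}

// Picks the provider name with the lowest first-byte latency,
// or undefined when no results were supplied.
function fastestByFirstByte(results: Record<string, BenchmarkResult>): string | undefined {
  let best: string | undefined;
  let bestMs = Infinity;
  for (const [provider, result] of Object.entries(results)) {
    if (result.firstByteMs < bestMs) {
      best = provider;
      bestMs = result.firstByteMs;
    }
  }
  return best;
}

// Using the example response: (890 - 145) / (18 - 1) milliseconds per chunk.
const exampleInterval = meanChunkIntervalMs({ firstByteMs: 145, totalMs: 890, chunkCount: 18 });
```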
| Failure | Cause | Recovery |
|---------|-------|----------|
| Provider API error | Invalid text, rate limit | Retry with backoff or use fallback |
| WebSocket disconnect | Network issue | Reconnect and resume if possible |
| Invalid voice ID | Voice not available | Fall back to default voice |
| Text too long | Exceeds provider limit | Truncate or split into multiple requests |
| Synthesis timeout | Provider slow | Cancel and use fallback response |
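
Two of the recoveries in the table, retry with backoff plus provider fallback and splitting long text, can be sketched as follows. None of these helpers come from the skill itself: `backoffDelayMs`, `splitText`, and `synthesizeWithFallback` are hypothetical names, and the delay constants and sentence-splitting rule are arbitrary choices for illustration.

```typescript
// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Splits text into parts of at most maxChars, preferring sentence boundaries
// and hard-splitting any single sentence that is longer than the limit.
function splitText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const parts: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      parts.push(current.trim());
      current = "";
    }
    let rest = sentence;
    while (rest.length > maxChars) {
      parts.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current += rest;
  }
  if (current.trim().length > 0) parts.push(current.trim());
  return parts;
}

// Tries each provider in order, retrying with backoff before moving on.
// `attempt` is whatever function actually calls the provider (an assumption here).
async function synthesizeWithFallback<T>(
  providers: string[],
  attempt: (provider: string) => Promise<T>,
  maxRetries = 2,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (const provider of providers) {
    for (let i = 0; i <= maxRetries; i++) {
      try {
        return await attempt(provider);
      } catch (err) {
        lastError = err;
        if (i < maxRetries) await sleep(backoffDelayMs(i));
      }
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays.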
- `aura-asteria-en`, `aura-perseus-en`, `aura-hera-en`
- `neural` for best quality
- `{ languageCode: "en-US", name: "en-US-Neural2-A" }`

All TTS output must be converted to Twilio-compatible format:
For smooth Twilio playback:
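
The specific format and playback requirements are not listed here. As general background, Twilio Media Streams carries audio as base64-encoded 8 kHz mu-law (`audio/x-mulaw`), so a conversion step typically ends with a G.711 mu-law encoder. The sketch below assumes the input is already 16-bit PCM resampled to 8 kHz, and the 20 ms frame size (160 bytes) is an assumption for illustration rather than a requirement stated in this document; `linearToMulaw`, `encodeMulaw`, and `splitFrames` are hypothetical names.

```typescript
const MULAW_BIAS = 0x84; // 132, added before finding the segment
const MULAW_CLIP = 32635; // largest magnitude that survives adding the bias

// Encodes one signed 16-bit PCM sample as one G.711 mu-law byte.
function linearToMulaw(pcmSample: number): number {
  const sign = (pcmSample >> 8) & 0x80;
  let magnitude = sign !== 0 ? -pcmSample : pcmSample;
  if (magnitude > MULAW_CLIP) magnitude = MULAW_CLIP;
  magnitude += MULAW_BIAS;

  // Find the segment (exponent): position of the highest set bit, from bit 14 down.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encodes a whole buffer of 16-bit PCM samples (assumed 8 kHz mono).
function encodeMulaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  pcm.forEach((sample, i) => {
    out[i] = linearToMulaw(sample);
  });
  return out;
}

// Splits mu-law bytes into fixed-size frames; 160 bytes is 20 ms at 8 kHz.
// The final frame is padded with 0xff, which is mu-law silence.
function splitFrames(mulaw: Uint8Array, frameBytes = 160): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < mulaw.length; offset += frameBytes) {
    const frame = new Uint8Array(frameBytes).fill(0xff);
    frame.set(mulaw.subarray(offset, offset + frameBytes));
    frames.push(frame);
  }
  return frames;
}
```

Each frame would then be base64-encoded and sent as the payload of an outbound media message. Resampling from the provider's native sample rate is a separate step not shown here.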