# STT Provider Interface

## Capability
Provides a unified interface for speech-to-text (STT) providers, enabling real-time streaming transcription with interim results, endpoint detection, and automatic reconnection handling.
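A unified interface means callers code against one provider-agnostic contract and swap Deepgram, AWS Transcribe, or Google Cloud behind it. The TypeScript sketch below shows one possible shape. The `SttProvider` interface, its method names, and the `FakeSttProvider` test double are hypothetical illustrations, not this skill's actual API. Only the `Utterance` fields mirror the `stt.streamAudio` response example in this document.

```typescript
// Fields mirror the utterances in the stt.streamAudio example response.
interface Utterance {
  transcript: string;
  confidence: number;
  isFinal: boolean;
  timestamp: number;
  duration_ms?: number; // present on final utterances in the example
}

// Hypothetical provider-agnostic contract (illustration only).
interface SttProvider {
  readonly name: string;
  connect(sampleRate: number): Promise<void>;
  sendAudio(chunk: Uint8Array, timestamp: number): void;
  onUtterance(listener: (u: Utterance) => void): void;
  close(): Promise<void>;
}

// In-memory test double: each non-empty chunk "transcribes" to one more word,
// and an empty chunk marks end of input, producing a final utterance.
class FakeSttProvider implements SttProvider {
  readonly name = "fake";
  private listeners: Array<(u: Utterance) => void> = [];
  private connected = false;
  private words: string[] = [];

  async connect(sampleRate: number): Promise<void> {
    if (sampleRate <= 0) throw new Error("sampleRate must be positive");
    this.connected = true;
  }

  sendAudio(chunk: Uint8Array, timestamp: number): void {
    if (!this.connected) throw new Error("not connected");
    if (chunk.length > 0) this.words.push(`w${this.words.length + 1}`);
    const utterance: Utterance = {
      transcript: this.words.join(" "),
      confidence: 0.9,
      isFinal: chunk.length === 0,
      timestamp,
    };
    for (const listener of this.listeners) listener(utterance);
  }

  onUtterance(listener: (u: Utterance) => void): void {
    this.listeners.push(listener);
  }

  async close(): Promise<void> {
    this.connected = false;
  }
}
```

A test double like this lets the rest of a voice pipeline be exercised without network access or API keys.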
## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `stt.connect` | `z.object({ provider: z.enum(['deepgram', 'aws-transcribe', 'google-cloud']), config: z.object({ apiKey: z.string().optional(), sampleRate: z.number() }) })` | `{ connected: boolean, provider: string }` | 10 RPM |
| `stt.streamAudio` | `z.object({ connectionId: z.string(), chunk: z.instanceof(Buffer), timestamp: z.number() })` | `{ utterances: Utterance[] }` | 1000 RPM |
| `stt.close` | `z.object({ connectionId: z.string() })` | `{ closed: boolean }` | 60 RPM |
| `stt.status` | `z.object({ connectionId: z.string() })` | `{ connected: boolean, latency: number, errorCount: number }` | 60 RPM |
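The Rate Limit column gives a per-tool budget in requests per minute. The document does not say whether limits apply per connection, per client, or globally, so the sketch below simply counts a caller's own calls per tool name over a sliding 60-second window. `SlidingWindowLimiter` and `TOOL_LIMITS_RPM` are hypothetical names; only the numbers come from the table.

```typescript
// Per-tool limits from the MCP Tools table (requests per minute).
const TOOL_LIMITS_RPM: Record<string, number> = {
  "stt.connect": 10,
  "stt.streamAudio": 1000,
  "stt.close": 60,
  "stt.status": 60,
};

// Hypothetical client-side limiter using a sliding window per tool.
class SlidingWindowLimiter {
  private readonly limits: Record<string, number>;
  private readonly windowMs: number;
  private readonly calls = new Map<string, number[]>(); // tool -> accepted call times

  constructor(limits: Record<string, number>, windowMs = 60_000) {
    this.limits = limits;
    this.windowMs = windowMs;
  }

  // Records the call and returns true if `tool` is under its limit at `nowMs`;
  // returns false (without recording) when the caller should wait.
  tryAcquire(tool: string, nowMs: number): boolean {
    const limit = this.limits[tool];
    if (limit === undefined) throw new Error(`no rate limit configured for ${tool}`);
    // Drop timestamps that have aged out of the window.
    const recent = (this.calls.get(tool) ?? []).filter((t) => nowMs - t < this.windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs);
    this.calls.set(tool, recent);
    return allowed;
  }
}
```

A sliding window avoids the burst that a fixed per-minute counter allows at window boundaries, at the cost of storing one timestamp per accepted call.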
Example `stt.connect` call:

```json
{
  "name": "stt.connect",
  "arguments": {
    "provider": "deepgram",
    "config": {
      "apiKey": "${DEEPGRAM_API_KEY}",
      "sampleRate": 8000
    }
  }
}
```

Response:

```json
{
  "connected": true,
  "provider": "deepgram"
}
```
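The `apiKey` value `${DEEPGRAM_API_KEY}` is a placeholder rather than a literal key. Assuming the `${NAME}` syntax is meant as an environment-variable reference, a caller could resolve it before sending the request as sketched below. `resolvePlaceholders` is a hypothetical helper, not part of this skill.

```typescript
// JSON-compatible value type for tool-call arguments.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: replaces "${NAME}" placeholders (uppercase names assumed)
// in every string value, walking nested objects and arrays.
function resolvePlaceholders(value: Json, env: Record<string, string | undefined>): Json {
  if (typeof value === "string") {
    return value.replace(/\$\{([A-Z0-9_]+)\}/g, (_match: string, name: string) => {
      const resolved = env[name];
      // Fail loudly rather than send a literal placeholder as the API key.
      if (resolved === undefined) throw new Error(`missing environment variable ${name}`);
      return resolved;
    });
  }
  if (Array.isArray(value)) return value.map((item) => resolvePlaceholders(item, env));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, item] of Object.entries(value)) out[key] = resolvePlaceholders(item, env);
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

In Node.js, `process.env` can be passed as the `env` argument.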
Example `stt.streamAudio` call, with the chunk shown as base64-encoded 8 kHz μ-law audio:

```json
{
  "name": "stt.streamAudio",
  "arguments": {
    "connectionId": "stt-conn-123",
    "chunk": "<base64-mulaw-8kHz>",
    "timestamp": 1681617600000
  }
}
```

Response, containing an interim result followed by the final transcript:

```json
{
  "utterances": [
    {
      "transcript": "I'd like to book",
      "confidence": 0.87,
      "isFinal": false,
      "timestamp": 1681617600100
    },
    {
      "transcript": "I'd like to book an appointment",
      "confidence": 0.95,
      "isFinal": true,
      "timestamp": 1681617600500,
      "duration_ms": 2300
    }
  ]
}
```
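The interim utterance (`isFinal: false`) is a provisional hypothesis that the later final utterance supersedes. A caller typically displays the latest interim text but commits only final text. The minimal sketch below assumes utterances arrive in timestamp order. `TranscriptAssembler` is a hypothetical name.

```typescript
interface Utterance {
  transcript: string;
  confidence: number;
  isFinal: boolean;
  timestamp: number;
  duration_ms?: number;
}

// Hypothetical accumulator: final utterances are committed, while the most
// recent interim utterance is kept only as a provisional "partial" string.
class TranscriptAssembler {
  private committed: string[] = [];
  private partial = "";

  ingest(utterances: Utterance[]): void {
    for (const u of utterances) {
      if (!u.isFinal) {
        this.partial = u.transcript; // newer interim text replaces older interim text
        continue;
      }
      this.committed.push(u.transcript);
      this.partial = ""; // the final result supersedes the pending interim hypothesis
    }
  }

  get finalText(): string {
    return this.committed.join(" ");
  }

  get partialText(): string {
    return this.partial;
  }
}
```

Feeding the example response above leaves `finalText` as `"I'd like to book an appointment"` and `partialText` empty.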
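For endpoint detection, the provider emits an `endOfSpeech` event after a configurable silence threshold (default: 500 ms):

```typescript
// Fires once the provider has observed silence longer than the configured threshold
sttProvider.onEndOfSpeech(() => {
  // User finished speaking: send the last utterance to MCP
  pipeline.sendToMCP(lastUtterance);
});
```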
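Functionally, an `endOfSpeech` event is a silence timer. Once no speech has been observed for the threshold (500 ms by default), the turn is considered complete. The sketch below assumes the caller can label each audio frame as speech or silence, for example with a voice-activity detector. `EndpointDetector` and its `push` method are hypothetical and are not how any particular provider implements endpointing.

```typescript
// Hypothetical silence-based endpointer; fires at most once per speech segment.
class EndpointDetector {
  private readonly silenceThresholdMs: number;
  private lastSpeechMs: number | null = null; // latest speech frame in the current turn
  private fired = false;
  private readonly listeners: Array<() => void> = [];

  constructor(silenceThresholdMs = 500) {
    this.silenceThresholdMs = silenceThresholdMs;
  }

  onEndOfSpeech(listener: () => void): void {
    this.listeners.push(listener);
  }

  // Feed one frame: `timestampMs` in milliseconds, `isSpeech` from a VAD or similar.
  push(timestampMs: number, isSpeech: boolean): void {
    if (isSpeech) {
      this.lastSpeechMs = timestampMs;
      this.fired = false; // new speech re-arms the detector
      return;
    }
    if (this.lastSpeechMs === null || this.fired) return; // no open turn to end
    if (timestampMs - this.lastSpeechMs >= this.silenceThresholdMs) {
      this.fired = true;
      for (const listener of this.listeners) listener();
    }
  }
}
```

Lower thresholds make the agent respond sooner but risk cutting off callers who pause mid-sentence. Higher thresholds make turn-taking feel sluggish.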
Common failures and how they are handled:

| Failure | Cause | Recovery |
|---------|-------|----------|
| WebSocket disconnect | Network issue, provider restart | Auto-reconnect with exponential backoff |
| Authentication failure | Invalid API key | Return error, do not retry |
| Audio format mismatch | Wrong sample rate or encoding | Convert format or return error |
| Rate limit exceeded | Too many requests | Back off and retry with jitter |
| Provider timeout | No response within timeout | Reconnect and resume streaming |
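The recovery policies fall into two groups. Permanent failures such as authentication errors are surfaced immediately, while transient ones are retried after a growing delay. The minimal sketch below uses the "full jitter" variant of exponential backoff, where each delay is drawn uniformly from zero up to the exponential ceiling. The failure labels, the 250 ms base, and the 10 s cap are assumptions, not values specified by this skill.

```typescript
// Hypothetical labels for the failure rows in the failure table.
type SttFailure =
  | "websocket-disconnect"
  | "auth-failure"
  | "audio-format-mismatch"
  | "rate-limit-exceeded"
  | "provider-timeout";

// Authentication failures are never retried. Format mismatches need a
// conversion step rather than a blind retry, so they are not retried here either.
function isRetryable(failure: SttFailure): boolean {
  return failure !== "auth-failure" && failure !== "audio-format-mismatch";
}

// Full-jitter exponential backoff: delay is uniform in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Randomizing the whole delay, rather than adding a small jitter to a fixed delay, spreads out reconnect attempts from many calls that dropped at the same moment.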