
reaatech/skills/tts-provider-interface

Name: skills/tts-provider-interface
Author: reaatech

skills/tts-provider-interface/SKILL.md

npx skillsauth add reaatech/voice-agent-kit skills/tts-provider-interface

Security scan results (3 of 9 scanners reported clean):

Clean: Trivy, container and dependency vulnerability scanner
Clean: Semgrep, static code analysis for vulnerabilities
Clean: mcp-scan (Snyk), Model Context Protocol security validation
Skipped: Snyk (dep), open source security scanning
Skipped: Socket.dev, supply chain security analysis
Skipped: VirusTotal, multi-engine malware detection
Skipped: CrowdStrike, advanced threat intelligence
Skipped: OSV-Scanner, Open Source Vulnerability database check
Skipped: OWASP Dep-Check

TTS Provider Interface

Capability

Provides a unified interface for text-to-speech (TTS) providers, enabling streaming audio synthesis with first-byte latency tracking, voice selection, and output format conversion.

MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `tts.synthesize` | `z.object({ text: z.string(), config: z.object({ provider: z.string(), voice: z.string().optional(), speed: z.number().optional() }) })` | `{ chunks: AudioChunk[], firstByteMs: number }` | 100 RPM |
| `tts.cancel` | `z.object({ synthesisId: z.string() })` | `{ cancelled: boolean }` | 100 RPM |
| `tts.status` | `z.object({ synthesisId: z.string() })` | `{ status: string, progress: number, chunksGenerated: number }` | 60 RPM |
| `tts.benchmark` | `z.object({ provider: z.string(), text: z.string() })` | `{ firstByteMs: number, totalMs: number, chunkCount: number }` | 10 RPM |
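For reference, the `tts.synthesize` row can be mirrored in plain TypeScript. This is a sketch under assumptions: it hand-rolls the checks instead of using zod so it has no dependencies, it takes the `AudioChunk` fields from Example 1's response, and the helper names (`parseSynthesizeArgs`, `optionalString`, `optionalNumber`) are hypothetical.

```typescript
// Hypothetical TypeScript mirror of the tts.synthesize contract.
// AudioChunk fields are taken from Example 1's response.

interface TtsConfig {
  provider: string;
  voice?: string;
  speed?: number;
}

interface SynthesizeArgs {
  text: string;
  config: TtsConfig;
}

interface AudioChunk {
  audio: string;       // base64-encoded audio
  timestamp: number;   // epoch milliseconds
  duration_ms: number;
}

interface SynthesizeResult {
  chunks: AudioChunk[];
  firstByteMs: number;
}

function asObject(value: unknown, name: string): Record<string, unknown> {
  if (typeof value !== "object" || value === null) throw new Error(`${name} must be an object`);
  return value as Record<string, unknown>;
}

function optionalString(value: unknown, name: string): string | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "string") throw new Error(`${name} must be a string`);
  return value;
}

function optionalNumber(value: unknown, name: string): number | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "number" || Number.isNaN(value)) throw new Error(`${name} must be a number`);
  return value;
}

// Dependency-free stand-in for the zod schema: returns typed args or throws.
function parseSynthesizeArgs(input: unknown): SynthesizeArgs {
  const args = asObject(input, "arguments");
  const text = args.text;
  if (typeof text !== "string") throw new Error("text must be a string");
  const config = asObject(args.config, "config");
  const provider = config.provider;
  if (typeof provider !== "string") throw new Error("config.provider must be a string");
  return {
    text,
    config: {
      provider,
      voice: optionalString(config.voice, "config.voice"),
      speed: optionalNumber(config.speed, "config.speed"),
    },
  };
}
```

With zod available, the schema from the table would do the same job through `schema.parse(input)`; the hand-written version only makes the checks explicit.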

Usage Examples

Example 1: Stream TTS synthesis

User intent: Convert text to streaming audio

Tool call:

```
{
  "name": "tts.synthesize",
  "arguments": {
    "text": "I can help you reset your password. Please hold while I send the reset link.",
    "config": {
      "provider": "deepgram",
      "voice": "aura-asteria-en",
      "speed": 1.0
    }
  }
}
```

Expected response (streaming chunks):

```
{
  "chunks": [
    { "audio": "<base64-pcm>", "timestamp": 1681617600000, "duration_ms": 50 },
    { "audio": "<base64-pcm>", "timestamp": 1681617600050, "duration_ms": 50 },
    { "audio": "<base64-pcm>", "timestamp": 1681617600100, "duration_ms": 50 }
  ],
  "firstByteMs": 120
}
```

Example 2: Cancel in-progress TTS (barge-in)

User intent: Stop TTS playback when the user interrupts

Tool call:

```
{
  "name": "tts.cancel",
  "arguments": {
    "synthesisId": "tts-synth-456"
  }
}
```

Expected response:
```
{
  "cancelled": true
}
```
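`tts.cancel` needs a way to reach the in-flight synthesis. One possible approach, sketched here, registers each synthesis under its `synthesisId` and hands the provider call a standard `AbortSignal`; how ids are issued is not specified by this skill, and `SynthesisRegistry` is a hypothetical name.

```typescript
// Hypothetical registry that lets tts.cancel stop an in-flight synthesis.

class SynthesisRegistry {
  private active = new Map<string, AbortController>();

  // Register a new synthesis; the provider call should watch the returned signal.
  start(synthesisId: string): AbortSignal {
    const controller = new AbortController();
    this.active.set(synthesisId, controller);
    return controller.signal;
  }

  // Mirrors the tts.cancel output shape: { cancelled: boolean }.
  cancel(synthesisId: string): { cancelled: boolean } {
    const controller = this.active.get(synthesisId);
    if (!controller) return { cancelled: false }; // unknown id or already finished
    controller.abort();
    this.active.delete(synthesisId);
    return { cancelled: true };
  }

  // Call when synthesis completes normally so later cancels report false.
  finish(synthesisId: string): void {
    this.active.delete(synthesisId);
  }
}
```

The streaming loop would check `signal.aborted` (or listen for its `abort` event) between chunks, stop forwarding audio, and then flush any buffered frames, as the barge-in recovery strategy requires.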

Example 3: Benchmark TTS latency

User intent: Measure TTS provider performance

Tool call:

```
{
  "name": "tts.benchmark",
  "arguments": {
    "provider": "deepgram",
    "text": "This is a test message to measure latency."
  }
}
```

Expected response:

```
{
  "firstByteMs": 145,
  "totalMs": 890,
  "chunkCount": 18
}
```
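The three numbers that `tts.benchmark` reports could be measured as in the following sketch. It assumes the provider delivers audio through a per-chunk callback (as a WebSocket client does) and takes an injectable clock so the timing can be tested; `LatencyMeter` and its method names are hypothetical.

```typescript
// Sketch of first-byte / total latency tracking for one synthesis.

interface BenchmarkResult {
  firstByteMs: number;
  totalMs: number;
  chunkCount: number;
}

class LatencyMeter {
  private startMs = 0;
  private firstByteMs: number | null = null;
  private chunkCount = 0;

  constructor(private readonly now: () => number = () => Date.now()) {}

  // Call immediately before sending text to the provider.
  start(): void {
    this.startMs = this.now();
    this.firstByteMs = null;
    this.chunkCount = 0;
  }

  // Call from the provider's audio-chunk callback.
  onChunk(bytes: Uint8Array): void {
    if (bytes.length === 0) return; // empty frames carry no audio
    if (this.firstByteMs === null) this.firstByteMs = this.now() - this.startMs;
    this.chunkCount++;
  }

  // Call when the provider signals end of audio.
  finish(): BenchmarkResult {
    return {
      firstByteMs: this.firstByteMs ?? -1, // -1 means no audio arrived
      totalMs: this.now() - this.startMs,
      chunkCount: this.chunkCount,
    };
  }
}
```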

Error Handling

Known Failure Modes

| Failure | Cause | Recovery |
|---------|-------|----------|
| Provider API error | Invalid text, rate limit | Retry with backoff or use fallback |
| WebSocket disconnect | Network issue | Reconnect and resume if possible |
| Invalid voice ID | Voice not available | Fall back to default voice |
| Text too long | Exceeds provider limit | Truncate or split into multiple requests |
| Synthesis timeout | Provider slow | Cancel and use fallback response |
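The "split into multiple requests" recovery for over-long text might look like this sketch. It splits at sentence boundaries with a simple punctuation regex and hard-splits any single sentence that is still too long; `splitForTts` and `maxChars` are hypothetical, and each provider's real character limit has to be looked up separately.

```typescript
// Sketch: pack sentences into pieces of at most maxChars characters.

function splitForTts(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  // A sentence is a run ending in . ! or ? (plus trailing space), or the remainder.
  const sentences = text.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) ?? [];
  const parts: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if ((current + sentence).trim().length <= maxChars) {
      current += sentence; // still fits: keep packing
      continue;
    }
    if (current.trim()) parts.push(current.trim());
    // A sentence longer than the limit is hard-split as a last resort.
    let rest = sentence.replace(/^\s+/, "");
    while (rest.trim().length > maxChars) {
      parts.push(rest.slice(0, maxChars).trim());
      rest = rest.slice(maxChars).replace(/^\s+/, "");
    }
    current = rest;
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}
```

Splitting at sentence boundaries keeps prosody natural in each request; the hard split only applies to unusually long sentences.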

Recovery Strategies

Transient errors: Retry once with 100ms delay
Permanent errors: Return error, emit fallback audio (silence or beep)
Barge-in cancellation: Immediately stop synthesis, flush buffers
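The transient and permanent rules above can be sketched as one wrapper: retry once after 100 ms for a transient failure, otherwise emit fallback audio. How errors are classified is not specified here, so the `isTransient` classifier and the `fallback` callback are hypothetical parameters supplied by the caller.

```typescript
// Sketch of the recovery policy: one retry for transient errors, else fallback.

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function synthesizeWithRecovery<T>(
  attempt: () => Promise<T>,
  fallback: () => T, // e.g. silence or a beep
  isTransient: (err: unknown) => boolean,
  retryDelayMs = 100,
): Promise<T> {
  try {
    return await attempt();
  } catch (err) {
    if (!isTransient(err)) return fallback(); // permanent: no retry
    await sleep(retryDelayMs);
    try {
      return await attempt(); // the single retry
    } catch {
      return fallback(); // still failing after the retry
    }
  }
}
```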

Security Considerations

PII Handling

Never log full synthesized text in production
Redact potential PII in TTS input logs
Do not store audio files unless explicitly configured
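A minimal log sanitizer following these rules could record the text length plus a short redacted preview instead of the full text. The regexes are illustrative assumptions that catch only obvious email addresses and long digit runs (phone, card, or account numbers); they are not a complete PII detector, and `redactPii` and `ttsInputLogFields` are hypothetical names.

```typescript
// Sketch: log metadata about TTS input, never the raw text.

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const DIGIT_RUN = /\d(?:[\s-]?\d){6,}/g; // 7+ digits, optionally space/dash separated

function redactPii(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(DIGIT_RUN, "[NUMBER]");
}

function ttsInputLogFields(text: string, previewChars = 40) {
  return {
    textLength: text.length,
    preview: redactPii(text).slice(0, previewChars), // redact first, then truncate
  };
}
```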

Permissions

API keys from environment variables only
Voice selection should be validated against allowed voices
Rate limiting per API key to prevent abuse
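These three rules could be enforced with small helpers like the ones below. This is a sketch: `requireApiKey`, `assertVoiceAllowed`, `PerKeyRateLimiter`, and the `DEEPGRAM_API_KEY` variable name are hypothetical, and the fixed-window limiter is the simplest choice (a sliding window or token bucket would smooth bursts at window edges).

```typescript
// Sketch of permission checks: env-only keys, voice allowlist, per-key RPM limit.

// Read a provider key from an environment map (pass process.env in Node).
function requireApiKey(env: Record<string, string | undefined>, name: string): string {
  const key = env[name];
  if (!key) throw new Error(`missing environment variable ${name}`);
  return key;
}

function assertVoiceAllowed(voice: string, allowed: ReadonlySet<string>): void {
  if (!allowed.has(voice)) throw new Error(`voice not allowed: ${voice}`);
}

class PerKeyRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number, // e.g. 100 for "100 RPM"
    private readonly windowMs: number = 60_000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Returns true and counts the request if it fits in the current window;
  // otherwise returns false.
  tryAcquire(apiKey: string): boolean {
    const t = this.now();
    const w = this.windows.get(apiKey);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(apiKey, { start: t, count: 1 }); // start a new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```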

Audit Logging

Log synthesis requests (text length, voice, provider)
Track latency metrics (first-byte, total)
Record cancellation events (barge-in vs timeout)
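One possible shape for these audit events, sketched under assumptions: only metadata is recorded (the text length, never the text), and the event type, field names, and `buildAuditEvent` helper are hypothetical.

```typescript
// Sketch of a structured audit event for one synthesis.

type CancelReason = "barge-in" | "timeout";

interface SynthesisAuditEvent {
  type: "synthesis";
  synthesisId: string;
  provider: string;
  voice: string;
  textLength: number;
  firstByteMs: number;
  totalMs: number;
  cancelled?: CancelReason;
}

function buildAuditEvent(
  synthesisId: string,
  text: string,
  provider: string,
  voice: string,
  metrics: { firstByteMs: number; totalMs: number },
  cancelled?: CancelReason,
): SynthesisAuditEvent {
  const event: SynthesisAuditEvent = {
    type: "synthesis",
    synthesisId,
    provider,
    voice,
    textLength: text.length, // the text itself is deliberately not stored
    ...metrics,
  };
  if (cancelled) event.cancelled = cancelled;
  return event;
}
```

Emitting `JSON.stringify(event)` as one log line keeps the events easy to query by provider, voice, or cancellation reason.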

Provider-Specific Notes

Deepgram Aura

Protocol: WebSocket streaming
Features: Ultra-low latency, multiple voices, streaming output
First-byte latency: ~100-200ms typical
Output format: PCM 24kHz, needs resampling to mulaw 8kHz
Voice examples: aura-asteria-en, aura-perseus-en, aura-hera-en
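Since 24000 / 8000 = 3, the Deepgram resampling step reduces to producing one output sample per three input samples. The sketch below averages each group of three as a crude low-pass filter; that choice is an assumption made for brevity, and a production resampler would use a proper anti-aliasing filter and carry leftover samples across chunk boundaries. It assumes the audio is already available as 16-bit mono samples in an `Int16Array`.

```typescript
// Sketch: 24 kHz -> 8 kHz decimation by averaging groups of three samples.

function downsample24kTo8k(pcm24k: Int16Array): Int16Array {
  const factor = 3; // 24000 / 8000
  // Trailing samples that do not fill a full group are dropped here.
  const out = new Int16Array(Math.floor(pcm24k.length / factor));
  for (let i = 0; i < out.length; i++) {
    const j = i * factor;
    // Int16Array assignment truncates, so round the average explicitly.
    out[i] = Math.round((pcm24k[j] + pcm24k[j + 1] + pcm24k[j + 2]) / factor);
  }
  return out;
}
```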

AWS Polly

Protocol: REST API with streaming response
Features: Neural voices, SSML support, speech marks
First-byte latency: ~200-400ms typical
Output format: Configurable (mp3, pcm, ogg)
Engine: Use neural for best quality

Google Cloud TTS

Protocol: gRPC streaming
Features: WaveNet and Neural2 voices, SSML, pitch/speed control
First-byte latency: ~200-300ms typical
Output format: LINEAR16, MP3, OGG
Voice selection: { languageCode: "en-US", name: "en-US-Neural2-A" }
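To show how the generic `config` might map onto the Polly and Google notes above, here is a sketch that only builds request bodies and calls no SDK. Field names follow Polly's `SynthesizeSpeech` request (`Text`, `VoiceId`, `Engine`, `OutputFormat`, `SampleRate`) and Google Cloud TTS `synthesize` (`input`, `voice`, `audioConfig`); the defaults (8 kHz PCM output, the `Joanna` Polly voice) and the language-code heuristic are assumptions.

```typescript
// Hypothetical mapping from the skill's generic config to provider requests.

interface TtsConfig {
  provider: string;
  voice?: string;
  speed?: number;
}

function pollyRequest(text: string, config: TtsConfig) {
  return {
    Text: text,
    VoiceId: config.voice ?? "Joanna", // assumed default voice
    Engine: "neural",                  // neural for best quality
    OutputFormat: "pcm",               // 16-bit signed mono PCM
    SampleRate: "8000",                // Polly PCM supports 8000 or 16000
    // Polly has no speed field; speaking rate is set with SSML <prosody rate="...">.
  };
}

function googleRequest(text: string, config: TtsConfig) {
  const name = config.voice ?? "en-US-Neural2-A";
  // Heuristic: Google voice names begin with the language code, e.g. "en-US".
  const languageCode = name.split("-").slice(0, 2).join("-");
  return {
    input: { text },
    voice: { languageCode, name },
    audioConfig: {
      audioEncoding: "LINEAR16", // returned audio includes a WAV header to strip
      sampleRateHertz: 8000,
      speakingRate: config.speed ?? 1.0,
    },
  };
}
```

Requesting 8 kHz PCM from these providers would leave only the µ-law encoding step, whereas Deepgram's 24 kHz output still needs resampling.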

Audio Output Formatting

Resampling Requirements

All TTS output must be converted to the Twilio-compatible format:

Encoding: mulaw (µ-law)
Sample rate: 8000 Hz
Channels: 1 (mono)
Frame size: 20ms (160 samples)
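The µ-law step is the standard G.711 bias-and-segment encoding, shown here as a sketch that assumes the input is already 8 kHz mono 16-bit PCM. Each output byte is one sample, which is why a 20 ms frame is 160 bytes. The function names are hypothetical.

```typescript
// Sketch: 16-bit linear PCM to G.711 µ-law, one byte per sample.

const BIAS = 0x84;  // 132, added to the magnitude before finding the segment
const CLIP = 32635; // keeps magnitude + BIAS within 15 bits

function linearToMulaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), CLIP) + BIAS;
  // Segment (exponent) = position of the highest set bit among bits 7..14.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  // µ-law bytes are stored with all bits inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

function pcm16ToMulaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = linearToMulaw(pcm[i]);
  return out;
}
```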

Chunk Sizing

For smooth Twilio playback:

Chunk duration: 20ms frames
Buffer size: 160 samples per chunk (160 bytes of mulaw at 8kHz, one byte per sample)
Max buffer: 100ms (5 chunks) to minimize latency
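These sizing rules could be applied with a small framer like the sketch below: it slices the µ-law stream into 160-byte frames and releases at most 5 frames (100 ms) ahead of playback. Tracking how many frames are already queued downstream is left to the caller, and `MulawFramer` and its methods are hypothetical.

```typescript
// Sketch: 20 ms µ-law framing with a 100 ms send-ahead ceiling.

const FRAME_BYTES = 160; // 8000 samples/s * 0.020 s, one byte per sample
const MAX_FRAMES = 5;    // 5 * 20 ms = 100 ms

class MulawFramer {
  private pending = new Uint8Array(0); // bytes not yet forming a full frame
  private queue: Uint8Array[] = [];

  push(bytes: Uint8Array): void {
    const joined = new Uint8Array(this.pending.length + bytes.length);
    joined.set(this.pending);
    joined.set(bytes, this.pending.length);
    let offset = 0;
    for (; offset + FRAME_BYTES <= joined.length; offset += FRAME_BYTES) {
      this.queue.push(joined.slice(offset, offset + FRAME_BYTES));
    }
    this.pending = joined.slice(offset); // keep the partial frame for later
  }

  get queued(): number {
    return this.queue.length;
  }

  // Frames that may be sent now without exceeding 100 ms of audio in flight.
  takeSendable(framesInFlight: number): Uint8Array[] {
    const room = Math.max(0, MAX_FRAMES - framesInFlight);
    return this.queue.splice(0, room);
  }

  // Barge-in: discard everything buffered.
  flush(): void {
    this.queue = [];
    this.pending = new Uint8Array(0);
  }
}
```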

Related Skills

Pipeline Orchestration
Audio Format Conversion
Barge-In Handling
Latency Budget


Updated Apr 25, 2026


npx skillsauth add reaatech/voice-agent-kit skills/tts-provider-interface

This command installs the skill globally and works with Claude Code, Cursor, and Windsurf.


Last scanned: Apr 25, 2026, 3:03 PM (64.9 s, 1 file scanned)



Install manually

```
# Clone the repo
git clone https://github.com/reaatech/voice-agent-kit.git

# Copy into the Claude Code skills folder (global)
cp -r voice-agent-kit/skills/tts-provider-interface ~/.claude/skills/
```


Repository: reaatech/voice-agent-kit

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT