# Barge-In Handling

## Capability
Detects user speech during TTS playback and immediately interrupts audio output to handle the new utterance, providing a natural conversational experience where users can interrupt the agent mid-sentence.

## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `bargeIn.enable` | `z.object({ sessionId: z.string(), config: BargeInConfig })` | `{ enabled: boolean }` | 10 RPM |
| `bargeIn.detect` | `z.object({ sessionId: z.string(), interimTranscript: z.string(), confidence: z.number() })` | `{ interrupted: boolean, action: 'continue' \| 'interrupt' }` | 1000 RPM |
| `bargeIn.trigger` | `z.object({ sessionId: z.string(), reason: z.string() })` | `{ cancelled: boolean, ttsStopped: boolean }` | 100 RPM |
| `bargeIn.disable` | `z.object({ sessionId: z.string() })` | `{ disabled: boolean }` | 10 RPM |
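
The `bargeIn.detect` shapes in the table can be mirrored as plain TypeScript types when a caller wants to check a payload before sending it. This is a minimal sketch: `DetectInput`, `DetectOutput`, `isDetectInput`, and `toDetectOutput` are hypothetical names, not part of the kit, and the guard assumes `confidence` is a probability between 0 and 1, which the table does not state.

```typescript
// Plain-TypeScript mirror of the `bargeIn.detect` input and output shapes.
interface DetectInput {
  sessionId: string;
  interimTranscript: string;
  confidence: number;
}

interface DetectOutput {
  interrupted: boolean;
  action: "continue" | "interrupt";
}

// Hypothetical runtime guard (not part of the kit).
// Assumption: confidence is a probability in [0, 1]; NaN is rejected.
function isDetectInput(value: unknown): value is DetectInput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sessionId === "string" &&
    typeof v.interimTranscript === "string" &&
    typeof v.confidence === "number" &&
    v.confidence >= 0 &&
    v.confidence <= 1
  );
}

// Builds an output object whose two fields always agree with each other.
function toDetectOutput(interrupt: boolean): DetectOutput {
  return { interrupted: interrupt, action: interrupt ? "interrupt" : "continue" };
}
```

Rejecting malformed payloads on the client side also avoids spending the per-tool rate limit on calls that would fail validation anyway.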

## Usage Examples

`bargeIn.enable` request:

```json
{
  "name": "bargeIn.enable",
  "arguments": {
    "sessionId": "sess-abc123",
    "config": {
      "minSpeechDuration": 300,
      "confidenceThreshold": 0.7,
      "silenceThreshold": 200
    }
  }
}
```

Response:

```json
{
  "enabled": true
}
```

`bargeIn.detect` request:

```json
{
  "name": "bargeIn.detect",
  "arguments": {
    "sessionId": "sess-abc123",
    "interimTranscript": "wait I didn't mean",
    "confidence": 0.85
  }
}
```

Response:

```json
{
  "interrupted": true,
  "action": "interrupt"
}
```

`bargeIn.trigger` request:

```json
{
  "name": "bargeIn.trigger",
  "arguments": {
    "sessionId": "sess-abc123",
    "reason": "user_interrupted"
  }
}
```

Response:

```json
{
  "cancelled": true,
  "ttsStopped": true
}
```

## Error Handling

| Failure | Cause | Recovery |
|---------|-------|----------|
| TTS already stopped | Race condition | Log a warning, continue with the new utterance |
| False positive detection | Background noise | Raise the confidence threshold, increase the minimum speech duration |
| Missed detection | Confidence threshold too high | Lower the threshold, increase STT sensitivity |
| WebSocket send failure | Connection closed | Clean up the session, end the call |
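
The "TTS already stopped" row is easiest to handle when the trigger path is idempotent, so a second call reports a warning instead of failing. The sketch below illustrates that idea only: `PlaybackTracker`, its methods, and the `warning` field are hypothetical and are not taken from the kit. It also assumes an unknown session counts as "nothing playing".

```typescript
// Hypothetical per-session playback state; not part of the kit's API.
type PlaybackState = "playing" | "stopped";

interface TriggerResult {
  cancelled: boolean;
  ttsStopped: boolean;
  warning?: string; // assumption: extra field used here to surface the race
}

class PlaybackTracker {
  private states = new Map<string, PlaybackState>();

  // Marks a session as currently playing TTS audio.
  start(sessionId: string): void {
    this.states.set(sessionId, "playing");
  }

  // Safe to call twice: if playback already ended (or the session is unknown),
  // nothing is cancelled and a warning is returned for logging.
  trigger(sessionId: string): TriggerResult {
    if (this.states.get(sessionId) !== "playing") {
      return { cancelled: false, ttsStopped: true, warning: "TTS already stopped" };
    }
    this.states.set(sessionId, "stopped");
    return { cancelled: true, ttsStopped: true };
  }
}
```

The caller can log `warning` when it is present and carry on with the new utterance, which matches the recovery listed in the table.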

## Configuration

```yaml
# voice-agent-kit.config.ts
bargeIn:
  # Detection settings
  enabled: true
  minSpeechDuration: 300       # ms of speech before triggering
  confidenceThreshold: 0.7     # STT confidence required
  silenceThreshold: 200        # ms of silence before the utterance is considered complete

  # Response settings
  immediateCancel: true        # Cancel TTS immediately on detect
  drainQueue: false            # Don't send remaining TTS chunks

  # Tuning
  ignoreShortUtterances: true  # Ignore utterances shorter than minWords
  minWords: 2
```
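
For callers that build these settings in code, a typed object with defaults and range checks catches mistakes early. This is a sketch under assumptions: `BargeInSettings`, `DEFAULTS`, and `resolveSettings` are hypothetical names, the defaults simply copy the values shown above, and the range checks (for example, confidence between 0 and 1) are guesses rather than documented constraints.

```typescript
// Hypothetical typed view of the settings shown above; not the kit's own type.
interface BargeInSettings {
  enabled: boolean;
  minSpeechDuration: number;   // ms
  confidenceThreshold: number; // assumed to be in [0, 1]
  silenceThreshold: number;    // ms
  immediateCancel: boolean;
  drainQueue: boolean;
  ignoreShortUtterances: boolean;
  minWords: number;
}

// Defaults copied from the example configuration.
const DEFAULTS: BargeInSettings = {
  enabled: true,
  minSpeechDuration: 300,
  confidenceThreshold: 0.7,
  silenceThreshold: 200,
  immediateCancel: true,
  drainQueue: false,
  ignoreShortUtterances: true,
  minWords: 2,
};

// Merges overrides onto the defaults and rejects out-of-range values.
function resolveSettings(overrides: Partial<BargeInSettings> = {}): BargeInSettings {
  const merged: BargeInSettings = { ...DEFAULTS, ...overrides };
  if (merged.confidenceThreshold < 0 || merged.confidenceThreshold > 1) {
    throw new RangeError("confidenceThreshold must be between 0 and 1");
  }
  if (merged.minSpeechDuration < 0 || merged.silenceThreshold < 0) {
    throw new RangeError("durations must be non-negative");
  }
  if (!Number.isInteger(merged.minWords) || merged.minWords < 1) {
    throw new RangeError("minWords must be a positive integer");
  }
  return merged;
}
```

For example, `resolveSettings({ minWords: 3 })` keeps every other default and only raises the word minimum.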

## Detection Logic

```typescript
function shouldInterrupt(interimTranscript: string, confidence: number): boolean {
  // Don't interrupt if STT confidence is below the threshold
  if (confidence < 0.7) return false;

  // Don't interrupt for very short utterances (likely noise)
  const words = interimTranscript.split(/\s+/).filter(w => w.length > 0);
  if (words.length < 2) return false;

  // Interrupt for common interruption patterns. The trailing \b keeps
  // "no" from matching inside longer words such as "nothing" or "now".
  const interruptionPatterns = [
    /\b(wait|stop|hold on|never mind|actually|no|that's not)\b/i,
    /\b(let me|I want|I need|can I)\b/i
  ];

  return interruptionPatterns.some(pattern => pattern.test(interimTranscript));
}
```
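
The thresholds in this check are hard-coded, while the configuration exposes `confidenceThreshold` and `minWords`. One possible variant takes them as parameters instead. This is a sketch, not the kit's implementation: `shouldInterruptWith` and `DetectionOptions` are hypothetical names, and the phrase list is anchored with `\b` on both sides so that short entries like "no" only match whole words.

```typescript
// Hypothetical options object; field names follow the example configuration.
interface DetectionOptions {
  confidenceThreshold: number;
  minWords: number;
}

// Whole-word interruption phrases (assumption: same phrases as the fixed check).
const INTERRUPTION_PATTERNS: RegExp[] = [
  /\b(wait|stop|hold on|never mind|actually|no|that's not)\b/i,
  /\b(let me|I want|I need|can I)\b/i,
];

function shouldInterruptWith(
  interimTranscript: string,
  confidence: number,
  options: DetectionOptions,
): boolean {
  // Reject low-confidence transcripts first; this is the cheapest check.
  if (confidence < options.confidenceThreshold) return false;

  // Reject utterances with fewer words than the configured minimum.
  const words = interimTranscript.split(/\s+/).filter((w) => w.length > 0);
  if (words.length < options.minWords) return false;

  // Interrupt only when a known interruption phrase is present.
  return INTERRUPTION_PATTERNS.some((pattern) => pattern.test(interimTranscript));
}
```

With `{ confidenceThreshold: 0.7, minWords: 2 }`, the transcript "wait I didn't mean" at confidence 0.85 interrupts, while a lone "stop" does not because it falls under the two-word minimum. Lowering `minWords` to 1 is one way to let single-word commands through, at the cost of more false positives.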

## Barge-In Flow

```
1. STT emits interim transcript during TTS playback
   │
2. Barge-in detector evaluates transcript + confidence
   │
3. If interruption detected:
   │  a. Send 'clear' message to Twilio (stops playback)
   │  b. Cancel in-flight TTS synthesis
   │  c. Emit 'barge_in' event
   │  d. Feed new utterance to pipeline
   │
4. Previous turn abandoned (no history update)
   │
5. New turn begins with user's interruption
```
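
Steps 3a to 3d can be sketched as a single handler that runs the four actions in order. The dependencies are injected so the ordering is easy to test. All names here (`BargeInDeps`, `handleBargeIn`, and the callback names) are hypothetical. The `clear` payload follows the Twilio Media Streams message format (`event` plus `streamSid`); how the kit itself sends it is not shown in this document.

```typescript
// Hypothetical dependency bundle; a real integration would wire these to the
// Twilio WebSocket, the TTS provider, an event bus, and the dialogue pipeline.
interface BargeInDeps {
  sendToTwilio: (message: string) => void;
  cancelTts: () => void;
  emit: (event: "barge_in", payload: { sessionId: string; transcript: string }) => void;
  feedPipeline: (utterance: string) => void;
}

function handleBargeIn(
  sessionId: string,
  streamSid: string,
  transcript: string,
  deps: BargeInDeps,
): void {
  // 3a. Ask Twilio to drop any buffered audio so playback stops.
  deps.sendToTwilio(JSON.stringify({ event: "clear", streamSid }));
  // 3b. Stop synthesis that is still in flight.
  deps.cancelTts();
  // 3c. Notify listeners (metrics, logging) that a barge-in happened.
  deps.emit("barge_in", { sessionId, transcript });
  // 3d. Start the new turn with the interrupting utterance.
  deps.feedPipeline(transcript);
}
```

Clearing the Twilio buffer before cancelling synthesis means the caller stops hearing audio as early as possible, even if the TTS provider takes a moment to acknowledge the cancel.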

## Observability

### Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `voice.barge_in.count` | Counter | Total barge-in events |
| `voice.barge_in.false_positives` | Counter | Incorrectly triggered interruptions |
| `voice.barge_in.missed` | Counter | Missed interruptions (user repeated) |
| `voice.barge_in.latency_ms` | Histogram | Time from speech detection to TTS cancel |

### Spans

| Span | Attributes |
|------|------------|
| `voice.barge_in.detect` | `session_id`, `transcript_length`, `confidence` |
| `voice.barge_in.trigger` | `session_id`, `reason`, `tts_position_ms` |
| `voice.barge_in.cancel_tts` | `session_id`, `chunks_cancelled` |
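
The count and latency metrics can be prototyped in memory before wiring up a real metrics backend. This sketch is illustrative only: `BargeInMetrics` is a hypothetical class, timestamps are passed in by the caller in milliseconds, and the nearest-rank percentile stands in for whatever histogram aggregation the backend provides.

```typescript
// Hypothetical in-memory recorder for barge-in count and latency.
class BargeInMetrics {
  count = 0;
  private latencies: number[] = [];

  // Records one barge-in: latency is the time from speech detection to TTS cancel.
  record(speechDetectedAtMs: number, ttsCancelledAtMs: number): void {
    this.count += 1;
    this.latencies.push(ttsCancelledAtMs - speechDetectedAtMs);
  }

  // Nearest-rank percentile over recorded latencies; undefined when there are
  // no samples or when p falls outside the recorded range.
  percentile(p: number): number | undefined {
    if (this.latencies.length === 0) return undefined;
    const sorted = [...this.latencies].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}
```

Tracking a high percentile rather than the mean is usually more informative here, since a few slow cancels are what users notice as the agent "talking over" them.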