.copilot/skills/echo-suppression/SKILL.md
# Skill: WebSocket Echo Suppression for Realtime Voice AI

## Problem
In a WebSocket middleware that bridges client audio ↔ AI model (e.g., OpenAI Realtime API), the AI's audio response leaks from the client's speakers back into the microphone, gets forwarded to the model, and is transcribed as phantom user input — creating an infinite self-conversation loop.
## Pattern

Multi-layered suppression combining **client-side early muting** and **server-side audio gating** to eliminate the timing gap where echo leaks through.
Client-side early muting:

- `response.created`: mute the mic gain node at the earliest possible server event, BEFORE audio deltas arrive. Muting on `response.audio.delta` is too late because audio samples have already been sent.
- `input_audio_buffer.clear` on `response.created`: flush any already-buffered echo from the server's audio pipeline.
- `response.done`: re-open the mic when the AI finishes speaking.
- `input_audio_buffer.speech_started` stops playback, unmutes the mic, and resets state.

State tracking (in the server→client message path):

- `response.audio.delta` → set `ai_speaking = True`
- `response.audio.done` → clear the flag, start the cooldown timer, send `input_audio_buffer.clear`
- `input_audio_buffer.speech_started` → clear the flag and the cooldown (barge-in)

Audio gating (in the client→server message path):

- `ai_speaking` or within the cooldown window → drop `input_audio_buffer.append`

Performance: Use fast substring markers (`'"response.audio.delta"' in data`) instead of a JSON parse on the hot path.
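The server-side state tracking and gating rules above can be sketched as a small helper. This is a minimal illustration, not the skill's actual middleware: the `EchoGate` class, its method names, and the injectable `clock` parameter are hypothetical, and only `ai_speaking`, `_ECHO_COOLDOWN_SEC`, and the event names come from this skill.

```python
import time

_ECHO_COOLDOWN_SEC = 0.3  # default cooldown named in this skill


class EchoGate:
    """Hypothetical helper: tracks AI speech state and gates mic frames."""

    def __init__(self, cooldown_sec=_ECHO_COOLDOWN_SEC, clock=time.monotonic):
        self.cooldown_sec = cooldown_sec
        self.clock = clock  # injectable so the gate can be exercised without sleeping
        self.ai_speaking = False
        self.cooldown_until = 0.0

    def on_server_message(self, data: str) -> bool:
        """Update state from a raw server->client frame.

        Returns True when the caller should also send
        input_audio_buffer.clear upstream.
        """
        # Substring markers avoid a JSON parse on the hot path.
        if '"response.audio.delta"' in data:
            self.ai_speaking = True
        elif '"response.audio.done"' in data:
            self.ai_speaking = False
            self.cooldown_until = self.clock() + self.cooldown_sec
            return True
        elif '"input_audio_buffer.speech_started"' in data:
            # Barge-in: the user is talking, so stop suppressing immediately.
            self.ai_speaking = False
            self.cooldown_until = 0.0
        return False

    def should_forward(self, data: str) -> bool:
        """Return False for mic audio that arrives while echo is likely."""
        if '"input_audio_buffer.append"' not in data:
            return True  # only audio appends are gated
        suppressed = self.ai_speaking or self.clock() < self.cooldown_until
        return not suppressed
```

In a relay loop, each server→client frame would pass through `on_server_message` before being forwarded, and each client→server frame would be forwarded only when `should_forward` returns True. One trade-off of the substring check is that it can match the marker inside an unrelated payload, so the markers should stay as specific as the quoted event names used here.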
The OpenAI Realtime API sends events in this order:
1. `response.created` ← mute here (earliest signal)
2. `response.output_item.added`
3. `response.content_part.added`
4. `response.audio.delta` (repeated) ← too late to mute
5. `response.audio_transcript.delta` (interleaved)
6. `response.done` ← unmute here

When a `useCallback` inside a hook needs to call `sendJsonMessage` from `useWebSocket`, but `useWebSocket` takes that callback as a parameter, use a `useRef` to break the cycle:
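```typescript
// The ref starts as a no-op and is filled in once useWebSocket has returned.
const sendRef = useRef<(msg: object) => void>(() => {});
const onMessage = useCallback(() => { sendRef.current({...}); }, []);
const { sendJsonMessage } = useWebSocket(url, { onMessage });
// Keep the ref pointed at the current sendJsonMessage.
useEffect(() => { sendRef.current = sendJsonMessage; }, [sendJsonMessage]);
```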
- `_ECHO_COOLDOWN_SEC` (default 0.3s): Post-response suppression window. Increase if echo persists on high-latency audio hardware.
- `threshold` (default 0.8): Higher rejects weak echo; too high may miss soft-spoken users.
- `silence_duration_ms` (default 500): Buffer before committing detected speech.
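As a rough sketch of where the last two settings might be applied: this assumes `threshold` and `silence_duration_ms` are the server VAD fields of the Realtime API's `turn_detection` session config, which this skill does not state explicitly. The `build_session_update` helper is hypothetical.

```python
import json


def build_session_update(threshold=0.8, silence_duration_ms=500):
    """Hypothetical helper: build a session.update message with VAD tuning.

    Assumes the two values map onto server_vad turn detection.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    })
```

The middleware would send the resulting string to the model once after the WebSocket connects. `_ECHO_COOLDOWN_SEC` is not part of this message because it is enforced locally in the middleware.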