skills/assembly-ai-streaming/SKILL.md
This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need low-latency transcripts and audio analysis.
npx skillsauth add ratacat/claude-skills assemblyai-streaming
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill to build and maintain code that talks to AssemblyAI’s:
- Streaming Speech-to-Text (wss://streaming.assemblyai.com/v3/ws)
- Pre-recorded Speech-to-Text (https://api.assemblyai.com/v2/transcript)
- LLM Gateway (https://llm-gateway.assemblyai.com)

The emphasis is on streaming/live transcription, meeting notetakers, and voice agents, while still covering async workflows and post-processing.
This skill assumes a Claude Code environment with access to Python (preferred) and Bash.
Use this skill when:
Do not use this skill when:
Pre-recorded Speech-to-Text (Async)
- POST /v2/transcript → GET /v2/transcript/{id}

Streaming Speech-to-Text
- wss://streaming.assemblyai.com/v3/ws

LLM Gateway
- POST /v1/chat/completions at https://llm-gateway.assemblyai.com

- speech_models: ["slam-1", "universal"] etc.
- language_code vs language_detection:
  - language_code when the language is known.
  - language_detection: true when unknown; optionally set language_confidence_threshold.
- keyterms_prompt:
- summarization, iab_categories, content_safety, entity_detection, auto_chapters, sentiment_analysis, speaker_labels, auto_highlights, redact_pii, etc.

Connection URL:
- wss://streaming.assemblyai.com/v3/ws
- wss://streaming.eu.assemblyai.com/v3/ws

Important query parameters:
- sample_rate (required): e.g. 16000
- format_turns (bool): return formatted final transcripts; avoid for low-latency voice agents.
- speech_model: universal-streaming-english (default) or universal-streaming-multi.
- keyterms_prompt: JSON-encoded list of terms, e.g. ["AssemblyAI", "Slam-1", "Keanu Reeves"].
- end_of_turn_confidence_threshold (0.0–1.0, default ~0.4)
- min_end_of_turn_silence_when_confident (ms, default ~400)
- max_turn_silence (ms, default ~1280)

Headers:
- Authorization: <API_KEY> or a short-lived token query parameter issued by your backend.

Messages:
- {"type": "UpdateConfig", ...}, {"type": "Terminate"}, {"type": "ForceEndpoint"}.
- Begin event with id, expires_at.
- Turn events with:
  - transcript (immutable partials/finals),
  - utterance (complete semantic chunk),
  - end_of_turn (bool),
  - turn_is_formatted (bool),
  - words array with timestamps/confidences.
- Termination event with summary stats.

- https://api.assemblyai.com
- https://api.eu.assemblyai.com
- wss://streaming.assemblyai.com/v3/ws
- wss://streaming.eu.assemblyai.com/v3/ws

Always keep base URLs consistent per project; don’t mix US/EU endpoints for the same data.
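The connection parameters and message shapes listed above can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: the helper names `build_streaming_url` and `summarize_message` are hypothetical, the use of `keyterms_prompt` as the query parameter carrying the JSON-encoded term list is an assumption, and so is the assumption that server messages carry a `type` field of `Begin`, `Turn`, or `Termination`.

```python
import json
from urllib.parse import urlencode

# Hosts taken from the connection URLs listed above.
STREAMING_HOSTS = {
    "us": "streaming.assemblyai.com",
    "eu": "streaming.eu.assemblyai.com",
}


def build_streaming_url(sample_rate=16000, region="us", format_turns=False,
                        keyterms=None, **turn_detection):
    """Build a v3 streaming URL (hypothetical helper, not part of the SDK)."""
    params = {
        "sample_rate": sample_rate,
        # Booleans are sent as lowercase strings in the query string.
        "format_turns": str(format_turns).lower(),
    }
    if keyterms:
        # Assumption: the term list is passed JSON-encoded as keyterms_prompt.
        params["keyterms_prompt"] = json.dumps(keyterms)
    # Turn-detection knobs such as max_turn_silence pass through unchanged.
    params.update(turn_detection)
    return f"wss://{STREAMING_HOSTS[region]}/v3/ws?{urlencode(params)}"


def summarize_message(raw):
    """Turn one JSON server message into a short log line.

    Assumption: each message has a "type" of Begin, Turn, or Termination.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "Begin":
        return f"begin id={msg.get('id')} expires_at={msg.get('expires_at')}"
    if kind == "Turn":
        label = "final" if msg.get("end_of_turn") else "partial"
        return f"{label}: {msg.get('transcript', '')}"
    if kind == "Termination":
        return "terminated"
    return f"unhandled: {kind}"
```

Selecting the host from a single `region` argument keeps every URL in a project on the same side of the US/EU split, which is the point of the base-URL rule above.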
- ASSEMBLYAI_API_KEY.
- "<YOUR_API_KEY>") in snippets.

- Is the audio live?
- Is latency critical (<1s) for responses?
  - format_turns=false and careful turn detection.
- Do transcripts leave the backend?
  - redact_pii (and optionally redact_pii_audio) before sharing.
- Need LLM-based processing (Q&A, structured summaries)?
  - chat/completions.

- requests/websocket-client only if SDK cannot be installed.
- status, error fields in transcript JSON).
- transcript) and utterance field.

Goal: Stream mic audio to AssemblyAI and print transcripts in real time.
Use this when the environment has Python and assemblyai + pyaudio installed, and the user wants a quick streaming demo.
import assemblyai as aai
from assemblyai.streaming import v3 as aai_stream
import pyaudio

API_KEY = "<YOUR_API_KEY>"
aai.settings.api_key = API_KEY

SAMPLE_RATE = 16000
CHUNK_MS = 50
FRAMES_PER_BUFFER = int(SAMPLE_RATE * (CHUNK_MS / 1000.0))


def main():
    client = aai_stream.StreamingClient(
        aai_stream.StreamingClientOptions(
            api_key=API_KEY,
            api_host="streaming.assemblyai.com",  # or "streaming.eu.assemblyai.com"
        )
    )

    def on_begin(_client, event: aai_stream.BeginEvent):
        print(f"Session started: {event.id}, expires at {event.expires_at}")

    def on_turn(_client, event: aai_stream.TurnEvent):
        # Use immutable transcript text
        text = (event.transcript or "").strip()
        if not text:
            return

        # With format_turns=False no turn is ever formatted, so end_of_turn
        # (not turn_is_formatted) marks a final here. Formatted finals are
        # for display only; keep unformatted text for LLMs.
        if event.end_of_turn:
            print(f"[FINAL] {text}")
        else:
            print(f"[PARTIAL] {text}", end="\r")

    def on_terminated(_client, event: aai_stream.TerminationEvent):
        print(f"\nTerminated. Audio duration={event.audio_duration_seconds}s")

    def on_error(_client, error: aai_stream.StreamingError):
        print(f"\nStreaming error: {error}")

    client.on(aai_stream.StreamingEvents.Begin, on_begin)
    client.on(aai_stream.StreamingEvents.Turn, on_turn)
    client.on(aai_stream.StreamingEvents.Termination, on_terminated)
    client.on(aai_stream.StreamingEvents.Error, on_error)

    client.connect(
        aai_stream.StreamingParameters(
            sample_rate=SAMPLE_RATE,
            format_turns=False,  # better latency for voice agents
        )
    )

    # Open the default microphone as 16-bit mono PCM at SAMPLE_RATE.
    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=SAMPLE_RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )

    try:
        print("Speak into your microphone (Ctrl+C to stop)...")

        def audio_gen():
            # Yield raw PCM chunks until the user interrupts.
            while True:
                yield stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)

        client.stream(audio_gen())
    except KeyboardInterrupt:
        pass
    finally:
        client.disconnect(terminate=True)
        stream.stop_stream()
        stream.close()
        pa.terminate()


if __name__ == "__main__":
    main()
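For comparison with the streaming example, the pre-recorded flow listed earlier (POST /v2/transcript, then GET /v2/transcript/{id} until the job finishes) might look like the sketch below. It uses only the standard library. The helper names `http_json` and `transcribe`, the polling interval, and the poll limit are hypothetical choices; the `status` and `error` fields are the ones the transcript JSON is described as carrying.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.assemblyai.com"  # or "https://api.eu.assemblyai.com"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with the stdlib and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(audio_url, api_key, request=http_json, sleep=time.sleep,
               poll_seconds=3.0, max_polls=200):
    """Submit a transcription job, then poll until it completes or fails.

    `request` and `sleep` are injectable so the flow can be exercised
    without network access (hypothetical design choice for this sketch).
    """
    job = request("POST", f"{BASE_URL}/v2/transcript", api_key,
                  {"audio_url": audio_url})
    status_url = f"{BASE_URL}/v2/transcript/{job['id']}"
    for _ in range(max_polls):
        result = request("GET", status_url, api_key)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            # Surface the API's error field instead of polling forever.
            raise RuntimeError(result.get("error") or "transcription failed")
        sleep(poll_seconds)
    raise TimeoutError(f"transcript {job['id']} not ready after {max_polls} polls")
```

A call such as `transcribe(audio_url, os.environ["ASSEMBLYAI_API_KEY"])` would return the final transcript JSON. Checking `status` and `error` on every poll surfaces a failed job as an exception rather than as an endless wait.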