.claude/skills/ts-assemblyai/SKILL.md
AssemblyAI API for speech recognition, transcription, and audio intelligence. Use when transcribing audio or video files, performing speaker diarization, running sentiment analysis on calls, detecting unsafe content in audio, or asking LLM-powered questions about recorded content with LeMUR.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add eliferjunior/Claude assemblyai
AssemblyAI provides best-in-class speech recognition plus an intelligence layer: speaker diarization, sentiment analysis, auto chapters, content moderation, and LeMUR (LLM-powered Q&A on audio). Use it to turn audio/video files into structured, queryable data.
```bash
pip install assemblyai python-dotenv
export ASSEMBLYAI_API_KEY="your_api_key_here"
```

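Since every example reads the key from the `ASSEMBLYAI_API_KEY` environment variable, a small guard can fail early with a readable message when the variable is missing. This is a minimal sketch: `require_api_key` is a hypothetical helper, not part of the AssemblyAI SDK, and it accepts the environment mapping as a parameter so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key found under `name`, or raise a readable error."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key
```

The returned value could then be assigned to `aai.settings.api_key` in place of a direct `os.environ[...]` lookup.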
```python
import assemblyai as aai
import os

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]


def transcribe(audio_source: str) -> aai.Transcript:
    """
    audio_source: URL (https://...) or local file path.
    Returns the completed Transcript object.
    """
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription error: {transcript.error}")

    print(f"Transcript ID: {transcript.id}")
    print(f"Text (first 300 chars): {transcript.text[:300]}...")
    return transcript


t = transcribe("https://assembly.ai/sports_injuries.mp3")
print(t.text)
```

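The `transcribe` helper accepts either a URL or a local file path. When the source comes from user input, it may help to check which of the two it is, and that a local file actually exists, before handing it to the SDK. The sketch below is an illustration only: `classify_source` is a hypothetical helper, and treating only `http`/`https` as remote is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(audio_source: str) -> str:
    """Return "url" for http(s) sources and "file" for existing local files."""
    scheme = urlparse(audio_source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(audio_source).is_file():
        return "file"
    raise FileNotFoundError(f"Not a URL and not an existing file: {audio_source}")
```

Calling it before `transcribe(...)` turns a typo in a path into an immediate local error.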
```python
def transcribe_rich(audio_source: str) -> aai.Transcript:
    """Transcribe with speaker labels, sentiment, chapters, and content safety."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,       # Who said what
        sentiment_analysis=True,   # Positive/negative/neutral per sentence
        auto_chapters=True,        # Generate chapter markers
        content_safety=True,       # Detect profanity, hate speech, etc.
        auto_highlights=True,      # Key phrases and topics
        entity_detection=True,     # People, places, organizations
        iab_categories=True,       # Topic taxonomy
        language_detection=True,   # Detect language automatically
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript


t = transcribe_rich("https://your-audio.com/podcast.mp3")

# Speaker diarization
print("\n--- Speakers ---")
for utt in t.utterances:
    print(f"[{utt.speaker}] {utt.text}")

# Chapters (start times are in milliseconds)
print("\n--- Chapters ---")
for ch in t.chapters:
    start_min = ch.start // 60000
    print(f"[{start_min}m] {ch.headline}: {ch.summary}")

# Sentiment
print("\n--- Sentiment ---")
for s in t.sentiment_analysis[:5]:
    print(f"{s.sentiment.value}: {s.text[:80]}")

# Content safety
# Assumes the Python SDK exposes results as `transcript.content_safety`,
# whose `summary` maps each detected label to its confidence for the whole file.
print("\n--- Content Safety ---")
for label, confidence in t.content_safety.summary.items():
    print(f"Flagged: {label} (confidence: {confidence:.2f})")
```

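Chapter and utterance offsets are reported in milliseconds, which is why the chapter loop divides by 60000 to get minutes. For longer recordings a small formatter is easier to read than raw minute counts. This is a sketch; `format_ms` is a hypothetical helper name, not something the SDK provides.

```python
def format_ms(ms: int) -> str:
    """Format a millisecond offset as M:SS, or H:MM:SS from one hour onward."""
    if ms < 0:
        raise ValueError("offset must be non-negative")
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_ms(61_500))     # 1:01
print(format_ms(3_725_000))  # 1:02:05
```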
```python
import os

import assemblyai as aai
import pyaudio  # pip install pyaudio

# Needed when this script runs on its own rather than after the setup above.
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]


def on_open(session_opened: aai.RealtimeSessionOpened):
    print(f"Session opened: {session_opened.session_id}")


def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(f"\n[FINAL] {transcript.text}")
    else:
        # Overwrite the current line with the latest partial result.
        print(f"\r[partial] {transcript.text}", end="")


def on_error(error: aai.RealtimeError):
    print(f"Error: {error}")


def on_close():
    print("Session closed.")


def stream_microphone():
    """Stream microphone input to AssemblyAI for real-time transcription."""
    transcriber = aai.RealtimeTranscriber(
        sample_rate=16_000,
        on_data=on_data,
        on_error=on_error,
        on_open=on_open,
        on_close=on_close,
        end_utterance_silence_threshold=700,
    )
    transcriber.connect()

    FRAMES_PER_BUFFER = 3200
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16_000

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    try:
        print("Recording... Press Ctrl+C to stop.")
        while True:
            data = stream.read(FRAMES_PER_BUFFER)
            transcriber.stream(data)
    except KeyboardInterrupt:
        pass
    finally:
        # Release the microphone before closing the session.
        stream.stop_stream()
        stream.close()
        p.terminate()
        transcriber.close()


stream_microphone()
```

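The buffer size in the streaming loop determines how much audio each `transcriber.stream(data)` call carries. With 16-bit mono PCM at 16 kHz, 3200 frames works out to 200 ms of audio and 6400 bytes per chunk. The sketch below only reproduces that arithmetic; `chunk_stats` is a hypothetical helper for illustration.

```python
def chunk_stats(frames_per_buffer: int, sample_rate: int,
                bytes_per_sample: int = 2, channels: int = 1) -> dict:
    """Return the duration (ms) and size (bytes) of one raw PCM chunk."""
    duration_ms = frames_per_buffer * 1000 / sample_rate
    size_bytes = frames_per_buffer * bytes_per_sample * channels
    return {"duration_ms": duration_ms, "size_bytes": size_bytes}


print(chunk_stats(frames_per_buffer=3200, sample_rate=16_000))
# {'duration_ms': 200.0, 'size_bytes': 6400}
```

Smaller buffers send audio more often at the cost of more calls; larger ones delay partial results by the chunk duration.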
```python
def lemur_qa(transcript_id: str, questions: list[str]) -> list[dict]:
    """
    Ask LeMUR questions about a transcript.
    Returns list of {question, answer} dicts.
    """
    transcript = aai.Transcript.get_by_id(transcript_id)
    questions_answers = transcript.lemur.question_answer(
        questions=[
            aai.LemurQuestion(question=q, answer_format="concise")
            for q in questions
        ],
        final_model=aai.LemurModel.claude3_5_sonnet,
    )

    results = []
    for qa in questions_answers.response:
        print(f"Q: {qa.question}\nA: {qa.answer}\n")
        results.append({"question": qa.question, "answer": qa.answer})
    return results


# Use LeMUR to extract structured insights
lemur_qa(t.id, [
    "What are the main topics discussed?",
    "List any action items or decisions made.",
    "What is the overall sentiment of the conversation?",
])
```

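Because `lemur_qa` returns plain `{question, answer}` dicts, the results can be post-processed without the SDK. As a sketch, the hypothetical helper below renders them as a markdown list; the sample data is made up for illustration and is not real LeMUR output.

```python
def qa_to_markdown(results: list[dict]) -> str:
    """Render {question, answer} dicts as a markdown bullet list."""
    lines = []
    for item in results:
        lines.append(f"- **{item['question']}**")
        lines.append(f"  {item['answer'].strip()}")
    return "\n".join(lines)


sample = [{"question": "What are the main topics discussed?",
           "answer": "Injury prevention and recovery."}]
print(qa_to_markdown(sample))
```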
```python
def lemur_summarize(transcript_id: str, context: str = "") -> str:
    """Generate a concise summary of a transcript."""
    transcript = aai.Transcript.get_by_id(transcript_id)
    result = transcript.lemur.summarize(
        context=context or "This is a podcast episode.",
        answer_format="bullet points",
        final_model=aai.LemurModel.claude3_5_sonnet,
    )
    print(result.response)
    return result.response


summary = lemur_summarize(t.id, context="B2B SaaS podcast discussing AI trends")
```

```python
def generate_show_notes(audio_url: str) -> dict:
    """Full podcast processing pipeline."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,
        auto_chapters=True,
        auto_highlights=True,
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_url, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    # Build chapters list (start times are in milliseconds)
    chapters = [
        {"time": f"{ch.start // 60000}:{(ch.start % 60000) // 1000:02d}",
         "title": ch.headline,
         "summary": ch.summary}
        for ch in transcript.chapters
    ]

    # LeMUR for show notes
    show_notes = transcript.lemur.task(
        prompt=(
            "Write podcast show notes in markdown. Include: "
            "1-paragraph episode summary, key takeaways as bullets, "
            "and a list of resources mentioned."
        ),
        final_model=aai.LemurModel.claude3_5_sonnet,
    )

    # Social clips (key quotes)
    social_clips = transcript.lemur.task(
        prompt=(
            "Extract 3 compelling quotes suitable for social media posts. "
            "Format each as a standalone quote with speaker label."
        ),
        final_model=aai.LemurModel.claude3_5_sonnet,
    )

    return {
        "transcript_id": transcript.id,
        "full_text": transcript.text,
        "chapters": chapters,
        "show_notes": show_notes.response,
        "social_clips": social_clips.response,
    }


result = generate_show_notes("https://your-podcast.com/episode-42.mp3")
print(result["show_notes"])
```

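The dict returned by `generate_show_notes` holds everything needed for a publishable page. A minimal sketch of assembling it into one markdown document follows; `render_episode_page` and the section headings are hypothetical choices, and the sample dict stands in for real pipeline output.

```python
def render_episode_page(result: dict) -> str:
    """Combine show notes, chapters, and social clips into one markdown string."""
    parts = [result["show_notes"].strip(), "", "## Chapters", ""]
    for ch in result["chapters"]:
        parts.append(f"- [{ch['time']}] {ch['title']}: {ch['summary']}")
    parts += ["", "## Social clips", "", result["social_clips"].strip()]
    return "\n".join(parts) + "\n"


sample_result = {
    "show_notes": "Episode summary goes here.",
    "chapters": [{"time": "0:00", "title": "Intro", "summary": "Hosts say hello."}],
    "social_clips": "> \"A quote.\" (Speaker A)",
}
print(render_episode_page(sample_result))
```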
| Feature | Config param | Description |
|---------|-------------|-------------|
| Speaker labels | speaker_labels=True | Identify and label each speaker |
| Sentiment analysis | sentiment_analysis=True | Per-sentence positive/negative/neutral |
| Auto chapters | auto_chapters=True | Detect topic segments with summaries |
| Content safety | content_safety=True | Flag hate speech, profanity, etc. |
| Entity detection | entity_detection=True | Extract names, places, organizations |
| Key phrases | auto_highlights=True | Most important topics and phrases |
| Language detection | language_detection=True | Auto-detect spoken language |
| PII redaction | redact_pii=True | Mask personal information |
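When features are chosen at runtime (for example from a CLI flag or a settings file), the table can be turned into a lookup that produces keyword arguments for the transcription config. This is a sketch under stated assumptions: `FEATURE_FLAGS` and `config_kwargs` are hypothetical names, and the mapping simply mirrors the rows of the table.

```python
# Feature name (lowercased) -> config parameter, mirroring the table rows.
FEATURE_FLAGS = {
    "speaker labels": "speaker_labels",
    "sentiment analysis": "sentiment_analysis",
    "auto chapters": "auto_chapters",
    "content safety": "content_safety",
    "entity detection": "entity_detection",
    "key phrases": "auto_highlights",
    "language detection": "language_detection",
    "pii redaction": "redact_pii",
}


def config_kwargs(features: list[str]) -> dict:
    """Translate human-readable feature names into config keyword arguments."""
    kwargs = {}
    for feature in features:
        key = feature.strip().lower()
        if key not in FEATURE_FLAGS:
            raise KeyError(f"Unknown feature: {feature!r}")
        kwargs[FEATURE_FLAGS[key]] = True
    return kwargs


print(config_kwargs(["Speaker labels", "Auto chapters"]))
# {'speaker_labels': True, 'auto_chapters': True}
```

The resulting dict could be passed along as `aai.TranscriptionConfig(**kwargs)`. Some flags may need companion settings (PII redaction, for instance, is typically configured together with a list of redaction policies), so check the SDK reference before enabling them this way.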
- Local file paths can be passed directly to `transcriber.transcribe()`; the SDK handles uploading.
- PII redaction (`redact_pii=True`) is useful for compliance when transcribing customer calls.