tools/transcribe/SKILL.md
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add letta-ai/skills transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
- Check that OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
- Save outputs under output/transcribe/ when working in this repo.
- Use gpt-4o-mini-transcribe with --response-format text for fast transcription.
- For diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
- --chunking-strategy auto.
- gpt-4o-transcribe-diarize.
- Use output/transcribe/<job-id>/ for evaluation runs.
- Use --out-dir for multiple files to avoid overwriting.

Prefer uv for dependency management.
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
OPENAI_API_KEY must be set for live API calls.
# Set to the directory containing this SKILL.md
export TRANSCRIBE_CLI="<path-to-skill>/scripts/transcribe_diarize.py"
Replace <path-to-skill> with the actual skill installation directory (e.g. .skills/transcribe or ~/.letta/skills/transcribe).
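Before a live run it can help to confirm both environment variables in one place. The following is a minimal sketch, not part of the bundled CLI: the `preflight` helper and its messages are hypothetical, and it only tests whether OPENAI_API_KEY is present, so the key is never printed or logged.

```python
import os
from pathlib import Path


def preflight(env=None):
    """Return a list of problems; an empty list means the setup looks ready."""
    # Hypothetical helper for illustration; defaults to the real environment.
    env = os.environ if env is None else env
    problems = []

    # Only test for presence. Never echo the key itself.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set; set it in your local shell.")

    cli = env.get("TRANSCRIBE_CLI", "")
    if not cli:
        problems.append("TRANSCRIBE_CLI is not set.")
    elif not Path(cli).is_file():
        problems.append(f"TRANSCRIBE_CLI does not point to a file: {cli}")

    return problems
```

Calling `preflight()` with no arguments inspects the current shell environment; passing a plain dict makes the check easy to exercise without touching real credentials.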
Single file (fast text default):
python3 "$TRANSCRIBE_CLI" \
path/to/audio.wav \
--out transcript.txt
Diarization with known speakers (up to 4):
python3 "$TRANSCRIBE_CLI" \
meeting.m4a \
--model gpt-4o-transcribe-diarize \
--known-speaker "Alice=refs/alice.wav" \
--known-speaker "Bob=refs/bob.wav" \
--response-format diarized_json \
--out-dir output/transcribe/meeting
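Once a diarized run finishes, the JSON output usually needs to be flattened into readable speaker turns. The sketch below assumes the payload has a top-level `segments` list whose items carry `speaker`, `start`, `end`, and `text` fields; that shape is an assumption and should be checked against references/api.md and an actual output file. The `format_segments` helper and the sample data are hypothetical.

```python
import json


def format_segments(payload):
    """Turn a diarized payload into 'Speaker [start-end]: text' lines."""
    lines = []
    # Assumed schema: {"segments": [{"speaker", "start", "end", "text"}, ...]}
    for seg in payload.get("segments", []):
        speaker = seg.get("speaker", "unknown")
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", 0.0))
        text = str(seg.get("text", "")).strip()
        lines.append(f"{speaker} [{start:.1f}-{end:.1f}s]: {text}")
    return lines


# Illustrative sample only; not real CLI output.
sample = json.loads("""
{"segments": [
  {"speaker": "Alice", "start": 0.0, "end": 2.5, "text": " Hello, everyone."},
  {"speaker": "Bob", "start": 2.5, "end": 4.0, "text": "Hi Alice."}
]}
""")
print("\n".join(format_segments(sample)))
```

To use it on a real run, load the JSON file written under the chosen --out-dir with `json.load` and pass the resulting dict to `format_segments`.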
Plain text output (explicit):
python3 "$TRANSCRIBE_CLI" \
interview.mp3 \
--response-format text \
--out interview.txt
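For several recordings at once, the command can be assembled programmatically so that every run writes into its own directory. This is a rough sketch under two assumptions: that the CLI accepts more than one audio path in a single invocation, and that --out-dir, --model, and --response-format behave as in the examples above. The `build_command` helper is hypothetical and only builds the argument list; it does not call the API.

```python
def build_command(cli, audio_files, out_dir, model=None, response_format=None):
    """Build the argv list for one CLI run over several audio files."""
    cmd = ["python3", str(cli)]
    # Assumption: multiple audio paths may be passed positionally.
    cmd += [str(path) for path in audio_files]
    if model:
        cmd += ["--model", model]
    if response_format:
        cmd += ["--response-format", response_format]
    # A dedicated output directory avoids overwriting earlier transcripts.
    cmd += ["--out-dir", str(out_dir)]
    return cmd


cmd = build_command(
    "scripts/transcribe_diarize.py",
    ["interview.mp3", "meeting.m4a"],
    "output/transcribe/batch",
    response_format="text",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file names that contain spaces.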
references/api.md: supported formats, limits, response formats, and known-speaker notes.