tools/speech/SKILL.md
Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.
Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.
- Run the bundled CLI (`scripts/text_to_speech.py`) with sensible defaults (see `references/cli.md`).
- Use `tmp/speech/` for intermediate files (for example JSONL batches); delete when done.
- Write final artifacts under `output/speech/` when working in this repo.
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.

Prefer `uv` for dependency management.
Python packages:
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
`OPENAI_API_KEY` must be set for live API calls.

If the key is missing, give the user these steps:
- Set `OPENAI_API_KEY` as an environment variable in their system.

If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
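The key check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the bundled CLI; the helper name `missing_key_message` and the wording of the message are assumptions made for the example.

```python
import os


def missing_key_message(env=None):
    """Return None when OPENAI_API_KEY is set, else a short note for the user.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    (Hypothetical helper, not part of scripts/text_to_speech.py.)
    """
    env = os.environ if env is None else env
    if str(env.get("OPENAI_API_KEY", "")).strip():
        return None
    return (
        "OPENAI_API_KEY is not set. Set it as an environment variable "
        "before running live API calls."
    )


if __name__ == "__main__":
    message = missing_key_message()
    print(message or "OPENAI_API_KEY found; live calls can proceed.")
```

Running the check before any live call keeps the failure early and the message actionable, rather than surfacing an authentication error from the API.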
- Use `gpt-4o-mini-tts-2025-12-15` unless the user requests another model.
- Default voice: `cedar`. If the user wants a brighter tone, prefer `marin`.
- `instructions` are supported for GPT-4o mini TTS models, but not for `tts-1` or `tts-1-hd`.
- Cap `--rpm` at 50.
- Require `OPENAI_API_KEY` before any live API call.
- Use the OpenAI Python SDK (`openai` package) for all API calls; do not use raw HTTP.
- Prefer the bundled CLI (`scripts/text_to_speech.py`) over writing new one-off scripts.
- Do not modify `scripts/text_to_speech.py`. If something is missing, ask the user before doing anything else.

Reformat user direction into a short, labeled spec. Only make implicit details explicit; do not invent new requirements.
Quick clarification (augmentation vs invention):
Template (include only relevant lines):
Voice Affect: <overall character and texture of the voice>
Tone: <attitude, formality, warmth>
Pacing: <slow, steady, brisk>
Emotion: <key emotions to convey>
Pronunciation: <words to enunciate or emphasize>
Pauses: <where to add intentional pauses>
Emphasis: <key words or phrases to stress>
Delivery: <cadence or rhythm notes>
Augmentation rules:
Input text: "Welcome to the demo. Today we'll show how it works."
Instructions:
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
Emphasis: Stress "demo" and "show".
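The labeled spec above can be assembled programmatically. The sketch below renders only the labels that have values and leaves `instructions` out for `tts-1` and `tts-1-hd`, which this file says do not support it. The helper names `render_instructions` and `request_kwargs` are hypothetical; the keyword names (`model`, `voice`, `input`, `instructions`) are assumed to match the speech parameters of the `openai` package, and no API call is made here.

```python
DEFAULT_MODEL = "gpt-4o-mini-tts-2025-12-15"  # default model named in this file
DEFAULT_VOICE = "cedar"                       # default voice named in this file
NO_INSTRUCTIONS_MODELS = {"tts-1", "tts-1-hd"}


def render_instructions(spec: dict) -> str:
    """Render 'Label: value' lines, skipping labels with empty values."""
    return "\n".join(f"{label}: {value}" for label, value in spec.items() if value)


def request_kwargs(text: str, spec: dict,
                   model: str = DEFAULT_MODEL, voice: str = DEFAULT_VOICE) -> dict:
    """Build keyword arguments for a speech request (hypothetical helper).

    The input text is passed through verbatim; only the delivery spec is rendered.
    """
    kwargs = {"model": model, "voice": voice, "input": text}
    instructions = render_instructions(spec)
    if instructions and model not in NO_INSTRUCTIONS_MODELS:
        kwargs["instructions"] = instructions
    return kwargs


if __name__ == "__main__":
    spec = {
        "Voice Affect": "Warm and composed.",
        "Tone": "Friendly and confident.",
        "Pacing": "Steady and moderate.",
        "Emotion": "",  # not relevant for this clip, so it is omitted
        "Emphasis": 'Stress "demo" and "show".',
    }
    kwargs = request_kwargs("Welcome to the demo. Today we'll show how it works.", spec)
    print(kwargs["instructions"])
```

Keeping the spec as a dictionary makes it easy to include only the relevant lines, as the template asks, without touching the input text.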
{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: Clear and neutral. Pacing: Slow.","response_format":"wav"}
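Batch files like the two JSONL lines above can be written and checked with the standard library before they are handed to the CLI. This is a sketch under assumptions: the helper names `write_jobs` and `read_jobs` and the file name `ivr.jsonl` are hypothetical, and the only validation performed is that each job carries non-empty `input` text. See `references/cli.md` for the fields the CLI actually accepts.

```python
import json
from pathlib import Path


def write_jobs(jobs, path):
    """Write one JSON object per line after checking every job has input text."""
    for index, job in enumerate(jobs, start=1):
        if not str(job.get("input", "")).strip():
            raise ValueError(f"job {index} has no input text")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for job in jobs:
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
    return path


def read_jobs(path):
    """Read a JSONL file back into a list of dicts, ignoring blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    jobs = [
        {"input": "Thank you for calling. Please hold.", "voice": "cedar",
         "response_format": "mp3", "out": "hold.mp3"},
        {"input": "For sales, press 1. For support, press 2.", "voice": "marin",
         "instructions": "Tone: Clear and neutral. Pacing: Slow.",
         "response_format": "wav"},
    ]
    batch = write_jobs(jobs, "tmp/speech/ivr.jsonl")  # intermediate file location
    print(len(read_jobs(batch)), "jobs written to", batch)
    batch.unlink()  # delete the intermediate file when done
```

Validating before opening the file avoids leaving a half-written batch behind when one job is malformed.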
More principles: references/prompting.md. Copy/paste specs: references/sample-prompts.md.
Use these modules when the request is for a specific delivery style. They provide targeted defaults and templates.
- `references/narration.md`
- `references/voiceover.md`
- `references/ivr.md`
- `references/accessibility.md`

Reference files:
- `references/cli.md`: how to run speech generation/batches via `scripts/text_to_speech.py` (commands, flags, recipes).
- `references/audio-api.md`: API parameters, limits, voice list.
- `references/voice-directions.md`: instruction patterns and examples.
- `references/prompting.md`: instruction best practices (structure, constraints, iteration patterns).
- `references/sample-prompts.md`: copy/paste instruction recipes (examples only; no extra theory).
- `references/narration.md`: templates + defaults for narration and explainers.
- `references/voiceover.md`: templates + defaults for product demo voiceovers.
- `references/ivr.md`: templates + defaults for IVR/phone prompts.
- `references/accessibility.md`: templates + defaults for accessibility reads.
- `references/network.md`: environment/network troubleshooting.