speech/SKILL.md
Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.
Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.
- Run the bundled CLI (`scripts/text_to_speech.py`) with sensible defaults (see `references/cli.md`).
- Use `tmp/speech/` for intermediate files (for example JSONL batches); delete when done.
- Write final outputs under `output/speech/` when working in this repo.
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.

Prefer `uv` for dependency management.
Python packages:
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
`OPENAI_API_KEY` must be set for live API calls.

If the key is missing, give the user these steps:
- Set `OPENAI_API_KEY` as an environment variable in their system.

If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
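The checks above can be sketched as a small preflight helper. This is an illustration, not part of the bundled CLI: the function name `preflight_problems` and its message wording are hypothetical, and it only assumes that the `openai` package and the `OPENAI_API_KEY` variable are the two prerequisites named in this document.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def preflight_problems(env: Optional[Mapping[str, str]] = None,
                       package: str = "openai") -> List[str]:
    """Return human-readable problems that would block a live API call.

    Hypothetical helper for illustration; the bundled CLI may check differently.
    """
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(package) is None:
        problems.append(
            f"Missing Python package '{package}': run `uv pip install {package}` "
            f"or `python3 -m pip install {package}`."
        )
    # An empty or whitespace-only key is treated the same as a missing one.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set; set it before any live API call.")
    return problems
```

Calling `preflight_problems()` with no arguments inspects the current process environment; an empty list means both prerequisites look satisfied.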
- Use `gpt-4o-mini-tts-2025-12-15` unless the user requests another model.
- Default voice: `cedar`. If the user wants a brighter tone, prefer `marin`.
- `instructions` are supported for GPT-4o mini TTS models, but not for `tts-1` or `tts-1-hd`.
- Rate limit: `--rpm` at 50.
- Require `OPENAI_API_KEY` before any live API call.
- Use the OpenAI Python SDK (`openai` package) for all API calls; do not use raw HTTP.
- Prefer the bundled CLI (`scripts/text_to_speech.py`) over writing new one-off scripts.
- Do not modify `scripts/text_to_speech.py`. If something is missing, ask the user before doing anything else.

Reformat user direction into a short, labeled spec. Only make implicit details explicit; do not invent new requirements.
Quick clarification (augmentation vs invention):
Template (include only relevant lines):
Voice Affect: <overall character and texture of the voice>
Tone: <attitude, formality, warmth>
Pacing: <slow, steady, brisk>
Emotion: <key emotions to convey>
Pronunciation: <words to enunciate or emphasize>
Pauses: <where to add intentional pauses>
Emphasis: <key words or phrases to stress>
Delivery: <cadence or rhythm notes>
Augmentation rules:
Input text: "Welcome to the demo. Today we'll show how it works."
Instructions:
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
Emphasis: Stress "demo" and "show".
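As a rough sketch, the labeled spec can be assembled mechanically from whichever fields the user's direction actually implies. The helper name `format_instructions` is hypothetical and not part of the bundled CLI; it keeps the template's label order, skips empty fields, and rejects labels that are not in the template.

```python
from typing import Dict

# Labels in the order the template lists them.
TEMPLATE_LABELS = (
    "Voice Affect", "Tone", "Pacing", "Emotion",
    "Pronunciation", "Pauses", "Emphasis", "Delivery",
)


def format_instructions(spec: Dict[str, str]) -> str:
    """Render a labeled spec in template order, skipping empty fields."""
    unknown = set(spec) - set(TEMPLATE_LABELS)
    if unknown:
        # Refuse labels outside the template instead of silently adding new ones.
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    lines = []
    for label in TEMPLATE_LABELS:
        value = spec.get(label, "").strip()
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Fields supplied in arbitrary order are emitted in template order.
example_spec = {
    "Tone": "Friendly and confident.",
    "Voice Affect": "Warm and composed.",
    "Emphasis": 'Stress "demo" and "show".',
    "Pacing": "Steady and moderate.",
}
print(format_instructions(example_spec))
```

With the example spec, the printed text matches the four instruction lines shown above.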
{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: Clear and neutral. Pacing: Slow.","response_format":"wav"}
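A minimal sketch of how JSONL jobs like the two above could be read and normalized against this skill's defaults. It is not the bundled CLI's implementation: `load_jobs` and `min_interval_seconds` are hypothetical names, reading `--rpm` as requests per minute is an assumption, and the only field treated as required here is `input`.

```python
import json
from typing import Dict, Iterable, List

DEFAULT_MODEL = "gpt-4o-mini-tts-2025-12-15"
DEFAULT_VOICE = "cedar"
# Models for which `instructions` is not supported, per the rules in this document.
NO_INSTRUCTIONS_MODELS = {"tts-1", "tts-1-hd"}


def load_jobs(lines: Iterable[str], model: str = DEFAULT_MODEL) -> List[Dict[str, str]]:
    """Parse JSONL lines into job dicts with the default voice applied."""
    jobs = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between jobs
        job = json.loads(line)
        if not job.get("input"):
            raise ValueError(f"line {number}: 'input' is required")
        job.setdefault("voice", DEFAULT_VOICE)
        if model in NO_INSTRUCTIONS_MODELS:
            # Drop instructions for models that do not support them.
            job.pop("instructions", None)
        jobs.append(job)
    return jobs


def min_interval_seconds(rpm: int = 50) -> float:
    """Smallest gap between request starts that stays within `rpm` per minute."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


batch = [
    '{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}',
    '{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: Clear and neutral. Pacing: Slow.","response_format":"wav"}',
]
jobs = load_jobs(batch)
```

Spacing requests by `min_interval_seconds(50)` seconds is one simple way to stay at or below 50 requests per minute; see `references/cli.md` for the flags the real CLI accepts.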
More principles: references/prompting.md. Copy/paste specs: references/sample-prompts.md.
Use these modules when the request is for a specific delivery style. They provide targeted defaults and templates.
- `references/narration.md`
- `references/voiceover.md`
- `references/ivr.md`
- `references/accessibility.md`

Reference files:
- `references/cli.md`: how to run speech generation/batches via `scripts/text_to_speech.py` (commands, flags, recipes).
- `references/audio-api.md`: API parameters, limits, voice list.
- `references/voice-directions.md`: instruction patterns and examples.
- `references/prompting.md`: instruction best practices (structure, constraints, iteration patterns).
- `references/sample-prompts.md`: copy/paste instruction recipes (examples only; no extra theory).
- `references/narration.md`: templates + defaults for narration and explainers.
- `references/voiceover.md`: templates + defaults for product demo voiceovers.
- `references/ivr.md`: templates + defaults for IVR/phone prompts.
- `references/accessibility.md`: templates + defaults for accessibility reads.
- `references/codex-network.md`: environment/sandbox/network-approval troubleshooting.