text-to-short-video/SKILL.md
Generate 30-60 second short videos by directly calling an OpenAI-compatible video generation API from text. Keep role consistency by using roles.json as role prompt source and only updating role descriptions.
npx skillsauth add laitszkin/apollo-toolkit text-to-short-videoInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
roles.json as the authoritative role source and collect only the minimal prompt, duration, and sizing inputs.Collect only what is required:
project_dir (absolute path)content_name (output folder/file name)width x height, default 1080x1920)50, keep in 30-60 range)If critical inputs are missing, ask concise follow-up questions.
Always use:
<project_dir>/pictures/<content_name>/roles.jsonRequired JSON format:
{
"characters": [
{
"id": "lin_xia",
"name": "Lin Xia",
"appearance": "short black hair, amber eyes, slim build",
"outfit": "dark trench coat, silver pendant, leather boots",
"description": "standing calmly, observant expression"
}
]
}
Consistency rules:
roles.json as the role prompt source.id, name, appearance, outfit unchanged for existing roles.description to reflect this clip's action/emotion.{"characters": []}
Use this template:
~/.codex/skills/text-to-short-video/.env.exampleCopy to:
~/.codex/skills/text-to-short-video/.envRequired keys:
OPENAI_API_URLOPENAI_API_KEYOptional keys:
OPENAI_VIDEO_MODELOPENAI_VIDEO_DURATION_SECONDSOPENAI_VIDEO_ASPECT_RATIOOPENAI_VIDEO_SIZEOPENAI_VIDEO_POLL_SECONDSTEXT_TO_SHORT_VIDEO_WIDTHTEXT_TO_SHORT_VIDEO_HEIGHTroles.json before prompt generation<project_dir>/pictures/<content_name>/roles.json.description when clip-specific motion/emotion is needed.roles.json and only description is clip-specific.<project_dir>/video/<content_name>/shorts/api/prompt_input.jsonSuggested local prompt_input.json structure:
{
"roles_file": "<project_dir>/pictures/<content_name>/roles.json",
"description_overrides": {
"lin_xia": "running through rain, breathing hard, determined"
},
"final_prompt": "..."
}
${OPENAI_API_URL%/}/videos/generations<project_dir>/video/<content_name>/shorts/api/Example request fields (provider-compatible variants are allowed):
{
"model": "${OPENAI_VIDEO_MODEL}",
"prompt": "...",
"duration": 50,
"size": "1080x1920",
"aspect_ratio": "9:16"
}
${OPENAI_API_URL%/}/videos/generations/<job_id> every OPENAI_VIDEO_POLL_SECONDS seconds.Save to:
<project_dir>/video/<content_name>/shorts/<content_name>_important.mp4If provider returns multiple outputs, keep the best one that matches requested size/duration closest.
When output ratio or resolution differs from target, run:
apltk enforce-video-aspect-ratio \
--input-video "<downloaded_video_path>" \
--output-video "<final_output_video_path>" \
--env-file ~/.codex/skills/text-to-short-video/.env \
--force
Behavior:
Return absolute paths for:
roles.json used for role consistency.mp4.mp4 (if post-processing executed)Also report:
30-60 seconds)width x height)Before finishing, verify:
roles.json uses required schema and is used as prompt sourceid/name/appearance/outfit) were not modifieddescription was changed for clip-specific behavior30-60 seconds (or user-approved exception)development
Review a pull request — interactive PR selection via `gh`, 4-dimension code review (hallucinated code, architecture, performance, test validity), then post severity-graded comments with fix suggestions on the PR. Not for spec-based review — use `review` instead.
development
Read a user-specified PDF that marks the week's key financial events, deeply research each marked event with current sources, capture any additional breaking financial developments, and produce a concise Chinese-capable PDF briefing that explains what happened and why it matters.
documentation
Generate long-form videos (more than 10 minutes) by following user instructions and invoking related skills only when needed (`openai-text-to-image-storyboard`, `docs-to-voice`, `remotion-best-practices`). For text inputs, extract a complete long-form story arc, generate fresh storyboard images (no reuse of previously generated pictures), and render a 16:9 animated long-form video.
tools
協助完成自動化版本發佈。同步文檔、更新版本號、推送 tag 並建立 GitHub Release。