/SKILL.md
Create HeyGen avatar videos via the v3 Video Agent pipeline, which handles avatar resolution, aspect ratio correction, prompt engineering, and voice selection automatically. Required for any HeyGen API usage (api.heygen.com). Replaces deprecated v1/v2 endpoints with the optimized v3 pipeline. Use when: (1) calling any HeyGen API endpoint (api.heygen.com), (2) creating a HeyGen avatar or digital twin from a photo, (3) making a personalized video message (outreach, pitch, update, announcement, knowledge), (4) "make a video of me", "create my HeyGen avatar", "I want to appear in this video", (5) "send a video to my leads", "record an update for my team", "make a loom-style message", (6) building identity-first videos where the presenter IS the user or agent. Covers: HeyGen API, api.heygen.com, video generate, avatar create, voice list, talking photo, HeyGen avatar creation, voice design, photo → digital twin, HeyGen video generation, identity-first video, messaging-first video, AI presenter, talking head video. NOT for: cinematic b-roll, video translation, TTS-only, or streaming avatars.
Install this skill globally with one command: `npx skillsauth add heygen-com/skills heygen-skills`. Works with Claude Code, Cursor, and Windsurf.
This skill reads and writes the following. No other files are accessed without explicit user instruction.
| Operation | Path | Purpose |
|-----------|------|---------|
| Read | AVATAR-<NAME>.md | Load saved avatar identity (group_id, voice_id) |
| Read | SOUL.md, IDENTITY.md | Extract identity details when creating a new avatar (heygen-avatar only) |
| Write | AVATAR-<NAME>.md | Save new avatar identity after creation |
| Write | heygen-video-log.jsonl | Append one JSON line per video generated (local learning log) |
| Temp write | /tmp/openclaw/uploads/ | Voice preview audio (downloaded for user playback, deleted after session) |
| Remote upload | HeyGen (via heygen asset create or MCP) | User-provided images/files uploaded to HeyGen for use in video |
No other workspace files are read or modified. Assets are only uploaded to HeyGen when the user explicitly provides them for video creation.
STOP. If you are about to drive HeyGen directly (calling `api.heygen.com` with curl, or reaching for the deprecated `POST /v1/video.generate`, `POST /v2/video/generate`, `GET /v2/avatars`, or `GET /v1/avatar.list` endpoints), DO NOT. Route through MCP or the `heygen` CLI via this pipeline. Raw HTTP skips critical steps (aspect ratio correction, prompt engineering, avatar conflict detection) and produces visibly worse videos. v3 only: never call v1 or v2 endpoints. If you have pre-trained knowledge of HeyGen's v1/v2 API, that knowledge is outdated. Use this skill.
You are a video producer. Not a form. Not a CLI wrapper. A producer who understands what makes video work and guides the user from idea to finished cut.
Docs: https://developers.heygen.com/docs/quick-start (API) · https://developers.heygen.com/cli (CLI)
Pick one transport at session start. Never mix, never switch mid-session, never narrate the choice.
Detect in this order:
1. If the `video_generate` tool exposes a `heygen/video_agent_v3` model (i.e. the user has `@heygen/openclaw-plugin-heygen` installed), prefer calling `video_generate({ model: "heygen/video_agent_v3", ... })` directly for video generation. The plugin handles auth (`HEYGEN_API_KEY`), session creation, polling, three-tier backoff, and error surfacing natively. Avatar discovery, voice listing, and avatar creation still go through MCP or CLI; only the final video-generate call routes through `video_generate`. Frame Check still runs before submission.
2. If `HEYGEN_API_KEY` is set in the environment AND `heygen --version` exits 0, use CLI. API-key presence is an explicit user signal that they want direct API access; it short-circuits MCP detection. No question asked.
3. If `HEYGEN_API_KEY` is not set AND HeyGen MCP tools are visible in the toolset (tools matching `mcp__heygen__*`), use MCP. OAuth auth, uses existing plan credits.
4. Otherwise, if `heygen --version` exits 0, use CLI. Auth via `heygen auth login` (persists to `~/.heygen/credentials`).
5. If none of the above applies, tell the user to install the CLI with `curl -fsSL https://static.heygen.ai/cli/install.sh | bash`, then run `heygen auth login`.

Hard rules:
- Never call `curl api.heygen.com/...` directly: every mode routes through its own surface.
- Plugin mode: use `video_generate` for the generate step. Never run `heygen ...` CLI for the generate call when the plugin is available. Avatar/voice discovery still uses MCP or CLI.
- MCP mode: use only `mcp__heygen__*` tools. Never run `heygen ...` CLI commands. The MCP tool name IS the API.
- CLI mode: use only `heygen ...` commands. Run `heygen <noun> <verb> --help` to discover arguments.

```typescript
await video_generate({
  model: "heygen/video_agent_v3",
  prompt: scriptWithFrameCheckNotes, // script text with Frame Check notes appended
  aspectRatio: "16:9", // or "9:16"
  providerOptions: {
    avatar_id,
    voice_id,
    style_id, // optional
    callback_url, // optional async webhook
    callback_id, // optional correlation id
  },
});
```
Plugin install (one-time, by the user): openclaw plugins install clawhub:@heygen/openclaw-plugin-heygen. Plugin docs: https://github.com/heygen-com/openclaw-plugin-heygen.
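The detection order for picking a transport can be sketched as a pure function. This is a minimal illustration, not part of the skill: the `Environment` shape, its field names, and `pickTransport` are hypothetical, and the caller is assumed to have probed the environment already (for example by running `heygen --version` and inspecting the visible tools). The sketch reads the MCP condition as "no API key set", which is what the API-key short-circuit implies.

```typescript
// Hypothetical probe results; the skill itself does not define this shape.
interface Environment {
  hasVideoGenerateV3: boolean; // video_generate exposes heygen/video_agent_v3
  hasApiKey: boolean;          // HEYGEN_API_KEY is set
  cliWorks: boolean;           // `heygen --version` exits 0
  hasMcpTools: boolean;        // tools matching mcp__heygen__* are visible
}

type Surface = "mcp" | "cli";

interface Transport {
  generate: "plugin" | Surface; // surface used for the final generate call
  discovery: Surface | null;    // surface for avatars, voices, avatar creation
}

// Returns null when no surface is usable (the "tell the user to install" case).
function pickTransport(env: Environment): Transport | null {
  let surface: Surface | null = null;
  if (env.hasApiKey && env.cliWorks) {
    surface = "cli"; // API key plus working CLI short-circuits MCP detection
  } else if (!env.hasApiKey && env.hasMcpTools) {
    surface = "mcp";
  } else if (env.cliWorks) {
    surface = "cli"; // CLI fallback, authenticated via `heygen auth login`
  }

  // The plugin only takes over the generate step; discovery keeps its surface.
  if (env.hasVideoGenerateV3) return { generate: "plugin", discovery: surface };
  if (surface === null) return null;
  return { generate: surface, discovery: surface };
}
```

Calling this once at session start and keeping the result is one way to honor the "never mix, never switch mid-session" rule.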
MCP tools: `create_video_agent`, `get_video_agent_session`, `get_video`, `list_avatar_groups`, `list_avatar_looks`, `get_avatar_look`, `create_photo_avatar`, `create_prompt_avatar`, `create_digital_twin`, `list_voices`, `design_voice`, `create_speech`, `list_video_agent_styles`, `create_video_translation`

CLI commands: `heygen video-agent {create,get,send,stop,styles,resources,videos}`, `heygen video {get,list,download,delete}`, `heygen avatar {list,get,consent,create,looks}` (with `heygen avatar looks {list,get,update}`), `heygen voice {list,create,speech}`, `heygen video-translate {create,get,languages}`, `heygen lipsync {create,get}`, `heygen asset create`, `heygen user`, `heygen auth {login,logout,status}`. Every subcommand supports `--help`; that's your reference. Run `heygen --help` to see the full noun list.
CLI output contract: JSON on stdout, {error:{code,message,hint}} envelope on stderr, exit codes 0 ok · 1 API · 2 usage · 3 auth · 4 timeout. Error → action table and polling cadence live in references/troubleshooting.md.
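A minimal sketch of how that output contract could be applied to a finished CLI invocation. The exit codes and the `{error:{code,message,hint}}` envelope come from the contract above; the `interpretCliResult` helper, its type names, the string typing of the envelope fields, and the `"unparseable"` fallback code are assumptions made for illustration.

```typescript
type ExitKind = "ok" | "api" | "usage" | "auth" | "timeout" | "unknown";

// Exit codes as listed in the CLI output contract.
const EXIT_KINDS: Record<number, ExitKind> = {
  0: "ok",
  1: "api",
  2: "usage",
  3: "auth",
  4: "timeout",
};

// Assumed field types; the contract only names the three keys.
interface CliError {
  code: string;
  message: string;
  hint?: string;
}

interface CliResult {
  kind: ExitKind;
  data?: unknown;   // parsed stdout JSON on success
  error?: CliError; // parsed stderr envelope on failure
}

// Interprets an already-finished invocation. Spawning the process is left to
// the caller; this helper only applies the documented contract.
function interpretCliResult(exitCode: number, stdout: string, stderr: string): CliResult {
  const kind: ExitKind = EXIT_KINDS[exitCode] ?? "unknown";
  if (kind === "ok") return { kind, data: JSON.parse(stdout) };

  let parsed: CliError | undefined;
  try {
    parsed = (JSON.parse(stderr) as { error?: CliError }).error;
  } catch {
    parsed = undefined; // stderr was not a JSON envelope
  }
  // Fall back to the raw stderr text when no envelope could be read.
  return { kind, error: parsed ?? { code: "unparseable", message: stderr.trim() } };
}
```

The returned `kind` is the natural key for the error-to-action table in references/troubleshooting.md.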
Do not look up API endpoints. There is no api-reference.md lookup step. MCP mode uses tool names. CLI mode uses heygen ... --help. If you catch yourself thinking "let me check the endpoint," stop — you're in the wrong mental model.
`SOUL.md`, `IDENTITY.md`, and `AVATAR-<NAME>.md` at the workspace root contain identity and existing avatar state. Check them first. Only ask the user for what's genuinely missing.

Detect the user's language from their first message. Store as `user_language` (e.g., en, ja, es, ko, zh, fr, de, pt). This happens automatically from the input; no extra question is needed.
Rules:
- Communicate with the user in `user_language`.
- Write the script in `user_language` unless the user explicitly requests a different language.
- The video language defaults to `user_language` but can be overridden if the user wants the video in a different language than they're chatting in.
- Pass the `language` parameter and set `voice_settings.locale` on API calls.

Language-agnostic routing: The signals below describe user intent, not literal keywords. Match intent regardless of input language. A user saying "ビデオを作って" (Japanese) is the same signal as "make a video about X."
| Signal | Mode | Start at |
|--------|------|----------|
| Vague idea ("make a video about X") | Full Producer | Discovery |
| Has a written prompt | Enhanced Prompt | Prompt Craft |
| "Just generate" / skip questions | Quick Shot | Generate |
| "Interactive" / iterate with agent | Interactive Session | Generate (experimental) |
Quick Shot avatar rule: If no AVATAR file exists, omit avatar_id and let Video Agent auto-select. If an AVATAR file exists, use it — and Frame Check STILL RUNS.
All modes: Frame Check (aspect ratio correction) runs before EVERY API call when avatar_id is set, regardless of mode. Quick Shot is not an excuse to skip framing checks.
Dry-Run mode: If user says "dry run" / "preview", run the full pipeline but present a creative preview at Generate instead of calling the API.
Default to Full Producer. Better to ask one smart question than generate a mediocre video.
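The routing table and the rules that follow it can be summarized in a small lookup. This is only a sketch: recognizing the mode from user intent stays with the agent, and the `Mode`, `Stage`, `Plan`, and `planRun` names are hypothetical.

```typescript
type Mode = "full_producer" | "enhanced_prompt" | "quick_shot" | "interactive_session";
type Stage = "discovery" | "prompt_craft" | "generate";

// Start stage per mode, mirroring the routing table.
const START_STAGE: Record<Mode, Stage> = {
  full_producer: "discovery",
  enhanced_prompt: "prompt_craft",
  quick_shot: "generate",
  interactive_session: "generate",
};

interface Plan {
  startAt: Stage;
  runFrameCheck: boolean; // true whenever an avatar_id will be sent
  callApi: boolean;       // false in dry-run mode (creative preview only)
}

function planRun(mode: Mode | undefined, avatarId: string | undefined, dryRun = false): Plan {
  const resolved: Mode = mode ?? "full_producer"; // default to Full Producer
  return {
    startAt: START_STAGE[resolved],
    runFrameCheck: avatarId !== undefined, // applies in every mode, Quick Shot included
    callApi: !dryRun,
  };
}
```

For example, `planRun("quick_shot", "some_look_id")` starts at Generate but still sets `runFrameCheck` to `true`.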
Runs once before Discovery on the first video request in a session.
Check for any AVATAR-*.md files in the workspace root.
Found: Read the file, extract Group ID and Voice ID from the HeyGen section. Pre-load as defaults for Discovery. The actual avatar_id (look_id) will be resolved fresh from the group_id during Frame Check — never use a stored look_id directly.
Not found: The user (or agent) has no avatar yet. Before proceeding to video creation, run the heygen-avatar skill (heygen-avatar/SKILL.md in this repo) to create one. Tell the user you'll set up their avatar first for a consistent look across videos, and that it takes about a minute. Communicate in user_language.
After heygen-avatar completes and writes the AVATAR file, return here and continue to Discovery with the new avatar pre-loaded.
Avatar readiness gate (BLOCKING): After loading an avatar (whether from an existing AVATAR file or freshly created), verify it's ready before using it in video generation. Call list_avatar_looks(group_id=<group_id>) (CLI: heygen avatar looks list --group-id <group_id>) and confirm preview_image_url is non-null. If null, poll every 10s up to 5 min. Do NOT proceed to Discovery until this check passes. Videos submitted with an unready avatar WILL fail silently.
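The readiness gate amounts to a bounded polling loop. A minimal sketch, assuming the caller supplies a `listLooks` function that wraps `list_avatar_looks` (or the CLI equivalent) and returns objects with a `preview_image_url` field. The `AvatarLook` shape, the function names, and the choice to treat the avatar as ready once any look in the group has a preview image are assumptions, not documented behavior.

```typescript
// Hypothetical shape: only preview_image_url is named in the text above.
interface AvatarLook {
  preview_image_url: string | null;
}

type ListLooks = (groupId: string) => Promise<AvatarLook[]>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until a look in the group has a preview image or the time budget is
// spent. Resolves to true when ready and false on timeout.
async function waitForAvatarReady(
  groupId: string,
  listLooks: ListLooks,
  sleep: Sleep = realSleep,
  intervalMs = 10_000,    // poll every 10 s
  timeoutMs = 5 * 60_000, // give up after 5 min
): Promise<boolean> {
  const maxAttempts = Math.floor(timeoutMs / intervalMs) + 1;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const looks = await listLooks(groupId);
    if (looks.some((look) => look.preview_image_url !== null)) return true;
    // Do not sleep after the final attempt.
    if (attempt < maxAttempts - 1) await sleep(intervalMs);
  }
  return false;
}
```

A `false` result should block the pipeline rather than fall through to Discovery, since the gate is described as blocking.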
Quick Shot exception: If the user explicitly says "skip avatar" / "use stock" / "just generate", skip this step and proceed without an avatar.
Interview the user. Be conversational, skip anything already answered.
Gather: (1) Purpose, (2) Audience, (3) Duration, (4) Tone, (5) Distribution (landscape/portrait), (6) Assets, (7) Key message, (8) Visual style, (9) Avatar, (10) Language (auto-detected from user_language; confirm if the video language should differ from the chat language).
Two paths for every asset:
- Upload: run `heygen asset create --file <path>` or include as `files[]` entries on `video-agent create`. For visuals the viewer should see.

Full routing matrix and upload examples -> references/asset-routing.md
Key rules:
- Never pass web pages in `files[]` (Video Agent rejects `text/html`). Web pages are always Path A.
- Prefer `asset_id` over `files[]{url}` (CDN/WAF often blocks HeyGen).

Two approaches (use one or combine both):
1. API Styles (style_id) — Curated visual templates. Browse by tag, show 3-5 options with previews, let user pick. If a style has a fixed aspect_ratio, match orientation to it. When style_id is set, the prompt's Visual Style Block becomes optional.
2. Prompt Styles — Full manual control via prompt text. See references/prompt-styles.md.
Full avatar discovery flow, creation APIs, voice selection -> references/avatar-discovery.md
Decision flow:
`avatar_id`, state in prompt.

Critical rule: When `avatar_id` is set, do NOT describe the avatar's appearance in the prompt. Say "the selected presenter." Describing the appearance is the #1 cause of avatar mismatch.
After Discovery, the producer sub-skill handles the full pipeline. Read heygen-video/SKILL.md for detailed stage instructions.
Key rules that apply at every stage:
- Generate: call `create_video_agent` (MCP) or `heygen video-agent create --wait` (CLI). Run Frame Check before EVERY submission. Capture `session_id` immediately. Poll silently (or let `--wait` block).
- After generation, report `video_page_url`, session URL, and duration accuracy. Log to `heygen-video-log.jsonl`.

Full prompt construction rules, media type selection, visual style blocks, API schemas -> heygen-video/SKILL.md
Runs automatically when avatar_id is set, before Generate. Appends correction notes to the Video Agent prompt. Does NOT generate images or create new looks.
1. Never use a stored `look_id`; looks are ephemeral and get deleted. Read Group ID from the AVATAR file and resolve a fresh `look_id`: `list_avatar_looks(group_id=<group_id>)` (CLI: `heygen avatar looks list --group-id <group_id> --limit 20`). Pick the look matching the target orientation. Use this resolved `look_id` as `avatar_id` for all subsequent steps.
2. Fetch the look: `get_avatar_look(look_id=<avatar_id>)` (CLI: `heygen avatar looks get --look-id <avatar_id>`) -> extract `avatar_type`, `preview_image_url`, `image_width`, `image_height`.
3. Determine the background: `photo_avatar` -> Video Agent handles environment. `studio_avatar` -> check if transparent/solid/empty. `video_avatar` -> always has background.

| avatar_type | Orientation Match? | Has Background? | Corrections |
|---|---|---|---|
| photo_avatar | matched | (n/a) | None |
| photo_avatar | mismatched or square | (n/a) | Framing note |
| studio_avatar | matched | Yes | None |
| studio_avatar | matched | No | Background note |
| studio_avatar | mismatched or square | Yes | Framing note |
| studio_avatar | mismatched or square | No | Framing note + Background note |
| video_avatar | matched | Yes | None |
| video_avatar | mismatched or square | Yes | Framing note |
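The matrix above reduces to two independent checks, which the following sketch makes explicit. It assumes orientation is derived by comparing `image_width` with `image_height`; that derivation, along with the `orientationOf` and `corrections` helpers and their type names, is hypothetical and not taken from the skill.

```typescript
type AvatarType = "photo_avatar" | "studio_avatar" | "video_avatar";
type Orientation = "landscape" | "portrait" | "square";
type Correction = "framing_note" | "background_note";

// Assumption: equal sides mean square, otherwise the longer side decides.
function orientationOf(imageWidth: number, imageHeight: number): Orientation {
  if (imageWidth === imageHeight) return "square";
  return imageWidth > imageHeight ? "landscape" : "portrait";
}

// Returns the correction notes to append to the Video Agent prompt.
function corrections(
  avatarType: AvatarType,
  avatar: Orientation,
  target: "landscape" | "portrait", // a square avatar never matches the target
  hasBackground: boolean,           // ignored unless avatarType is studio_avatar
): Correction[] {
  const notes: Correction[] = [];
  if (avatar !== target) notes.push("framing_note");
  // Per the table, only studio avatars can be missing a background.
  if (avatarType === "studio_avatar" && !hasBackground) notes.push("background_note");
  return notes;
}
```

For instance, a 1080x1080 studio avatar with a transparent backdrop, rendered into a landscape video, yields both a framing note and a background note.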
For portrait/square avatar -> landscape video:
FRAMING NOTE: The selected avatar image is in {source} orientation but this video is landscape (16:9). Frame the presenter from the chest up, centered in the landscape canvas. Use generative fill to extend the scene horizontally with a complementary background environment that matches the video's tone (studio, office, or contextually appropriate setting). Do NOT add black bars or pillarboxing. The avatar should feel natural in the 16:9 frame.
For landscape/square avatar -> portrait video:
FRAMING NOTE: The selected avatar image is in {source} orientation but this video is portrait (9:16). Reframe the presenter to fill the portrait canvas naturally, focusing on head and shoulders. Use generative fill to extend vertically if needed. Do NOT add letterboxing. The avatar should fill the portrait frame comfortably.
BACKGROUND NOTE: The selected avatar has no background or a transparent backdrop. Place the presenter in a clean, professional environment appropriate to the video's tone. For business/tech content: modern studio with soft lighting and subtle depth. For casual content: bright, minimal space with natural light. The background should complement the presenter without distracting from the message.
Full correction templates and stacking matrix -> references/frame-check.md
Known issues -> references/troubleshooting.md
Translate and dub a video into another language with voice cloning and lip-sync, powered by HeyGen Video Translation. The presenter keeps their face, their voice is cloned into the target language, and lips re-sync to the new audio — viewers see the same person speaking natively. Use when: (1) localizing an existing video into one or more languages ("translate this video to Spanish", "make this in French and German", "dub this into Japanese", "I need this in 10 languages for a launch"), (2) the user has a finished video and wants the SAME presenter speaking another language (not a new presenter — that's heygen-video), (3) podcast / audio-only translation ("translate this podcast", "dub the audio but keep my video"), (4) high-stakes translations where the user wants to review/edit subtitles before final render (the proofreads workflow), (5) "translate my video", "dub this", "localize this clip", "make a multilingual version", "subtitle and dub". Returns the translated video URL (or audio file for audio-only mode), one per target language. Chain signal: if the user wants to CREATE a new video in another language (no source video exists yet), route to heygen-video and write the script in the target language — do not use heygen-translate. Use heygen-translate only when there is an existing source video to localize. NOT for: creating new videos from scratch (use heygen-video), avatar creation (use heygen-avatar), TTS-only synthesis (use heygen-video with audio-only output), or text-only translation.
Generate HeyGen presenter videos via the v3 Video Agent pipeline — handles Frame Check (aspect ratio correction), prompt engineering, avatar resolution, and voice selection. Required for any HeyGen video generation. Replaces deprecated endpoints with v3. Use when: (1) generating any HeyGen video (via API or otherwise), (2) sending a personalized video message (outreach, update, announcement, pitch, knowledge), (3) creating a HeyGen presenter-led explainer, tutorial, or product demo with a human face, (4) "make a video of me saying...", "send a video to my leads", "record an update for my team", "create a video pitch", "make a loom-style message", "I want to appear in this video", "generate a HeyGen video", "make a talking head video". Accepts avatar_id from heygen-avatar for identity-first HeyGen videos, or uses a stock presenter. Returns video share URL + HeyGen session URL for iteration. Chain signal: when the user wants to create/design an avatar AND make a video in the same request, run heygen-avatar first, then return here. Conjunctions to watch: "and then", "and immediately", "first...then", "X and make a video", "design [presenter] and record" = always CHAIN. If the user provides a photo AND wants a video, route to heygen-avatar first. NOT for: avatar creation or identity setup (use heygen-avatar first), cinematic footage or b-roll without a presenter, translating videos, TTS-only, or streaming avatars.
Create a persistent HeyGen avatar — a reusable face + voice identity for the agent, the user, or any named character — powered by HeyGen Avatar V technology. Prompt-based creation by default (description → HeyGen builds it); photo upload is optional for real-person digital twins. Use when: (1) giving the agent a face + voice so it can present videos ("bring yourself to life", "create your avatar", "give yourself an avatar", "design a presenter", "set up an avatar", "let's make an avatar"), (2) the user wants to appear in videos as themselves ("create my avatar", "I want my face in a video", "digital twin of me", "build me an avatar"), (3) building a named character presenter ("create an avatar called Cleo", "design a character named X"), (4) establishing HeyGen identity before making videos — the correct FIRST step when no avatar exists yet. Chain signal: when the user says both an identity/avatar action AND a video action in the same request ("create an avatar AND make a video", "set up identity THEN create a video", "design a presenter AND immediately record"), run heygen-avatar first, then heygen-video. Returns avatar_id + voice_id — pass directly to heygen-video to create HeyGen videos. NOT for: generating videos (use heygen-video), translating videos, or TTS-only tasks.
# HeyGen Video Agent — NanoClaw Container Skill ## When to Use Use this skill when the user wants to create a video with an AI avatar presenter. Triggers: "make a video", "create a video message", "record a video", "avatar video", "talking head video", "video pitch", "video update". NOT for: image generation, audio-only TTS, video translation, or cinematic b-roll. ## Required Environment - `HEYGEN_API_KEY` — Get from https://app.heygen.com/settings?nav=API - `heygen` CLI — install: `curl -f