skills/golem-powers/content-demo-creation/SKILL.md
Create polished product demo videos by recording a real running app with Computer Use or recreating a product UX as deterministic Remotion/Three output. Use for demo videos, walkthroughs, feature showcases, and product UX mimics. NOT for static screenshots, slide decks, or QA bug-hunting.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add etanhey/golems content-demo-creation`
Produce a genuinely good product demo video. Two modes. Pick by what exists: a running app (CU-demo) or only docs/a target UX (mimic-demo). The output is always a video an absent stakeholder can watch remotely — delivered to iCloud/Obsidian, not left on a local desktop.
| You have… | Mode | Output |
|---|---|---|
| A real, runnable app | CU-demo | Screen-recorded narrated walkthrough of the live app |
| Only docs / a UX to recreate | mimic-demo | Deterministic Remotion render that recreates the UX |
| A running app but want a polished (not raw) result | Both | CU-demo for truth + mimic-demo polish; or CU footage as reference for the render |
If unsure, default to mimic-demo — it's deterministic, re-renderable, and doesn't depend on a fragile live app state.
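The mode table and the "if unsure" default can be encoded as a small helper. This is a hypothetical sketch: `Mode`, `pick_mode`, and the convention that `None` means "unsure" are illustrative names, not part of the skill.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    CU_DEMO = "CU-demo"        # screen-record the live app
    MIMIC_DEMO = "mimic-demo"  # deterministic Remotion recreation
    BOTH = "both"              # CU footage for truth + mimic-demo polish


def pick_mode(has_runnable_app: Optional[bool], wants_polish: bool = False) -> Mode:
    """Map the mode table to a Mode; None ("unsure") falls back to mimic-demo."""
    if not has_runnable_app:  # False (only docs / target UX) or None (unsure)
        return Mode.MIMIC_DEMO
    return Mode.BOTH if wants_polish else Mode.CU_DEMO
```

Because `None` is falsy, "unsure" and "no runnable app" share the same branch. That matches the rule that mimic-demo is the safe default.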
Before you recreate ANY pixel, ground the demo in the REAL product UI. This gate is non-negotiable and applies to both modes — it is the #1 way a demo fails.
The authoritative reference is, in priority order:
flow-bar/Sources/VoiceBar/*.swift, the real React/SwiftUI views that ship).
Allowlist gap (real, 2026-05-29): some apps aren't in the Computer-Use allowlist, so you literally cannot screenshot them live (voice-LEAD hit this with the VoiceBar). When that happens, do NOT fall back to inventing. Ask the user for a screen recording of the real setup, run it through the qa-video frame pipeline, and use those frames as the Gate 0 reference. A recording the user already made beats a live screenshot you can't take.
NEVER mimic from:
pipeline.tsx on the product website): those are idealized inventions, not the product.
Why (real failure, 2026-05-28): a VoiceLayer demo was built off the marketing site + blueprint. Etan: "looks nothing like my setup, nor the VoiceBar." The truth was in flow-bar/Sources/VoiceBar/*.swift. A beautiful render of the wrong UI is a failed demo.
Checklist before rendering/filming:
Hard-won rules (each is a real correction from a prior demo run — see codex-019e6d0f BrainBar demo):
screencapture/bash screenshots look dead and are disqualified. ("bash screenshots aren't the same.")
/tmp/.<app>-toggle file, a fresh launch, a seeded fixture) so the walkthrough is reproducible.
~/Library/Mobile Documents/com~apple~CloudDocs/...) or the synced Obsidian folder, then report the exact path. ("put it inside of iCloud or the Obsidian folders so they sync... and tell me where it is.")
Pipeline:
pre-flight state → start screen recording → CU drives the app feature-by-feature
→ narrate (live or scripted) → stop recording → trim/assemble
→ fix any bug surfaced, re-take the affected segment → deliver to iCloud/Obsidian → report path
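The last two pipeline steps (deliver to iCloud/Obsidian, report the path) can be scripted so the exact path is never guessed. This is a minimal sketch under stated assumptions. `ICLOUD_DRIVE` is the standard macOS iCloud Drive location. The `deliver` helper and any subfolder you pass (for example a hypothetical `Demos` folder or an Obsidian vault path) are illustrative, not part of the skill.

```python
import shutil
from pathlib import Path

# Standard macOS iCloud Drive root; pick your own subfolder under it (hypothetical).
ICLOUD_DRIVE = Path.home() / "Library" / "Mobile Documents" / "com~apple~CloudDocs"


def deliver(video: Path, dest_dir: Path) -> Path:
    """Copy the finished demo into a synced folder and return the absolute path to report."""
    if not video.is_file():
        raise FileNotFoundError(video)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / video.name
    shutil.copy2(video, target)  # copy2 keeps timestamps/metadata
    return target.resolve()
```

Usage: `print(deliver(Path("demo.mp4"), ICLOUD_DRIVE / "Demos"))` prints the synced path to hand to the stakeholder.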
Working hypothesis (to validate in eval, grounded in contentClaude's live run): for UI-accurate demos, a deterministic Remotion + @remotion/three render beats prompting an AI video model. The render is pixel-controlled, re-renderable, and never hallucinates the UI. contentClaude (surface:14) independently chose this stack for the VoiceLayer demo and hit the version-mismatch trap below — eval round 1 will confirm whether the render quality justifies the approach.
Pipeline:
PASS GATE 0 (reference the REAL shipping UI — screenshot of running app and/or real UI source; NOT marketing site, NOT blueprint)
→ read how-it-works docs (README, feature pages) for FLOW/narrative only — never for visuals
→ build a Remotion + @remotion/three composition recreating the REAL UX
→ PIN all remotion package versions to one number (see scripts/check-remotion-versions.sh)
→ render deterministic MP4
→ [optional] AI video model (LTX local when RAM allows, else cloud) for B-roll / ambient ONLY
→ [optional] voiceover (VoiceLayer TTS to dogfood, or silent + on-screen captions)
→ deliver to iCloud/Obsidian → report path
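The "PIN all remotion package versions" step boils down to two checks on package.json. First, no `^`/`~` ranges on `remotion` or any `@remotion/*` package. Second, exactly one version number across them. The real logic lives in scripts/check-remotion-versions.sh, which is not shown here. This is a hypothetical Python equivalent that assumes the packages are declared under `dependencies` and/or `devDependencies`.

```python
import json
from pathlib import Path


def remotion_packages(package_json: dict) -> dict:
    """Collect 'remotion' and '@remotion/*' specs from dependencies and devDependencies."""
    found = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in package_json.get(section, {}).items():
            if name == "remotion" or name.startswith("@remotion/"):
                found[name] = spec
    return found


def version_problems(package_json: dict) -> list:
    """Return readable problems; an empty list means all Remotion packages share one exact pin."""
    pkgs = remotion_packages(package_json)
    problems = []
    for name, spec in sorted(pkgs.items()):
        # An exact pin starts with a digit ("4.0.422"); "^4.0.422" or "~4.0.422" is a range.
        if not spec[:1].isdigit():
            problems.append(f"{name}: '{spec}' is not an exact pin (drop the ^/~)")
    # Strip range prefixes so 4.0.422 vs ^4.0.421 is still reported as a mismatch.
    versions = {spec.lstrip("^~") for spec in pkgs.values()}
    if len(versions) > 1:
        problems.append("mixed remotion versions: " + ", ".join(sorted(versions)))
    return problems


def check_file(path: str = "package.json") -> list:
    """Load a package.json from disk and run the checks."""
    return version_problems(json.loads(Path(path).read_text()))
```

Fail the render when `check_file()` returns anything. The 4.0.422 core vs 4.0.421 add-on split described in the rules would trip the mixed-version check.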
Rules:
4.0.422 core vs 4.0.421 @remotion/google-fonts/@remotion/paths) breaks React context, hooks, and renders. Pin every @remotion/* + remotion to the SAME exact version (drop the ^). Run scripts/check-remotion-versions.sh before rendering.
The video-gen stage MUST be cloud-or-local switchable. LTX-2.3 Q4 loads ~19.4GB into unified memory.
scripts/ram-gate.sh). If free+inactive is not comfortably above the model size while the agent fleet runs, do NOT run locally; route to cloud (Replicate/fal). Do not OOM the running ecosystem.
A demo is shippable when:
This skill is built the skillCreator way: produce a demo → review quality → feed specific feedback → improve the skill → re-render. Not one-shot. See EVAL.md for the running rounds + deltas.
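The local-vs-cloud RAM gate from the video-gen rule can be sketched by reading macOS `vm_stat` and comparing free+inactive memory against the model size. This is a hypothetical sketch, not scripts/ram-gate.sh itself. The `HEADROOM` factor of 1.5 is an assumed stand-in for "comfortably above". `MODEL_BYTES` uses the ~19.4GB LTX-2.3 Q4 figure quoted in the rules.

```python
import re
import subprocess

MODEL_BYTES = int(19.4e9)  # LTX-2.3 Q4 footprint quoted in the rules (~19.4GB)
HEADROOM = 1.5             # assumed margin for "comfortably above"; tune per machine


def read_vm_stat() -> str:
    """Run macOS `vm_stat` and return its stdout (macOS only)."""
    return subprocess.run(["vm_stat"], capture_output=True, text=True, check=True).stdout


def free_inactive_bytes(vm_stat_text: str) -> int:
    """Sum 'Pages free' and 'Pages inactive' from vm_stat output, converted to bytes."""
    page = re.search(r"page size of (\d+) bytes", vm_stat_text)
    if page is None:
        raise ValueError("vm_stat output has no page-size header")
    pages = 0
    for label in ("Pages free", "Pages inactive"):
        m = re.search(rf"^{re.escape(label)}:\s+(\d+)\.", vm_stat_text, re.MULTILINE)
        if m is None:
            raise ValueError(f"missing '{label}' in vm_stat output")
        pages += int(m.group(1))
    return pages * int(page.group(1))


def choose_backend(available_bytes: int, model_bytes: int = MODEL_BYTES,
                   headroom: float = HEADROOM) -> str:
    """Return 'local' only when free+inactive clears the model size with headroom, else 'cloud'."""
    return "local" if available_bytes >= model_bytes * headroom else "cloud"
```

On a Mac, `choose_backend(free_inactive_bytes(read_vm_stat()))` picks the route. Passing vm_stat text as a string keeps the parsing testable away from the live fleet.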