skills/capy-video-gen-skill/SKILL.md
Multi-shot AI video generation pipeline that keeps character faces consistent across shots. Converts scripts or ideas into complete videos through character extraction, storyboarding, frame generation, and video assembly. Validated across 300 experiments, with a 70% reduction in face distance. Use when the user asks to create a video from a script, story, or idea, or wants a multi-shot video with consistent characters.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

```bash
npx skillsauth add happycapy-ai/happycapy-skills capy-video-gen-skill
```
Generate complete multi-shot videos from scripts or ideas, with consistent character faces across all scenes. Built for the HappyCapy AI Gateway. Validated across 300 experiments, with a 70% reduction in face distance.
ViMax converts text scripts into full videos through an automated pipeline: character extraction, storyboarding, frame generation, and video assembly.
The ViMax pipeline code is at: /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax/
All commands must be run from this directory using the venv:
```bash
cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
```
Prerequisites:

- `AI_GATEWAY_API_KEY` environment variable (auto-configured in HappyCapy)
- `.venv/` virtual environment (already set up)

Edit the `script`, `user_requirement`, and `style` values in the entry script, then run:
```bash
cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_script2video.py
```
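Before a long run, a quick preflight check can confirm the prerequisites listed above. This sketch is not part of ViMax; it only assumes the layout described here (a `.venv/bin/python` interpreter and the two `main_happycapy_*` entry scripts in the ViMax root) and the `AI_GATEWAY_API_KEY` variable. The `preflight` function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

ENTRY_SCRIPTS = ("main_happycapy_script2video.py", "main_happycapy_idea2video.py")


def preflight(root: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found; an empty list means ready to run."""
    env = os.environ if env is None else env
    base = Path(root)
    problems = []
    if not env.get("AI_GATEWAY_API_KEY"):
        problems.append("AI_GATEWAY_API_KEY is not set")
    if not (base / ".venv" / "bin" / "python").exists():
        problems.append("missing .venv/bin/python")
    for script in ENTRY_SCRIPTS:
        if not (base / script).is_file():
            problems.append(f"missing {script}")
    return problems


if __name__ == "__main__":
    # Run from the ViMax root; prints nothing when everything is in place.
    for problem in preflight("."):
        print("preflight:", problem)
```

Returning a list instead of raising on the first failure reports every missing piece in one pass.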
To generate from a brief idea (a script is generated automatically first):
```bash
cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_idea2video.py
```
To call the pipeline from Python:

```python
import asyncio

from langchain.chat_models import init_chat_model

from pipelines.script2video_pipeline import Script2VideoPipeline
from tools.render_backend import RenderBackend
from utils.config_loader import load_config

# Load the HappyCapy config (models, API key, working directory)
config = load_config("configs/happycapy_script2video.yaml")

# Chat model used by the planning agents
chat_model = init_chat_model(**config["chat_model"]["init_args"])

# Image and video generators are built from the class_path entries in the config
backend = RenderBackend.from_config(config)

pipeline = Script2VideoPipeline(
    chat_model=chat_model,
    image_generator=backend.image_generator,
    video_generator=backend.video_generator,
    working_dir=config["working_dir"],
)

# Run the pipeline; the result is written to {working_dir}/final_video.mp4
asyncio.run(pipeline(
    script="Your script here...",
    user_requirement="No more than 8 shots total.",
    style="Cinematic, warm lighting",
))
```
The final video is written to `{working_dir}/final_video.mp4`.

Config files:

- `configs/happycapy_script2video.yaml` (script to video)
- `configs/happycapy_idea2video.yaml` (idea to video)

HappyCapy config at `configs/happycapy_script2video.yaml`:
```yaml
chat_model:
  init_args:
    model: gpt-4.1
    model_provider: openai
    api_key: ${AI_GATEWAY_API_KEY}
    base_url: https://ai-gateway.happycapy.ai/api/v1/openai/v1

image_generator:
  class_path: tools.ImageGeneratorHappyCapyAPI
  init_args:
    api_key: ${AI_GATEWAY_API_KEY}
    model: google/gemini-3.1-flash-image-preview

video_generator:
  class_path: tools.VideoGeneratorHappyCapyAPI
  init_args:
    api_key: ${AI_GATEWAY_API_KEY}
    model: google/veo-3.1-generate-preview

working_dir: .working_dir/script2video
```
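The `${AI_GATEWAY_API_KEY}` placeholders imply that environment variables are substituted before the values reach the model clients. How `utils.config_loader.load_config` actually does this is not documented here; the sketch below shows one plausible approach with the standard library, applied to an already-parsed config dict (YAML parsing is left out). `expand_env` is a hypothetical helper.

```python
import os
from string import Template
from typing import Any, Mapping, Optional


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} placeholders in a parsed config."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        # safe_substitute leaves unknown placeholders such as ${MISSING} untouched
        return Template(value).safe_substitute(env)
    return value


config = {
    "chat_model": {"init_args": {"model": "gpt-4.1", "api_key": "${AI_GATEWAY_API_KEY}"}},
    "working_dir": ".working_dir/script2video",
}
resolved = expand_env(config, env={"AI_GATEWAY_API_KEY": "sk-test"})
print(resolved["chat_model"]["init_args"]["api_key"])  # sk-test
```

Using `safe_substitute` keeps an unset variable visible as a literal `${...}` string; a stricter loader could call `substitute` instead and fail fast on missing keys.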
| Agent | File | Purpose |
|-------|------|---------|
| CharacterExtractor | agents/character_extractor.py | Extract characters with static/dynamic features from script |
| CharacterPortraitsGenerator | agents/character_portraits_generator.py | Generate front/side/back portraits for each character |
| StoryboardArtist | agents/storyboard_artist.py | Design shot-by-shot storyboard with first/last frames and motion |
| ReferenceImageSelector | agents/reference_image_selector.py | Select best reference images for each frame (face identity #1 priority) |
| CameraImageGenerator | agents/camera_image_generator.py | Build camera trees and generate transition videos |
| BestImageSelector | agents/best_image_selector.py | Select best generated image from candidates |
| Screenwriter | agents/screenwriter.py | Generate scripts from ideas |
| Tool | File | Purpose |
|------|------|---------|
| ImageGeneratorHappyCapyAPI | tools/image_generator_happycapy_api.py | Image generation via HappyCapy Gateway (Gemini) |
| VideoGeneratorHappyCapyAPI | tools/video_generator_happycapy_api.py | Video generation via HappyCapy Gateway (Veo) |
| RenderBackend | tools/render_backend.py | Factory for instantiating generators from config |
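The `class_path` / `init_args` pairs in the YAML config suggest that `RenderBackend` resolves a dotted path to a class and calls it with keyword arguments. The sketch below shows that general pattern with `importlib`; it is an assumption about the approach, not ViMax's actual `RenderBackend` code, and it instantiates a standard-library class so it runs anywhere.

```python
import importlib
from typing import Any, Mapping


def instantiate(spec: Mapping[str, Any]) -> Any:
    """Build an object from {"class_path": "pkg.module.Class", "init_args": {...}}."""
    module_path, _, class_name = spec["class_path"].rpartition(".")
    if not module_path:
        raise ValueError(f"class_path must be dotted: {spec['class_path']!r}")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**spec.get("init_args", {}))


# Stand-in spec; in the config above the path would be tools.ImageGeneratorHappyCapyAPI
delay = instantiate({"class_path": "datetime.timedelta", "init_args": {"seconds": 90}})
print(delay)  # 0:01:30
```

Keeping the class path in config means swapping a generator only requires editing YAML, not code.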
Core data structures:

- `CharacterInScene`: character with `identifier`, `static_features`, `dynamic_features`
- `ShotDescription`: shot with `ff_desc`, `lf_desc`, `motion_desc`, `variation_type`
- `Camera`: camera with parent-child relationships
- `Frame`: frame with `shot_idx`, `frame_type`, and visible characters
- `ImageOutput` / `VideoOutput`: generation outputs with save methods

This pipeline includes face identity improvements validated through 257 experiments (a 70% reduction in face distance, from 0.74 to 0.22):
Reference Image Selector: Face identity is the #1 priority when selecting reference images. The front-view portrait is always included when a character's face is visible.
Character Portraits: Enhanced prompts generate identity-critical details (exact nose shape, eye spacing, jawline, distinguishing marks) for cross-scene recognition.
Video Prompt Face Lock: Every video generation prompt is prepended with a face identity instruction requiring the character's face to remain identical to the starting frame throughout the clip.
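The reference-selection rule and the prompt face lock described above can be sketched as two small functions. The function names, the view labels, the `limit` parameter, and the exact wording of the face-lock instruction are hypothetical; the real prompt text lives in ViMax's agents and `FACE_IDENTITY_GUIDE.md`.

```python
from typing import Dict, List

# Hypothetical wording; ViMax's actual face-lock instruction may differ.
FACE_LOCK = ("Keep each character's face identical to the starting frame "
             "throughout the clip: same facial structure, features, and marks.")


def select_references(portraits: Dict[str, str], face_visible: bool,
                      candidates: List[str], limit: int = 3) -> List[str]:
    """Pick reference images, forcing the front portrait first when the face shows."""
    chosen: List[str] = []
    if face_visible and "front" in portraits:
        chosen.append(portraits["front"])  # face identity is the top priority
    for path in candidates:
        if len(chosen) >= limit:
            break
        if path not in chosen:
            chosen.append(path)
    return chosen


def lock_face(video_prompt: str) -> str:
    """Prepend the face identity instruction to a video generation prompt."""
    return f"{FACE_LOCK}\n\n{video_prompt}"


portraits = {"front": "alice/front.png", "side": "alice/side.png"}
print(select_references(portraits, True, ["alice/side.png", "shot0.png", "shot1.png"]))
# ['alice/front.png', 'alice/side.png', 'shot0.png']
```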
Real photos can be supplied through `character_portraits_registry` to skip AI portrait generation.

See `FACE_IDENTITY_GUIDE.md` in the ViMax directory for full details.
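The headline figure is a relative reduction: going from a face distance of 0.74 to 0.22 gives (0.74 - 0.22) / 0.74 ≈ 0.70. The document does not state how face distance is measured; the sketch below assumes cosine distance between face embedding vectors (lower means more similar), which is one common choice, and shows the arithmetic.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity of two embeddings; 0.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / norm


def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement for a metric where lower is better."""
    return (before - after) / before


print(round(relative_reduction(0.74, 0.22), 3))  # 0.703
```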
After a run, the working directory contains:
```text
.working_dir/script2video/
    characters.json                     # Extracted characters
    character_portraits_registry.json   # Portrait paths registry
    character_portraits/                # Generated portraits
        0_CharacterName/
            front.png
            side.png
            back.png
    storyboard.json                     # Shot descriptions
    camera_tree.json                    # Camera relationships
    shots/
        0/
            shot_description.json
            first_frame.png
            last_frame.png              # only for medium/large variation
            video.mp4
        1/
            ...
    final_video.mp4                     # Final concatenated output
```
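The per-shot layout depends on each shot's variation type: a last frame is generated only for medium or large variation. The sketch below models this with a dataclass whose field names come from `ShotDescription` (`ff_desc`, `lf_desc`, `motion_desc`, `variation_type`); the field types, the `"small"`/`"medium"`/`"large"` values, and the `expected_shot_files` helper are assumptions for illustration, useful for spotting shots that did not finish.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ShotDescription:
    # Field names follow ViMax's ShotDescription; the types are assumptions.
    ff_desc: str              # first-frame description
    lf_desc: Optional[str]    # last-frame description, if the shot has one
    motion_desc: str          # motion within the shot
    variation_type: str       # assumed values: "small", "medium", "large"


def expected_shot_files(working_dir: str, idx: int, shot: ShotDescription) -> List[Path]:
    """Return the files a finished shots/<idx>/ directory should contain."""
    shot_dir = Path(working_dir) / "shots" / str(idx)
    names = ["shot_description.json", "first_frame.png"]
    if shot.variation_type in ("medium", "large"):
        names.append("last_frame.png")  # only generated for larger variation
    names.append("video.mp4")
    return [shot_dir / name for name in names]


shot = ShotDescription("Alice at the door", None, "static camera", "small")
missing = [p for p in expected_shot_files(".working_dir/script2video", 0, shot) if not p.exists()]
print([p.name for p in missing])
```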
To use real photos instead of AI-generated portraits:
```python
import asyncio

# Build a portrait registry pointing to your photos
character_portraits_registry = {
    "Alice": {
        "front": {"path": "/path/to/alice_front.png", "description": "Front view of Alice"},
        "side": {"path": "/path/to/alice_side.png", "description": "Side view of Alice"},
        "back": {"path": "/path/to/alice_back.png", "description": "Back view of Alice"},
    }
}

# Pass the registry to the pipeline (skips portrait generation).
# `pipeline`, `script`, `user_requirement`, and `style` are set up as in the
# programmatic example above.
asyncio.run(pipeline(
    script=script,
    user_requirement=user_requirement,
    style=style,
    character_portraits_registry=character_portraits_registry,
))
```
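When photos follow a consistent naming scheme, the registry can be built programmatically instead of by hand. This sketch assumes a hypothetical `<Name>_<view>.png` convention (for example `Alice_front.png`) in one folder and produces the same `{"path", "description"}` shape as the example above. Requiring a front photo is a design choice of the sketch, motivated by the reference selector's front-view priority; whether ViMax accepts a registry with missing views is not stated here.

```python
from pathlib import Path
from typing import Dict, List

VIEWS = ("front", "side", "back")


def build_registry(photo_dir: str, names: List[str]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Build a character_portraits_registry from <Name>_<view>.png files."""
    registry: Dict[str, Dict[str, Dict[str, str]]] = {}
    for name in names:
        views: Dict[str, Dict[str, str]] = {}
        for view in VIEWS:
            path = Path(photo_dir) / f"{name}_{view}.png"  # hypothetical naming scheme
            if path.is_file():
                views[view] = {
                    "path": str(path),
                    "description": f"{view.capitalize()} view of {name}",
                }
        if not views:
            continue  # no photos for this character
        if "front" not in views:
            # The reference selector prioritizes the front view for face identity.
            raise ValueError(f"{name}: a front photo is required")
        registry[name] = views
    return registry
```

Usage: `character_portraits_registry = build_registry("/path/to/photos", ["Alice"])`, then pass the result to the pipeline as shown above.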
Edit the YAML config to use different models:
- Image generator: `google/gemini-3.1-flash-image-preview` (recommended for face identity)
- Video generator: `google/veo-3.1-generate-preview` (recommended) or `openai/sora-2`
- Chat model: `gpt-4.1` (recommended) or any OpenAI-compatible model

Always run from the ViMax root directory:
```bash
cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_script2video.py
```
If the gateway rate-limits requests, reduce `max_requests_per_minute` in the YAML config.
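`max_requests_per_minute` caps how often the pipeline calls the gateway. ViMax's own limiter is not described here; the sketch below is a generic sliding-window limiter for asyncio code that illustrates what such a setting controls. The `RateLimiter` class and its `acquire` method are hypothetical names.

```python
import asyncio
import time
from collections import deque
from typing import Deque, Optional


class RateLimiter:
    """Allow at most `max_requests` grants in any rolling `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window = window                      # 60 s gives a per-minute cap
        self.stamps: Deque[float] = deque()       # monotonic times of recent grants
        self.lock: Optional[asyncio.Lock] = None  # created lazily inside the event loop

    async def acquire(self) -> None:
        if self.lock is None:
            self.lock = asyncio.Lock()
        async with self.lock:  # waiters queue here, so grants happen one at a time
            while True:
                now = time.monotonic()
                # Forget grants that have left the rolling window.
                while self.stamps and now - self.stamps[0] >= self.window:
                    self.stamps.popleft()
                if len(self.stamps) < self.max_requests:
                    self.stamps.append(now)
                    return
                # Sleep until the oldest grant expires, then re-check.
                await asyncio.sleep(self.window - (now - self.stamps[0]))


async def demo() -> float:
    # Short 0.2 s window so the example finishes quickly.
    limiter = RateLimiter(max_requests=2, window=0.2)
    start = time.monotonic()
    for _ in range(3):
        await limiter.acquire()  # the third call waits for the window to roll
    return time.monotonic() - start


print(asyncio.run(demo()) >= 0.2)  # True
```

Lowering the cap trades throughput for fewer rejected calls; each generator call would `await limiter.acquire()` before hitting the API.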