/SKILL.md
Plans cinematic images by filling a 12-field director sheet (logline, subject, action, environment, time and era, shot type, camera and lens, lighting, composition, color grade, mood, deliverable), plus an optional anti-list, that compiles into a model-optimized prompt. Use when generating or iterating on images with GPT Image 2, OpenArt, Midjourney, Flux, Nano Banana, Ideogram, or any other text-to-image model, or when the user mentions cinematic shots, hero images, portraits, product photos, album covers, concept art, or editorial imagery, or gives a short prompt whose creative intent is richer than the text typed.
One-line principle: Think like a film director, not a typist. Plan the shot before you render it.
A short creative prompt ("a woman drinking coffee at sunrise") is ambiguous in twelve ways — lens, framing, subject identity, emotional beat, light direction, location specificity, wardrobe, color grade, era, medium, mood, aspect ratio. Image models silently pick one combination; creators usually meant another. The result is retries.
Scene Director forces an explicit reasoning pass: extract intent → fill a structured director sheet → compile the sheet into a model-optimized prompt → generate → critique → revise the sheet (not the prompt).
This skill pairs especially well with GPT Image 2, whose reasoning layer can act on structured input. It also works with any diffusion model — the sheet compiles differently per target.
Use this skill when:
Do NOT use this skill when:
User intent (short prompt)
↓
[1] Interrogate — ask at most 2 clarifying questions, and ONLY when the ambiguity is load-bearing for the intent
↓
[2] Fill the Director Sheet (12 fields, see Quick Reference)
↓
[3] Compile — render sheet into a model-specific prompt string (see references/gpt-image-2-prompting.md)
↓
[4] Generate — call the image model
↓
[5] Critique — compare output to sheet. What's off?
↓
[6] Revise the SHEET (not the prompt). Re-compile. Re-generate.
The key move is step 6. Editing the sheet keeps intent traceable across iterations; editing raw prompt strings loses the plot.
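The loop in steps 3 to 6 can be sketched as below. This is a minimal illustration, not the skill's own implementation: `iterate` and the three `fake_*` functions are hypothetical names, and the stubs stand in for a real compiler, image model, and critique pass.

```python
from typing import Callable, Dict, List, Tuple

Sheet = Dict[str, str]


def iterate(
    sheet: Sheet,
    compile_fn: Callable[[Sheet], str],
    generate_fn: Callable[[str], str],
    critique_fn: Callable[[Sheet, str], Sheet],
    max_rounds: int = 3,
) -> Tuple[Sheet, List[Sheet]]:
    """Compile, generate, critique, then revise the sheet (never the prompt)."""
    history: List[Sheet] = []
    for _ in range(max_rounds):
        prompt = compile_fn(sheet)          # step 3: sheet -> prompt string
        image = generate_fn(prompt)         # step 4: call the image model
        fixes = critique_fn(sheet, image)   # step 5: {field: corrected value}
        history.append(dict(sheet))         # every sheet version stays traceable
        if not fixes:
            break
        sheet = {**sheet, **fixes}          # step 6: edit fields, then loop
    return sheet, history


# Stubs for illustration only: the "image" is just the prompt text.
def fake_compile(sheet: Sheet) -> str:
    return ". ".join(f"{name}: {value}" for name, value in sheet.items())


def fake_generate(prompt: str) -> str:
    return prompt


def fake_critique(sheet: Sheet, image: str) -> Sheet:
    # Pretend the output read as too upbeat, so the mood field gets revised.
    return {"mood": "weary, private"} if "hopeful" in image else {}


final, history = iterate(
    {"subject": "founder", "mood": "weary, hopeful"},
    fake_compile, fake_generate, fake_critique,
)
print(final["mood"], len(history))
```

Because the prompt is rebuilt from the sheet on every round, the `history` list is a complete record of how intent changed between generations.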
| # | Field | Question it answers | Example |
|---|-------|---------------------|---------|
| 1 | Logline | One sentence, what is this image? | "A tired founder reads a term sheet at 2am." |
| 2 | Subject | Who/what, specifically? Age, build, wardrobe, ethnicity if relevant. | "Woman, 30s, South Asian, hoodie over pajamas, hair up" |
| 3 | Action / beat | What exact moment? | "Mid-sigh, pen held mid-air, eyes fixed on paper" |
| 4 | Environment | Where, and what tells us that? | "Home office, IKEA desk, plant, window showing city lights" |
| 5 | Time & era | When, literally. | "2am, present day (2026)" |
| 6 | Shot type | Wide / medium / close-up / extreme close-up / OTS | "Medium close-up" |
| 7 | Camera & lens | Focal length, angle, height | "35mm, eye-level, slight Dutch tilt" |
| 8 | Lighting | Key + fill + practical sources, direction, quality | "Warm desk lamp key from camera-right, cool window fill from behind, practical laptop glow on face" |
| 9 | Composition | Rule of thirds / centered / leading lines / negative space | "Subject on left third, paper fills bottom-right, negative space top-left for title" |
| 10 | Color & grade | Palette, contrast, film stock | "Teal shadows, amber highlights, Kodak Portra 400 feel, gentle grain" |
| 11 | Mood | Emotional tone in 2-3 words | "Weary, hopeful, private" |
| 12 | Deliverable | Aspect ratio, output size, use case | "3:2 landscape, hero image for pitch deck cover slide" |
An optional 13th field — Anti-list: things to explicitly exclude ("no stock-photo smile, no plastic skin, no generic office"). For GPT Image 2 this maps cleanly to negative guidance.
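As a data structure, the sheet might look like the sketch below. The class name, attribute names, and the `missing_fields` helper are assumptions made for illustration; the skill's actual schema may differ.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DirectorSheet:
    """Hypothetical container: 12 core fields plus the optional anti-list."""
    logline: str = ""
    subject: str = ""
    action: str = ""
    environment: str = ""
    time_era: str = ""
    shot_type: str = ""
    camera_lens: str = ""
    lighting: str = ""
    composition: str = ""
    color_grade: str = ""
    mood: str = ""
    deliverable: str = ""
    anti_list: List[str] = field(default_factory=list)  # optional 13th field

    def missing_fields(self) -> List[str]:
        """Names of the 12 core fields that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "anti_list" and not getattr(self, f.name).strip()
        ]


sheet = DirectorSheet(
    logline="A tired founder reads a term sheet at 2am.",
    shot_type="Medium close-up",
    mood="Weary, hopeful, private",
    anti_list=["stock-photo smile", "plastic skin"],
)
print(sheet.missing_fields())  # the nine core fields not yet filled
```

Keeping the anti-list as a separate list rather than a free-text field makes it easy to append exclusions during critique without touching the other fields.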
Before (raw prompt, typical user input):
cinematic shot of a founder reading term sheet at 2am, moody
After (director sheet compiled for GPT Image 2):
A medium close-up of a South Asian woman in her 30s, wearing a soft grey hoodie
over pajamas, hair loosely tied up. She is mid-sigh, holding a pen in her right
hand just above a paper term sheet on a cluttered IKEA desk. Eyes locked on the
document. Home office at 2am, plants, a window behind her revealing out-of-focus
city lights. Shot on a 35mm lens, eye-level, very subtle Dutch tilt. Key light
is a warm tungsten desk lamp from camera-right; cool blue fill spills from the
window behind; a practical laptop glow lifts her face from below. Subject sits
on the left third; paper fills the bottom-right; negative space top-left reserved
for title treatment. Teal shadows, amber highlights, Kodak Portra 400 color
response, gentle natural film grain. Mood: weary, hopeful, private. Photoreal,
3:2 landscape. Avoid: stock-photo smile, plastic skin, generic corporate office,
over-sharp digital look.
Same creative seed, far more faithful to intent, and no additional effort from the user: the skill did the work.
See references/gpt-image-2-prompting.md for the full compilation ruleset, and assets/director-sheet-template.md for a fillable template.
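To make the compile step concrete, here is a deliberately naive sketch that joins filled fields into one prose string and appends the anti-list as an "Avoid:" clause, mirroring the example above. The function name, field keys, and field ordering are assumptions; this is not the skill's compilation ruleset, which does more than concatenate.

```python
# Assumed ordering: framing first, then subject and scene, then technical fields.
BODY_FIELDS = [
    "shot_type", "subject", "action", "environment", "time_era",
    "camera_lens", "lighting", "composition", "color_grade",
]


def compile_prompt(sheet: dict) -> str:
    """Render a sheet (plain dict) into a single prompt string."""
    parts = []
    for name in BODY_FIELDS:
        value = str(sheet.get(name, "")).strip()
        if value:  # empty fields are skipped rather than padded
            parts.append(value.rstrip(".") + ".")
    mood = str(sheet.get("mood", "")).strip()
    if mood:
        parts.append(f"Mood: {mood.lower().rstrip('.')}.")
    deliverable = str(sheet.get("deliverable", "")).strip()
    if deliverable:
        parts.append(deliverable.rstrip(".") + ".")
    anti = [item.strip() for item in sheet.get("anti_list", []) if item.strip()]
    if anti:
        parts.append("Avoid: " + ", ".join(anti) + ".")
    return " ".join(parts)


prompt = compile_prompt({
    "shot_type": "Medium close-up",
    "subject": "Woman, 30s, hoodie over pajamas",
    "mood": "Weary, hopeful, private",
    "deliverable": "3:2 landscape",
    "anti_list": ["stock-photo smile", "plastic skin"],
})
print(prompt)
```

The logline is left out of the rendered string in this sketch on the assumption that it summarizes the other fields; a real compiler might use it differently.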
When iterating, ALWAYS edit fields on the sheet, not tokens in the prompt. Rationale: prompts are lossy projections of intent; sheets are the source of truth. A user saying "warmer" should update field 10 (color & grade) and field 8 (lighting temperature), after which the prompt recompiles automatically. This is how you get predictable iteration at 8M-MAU scale — the sheet is diffable, the prompt is not.
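The "diffable" claim can be shown with a small sketch. It assumes the sheet is held as a plain dict keyed by field name; `diff_sheets` is a hypothetical helper, and the revised field values are invented for the example.

```python
def diff_sheets(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that changed."""
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


v1 = {
    "lighting": "Warm desk lamp key, cool window fill",
    "color_grade": "Teal shadows, amber highlights",
    "mood": "Weary, hopeful, private",
}

# User feedback "warmer": edit fields 8 and 10, leave everything else alone.
v2 = {
    **v1,
    "lighting": "Warm desk lamp key, warm tungsten fill",
    "color_grade": "Amber shadows, golden highlights",
}

changes = diff_sheets(v1, v2)
for name, (before, after) in changes.items():
    print(f"{name}: {before!r} -> {after!r}")
```

A field-level diff like this states which part of the intent moved between two generations, which a token-level diff of two prompt strings cannot do reliably.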
These rules exist because default model output trends toward a "plastic" look. Enforce them in the compile step unless the user's logline explicitly calls for a stylized/illustrative result.
User says "make it moodier" → don't just add the word "moody". Walk the sheet:
Then recompile. The difference between amateur and professional image direction is which field you edit.
If a reference image is provided, invert the workflow: extract the sheet FROM the reference (what shot type, what lens, what grade is this?), then have the user modify specific fields. This prevents the common failure mode of "make it look like this but different" where the model keeps no anchor.
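Once a sheet has been extracted from the reference, the user's changes can be applied as field overrides while everything else stays anchored. The sketch below assumes dict-based sheets; `apply_overrides` is a hypothetical helper, and the extraction itself (reading shot type, lens, and grade off an image) is out of scope here.

```python
def apply_overrides(reference_sheet: dict, overrides: dict):
    """Return (new_sheet, anchored_fields) after applying the user's edits."""
    unknown = set(overrides) - set(reference_sheet)
    if unknown:
        # Reject edits that do not correspond to a field on the extracted sheet.
        raise KeyError(f"not a sheet field: {sorted(unknown)}")
    new_sheet = {**reference_sheet, **overrides}
    anchored = [name for name in reference_sheet if name not in overrides]
    return new_sheet, anchored


# Sheet as it might be extracted from a reference image (values invented).
reference = {
    "shot_type": "Medium close-up",
    "camera_lens": "35mm, eye-level",
    "color_grade": "Teal shadows, amber highlights",
}

new_sheet, anchored = apply_overrides(
    reference, {"color_grade": "Muted pastels, low contrast"}
)
print(anchored)  # fields still inherited from the reference
```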
| Mistake | Fix |
|---------|-----|
| Asking the user 10 clarifying questions | Ask at most 2; fill the rest with sensible defaults and flag them explicitly in the sheet |
| Over-specifying (every field padded with adjectives) | Minimum viable sheet; leave fields terse when intent is clear |
| Describing the image instead of the moment | Describe action as a verb in a beat, not a pose |
| Forgetting the anti-list | GPT Image 2 benefits enormously from explicit negative guidance |
| Editing the compiled prompt instead of the sheet | Treat the prompt as a build artifact; never hand-edit |
| Pretending to know the subject's identity | For fictional people, describe visually; never invent a real named person |
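The first fix in the table (fill defaults, but flag them) might be sketched as follows. The default values and the `fill_defaults` name are illustrative assumptions, not defaults prescribed by the skill.

```python
# Hypothetical defaults, used only when the user gave no signal for a field.
DEFAULTS = {
    "shot_type": "Medium shot",
    "camera_lens": "50mm, eye-level",
    "deliverable": "3:2 landscape",
}


def fill_defaults(sheet: dict, defaults: dict = DEFAULTS):
    """Return (filled_sheet, assumed_fields) without mutating the input."""
    filled = dict(sheet)
    assumed = []
    for name, value in defaults.items():
        if not str(filled.get(name, "")).strip():
            filled[name] = value
            assumed.append(name)  # flag it so the user can see what was guessed
    return filled, assumed


filled, assumed = fill_defaults(
    {"subject": "Woman, 30s", "shot_type": "Medium close-up"}
)
print("Assumed:", assumed)
```

Returning the list of assumed fields alongside the sheet lets the assistant surface its guesses in one line instead of asking a question per field.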
--ar, --stylize, --chaos at the tail.
See references/ for per-model compile rules.
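For targets that take parameters as trailing flags, the compile step can append them after the prose. The sketch below assumes Midjourney-style `--ar`, `--stylize`, and `--chaos` parameters; the function name and the example values are hypothetical, and valid ranges should be checked against the target model's documentation.

```python
from typing import Optional


def with_tail_flags(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    stylize: Optional[int] = None,
    chaos: Optional[int] = None,
) -> str:
    """Append parameter flags after the prose, omitting any that are unset."""
    flags = []
    if aspect_ratio:
        flags.append(f"--ar {aspect_ratio}")
    if stylize is not None:
        flags.append(f"--stylize {stylize}")
    if chaos is not None:
        flags.append(f"--chaos {chaos}")
    return " ".join([prompt.strip(), *flags])


tailed = with_tail_flags(
    "Medium close-up of a founder at a desk", aspect_ratio="3:2", stylize=100
)
print(tailed)
```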
- SKILL.md: this file
- references/director-sheet-schema.md: the full JSON schema for programmatic use
- references/cinematic-vocabulary.md: shot types, lens characteristics, lighting terms, film stocks
- references/gpt-image-2-prompting.md: model-specific compile rules and examples
- assets/director-sheet-template.md: fillable markdown template for humans
- scripts/render_prompt.py: converts a director-sheet JSON into a compiled prompt string
- evals/evals.json: three realistic test prompts for red/green testing

Image models don't fail at rendering. They fail at guessing what you meant. Scene Director closes that gap with a sheet, a compile step, and an iteration protocol that edits intent rather than text.
For OpenArt's 8M MAU, the value is mechanical: every retry that becomes a sheet-edit is one less generation spent chasing a feeling the model couldn't read from a 12-word prompt.