skill-candidates/local-comfy-wan-multigpu/SKILL.md
Operate local-only ComfyUI Wan image-to-video workflows with MultiGPU placement, quality-first source staging, long-clip validation, and Explorer-based export handoff. Use when Codex needs to turn private local stills into local video artifacts without browser-only UX, while avoiding destructive crops, stray console windows, and GPU contention with chat models.
Install globally with `npx skillsauth add grtninja/skill-arbiter local-comfy-wan-multigpu`.
Use this skill when the goal is a real local media-generation pass, not just an API or health smoke test.
All local paths and endpoints in this skill are intentionally parameterized.
Set environment variables (for example LOCAL_MEDIA_WORKBENCH_TOOLS_ROOT, LOCAL_COMFY_*, and LOCAL_MUSETALK_REPO) instead of hardcoding workstation-specific paths.
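Here is a minimal sketch of resolving these parameterized paths in Python, assuming the scripts read their configuration from the environment. It fails fast when a required variable is missing, so no hardcoded fallback path is ever used. `LOCAL_COMFY_ROOT` and `LOCAL_COMFY_BASE_URL` are hypothetical members of the `LOCAL_COMFY_*` family; substitute the names the workbench actually defines.

```python
import os
from pathlib import Path

# LOCAL_MEDIA_WORKBENCH_TOOLS_ROOT and LOCAL_MUSETALK_REPO come from this skill;
# LOCAL_COMFY_ROOT is a hypothetical example of the open-ended LOCAL_COMFY_* family.
REQUIRED_VARS = (
    "LOCAL_MEDIA_WORKBENCH_TOOLS_ROOT",
    "LOCAL_COMFY_ROOT",
    "LOCAL_MUSETALK_REPO",
)


def resolve_local_paths(env=None, required=REQUIRED_VARS):
    """Return {name: Path} for each required variable; raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast instead of guessing a workstation-specific default.
        raise RuntimeError("set these environment variables first: " + ", ".join(missing))
    return {name: Path(env[name]).expanduser() for name in required}


def comfy_settings(env=None, prefix="LOCAL_COMFY_"):
    """Collect every LOCAL_COMFY_* variable without hardcoding which ones exist."""
    env = os.environ if env is None else env
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
```

Passing an explicit `env` dict keeps the helpers testable without touching the real environment.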
- Run `scripts/init_media_job.py`.
- Run `scripts/build_contact_sheet.py`.
- Use `scripts/stage_reference_frame.py` for deterministic fit or pad staging.
- Separate `smoke_test` from `quality_render`:
  - `smoke_test`: shortest cheap proof only.
  - `quality_render`: choose a properly framed source and push the workflow to the longest supported clip length.
- Inspect clip limits with `scripts/inspect_wan_limits.py`.
- Treat `16 fps` as the native Wan baseline unless a specific workflow proves otherwise.
- Do not pin Comfy to one GPU with `--cuda-device`; keep both CUDA devices visible and place loaders deliberately.
- Launch Comfy with `scripts/launch_comfy_headless.py`.
- Use `scripts/run_wan_flf2v.py` for first/last-frame quality renders and living-portrait baselines.
- Use `scripts/run_keyframe_range_pipeline.py` when the user supplies a range of approved keyframes that should become a single complete video.
- Use `scripts/run_complete_media_loop.py` when the job should drive staging, raw motion, Jessica TTS, MuseTalk, QC, and export packaging in one pass.
- Review results with `scripts/review_video_result.py`.
- `face-yunet`, `face-landmarks`, `face-headpose`, `face-eye-state` for source preflight
- `pose-simcc`, `emotion-ferplus`, `face-eye-state` for raw-render body/eye/expression QC
- `segmentation-bisenet`, `depth-hr`, and the `DaSiamRPN` tracker pair for masks, ROI tracking, and mouth-region inspection
- `audio-speech-enhancement` only for recorded/noisy audio, not clean Jessica TTS
- Prefer first/last-frame nodes such as `WanFirstLastFrameToVideo` over a plain one-way image-to-video run when they are available locally.
- `scripts/prepare_capcut_loop_handoff.py` for the preferred forward + reverse editor handoff
- `scripts/run_loop_vfi_master.py` for a Comfy-based VFI master
- `scripts/make_loop_master.py` only as the fallback ffmpeg-only lane
- When the target is `30` or `60 fps`, render Wan at native cadence first and master upward with dedicated VFI instead of treating Wan itself as the final 30/60 fps source.
- Budget `18 minutes` per raw slice with a `30 minute` hard cap.
- Keep each accepted spoken slice to `1 sentence`, `6-14 words`, and `<= 7s`.
- Use the `af_jessica` lane unless the user explicitly chooses another approved voice.
- Use `scripts/synthesize_kokoro_tts.py` with the local Kokoro model files when the workbench needs a deterministic offline TTS source.
- Use `scripts/run_musetalk_pass.py` for the current animated-source spoken pass lane.
- Use `scripts/analyze_spoken_clip.py` to compare the spoken output against the expected line.
- Treat `LatentSync 1.6` as the primary quality-upgrade target for mouth clarity; treat MuseTalk as the current fallback/baseline until the local LatentSync lane is fully installed.
- Normalize to `25 fps` video and `16 kHz` mono audio before MuseTalk / LatentSync-class processing.
- Skip `30/60 fps` variants unless the user explicitly asks for VFI masters.
- Deliver the `25 fps` clip by default.
- Treat `30/60 fps` as opt-in post-review variants, not the automatic default.
- Run `<WORKBENCH_TOOLS>/evaluate_heterogeneous_media_pipeline.py`.
- Treat `<AMUSE_BASE_URL>` as a GUI/inventory bridge unless a future native execution API is proven.
- Use `DeviceSelectorMultiGPU` options as the source of truth when `/system_stats` underreports CUDA visibility.
- Validate bundles with `scripts/validate_wan_bundle.py`.
- Bundle order: `wan22_i2v_a14b_flf2v` -> `wan22_ti2v_5b` -> `wan22_s2v_14b` -> `wan22_animate_14b`
- `5060 Ti`: preferred raw Wan lane and heavier segment owner
- `4060 Ti`: secondary raw Wan lane with smaller slices matched to its measured throughput
- `7600 XT`: peer generation lane when a Radeon Comfy worker is healthy; otherwise use it continuously for AMUSE DirectML generation/repair/upscale, Qwen review, and sidecar critique
- Status: `<WORKBENCH_TOOLS>/set_mx3_media_assist_mode.py --mode status`
- Enable: `<WORKBENCH_TOOLS>/set_mx3_media_assist_mode.py --mode enable`
- Restore: `<WORKBENCH_TOOLS>/set_mx3_media_assist_mode.py --mode restore --from <saved-status-json>`
- Run `scripts/mx3_frame_gate.py` on each raw slice before lip-sync. The current implementation is a post-render frame-QC scaffold until a thinner dedicated MX3 face/headpose route is wired.
- Build the review report with `<WORKBENCH_TOOLS>/build_multilane_review_report.py`.
- Use `Qwen3.5-VL` for image review and Radeon Qwen3.5 text for distillation when useful, and include the current runtime evaluation.
- Use the `vision/analyze-frame` surface as an extra review lane.
- Start review on each `slice_completed` event; do not wait for the whole batch to finish before the Radeon and MX3 lanes start critiquing.
- Monitor with `<WORKBENCH_TOOLS>/monitor_segmented_batch.py --review-completed-slices --watch-until-idle`.
- Keep `raw_render` separate from `mastered_output`.
  A higher-fps looping master may be derived from the raw render when that gives a better result than a more expensive rerun.
- Do not call `6 fps` or similarly under-cranked output a useful dry run unless the user explicitly asked for a bare smoke test.
- … `af_jessica` lane is available.
- Do not produce `30/60 fps` variants until the `25 fps` reviewed master is accepted.

For Wan image-to-video lanes:
- Use the `5060 Ti` lane for high-noise UNet work and the `4060 Ti` lane for low-noise / helper work when both are visible to the runtime.

When the user asks for a useful render, inspect the live node/object schema or workflow template and choose the longest supported clip length that is practical on the current machine. Do not default to a 4-second or near-minimal clip unless the user asked for a quick smoke test.
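Here is one way to turn "the longest supported clip length that is practical" into a concrete frame count. The sketch assumes three things:

- The length input's options come from ComfyUI's `GET /object_info` payload in its usual `[type, options]` shape.
- The node under inspection exposes an INT input named `length`. In recent ComfyUI builds the stock Wan image-to-video nodes step that input by 4 from a minimum of 1, so valid lengths look like 81 rather than 80.
- The operator supplies `practical_cap`, a frame cap measured on the current machine.

All three helpers are hypothetical and not part of the skill's scripts.

```python
def length_spec(object_info, node_class, input_name="length"):
    """Pull the options dict for an INT input out of a /object_info-style payload."""
    inputs = object_info[node_class]["input"]
    for group in ("required", "optional"):
        entry = inputs.get(group, {}).get(input_name)
        if entry is not None:
            # Entries look like ["INT", {"default": 81, "min": 1, "max": ..., "step": 4}].
            return entry[1] if len(entry) > 1 else {}
    raise KeyError(f"{node_class} has no input named {input_name!r}")


def longest_supported_length(spec, practical_cap):
    """Largest length the node accepts (min + k * step) that stays within practical_cap."""
    lo = int(spec.get("min", 1))
    hi = min(int(spec.get("max", practical_cap)), int(practical_cap))
    step = max(int(spec.get("step", 1)), 1)
    if hi < lo:
        raise ValueError("practical cap is below the node's minimum length")
    # Snap down onto the node's grid of valid values.
    return lo + ((hi - lo) // step) * step


def clip_seconds(frames, fps=16):
    """Clip duration at the native Wan cadence assumed by this skill."""
    return frames / fps
```

With a practical cap of 100 frames this returns 97, about 6 s at 16 fps. The cap itself should come from measured throughput on the machine, not from a guess.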
For Wan image-to-video work, prefer the official/native cadence and master upward later:
- `16 fps`
- `81 frames` at a 480p-class area or portrait-equivalent area
- `RIFE` or `FILM` after raw approval

Use this skill for direct local ComfyUI/Wan media work.
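The native-cadence list above implies some arithmetic worth making explicit before choosing a VFI pass. The sketch assumes an integer-multiplier interpolator that inserts `k - 1` new frames between each pair of raw frames. That is a common RIFE/FILM setup, but check the actual node's frame-count behavior. `vfi_plan` is a hypothetical helper, not one of the skill's scripts.

```python
import math

NATIVE_WAN_FPS = 16  # native Wan cadence named in this skill


def vfi_plan(raw_frames, target_fps, native_fps=NATIVE_WAN_FPS):
    """Plan an integer-multiplier VFI pass from native cadence up toward target_fps."""
    # Smallest multiplier k with native_fps * k >= target_fps (never below 1).
    multiplier = max(1, math.ceil(target_fps / native_fps))
    # Assumes k - 1 interpolated frames between each of the raw_frames - 1 gaps.
    vfi_frames = (raw_frames - 1) * multiplier + 1
    vfi_fps = native_fps * multiplier
    return {
        "multiplier": multiplier,
        "vfi_frames": vfi_frames,
        "vfi_fps": vfi_fps,
        "raw_seconds": raw_frames / native_fps,
        # e.g. 32 -> 30 fps still needs a resample step after interpolation.
        "needs_retime": vfi_fps != target_fps,
    }
```

For the default 81-frame raw render, a 30 fps target means a 2x pass to 161 frames at 32 fps, followed by a resample such as ffmpeg's `fps=30` filter. A 60 fps target means a 4x pass to 321 frames at 64 fps.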
Do not use this skill for:

References:

- `references/complete-work-loop.md`
- `references/media-mode-checklist.md`
- `references/mx3-media-assist.md`
- `references/penny-continuity-contract.md`
- `references/review-loop.md`
- `references/source-selection-rules.md`
- `references/wan-native-vfi-best-practices.md`

If the lane is blocked or the render is poor: