skills/vllm-omni-video-gen/SKILL.md
Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.
npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-video-gen
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
vLLM-Omni supports video generation through diffusion transformer models, primarily the Wan2.2 family. Three modes are supported: text-to-video (T2V), image-to-video (I2V), and text+image-to-video (TI2V).
| Model | HF ID | Mode | Min VRAM |
|-------|-------|------|----------|
| Wan2.2-T2V-A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers | Text-to-video | 48 GB |
| Wan2.2-TI2V-5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | Text+Image-to-video | 24 GB |
| Wan2.2-I2V-A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | Image-to-video | 48 GB |
| NextStep-1.1 | stepfun-ai/NextStep-1.1 | Text-to-video | 24 GB |
| Helios-Distilled | naver-ai/Helios-Distilled | Text-to-video | 24 GB |
| daVinci-MagiHuman | SII-GAIR/daVinci-MagiHuman-Base-1080p | Image-to-video + audio | 24 GB |
daVinci-MagiHuman is an image-to-video model that also generates audio (44100 Hz audio, 25 fps video). Use --enable-diffusion-pipeline-profiler to get per-stage timing (stage_durations) and peak memory (peak_memory_mb) in video responses (in the polled JSON for async requests, or in the HTTP headers for sync requests).
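The model table above can be used as a quick lookup when choosing a model for a given GPU. The following is a minimal sketch that copies the table rows into plain Python data; `VideoModel`, `MODELS`, and `models_that_fit` are hypothetical helper names for illustration and are not part of vLLM-Omni.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VideoModel:
    hf_id: str        # Hugging Face model ID from the table
    mode: str         # Mode column from the table
    min_vram_gb: int  # Min VRAM column from the table


# Rows copied from the supported-models table in this document.
MODELS = [
    VideoModel("Wan-AI/Wan2.2-T2V-A14B-Diffusers", "Text-to-video", 48),
    VideoModel("Wan-AI/Wan2.2-TI2V-5B-Diffusers", "Text+Image-to-video", 24),
    VideoModel("Wan-AI/Wan2.2-I2V-A14B-Diffusers", "Image-to-video", 48),
    VideoModel("stepfun-ai/NextStep-1.1", "Text-to-video", 24),
    VideoModel("naver-ai/Helios-Distilled", "Text-to-video", 24),
    VideoModel("SII-GAIR/daVinci-MagiHuman-Base-1080p", "Image-to-video + audio", 24),
]


def models_that_fit(available_vram_gb: float, mode: Optional[str] = None) -> List[str]:
    """Return HF IDs whose listed minimum VRAM fits, optionally filtered by mode."""
    return [
        m.hf_id
        for m in MODELS
        if m.min_vram_gb <= available_vram_gb and (mode is None or m.mode == mode)
    ]


# Example: text-to-video candidates for a 24 GB GPU.
candidates = models_that_fit(24, mode="Text-to-video")
```

The Min VRAM figures are the table's stated minimums; actual requirements will vary with resolution, frame count, and offloading settings.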
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")
outputs = omni.generate("A dog running on a beach at sunset")
video = outputs[0].request_output[0].video
video.save("dog_beach.mp4")
vllm serve Wan-AI/Wan2.2-T2V-A14B-Diffusers --omni --port 8091
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "A dog running on a beach at sunset"}],
    "extra_body": {
      "num_inference_steps": 50,
      "guidance_scale": 5.0,
      "seed": 42
    }
  }'
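The same request can be assembled in Python. This is a sketch that mirrors the JSON body of the curl example above using only the standard library; `build_video_request` and `post_json` are hypothetical helper names, and the response format is not assumed here, so the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_video_request(prompt, num_inference_steps=50, guidance_scale=5.0, seed=42):
    """Build the request body used in the curl example above."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "num_inference_steps": num_inference_steps,
            "guidance_scale": guidance_scale,
            "seed": seed,
        },
    }


def post_json(url, payload, timeout=600):
    """POST a JSON payload and return the decoded JSON response (requires a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_video_request("A dog running on a beach at sunset")
body = json.dumps(payload)

# With the server from the vllm serve command running, the request would be sent as:
# result = post_json("http://localhost:8091/v1/chat/completions", payload)
```

Keeping payload construction separate from the network call makes it easy to vary the seed or step count across requests and to inspect the body before sending it.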
Animate a static image into a video:
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
outputs = omni.generate(
    prompt="The person starts walking forward",
    images=["portrait.jpg"],
)
outputs[0].request_output[0].video.save("animated.mp4")
Combine a text description and reference image:
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Wan-AI/Wan2.2-TI2V-5B-Diffusers")
outputs = omni.generate(
    prompt="The city lights up at night with moving traffic",
    images=["cityscape.jpg"],
)
outputs[0].request_output[0].video.save("city_night.mp4")
| Parameter | Description | Typical Range |
|-----------|-------------|---------------|
| num_inference_steps | Denoising steps | 30-100 |
| guidance_scale | CFG scale | 3.0-7.0 |
| seed | Random seed | Any integer |
| num_frames | Number of output frames | Model-dependent |
| fps | Frames per second | 8-24 |
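Clip length follows directly from two of these parameters: duration in seconds is num_frames divided by fps. The sketch below shows that arithmetic and a simple check against the typical ranges in the table; the helper names are hypothetical, and the sample values (81 frames at 16 fps) are illustrative rather than any model's defaults.

```python
# Typical ranges copied from the parameter table above (inclusive bounds).
TYPICAL_RANGES = {
    "num_inference_steps": (30, 100),
    "guidance_scale": (3.0, 7.0),
    "fps": (8, 24),
}


def clip_duration_seconds(num_frames, fps):
    """Playback length of a clip: frames divided by frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds, fps):
    """Nearest whole frame count for a target duration at a given fps."""
    return round(seconds * fps)


def outside_typical_range(params):
    """Return names of parameters whose values fall outside the table's typical ranges."""
    return [
        name
        for name, (low, high) in TYPICAL_RANGES.items()
        if name in params and not (low <= params[name] <= high)
    ]


duration = clip_duration_seconds(81, 16)   # 5.0625 seconds
frames = frames_for_duration(5, 24)        # 120 frames
flagged = outside_typical_range(
    {"num_inference_steps": 20, "guidance_scale": 5.0, "fps": 16}
)                                          # ["num_inference_steps"]
```

Because num_frames is model-dependent, a frame count computed this way may still need adjusting to whatever values a given model accepts.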
Video generation is significantly more compute-intensive than image generation. To reduce GPU memory use, part of the model can be offloaded to CPU memory:
vllm serve <model> --omni --cpu-offload-gb 20
transformer + transformer-2), the sequential offloader now offloads all other DiTs to CPU when any one is running. This allows Wan2.2-T2V to fit on 64 GB GPUs with --enable-cpu-offload --tensor-parallel-size 2.
Generation too slow: Use tensor parallelism or enable TeaCache/Cache-DiT acceleration. Helios supports cache-dit (--enable-cache-dit) for ~20% speedup.
LTX-2 error with diffusers>=0.38.0: Fixed in #3661. Text encoder normalization moved into the diffusers connector. Update vllm-omni to the latest version when upgrading diffusers to 0.38.0+.
Out of memory: Reduce resolution/frame count or use CPU offloading.
Choppy output: Increase num_inference_steps and num_frames.