hsliuustc0106/vllm-omni-video-gen

Name: vllm-omni-video-gen
Author: hsliuustc0106

skills/vllm-omni-video-gen/SKILL.md

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-video-gen

vLLM-Omni Video Generation

Overview

vLLM-Omni supports video generation through diffusion transformer models, primarily the Wan2.2 family. Three modes are supported: text-to-video (T2V), image-to-video (I2V), and text+image-to-video (TI2V).

Supported Video Models

| Model | HF ID | Mode | Min VRAM |
|-------|-------|------|----------|
| Wan2.2-T2V-A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers | Text-to-video | 48 GB |
| Wan2.2-TI2V-5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | Text+Image-to-video | 24 GB |
| Wan2.2-I2V-A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | Image-to-video | 48 GB |
| NextStep-1.1 | stepfun-ai/NextStep-1.1 | Text-to-video | 24 GB |
| Helios-Distilled | naver-ai/Helios-Distilled | Text-to-video | 24 GB |
| daVinci-MagiHuman | SII-GAIR/daVinci-MagiHuman-Base-1080p | Image-to-video + audio | 24 GB |

daVinci-MagiHuman is an image-to-video model that also generates audio (44100 Hz, 25 fps). Use --enable-diffusion-pipeline-profiler to get per-stage timing (stage_durations) and peak memory (peak_memory_mb) in video responses (async poll JSON or sync HTTP headers).
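As a rough illustration of how the profiler fields might be consumed, the sketch below finds the slowest stage in a response. Only the field names `stage_durations` and `peak_memory_mb` come from the text above. The payload shape (a flat mapping of stage name to a numeric duration), the stage names, and the helper `summarize_profile` are assumptions made for this example, so check a real response before relying on them.

```python
def summarize_profile(response: dict):
    """Return (slowest_stage, share_of_total, peak_memory_mb) or None.

    Assumes `stage_durations` maps stage names to numeric durations in a
    single unit; this shape is hypothetical, not confirmed by the source.
    """
    stages = response.get("stage_durations") or {}
    if not stages:
        return None  # profiler flag not enabled, or a different payload shape
    total = sum(stages.values())
    slowest = max(stages, key=stages.get)
    share = stages[slowest] / total if total else 0.0
    return slowest, share, response.get("peak_memory_mb")


# Hypothetical payload: stage names and values are made up for illustration.
sample = {
    "stage_durations": {"text_encode": 1.0, "denoise": 95.0, "vae_decode": 4.0},
    "peak_memory_mb": 41000,
}

summary = summarize_profile(sample)
if summary is not None:
    stage, share, peak = summary
    print(f"slowest stage: {stage} ({share:.0%} of total), peak memory: {peak} MB")
```

If the denoising stage dominates, as in this made-up payload, step count and caching are the first things to look at; if decoding dominates, resolution and frame count matter more.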

Quick Start: Text-to-Video

Offline

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")
outputs = omni.generate("A dog running on a beach at sunset")
video = outputs[0].request_output[0].video
video.save("dog_beach.mp4")

Online API

vllm serve Wan-AI/Wan2.2-T2V-A14B-Diffusers --omni --port 8091

curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "A dog running on a beach at sunset"}],
    "extra_body": {
      "num_inference_steps": 50,
      "guidance_scale": 5.0,
      "seed": 42
    }
  }'
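The same request can be sent from Python. This is a minimal sketch that uses only the standard library and mirrors the curl payload above. The helper names `build_payload` and `send_request` are illustrative and not part of vLLM-Omni, the timeout is a guess based on multi-minute generation times, and the response is returned as parsed JSON without assuming anything about its schema.

```python
import json
import urllib.request


def build_payload(prompt, num_inference_steps=50, guidance_scale=5.0, seed=42):
    """Build the same JSON body as the curl example (illustrative helper)."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "num_inference_steps": num_inference_steps,
            "guidance_scale": guidance_scale,
            "seed": seed,
        },
    }


def send_request(payload, url="http://localhost:8091/v1/chat/completions", timeout=1800):
    """POST the payload and return the parsed JSON response.

    Assumes the server started with `vllm serve ... --port 8091` is running
    locally. The long timeout is an assumption, not a documented default.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_payload("A dog running on a beach at sunset")
    print(json.dumps(body, indent=2))
    # Uncomment once the server is up:
    # result = send_request(body)
```

Keeping payload construction separate from the network call makes it easy to inspect or log the request body before spending several minutes of GPU time on it.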

Image-to-Video

Animate a static image into a video:

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
outputs = omni.generate(
    prompt="The person starts walking forward",
    images=["portrait.jpg"],
)
outputs[0].request_output[0].video.save("animated.mp4")

Text+Image-to-Video (TI2V)

Combine a text description and reference image:

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-TI2V-5B-Diffusers")
outputs = omni.generate(
    prompt="The city lights up at night with moving traffic",
    images=["cityscape.jpg"],
)
outputs[0].request_output[0].video.save("city_night.mp4")

Video Generation Parameters

| Parameter | Description | Typical Range |
|-----------|-------------|---------------|
| num_inference_steps | Denoising steps | 30-100 |
| guidance_scale | CFG scale | 3.0-7.0 |
| seed | Random seed | Any integer |
| num_frames | Number of output frames | Model-dependent |
| fps | Frames per second | 8-24 |
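The `num_frames` and `fps` parameters in the table above together determine clip length: duration is frames divided by frames per second. The sketch below works through that arithmetic. The helper names are illustrative, and individual models may restrict which frame counts are valid, so treat the results as a starting point rather than guaranteed-valid settings.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Length of the output clip in seconds."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds: float, fps: float) -> int:
    """Frame count needed for a target duration, rounded to the nearest frame."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return round(seconds * fps)


print(clip_duration_seconds(48, 24))  # 2.0
print(clip_duration_seconds(48, 8))   # 6.0
print(frames_for_duration(5, 16))     # 80
```

The same 48 frames last three times longer at 8 fps than at 24 fps, so lowering `fps` stretches a clip without adding generation cost, at the price of less smooth motion.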

Performance Considerations

Video generation is significantly more compute-intensive than image generation:

- A single video may take 2-10 minutes on a single GPU.
- Multi-GPU tensor parallelism is strongly recommended for 14B models.
- Multi-threaded weight loading (enabled by default) significantly reduces cold-start time for Wan2.2 models.
- Enable TeaCache for diffusion acceleration (see the vllm-omni-perf skill).

CPU offloading can help fit larger models:

vllm serve <model> --omni --cpu-offload-gb 20

For multi-transformer pipelines (e.g., Wan2.2-T2V has transformer + transformer-2), the sequential offloader now moves every other diffusion transformer (DiT) to CPU while any one of them is running. This allows Wan2.2-T2V to fit on 64 GB GPUs with --enable-cpu-offload --tensor-parallel-size 2.
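A back-of-the-envelope calculation shows why keeping only one transformer on the GPU helps. The sketch below assumes each transformer holds roughly 14 billion parameters (inferred from the "A14B" name, not stated in this document), that weights are stored in 16 bits, and that tensor parallelism shards weights evenly. It ignores the text encoder, VAE, and activations, so it is an illustration and not a measurement.

```python
def per_gpu_weights_gb(
    params_per_transformer: float,
    resident_transformers: int,
    tensor_parallel_size: int,
    bytes_per_param: int = 2,  # assumption: 16-bit weights
) -> float:
    """Approximate transformer weight memory per GPU, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_per_transformer * resident_transformers * bytes_per_param
    return total_bytes / tensor_parallel_size / 1e9


PARAMS = 14e9  # assumed parameter count per transformer

both_resident = per_gpu_weights_gb(PARAMS, resident_transformers=2, tensor_parallel_size=2)
one_resident = per_gpu_weights_gb(PARAMS, resident_transformers=1, tensor_parallel_size=2)

print(f"both transformers on GPU: {both_resident:.1f} GB per GPU")  # 28.0
print(f"one transformer on GPU:   {one_resident:.1f} GB per GPU")   # 14.0
```

Under these assumptions, sequential offloading halves the per-GPU weight footprint of the transformers, which leaves more room for activations at higher resolutions or frame counts.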

Troubleshooting

Generation too slow: Use tensor parallelism or enable TeaCache/Cache-DiT acceleration. Helios supports cache-dit (--enable-cache-dit) for ~20% speedup.

LTX-2 error with diffusers>=0.38.0: Fixed in #3661. Text encoder normalization moved into the diffusers connector. Update vllm-omni to the latest version when upgrading diffusers to 0.38.0+.

Out of memory: Reduce resolution/frame count or use CPU offloading.

Choppy output: Increase num_inference_steps and num_frames.

References

For Wan2.2 model details and advanced config, see references/wan-models.md

Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.

67 stars

data-ai

Updated May 25, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-video-gen

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 25, 2026, 2:17 AM (11.9s, 2 files scanned)

Related Skills

hsliuustc0106/vllm-omni-pre-check

development

Verified, Trusted, Community

Use before submitting a PR to vllm-project/vllm-omni: self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."

69 | SKILL.md | Updated May 29, 2026

hsliuustc0106/skills/vllm-omni-test-report

development

Verified, Trusted, Community

Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release**: `scripts/compose_full_report.py` (**测试结论** ("test conclusion"), Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly**: `script

69 | SKILL.md | Updated May 3, 2026

hsliuustc0106/vllm-omni-review

testing

Verified, Trusted, Community

Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.

69 | SKILL.md | Updated May 3, 2026

hsliuustc0106/vllm-omni-setup

testing

Verified, Trusted, Community

Install and configure vLLM-Omni for omni-modality model inference. Use when setting up vllm-omni, configuring the environment, installing dependencies, resolving GPU driver issues, or preparing a machine for model serving.

67 | SKILL.md | Updated May 3, 2026

Install manually

# Clone the repo
git clone https://github.com/hsliuustc0106/vllm-omni-skills.git

# Copy into Claude Code skills folder (global)
cp -r vllm-omni-skills/skills/vllm-omni-video-gen ~/.claude/skills/

Repository: hsliuustc0106/vllm-omni-skills (67 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT