skills/vllm-omni-setup/SKILL.md
Install and configure vLLM-Omni for omni-modality model inference. Use when setting up vllm-omni, configuring the environment, installing dependencies, resolving GPU driver issues, or preparing a machine for model serving.
npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-setup
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
vLLM-Omni extends vLLM to support omni-modality model inference (text, image, video, audio) with both autoregressive and diffusion architectures. This skill covers installation from source and environment configuration.
Set these before running any commands. Check the vllm-omni quickstart for currently recommended versions.
export VLLM_VERSION="0.20.0" # vLLM pip package version
export VLLM_OMNI_VERSION="v0.20.0" # vLLM-Omni release / Docker tag
export PYTHON_VERSION="3.12" # Python version
uv package manager (recommended) or pip
uv venv --python $PYTHON_VERSION --seed
source .venv/bin/activate
Select the command matching your hardware:
NVIDIA GPU (CUDA):
uv pip install vllm==$VLLM_VERSION --torch-backend=auto
ROCm (AMD):
uv pip install vllm==$VLLM_VERSION --extra-index-url https://wheels.vllm.ai/rocm/$VLLM_VERSION/rocm722
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv pip install -e .
from vllm_omni.entrypoints.omni import Omni
print("vLLM-Omni installed successfully")
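If the import check fails, it can help to see which distributions are actually present in the active environment. The following is a minimal sketch using only the standard library. The distribution names `vllm`, `vllm-omni`, and `torch` are assumptions about how the packages are registered, so compare them against the output of `uv pip list`.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; check `uv pip list` for the exact ones.
    for name, version in installed_versions(["vllm", "vllm-omni", "torch"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` entry for `vllm` usually points at the wrong virtual environment being active rather than a broken build.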
Run a minimal text-to-image generation to verify the full stack:
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Tongyi-MAI/Z-Image-Turbo")
outputs = omni.generate("a red circle on white background")
outputs[0].request_output[0].images[0].save("test.png")
print("Setup verified - test.png generated")
For containerized deployment:
docker pull vllm/vllm-omni:$VLLM_OMNI_VERSION
docker run --gpus all -p 8091:8091 vllm/vllm-omni:$VLLM_OMNI_VERSION \
vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
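Model loading inside the container can take a while, so a client script may want to wait until port 8091 accepts connections before sending requests. This is a rough sketch with the standard library only. The helper name `wait_for_port` and its timeout values are hypothetical. An open port shows only that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval_s)
    return False
```

For the container started above, `wait_for_port("localhost", 8091)` would block for up to five minutes and return `False` if nothing ever listens on the port.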
| Variable | Purpose | Default |
|----------|---------|---------|
| VLLM_OMNI_LOG_LEVEL | Logging verbosity | INFO |
| HF_HOME | Hugging Face cache directory | ~/.cache/huggingface |
| CUDA_VISIBLE_DEVICES | GPU device selection | all GPUs |
| VLLM_WORKER_MULTIPROC_METHOD | Worker process method | spawn |
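When launching from Python rather than a shell, the variables in the table can be set programmatically. The sketch below fills in only those that are unset. The helper name `apply_defaults` is hypothetical, and calling it before importing `vllm_omni` rests on the assumption that the variables are read at import or startup time.

```python
import os

# Defaults taken from the table above. CUDA_VISIBLE_DEVICES is left out
# because its default ("all GPUs") corresponds to the variable being unset.
DEFAULTS = {
    "VLLM_OMNI_LOG_LEVEL": "INFO",
    "HF_HOME": os.path.expanduser("~/.cache/huggingface"),
    "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
}


def apply_defaults(env=os.environ):
    """Fill in any unset variables without overriding values already exported."""
    for key, value in DEFAULTS.items():
        env.setdefault(key, value)
    return env
```

Using `setdefault` keeps any value already exported in the shell, so the script never silently overrides a deliberate setting such as a custom `HF_HOME`.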
CUDA version mismatch: Ensure your CUDA toolkit version matches the PyTorch build. Check with nvidia-smi and python -c "import torch; print(torch.version.cuda)".
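The two checks can be compared automatically. The sketch below reads "matches" conservatively: the `CUDA Version` field in the `nvidia-smi` header (the highest version the driver supports) should be at least the version PyTorch was built against. That interpretation and both helper names are assumptions, not project guidance.

```python
import re


def parse_driver_cuda(nvidia_smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def torch_build_supported(driver_cuda, torch_cuda):
    """True if the driver's maximum CUDA version is at least the PyTorch build's."""
    if not torch_cuda:
        # torch.version.cuda is None on CPU-only builds.
        return False
    build = tuple(int(part) for part in torch_cuda.split(".")[:2])
    return driver_cuda >= build
```

In practice the first argument would come from `parse_driver_cuda(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)` and the second from `torch.version.cuda`.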
Out of memory on model load: Use the gpu_memory_utilization parameter to cap the fraction of GPU memory used. Start with 0.8 and adjust:
omni = Omni(model="...", gpu_memory_utilization=0.8)
Model download failures: Set HF_HOME to a directory with sufficient disk space. Large models (e.g., Qwen3-Omni-30B) require 60 GB or more of free disk space.
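Free space can be checked before starting a large download. This is a minimal sketch with the standard library. The helper names are hypothetical, and the 60 GB threshold simply reuses the figure quoted above for large models.

```python
import os
import shutil


def free_gb(path):
    """Free space in GB on the filesystem holding path (nearest existing parent)."""
    path = os.path.abspath(os.path.expanduser(path))
    # The cache directory may not exist yet, so walk up to a directory that does.
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb=60):
    """True if the filesystem holding path has at least required_gb free."""
    return free_gb(path) >= required_gb
```

A typical call is `has_room(os.environ.get("HF_HOME", "~/.cache/huggingface"))`. If it returns `False`, point `HF_HOME` at a larger volume before loading the model.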