skills/vllm-omni-recipe/SKILL.md
Use when adding a recipe for omnimodal models (text-to-image, text-to-video, text-to-audio, image-to-video, any-to-any, diffusion transformers) to the vLLM recipes repository, or documenting vLLM-Omni deployment
npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-recipeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
vLLM-Omni extends vLLM to support non-autoregressive models like Diffusion Transformers (DiT) for omnimodal generation: text-to-image, text-to-video, text-to-audio, image-to-video, and any-to-any generation.
This skill guides creating deployment guides for omnimodal models in the vLLM recipes repository.
Every recipe follows this structure. Sections marked ⚪ are optional.
# ModelName Usage Guide
[Introduction with HuggingFace link, architecture description]
## Installing vLLM-Omni
[Version-variable based installation]
## [Modality] Generation
[Python API and CLI examples]
## Recommended Deployment Strategy
[Hardware recommendations by model size]
## Key Parameters
[Generation config table]
## Expected Performance ⚪
[Only if verified measurements available]
## Accuracy Comparison ⚪
[Only if verified measurements available]
## Online Serving ⚪
[If supported]
## Additional Resources
[Model card, examples, related links]
For detailed section templates and code examples, see references/recipe-template.md.
Include:
Use version variables:
export VLLM_VERSION="0.16.0"
uv venv
source .venv/bin/activate
uv pip install vllm==$VLLM_VERSION
uv pip install git+https://github.com/vllm-project/vllm-omni.git
Add modality-specific dependencies: pillow/diffusers for image/video, soundfile for audio.
Provide Python API examples for all supported modalities. See references/recipe-template.md for code examples.
Include hardware recommendations table with:
Document generation parameters: height, width, num_inference_steps, guidance_scale, negative_prompt, num_frames (video), audio_end_in_s (audio).
Only include if you have verified measurements. Do not fabricate benchmark numbers.
If model supports OpenAI-compatible serving:
vllm serve org/model-name --omni
For DiT models that support caching:
omni = Omni(model="org/model-name", cache_backend="cache_dit")
{OrgName}/ (e.g., Qwen/, DeepSeek/){ModelName}.md (e.g., Qwen-Image.md)Wan2_2.md or Wan2.2.md| Mistake | Fix |
|---------|-----|
| Placeholder version (0.XX.0) | Use $VLLM_VERSION variable |
| Missing modality dependencies | Add soundfile for audio, diffusers for video |
| Wrong Omni import | Use from vllm_omni.entrypoints.omni import Omni |
| Fabricated benchmarks | Only include verified measurements |
| Missing from README | Add to skills index |
# ModelName Usage Guide format$VLLM_VERSIONdevelopment
Use before submitting a PR to vllm-project/vllm-omni — self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."
development
--- name: vllm-omni-test-report description: Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release** — `scripts/compose_full_report.py` (**测试结论**, Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly** — `script
testing
Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.
data-ai
Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.