skills/vllm-omni-diffusion-benchmark-profile/SKILL.md
Benchmark, profile, and tune vLLM-Omni diffusion models, especially Wan/Qwen/Helios-style pipelines on GPU or NPU. Use this when measuring denoise latency, collecting torch or torch_npu profiler traces, reading ASCEND_PROFILER_OUTPUT artifacts, comparing before/after performance, or diagnosing bottlenecks such as SP communication, VAE convs, data transforms, offload overlap, and RoPE overhead.
Use this skill when the task is to benchmark, profile, compare, or optimize a vLLM-Omni diffusion model.
Primary target:
Secondary targets:
Correctness rule:
Use this skill for:
- `image_to_video.py`, `text_to_video.py`, `text_to_image.py`, image edit, or other standalone diffusion examples
- `vllm serve ... --profiler-config ...`
- `torch.profiler` analysis
- `torch_npu.profiler` / `ASCEND_PROFILER_OUTPUT` analysis

Do not use this skill for:
- `num_inference_steps` to 2 unless the task specifically needs long-run behavior.
- `ViewCopy`, `TransData`, `Transpose`, `InplaceCopy2` steps.
- `--enable-layerwise-offload`
- `--vae-patch-parallel-size`
- `--vae-use-tiling`

are enabled, because these materially change the bottleneck mix.

Use `--profiler-dir` to enable torch/torch_npu profiler output.
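Because offload, VAE patch parallelism, VAE tiling, and the Ulysses degree all shift the bottleneck mix, it helps to record them next to every measurement. The sketch below is one hypothetical way to do that: `record_run_flags` is an illustrative helper, not part of vLLM-Omni, and it only knows the flag names that appear in the Wan2.2 I2V command in this skill.

```python
"""Record which bottleneck-relevant flags a benchmark command used.

Hypothetical helper: the flag names are taken from the Wan2.2 I2V example
command; extend SWITCHES/VALUED for other pipelines.
"""
import shlex

# Boolean switches: present on the command line means enabled.
SWITCHES = ("--enable-layerwise-offload", "--vae-use-tiling")
# Options that carry a value, either as "--flag value" or "--flag=value".
VALUED = ("--vae-patch-parallel-size", "--ulysses-degree", "--num-inference-steps")


def record_run_flags(argv):
    """Return a dict describing the offload/parallelism setup found in argv."""
    record = {flag: False for flag in SWITCHES}
    record.update({flag: None for flag in VALUED})
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in SWITCHES:
            record[arg] = True
        elif "=" in arg and arg.split("=", 1)[0] in VALUED:
            key, value = arg.split("=", 1)
            record[key] = value
        elif arg in VALUED and i + 1 < len(argv):
            record[arg] = argv[i + 1]
            i += 1  # skip the value we just consumed
        i += 1
    return record


if __name__ == "__main__":
    cmd = ("python image_to_video.py --num-inference-steps 2 "
           "--enable-layerwise-offload --ulysses-degree 8 "
           "--vae-patch-parallel-size 8 --vae-use-tiling")
    print(record_run_flags(shlex.split(cmd)))
```

Storing this dict alongside each before/after number makes it obvious when two runs are not actually comparable.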
Common examples:
- `examples/offline_inference/image_to_video/image_to_video.py`
- `examples/offline_inference/text_to_video/text_to_video.py`
- `examples/offline_inference/text_to_image/text_to_image.py`

Typical NPU Wan2.2 I2V profile command:
```bash
cd /root/vllm-workspace/vllm-omni/examples/offline_inference/image_to_video
python image_to_video.py \
  --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --image cherry_blossom.jpg \
  --prompt "Cherry blossoms swaying gently in the breeze, petals falling, smooth motion" \
  --negative-prompt "<optional quality filter>" \
  --height 512 \
  --width 768 \
  --num-frames 49 \
  --guidance-scale 4.0 \
  --num-inference-steps 2 \
  --flow-shift 12.0 \
  --fps 16 \
  --output i2v_output.mp4 \
  --enable-layerwise-offload \
  --ulysses-degree 8 \
  --vae-patch-parallel-size 8 \
  --vae-use-tiling \
  --profiler-dir ./wan22_step2_profile
```
Use:
```bash
vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --omni \
  --port 8091 \
  --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
```
Then control profiling with /start_profile and /stop_profile.
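A minimal client sketch for toggling the profiler, assuming the server above listens on `localhost:8091` and that `/start_profile` and `/stop_profile` accept an empty `POST`, as they do in upstream vLLM. The helper names `profile_request` and `send` are hypothetical; only the standard library is used.

```python
"""Toggle profiling on a running `vllm serve ... --omni` instance.

Assumes empty-body POST endpoints /start_profile and /stop_profile on the
port used in the serve command (8091). Helper names are illustrative.
"""
import urllib.request

BASE_URL = "http://localhost:8091"


def profile_request(action, base_url=BASE_URL):
    """Build (but do not send) a POST request for action 'start' or 'stop'."""
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    return urllib.request.Request(
        f"{base_url}/{action}_profile", data=b"", method="POST"
    )


def send(request, timeout=600):
    """Send the request and return the HTTP status code.

    The generous timeout is deliberate: stopping the profiler may take a
    while because the trace has to be written to disk.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage: call `send(profile_request("start"))`, issue only the generation requests you want captured, then call `send(profile_request("stop"))`. Keeping the window narrow keeps the trace small enough to open in a viewer.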
- `.json.gz` trace in Perfetto or `chrome://tracing`
- `key_averages()` tables or exported tables

NPU profiling uses `torch_npu.profiler.tensorboard_trace_handler(...)` and must be parsed offline.
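When a trace is too large to browse comfortably in Perfetto, a quick offline ranking of ops by total time can point at the hotspot first. This stdlib sketch assumes the file is in Chrome Trace Event format: either a dict with a `traceEvents` list (as in the torch profiler `.json.gz` export) or a bare list of events (assumed here for Ascend `trace_view.json`; verify against your file). `load_events` and `top_ops` are hypothetical helper names.

```python
"""Rank ops by total duration in an exported Chrome-format trace.

Complete events are assumed to have ph == "X" and "dur" in microseconds.
Nested events (e.g. a Python op and its child kernels) are each counted in
their own right, so parent and child totals overlap.
"""
import gzip
import json
import sys
from collections import defaultdict


def load_events(path):
    """Load events from a .json or .json.gz trace file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def top_ops(events, n=10):
    """Sum durations of complete ("X") events by name; return the n largest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            name = ev.get("name", "<unnamed>")
            totals[name] += float(ev["dur"])  # float() also accepts numeric strings
            counts[name] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(name, counts[name], total) for name, total in ranked]


if __name__ == "__main__" and len(sys.argv) == 2:
    for name, count, total_us in top_ops(load_events(sys.argv[1])):
        print(f"{total_us / 1000:10.2f} ms  {count:6d}x  {name}")
```

For example, `python top_ops.py trace.json.gz` (the script name is arbitrary) prints the ten most expensive op names with call counts.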
Core files after analyse(<trace_dir>):
- `step_trace_time.csv`
- `op_statistic.csv`
- `operator_details.csv`
- `communication.json`
- `kernel_details.csv`
- `trace_view.json`

Use:
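```python
from torch_npu.profiler.profiler import analyse

# analyse() can use multiple worker processes, so guard the entry point.
if __name__ == "__main__":
    analyse("./wan22_step2_profile/diffusion_worker_0")
```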
Or adapt:
- `/root/vllm-workspace/analyze_profiling.py`
- `examples/offline_inference/image_to_video/analyse.py`

For NPU artifact interpretation, read:

- `references/npu-workflow.md`

For common bottlenecks and likely fixes, read:

- `references/known-hotspots.md`

When reading a profile, answer these in order:
Always summarize changes using the same fields:
For NPU, prefer comparing:
- `step_trace_time.csv`
- `op_statistic.csv`
- `communication.json`
- `operator_details.csv`
- `all_to_all_*` wrapper.

A good perf investigation should usually produce:
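One concrete artifact for the before/after comparison of `op_statistic.csv` is a table of per-op-type time deltas. The sketch below assumes the CSV has `OP Type` and `Total Time(us)` columns; check the header of your own export and pass different column names if they differ. `load_op_totals` and `compare_op_stats` are hypothetical helpers.

```python
"""Compare per-op-type device time between two op_statistic.csv files.

Column names are assumptions; rows for the same op type (e.g. per device or
core type) are summed before comparing.
"""
import csv
import sys
from collections import defaultdict

OP_COL = "OP Type"
TIME_COL = "Total Time(us)"


def load_op_totals(path, op_col=OP_COL, time_col=TIME_COL):
    """Sum total time (us) per op type."""
    totals = defaultdict(float)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row[op_col]] += float(row[time_col])
    return totals


def compare_op_stats(before_path, after_path, n=15):
    """Return (op, before_us, after_us, delta_us) rows sorted by |delta|."""
    before = load_op_totals(before_path)
    after = load_op_totals(after_path)
    rows = []
    for op in set(before) | set(after):
        b, a = before.get(op, 0.0), after.get(op, 0.0)
        rows.append((op, b, a, a - b))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows[:n]


if __name__ == "__main__" and len(sys.argv) == 3:
    for op, b, a, d in compare_op_stats(sys.argv[1], sys.argv[2]):
        print(f"{op:30s} {b / 1000:10.2f} ms -> {a / 1000:10.2f} ms ({d / 1000:+.2f} ms)")
```

Sorting by absolute delta surfaces both regressions and wins, such as a drop in `TransData` time after removing a layout conversion.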