hsliuustc0106/vllm-omni-hardware

Name: vllm-omni-hardware
Author: hsliuustc0106

skills/vllm-omni-hardware/SKILL.md

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-hardware

vLLM-Omni Hardware Configuration

Overview

vLLM-Omni supports four hardware backends: NVIDIA CUDA (default), AMD ROCm, Huawei NPU (Ascend), and Intel XPU. Each backend has specific installation steps and configuration options.

Supported Backends

| Backend | Accelerators | Install Method | Maturity |
|---------|-------------|----------------|----------|
| CUDA | NVIDIA A100/H100/L40/RTX | uv pip install vllm==$VLLM_VERSION | Production |
| ROCm | AMD MI300X/MI250X | uv pip install vllm==$VLLM_VERSION --extra-index-url ... | Production |
| NPU | Huawei Ascend 910B | Source build with CANN | Supported |
| XPU | Intel Data Center GPU Max | Source build with oneAPI | Experimental |

Backend Selection Workflow

Step 1: Identify Hardware

# NVIDIA GPU
nvidia-smi

# AMD GPU
rocm-smi

# Huawei NPU
npu-smi info

# Intel XPU
xpu-smi discovery
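The same check can be scripted. The sketch below is a convenience under stated assumptions, not part of vLLM-Omni: it only looks for the four vendor CLIs named above on `PATH` using the standard-library `shutil.which`, and the helper name `detect_backends` is hypothetical. Finding a tool on `PATH` suggests, but does not prove, that a working device and driver are present.

```python
import shutil

# Backend name -> vendor CLI named in the step above.
SMI_TOOLS = {
    "cuda": "nvidia-smi",
    "rocm": "rocm-smi",
    "npu": "npu-smi",
    "xpu": "xpu-smi",
}


def detect_backends(which=shutil.which):
    """Return backends whose vendor CLI is found on PATH (hypothetical helper)."""
    return [name for name, tool in SMI_TOOLS.items() if which(tool) is not None]


if __name__ == "__main__":
    print("Candidate backends:", detect_backends() or "none found")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use, call `detect_backends()` with no arguments.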

Step 2: Install for Backend

CUDA (NVIDIA):

uv pip install vllm==$VLLM_VERSION --torch-backend=auto

ROCm (AMD):

uv pip install vllm==$VLLM_VERSION --extra-index-url https://wheels.vllm.ai/rocm/$VLLM_VERSION/rocm722

NPU (Huawei):

# Requires CANN toolkit pre-installed
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -e ".[npu]"

XPU (Intel):

# Requires oneAPI toolkit pre-installed
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -e ".[xpu]"
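For automation, the four install paths above can be collected in one place. This is a minimal sketch that only reassembles the commands already listed in this step; the function name `install_command` is hypothetical, and the version string is whatever you would otherwise export as `$VLLM_VERSION`. Note that the ROCm wheel index URL embeds the version a second time.

```python
def install_command(backend: str, vllm_version: str) -> str:
    """Return the install command(s) listed above for one backend."""
    base = f"uv pip install vllm=={vllm_version}"
    if backend == "cuda":
        return f"{base} --torch-backend=auto"
    if backend == "rocm":
        # The version appears both in the pin and in the wheel index URL.
        index = f"https://wheels.vllm.ai/rocm/{vllm_version}/rocm722"
        return f"{base} --extra-index-url {index}"
    if backend in ("npu", "xpu"):
        # Source build; CANN (NPU) or oneAPI (XPU) must already be installed.
        return (
            "git clone https://github.com/vllm-project/vllm-omni.git && "
            "cd vllm-omni && "
            f'pip install -e ".[{backend}]"'
        )
    raise ValueError(f"unknown backend: {backend!r}")
```

The function only builds the command string; running it is left to the caller so the command can be reviewed first.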

Step 3: Verify Backend

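import torch

# CUDA
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")

# ROCm (same API as CUDA via HIP); torch.version.hip is None on non-ROCm builds
print(f"ROCm/HIP build: {torch.version.hip is not None}")
print(f"ROCm/HIP available: {torch.cuda.is_available()}")

# XPU (torch.xpu is missing on older PyTorch builds, so guard the lookup)
print(f"XPU available: {hasattr(torch, 'xpu') and torch.xpu.is_available()}")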

Device Selection

Control which devices vLLM-Omni uses:

# CUDA: Select specific GPUs
CUDA_VISIBLE_DEVICES=0,1 vllm serve <model> --omni

# ROCm: Select specific GPUs
HIP_VISIBLE_DEVICES=0,1 vllm serve <model> --omni

# NPU: Select specific devices
ASCEND_RT_VISIBLE_DEVICES=0,1 vllm serve <model> --omni
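When launching from Python rather than a shell, the same variables can be set on the child process environment. The sketch below assumes only the three variables shown above; the helper names `device_env` and `serve_command` are hypothetical, and no variable is listed here for XPU, so that case raises an error instead of guessing.

```python
import os

# Backend -> environment variable used above to restrict visible devices.
VISIBLE_DEVICES_VAR = {
    "cuda": "CUDA_VISIBLE_DEVICES",
    "rocm": "HIP_VISIBLE_DEVICES",
    "npu": "ASCEND_RT_VISIBLE_DEVICES",
}


def device_env(backend, device_ids, base_env=None):
    """Return a copy of the environment with the device-selection variable set."""
    if backend not in VISIBLE_DEVICES_VAR:
        raise ValueError(f"no device-selection variable listed for {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[VISIBLE_DEVICES_VAR[backend]] = ",".join(str(i) for i in device_ids)
    return env


def serve_command(model):
    """argv for the serve command shown above; `model` replaces <model>."""
    return ["vllm", "serve", model, "--omni"]
```

A launch would then look like `subprocess.run(serve_command(model), env=device_env("cuda", [0, 1]))`, which leaves the parent process environment unchanged.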

Model Support by Backend

Not all models are supported on every backend. Check the support matrix:

| Model | CUDA | ROCm | NPU | XPU |
|-------|------|------|-----|-----|
| Qwen3-Omni | Yes | Yes | Yes | No |
| Qwen2.5-Omni | Yes | Yes | Yes | No |
| Qwen-Image | Yes | Yes | Yes | No |
| Z-Image | Yes | Yes | Yes | No |
| BAGEL | Yes | Yes | No | No |
| Wan2.2 | Yes | Yes | Yes | No |
| FLUX | Yes | Yes | Yes | No |
| Qwen3-TTS | Yes | Yes | Yes | No |
| Stable-Diffusion-3 | Yes | Yes | No | No |
| Stable-Audio | Yes | No | No | No |
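If a deployment script needs to check compatibility before launching, the matrix above can be encoded as data. This is an illustrative sketch: the rows are copied from the matrix as listed here, which may lag behind the project, and the helper names `supported_backends` and `models_for` are hypothetical.

```python
BACKENDS = ("CUDA", "ROCm", "NPU", "XPU")

# Rows copied from the support matrix above, in BACKENDS column order.
SUPPORT = {
    "Qwen3-Omni": (True, True, True, False),
    "Qwen2.5-Omni": (True, True, True, False),
    "Qwen-Image": (True, True, True, False),
    "Z-Image": (True, True, True, False),
    "BAGEL": (True, True, False, False),
    "Wan2.2": (True, True, True, False),
    "FLUX": (True, True, True, False),
    "Qwen3-TTS": (True, True, True, False),
    "Stable-Diffusion-3": (True, True, False, False),
    "Stable-Audio": (True, False, False, False),
}


def supported_backends(model):
    """Backends marked Yes for `model`; raises KeyError for unlisted models."""
    return [b for b, ok in zip(BACKENDS, SUPPORT[model]) if ok]


def models_for(backend):
    """Models marked Yes for `backend`, in matrix order."""
    col = BACKENDS.index(backend)
    return [m for m, row in SUPPORT.items() if row[col]]
```

For example, `supported_backends("BAGEL")` yields the CUDA and ROCm columns only, and `models_for("XPU")` is empty because no model in the matrix is marked as supported on XPU.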

Troubleshooting

CUDA out of memory: Reduce --gpu-memory-utilization or use tensor parallelism across multiple GPUs.

ROCm kernel compilation slow: First launch compiles kernels for your GPU. Subsequent launches reuse cached kernels. Set MIOPEN_USER_DB_PATH for persistent kernel cache.

ROCm Wan2.2 RoPE error (too many values to unpack): Fixed in #3463, which added a dedicated forward_hip() RoPE path for ROCm using the flash attention rotary embedding. Upgrade to a build that includes that change.

NPU operator not supported: Some operations fall back to CPU on NPU. Check logs for fallback warnings and update CANN to the latest version.
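The CUDA memory and ROCm cache mitigations above can be combined in a launch helper. This is a sketch under assumptions: `--gpu-memory-utilization` and `--tensor-parallel-size` are standard `vllm serve` options, but their interaction with `--omni` is assumed rather than confirmed here, and the defaults (0.8, 2) and the helper name `tuned_launch` are illustrative only.

```python
import os


def tuned_launch(model, gpu_memory_utilization=0.8, tensor_parallel_size=2,
                 miopen_cache_dir=None, base_env=None):
    """Build (argv, env) for a serve launch with the mitigations above."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    argv = [
        "vllm", "serve", model, "--omni",
        # Lower this value if the CUDA out-of-memory error persists.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        # Shard the model across this many GPUs.
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    env = dict(os.environ if base_env is None else base_env)
    if miopen_cache_dir is not None:
        # Persistent location for compiled ROCm kernels.
        env["MIOPEN_USER_DB_PATH"] = miopen_cache_dir
    return argv, env
```

The helper returns the pieces instead of starting the server, so they can be logged or passed to `subprocess.run(argv, env=env)`.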

Out-of-Tree Hardware Backends

Out-of-tree hardware backends can customize the diffusion engine without forking core code by subclassing OmniPlatform and overriding extension points:

| Method | Returns | Purpose |
|--------|---------|---------|
| get_diffusion_worker_cls() | Fully qualified class path | Custom diffusion worker |
| get_diffusion_model_runner_cls() | Fully qualified class path | Custom model runner |
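The two extension points in the table can be illustrated with a self-contained sketch. Everything here beyond the two method names is an assumption: the `OmniPlatform` class below is a local stand-in so the example runs without vLLM-Omni installed (a real plugin would import the project's own class), the methods are shown as plain instance methods although the real ones may be declared differently, and all class paths are hypothetical.

```python
class OmniPlatform:
    """Stand-in for the real base class, defined only so this sketch runs.

    The default class paths below are hypothetical placeholders.
    """

    def get_diffusion_worker_cls(self) -> str:
        return "default_pkg.worker.DefaultDiffusionWorker"

    def get_diffusion_model_runner_cls(self) -> str:
        return "default_pkg.runner.DefaultDiffusionModelRunner"


class MyAcceleratorPlatform(OmniPlatform):
    """Hypothetical out-of-tree platform overriding one extension point."""

    def get_diffusion_worker_cls(self) -> str:
        # A fully qualified class path: "<module path>.<class name>".
        return "my_plugin.worker.MyDiffusionWorker"


def split_class_path(path: str):
    """Split a fully qualified class path into (module, class name)."""
    module, _, cls_name = path.rpartition(".")
    return module, cls_name
```

Overriding only one method leaves the other at its default, which matches the idea that a backend customizes just the pieces it needs.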

Use OmniPlatformEnum.OOT with is_out_of_tree() for platform detection. Register custom diffusion pipelines via register_diffusion_model(model_arch, module_name, class_name) in vllm_omni.diffusion.registry. Defaults preserve existing behavior for CUDA, ROCm, NPU, XPU, and MUSA.

NPU LaserAttention unsupported error: On Ascend NPU with mindiesd, selecting FLASH_ATTN as the diffusion attention backend (--diffusion-attn-backend FLASH_ATTN) auto-imports mindiesd to configure ASCEND_CUSTOM_OPP_PATH. The internal environment variable MINDIE_SD_FA_TYPE is set to ascend_laser_attention automatically. Fixed in #2674.

References

For CUDA-specific optimization, see references/cuda.md
For ROCm setup details, see references/rocm.md
For NPU configuration, see references/npu.md
For XPU setup, see references/xpu.md

Configure vLLM-Omni for different hardware backends including NVIDIA CUDA, AMD ROCm, Huawei NPU, and Intel XPU. Use when selecting a hardware backend, troubleshooting GPU issues, configuring device placement, or optimizing for specific accelerators.

67 stars

development

Updated May 25, 2026

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-hardware

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 25, 2026, 2:17 AM (10.3s, 5 files scanned)

Related Skills

hsliuustc0106/vllm-omni-pre-check

development

Verified, Trusted, Community

Use before submitting a PR to vllm-project/vllm-omni: self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."

69 | SKILL.md | Updated May 29, 2026

hsliuustc0106/skills/vllm-omni-test-report

development

Verified, Trusted, Community

Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release**: `scripts/compose_full_report.py` (**Test conclusion**, Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly**: `script…

69 | SKILL.md | Updated May 3, 2026

hsliuustc0106/vllm-omni-review

testing

Verified, Trusted, Community

Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.

69 | SKILL.md | Updated May 3, 2026

hsliuustc0106/vllm-omni-video-gen

data-ai

Verified, Trusted, Community

Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.

67 | SKILL.md | Updated May 3, 2026

Install manually

# Clone the repo
git clone https://github.com/hsliuustc0106/vllm-omni-skills.git

# Copy into Claude Code skills folder (global)
cp -r vllm-omni-skills/skills/vllm-omni-hardware ~/.claude/skills/

Repository

hsliuustc0106/vllm-omni-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT