hsliuustc0106

24 verified skills1,536 total stars

vllm-omni-pre-check

Use before submitting a PR to vllm-project/vllm-omni — self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."

development69

vllm-omni-review

Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.

testing69

skills/vllm-omni-test-report

--- name: vllm-omni-test-report description: Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release** — `scripts/compose_full_report.py` (**测试结论**, Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly** — `script

development69

vllm-omni-hardware

Configure vLLM-Omni for different hardware backends including NVIDIA CUDA, AMD ROCm, Huawei NPU, and Intel XPU. Use when selecting a hardware backend, troubleshooting GPU issues, configuring device placement, or optimizing for specific accelerators.

development67

vllm-omni-distributed

Scale vLLM-Omni across multiple GPUs and nodes using tensor parallelism, pipeline parallelism, OmniConnector disaggregation, connector backends, and Ray. Use when setting up multi-GPU inference, distributing model execution across machines, deploying disaggregated execution, developing OmniConnector backends, or scaling inference horizontally.

development67

vllm-omni-audio-tts

Generate audio and speech with vLLM-Omni using Qwen3-TTS, Fish Speech S2 Pro, CosyVoice3, MiMo-Audio, and Stable-Audio models. Use when synthesizing speech from text, generating audio effects or music, configuring TTS parameters, cloning voices, adding new TTS models, or working with text-to-speech models.

data-ai67

vllm-omni-perf

Optimize vLLM-Omni performance through benchmarking, TeaCache, Cache-DiT, quantization, CPU offloading, and parallelism tuning. Use when improving inference speed, reducing latency, lowering memory usage, running benchmarks, or enabling diffusion acceleration.

testing67

vllm-omni-image-gen

Generate and edit images with vLLM-Omni using models like FLUX, Stable Diffusion 3, Qwen-Image, GLM-Image, BAGEL, and Z-Image. Use when generating images from text, editing images, configuring diffusion parameters, or working with image generation models.

data-ai67

vllm-omni-quantization

Use when working on vLLM-Omni quantization for autoregressive, diffusion, or multi-stage omni models, choosing methods such as `awq`, `gptq`, `fp8`, `int8`, `gguf`, or ModelOpt checkpoints, adding quantized model support, or debugging memory, loader, quality, or performance issues.

development67

vllm-omni-serving

Launch and configure vLLM-Omni API servers for production model serving. Use when starting a model server, configuring stage pipelines, setting up GPU memory, enabling optimizations, or deploying models behind a load balancer.

development67

vllm-omni-video-gen

Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.

data-ai67

vllm-omni-setup

Install and configure vLLM-Omni for omni-modality model inference. Use when setting up vllm-omni, configuring the environment, installing dependencies, resolving GPU driver issues, or preparing a machine for model serving.

testing67

vllm-omni-contrib

Contribute to vLLM-Omni by adding new model support, fixing bugs, or improving features. Use when integrating a new model into vllm-omni, setting up a development environment, writing tests, or submitting pull requests to the vllm-omni project.

development67

vllm-omni-tts-integration

Integrate a new text-to-speech model into vLLM-Omni from HuggingFace reference implementation through production-ready serving with streaming and CUDA graph acceleration. Use when adding a new TTS model, wiring stage separation for speech synthesis, enabling online voice generation serving, debugging TTS integration behavior, or building audio output pipelines.

development64

skills/vllm-omni-nightly-local

--- name: vllm-omni-nightly-local description: On HK - SSH, Slurm, non-interactive docker exec (bash -lc): **`source /rebase/.venv/bin/activate`** inside the container before repo commands, then run `tools/nightly/run_nightly_jobs.sh` and write logs under logs/nightly_jobs. Sync logs and optional logs/nightly_perf_manual.xlsx to your laptop, then use vllm-omni-test-report report kind nightly + scripts/nightly_local_log_report.py — **default output HTML** (`--html-report`) unless the user explici

tools64

add-diffusion-model

Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.

devops59

vllm-omni-multimodal

Transcribe speech, generate images from prompts, analyze video content, and convert between modalities using multimodal omni-modality models like Qwen2.5-Omni and Qwen3-Omni. Use when working with multimodal models for speech recognition, image generation, video understanding, voice synthesis, or any task combining text, image, audio, and video inputs and outputs simultaneously.

data-ai59

vllm-omni-recipe

Use when adding a recipe for omnimodal models (text-to-image, text-to-video, text-to-audio, image-to-video, any-to-any, diffusion transformers) to the vLLM recipes repository, or documenting vLLM-Omni deployment

testing59

vllm-omni-review-miner

Scan pull requests of a specific type (e.g., new model support) from vLLM and vLLM-Omni repos, extract code review patterns and suggestions, then generate a specialized review automation skill. Use when learning from historical pull request reviews, building domain-specific code review expertise, or automating review pattern extraction.

tools59

vllm-omni-release-note-writer

Use when drafting or editing release notes for vllm-project/vllm-omni, especially when summarizing changes between tags, organizing highlights, and matching the style of recent vLLM-Omni releases

testing59

vllm-omni-api

Integrate with vLLM-Omni using the OpenAI-compatible API for text, image, video, and audio generation. Use when building client applications, calling vllm-omni endpoints, sending requests to the API server, or integrating vllm-omni into an application.

tools59

vllm-omni-diffusion-benchmark-profile

Benchmark, profile, and tune vLLM-Omni diffusion models, especially Wan/Qwen/Helios style pipelines on GPU or NPU. Use this when measuring denoise latency, collecting torch or torch_npu profiler traces, reading ASCEND_PROFILER_OUTPUT artifacts, comparing before/after performance, or diagnosing bottlenecks such as SP communication, VAE convs, data transforms, offload overlap, and RoPE overhead.

testing59

vllm-omni-diffusion-perf-optim

Guide for achieving optimal inference performance with vLLM-Omni diffusion models. Covers all lossless and lossy optimization methods (parallelism, torch.compile, CPU offload, quantization, cache acceleration), per-model support tables, and ready-to-use recipes. Use when asked to speed up diffusion inference, reduce latency, lower VRAM usage, or tune a diffusion pipeline.

development59

vllm-omni-cicd

Set up CI/CD pipelines for vLLM-Omni model deployments including Docker builds, automated testing, rolling updates, and deployment validation. Use when creating deployment pipelines, automating model serving updates, setting up Docker workflows, or configuring GitHub Actions for vllm-omni.

development59

hsliuustc0106

vllm-omni-pre-check

vllm-omni-review

skills/vllm-omni-test-report

vllm-omni-hardware

vllm-omni-distributed

vllm-omni-audio-tts

vllm-omni-perf

vllm-omni-image-gen

vllm-omni-quantization

vllm-omni-serving

vllm-omni-video-gen

vllm-omni-setup

vllm-omni-contrib

vllm-omni-tts-integration

skills/vllm-omni-nightly-local

add-diffusion-model

vllm-omni-multimodal

vllm-omni-recipe

vllm-omni-review-miner

vllm-omni-release-note-writer

vllm-omni-api

vllm-omni-diffusion-benchmark-profile

vllm-omni-diffusion-perf-optim

vllm-omni-cicd

Adoption

hsliuustc0106

vllm-omni-pre-check

vllm-omni-review

skills/vllm-omni-test-report

vllm-omni-hardware

vllm-omni-distributed

vllm-omni-audio-tts

vllm-omni-perf

vllm-omni-image-gen

vllm-omni-quantization

vllm-omni-serving

vllm-omni-video-gen

vllm-omni-setup

vllm-omni-contrib

vllm-omni-tts-integration

skills/vllm-omni-nightly-local

add-diffusion-model

vllm-omni-multimodal

vllm-omni-recipe

vllm-omni-review-miner

vllm-omni-release-note-writer

vllm-omni-api

vllm-omni-diffusion-benchmark-profile

vllm-omni-diffusion-perf-optim

vllm-omni-cicd