skills/vllm-omni-review/SKILL.md
Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.
npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review PRs like a real maintainer — direct, selective, and focused on high-signal issues. Prioritize 2-3 real problems per PR over exhaustive coverage. Most PRs should get 1-5 short comments; some just get an empty APPROVE.
Use this skill as a router for vllm-project/vllm-omni pull request reviews. Keep the default context small, load only the references that match the diff, and prioritize high-confidence findings over coverage theater.
Inspired by common PR-review skill patterns (e.g. explicit modes + tool choice); repo is always vllm-project/vllm-omni unless the user says otherwise.
| Mode | What to do |
|------|------------|
| No PR / branch given | Do not start a full review. Ask for a PR number or URL, or local branch review vs main (or named base). |
| PR number or URL | Use gh against this repo: gh pr view <n> --repo vllm-project/vllm-omni, gh pr diff <n> --repo vllm-project/vllm-omni. Commands and JSON fields: references/review-execution.md. |
| Local branch | No GitHub PR yet: git fetch if needed, then git diff <base>...HEAD, git diff --stat <base>...HEAD, git log <base>..HEAD --oneline. Use branch name instead of PR # in the review header; same blocker scan and routing. |
| Pre-filled prompt | If the prompt already includes PR title/body, checks, or thread summaries (e.g. from GitHub), do not re-fetch metadata unless something is missing; still obtain the diff if not present (gh pr diff or git diff against merge base). |
Depth: Default = maintainer-style brevity (comment budget). If the user asks for detailed / in-depth / line-by-line, add a Specific comments list with path:line items; do not duplicate those points as long prose in the review body. Still respect the usual inline ceiling unless the user explicitly wants an audit-style pass.
Parallel investigation: Large diffs or multiple subsystems (e.g. entrypoints/ + engine/ + diffusion/) → split by directory or concern and investigate in parallel when subagents exist.
Subagent context compression: Never fetch a 500-line source file into your main context. Instead, delegate to a subagent with a specific question (e.g. "What does _resolve_ref_audio return?") and get back a compact summary. This is the single biggest token saver — a subagent burns its own context, you get back a paragraph.
When to use subagents vs direct fetch:
| Situation | Open |
|-----------|------|
| Every review | references/review-execution.md — gates, gh commands, comment budget, tone, incremental posting, batch/CI triage, Python style flags |
| Verification (hardware) | references/verification.md — checkout, test run, E2E smoke, claim verification |
| Prefix / multi-skill / hardware guess | references/review-routing.md |
| Blocker scan details + merge-blocking patterns | references/blocker-patterns.md — Part 1 patterns; Part 2 = former “pitfalls” (footguns, MRO, connectors, async, etc.) |
| System layout + code-pattern review (async, connectors, validation, …) | references/architecture.md — includes “Code patterns for review” at the end |
| Diffusion / image / video model PRs | references/diffusion-checklist.md |
| New model / omni pipeline PRs (TTS, audio, multimodal) | references/model-addition-checklist.md — profiling + baseline comparison (blocking gate), dead-code scan, description/diff integrity, copy-paste detection, registry consistency |
| High-risk change; need coverage matrix / docs sync | references/tests-docs-checklist.md |
| Calibrating phrasing from real maintainers | references/maintainer-style-study.md |
Legacy paths (do not load — content merged): pitfalls.md → blocker-patterns.md Part 2; code-patterns.md → architecture.md Code patterns for review; python-style-guide.md → review-execution.md Python style (review flags); batch/CI triage → review-execution.md (Batch / CI sections).
If context is limited, prioritize: blocker scan → evidence → domain routing → verdict.
Always run the blocker scan. Under context pressure, do a shallow scan of the most critical categories (Correctness, Security) and flag that the scan was incomplete.
Check whether this PR is still a draft or WIP in the PR title, if so, end the review process.
Token budget principle: Post inline comments as you find them. Use subagents for codebase investigation. Load references only after skimming the diff. If you're past ~60% context and haven't posted comments, wrap up and post what you have — partial review posted is better than a perfect review lost.
If this is a ready-to-review PR, check the mergeability and required checks (DCO, pre-commit, mergeability). If failing, stop and ask the author to fix gates before proceeding.
For gate commands, review submission, and comment style, see references/review-execution.md.
Then continue with the workflow below.
Diff-first loading: Fetch the diff and PR metadata first. Skim the diff to understand what subsystems are touched, then load ONLY the references that match. Do not load references speculatively.
Fetch:
[Bugfix] and [Feature] PRs only when conventions are unclearGroup changes by kind (runtime code, tests, docs, configs) to see where risk sits; then load references (Step 4) only for areas touched.
Do not fetch broad extra context unless the diff leaves real ambiguity.
Execute this scan before any other review activity. For each category, explicitly mark PASS or list blocking issues found.
BLOCKER scan:
| Category | Result |
|---------------------|-----------------------------------------|
| Correctness | PASS / ISSUES: (list) |
| Reliability/Safety | PASS / ISSUES: (list) |
| Breaking Changes | PASS / (check PR description first) |
| Test Coverage | PASS / (check PR desc for evidence) / needs tests |
| Documentation | PASS / ISSUES: (list) |
| Security | PASS / ISSUES: (list) |
Blocker categories:
| Category | Flag These Patterns | |----------|---------------------| | Correctness | Silent exception swallows, uninitialized variables, off-by-one errors, logic inversions, missing returns | | Reliability/Safety | Unclosed resources, race conditions, missing None checks, hardcoded timeouts, silent fallbacks | | Breaking Changes | Signature changes without compat, removed public APIs, changed defaults, config removals | | Test Coverage | Bug fix without regression test, new API without tests, performance claims without benchmarks | | Documentation | New public API without docs, breaking changes without migration guide, new config without docs | | Security | Hardcoded secrets, user input in eval/format strings, insecure deserialization |
Evidence standard: Code inspection suffices for code-level blockers. For test coverage, require CI logs or PR description evidence.
Confidence threshold: Flag obvious cases only. For suspicious but uncertain cases, add a non-blocking comment.
Special cases: | PR Type | Action | |---------|--------| | Doc-only PRs | Skip categories 1-4 and 6, proceed to 5 (Documentation) | | Config-only PRs | Focus on Breaking Changes + Documentation | | Test-only PRs | Focus on Correctness of test logic | | Draft PRs | Do not block; add a single non-blocking comment: "Ready for full review when draft status removed. Preliminary scan available on request." |
For detailed anti-patterns with code examples, see references/blocker-patterns.md.
If blockers found: Track issues internally (category + file + line). Do not paste structured BLOCKING ISSUES: templates into the review body.
If no blockers: List non-blocking suggestions and proceed to Step 3.
When hardware access (SSH/server/GPU) is available, verify the PR works — not just that it looks correct. Bugs found during verification are blocking. See references/verification.md for detailed procedures.
gh pr checkout <n> on a server with the appropriate GPU/modelpytest -m "core_model" for the changed area)If no hardware access, skip to Step 4.
Use the title prefix and changed directories to decide whether a domain skill is required. Doc-only, config-only, and test-only PRs usually skip domain skills unless the diff crosses into model or API areas.
Full prefix table, multi-skill combos, hardware detection, and delegation triggers: references/review-routing.md.
Use only the files in this table. Older docs may mention references/pitfalls.md or references/code-patterns.md; those files were removed — use Part 2 of blocker patterns and Code patterns for review in architecture instead.
| Diff Area | Load |
| ----------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| vllm_omni/engine/, vllm_omni/stages/, vllm_omni/connectors/, vllm_omni/diffusion/ | blocker-patterns.md Part 2 (common pitfalls) |
| New model under vllm_omni/model_executor/models/ ([Model] or un-prefixed new-model PR) | model-addition-checklist.md — blocking gate: profiling + baseline comparison, then dead-code, copy-paste, registry/config consistency |
| Async, distributed coordination, validation, connector behavior | architecture.md — section Code patterns for review (at end of file) |
| Scheduler, stage boundaries, execution model, critical paths | Architecture (full) |
| High-risk changes (core logic, configs/params, error handling, concurrency/distributed, I/O) or [Feature] / [Bugfix] PRs | references/tests-docs-checklist.md |
Pick the narrowest references that match the diff; avoid loading every row by default.
When tests or benchmarks are missing and PR description evidence is insufficient, ask for specific evidence:
| Change Type | Minimum Evidence to Request | |-------------|-----------------------------| | New model | Profiling trace (kernel + memory) + baseline comparison vs official repo/HF/diffusers (see blocking gate in model-addition-checklist) | | API behavior | Functional tests covering success + invalid input + response contract | | Model execution | Inference correctness tests comparing outputs against baseline | | Performance optimization | A/B benchmark (baseline vs PR) + accuracy comparison on same hardware. Mean ± stddev over ≥3 runs, warmup excluded. Accuracy regression is blocking. | | New feature (performance-affecting) | Performance comparison test: baseline vs. with change (latency, throughput, VRAM) | | Memory management | Peak memory measurement showing no regression | | Bug fixes | Regression test that reproduces the original bug |
For [Feature] PRs affecting performance or [Performance] PRs, use the checklist in references/tests-docs-checklist.md section 5.
Be explicit in review comments. Treat "manual verification only" as insufficient unless automation is genuinely impossible.
When to activate: [Performance] / [Perf] prefix, or quantitative perf/accuracy claims in PR body, or Step 5 flagged missing benchmarks.
Core rule: Every perf PR must provide A/B results for BOTH perf AND accuracy. Speedup with silent quality degradation is a blocker.
Load references/perf-verification.md for full workflow, regression rules, git worktree setup, and evidence templates.
Quick check (reviewer asks contributor):
Regression rules: Accuracy degradation = blocking. VRAM > 5% regression = blocking. Latency > 10% regression = must explain.
Independent verification (when hardware available): git worktrees (main vs PR), fresh venvs, A/B run with 20-min timeout per branch. Deliver Claimed vs Measured report locally; ask before posting to PR.
When to activate: PR adds or modifies test files, or PR touches core code (engine/, stages/, connectors/) without adding tests, or PR is test-only.
Load references/test-quality-evaluation.md.
Workflow:
perf-verification.md)pytest with --run-level core_model by default; use advanced_model only if hardware is sufficientGraceful degradation:
| Level | Condition | What happens | |-------|-----------|-------------| | Full analysis | Hardware matches test markers | Static + runtime execution | | Static-only | Hardware doesn't match or no GPU | Static analysis only; report which tests were skipped |
Delivery: Local assessment first, ask user before posting. Convert worst 1-2 findings to inline comments (counts against comment budget). If D-grade dimension or code bug found, escalate to REQUEST_CHANGES via Step 9.
Post inline comments directly to GitHub as you find them. Do not accumulate comments for a batch post at the end. Each gh api call posts one or more comments immediately. If context runs out mid-review, the comments already posted are safe on GitHub.
Posting strategy:
Do not submit a review event (APPROVE / COMMENT / REQUEST_CHANGES) — leave the verdict decision to the user.
Summarize locally:
Recommended verdict mapping:
APPROVE — no blockers; body optional (empty is fine).COMMENT — suggestions only; body optional (~50% should be empty).REQUEST_CHANGES — genuine blocking issues only (crashes, data loss, security, policy gates).For tone and inline style, see references/review-execution.md. For maintainer phrasing samples, see references/maintainer-style-study.md.
Trust PR description and CI evidence before demanding new tests. Prefer regression tests for [Bugfix], contract tests for API changes, and scan engine → connectors → stages → entrypoints before peripheral files. Skip style nits unless they mask correctness.
Fetch more context when:
Keep additional fetches narrow and tied to a specific uncertainty.
When: Title prefix [Model], [New Model], [Image], [ImageGen], [Video], [VideoGen], or [Diffusion], or a new diffusion model path under vllm_omni/diffusion/.
Load references/diffusion-checklist.md instead of expanding the full workflow here: Dimensions 1–4 (including optional canonical PR body template + measurement guidance for Dimension 1), gh/grep detection commands, Quick Red Flags, combined-feature rules.
Gate order before approve: PR body evidence (script, samples, latency, VRAM) → offline and online paths → at least one acceleration and one memory optimization (new models) → docs tables + usage examples → required e2e online test → offline test if no e2e → combined test when 2+ accel/memory features.
Use Quick Red Flags actions for comment severity; write human-shaped GitHub comments, not giant pasted templates.
For daily sessions (reply-first, PR selection, pacing, re-review): references/review-execution.md — sections Batch and daily review sessions and related scripts.
All paths are under skills/vllm-omni-review/references/. There is no pitfalls.md, code-patterns.md, or python-style-guide.md — see the legacy map in Which reference to load above.
gh fetch/submit, comment budget, tone, Python style (review flags) (imports, naming, common flags — use to avoid nit wars), batch session, CI log triagedevelopment
Use before submitting a PR to vllm-project/vllm-omni — self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."
development
--- name: vllm-omni-test-report description: Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release** — `scripts/compose_full_report.py` (**测试结论**, Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly** — `script
data-ai
Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.
testing
Install and configure vLLM-Omni for omni-modality model inference. Use when setting up vllm-omni, configuring the environment, installing dependencies, resolving GPU driver issues, or preparing a machine for model serving.