vLLM-Omni PR Review

Overview

Review PRs like a real maintainer — direct, selective, and focused on high-signal issues. Prioritize 2-3 real problems per PR over exhaustive coverage. Most PRs should get 1-5 short comments; some just get an empty APPROVE.

Use this skill as a router for vllm-project/vllm-omni pull request reviews. Keep the default context small, load only the references that match the diff, and prioritize high-confidence findings over coverage theater.

Usage modes

Inspired by common PR-review skill patterns (e.g. explicit modes + tool choice); repo is always vllm-project/vllm-omni unless the user says otherwise.

| Mode | What to do | |------|------------| | No PR / branch given | Do not start a full review. Ask for a PR number or URL, or local branch review vs main (or named base). | | PR number or URL | Use gh against this repo: gh pr view <n> --repo vllm-project/vllm-omni, gh pr diff <n> --repo vllm-project/vllm-omni. Commands and JSON fields: references/review-execution.md. | | Local branch | No GitHub PR yet: git fetch if needed, then git diff <base>...HEAD, git diff --stat <base>...HEAD, git log <base>..HEAD --oneline. Use branch name instead of PR # in the review header; same blocker scan and routing. | | Pre-filled prompt | If the prompt already includes PR title/body, checks, or thread summaries (e.g. from GitHub), do not re-fetch metadata unless something is missing; still obtain the diff if not present (gh pr diff or git diff against merge base). |

Depth: Default = maintainer-style brevity (comment budget). If the user asks for detailed / in-depth / line-by-line, add a Specific comments list with path:line items; do not duplicate those points as long prose in the review body. Still respect the usual inline ceiling unless the user explicitly wants an audit-style pass.

What to prioritize (CI-complement)

Investigate, don’t guess — If a checklist item might apply (e.g. connector lifecycle, diffusion cache), read surrounding code or use parallel subagents for independent areas. A wrong guess is worse than silence.
Focus on what automation doesn’t prove — Design, API contracts, stage/connector invariants, test adequacy for omni paths, breaking behavior. Do not re-argue formatting or issues pre-commit / CI already failed (point at the gate instead).
Structured notes — If you produce a multi-section writeup for the user, omit sections with no findings (no “No issues” padding).

Parallel investigation: Large diffs or multiple subsystems (e.g. entrypoints/ + engine/ + diffusion/) → split by directory or concern and investigate in parallel when subagents exist.

Subagent context compression: Never fetch a 500-line source file into your main context. Instead, delegate to a subagent with a specific question (e.g. "What does _resolve_ref_audio return?") and get back a compact summary. This is the single biggest token saver — a subagent burns its own context, you get back a paragraph.

When to use subagents vs direct fetch:

Reading surrounding code to verify a pattern → subagent (returns summary)
Reading a reference file you need for the blocker scan → direct (you need the full patterns)
Fetching PR metadata / diff → direct (small, needed in main context)
Verifying function signatures, return types, class hierarchies → subagent

Which reference to load (do not load everything)

| Situation | Open | |-----------|------| | Every review | references/review-execution.md — gates, gh commands, comment budget, tone, incremental posting, batch/CI triage, Python style flags | | Verification (hardware) | references/verification.md — checkout, test run, E2E smoke, claim verification | | Prefix / multi-skill / hardware guess | references/review-routing.md | | Blocker scan details + merge-blocking patterns | references/blocker-patterns.md — Part 1 patterns; Part 2 = former “pitfalls” (footguns, MRO, connectors, async, etc.) | | System layout + code-pattern review (async, connectors, validation, …) | references/architecture.md — includes “Code patterns for review” at the end | | Diffusion / image / video model PRs | references/diffusion-checklist.md | | New model / omni pipeline PRs (TTS, audio, multimodal) | references/model-addition-checklist.md — profiling + baseline comparison (blocking gate), dead-code scan, description/diff integrity, copy-paste detection, registry consistency | | High-risk change; need coverage matrix / docs sync | references/tests-docs-checklist.md | | Calibrating phrasing from real maintainers | references/maintainer-style-study.md |

Legacy paths (do not load — content merged): pitfalls.md → blocker-patterns.md Part 2; code-patterns.md → architecture.md Code patterns for review; python-style-guide.md → review-execution.md Python style (review flags); batch/CI triage → review-execution.md (Batch / CI sections).

Priority Hierarchy Under Context Pressure

If context is limited, prioritize: blocker scan → evidence → domain routing → verdict.

Always run the blocker scan. Under context pressure, do a shallow scan of the most critical categories (Correctness, Security) and flag that the scan was incomplete.

Core Workflow

Check whether this PR is still a draft or WIP in the PR title, if so, end the review process.

Token budget principle: Post inline comments as you find them. Use subagents for codebase investigation. Load references only after skimming the diff. If you're past ~60% context and haven't posted comments, wrap up and post what you have — partial review posted is better than a perfect review lost.

Step 0: Verify Review Gates First

If this is a ready-to-review PR, check the mergeability and required checks (DCO, pre-commit, mergeability). If failing, stop and ask the author to fix gates before proceeding.

For gate commands, review submission, and comment style, see references/review-execution.md.

Then continue with the workflow below.

Step 1: Fetch Diff First, Then Decide What to Load

Diff-first loading: Fetch the diff and PR metadata first. Skim the diff to understand what subsystems are touched, then load ONLY the references that match. Do not load references speculatively.

Fetch:

PR metadata and changed files
The diff
Linked issues for [Bugfix] and [Feature] PRs only when conventions are unclear
Related PRs only when conventions or prior decisions are unclear

Group changes by kind (runtime code, tests, docs, configs) to see where risk sits; then load references (Step 4) only for areas touched.

Do not fetch broad extra context unless the diff leaves real ambiguity.

Step 2: Blocker Scan (Required First)

Execute this scan before any other review activity. For each category, explicitly mark PASS or list blocking issues found.

BLOCKER scan:
| Category            | Result                                  |
|---------------------|-----------------------------------------|
| Correctness         | PASS / ISSUES: (list)                   |
| Reliability/Safety  | PASS / ISSUES: (list)                   |
| Breaking Changes    | PASS / (check PR description first)     |
| Test Coverage       | PASS / (check PR desc for evidence) / needs tests |
| Documentation       | PASS / ISSUES: (list)                   |
| Security            | PASS / ISSUES: (list)                   |

Blocker categories:

| Category | Flag These Patterns | |----------|---------------------| | Correctness | Silent exception swallows, uninitialized variables, off-by-one errors, logic inversions, missing returns | | Reliability/Safety | Unclosed resources, race conditions, missing None checks, hardcoded timeouts, silent fallbacks | | Breaking Changes | Signature changes without compat, removed public APIs, changed defaults, config removals | | Test Coverage | Bug fix without regression test, new API without tests, performance claims without benchmarks | | Documentation | New public API without docs, breaking changes without migration guide, new config without docs | | Security | Hardcoded secrets, user input in eval/format strings, insecure deserialization |

Evidence standard: Code inspection suffices for code-level blockers. For test coverage, require CI logs or PR description evidence.

Confidence threshold: Flag obvious cases only. For suspicious but uncertain cases, add a non-blocking comment.

Special cases: | PR Type | Action | |---------|--------| | Doc-only PRs | Skip categories 1-4 and 6, proceed to 5 (Documentation) | | Config-only PRs | Focus on Breaking Changes + Documentation | | Test-only PRs | Focus on Correctness of test logic | | Draft PRs | Do not block; add a single non-blocking comment: "Ready for full review when draft status removed. Preliminary scan available on request." |

For detailed anti-patterns with code examples, see references/blocker-patterns.md.

If blockers found: Track issues internally (category + file + line). Do not paste structured BLOCKING ISSUES: templates into the review body.

If no blockers: List non-blocking suggestions and proceed to Step 3.

Step 3: Run Verification (Blocking)

When hardware access (SSH/server/GPU) is available, verify the PR works — not just that it looks correct. Bugs found during verification are blocking. See references/verification.md for detailed procedures.

Checkout — gh pr checkout <n> on a server with the appropriate GPU/model
Unit tests — run affected tests at minimum (e.g. pytest -m "core_model" for the changed area)
E2E smoke — for bench/tool/metric PRs, start a server and run a quick smoke test (10 prompts, low concurrency) to verify new outputs appear and the tool doesn't crash
Compare claims — check PR claims (metrics appear, values sensible, no crashes) against actual output
Report — post findings as a PR comment; bugs found are blocking

If no hardware access, skip to Step 4.

Step 4: Route to the Right Skill

Use the title prefix and changed directories to decide whether a domain skill is required. Doc-only, config-only, and test-only PRs usually skip domain skills unless the diff crosses into model or API areas.

Full prefix table, multi-skill combos, hardware detection, and delegation triggers: references/review-routing.md.

Step 5: Load Only the Relevant Review Reference

Use only the files in this table. Older docs may mention references/pitfalls.md or references/code-patterns.md; those files were removed — use Part 2 of blocker patterns and Code patterns for review in architecture instead.

| Diff Area | Load | | ----------------------------------------------------------------------------------------- | ---------------------------------------------------------- | | vllm_omni/engine/, vllm_omni/stages/, vllm_omni/connectors/, vllm_omni/diffusion/ | blocker-patterns.md Part 2 (common pitfalls) | | New model under vllm_omni/model_executor/models/ ([Model] or un-prefixed new-model PR) | model-addition-checklist.md — blocking gate: profiling + baseline comparison, then dead-code, copy-paste, registry/config consistency | | Async, distributed coordination, validation, connector behavior | architecture.md — section Code patterns for review (at end of file) | | Scheduler, stage boundaries, execution model, critical paths | Architecture (full) | | High-risk changes (core logic, configs/params, error handling, concurrency/distributed, I/O) or [Feature] / [Bugfix] PRs | references/tests-docs-checklist.md |

Pick the narrowest references that match the diff; avoid loading every row by default.

Step 6: Ask for Concrete Validation Evidence

When tests or benchmarks are missing and PR description evidence is insufficient, ask for specific evidence:

| Change Type | Minimum Evidence to Request | |-------------|-----------------------------| | New model | Profiling trace (kernel + memory) + baseline comparison vs official repo/HF/diffusers (see blocking gate in model-addition-checklist) | | API behavior | Functional tests covering success + invalid input + response contract | | Model execution | Inference correctness tests comparing outputs against baseline | | Performance optimization | A/B benchmark (baseline vs PR) + accuracy comparison on same hardware. Mean ± stddev over ≥3 runs, warmup excluded. Accuracy regression is blocking. | | New feature (performance-affecting) | Performance comparison test: baseline vs. with change (latency, throughput, VRAM) | | Memory management | Peak memory measurement showing no regression | | Bug fixes | Regression test that reproduces the original bug |

For [Feature] PRs affecting performance or [Performance] PRs, use the checklist in references/tests-docs-checklist.md section 5.

Be explicit in review comments. Treat "manual verification only" as insufficient unless automation is genuinely impossible.

Step 7: Verify Perf/Accuracy Claims — A/B Required (Blocking)

When to activate: [Performance] / [Perf] prefix, or quantitative perf/accuracy claims in PR body, or Step 5 flagged missing benchmarks.

Core rule: Every perf PR must provide A/B results for BOTH perf AND accuracy. Speedup with silent quality degradation is a blocker.

Load references/perf-verification.md for full workflow, regression rules, git worktree setup, and evidence templates.

Quick check (reviewer asks contributor):

Perf A/B — Latency, throughput, VRAM: baseline (main) vs PR, same HW/inputs. Mean ± stddev over ≥3 runs, warmup excluded.
Accuracy A/B — Quantitative metric or side-by-side samples. Same seeds/inputs, baseline vs PR.
Environment — HW, software versions, model, config, exact commands.

Regression rules: Accuracy degradation = blocking. VRAM > 5% regression = blocking. Latency > 10% regression = must explain.

Independent verification (when hardware available): git worktrees (main vs PR), fresh venvs, A/B run with 20-min timeout per branch. Deliver Claimed vs Measured report locally; ask before posting to PR.

Step 8: Evaluate Test Quality (Blocking)

When to activate: PR adds or modifies test files, or PR touches core code (engine/, stages/, connectors/) without adding tests, or PR is test-only.

Load references/test-quality-evaluation.md.

Workflow:

Static analysis (always runs) — check assertion quality, anti-patterns, marker compliance, edge case coverage
Detect hardware — same detection as Step 6 (cross-referenced from perf-verification.md)
Find affected tests — map changed source files to test files via grep (not path convention)
Filter by hardware — skip tests requiring unavailable resources
Run tests — pytest with --run-level core_model by default; use advanced_model only if hardware is sufficient
Categorize failures — test bug / code bug / infrastructure / flaky
Assess quality — score assertion quality, edge case coverage, marker compliance, anti-patterns (A-D grades for internal analysis)

Graceful degradation:

| Level | Condition | What happens | |-------|-----------|-------------| | Full analysis | Hardware matches test markers | Static + runtime execution | | Static-only | Hardware doesn't match or no GPU | Static analysis only; report which tests were skipped |

Delivery: Local assessment first, ask user before posting. Convert worst 1-2 findings to inline comments (counts against comment budget). If D-grade dimension or code bug found, escalate to REQUEST_CHANGES via Step 9.

Step 9: Incremental Posting + Final Verdict

Post inline comments directly to GitHub as you find them. Do not accumulate comments for a batch post at the end. Each gh api call posts one or more comments immediately. If context runs out mid-review, the comments already posted are safe on GitHub.

Posting strategy:

After completing the blocker scan, post any blocking-issue comments immediately
As domain review surfaces issues, post each comment right away
Minor style nits can be batched (up to 3) in a single review call if they're on the same file
If you find yourself past ~60% context, stop investigating and post whatever you have

Final Verdict

Do not submit a review event (APPROVE / COMMENT / REQUEST_CHANGES) — leave the verdict decision to the user.

Summarize locally:

What was validated
What still lacks evidence
Recommended verdict (local presentation only)

Recommended verdict mapping:

APPROVE — no blockers; body optional (empty is fine).
COMMENT — suggestions only; body optional (~50% should be empty).
REQUEST_CHANGES — genuine blocking issues only (crashes, data loss, security, policy gates).

For tone and inline style, see references/review-execution.md. For maintainer phrasing samples, see references/maintainer-style-study.md.

Review Heuristics

Trust PR description and CI evidence before demanding new tests. Prefer regression tests for [Bugfix], contract tests for API changes, and scan engine → connectors → stages → entrypoints before peripheral files. Skip style nits unless they mask correctness.

When to Fetch More Context

Fetch more context when:

The diff snippet hides lifecycle or cleanup behavior
A config key or API field is introduced without nearby validation
A benchmark claim references unseen measurement code
The PR appears to rely on prior design discussion

Keep additional fetches narrow and tied to a specific uncertainty.

Diffusion Model PR Review

When: Title prefix [Model], [New Model], [Image], [ImageGen], [Video], [VideoGen], or [Diffusion], or a new diffusion model path under vllm_omni/diffusion/.

Load references/diffusion-checklist.md instead of expanding the full workflow here: Dimensions 1–4 (including optional canonical PR body template + measurement guidance for Dimension 1), gh/grep detection commands, Quick Red Flags, combined-feature rules.

Gate order before approve: PR body evidence (script, samples, latency, VRAM) → offline and online paths → at least one acceleration and one memory optimization (new models) → docs tables + usage examples → required e2e online test → offline test if no e2e → combined test when 2+ accel/memory features.

Use Quick Red Flags actions for comment severity; write human-shaped GitHub comments, not giant pasted templates.

Batch Review Session

For daily sessions (reply-first, PR selection, pacing, re-review): references/review-execution.md — sections Batch and daily review sessions and related scripts.

References

All paths are under skills/vllm-omni-review/references/. There is no pitfalls.md, code-patterns.md, or python-style-guide.md — see the legacy map in Which reference to load above.

Review execution — Gates, gh fetch/submit, comment budget, tone, Python style (review flags) (imports, naming, common flags — use to avoid nit wars), batch session, CI log triage
Verification — Hardware verification: checkout, unit tests, E2E smoke, claim comparison, reporting
Review routing — Prefix → domain skill, hardware, multi-skill
Blocker patterns — Part 1: merge-blocking patterns; Part 2: MRO/mixins, connectors, async/sync, diffusion latents, API validation, tensor parallel, mocks (former pitfalls content)
Architecture — Layers and critical paths; end section Code patterns for review = async, distributed, KV cache, validation, connectors, errors, logging (former code-patterns content)
Diffusion checklist — Diffusion PR dimensions, PR body template, Quick Red Flags
Tests & docs checklist — High-risk coverage matrix and docs sync
Maintainer style study — Example maintainer phrasing

vLLM-Omni PR Review

Overview

Usage modes

Inspired by common PR-review skill patterns (e.g. explicit modes + tool choice); repo is always vllm-project/vllm-omni unless the user says otherwise.

What to prioritize (CI-complement)

Investigate, don’t guess — If a checklist item might apply (e.g. connector lifecycle, diffusion cache), read surrounding code or use parallel subagents for independent areas. A wrong guess is worse than silence.
Focus on what automation doesn’t prove — Design, API contracts, stage/connector invariants, test adequacy for omni paths, breaking behavior. Do not re-argue formatting or issues pre-commit / CI already failed (point at the gate instead).
Structured notes — If you produce a multi-section writeup for the user, omit sections with no findings (no “No issues” padding).

Parallel investigation: Large diffs or multiple subsystems (e.g. entrypoints/ + engine/ + diffusion/) → split by directory or concern and investigate in parallel when subagents exist.

When to use subagents vs direct fetch:

Reading surrounding code to verify a pattern → subagent (returns summary)
Reading a reference file you need for the blocker scan → direct (you need the full patterns)
Fetching PR metadata / diff → direct (small, needed in main context)
Verifying function signatures, return types, class hierarchies → subagent

Which reference to load (do not load everything)

Priority Hierarchy Under Context Pressure

If context is limited, prioritize: blocker scan → evidence → domain routing → verdict.

Always run the blocker scan. Under context pressure, do a shallow scan of the most critical categories (Correctness, Security) and flag that the scan was incomplete.

Core Workflow

Check whether this PR is still a draft or WIP in the PR title, if so, end the review process.

Step 0: Verify Review Gates First

If this is a ready-to-review PR, check the mergeability and required checks (DCO, pre-commit, mergeability). If failing, stop and ask the author to fix gates before proceeding.

For gate commands, review submission, and comment style, see references/review-execution.md.

Then continue with the workflow below.

Step 1: Fetch Diff First, Then Decide What to Load

Diff-first loading: Fetch the diff and PR metadata first. Skim the diff to understand what subsystems are touched, then load ONLY the references that match. Do not load references speculatively.

Fetch:

PR metadata and changed files
The diff
Linked issues for [Bugfix] and [Feature] PRs only when conventions are unclear
Related PRs only when conventions or prior decisions are unclear

Group changes by kind (runtime code, tests, docs, configs) to see where risk sits; then load references (Step 4) only for areas touched.

Do not fetch broad extra context unless the diff leaves real ambiguity.

Step 2: Blocker Scan (Required First)

Execute this scan before any other review activity. For each category, explicitly mark PASS or list blocking issues found.

BLOCKER scan:
| Category            | Result                                  |
|---------------------|-----------------------------------------|
| Correctness         | PASS / ISSUES: (list)                   |
| Reliability/Safety  | PASS / ISSUES: (list)                   |
| Breaking Changes    | PASS / (check PR description first)     |
| Test Coverage       | PASS / (check PR desc for evidence) / needs tests |
| Documentation       | PASS / ISSUES: (list)                   |
| Security            | PASS / ISSUES: (list)                   |

Blocker categories:

Evidence standard: Code inspection suffices for code-level blockers. For test coverage, require CI logs or PR description evidence.

Confidence threshold: Flag obvious cases only. For suspicious but uncertain cases, add a non-blocking comment.

For detailed anti-patterns with code examples, see references/blocker-patterns.md.

If blockers found: Track issues internally (category + file + line). Do not paste structured BLOCKING ISSUES: templates into the review body.

If no blockers: List non-blocking suggestions and proceed to Step 3.

Step 3: Run Verification (Blocking)

Checkout — gh pr checkout <n> on a server with the appropriate GPU/model
Unit tests — run affected tests at minimum (e.g. pytest -m "core_model" for the changed area)
E2E smoke — for bench/tool/metric PRs, start a server and run a quick smoke test (10 prompts, low concurrency) to verify new outputs appear and the tool doesn't crash
Compare claims — check PR claims (metrics appear, values sensible, no crashes) against actual output
Report — post findings as a PR comment; bugs found are blocking

If no hardware access, skip to Step 4.

Step 4: Route to the Right Skill

Full prefix table, multi-skill combos, hardware detection, and delegation triggers: references/review-routing.md.

Step 5: Load Only the Relevant Review Reference

Pick the narrowest references that match the diff; avoid loading every row by default.

Step 6: Ask for Concrete Validation Evidence

When tests or benchmarks are missing and PR description evidence is insufficient, ask for specific evidence:

For [Feature] PRs affecting performance or [Performance] PRs, use the checklist in references/tests-docs-checklist.md section 5.

Be explicit in review comments. Treat "manual verification only" as insufficient unless automation is genuinely impossible.

Step 7: Verify Perf/Accuracy Claims — A/B Required (Blocking)

When to activate: [Performance] / [Perf] prefix, or quantitative perf/accuracy claims in PR body, or Step 5 flagged missing benchmarks.

Core rule: Every perf PR must provide A/B results for BOTH perf AND accuracy. Speedup with silent quality degradation is a blocker.

Load references/perf-verification.md for full workflow, regression rules, git worktree setup, and evidence templates.

Quick check (reviewer asks contributor):

Perf A/B — Latency, throughput, VRAM: baseline (main) vs PR, same HW/inputs. Mean ± stddev over ≥3 runs, warmup excluded.
Accuracy A/B — Quantitative metric or side-by-side samples. Same seeds/inputs, baseline vs PR.
Environment — HW, software versions, model, config, exact commands.

Regression rules: Accuracy degradation = blocking. VRAM > 5% regression = blocking. Latency > 10% regression = must explain.

Step 8: Evaluate Test Quality (Blocking)

When to activate: PR adds or modifies test files, or PR touches core code (engine/, stages/, connectors/) without adding tests, or PR is test-only.

Load references/test-quality-evaluation.md.

Workflow:

Static analysis (always runs) — check assertion quality, anti-patterns, marker compliance, edge case coverage
Detect hardware — same detection as Step 6 (cross-referenced from perf-verification.md)
Find affected tests — map changed source files to test files via grep (not path convention)
Filter by hardware — skip tests requiring unavailable resources
Run tests — pytest with --run-level core_model by default; use advanced_model only if hardware is sufficient
Categorize failures — test bug / code bug / infrastructure / flaky
Assess quality — score assertion quality, edge case coverage, marker compliance, anti-patterns (A-D grades for internal analysis)

Graceful degradation:

Step 9: Incremental Posting + Final Verdict

Posting strategy:

After completing the blocker scan, post any blocking-issue comments immediately
As domain review surfaces issues, post each comment right away
Minor style nits can be batched (up to 3) in a single review call if they're on the same file
If you find yourself past ~60% context, stop investigating and post whatever you have

Final Verdict

Do not submit a review event (APPROVE / COMMENT / REQUEST_CHANGES) — leave the verdict decision to the user.

Summarize locally:

What was validated
What still lacks evidence
Recommended verdict (local presentation only)

Recommended verdict mapping:

APPROVE — no blockers; body optional (empty is fine).
COMMENT — suggestions only; body optional (~50% should be empty).
REQUEST_CHANGES — genuine blocking issues only (crashes, data loss, security, policy gates).

For tone and inline style, see references/review-execution.md. For maintainer phrasing samples, see references/maintainer-style-study.md.

Review Heuristics

When to Fetch More Context

Fetch more context when:

The diff snippet hides lifecycle or cleanup behavior
A config key or API field is introduced without nearby validation
A benchmark claim references unseen measurement code
The PR appears to rely on prior design discussion

Keep additional fetches narrow and tied to a specific uncertainty.

Diffusion Model PR Review

When: Title prefix [Model], [New Model], [Image], [ImageGen], [Video], [VideoGen], or [Diffusion], or a new diffusion model path under vllm_omni/diffusion/.

Use Quick Red Flags actions for comment severity; write human-shaped GitHub comments, not giant pasted templates.

Batch Review Session

For daily sessions (reply-first, PR selection, pacing, re-review): references/review-execution.md — sections Batch and daily review sessions and related scripts.

References

All paths are under skills/vllm-omni-review/references/. There is no pitfalls.md, code-patterns.md, or python-style-guide.md — see the legacy map in Which reference to load above.

Review execution — Gates, gh fetch/submit, comment budget, tone, Python style (review flags) (imports, naming, common flags — use to avoid nit wars), batch session, CI log triage
Verification — Hardware verification: checkout, unit tests, E2E smoke, claim comparison, reporting
Review routing — Prefix → domain skill, hardware, multi-skill
Blocker patterns — Part 1: merge-blocking patterns; Part 2: MRO/mixins, connectors, async/sync, diffusion latents, API validation, tensor parallel, mocks (former pitfalls content)
Architecture — Layers and critical paths; end section Code patterns for review = async, distributed, KV cache, validation, connectors, errors, logging (former code-patterns content)
Diffusion checklist — Diffusion PR dimensions, PR body template, Quick Red Flags
Tests & docs checklist — High-risk coverage matrix and docs sync
Maintainer style study — Example maintainer phrasing

Adoption

hsliuustc0106/vllm-omni-review

$ install --global

Security Scan Results

SKILL.md

vLLM-Omni PR Review

Overview

Usage modes

What to prioritize (CI-complement)

Which reference to load (do not load everything)

Priority Hierarchy Under Context Pressure

Core Workflow

Step 0: Verify Review Gates First

Step 1: Fetch Diff First, Then Decide What to Load

Step 2: Blocker Scan (Required First)

Step 3: Run Verification (Blocking)

Step 4: Route to the Right Skill

Step 5: Load Only the Relevant Review Reference

Step 6: Ask for Concrete Validation Evidence

Step 7: Verify Perf/Accuracy Claims — A/B Required (Blocking)

Step 8: Evaluate Test Quality (Blocking)

Step 9: Incremental Posting + Final Verdict

Final Verdict

Review Heuristics

When to Fetch More Context

Diffusion Model PR Review

Batch Review Session

References

Related Skills

hsliuustc0106/vllm-omni-pre-check

hsliuustc0106/skills/vllm-omni-test-report

hsliuustc0106/vllm-omni-video-gen

hsliuustc0106/vllm-omni-setup

hsliuustc0106/vllm-omni-review

$ install --global

Security Scan Results

SKILL.md

vLLM-Omni PR Review

Overview

Usage modes

What to prioritize (CI-complement)

Which reference to load (do not load everything)

Priority Hierarchy Under Context Pressure

Core Workflow

Step 0: Verify Review Gates First

Step 1: Fetch Diff First, Then Decide What to Load

Step 2: Blocker Scan (Required First)

Step 3: Run Verification (Blocking)

Step 4: Route to the Right Skill

Step 5: Load Only the Relevant Review Reference

Step 6: Ask for Concrete Validation Evidence

Step 7: Verify Perf/Accuracy Claims — A/B Required (Blocking)

Step 8: Evaluate Test Quality (Blocking)

Step 9: Incremental Posting + Final Verdict

Final Verdict

Review Heuristics

When to Fetch More Context

Diffusion Model PR Review

Batch Review Session

References

Related Skills

hsliuustc0106/vllm-omni-pre-check

hsliuustc0106/skills/vllm-omni-test-report

hsliuustc0106/vllm-omni-video-gen

hsliuustc0106/vllm-omni-setup