skills/sglang-humanize-review/SKILL.md
Perform SGLang code review in the style of human maintainers by consulting the 2024-2025 non-agent PR review corpus, including inline code snippets, original multilingual comments, and discussion threads. Use when reviewing SGLang PRs, diffs, patches, or local changes for correctness, tests, performance, GPU/runtime risks, API compatibility, and maintainability.
npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS sglang-humanize-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when the user asks for a human-style SGLang code review or wants review feedback that resembles SGLang maintainers instead of generic linting.
The bundled corpus was collected from sgl-project/sglang PRs created in 2024
and 2025, excluding PRs authored by bots or obvious coding-agent accounts. It
contains 10,959 inline review threads and 18,266 human reviewer comments. Each
thread preserves:
diff_hunk code context
Read references/corpus-summary.md first for coverage, counts, top paths, and category distribution. Do not load the gzip corpus directly into context; query it with the helper script.
Search the corpus by topic, path, category, or reviewer:
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--query cuda --limit 5
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--path python/sglang/srt --category correctness --limit 8
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--query server_args --format jsonl --limit 3
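When the query results need to be consumed programmatically, the same commands can be driven from Python. The following is a minimal sketch, not part of the skill itself: it uses only the flags shown above (`--query`, `--path`, `--category`, `--format`, `--limit`), and it assumes that `--format jsonl` prints one JSON object per line and can be combined with the other flags. The function names are hypothetical.

```python
import json
import subprocess
import sys
from typing import Optional

SCRIPT = "skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py"


def build_query_command(query: Optional[str] = None, path: Optional[str] = None,
                        category: Optional[str] = None, limit: int = 5) -> list[str]:
    """Assemble the argv list using only flags shown in the commands above."""
    cmd = [sys.executable, SCRIPT, "--format", "jsonl", "--limit", str(limit)]
    if path:
        cmd += ["--path", path]
    if category:
        cmd += ["--category", category]
    if query:
        cmd += ["--query", query]
    return cmd


def parse_jsonl(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line (assumed output shape)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_query(**kwargs) -> list[dict]:
    """Run the helper script and return the parsed records."""
    result = subprocess.run(build_query_command(**kwargs),
                            capture_output=True, text=True, check=True)
    return parse_jsonl(result.stdout)
```

Calling `run_query(query="server_args", limit=3)` from the repository root would mirror the third command above. Keeping the limit small serves the same goal as the helper script: only a few matched threads enter the context.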
The full corpus is:
references/sglang-review-corpus-2024-2025.jsonl.gz
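The helper script remains the intended way to read this file. For a quick sanity check that returns only a number, a gzip-compressed JSONL file can also be scanned as a stream, one record at a time. This is a rough sketch under two assumptions: each line is one JSON object, and the code context is stored under a key named `diff_hunk`. Both function names are hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_threads(path: str) -> Iterator[dict]:
    """Yield one review thread at a time without reading the whole file into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_matching(path: str, needle: str, field: str = "diff_hunk") -> int:
    """Count threads whose `field` contains `needle`, ignoring case."""
    needle = needle.lower()
    return sum(
        1 for thread in iter_threads(path)
        if needle in str(thread.get(field, "")).lower()
    )
```

Because the function returns a count instead of comment bodies, it stays consistent with the rule against loading the corpus into context.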
Regenerate it only when the user asks to refresh the evidence:
python3 skills/sglang-humanize-review/scripts/collect_sglang_review_corpus.py \
--repo sgl-project/sglang \
--start-year 2024 \
--end-year 2025 \
--out-dir skills/sglang-humanize-review/references
git diff, gh pr diff, or the patch supplied by the user.
references/corpus-summary.md.
cuda, kv cache, server_args, openai, logprob, tp, dp, eagle, fp8, benchmark, or pytest.
Prioritize these risks because they recur heavily in the 2024-2025 human review threads:
server_args, CLI defaults, endpoint behavior, streaming, and backward compatibility.
Mirror human SGLang review habits:
For a normal review, return:
For a review-prep pass before the user opens a PR, return:
For a corpus-backed explanation, include the query terms and summarize the matched review behavior without dumping long comment bodies.
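The query terms for a corpus-backed explanation can be derived from the change itself. The sketch below is one possible approach, not something the skill prescribes: it checks the added and removed lines of a unified diff against the topic keywords named earlier (cuda, kv cache, server_args, and so on). The function name `suggest_topics` and the matching rules are assumptions made for illustration.

```python
import re

TOPICS = ["cuda", "kv cache", "server_args", "openai", "logprob", "tp", "dp",
          "eagle", "fp8", "benchmark", "pytest"]


def suggest_topics(diff_text: str, topics=TOPICS) -> list[str]:
    """Return topics that appear in added or removed diff lines, in list order."""
    changed = "\n".join(
        line[1:] for line in diff_text.splitlines()
        # Keep +/- lines but skip the "+++"/"---" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ).lower()
    found = []
    for topic in topics:
        # Allow space, underscore, or hyphen between words ("kv cache" ~ "kv_cache").
        pattern = r"[\s_\-]*".join(re.escape(part) for part in topic.split())
        # Reject matches glued to letters or digits so "tp" does not fire on "http".
        if re.search(rf"(?<![a-z0-9]){pattern}(?![a-z0-9])", changed):
            found.append(topic)
    return found
```

Each returned topic can then be passed to the query script as a `--query` value, and the list of terms reported alongside the summarized review behavior.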