skills/model-architecture-diagram/SKILL.md
Return public original model architecture diagrams for user-specified LLM, VLM, MoE, diffusion, OCR, and SGLang/sgl-cookbook model families. Use when the user asks for a model structure chart, architecture diagram, or rendered image link for a specific model such as DeepSeek, GLM, Qwen, Kimi, MiniMax, Step, Hunyuan, or Qwen3-VL.
npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS model-architecture-diagram
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Return only public original diagrams indexed by this skill.
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py "<model name>"
For kind: existing, return the raw image Markdown it prints and preserve the source attribution line.
For kind: no_match, tell the user that no public original architecture diagram is indexed for that model.
Use references/diagram-index.json as the source of truth. It stores raw GitHub image URLs from:
datawhalechina/self-llm
CalvinXKY/InfraTech
Tongyi-MAI/Z-Image
Wan-Video/Wan2.1
Wan-Video/Wan2.2
Tencent-Hunyuan/HunyuanVideo
Tencent-Hunyuan/Hunyuan3D-2
brayevalerien/Flux.1-Architecture-Diagram
Prefer detailed implementation, cookbook, or architecture-card diagrams over paper figures. Good sources show module boundaries, dataflow, MoE / attention / cache paths, or model-specific runtime structure rather than only a high-level paper overview. Official repository diagrams and curated implementation diagrams are first choice; paper figures are fallback only when no more detailed public original diagram is indexed.
Do not copy remote image binaries into the skill. Return the raw GitHub URLs so the chat renderer can display the original image.
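Returning a raw URL instead of a copied binary comes down to emitting a Markdown image link. The sketch below is illustrative only: the function name, the escaping rule, and the "Source:" attribution format are hypothetical and are not taken from the skill's script.

```python
def image_markdown(alt_text: str, raw_url: str, source_repo: str) -> str:
    """Build a Markdown image link plus an attribution line.

    The attribution format here is an assumption, not the script's real output.
    """
    # Square brackets in the alt text would close the alt span early, so escape them.
    safe_alt = alt_text.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({raw_url})\nSource: {source_repo}"
```

The URL is passed through untouched, so the chat renderer fetches the original image from the upstream repository.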
For a direct match, show the original image. Good direct matches include:
If multiple diagrams match, show all high-confidence matches up to the resolver's default limit. For example, DeepSeek V3 may return the full architecture plus MLA MHA/MQA diagrams.
Do not commit the sgl-cookbook-model-architecture-images/ gallery into the repository. The public-original image set is hosted as a GitHub Release asset and indexed by a GitHub issue.
Current hosted artifact:
sha256:ea432081849a250429d3d1ecf246e267c5cc42f989aaf4b9ca695b581e7fa50f
The artifact contains 44 public original diagram image files from the indexed upstream repositories, plus a lightweight index.html, index.md, manifest.json, HTML contact sheet, and architecture-audit.md.
To inspect the gallery locally:
curl -L -o /tmp/sgl-cookbook-model-architecture-images-2026-05-02.zip \
https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS/releases/download/sgl-cookbook-architecture-images-2026-05-02/sgl-cookbook-model-architecture-images-2026-05-02.zip
unzip -q /tmp/sgl-cookbook-model-architecture-images-2026-05-02.zip -d /tmp
open /tmp/sgl-cookbook-model-architecture-images/index.html
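Before unzipping, the download can be checked against the published digest. This is a minimal sketch, assuming the sha256 value listed under "Current hosted artifact" describes the release zip saved at the /tmp path used by the curl command; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Digest listed under "Current hosted artifact" (assumed to describe the release zip).
EXPECTED_SHA256 = "ea432081849a250429d3d1ecf246e267c5cc42f989aaf4b9ca695b581e7fa50f"
# Download path used by the curl command.
ZIP_PATH = Path("/tmp/sgl-cookbook-model-architecture-images-2026-05-02.zip")


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected value, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    if ZIP_PATH.exists():
        print("checksum ok" if matches_expected(ZIP_PATH) else "checksum mismatch")
    else:
        print(f"{ZIP_PATH} not found; run the curl command first")
```

Reading in chunks keeps memory use flat regardless of the archive size.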
List known original diagram aliases:
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py --list-known
Emit JSON for automation:
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py "GLM-5" --format json
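Automation consuming the JSON output would likely branch on the result kind. The sketch below takes the kind values existing and no_match from the workflow described earlier; everything else is assumed, including that the top-level JSON is an object with a "kind" field and the hypothetical "markdown" field holding the image Markdown. Check the script's actual output before relying on any of these names.

```python
import json
import subprocess
import sys

SCRIPT = "skills/model-architecture-diagram/scripts/model_architecture_diagram.py"


def resolve(model_name: str) -> dict:
    """Run the resolver with --format json and parse its stdout."""
    completed = subprocess.run(
        [sys.executable, SCRIPT, model_name, "--format", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


def reply_for(result: dict, model_name: str) -> str:
    """Turn a parsed result into the text returned to the user."""
    kind = result.get("kind")
    if kind == "existing":
        # "markdown" is a hypothetical field name for the printed image Markdown.
        return result.get("markdown", "")
    if kind == "no_match":
        return f"No public original architecture diagram is indexed for {model_name}."
    raise ValueError(f"unexpected kind: {kind!r}")
```

For example, `reply_for(resolve("GLM-5"), "GLM-5")` would yield either the image Markdown or the no-match message, provided those schema assumptions hold.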
references/diagram-index.json: original diagram link index and aliases.
references/source-notes.md: audited source repositories and local cache paths.