skills/model-optimization/sglang/sglang-z-image-turbo-optimization/SKILL.md
PR-backed optimization manual for Z-Image and Z-Image-Turbo in SGLang Diffusion. Use when Codex needs to audit or extend Z-Image registry entries, Turbo/base sampling defaults, CFG normalization, sequence-parallel latent sharding, Cache-DiT/TeaCache behavior, LoRA/FP8 coverage, or AMD nightly validation.
Z-Image and Z-Image-Turbo are native SGLang Diffusion model families, not generic Diffusers-only fallbacks. Current main registers both Tongyi-MAI/Z-Image-Turbo and Tongyi-MAI/Z-Image, uses ZImagePipelineConfig, and serves through ZImagePipeline or ComfyUIZImagePipeline.
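The registry serves both Tongyi-MAI/Z-Image-Turbo and Tongyi-MAI/Z-Image, so the routing from model id to sampling params has to keep the two apart. The following is a minimal sketch that assumes plain substring matching on a lowercased HF id or local path. detect_zimage_sampling_params and the two placeholder classes are hypothetical stand-ins, not SGLang's registry API.

```python
class ZImageSamplingParams:
    """Placeholder for base Z-Image sampling defaults (hypothetical stand-in)."""


class ZImageTurboSamplingParams:
    """Placeholder for Z-Image-Turbo sampling defaults (hypothetical stand-in)."""


def detect_zimage_sampling_params(model_path: str):
    """Hypothetical detector: map an HF id or local path to a params class, or None.

    Turbo is checked first because "z-image" is a substring of "z-image-turbo";
    a base-first check would route Turbo checkpoints to base defaults.
    """
    key = model_path.lower()
    if "z-image-turbo" in key:
        return ZImageTurboSamplingParams
    if "z-image" in key:
        return ZImageSamplingParams
    return None  # not a Z-Image family model
```

For example, "Tongyi-MAI/Z-Image-Turbo" resolves to the Turbo params and "Tongyi-MAI/Z-Image" resolves to the base params. An unrelated id such as "black-forest-labs/FLUX.1-dev" falls through to None.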
Current evidence snapshot:
- origin/main: bca3dd958 on 2026-04-24
- python/sglang/multimodal_gen/configs/pipeline_configs/zimage.py
- python/sglang/multimodal_gen/runtime/models/dits/zimage.py
- python/sglang/multimodal_gen/registry.py
- docs_new/cookbook/diffusion/Z-Image/Z-Image-Turbo.mdx

Capture:
- Tongyi-MAI/Z-Image-Turbo or Tongyi-MAI/Z-Image
- ZImageTurboSamplingParams or ZImageSamplingParams
- cfg_normalization, guidance_scale, and guidance_rescale
- --sp-degree, --ulysses-degree, --ring-degree
- nightly-amd-1-gpu-zimage-turbo

- …z-image detectors through the Turbo sampling params unless the model id contains z-image-turbo.
- cfg_normalization is a quality-control knob, not a generic sampling parameter to silently drop.
- …__main__.
- …Tongyi-MAI/Z-Image-Turbo to ZImageTurboSamplingParams and ZImagePipelineConfig.
- …Tongyi-MAI/Z-Image to ZImageSamplingParams and excludes Turbo.
- test/registered/amd/test_zimage_turbo.py runs via python test/registered/amd/test_zimage_turbo.py -v.

Before adding Z-Image evidence, open the PR diff/source and update references/pr-history.md with motivation, key implementation, short code excerpts, reviewed files, and validation implications.
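The sketch below illustrates why cfg_normalization should not be silently dropped. It is a minimal pure-Python version of classifier-free guidance followed by two post-processing steps, and it rests on several assumptions. First, cfg_normalization is treated as a cap on the norm of the guided prediction relative to the conditional prediction, with 0 disabling it. Second, guidance_rescale follows the std-matching rescale used by diffusers' rescale_noise_cfg. Third, the ordering (normalize, then rescale) is illustrative only. Check zimage.py for the real semantics. apply_cfg is a hypothetical helper that works on flat lists in place of tensors.

```python
import math
from typing import List


def _l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def _std(v: List[float]) -> float:
    # Population std; only a ratio of two stds is used below,
    # so the n vs n-1 convention cancels out.
    mean = sum(v) / len(v)
    return math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))


def apply_cfg(
    cond: List[float],
    uncond: List[float],
    guidance_scale: float,
    cfg_normalization: float = 0.0,
    guidance_rescale: float = 0.0,
) -> List[float]:
    """Hypothetical CFG helper on flat lists (stand-ins for model-output tensors)."""
    # Classifier-free guidance: uncond + s * (cond - uncond).
    pred = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

    # ASSUMED cfg_normalization semantics: cap ||pred|| at cfg_normalization * ||cond||.
    if cfg_normalization > 0.0:
        max_norm = cfg_normalization * _l2_norm(cond)
        cur_norm = _l2_norm(pred)
        if cur_norm > max_norm:  # implies cur_norm > 0, so the division is safe
            pred = [x * (max_norm / cur_norm) for x in pred]

    # guidance_rescale in the style of diffusers' rescale_noise_cfg:
    # match pred's std to cond's std, then blend with the unrescaled pred.
    if guidance_rescale > 0.0:
        std_pred = _std(pred)
        if std_pred > 0.0:
            factor = _std(cond) / std_pred
            pred = [
                guidance_rescale * x * factor + (1.0 - guidance_rescale) * x
                for x in pred
            ]
    return pred
```

With cond = [1.0, 0.0], uncond = [0.0, 0.0], and guidance_scale = 4.0, plain CFG yields [4.0, 0.0]. Setting cfg_normalization = 1.0 pulls the result back to [1.0, 0.0]. Dropping the knob therefore changes the output whenever guidance overshoots, so a request that sets it has to reach the denoising loop.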
references/pr-history.md: diff-reviewed Z-Image PR cards.
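The --sp-degree, --ulysses-degree, and --ring-degree capture item relates to sequence-parallel latent sharding. The following is a minimal sketch of its shape, built on stated assumptions. The SP degree is taken to be the product of the Ulysses and Ring degrees, as in USP-style hybrid sequence parallelism. Ulysses all-to-all is taken to need a head count divisible by the Ulysses degree. The latent token sequence is padded to a multiple of the SP degree and split into contiguous per-rank chunks, and the gather step trims the padding. check_sp_degrees, shard_latent_tokens, and gather_latent_tokens are hypothetical names, not SGLang functions.

```python
from typing import List, Sequence


def check_sp_degrees(sp_degree: int, ulysses_degree: int, ring_degree: int, num_heads: int) -> None:
    """Validate a hybrid sequence-parallel layout (assumed USP-style)."""
    if ulysses_degree * ring_degree != sp_degree:
        raise ValueError(
            f"sp_degree={sp_degree} != ulysses_degree*ring_degree={ulysses_degree * ring_degree}"
        )
    # Ulysses all-to-all scatters attention heads across its process group.
    if num_heads % ulysses_degree != 0:
        raise ValueError(f"num_heads={num_heads} not divisible by ulysses_degree={ulysses_degree}")


def shard_latent_tokens(tokens: Sequence[float], sp_degree: int, rank: int, pad_value: float = 0.0) -> List[float]:
    """Return this rank's contiguous chunk of the padded token sequence."""
    if not 0 <= rank < sp_degree:
        raise ValueError(f"rank={rank} out of range for sp_degree={sp_degree}")
    chunk = -(-len(tokens) // sp_degree)  # ceil division
    padded = list(tokens) + [pad_value] * (chunk * sp_degree - len(tokens))
    return padded[rank * chunk:(rank + 1) * chunk]


def gather_latent_tokens(shards: Sequence[Sequence[float]], orig_len: int) -> List[float]:
    """Concatenate per-rank chunks in rank order and drop the padding."""
    merged: List[float] = []
    for shard in shards:
        merged.extend(shard)
    return merged[:orig_len]
```

For example, 10 tokens at sp_degree = 4 become four chunks of 3, and the last rank holds one real token plus two pad entries. Gathering in rank order and trimming to 10 recovers the original sequence. Mismatched degrees should fail at startup rather than produce misaligned shards.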