skills/model-optimization/sglang/sglang-ltx23-hq-optimization/SKILL.md
PR-backed and current-main optimization manual for the LTX-2.3 High Quality (HQ) pipeline in SGLang Diffusion. Use when Codex needs to audit or extend LTX2TwoStageHQPipeline, LTX-2.3 two-stage LoRA switching, HQ sigma/timestep semantics, res2s RK2 refinement, audio/video denoising, Gemma prompt trimming, low-VRAM device snapshots, or LTX-2.3 HQ sampling defaults.
npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS sglang-ltx23-hq-optimization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
LTX-2.3 HQ is an SGLang Diffusion two-stage video+audio pipeline lane. It differs from legacy LTX-2 and non-HQ LTX-2.3 in four ways: stage 1 runs at half resolution, the HQ sigma shift is resolution-aware, distilled LoRA strength is stage-specific, and stage 2 uses HQ timestep semantics plus res2s refinement behavior.
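To make the half-resolution point concrete, here is a minimal sketch of how a two-stage plan could be laid out. It is not SGLang code: `StagePlan` and `plan_two_stage_hq` are hypothetical names, and the sketch assumes that "half resolution" means halving both width and height and that stage 2 runs at the full target size. Any alignment or rounding the real pipeline applies is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePlan:
    """Hypothetical record of the resolution a stage runs at."""
    name: str
    width: int
    height: int


def plan_two_stage_hq(width: int, height: int) -> list[StagePlan]:
    """Return a two-entry plan: stage 1 at half size, stage 2 at target size.

    Assumption: both dimensions are halved, and odd sizes are rejected
    rather than rounded, because the real rounding rule is not known here.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width % 2 or height % 2:
        raise ValueError("width and height must be even to halve exactly")
    return [
        StagePlan("stage1", width // 2, height // 2),
        StagePlan("stage2", width, height),
    ]


plan = plan_two_stage_hq(1280, 704)
```

Rejecting odd sizes keeps the sketch honest about what it does not know; check LTX2HalveResolutionStage in the source for the actual behavior.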
Current evidence snapshot:
- origin/main: bca3dd958 on 2026-04-24
- ltx_2_pipeline.py, ltx_2_denoising.py, configs/sample/ltx_2.py, configs/pipeline_configs/ltx_2.py

Capture:
- LTX2Pipeline, LTX2TwoStagePipeline, or LTX2TwoStageHQPipeline
- ltx_variant == "ltx_2_3"
- LTX2HalveResolutionStage

Do not apply HQ semantics to every LTX-2.3 path.
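One way to keep HQ semantics off the other LTX-2.3 paths is to gate them on the captured facts. The following is a sketch only: `should_apply_hq_semantics` and its parameters are hypothetical, and requiring all three conditions at once is an assumption, not a rule confirmed by the SGLang source.

```python
HQ_PIPELINE_CLASS = "LTX2TwoStageHQPipeline"
LTX_2_3_VARIANT = "ltx_2_3"


def should_apply_hq_semantics(
    pipeline_class_name: str,
    ltx_variant: str,
    has_halve_resolution_stage: bool,
) -> bool:
    """Hypothetical guard: True only for the HQ two-stage lane.

    Assumption: HQ behavior applies when the pipeline class is the HQ one,
    the variant is LTX-2.3, and the halve-resolution stage is present.
    """
    return (
        pipeline_class_name == HQ_PIPELINE_CLASS
        and ltx_variant == LTX_2_3_VARIANT
        and has_halve_resolution_stage
    )


# A non-HQ LTX-2.3 two-stage pipeline must not pick up HQ semantics.
non_hq = should_apply_hq_semantics("LTX2TwoStagePipeline", "ltx_2_3", False)
hq = should_apply_hq_semantics("LTX2TwoStageHQPipeline", "ltx_2_3", True)
```

Comparing class names as strings keeps the sketch importable without SGLang; in the real code an `isinstance` check against the pipeline class would be the more natural choice.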
Before adding LTX-2.3 HQ evidence, open the PR diff/source and update references/pr-history.md with motivation, implementation, code excerpts, reviewed files, and validation implications.
references/pr-history.md: diff-reviewed LTX-2.3 HQ PR cards.