.claude/skills/review-latency/SKILL.md
Audit the two latency-critical paths for blocking work, redundant computation, and micro-optimization opportunities
Enter planning mode. Deep-audit both latency-critical paths for anything that adds delay — from micro-optimizations to architectural blockers. Use maximum parallelism — spawn explore agents for independent paths.
This is the highest-priority performance surface. The project's overriding goal is responsiveness — the user must never feel lag when Alt-Tabbing. Every microsecond matters on these paths because costs compound: a 50μs waste in an eligibility check runs 50× per focus event = 2.5ms. A cache miss in display list building runs every paint. Micro-optimizations are not just welcome, they're the point.
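The compounding arithmetic above can be sketched as a tiny helper. This is an illustration in Python rather than project code, and the function name `compound_cost_ms` is hypothetical:

```python
def compound_cost_ms(per_call_us: float, calls_per_event: int) -> float:
    """Total cost in milliseconds of a per-call cost repeated within one event."""
    return per_call_us * calls_per_event / 1000.0


# The example from the text: 50 us wasted per eligibility check,
# 50 checks per focus event.
print(compound_cost_ms(50, 50))  # 2.5
```

The same helper applies to any row of a findings table: multiply the per-call cost by how often the path runs before judging whether a saving is worth reporting.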
The architecture is single-process: producers, window store, and GUI all run in MainProcess. There is no IPC on the critical path — the enrichment pump (icon/process resolution) is async and off the hot path.
This skill covers the control flow around rendering — everything from keypress to "start painting" and from "painting done" to "window visible." It does NOT cover per-frame rendering internals, which have dedicated skills:
- /review-paint — D2D paint pipeline: pre-render + BeginDraw→EndDraw, 8-layer compositor, effect helpers, per-frame allocations in gui_paint.ahk, gui_effects.ahk, gui_bgimage.ahk, gui_gdip.ahk, gui_math.ahk
- /review-d3d — D3D11 shader host: d2d_shader.ahk buffer allocations, state calls, GPU readback, compute dispatch
- /review-shaders — HLSL pixel shader source: ALU, transcendentals, loop optimization in src/shaders/*.hlsl

However, the frame pacing mechanism IS in scope for this skill — the three-tier pacing in gui_animation.ahk (compositor clock → waitable swap chain → QPC spin-wait) directly affects input-to-photon latency. The decision of when to render (frame pacing) is latency-critical; what to render (paint internals) is not.
Similarly, DComp operations that affect overlay visibility timing are in scope: D2D_SetClipRect, D2D_Commit, DWM cloaking/uncloaking sequences. Present(0,0) is non-blocking post-Phase 1 — verify this hasn't regressed.
If you find a latency issue that lives inside the rendering pipeline (e.g., "paint takes too long because of X"), note it briefly and defer to the appropriate skill.
An external event (focus change, window created/destroyed, komorebi workspace switch) must update the window store as fast as possible so the data is fresh when the user Alt-Tabs.
WinEventHook callback ─┐
Komorebi subscription ─┼──► Eligibility check ──► Store upsert ──► Dirty tracking
WinEnum (on-demand) ─┘
Key files:
- src/core/winevent_hook.ahk — primary producer, fires on every focus change
- src/core/komorebi_sub.ahk, komorebi_state.ahk, komorebi_lite.ahk — workspace tracking with multi-layer cache
- src/core/winenum_lite.ahk — full window enumeration (startup, snapshot)
- src/shared/blacklist.ahk — Blacklist_IsWindowEligible() — called per-window, per-event
- src/shared/window_list.ahk — store internals, upsert, dirty tracking, display list
- src/core/mru_lite.ahk — fallback MRU (if WinEventHook fails)

Questions to ask:
The user presses Alt → Tab and must see the overlay with correct data as fast as possible. Then each subsequent Tab press must update the selection and repaint instantly.
This skill focuses on the control flow and data preparation — from keypress through state machine to the point where rendering begins, and from rendering completion to overlay visibility. The rendering pipeline itself (D2D draw calls, effects, compositing) is covered by /review-paint.
Alt down ──► Pre-warm (refresh data early)
Tab down ──► Freeze list ──► Build display items ──► [Paint — see /review-paint] ──► Show overlay
Tab again ──► Move selection ──► [Repaint]
Alt up ──► Activate window ──► Hide overlay
Escape ──► Cancel ──► Hide overlay
Key files:
- src/gui/gui_interceptor.ahk — keyboard hook callbacks, event dispatch
- src/gui/gui_state.ahk — state machine transitions
- src/gui/gui_input.ahk — input handling, selection movement
- src/gui/gui_data.ahk — live data layer, refresh, pre-cache, display eviction
- src/gui/gui_overlay.ahk — show/hide mechanics (DWM cloaking, anti-flash)
- src/shared/gui_antiflash.ahk — DWM cloaking / alpha sequencing
- src/gui/gui_monitor.ahk — monitor detection, DPI
- src/gui/gui_workspace.ahk — workspace label building
- src/gui/gui_pump.ahk — enrichment pump integration

Out of scope (covered by dedicated skills):
- src/gui/gui_paint.ahk — per-frame rendering internals (/review-paint)
- src/gui/gui_effects.ahk — 8-layer compositor internals (/review-paint)
- src/gui/gui_gdip.ahk — D2D resource management (/review-paint)
- src/gui/gui_math.ahk — layout calculations (/review-paint)
- src/gui/gui_bgimage.ahk — background image layer (/review-paint)
- src/gui/d2d_shader.ahk — D3D11 interop (/review-d3d)
- src/shaders/*.hlsl — pixel shaders (/review-shaders)

In scope from gui_animation.ahk (the frame pacing / latency boundary):
- _Anim_FrameLoop — three-tier pacing: compositor clock wait vs waitable swap chain vs QPC spin-wait
- DCompositionBoostCompositorClock — DRR boost on show/hide
- Anim_EnsureTimer deferred start guard (STA pump safety — indirectly affects first-frame latency)

Questions to ask:
- GUI_Repaint()? Is any of it unnecessary or reorderable?

The keyboard hooks run on the main thread. Anything that blocks the main thread delays hook processing. This includes:
- query_timers.ps1)
- Critical "On" sections
- DllCall that might block (synchronous Win32 calls)

This is separate from the two paths above — even if Path 1 and Path 2 are individually fast, a long-running timer callback between Alt-down and Tab-down steals time from hook processing.
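One way to quantify how long a callback holds the main thread is to time it against a budget. The sketch below is a Python illustration only, not code from the project: `block_timer`, the 500 μs budget, and the callback name are all hypothetical, and the injected fake clock exists purely to keep the example deterministic.

```python
import time
from contextlib import contextmanager


@contextmanager
def block_timer(name, budget_us, report, clock=time.perf_counter_ns):
    """Time a main-thread section; record (name, elapsed_us) if it exceeds budget_us."""
    start = clock()
    try:
        yield
    finally:
        elapsed_us = (clock() - start) / 1000.0  # nanoseconds -> microseconds
        if elapsed_us > budget_us:
            report.append((name, elapsed_us))


# Deterministic demo: a fake clock whose two readings are 3 ms apart.
ticks = iter([0, 3_000_000])
over_budget = []
with block_timer("hypothetical_timer_callback", 500, over_budget,
                 clock=lambda: next(ticks)):
    pass  # the callback body would run here

print(over_budget)  # [('hypothetical_timer_callback', 3000.0)]
```

Anything that lands in `over_budget` is a candidate for the cross-cutting findings, since a keyboard hook event arriving during that window would have to wait.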
Split by hot path (run in parallel):
- src/core/, eligibility in blacklist.ahk, store internals in window_list.ahk. Focus on per-event callback cost.
- gui_interceptor.ahk, gui_state.ahk, gui_input.ahk, gui_data.ahk, gui_overlay.ahk, src/shared/gui_antiflash.ahk, gui_workspace.ahk. Focus on keypress-to-paint-call and paint-done-to-visible sequences. Do NOT audit the rendering pipeline itself.
- query_timers.ps1 output, Critical section durations, any synchronous I/O on the main thread. Scan all src/gui/ and src/core/ files for blocking operations.

Query tools:

- query_timers.ps1 — inventory all timers, find heavy callbacks
- query_state.ps1 — trace state machine transitions for the Alt-Tab flow
- query_interface.ps1 <file> — public API surface of hot path files
- query_function.ps1 <func> — extract function bodies without loading full files
- query_callchain.ps1 <func> — trace call depth from hot path entry points

Surface everything — do not auto-exclude findings based on estimated size. Micro-optimizations on high-frequency paths are the point of this review.
For each finding, provide an honest assessment:
| Finding | File:Lines | Current Cost | Frequency | Compound Cost | Complexity | Fix |
|---------|-----------|-------------|-----------|---------------|------------|-----|
| Eligibility re-checks cloaked state on every focus event | blacklist.ahk:142 | ~30μs | 50×/focus burst | ~1.5ms | One-line cache | Cache cloaked state, invalidate on EVENT_OBJECT_CLOAKED |
| Display list rebuilds workspace labels every paint | gui_data.ahk:88 | ~200μs | Every Tab press | ~200μs | Medium — need invalidation signal | Pre-compute during pre-warm, cache until workspace change |
Columns explained:
Do not filter. A 10μs saving that runs 100× per paint (1ms compound) is worth knowing about even if the fix is complex. The user decides the tradeoff.
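The "cache, then invalidate on an event" fix pattern from the example table can be sketched as follows. This is a Python illustration under stated assumptions, not the project's AutoHotkey code: `CloakedStateCache`, its methods, and the stand-in query are hypothetical names.

```python
class CloakedStateCache:
    """Memoize a per-window lookup until an invalidation event arrives."""

    def __init__(self, query_cloaked):
        self._query = query_cloaked  # hypothetical expensive lookup: hwnd -> bool
        self._cache = {}
        self.query_count = 0         # how many times the expensive lookup ran

    def is_cloaked(self, hwnd):
        if hwnd not in self._cache:
            self.query_count += 1
            self._cache[hwnd] = self._query(hwnd)
        return self._cache[hwnd]

    def invalidate(self, hwnd):
        """Call from the cloak/uncloak event handler for this window."""
        self._cache.pop(hwnd, None)


cache = CloakedStateCache(lambda hwnd: False)  # stand-in for the real query
for _ in range(50):                            # one focus burst, same window
    cache.is_cloaked(0x1234)
print(cache.query_count)  # 1

cache.invalidate(0x1234)                       # cloak state changed
cache.is_cloaked(0x1234)
print(cache.query_count)  # 2
```

The burst of 50 checks costs one lookup instead of 50, which is the kind of compound saving the table's "Compound Cost" column is meant to expose. Whether such a cache is safe depends on every state change reaching the invalidation path, which is what the validation step should confirm.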
After explore agents report back, validate every finding yourself. This codebase has extensive caching and optimization already — what looks like a miss may be handled elsewhere.
For each candidate:
- file.ahk lines X–Y" with actual code quoted. Trace the full execution path, not just one function.

Section 1 — Path 1 findings (Window Change → Store):
| Finding | File:Lines | Current Cost | Frequency | Compound Cost | Complexity | Fix |
|---------|-----------|-------------|-----------|---------------|------------|-----|
Section 2 — Path 2 findings (User Action → Pixels):
| Finding | File:Lines | Current Cost | Frequency | Compound Cost | Complexity | Fix |
|---------|-----------|-------------|-----------|---------------|------------|-----|
Section 3 — Cross-cutting (Main Thread Blocking):
| Finding | File:Lines | Block Duration | When It Fires | Impact on Hooks | Complexity | Fix |
|---------|-----------|---------------|--------------|----------------|------------|-----|
Order within each section by compound cost (highest first). Do not omit low-compound-cost findings — list them at the bottom.
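The ordering rule can be illustrated with a short Python sketch. The `Finding` type and `order_findings` helper are hypothetical; the first two entries reuse the numbers from the example table, and the third is an invented low-cost finding included only to show that nothing is dropped.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    cost_us: float   # cost of one occurrence, in microseconds
    frequency: int   # occurrences per event (focus burst, Tab press, paint)

    @property
    def compound_us(self) -> float:
        return self.cost_us * self.frequency


def order_findings(findings):
    """Sort by compound cost, highest first; keep every finding."""
    return sorted(findings, key=lambda f: f.compound_us, reverse=True)


findings = [
    Finding("workspace labels rebuilt", 200, 1),
    Finding("hypothetical redundant lookup", 5, 2),
    Finding("cloaked state re-check", 30, 50),
]
print([f.name for f in order_findings(findings)])
# ['cloaked state re-check', 'workspace labels rebuilt', 'hypothetical redundant lookup']
```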
Ignore any existing plans — create a fresh one.