.agents/skills/doppler-perf/SKILL.md
Diagnose and improve Doppler model/path performance with baselines, profiling traces, and controlled runtime/code experiments. (project)
Use this skill when Doppler is slower than expected on decode, prefill, TTFT, or model-load-sensitive warm UX, and you need to diagnose or change the hot path for a specific model or runtime path. Use doppler-bench when the goal is reproducible benchmark evidence, compare-engine reporting, or vendor-registry coverage rather than tuning.
Read these before non-trivial performance, profiling, or methodology changes:
- `docs/style/general-style-guide.md`
- `docs/style/javascript-style-guide.md`
- `docs/style/config-style-guide.md`
- `docs/style/harness-style-guide.md`
- `docs/style/benchmark-style-guide.md`

Also read:

- `docs/style/wgsl-style-guide.md` for shader changes
- `docs/style/command-interface-design-guide.md` when changing bench or debug command behavior

When performance work requires additive implementation changes, also open:

- `docs/developer-guides/README.md`

Common routes:

- `docs/developer-guides/06-kernel-path-config.md`
- `docs/developer-guides/11-wgsl-kernel.md`
- `docs/developer-guides/13-attention-variant.md`
- `docs/developer-guides/15-kvcache-layout.md`
- `docs/developer-guides/12-command-surface.md`

runtime profiles + workload contracts).

```bash
# Start from one clean benchmark baseline
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/throughput","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
```
Read from output:
- model load
- decode tok/s
- prompt tok/s (TTFT)
- TTFT

If you need compare-engine or publication-grade evidence at this stage, switch to `doppler-bench` instead of expanding the squeeze loop.
```bash
# Single-token-style control
npm run bench -- \
  --config '{"request":{"modelId":"MODEL_ID","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' \
  --runtime-config '{"inference":{"generation":{"maxTokens":128},"sampling":{"temperature":0,"topK":1,"topP":1,"repetitionPenalty":1,"greedyThreshold":0},"session":{"decodeLoop":{"batchSize":1,"stopCheckMode":"batch","readbackInterval":1,"ringTokens":1,"ringStop":1,"ringStaging":1,"disableCommandBatching":false}}}}' \
  --json

# Moderate batched candidate
npm run bench -- \
  --config '{"request":{"modelId":"MODEL_ID","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' \
  --runtime-config '{"inference":{"generation":{"maxTokens":128},"sampling":{"temperature":0,"topK":1,"topP":1,"repetitionPenalty":1,"greedyThreshold":0},"session":{"decodeLoop":{"batchSize":4,"stopCheckMode":"batch","readbackInterval":4,"ringTokens":1,"ringStop":1,"ringStaging":1,"disableCommandBatching":false}}}}' \
  --json

# Higher-throughput candidate
npm run bench -- \
  --config '{"request":{"modelId":"MODEL_ID","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' \
  --runtime-config '{"inference":{"generation":{"maxTokens":128},"sampling":{"temperature":0,"topK":1,"topP":1,"repetitionPenalty":1,"greedyThreshold":0},"session":{"decodeLoop":{"batchSize":8,"stopCheckMode":"batch","readbackInterval":8,"ringTokens":1,"ringStop":1,"ringStaging":1,"disableCommandBatching":false}}}}' \
  --json
```
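The three candidates above share one runtime config. They differ only in `decodeLoop.batchSize` and `readbackInterval`, and those two values always move together. If you sweep more sizes, a small generator saves hand-editing the JSON. The helper below is a hypothetical sketch and is not part of the Doppler repo. It reproduces the exact config strings used above and keeps every other field fixed.

```javascript
// Hypothetical helper (not part of Doppler): builds the --runtime-config JSON
// used by the decode-loop candidates above, varying only the batch size.
function decodeLoopRuntimeConfig(batchSize, maxTokens = 128) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const config = {
    inference: {
      generation: { maxTokens },
      // Greedy sampling, identical across candidates, so runs differ only in loop shape.
      sampling: { temperature: 0, topK: 1, topP: 1, repetitionPenalty: 1, greedyThreshold: 0 },
      session: {
        decodeLoop: {
          batchSize,
          stopCheckMode: "batch",
          // The candidates above read back once per batch.
          readbackInterval: batchSize,
          ringTokens: 1,
          ringStop: 1,
          ringStaging: 1,
          disableCommandBatching: false,
        },
      },
    },
  };
  // JSON.stringify keeps insertion order, so the output matches the commands above.
  return JSON.stringify(config);
}

// Same --config the candidates use (warm cache, browser surface, saved results).
const benchConfig = JSON.stringify({
  request: { modelId: "MODEL_ID", cacheMode: "warm" },
  run: { surface: "browser", bench: { save: true } },
});

// Print one bench command per candidate batch size.
for (const batchSize of [1, 4, 8]) {
  console.log(
    `npm run bench -- --config '${benchConfig}' --runtime-config '${decodeLoopRuntimeConfig(batchSize)}' --json`
  );
}
```

Because only one knob changes per run, any difference in decode tok/s can be attributed to batch size and readback cadence rather than to sampling or generation length.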
```bash
# Trace-heavy debug run
npm run debug -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/verbose-trace"},"run":{"surface":"auto"}}' --json

# Logit-focused browser investigation
npm run debug -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"diagnostics/debug-logits"},"run":{"surface":"browser","browser":{"channel":"chrome","console":true}}}' --json
```
Rules:
- `bench` is calibrate-only; do not override its intent in runtime config.
- `debug` is the investigate surface for traces, layer probes, and diagnostics.

Common patterns:
Priority code hotspots:
- `src/inference/pipelines/text/logits/index.js`
- `src/inference/pipelines/text/generator.js`
- `src/inference/browser-harness.js`
- `src/memory/buffer-pool.js`

```bash
# Re-run the clean benchmark baseline after each material change
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/throughput","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
```
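To decide whether a change helped, compare the re-run against the saved baseline one metric at a time. Load time and TTFT are lower-is-better, while decode and prompt tok/s are higher-is-better. The sketch below makes some assumptions. It expects you to have copied those four numbers out of each saved result into plain objects. Its field names (`modelLoadMs`, `decodeTokPerSec`, `promptTokPerSec`, `ttftMs`) are hypothetical and do not come from Doppler's result-file schema.

```javascript
// Direction per metric. Field names are hypothetical placeholders; map them
// by hand from whatever the saved bench result files actually contain.
const METRICS = {
  modelLoadMs: { higherIsBetter: false },
  decodeTokPerSec: { higherIsBetter: true },
  promptTokPerSec: { higherIsBetter: true },
  ttftMs: { higherIsBetter: false },
};

// Returns one row per metric with the relative change from baseline to after.
function compareRuns(baseline, after) {
  return Object.entries(METRICS).map(([name, { higherIsBetter }]) => {
    const before = baseline[name];
    const now = after[name];
    // A percent change needs two numbers and a non-zero baseline.
    if (typeof before !== "number" || typeof now !== "number" || before === 0) {
      return { name, before, now, changePct: null, verdict: "unknown" };
    }
    const changePct = ((now - before) / before) * 100;
    let verdict = "same";
    if (now !== before) {
      const improved = higherIsBetter ? now > before : now < before;
      verdict = improved ? "better" : "worse";
    }
    return { name, before, now, changePct, verdict };
  });
}

// Placeholder values for illustration only, not real measurements.
const rows = compareRuns(
  { modelLoadMs: 2000, decodeTokPerSec: 40, promptTokPerSec: 400, ttftMs: 250 },
  { modelLoadMs: 2000, decodeTokPerSec: 44, promptTokPerSec: 380, ttftMs: 260 },
);
for (const r of rows) {
  const pct = r.changePct === null ? "n/a" : `${r.changePct.toFixed(1)}%`;
  console.log(`${r.name}: ${r.before} -> ${r.now} (${pct}, ${r.verdict})`);
}
```

A single run on each side can fall within run-to-run noise, so treat small percentage changes cautiously before attributing them to the patch.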
When the final evidence will be published, compared across engines, or used to update vendor-facing numbers, hand off to doppler-bench instead of extending the squeeze loop.
For each perf iteration, capture:
- baseline command + result file
- change (runtime-only or code patch)
- after command + result file
- model load, decode tok/s, prompt tok/s (TTFT), TTFT

Reference docs:

- `docs/agents/benchmark-protocol.md`: vendor benchmark registry and update checklist
- `docs/agents/hardware-notes.md`: GPU memory assumptions

Related skills:

- `doppler-bench` for publication-grade benchmark execution, compare-engine evidence, and vendor normalization.
- `doppler-debug` for correctness checks while tuning performance.
- `doppler-kernel-reviewer` for WGSL/JS kernel quality review on perf patches.