.agents/skills/doppler-bench/SKILL.md
Run Doppler and vendor benchmark workflows, capture reproducible JSON artifacts, and compare bench/profile coverage using the vendor registry. (project)
Use this skill for repeatable performance measurement, compare-engine evidence, and vendor-registry coverage. Use doppler-perf when the goal is diagnosis or tuning rather than publication-grade benchmark evidence.
Read these before non-trivial edits or benchmark-methodology changes:
- docs/style/general-style-guide.md
- docs/style/javascript-style-guide.md
- docs/style/config-style-guide.md
- docs/style/command-interface-design-guide.md
- docs/style/harness-style-guide.md
- docs/style/benchmark-style-guide.md

When benchmark work becomes extension work, also open:
- docs/developer-guides/README.md

Common routes:
- docs/developer-guides/06-kernel-path-config.md
- docs/developer-guides/11-wgsl-kernel.md
- docs/developer-guides/13-attention-variant.md
- docs/developer-guides/15-kvcache-layout.md
- docs/developer-guides/12-command-surface.md

- runtimeConfig, runtime profiles, rule assets).
- sampling, seed, budget, run policy) must come from shared contract JSON and be identical across engines.

# Fair compute comparison (default parity decode cadence)
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile parity --save --json
# Doppler throughput-tuned decode cadence
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile throughput --save --json
# Warm-start only (includes model load)
node tools/compare-engines.js --mode warm --warmup 1 --runs 3 --decode-profile parity --save --json
Notes:
- --decode-profile parity maps Doppler to batchSize=1, readbackInterval=1 for closer TJS cadence parity.
- --decode-profile throughput maps Doppler to batchSize=4, readbackInterval=4.
- prompt_tokens / ttft_ms in compare output.

# Warm-cache benchmark (recommended baseline)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/throughput","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
# Cold-cache benchmark (cache disabled per run)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/throughput","cacheMode":"cold"},"run":{"surface":"browser","bench":{"save":true}}}' --json
# Compare against last saved run
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimeProfile":"profiles/throughput","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true,"compare":"last"}}}' --json
Notes:
- bench defaults to --surface auto; set run.surface="browser" when you explicitly want the browser relay.
- benchmarks/vendors/results/; result.reportInfo.path is the authoritative saved location.
- bench is calibrate-only. Do not put runtime.shared.tooling.intent="investigate" in bench examples.

Use direct runtime config for explicit decode-cadence evidence:
# Smaller cadence / lower latency
npm run bench -- \
--config '{"request":{"modelId":"MODEL_ID","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' \
--runtime-config '{"inference":{"generation":{"maxTokens":128},"sampling":{"temperature":0,"topK":1,"topP":1,"repetitionPenalty":1,"greedyThreshold":0},"session":{"decodeLoop":{"batchSize":4,"stopCheckMode":"batch","readbackInterval":4,"ringTokens":1,"ringStop":1,"ringStaging":1,"disableCommandBatching":false}}}}' \
--json
# Larger cadence / higher throughput candidate
npm run bench -- \
--config '{"request":{"modelId":"MODEL_ID","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' \
--runtime-config '{"inference":{"generation":{"maxTokens":128},"sampling":{"temperature":0,"topK":1,"topP":1,"repetitionPenalty":1,"greedyThreshold":0},"session":{"decodeLoop":{"batchSize":8,"stopCheckMode":"batch","readbackInterval":8,"ringTokens":1,"ringStop":1,"ringStaging":1,"disableCommandBatching":false}}}}' \
--json
If the numbers are bad or unstable, hand off to doppler-perf for diagnosis before publishing claims.
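
The two cadence commands above differ only in batchSize and readbackInterval, so the long --runtime-config JSON can be generated rather than hand-edited. The following is a minimal sketch, not part of Doppler: buildRuntimeConfig and CADENCE_BY_PROFILE are hypothetical names, the field names are copied from the commands above, and the profile values come from the --decode-profile notes.

```javascript
// Hypothetical helper, not part of Doppler: builds the --runtime-config payload
// used in the cadence commands above so only the cadence values vary per run.

// Cadence values taken from the --decode-profile notes in this document.
const CADENCE_BY_PROFILE = {
  parity: { batchSize: 1, readbackInterval: 1 },
  throughput: { batchSize: 4, readbackInterval: 4 },
};

function buildRuntimeConfig({ batchSize, readbackInterval, maxTokens = 128 }) {
  for (const [name, value] of Object.entries({ batchSize, readbackInterval, maxTokens })) {
    if (!Number.isInteger(value) || value < 1) {
      throw new RangeError(`${name} must be a positive integer, got ${value}`);
    }
  }
  return {
    inference: {
      generation: { maxTokens },
      // Greedy sampling, copied verbatim from the commands above.
      sampling: { temperature: 0, topK: 1, topP: 1, repetitionPenalty: 1, greedyThreshold: 0 },
      session: {
        decodeLoop: {
          batchSize,
          stopCheckMode: 'batch',
          readbackInterval,
          ringTokens: 1,
          ringStop: 1,
          ringStaging: 1,
          disableCommandBatching: false,
        },
      },
    },
  };
}

// Print a shell-ready argument for the larger-cadence run.
const config = buildRuntimeConfig({ batchSize: 8, readbackInterval: 8 });
console.log(`--runtime-config '${JSON.stringify(config)}'`);
```

Generating the payload keeps sampling and generation settings identical between runs, so the decode cadence is the only variable being measured.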
# Raw Transformers.js benchmark with ORT op profiling summary
node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json
# Normalize result into vendor registry output
node tools/vendor-bench.js run --target transformersjs --workload g3-p064-d064-t0-k1 -- node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json
# Validate registry + harness + capability matrix
node tools/vendor-bench.js validate
# Show capability coverage for all targets
node tools/vendor-bench.js capabilities
# Show exact Doppler -> Transformers.js feature gaps
node tools/vendor-bench.js gap --base doppler --target transformersjs
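
The notes earlier name result.reportInfo.path as the authoritative saved location. A script that consumes --json output might read that field instead of guessing file names. This is a sketch under assumptions: extractReportPath is a hypothetical helper, and the exact envelope of the JSON output is not specified here, so the code accepts either a top-level result key or the result object itself.

```javascript
// Hypothetical helper, not part of Doppler: pulls the saved-artifact path
// out of the JSON text printed by a run that used --json.
function extractReportPath(jsonText) {
  const parsed = JSON.parse(jsonText);
  // Assumption: the payload is either { result: { reportInfo } } or the result itself.
  const result = parsed && parsed.result ? parsed.result : parsed;
  const reportPath = result && result.reportInfo ? result.reportInfo.path : undefined;
  if (typeof reportPath !== 'string' || reportPath.length === 0) {
    throw new Error('No reportInfo.path in output; the run may not have been saved.');
  }
  return reportPath;
}

// Illustrative payload only; the file name is made up.
const sample = JSON.stringify({
  result: { reportInfo: { path: 'benchmarks/vendors/results/example.json' } },
});
console.log(extractReportPath(sample));
```

Failing loudly when the path is missing prevents a later step from silently comparing against a stale artifact.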
- decode_tokens_per_sec
- prefill_tokens_per_sec_ttft (preferred normalized prefill metric)
- prefill_tokens_per_sec (legacy alias)
- ttft_ms
- decode_ms_per_token_p50/p95
- model_load_ms
- ort_profiled_total_ms (Transformers.js harness)
- result.reportInfo.path (artifact anchor)

- src/cli/doppler-cli.js
- benchmarks/runners/transformersjs-bench.js
- benchmarks/runners/transformersjs-runner.html
- benchmarks/vendors/registry.json
- benchmarks/vendors/capabilities.json
- benchmarks/vendors/results/
- src/config/runtime/profiles/throughput.json
- docs/developer-guides/README.md
- docs/agents/benchmark-protocol.md: vendor benchmark registry and update checklist
- docs/agents/hardware-notes.md: GPU memory assumptions

- doppler-perf for decode/prefill diagnosis and tuning after a benchmark exposes a problem
- doppler-debug for correctness regressions discovered during bench runs
- doppler-convert when conversion format or quantization differences affect perf
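
When two saved runs are compared, the metric names listed above fall into two groups: throughput values where higher is better and latency values where lower is better. The sketch below shows one way to compute signed deltas with that distinction. All function names are hypothetical, the flat metrics object is an assumed shape, and the prefill formula assumes the normalized metric is prompt tokens divided by TTFT, scaled to seconds.

```javascript
// Hypothetical helpers, not part of Doppler.

// Assumption: normalized prefill throughput = prompt tokens / TTFT, in tokens per second.
function prefillTokensPerSecFromTtft(promptTokens, ttftMs) {
  if (!(ttftMs > 0)) return null; // missing or zero TTFT cannot be normalized
  return (promptTokens * 1000) / ttftMs;
}

// Signed percent change of candidate relative to baseline; null when inputs are unusable.
function percentDelta(baseline, candidate) {
  if (!(baseline > 0) || !Number.isFinite(candidate)) return null;
  return ((candidate - baseline) * 100) / baseline;
}

// Throughput metrics improve upward; every other metric here is treated as latency.
const HIGHER_IS_BETTER = new Set(['decode_tokens_per_sec', 'prefill_tokens_per_sec_ttft']);

function compareMetrics(baseline, candidate, names) {
  return names.map((name) => {
    const deltaPercent = percentDelta(baseline[name], candidate[name]);
    let improved = null;
    if (deltaPercent !== null) {
      improved = HIGHER_IS_BETTER.has(name) ? deltaPercent > 0 : deltaPercent < 0;
    }
    return { name, deltaPercent, improved };
  });
}

// Made-up numbers for illustration; they are not measurements.
const baselineRun = { decode_tokens_per_sec: 100, ttft_ms: 200 };
const candidateRun = { decode_tokens_per_sec: 110, ttft_ms: 180 };
console.log(compareMetrics(baselineRun, candidateRun, ['decode_tokens_per_sec', 'ttft_ms']));
```

Returning null for unusable inputs keeps a missing metric from being reported as a regression or an improvement.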