skills/llm-benchmark-workflow/SKILL.md
Build and run the benchmarking tools in this repository (including llm-bench-cli) across supported backends, and triage benchmark build/runtime issues (shared libs placement, model paths, threads/tokens, JNI off). Use when changing benchmark code, adding metrics, comparing performance, or verifying benchmark binaries for a backend.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add arm-examples/llm-runner llm-benchmark-workflow
Benchmarks are gated behind -DBUILD_BENCHMARK=ON and produce executables under build/bin/.
Windows note: if python3 isn’t available, use python (or py -3) for the scripts below.
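The interpreter fallback in the Windows note can be automated. The following is a minimal sketch, not part of the repository: `pick_python` and `CANDIDATES` are hypothetical names, and the lookup relies only on the standard library's `shutil.which`.

```python
import shutil
from typing import Callable, List, Optional

# Launchers in the order the note suggests: python3, then python, then py -3.
CANDIDATES: List[List[str]] = [["python3"], ["python"], ["py", "-3"]]


def pick_python(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the first launcher (as an argv prefix) whose executable is on PATH."""
    for candidate in CANDIDATES:
        if which(candidate[0]) is not None:
            return candidate
    raise RuntimeError("no Python launcher found on PATH")
```

The returned list is meant to be prepended to a script path, for example `pick_python() + ["skills/llm-benchmark-workflow/scripts/bench_smoke.py"]`.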
During configure, CMake may use FetchContent to download dependencies from the network. If your network is blocked, configure can fail before tests run. To enable network access when the process runs in a sandbox, see skills/README.md. Otherwise, run the build on your local machine (with network access) and share the resulting build path with the tool (for example: build dir: /path/to/build), or point CMake at pre-downloaded dependencies in your environment.
cmake --preset=native -B build -DBUILD_BENCHMARK=ON
cmake --build ./build --parallel
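When the network is blocked, one way to point CMake at pre-downloaded dependencies is the standard `FETCHCONTENT_SOURCE_DIR_<NAME>` cache variable, which tells FetchContent to use a local directory instead of downloading. The sketch below only assembles the extra arguments. It assumes the project fetches the dependency through FetchContent under the given name; `offline_configure_args`, `exampledep`, and its path are placeholders, not names taken from this repository.

```python
from typing import Dict, List


def offline_configure_args(source_dirs: Dict[str, str]) -> List[str]:
    """Build CMake cache arguments that redirect FetchContent to local checkouts.

    source_dirs maps a FetchContent dependency name to a local source directory.
    """
    args: List[str] = []
    for name, path in sorted(source_dirs.items()):
        # CMake expects the dependency name upper-cased in this variable.
        args.append(f"-DFETCHCONTENT_SOURCE_DIR_{name.upper()}={path}")
    return args


base = ["cmake", "--preset=native", "-B", "build", "-DBUILD_BENCHMARK=ON"]
# "exampledep" is a placeholder; substitute the names the project actually fetches.
print(" ".join(base + offline_configure_args({"exampledep": "/path/to/exampledep"})))
```

Any dependency that is not overridden this way is still fetched from the network, so every FetchContent dependency needs its own entry for a fully offline configure.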
Select a backend as needed:
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=llama.cpp
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=onnxruntime-genai
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mediapipe
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mnn
If you want to avoid Java/JNI setup during benchmark iteration:
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DBUILD_JNI_LIB=OFF
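The configure variants above differ only in the `-DLLM_FRAMEWORK` and `-DBUILD_JNI_LIB` options, so a small helper can assemble them and reject a mistyped backend name before CMake runs. This is an illustrative sketch: `configure_command` and `BACKENDS` are hypothetical names, and the backend list simply mirrors the four values shown above.

```python
from typing import List, Optional

# Backend names exactly as they appear in the configure commands above.
BACKENDS = ("llama.cpp", "onnxruntime-genai", "mediapipe", "mnn")


def configure_command(backend: Optional[str] = None, jni: bool = True,
                      build_dir: str = "build") -> List[str]:
    """Return the cmake configure argv for a benchmark build."""
    cmd = ["cmake", "--preset=native", "-B", build_dir, "-DBUILD_BENCHMARK=ON"]
    if backend is not None:
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
        cmd.append(f"-DLLM_FRAMEWORK={backend}")
    if not jni:
        # Skip Java/JNI setup during benchmark iteration.
        cmd.append("-DBUILD_JNI_LIB=OFF")
    return cmd
```

For example, `configure_command("mnn", jni=False)` yields the `mnn` configure line with `-DBUILD_JNI_LIB=OFF` appended; the list can be passed directly to `subprocess.run`.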
llm-bench-cli
llm-bench-cli is backend-agnostic; it infers backend behavior from the model/config file you pass.
Passing a config JSON is recommended because it encodes the backend and paths.
./build/bin/llm-bench-cli -m <model_or_config_path> -i 128 -o 64 -c 2048 -t 4 -n 3 -w 1
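When comparing performance, it is often useful to repeat the command above while varying one setting. The sketch below generates one command per thread count and leaves every other flag at the value shown above. It assumes that `-t` is the thread count, which is a reading of the example rather than something confirmed here; `sweep_commands` is a hypothetical helper and the config path in the usage note is a placeholder.

```python
from typing import Iterable, List


def sweep_commands(model: str, thread_counts: Iterable[int],
                   binary: str = "./build/bin/llm-bench-cli") -> List[List[str]]:
    """Return one llm-bench-cli argv per thread count, flags in the original order."""
    commands = []
    for threads in thread_counts:
        commands.append([
            binary, "-m", model,
            "-i", "128", "-o", "64", "-c", "2048",
            "-t", str(threads),  # assumed to be the thread count
            "-n", "3", "-w", "1",
        ])
    return commands
```

Each entry can be passed to `subprocess.run`, for example `for cmd in sweep_commands("/path/to/config.json", [1, 2, 4]): subprocess.run(cmd, check=True)`.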
For specifics and runtime pitfalls, load skills/llm-benchmark-workflow/references/bench-notes.md.
python3 skills/llm-benchmark-workflow/scripts/bench_smoke.py build
ctest --test-dir ./build --output-on-failure
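The smoke script and the CTest run can be chained so that the sequence stops at the first failing step. The following is a minimal sketch: `run_steps` and `SMOKE_STEPS` are hypothetical names, and treating `build` as the argument to `bench_smoke.py` is an assumption based on how the command appears above.

```python
import subprocess
from typing import Callable, List, Sequence

# The two verification commands; "build" as the script argument is an assumption.
SMOKE_STEPS: List[List[str]] = [
    ["python3", "skills/llm-benchmark-workflow/scripts/bench_smoke.py", "build"],
    ["ctest", "--test-dir", "./build", "--output-on-failure"],
]


def run_steps(steps: Sequence[List[str]],
              run: Callable[[List[str]], int] = subprocess.call) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    for step in steps:
        code = run(step)
        if code != 0:
            return code
    return 0
```

Calling `run_steps(SMOKE_STEPS)` from the repository root runs both checks and returns an exit code suitable for `sys.exit`.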