Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

arm-examples/llm-benchmark-workflow

Name: llm-benchmark-workflow
Author: arm-examples

skills/llm-benchmark-workflow/SKILL.md

npx skillsauth add arm-examples/llm-runner llm-benchmark-workflow


Benchmark workflow

Benchmarks are gated behind -DBUILD_BENCHMARK=ON and produce executables under build/bin/.

Windows note: if python3 isn’t available, use python (or py -3) for the scripts below.
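The interpreter fallback in the Windows note can be automated. The following is a minimal sketch, not part of the repository: `pick_python` and `smoke_command` are hypothetical helper names, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, List, Optional

# Candidates in the order the note suggests: python3, then python, then py -3.
CANDIDATES: List[List[str]] = [["python3"], ["python"], ["py", "-3"]]


def pick_python(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the first interpreter command whose executable is on PATH."""
    for command in CANDIDATES:
        if which(command[0]) is not None:
            return list(command)
    raise RuntimeError("no Python interpreter found on PATH")


def smoke_command(build_dir: str = "build",
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Build the argv for the smoke script referenced in this workflow."""
    script = "skills/llm-benchmark-workflow/scripts/bench_smoke.py"
    return pick_python(which) + [script, build_dir]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting differences between platforms.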

Offline/build caveat

During configure, CMake may use FetchContent to download dependencies from the network. If your network is blocked, configure can fail before tests run. To enable network access when the process runs in a sandbox, see skills/README.md. Otherwise, run the build on your local machine (with network access) and share the resulting build path with the tool (for example: build dir: /path/to/build), or point CMake at pre-downloaded dependencies in your environment.
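One way to point CMake at pre-downloaded dependencies is the standard `FETCHCONTENT_SOURCE_DIR_<UPPERCASE_NAME>` override that FetchContent honors. The sketch below only assembles the configure arguments. The dependency name `example_dep` and its path are hypothetical, since the FetchContent names this repository actually declares are not listed here, and the override only helps for dependencies that are fetched through FetchContent.

```python
from typing import Dict, List


def offline_configure_args(source_dirs: Dict[str, str],
                           build_dir: str = "build") -> List[str]:
    """Build a cmake configure argv that reuses local dependency checkouts.

    source_dirs maps a FetchContent dependency name (assumed, check the
    project's CMake files for the real names) to a local source directory.
    """
    args = ["cmake", "--preset=native", "-B", build_dir, "-DBUILD_BENCHMARK=ON"]
    for name, path in sorted(source_dirs.items()):
        # FetchContent uses this variable instead of downloading the dependency.
        args.append(f"-DFETCHCONTENT_SOURCE_DIR_{name.upper()}={path}")
    return args


if __name__ == "__main__":
    # Hypothetical dependency name and path, for illustration only.
    print(" ".join(offline_configure_args({"example_dep": "/opt/deps/example_dep"})))
```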

Workflow

1) Configure + build benchmark binaries

cmake --preset=native -B build -DBUILD_BENCHMARK=ON
cmake --build ./build --parallel

Select a backend as needed:

cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=llama.cpp
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=onnxruntime-genai
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mediapipe
cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mnn

If you want to avoid Java/JNI setup during benchmark iteration:

cmake --preset=native -B build -DBUILD_BENCHMARK=ON -DBUILD_JNI_LIB=OFF
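When iterating over several backends, the configure and build commands above can be generated rather than retyped. This is a rough sketch under the assumption that the backend and JNI options may be combined in a single configure call (the commands above show them separately). `configure_args` and `build_args` are hypothetical helper names.

```python
from typing import List, Optional

# Backend names exactly as they appear in the configure commands above.
BACKENDS = ("llama.cpp", "onnxruntime-genai", "mediapipe", "mnn")


def configure_args(backend: Optional[str] = None, jni: bool = True,
                   build_dir: str = "build") -> List[str]:
    """Return the cmake configure argv for a benchmark build."""
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    args = ["cmake", "--preset=native", "-B", build_dir, "-DBUILD_BENCHMARK=ON"]
    if backend is not None:
        args.append(f"-DLLM_FRAMEWORK={backend}")
    if not jni:
        # Skips Java/JNI setup, as in the last configure command above.
        args.append("-DBUILD_JNI_LIB=OFF")
    return args


def build_args(build_dir: str = "build") -> List[str]:
    """Return the cmake build argv matching the build command above."""
    return ["cmake", "--build", f"./{build_dir}", "--parallel"]
```

For example, `configure_args("mnn", jni=False)` followed by `build_args()` yields two argument lists that can each be handed to `subprocess.run(..., check=True)`.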

2) Run `llm-bench-cli`

llm-bench-cli is backend-agnostic; it infers backend behavior from the model/config file you pass. Passing a config JSON is recommended because it encodes the backend and paths.

./build/bin/llm-bench-cli -m <model_or_config_path> -i 128 -o 64 -c 2048 -t 4 -n 3 -w 1

For specifics and runtime pitfalls, load skills/llm-benchmark-workflow/references/bench-notes.md.
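To compare runs, the invocation above can be wrapped so that one flag varies while the others stay fixed. The sketch below copies the flag values from the example command and passes them through without interpreting them, because their meanings are not documented in this section. `bench_args`, `sweep`, and the `config.json` path are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Flag values copied verbatim from the example invocation above.
DEFAULT_FLAGS: Dict[str, int] = {
    "-i": 128, "-o": 64, "-c": 2048, "-t": 4, "-n": 3, "-w": 1,
}


def bench_args(model_or_config: str,
               overrides: Optional[Dict[str, int]] = None,
               binary: str = "./build/bin/llm-bench-cli") -> List[str]:
    """Return the llm-bench-cli argv, with optional per-flag overrides."""
    flags = dict(DEFAULT_FLAGS)
    flags.update(overrides or {})
    args = [binary, "-m", model_or_config]
    for flag, value in flags.items():
        args += [flag, str(value)]
    return args


def sweep(model_or_config: str, flag: str,
          values: Iterable[int]) -> List[List[str]]:
    """Return one argv per value of a single flag, other flags unchanged."""
    return [bench_args(model_or_config, {flag: value}) for value in values]
```

Calling `sweep("config.json", "-t", [1, 2, 4])` produces three argument lists that differ only in the value after `-t`. Consult bench-notes.md before assuming what each flag controls.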

3) Quick smoke checks

Confirm the binary exists and prints help:
- python3 skills/llm-benchmark-workflow/scripts/bench_smoke.py build

Confirm CTest still passes after benchmark changes:
- ctest --test-dir ./build --output-on-failure
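For readers without access to the script, the first check can be approximated as follows. This is an illustrative sketch, not the contents of bench_smoke.py: the `--help` flag and the `.exe` suffix on Windows are assumptions, and `bench_binary` and `smoke_check` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import Tuple


def bench_binary(build_dir: str) -> Path:
    """Path of llm-bench-cli under <build_dir>/bin (.exe assumed on Windows)."""
    name = "llm-bench-cli.exe" if sys.platform == "win32" else "llm-bench-cli"
    return Path(build_dir) / "bin" / name


def smoke_check(build_dir: str, help_flag: str = "--help") -> Tuple[bool, str]:
    """Return (ok, message): the binary must exist and print something."""
    binary = bench_binary(build_dir)
    if not binary.is_file():
        return False, f"missing binary: {binary}"
    # help_flag is an assumption; substitute whatever the CLI accepts.
    result = subprocess.run([str(binary), help_flag],
                            capture_output=True, text=True, timeout=30)
    output = (result.stdout + result.stderr).strip()
    if not output:
        return False, "binary ran but printed nothing"
    return True, output.splitlines()[0]
```

The function reports a missing binary before attempting to run anything, which separates build failures from runtime failures such as misplaced shared libraries.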

arm-examples/llm-benchmark-workflow

skills/llm-benchmark-workflow/SKILL.md

Build and run the benchmarking tools in this repository (including llm-bench-cli) across supported backends, and triage benchmark build/runtime issues (shared libs placement, model paths, threads/tokens, JNI off). Use when changing benchmark code, adding metrics, comparing performance, or verifying benchmark binaries for a backend.

4 stars

tools

Updated Apr 21, 2026

Install globally

npx skillsauth add arm-examples/llm-runner llm-benchmark-workflow

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: Container and dependency vulnerability scanner. Status: Clean, 95%
- Semgrep: Static code analysis for vulnerabilities. Status: Clean, 95%
- mcp-scan (Snyk): Model Context Protocol security validation. Status: Clean, 95%
- Snyk (dep): Open source security scanning. Status: Skipped, 50%
- Socket.dev: Supply chain security analysis. Status: Skipped, 50%
- VirusTotal: Multi-engine malware detection. Status: Skipped, 50%
- CrowdStrike: Advanced threat intelligence. Status: Skipped, 50%
- OSV-Scanner: Open Source Vulnerability database check. Status: Skipped, 50%
- OWASP Dep-Check. Status: Skipped, 50%

Last scanned: Apr 21, 2026, 11:57 AM (49.5s, 3 files scanned)


Related Skills

arm-examples/llm-update-downloads

tools

Verified, Trusted, Community

Update scripts/py/requirements.json entries (URLs + sha256sum) for models/tools, validate hash changes, and keep downloads deterministic without committing artifacts. Use when adding or refreshing model/tool downloads.

4 | SKILL.md | Updated Apr 4, 2026

arm-examples/llm-session-start

tools

Verified, Trusted, Community

Run fast “session start / doctor” checks for this repository (toolchain + wiring sanity, framework version report, optional upstream update check), optionally generate a debug bundle, and when needed bump pinned backend framework versions with build+ctest verification. Use at session start or when upgrading llama.cpp/onnxruntime-genai/mediapipe/mnn pins.

4 | SKILL.md | Updated Apr 4, 2026

arm-examples/llm-jni-smoke

tools

Verified, Trusted, Community

Run a fast JNI-focused build/test smoke check (JNI on, minimal test run), and isolate JNI toolchain issues. Use when changing JNI/Java code or validating JNI setup.

4 | SKILL.md | Updated Apr 4, 2026

arm-examples/llm-debug-test-failures

development

Verified, Trusted, Community

Debug failing LLM integration tests caused by model output drift, incorrect context/runtime parameters (contextSize, batchSize, threads), prompt/template mismatches, or backend/framework regressions. Use when tests fail and you need to see the model response, reproduce a single failing CTest, or trace issues into src/cpp/frameworks (llama.cpp, onnxruntime-genai, mediapipe, mnn).

4 | SKILL.md | Updated Apr 4, 2026


Install manually


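# Clone the repo
git clone https://github.com/arm-examples/llm-runner.git

# Make sure the global Claude Code skills folder exists, so cp copies into it
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r llm-runner/skills/llm-benchmark-workflow ~/.claude/skills/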

See the official Claude Code Skills documentation for the skills path.

Repository

arm-examples/llm-runner

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
