skills/browser-onnx/SKILL.md
Implements high-performance local machine learning inference in the browser using ONNX Runtime Web. Use this skill when the user needs privacy-first, low-latency, or offline AI capabilities (e.g., image classification, object detection, or NLP) without server-side processing.
npx skillsauth add thongnt0208/browser-onnx-skills browser-onnx
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides a workflow for executing ONNX models locally in the browser using ONNX Runtime Web (ORT-Web). Local inference keeps user data on the device, reduces server costs, and scales with the user base, since each user supplies their own compute.
Install the required library via npm:
npm install onnxruntime-web
Note: For experimental features like WebGPU or WebNN, use the nightly version onnxruntime-web@dev.
Set global ort.env flags before creating a session to optimize the runtime environment.
- Tune ort.env.wasm.numThreads (default is half of hardware concurrency) and use a Proxy Worker (ort.env.wasm.proxy = true) to keep the UI responsive.
- Set ort.env.wasm.wasmPaths to point to local assets or a CDN.
- Set ort.env.webgpu.profiling = { mode: 'default' } for performance diagnosis during development.

Initialize the session by choosing the appropriate Execution Provider (EP):
import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'], // Prioritize GPU, fall back to CPU
  graphOptimizationLevel: 'all' // Enable all graph-level optimizations
});
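The environment flags and provider choice described above can be sketched as two small helpers. This is a minimal sketch, not the skill's own code: `configureOrtEnv`, `pickExecutionProviders`, and the `caps` object are hypothetical names, and `env` is assumed to be `ort.env`. The single-thread fallback assumes that multi-threaded WASM is unavailable on pages that are not cross-origin isolated.

```javascript
// Sketch only: `env` is expected to be `ort.env`; `caps` is a hypothetical
// object describing the page's capabilities (not part of ORT-Web).
function configureOrtEnv(env, caps) {
  const cores = caps.hardwareConcurrency || 1;
  // Multi-threaded WASM relies on SharedArrayBuffer, which browsers expose
  // only on cross-origin isolated pages, so fall back to a single thread.
  env.wasm.numThreads = caps.crossOriginIsolated ? Math.ceil(cores / 2) : 1;
  // Run inference in a proxy worker so the main thread stays responsive.
  env.wasm.proxy = true;
  // Optional: where to fetch the .wasm assets from (local path or CDN URL).
  if (caps.wasmPaths) env.wasm.wasmPaths = caps.wasmPaths;
  return env;
}

// Providers are listed in priority order: GPU first, CPU (WASM) as fallback.
function pickExecutionProviders(caps) {
  return caps.hasWebGpu ? ['webgpu', 'wasm'] : ['wasm'];
}
```

In a page, `caps` could be filled from `navigator.hardwareConcurrency`, `self.crossOriginIsolated`, and `'gpu' in navigator`, and the helper must run before `ort.InferenceSession.create` is called.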
Input data must match the model's training format (e.g., NCHW for vision models).
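As an illustration of the layout requirement, the sketch below converts interleaved RGBA pixels (the layout of `ImageData.data` from a canvas) into a planar NCHW `Float32Array`. The function name `rgbaToNchw` is hypothetical, and the scaling to [0, 1] without mean/std normalization is an assumption; a real model may expect different preprocessing.

```javascript
// Convert interleaved RGBA bytes (R,G,B,A,R,G,B,A,...) into planar NCHW
// float32 data (all R values, then all G, then all B). Alpha is dropped.
// Assumption: the model expects RGB scaled to [0, 1] with no further
// normalization; adjust to match the model's training pipeline.
function rgbaToNchw(rgba, width, height) {
  const plane = width * height;
  const data = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    data[i] = rgba[i * 4] / 255;                 // R plane
    data[plane + i] = rgba[i * 4 + 1] / 255;     // G plane
    data[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane
  }
  // dims follow N, C, H, W with a batch size of 1.
  return { data, dims: [1, 3, height, width] };
}
```

The returned `data` and `dims` can then be passed to the `ort.Tensor` constructor as the second and third arguments.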
- Use new ort.Tensor('float32', float32Data, dims) to prepare the input feeds, where dims is the tensor shape.
- Set enableGraphCapture: true to reduce CPU overhead by replaying kernel executions.
- Keep data on the GPU by using ort.Tensor.fromGpuBuffer() and setting preferredOutputLocation: 'gpu-buffer' to avoid expensive memory copies.
- Browsers limit an ArrayBuffer to ~2GB. Models exceeding this must be exported with external data.

const session = await ort.InferenceSession.create(modelUrl, {
  externalData: [{ path: './model.data', data: dataUrl }]
});
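The GPU and large-model options above can be layered onto an existing session-options object. The following is a hedged sketch: `extendSessionOptions` and the `cfg` fields are hypothetical, and gating graph capture on static input shapes is an assumption about when that option is likely to help, not something the text above states.

```javascript
// Sketch only: returns a copy of `base` with the optional settings applied.
// `cfg` is a hypothetical descriptor, not an ORT-Web type.
function extendSessionOptions(base, cfg) {
  const options = { ...base };
  // Assumption: graph capture is enabled only for WebGPU models whose
  // input shapes do not change between runs.
  if (cfg.webgpu && cfg.staticShapes) options.enableGraphCapture = true;
  // Keep outputs in GPU buffers to avoid copying them back to the CPU.
  if (cfg.webgpu && cfg.keepOutputsOnGpu) options.preferredOutputLocation = 'gpu-buffer';
  // Large models: point ORT-Web at the separately exported weights file.
  if (cfg.externalData) {
    options.externalData = [{ path: cfg.externalData.path, data: cfg.externalData.url }];
  }
  return options;
}
```

The result would be passed as the second argument to `ort.InferenceSession.create`. Returning a copy keeps the base options reusable for a CPU-only fallback session.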
- Call tensor.dispose() for GPU tensors to prevent memory leaks.

Offload heavy translation tasks to a separate Web Worker using a singleton pattern to ensure the model (e.g., NLLB-200) loads only once.
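One way to get load-once behavior is to cache the loading promise rather than the loaded model, so that concurrent requests share a single load. This is a minimal sketch under that assumption; `createModelSingleton`, `getModel`, and `loadModel` are hypothetical names, and `loadModel` stands in for whatever actually creates the session inside the worker.

```javascript
// Wrap an async loader so it runs at most once per successful load.
// Caching the promise (not the result) means calls that arrive while the
// model is still loading share the same in-flight load.
function createModelSingleton(loadModel) {
  let pending = null;
  return function getModel() {
    if (pending === null) {
      pending = Promise.resolve(loadModel()).catch((err) => {
        pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical use inside a worker script:
//   const getModel = createModelSingleton(() => ort.InferenceSession.create(url));
//   self.onmessage = async (event) => {
//     const session = await getModel();
//     /* run inference with event.data, then postMessage the result */
//   };
```

Resetting the cache on failure is a design choice: without it, one failed download would leave every later request rejected until the worker is restarted.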
For object detection, implement Non-Max Suppression (NMS) to filter overlapping boxes. If the browser lacks support for specific NMS ops, run a separate NMS ONNX model to do the filtering locally.
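For reference, greedy NMS can also be written directly in JavaScript. The sketch below is an illustration rather than the skill's prescribed approach: it assumes boxes in `[x1, y1, x2, y2]` corner format and a single class, and the names `iou` and `nonMaxSuppression` are hypothetical.

```javascript
// Intersection-over-union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const w = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const h = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = w * h;
  const union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only
// if it does not overlap an already kept box by more than the threshold.
// Returns the indices of the kept boxes, in descending score order.
function nonMaxSuppression(boxes, scores, iouThreshold = 0.5) {
  const order = scores.map((_, i) => i).sort((i, j) => scores[j] - scores[i]);
  const keep = [];
  for (const i of order) {
    if (keep.every((k) => iou(boxes[i], boxes[k]) <= iouThreshold)) keep.push(i);
  }
  return keep;
}
```

This version is quadratic in the number of boxes, so filtering out low-confidence detections before calling it keeps the cost manageable.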