thongnt0208/browser-onnx

Name: browser-onnx
Author: thongnt0208

skills/browser-onnx/SKILL.md

npx skillsauth add thongnt0208/browser-onnx-skills browser-onnx

Browser-Based ONNX Inference

This skill provides a workflow for running ONNX models locally in the browser with ONNX Runtime Web (ORT-Web). Local inference improves data privacy, reduces server costs, and scales with the user base, since each user supplies their own compute.

1. Setup and Installation

Install the required library via npm:

npm install onnxruntime-web

Note: For experimental features like WebGPU or WebNN, use the nightly version onnxruntime-web@dev.

2. Global Environment Configuration

Set global ort.env flags before creating a session to optimize the runtime environment.

WebAssembly (CPU): Enable multi-threading by setting ort.env.wasm.numThreads (default is half of hardware concurrency) and use a Proxy Worker (ort.env.wasm.proxy = true) to keep the UI responsive.
WASM Paths: If binaries are not in the same directory as the JS bundle, manually override paths using ort.env.wasm.wasmPaths to point to local assets or a CDN.
WebGPU (GPU): Use ort.env.webgpu.profiling = { mode: 'default' } for performance diagnosis during development.
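
The flags above can be applied in one place before any session is created. The following is a minimal sketch, not the skill's own code: `configureOrtEnv` and its option names are hypothetical, and only the `ort.env.*` properties come from the text above. The module object is passed in as an argument so the helper can be exercised without a browser.

```javascript
// Hypothetical helper: applies the global flags described above to an
// ORT-Web module object (normally the result of importing 'onnxruntime-web').
function configureOrtEnv(ort, options = {}) {
  const { numThreads, wasmPaths, profile = false } = options;

  // Run the WASM backend behind a proxy worker so inference does not block the UI thread.
  ort.env.wasm.proxy = true;

  // Override the thread count only when the caller asks for a specific positive value.
  if (Number.isInteger(numThreads) && numThreads > 0) {
    ort.env.wasm.numThreads = numThreads;
  }

  // Point ORT-Web at the .wasm binaries when they are not served next to the JS bundle.
  if (wasmPaths) {
    ort.env.wasm.wasmPaths = wasmPaths;
  }

  // WebGPU profiling is a development aid; it is left untouched unless requested.
  if (profile) {
    ort.env.webgpu.profiling = { mode: 'default' };
  }

  return ort.env;
}
```

In an application this would be called once, for example `configureOrtEnv(ort, { numThreads: 4, wasmPaths: '/assets/ort/' })`, where the path is a placeholder. Note that multi-threaded WASM relies on SharedArrayBuffer, so the page generally needs to be cross-origin isolated for `numThreads` above 1 to take effect.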

3. Creating an Inference Session

Initialize the session by choosing the appropriate Execution Provider (EP):

import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'], // Prioritize GPU, fallback to CPU
  graphOptimizationLevel: 'all' // Enable all graph-level optimizations
});

4. Data Preprocessing

Input data must match the model's training format (e.g., NCHW for vision models).

Image-to-Tensor: Use libraries like JIMP or OpenCV.js to resize, normalize (divide by 255.0), and convert RGBA to RGB.
Tensor Creation: Use new ort.Tensor('float32', float32Data, dims), where dims is the tensor shape (e.g., [1, 3, height, width] for a single NCHW image), to prepare the input feeds.
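
The RGBA-to-RGB conversion and the NCHW layout are easy to get wrong, so a small sketch may help. It assumes the image has already been resized and that the pixels arrive as interleaved RGBA bytes (the layout of a canvas `ImageData.data` array). The function name `rgbaToNchwFloat32` is hypothetical, and the only normalization applied is the division by 255 mentioned above; some models additionally expect mean and standard-deviation normalization, which is not shown.

```javascript
// Hypothetical helper: converts interleaved RGBA bytes into a planar
// (channel-first) Float32Array scaled to [0, 1], plus the matching NCHW dims.
function rgbaToNchwFloat32(rgba, width, height) {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new Error('RGBA buffer length does not match width * height * 4');
  }

  const data = new Float32Array(3 * pixelCount);
  for (let i = 0; i < pixelCount; i++) {
    // Source layout: R, G, B, A per pixel. The alpha channel is dropped.
    data[i] = rgba[i * 4] / 255;                      // R plane
    data[pixelCount + i] = rgba[i * 4 + 1] / 255;     // G plane
    data[2 * pixelCount + i] = rgba[i * 4 + 2] / 255; // B plane
  }

  // Batch of one image, three channels, then height and width.
  return { data, dims: [1, 3, height, width] };
}
```

The result can then be wrapped as `new ort.Tensor('float32', data, dims)` and placed in the feeds object under the model's input name.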

5. Optimized Inference Patterns

Graph Capture: For models with static shapes on WebGPU, enable enableGraphCapture: true to reduce CPU overhead by replaying kernel executions.
IO Binding: For transformer models, keep data on the GPU by using ort.Tensor.fromGpuBuffer() and setting preferredOutputLocation: 'gpu-buffer' to avoid expensive memory copies.
Quantization: Prefer uint8 quantized models for CPU (WASM) inference to improve performance; avoid float16 on CPU as it lacks native support and is slow.
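
As a rough illustration of how the WebGPU options above fit together, the sketch below builds a session options object. `buildWebGpuSessionOptions` and its two flags are hypothetical; the option keys (`executionProviders`, `graphOptimizationLevel`, `enableGraphCapture`, `preferredOutputLocation`) are the ones named in this document. It assumes a WebGPU-only session, since graph capture is described above as a WebGPU feature.

```javascript
// Hypothetical helper: assembles session options for a WebGPU-only session.
function buildWebGpuSessionOptions({ staticShapes = false, keepOutputsOnGpu = false } = {}) {
  const options = {
    executionProviders: ['webgpu'],
    graphOptimizationLevel: 'all',
  };

  // Graph capture only makes sense when every input and output shape is fixed.
  if (staticShapes) {
    options.enableGraphCapture = true;
  }

  // Keep outputs in GPU buffers so they can feed the next run without a copy to the CPU.
  if (keepOutputsOnGpu) {
    options.preferredOutputLocation = 'gpu-buffer';
  }

  return options;
}
```

The returned object would be passed as the second argument to `ort.InferenceSession.create`. Outputs kept on the GPU must be read back or disposed explicitly when they are no longer needed.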

6. Large Model Handling (>2GB)

Platform Limits: Browsers like Chrome limit ArrayBuffer to ~2GB. Models exceeding this must be exported with external data.

Loading External Data: Explicitly link external weight files in the session options:

const session = await ort.InferenceSession.create(modelUrl, {
  externalData: [{ path: './model.data', data: dataUrl }]
});

7. Common Edge Cases

Memory Management: Explicitly call tensor.dispose() for GPU tensors to prevent memory leaks.
Zero-Sized Tensors: ORT-Web treats tensors with a dimension of 0 as CPU tensors regardless of the selected EP.
Thermal Throttling: Sustained inference on mobile devices may trigger frequency scaling, doubling latency. Use lightweight "tiny" models to maintain thermal equilibrium.

8. Examples

Multilingual Translation

Offload heavy translation tasks to a separate Web Worker using a singleton pattern to ensure the model (e.g., NLLB-200) loads only once.
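
One way to get the load-once behavior described above is to cache the promise returned by the loader. This is a minimal sketch under assumptions: `createSingletonLoader` is a hypothetical name, and `loadModel` stands in for whatever actually creates the session inside the worker.

```javascript
// Hypothetical helper: wraps a loader so it runs at most once per worker.
// Concurrent callers share the same in-flight promise.
function createSingletonLoader(loadModel) {
  let pending = null;

  return function getModel() {
    if (pending === null) {
      // The executor runs synchronously, so a second call made before the
      // model finishes loading already sees a non-null `pending`.
      pending = new Promise((resolve) => resolve(loadModel())).catch((err) => {
        // Reset on failure so a later call can retry the load.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}
```

Inside a Web Worker, `loadModel` could be something like `() => ort.InferenceSession.create('./model.onnx')` (the path is a placeholder), and the worker's message handler would `await getModel()` before each translation request.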

Object Detection (YOLO)

Implement Non-Max Suppression (NMS). If the browser lacks support for specific NMS ops, run a separate NMS ONNX model to filter overlapping boxes locally.
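
As an alternative to a separate NMS model, the filtering can also be done in plain JavaScript after inference. The sketch below is a greedy, class-agnostic NMS, not the approach the skill prescribes. It assumes each box is an object with hypothetical fields `x1`, `y1`, `x2`, `y2` (corner coordinates) and `score`; the 0.5 default threshold is an illustrative choice.

```javascript
// Intersection over union of two axis-aligned boxes.
function iou(a, b) {
  const x1 = Math.max(a.x1, b.x1);
  const y1 = Math.max(a.y1, b.y1);
  const x2 = Math.min(a.x2, b.x2);
  const y2 = Math.min(a.y2, b.y2);

  const intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const union = areaA + areaB - intersection;

  return union > 0 ? intersection / union : 0;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only
// if it does not overlap an already kept box by more than the threshold.
function nonMaxSuppression(boxes, iouThreshold = 0.5) {
  const sorted = [...boxes].sort((p, q) => q.score - p.score);
  const kept = [];
  for (const candidate of sorted) {
    if (kept.every((k) => iou(k, candidate) <= iouThreshold)) {
      kept.push(candidate);
    }
  }
  return kept;
}
```

For multi-class detectors, this would typically be run once per class so that overlapping boxes of different classes do not suppress each other.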

Category: development
Updated: Apr 20, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add thongnt0208/browser-onnx-skills browser-onnx

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 20, 2026, 3:01 PM (4.6s, 1 file scanned)

SKILL.md

name: browser-onnx
description: Implements high-performance local machine learning inference in the browser using ONNX Runtime Web. Use this skill when the user needs privacy-first, low-latency, or offline AI capabilities (e.g., image classification, object detection, or NLP) without server-side processing.
license: MIT
compatibility: Requires a browser with WebAssembly (WASM) support for CPU or WebGPU for hardware acceleration.
version: 1.0.0
runtime: onnxruntime-web

Related Skills

openclaw/openclaw-secret-scanning-maintainer

development

Verified · Trusted · Community

Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.

357,764 · SKILL.md · Updated Apr 15, 2026

openclaw/openclaw-release-maintainer

development

Verified · Trusted · Community

Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.

357,764 · SKILL.md · Updated Apr 10, 2026

openclaw/openclaw-qa-testing

development

Verified · Trusted · Community

Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.

357,764 · SKILL.md · Updated Apr 10, 2026

openclaw/openclaw-parallels-smoke

development

Verified · Trusted · Community

End-to-end Parallels smoke, upgrade, and rerun workflow for OpenClaw across macOS, Windows, and Linux guests. Use when Codex needs to run, rerun, debug, or interpret VM-based install, onboarding, gateway smoke tests, latest-release-to-main upgrade checks, fresh snapshot retests, or optional Discord roundtrip verification under Parallels.

357,764 · SKILL.md · Updated Apr 10, 2026

Install manually

# Clone the repo
git clone https://github.com/thongnt0208/browser-onnx-skills.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r browser-onnx-skills/skills/browser-onnx ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

thongnt0208/browser-onnx-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT