skills/react-native-best-practices/references/on-device-ai/SKILL.md
Build on-device AI features in React Native and Expo apps with React Native ExecuTorch. Use when adding AI to a mobile app without cloud dependencies — chatbots, image classification, object detection, OCR, semantic or instance segmentation, style transfer, image generation, pose estimation, speech-to-text, text-to-speech, voice activity detection, semantic search with embeddings, tokenization, privacy / PII redaction, or vision-language image understanding. Also use when mentioning offline / on-device / privacy AI, reducing cloud cost or latency, or managing ML models. Covers initExecutorch and every hook (useLLM, useClassification, useObjectDetection, useOCR, useSemanticSegmentation, useInstanceSegmentation, useStyleTransfer, useTextToImage, useImageEmbeddings, usePoseEstimation, useSpeechToText, useTextToSpeech, useVAD, useTextEmbeddings, useTokenizer, usePrivacyFilter, useExecutorchModule), tool calling, structured output, VLMs, Expo and bare resource-fetcher adapters, and error handling.
npx skillsauth add software-mansion-labs/skills on-device-aiInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Software Mansion's production patterns for on-device AI in React Native and Expo using React Native ExecuTorch.
Targets the current published API (v0.10.x). Load at most one reference file per question. For hook signatures, model constants, or config options not covered here, webfetch the matching page from docs.swmansion.com/react-native-executorch.
What does the feature need?
│
├── Generate / chat with text?
│ └── useLLM → see llm.md
│ ├── Plain chat → standard useLLM
│ ├── Image + text input → useLLM with a VLM model (LFM2_VL_*)
│ ├── Tool / function calling → configure with toolsConfig
│ └── Structured JSON output → getStructuredOutputPrompt
│
├── Understand or transform images?
│ ├── What is in this image? → useClassification → see vision.md
│ ├── Where are the objects? → useObjectDetection → see vision.md
│ ├── Per-pixel class → useSemanticSegmentation → see vision.md
│ ├── Per-instance segmentation → useInstanceSegmentation → see vision.md
│ ├── Human pose keypoints → usePoseEstimation → see vision.md
│ ├── Read text from image → useOCR / useVerticalOCR → see vision.md
│ ├── Apply artistic style → useStyleTransfer → see vision.md
│ ├── Generate image from prompt → useTextToImage → see vision.md
│ └── Embed image as vector → useImageEmbeddings → see vision.md
│
├── Speech / audio?
│ ├── Transcribe speech → useSpeechToText → see speech.md
│ ├── Synthesize speech → useTextToSpeech → see speech.md
│ └── Detect speech segments → useVAD → see speech.md
│
├── Text utilities?
│ ├── Embed text as vector → useTextEmbeddings → see vision.md
│ ├── Count or inspect tokens → useTokenizer → see setup.md
│ └── Redact PII from text → usePrivacyFilter → see setup.md
│
├── Full RAG pipeline (retrieval + generation + vector store)?
│ └── react-native-rag (sibling library) → see setup.md
│
└── Custom `.pte` model not covered by a dedicated hook?
└── useExecutorchModule → see setup.md
Call initExecutorch() at app entry, before any other API. The library does not bundle a network/file layer — you must register a resource-fetcher adapter (ExpoResourceFetcher for Expo, BareResourceFetcher for bare RN). Any hook called before initialization throws ResourceFetcherAdapterNotInitialized.
Check isReady before calling forward / generate / transcribe. All hooks load asynchronously. Inference before the model is ready throws ModuleNotLoaded.
Interrupt LLM generation before unmounting. Unmounting while isGenerating is true crashes. Call llm.interrupt() and wait for isGenerating === false before navigating away.
Use quantized model variants on mobile. Full-precision variants exceed device memory on most phones. Every supported model ships a _QUANTIZED variant — prefer it unless you've measured otherwise.
Audio for speech-to-text and VAD must be 16 kHz mono. Mismatched sample rates produce silently garbled transcriptions. Decode with new AudioContext({ sampleRate: 16000 }).
Audio from text-to-speech is 24 kHz. Create the playback context with new AudioContext({ sampleRate: 24000 }).
The New Architecture (Fabric) is required. Old architecture is unsupported. Expo Go is unsupported — use a custom dev build (npx expo prebuild). iOS release builds need a real device (the simulator lacks the Metal APIs ExecuTorch relies on).
// App.tsx (Expo)
import { initExecutorch } from 'react-native-executorch';
import { ExpoResourceFetcher } from 'react-native-executorch-expo-resource-fetcher';
initExecutorch({ resourceFetcher: ExpoResourceFetcher });
// App.tsx (bare React Native)
import { initExecutorch } from 'react-native-executorch';
import { BareResourceFetcher } from 'react-native-executorch-bare-resource-fetcher';
initExecutorch({ resourceFetcher: BareResourceFetcher });
Full setup, Metro config for bundled .pte files, custom adapters, model-loading strategies, and error handling: see setup.md.
| Hook | Purpose | Reference |
|---|---|---|
| useLLM | Text generation, chat, tool calling, VLM | llm.md |
| useClassification | Image categorisation | vision.md |
| useObjectDetection | Bounding-box detection (YOLO26, RF-DETR, SSDLite) | vision.md |
| useSemanticSegmentation | Per-pixel class segmentation | vision.md |
| useInstanceSegmentation | Per-instance segmentation | vision.md |
| usePoseEstimation | COCO 17-keypoint human pose | vision.md |
| useStyleTransfer | Artistic image filters | vision.md |
| useTextToImage | Stable Diffusion image generation | vision.md |
| useImageEmbeddings | CLIP image embeddings | vision.md |
| useOCR | Horizontal text OCR | vision.md |
| useVerticalOCR | Vertical text OCR (experimental, CJK) | vision.md |
| useTextEmbeddings | Sentence embeddings for similarity / RAG | vision.md |
| useSpeechToText | Whisper transcription (batch + streaming) | speech.md |
| useTextToSpeech | Kokoro TTS (batch + streaming, phoneme input) | speech.md |
| useVAD | FSMN voice activity detection | speech.md |
| useTokenizer | HuggingFace-compatible tokenization | setup.md |
| usePrivacyFilter | On-device PII / privacy redaction | setup.md |
| useExecutorchModule | Custom .pte model inference | setup.md |
Every hook also has a non-React Module counterpart (e.g. LLMModule.fromModelName(...), ClassificationModule.fromModelName(...)) for use outside React components.
| Symptom | Likely cause | Fix |
|---|---|---|
| ResourceFetcherAdapterNotInitialized | initExecutorch not called | Call it at app entry with an adapter |
| ModuleNotLoaded | Inference before model finished loading | Gate calls on isReady |
| MemoryAllocationFailed on launch | Model too large for device | Switch to _QUANTIZED variant or smaller parameter count |
| App crashes on screen navigation | Unmount during active generation | llm.interrupt() and await isGenerating === false |
| Whisper produces garbled text | Wrong sample rate | Decode audio at 16 kHz mono |
| TTS output sounds chipmunked | Playback context at wrong rate | Create AudioContext({ sampleRate: 24000 }) |
| Build fails on iOS simulator (release) | Simulator lacks Metal APIs | Build release on real device |
Full error code list and recovery patterns: setup.md.
| File | When to read |
|---|---|
| llm.md | useLLM functional + managed modes, tool calling, structured output (JSON Schema / Zod), interrupting, vision-language models, generation config |
| vision.md | Image classification, object detection, semantic + instance segmentation, pose estimation, OCR (horizontal + vertical), style transfer, text-to-image, image + text embeddings |
| speech.md | Speech-to-text (Whisper batch + streaming with timestamps), text-to-speech (Kokoro batch + streaming, phoneme input, voice catalogue), voice activity detection, audio sample-rate requirements |
| setup.md | initExecutorch, Expo / bare resource-fetcher adapters, model loading strategies, Metro config, error codes and recovery, useExecutorchModule for custom .pte models, useTokenizer, usePrivacyFilter, full model catalogue |
development
Use when the user mentions migrating deep links, switching away from Branch or AppsFlyer, replacing their deep linking SDK, setting up Detour deep linking for the first time, or asks how Branch/AppsFlyer concepts map to Detour. Covers the complete migration end to end - Detour Dashboard configuration, Universal Links and App Links setup, SDK swap with code examples, and analytics migration. Works across Android, iOS, React Native, and Flutter.
development
Complete onboarding guide for developers who are new to Detour, the open-source deferred deep linking SDK by Software Mansion. Use this skill whenever a user asks what Detour is, how to get started with Detour, how to set up deep linking with Detour, how to install the Detour SDK, how to configure the Detour dashboard, or how deferred deep linking works. Also use it when the user has no prior deep linking setup and wants to add deep links to their app. Covers everything from zero to production: account setup, dashboard configuration, Universal Links and App Links, platform SDK integration for React Native, iOS, Android, and Flutter, analytics, and architecture.
tools
React Native / Expo SDK for Fishjam — video/audio streaming on iOS and Android. Use when writing a React Native or Expo app that calls Fishjam, configures the Fishjam Expo plugin, sets up permissions, runs background streaming, integrates CallKit, or renders RTCView. Trigger on: '@fishjam-cloud/react-native-client', 'fishjam expo plugin', 'FishjamProvider mobile', 'useCameraPermissions', 'useMicrophonePermissions', 'useForegroundService', 'useCallKit', 'useCallKitEvent', 'useCallKitService', 'RTCView', 'RTCPIPView', 'ScreenCapturePickerView', 'startPIP', 'stopPIP', 'AudioDeviceType', 'useAudioOutput', '@fishjam-cloud/react-native-webrtc', 'fishjam react native', 'expo fishjam', 'fishjam ios', 'fishjam android', 'broadcast extension'. Re-exports @fishjam-cloud/react-client hooks plus mobile-only: permissions, foreground service, iOS broadcast extension, audio routing, CallKit, Expo config plugin.
tools
Browser-only React SDK for Fishjam — joining rooms, capturing camera/microphone/screen, displaying peers, and acting as a livestream streamer or viewer in a React web app. Use whenever the user is writing a React app in a browser that calls Fishjam APIs, sets up FishjamProvider, or uses any Fishjam React hook. Trigger on: '@fishjam-cloud/react-client', 'FishjamProvider', 'useConnection', 'useCamera', 'useMicrophone', 'useScreenShare', 'usePeers', 'useDataChannel', 'useVAD', 'useLivestreamStreamer', 'useLivestreamViewer', 'useCustomSource', 'useInitializeDevices', 'useUpdatePeerMetadata', 'useSandbox', 'PeerWithTracks', 'joinRoom', 'peerToken', 'fishjamId', 'fishjam react', '@fishjam-cloud/ts-client', 'FishjamClient ts-client'. Covers the provider, the full hook catalog, simulcast configuration, custom sources, data channels, VAD, livestream WHEP playback, device persistence, and reconnection. Briefly notes when to drop down to @fishjam-cloud/ts-client for non-React or worker contexts.