tools/skills/qai-runner-skill/SKILL.md
AIPC (AI Porting Conversion): tools and workflows for model conversion, inspection, quantization, and inference on Qualcomm platforms. Use this skill when exporting AI models to ONNX, converting ONNX models to QNN or SNPE DLC, converting models to FP16/FP32, generating context binaries, or implementing inference with QNN/SNPE DLC.
This skill provides a suite of tools and documentation for developing and deploying AI models on Qualcomm AI PCs and edge devices. It covers the end-to-end workflow: exporting a model to ONNX, inspecting and converting it, and implementing inference.
Cautions and issues:
Prerequisites:
The QAIRT_SDK_ROOT environment variable must be set. For Windows, refer to references\win_qairt_setup.md. For Linux, see the official setup page: https://docs.qualcomm.com/doc/80-63442-10/topic/linux_setup.html
Refer to references/model_export_validation.md for the complete guide on exporting models to ONNX and performing multi-stage validation. The ONNX model will be used for QNN/SNPE conversion.
Convert ONNX models to either QNN or SNPE format with specified floating-point precision (FP16 or FP32).
There are two ways to convert ONNX models to SNPE DLC format:
Method 1: Using qairt-converter (Recommended)
Use qairt-converter to convert ONNX models to SNPE DLC format. This is a unified converter that supports both QNN and SNPE formats.
Usage: choose the host toolchain (HOST_TOOLCHAIN in the command) that matches your system. On Windows, always use x86_64-windows-msvc. On x86 Linux, use x86_64-linux-clang. ARM Linux cross-compilation is not supported.
python ${QAIRT_SDK_ROOT}/bin/HOST_TOOLCHAIN/qairt-converter \
--input_network model.onnx \
--output_path model.dlc \
--float_bitwidth [16|32]
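The toolchain rule above can be scripted so the converter path is chosen the same way every time. This is a minimal sketch, assuming Python 3.8+ and that qairt-converter is launched with the current interpreter; it uses only the three flags shown in the command, and the helper names `host_toolchain` and `qairt_converter_cmd` are hypothetical.

```python
import os
import platform
import sys
from pathlib import Path
from typing import List, Optional


def host_toolchain(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the HOST_TOOLCHAIN folder name following the rules above."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Windows":
        # Windows (x64 or ARM64) always uses the x86_64 MSVC converter build.
        return "x86_64-windows-msvc"
    if system == "Linux" and machine in ("x86_64", "amd64"):
        return "x86_64-linux-clang"
    # ARM Linux and any other host are rejected.
    raise RuntimeError(f"unsupported conversion host: {system}/{machine}")


def qairt_converter_cmd(onnx_path: str, dlc_path: str, float_bitwidth: int = 16,
                        sdk_root: Optional[str] = None,
                        system: Optional[str] = None,
                        machine: Optional[str] = None) -> List[str]:
    """Build the qairt-converter command line (FP16 or FP32 DLC)."""
    if float_bitwidth not in (16, 32):
        raise ValueError("float_bitwidth must be 16 or 32")
    root = Path(sdk_root or os.environ["QAIRT_SDK_ROOT"])
    converter = root / "bin" / host_toolchain(system, machine) / "qairt-converter"
    return [
        sys.executable, str(converter),
        "--input_network", str(onnx_path),
        "--output_path", str(dlc_path),
        "--float_bitwidth", str(float_bitwidth),
    ]
```

Pass the returned list to `subprocess.run(..., check=True)`; building an argument list instead of a shell string avoids quoting problems with Windows paths.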
Use scripts/aipc_convert_fp.py to convert ONNX models to QNN format. This script automates calls to qnn-onnx-converter and qnn-model-lib-generator.
On ARM Windows, an extra "HTP Context Binary Generation" step is required before inference.
Usage:
python scripts/aipc_convert_fp.py --precision [16|32]
Precision Options:
- --precision 16: FP16 conversion (default, recommended for NPU)
- --precision 32: FP32 conversion

Generated Files and Folders:
- {model_name}.bin - Binary QNN model file
- {model_name}.cpp - C++ representation of the model
- {model_name}.yaml - Input/output configuration file (generated by the inspection step)
- {model_name}.dll - Compiled model library (in the test_libs_{model_name}_fp{precision}_{target_arch}/{architecture}/ directory)
- test_libs_{model_name}_fp{precision}_{target_arch}/ - Directory containing the compiled model library and supporting files

Cleanup Option:
By default, the script passes the --clean_up flag to qnn-model-lib-generator, which removes temporary build files during compilation.
Example:
python scripts/aipc_convert_fp.py --precision 16
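The naming pattern in the file list can be used to check that a conversion actually produced everything. This is a sketch under assumptions: outputs land in the working directory, `target_arch` and `architecture` are whatever values your run uses (the ones in the test are hypothetical), and the library extension is `.dll` unless you pass another. The same helper covers the INT8/INT16 layout via `kind="int"`.

```python
from pathlib import Path
from typing import Dict, List


def expected_artifacts(model_name: str, precision: int, target_arch: str,
                       architecture: str, kind: str = "fp",
                       lib_ext: str = ".dll", out_dir: str = ".") -> Dict[str, Path]:
    """Map each documented output to its expected path."""
    if kind not in ("fp", "int"):
        raise ValueError("kind must be 'fp' or 'int'")
    out = Path(out_dir)
    # e.g. test_libs_{model_name}_fp16_{target_arch}/
    lib_dir = out / f"test_libs_{model_name}_{kind}{precision}_{target_arch}"
    return {
        "bin": out / f"{model_name}.bin",
        "cpp": out / f"{model_name}.cpp",
        "yaml": out / f"{model_name}.yaml",
        "lib_dir": lib_dir,
        "lib": lib_dir / architecture / f"{model_name}{lib_ext}",
    }


def missing_artifacts(paths: Dict[str, Path]) -> List[str]:
    """Return the keys whose files or folders do not exist yet."""
    return [key for key, path in paths.items() if not path.exists()]
```

An empty list from `missing_artifacts` after a run means every documented output is in place.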
Use scripts/aipc_convert_int.py to convert ONNX models to QNN format with integer quantization (INT8/INT16). This script handles quantized model conversion with calibration data support.
Usage:
python scripts/aipc_convert_int.py --precision [8|16]
Precision Options:
- --precision 8: INT8 conversion (quantized, recommended for power efficiency)
- --precision 16: INT16 conversion (quantized)

Generated Files and Folders:
- {model_name}.bin - Binary QNN quantized model file
- {model_name}.cpp - C++ representation of the quantized model
- {model_name}.yaml - Input/output configuration file (generated by the inspection step)
- {model_name}.dll - Compiled quantized model library (in the test_libs_{model_name}_int{precision}_{target_arch}/{architecture}/ directory)
- test_libs_{model_name}_int{precision}_{target_arch}/ - Directory containing the compiled quantized model library and supporting files

Cleanup Option:
By default, the script passes the --clean_up flag to qnn-model-lib-generator, which removes temporary build files during compilation.
Example:
python scripts/aipc_convert_int.py --precision 8
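The two conversion scripts accept different precision values, which is easy to mix up. The sketch below is a hypothetical wrapper that rejects an invalid combination before launching anything and keeps the combined log, since troubleshooting starts from the converter output. It assumes it is run from the folder that contains `scripts/`.

```python
import subprocess
import sys
from typing import List, Tuple

# Precision values documented for each conversion script.
PRECISIONS = {
    "scripts/aipc_convert_fp.py": (16, 32),
    "scripts/aipc_convert_int.py": (8, 16),
}


def convert_cmd(script: str, precision: int) -> List[str]:
    """Build the conversion command, validating --precision for the script."""
    allowed = PRECISIONS.get(script)
    if allowed is None:
        raise ValueError(f"unknown conversion script: {script}")
    if precision not in allowed:
        raise ValueError(f"{script} supports --precision {allowed}, got {precision}")
    return [sys.executable, script, "--precision", str(precision)]


def run_conversion(script: str, precision: int) -> Tuple[bool, str]:
    """Run the script; return (succeeded, combined stdout/stderr log)."""
    result = subprocess.run(convert_cmd(script, precision),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```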
If model conversion fails, follow this troubleshooting workflow:
Verify Conversion Status: Check the conversion output to identify specific errors or unsupported operators.
Apply Model Patches: Refer to references/model_export_validation.md for guidance on patching the model. This typically involves:
Retry Conversion: After patching, repeat the Model Conversion step with the updated ONNX model.
Escalation: If conversion continues to fail after patching attempts, do not substitute with a different model. Instead, escalate the issue for technical support to resolve the underlying conversion compatibility problem.
Note: Model patching should address the root cause of conversion failures rather than working around them with alternative models.
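The troubleshooting policy (patch the same model, retry, and escalate rather than swap models) can be expressed as a small control loop. This is a sketch only: `convert` and each patch function are hypothetical callables you supply, for example a wrapper around the conversion script and functions that rewrite unsupported operators as described in references/model_export_validation.md.

```python
from typing import Callable, List, Tuple

ConvertFn = Callable[[str], Tuple[bool, str]]   # path -> (succeeded, log)
PatchFn = Callable[[str, str], str]             # (path, log) -> patched path


def convert_with_patches(onnx_path: str, convert: ConvertFn,
                         patches: List[PatchFn]) -> Tuple[str, str, List[Tuple[str, bool]]]:
    """Convert, then patch-and-retry the same model; escalate instead of swapping models."""
    path = onnx_path
    ok, log = convert(path)
    history = [(path, ok)]
    for patch in patches:
        if ok:
            break
        # Each patch works from the failure reported in the latest log.
        path = patch(path, log)
        ok, log = convert(path)
        history.append((path, ok))
    return ("converted" if ok else "escalate"), path, history
```

A status of "escalate" is the signal to hand the conversion problem to technical support, together with `history` and the last log.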
Use scripts/aipc_inspect_onnxio.py to inspect ONNX models and generate a YAML configuration file containing input and output tensor names, which will be used for inference scripts.
Usage:
python scripts/aipc_inspect_onnxio.py [model.onnx]
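One detail worth knowing when collecting I/O names: graphs exported with `keep_initializers_as_inputs=True` list weights among `graph.input`, and those are not tensors the caller feeds. The sketch below filters them out and writes a minimal YAML file. It is stdlib-only and assumes you already have the name lists (with the `onnx` package they come from `.name` on `model.graph.input`, `model.graph.initializer`, and `model.graph.output`). The `inputs`/`outputs` schema is hypothetical and may differ from what aipc_inspect_onnxio.py emits.

```python
import json
from typing import Dict, Iterable, List


def io_config(graph_inputs: Iterable[str], initializers: Iterable[str],
              graph_outputs: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only true runtime inputs: drop names that are also initializers."""
    weights = set(initializers)
    return {
        "inputs": [name for name in graph_inputs if name not in weights],
        "outputs": list(graph_outputs),
    }


def to_yaml(config: Dict[str, List[str]]) -> str:
    """Emit a minimal YAML document; JSON strings are valid YAML scalars."""
    lines = []
    for key in ("inputs", "outputs"):
        names = config[key]
        if not names:
            lines.append(f"{key}: []")
            continue
        lines.append(f"{key}:")
        lines.extend(f"  - {json.dumps(name)}" for name in names)
    return "\n".join(lines) + "\n"
```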
Run converted models through the ONNX→QNN/SNPE wrapper.
Copy scripts/aipc and scripts/qai_onnxruntime.py into the working folder.
Ensure QAIRT_SDK_ROOT is set and the SDK runtime folders are on PATH (Windows) or LD_LIBRARY_PATH (Linux).
Run: python aipc path/to/onnx_inference.py
python scripts/aipc_inspect_onnxio.py model.onnx
Common fixes: add SDK bins to PATH, use absolute context-binary paths, or regenerate the YAML I/O file.
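Rather than editing the global environment, the runtime folders can be prepended for a single launch. This is a sketch: which SDK folders you pass is up to you (for example a hypothetical `os.path.join(sdk, "lib", "aarch64-windows-msvc")`), since the exact layout depends on your QAIRT SDK version and platform.

```python
import os
import platform
from typing import Dict, Iterable, Mapping, Optional


def runtime_env(lib_dirs: Iterable[str], system: Optional[str] = None,
                base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with SDK runtime folders prepended."""
    env = dict(os.environ if base_env is None else base_env)
    system = system or platform.system()
    # Windows resolves DLLs through PATH; Linux loads .so files via LD_LIBRARY_PATH.
    var, sep = ("PATH", ";") if system == "Windows" else ("LD_LIBRARY_PATH", ":")
    parts = [str(d) for d in lib_dirs]
    if env.get(var):
        parts.append(env[var])
    env[var] = sep.join(parts)
    return env
```

Pass the result as `env=` to `subprocess.run` when launching `python aipc path/to/onnx_inference.py`, so the SDK libraries are found without touching the user's shell profile.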
Usage: use an absolute path for the DLC model.
On Windows:
qnn-context-binary-generator --backend QnnHtp.dll --model QnnModelDlc.dll --dlc_path [model.dlc] --binary_file [model.dlc]
On Linux:
${QAIRT_SDK_ROOT}/bin/aarch64-oe-linux-gcc11.2/qnn-context-binary-generator --backend libQnnHtp.so --model libQnnModelDlc.so --dlc_path [model.dlc] --binary_file [model.dlc]
The tool creates an output folder and writes model.dlc.bin into it.
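Both platform variants of the command differ only in the tool path and library names, so they can be built from one helper that also enforces the absolute DLC path. This is a sketch with assumptions: on Windows the tool and DLLs are already on PATH, `--binary_file` receives the DLC file name, and the result is expected under `output/` relative to the working directory, as described above.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def context_binary_cmd(dlc_path: str, system: str = "Windows",
                       sdk_root: Optional[str] = None) -> Tuple[List[str], Path]:
    """Build the qnn-context-binary-generator call and the expected output path."""
    dlc = Path(dlc_path).resolve()  # the tool is given an absolute DLC path
    if system == "Windows":
        tool = "qnn-context-binary-generator"
        backend, model_lib = "QnnHtp.dll", "QnnModelDlc.dll"
    else:
        root = Path(sdk_root or os.environ["QAIRT_SDK_ROOT"])
        tool = str(root / "bin" / "aarch64-oe-linux-gcc11.2" / "qnn-context-binary-generator")
        backend, model_lib = "libQnnHtp.so", "libQnnModelDlc.so"
    cmd = [tool, "--backend", backend, "--model", model_lib,
           "--dlc_path", str(dlc), "--binary_file", dlc.name]
    # The generator writes <binary_file>.bin into an "output" folder.
    return cmd, Path("output") / f"{dlc.name}.bin"
```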
Use scripts/aipc_dev_gen_contextbin.py to generate a QNN context binary from a compiled model library (.so). This must be executed on the target device.
Usage:
python scripts/aipc_dev_gen_contextbin.py --model path/to/model.so
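Because the context binary must be generated on the target device, a guard that refuses to run elsewhere saves a confusing failure. This sketch assumes "target device" means an ARM64 machine (Windows on Snapdragon reports `ARM64`, aarch64 Linux reports `aarch64` from `platform.machine()`); adjust `TARGET_MACHINES` if your target differs.

```python
import platform
import sys
from typing import List, Optional

# Assumption: the target is an ARM64 device (Windows on Snapdragon or aarch64 Linux).
TARGET_MACHINES = {"arm64", "aarch64"}


def is_target_device(machine: Optional[str] = None) -> bool:
    """True when the (given or current) machine type is in TARGET_MACHINES."""
    return (machine or platform.machine()).lower() in TARGET_MACHINES


def gen_contextbin_cmd(model_lib: str, machine: Optional[str] = None) -> List[str]:
    """Build the aipc_dev_gen_contextbin.py call, refusing to run off-target."""
    if not is_target_device(machine):
        raise RuntimeError(
            "aipc_dev_gen_contextbin.py must run on the target device, "
            f"not on {machine or platform.machine()}")
    return [sys.executable, "scripts/aipc_dev_gen_contextbin.py", "--model", model_lib]
```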
For a complete workflow starting from a PyTorch model:
PyTorch to ONNX Export: Export your PyTorch model to ONNX format. Ensure any custom operators are handled or replaced if necessary.
Model Inspection: Use aipc_inspect_onnxio.py to inspect the ONNX model and generate I/O configuration.
Model Conversion: Convert the ONNX model to SNPE format with your desired precision (FP16 recommended).
Conversion Troubleshooting: If conversion fails, apply model patches as described in model_export_validation.md and retry.
Context Binary Generation: On the target device, generate the QNN context binary using aipc_dev_gen_contextbin.py.
Inference Implementation: Integrate the converted model into your app using qai_onnxruntime and the aipc ONNX Runtime loader (QNN/SNPE).
For quantized models requiring calibration data:
Refer to references/model_quantization.md for calibration data preparation and detailed conversion steps.
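QNN converters commonly take calibration data as an input list: a text file with one line per sample, each naming raw float32 files (bare paths for single-input models, `name:=path` pairs separated by spaces for multi-input models). The sketch below writes that layout with the standard library only. Whether aipc_convert_int.py consumes exactly this format is an assumption; references/model_quantization.md is authoritative. Tensors must already be preprocessed and flattened in the layout the model expects.

```python
import array
import re
from pathlib import Path
from typing import Dict, List, Sequence


def write_raw(values: Sequence[float], path: Path) -> None:
    """Write a flat tensor as raw float32 in native byte order."""
    buf = array.array("f", values)  # "f" is a 4-byte C float on mainstream platforms
    with open(path, "wb") as f:
        buf.tofile(f)


def write_input_list(samples: List[Dict[str, Sequence[float]]], out_dir: str,
                     list_name: str = "input_list.txt") -> Path:
    """Write one .raw file per tensor and one input-list line per sample."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = []
    for i, sample in enumerate(samples):
        entries = []
        for name, values in sample.items():
            # Tensor names may contain '/' or ':', which are unsafe in file names.
            safe = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
            raw = (out / f"sample{i}_{safe}.raw").resolve()
            write_raw(values, raw)
            entries.append(str(raw) if len(sample) == 1 else f"{name}:={raw}")
        lines.append(" ".join(entries))
    list_path = out / list_name
    list_path.write_text("\n".join(lines) + "\n")
    return list_path
```

Entries are space-separated, so keep the output folder on a path without spaces.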
Prerequisites: Before using any AIPC toolkit functionality, the QAIRT SDK environment must be properly configured.
Windows Setup:
Refer to references/aipcenv_setup_windows.md for detailed Windows-specific installation and configuration instructions.
Linux Setup:
Follow the official Qualcomm AI Engine Direct SDK documentation for Linux environment setup. Ensure the QAIRT_SDK_ROOT environment variable is properly set to point to your SDK installation directory.
Verification: After setup, verify your environment by checking:
echo $QAIRT_SDK_ROOT
ls $QAIRT_SDK_ROOT/bin/
The SDK should be accessible and all required binaries should be present in the bin directory.
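The same verification can be run from Python on either OS, which is handy at the start of an automation script. A minimal sketch; it only checks that the variable is set and that `bin/` exists and is not empty, not that specific tools are present.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_qairt_sdk(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means the basic checks passed."""
    env = os.environ if env is None else env
    root = env.get("QAIRT_SDK_ROOT")
    if not root:
        return ["QAIRT_SDK_ROOT is not set"]
    bin_dir = Path(root) / "bin"
    if not bin_dir.is_dir():
        return [f"{bin_dir} does not exist"]
    if not any(bin_dir.iterdir()):
        return [f"{bin_dir} is empty"]
    return []
```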
User: "Deploy this AI application to Qualcomm platform"
Agent:
Deployment plan:
1. Export to ONNX and validate with test inference
2. Convert to QNN using aipc_convert_fp.py (FP16)
3. If conversion fails, patch model and retry
4. Inspect I/O with aipc_inspect_onnxio.py
5. Generate context binary on target device
6. Execute inference and validate performance
7. Retrieve and verify results
Update and confirm the plan. Let's run fast, don't walk!
- scripts/aipc_inspect_onnxio.py: Inspects ONNX files and generates I/O YAML configurations
- scripts/aipc_convert_fp.py: Converts ONNX models to QNN format with FP16/FP32 precision
- scripts/aipc_dev_gen_contextbin.py: Generates QNN context binaries on target devices
- references/model_export_validation.md: Complete guide for exporting and validating models, including patching strategies
- references/model_quantization.md: Complete guide for model quantization and optimization techniques