skills/npu-smi/SKILL.md
Huawei Ascend NPU npu-smi command reference. Use for device queries (health, temperature, power, memory, processes, ECC), configuration (thresholds, modes, fan), firmware upgrades (MCU, bootloader, VRD), virtualization (vNPU), and certificate management.
Quick reference for Huawei Ascend NPU device management commands.
npu-smi info -l # List all devices
npu-smi info -t health -i 0 # Check device health
npu-smi info -t temp -i 0 -c 0 # Check temperature
npu-smi info -t power -i 0 -c 0 # Check power
npu-smi info -t memory -i 0 -c 0 # Check memory
npu-smi info -l # List devices
npu-smi info -t health -i <id> # Health status (OK/Warning/Error)
npu-smi info -t board -i <id> # Board details (firmware, software version)
npu-smi info -t npu -i <id> -c <chip> # Chip details (name, health, usage)
npu-smi info -m # List all chips
npu-smi info -t temp -i <id> -c <chip> # Temperature (NPU, AI Core)
npu-smi info -t power -i <id> -c <chip> # Power usage and limit
npu-smi info -t memory -i <id> -c <chip> # Memory usage, total, rate
npu-smi info proc -i <id> -c <chip> # Running processes (PID, memory, AI Core usage)
npu-smi info -t ecc -i <id> -c <chip> # ECC errors and mode
npu-smi info -t usages -i <id> -c <chip> # Utilization (AI Core, memory, bandwidth)
npu-smi info -t pcie-info -i <id> -c <chip> # PCIe speed and width
npu-smi info -t p2p -i <id> -c <chip> # P2P status and mode
npu-smi info -t product -i <id> -c <chip> # Product name and serial
See: references/device-queries.md for output formats, examples, monitoring scripts, and platform identification (A2 vs A3).
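The per-chip queries above share one shape (`info -t <type> -i <id> -c <chip>`), so a small wrapper can run several in a row. The following is a minimal sketch, not part of npu-smi: the helper name `npu_snapshot` and the `NPU_SMI` override variable are assumptions introduced here for illustration.

```shell
# Command to run. Override for a dry run, e.g. NPU_SMI=echo (assumed convention).
NPU_SMI=${NPU_SMI:-npu-smi}

# npu_snapshot <id> <chip>
# Hypothetical helper: runs the temp, power, and memory queries in turn
# and stops at the first query that fails.
npu_snapshot() {
    for query_type in temp power memory; do
        $NPU_SMI info -t "$query_type" -i "$1" -c "$2" || return 1
    done
}
```

Calling `npu_snapshot 0 0` issues the same three queries as the quick-start lines for device 0, chip 0. Setting `NPU_SMI=echo` first prints the arguments instead of running them, which is convenient on a machine without an NPU.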
npu-smi set -t temperature -i <id> -c <chip> -d <value> # Temperature threshold (°C)
npu-smi set -t power-limit -i <id> -c <chip> -d <value> # Power limit (W)
npu-smi set -t ecc-mode -i <id> -c <chip> -d <0|1> # 0=Disable, 1=Enable
npu-smi set -t compute-mode -i <id> -c <chip> -d <mode> # 0=Default, 1=Exclusive, 2=Prohibited
npu-smi set -t persistence-mode -i <id> -d <0|1> # Persistence mode
npu-smi set -t p2p-mem-cfg -i <id> -c <chip> -d <0|1> # P2P configuration
npu-smi set -t pwm-mode -d <0|1> # 0=Manual, 1=Automatic
npu-smi set -t pwm-duty-ratio -d <0-100> # Fan speed (percent)
npu-smi set -t mac-addr -i <id> -c <chip> -d <mac_id> -s "XX:XX:XX:XX:XX:XX"
npu-smi set -t boot-select -i <id> -c <chip> -d <3|4> # 3=M.2 SSD, 4=eMMC
npu-smi set -t cpu-freq-up -i <id> -d <0|1> # 0=1.9GHz/800MHz, 1=1.0GHz/800MHz
npu-smi set -t sys-log-enable -d <0|1> # System logging
npu-smi clear -t ecc-info -i <id> -c <chip> # Clear ECC errors
npu-smi clear -t tls-cert-period -i <id> -c <chip> # Restore cert threshold
See: references/configuration.md for parameter tables and examples.
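Several `set` options accept only a narrow range of values, for example `pwm-duty-ratio` with 0-100. A script can check the value before touching the device. This is a sketch under assumptions: `set_fan_duty` and the `NPU_SMI` override variable are hypothetical names, and the check covers only the range given above.

```shell
# Command to run. Override for a dry run, e.g. NPU_SMI=echo (assumed convention).
NPU_SMI=${NPU_SMI:-npu-smi}

# set_fan_duty <percent>
# Hypothetical wrapper: rejects anything that is not an integer in 0-100,
# then passes the value to "set -t pwm-duty-ratio".
set_fan_duty() {
    case "$1" in
        ''|*[!0-9]*)
            echo "duty ratio must be a non-negative integer" >&2
            return 2
            ;;
    esac
    if [ "$1" -gt 100 ]; then
        echo "duty ratio must be between 0 and 100" >&2
        return 2
    fi
    $NPU_SMI set -t pwm-duty-ratio -d "$1"
}
```

The same pattern applies to the `<0|1>` switches: validate first, then call npu-smi once with a value known to be in range.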
Upgrade workflow: Query → Upgrade → Check Status → Activate → Restart
npu-smi upgrade -b <item> -i <id> # Query current version
npu-smi upgrade -t <item> -i <id> -f <file.hpm> # Upload firmware
npu-smi upgrade -q <item> -i <id> # Check upgrade status
npu-smi upgrade -a <item> -i <id> # Activate firmware
| Component | Item Name | Restart Required |
|-----------|-----------|------------------|
| MCU | mcu | Yes (restart) |
| Bootloader | bootloader | Yes (restart) |
| VRD | vrd | Yes (power cycle 30s) |
See: references/firmware-upgrade.md for complete procedures.
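The workflow and table above can be combined into a dry-run plan that prints the commands for one component in order. This is only a sketch: `upgrade_plan` is a hypothetical helper that prints text and never calls npu-smi, and the restart notes are copied from the table rather than verified against a device.

```shell
# upgrade_plan <item> <id> <file.hpm>
# Hypothetical dry-run helper: prints the four upgrade commands in workflow
# order, followed by the restart note for that component.
upgrade_plan() {
    case "$1" in
        mcu|bootloader) restart_note="restart" ;;
        vrd)            restart_note="power cycle 30s" ;;
        *)
            echo "unknown item: $1 (expected mcu, bootloader, or vrd)" >&2
            return 2
            ;;
    esac
    echo "npu-smi upgrade -b $1 -i $2"         # query current version
    echo "npu-smi upgrade -t $1 -i $2 -f $3"   # upload firmware
    echo "npu-smi upgrade -q $1 -i $2"         # check upgrade status
    echo "npu-smi upgrade -a $1 -i $2"         # activate firmware
    echo "# then: $restart_note"
}
```

Reviewing the printed plan before running anything keeps the order fixed and makes the component-specific restart step hard to forget.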
npu-smi info -t vnpu-mode # Query AVI mode (0=Container, 1=VM)
npu-smi info -t template-info # List all templates
npu-smi info -t template-info -i <id> # Templates for specific device
npu-smi info -t info-vnpu -i <id> -c <chip> # View vNPU info
npu-smi set -t vnpu-mode -d <0|1> # Set AVI mode
npu-smi set -t create-vnpu -i <id> -c <chip> -f <template> [-v <vnpu_id>] [-g <vgroup_id>]
npu-smi set -t destroy-vnpu -i <id> -c <chip> -v <vnpu_id>
vNPU ID Range: [phy_id*16+100, phy_id*16+115]
See: references/virtualization.md for vNPU creation and management.
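The vNPU ID range formula above is plain integer arithmetic, so it is easy to check a candidate `-v <vnpu_id>` before creating a vNPU. A minimal sketch follows; the function names are hypothetical, and `phy_id` is taken as given in the formula (assumed here to be the physical device's ID).

```shell
# vnpu_id_range <phy_id>
# Prints the lowest and highest vNPU ID for one physical device,
# following [phy_id*16+100, phy_id*16+115].
vnpu_id_range() {
    echo "$(( $1 * 16 + 100 )) $(( $1 * 16 + 115 ))"
}

# vnpu_id_in_range <phy_id> <vnpu_id>
# Succeeds (exit status 0) when vnpu_id lies inside that range.
vnpu_id_in_range() {
    [ "$2" -ge $(( $1 * 16 + 100 )) ] && [ "$2" -le $(( $1 * 16 + 115 )) ]
}

# vnpu_id_range 0   prints: 100 115
# vnpu_id_range 2   prints: 132 147
```

Each physical device therefore gets 16 consecutive IDs, and the ranges of neighbouring devices do not overlap.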
npu-smi info -t tls-csr-get -i <id> -c <chip> # Generate CSR (PEM format)
npu-smi info -t tls-cert -i <id> -c <chip> # View certificate details
npu-smi info -t tls-cert-period -i <id> -c <chip> # Check expiration threshold
npu-smi info -t rootkey -i <id> -c <chip> # Rootkey status
npu-smi set -t tls-cert -i <id> -c <chip> -f "<tls.pem> <ca.pem> <subca.pem>"
npu-smi set -t tls-cert-period -i <id> -c <chip> -s <days> # Set threshold (7-180 days)
npu-smi clear -t tls-cert-period -i <id> -c <chip> # Restore default (90 days)
See: references/certificate-management.md for certificate lifecycle management.
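Two details in the commands above are easy to get wrong in scripts: the threshold must stay within 7-180 days, and the three certificate files are passed to `-f` as one quoted argument. The sketch below illustrates both. The helper names and the `NPU_SMI` override variable are assumptions made for this example, not npu-smi features.

```shell
# Command to run. Override for a dry run, e.g. NPU_SMI=echo (assumed convention).
NPU_SMI=${NPU_SMI:-npu-smi}

# set_cert_period <id> <chip> <days>
# Hypothetical wrapper: enforces the 7-180 day range before calling npu-smi.
set_cert_period() {
    case "$3" in
        ''|*[!0-9]*)
            echo "days must be a positive integer" >&2
            return 2
            ;;
    esac
    if [ "$3" -lt 7 ] || [ "$3" -gt 180 ]; then
        echo "days must be between 7 and 180" >&2
        return 2
    fi
    $NPU_SMI set -t tls-cert-period -i "$1" -c "$2" -s "$3"
}

# install_tls_cert <id> <chip> <tls.pem> <ca.pem> <subca.pem>
# Joins the three paths into ONE space-separated argument for -f,
# matching the quoted form shown above. Paths containing spaces are not handled.
install_tls_cert() {
    $NPU_SMI set -t tls-cert -i "$1" -c "$2" -f "$3 $4 $5"
}
```

Without the quotes around `"$3 $4 $5"`, the shell would pass three separate arguments after `-f`, which is not the form the command line above shows.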
| Parameter | Description | How to Get |
|-----------|-------------|------------|
| id | Device ID (NPU ID) | npu-smi info -l |
| chip_id | Chip ID (the <chip> placeholder in the commands above) | npu-smi info -m |
| vnpu_id | vNPU ID | Auto-assigned or specified in range |
| mac_id | MAC interface | 0=eth0, 1=eth1, 2=eth2, 3=eth3 |
Note: The chip name (e.g., 910B3) does not indicate the server platform (A2 vs A3). Use `dmidecode -t system | grep Product` or `npu-smi info -t product` to identify the server model. See references/device-queries.md for details.