skills/quant-by-modelslim/SKILL.md
Entry point for Ascend NPU inference toolchain. Use when running vLLM on Ascend/NPU, quantizing models with msmodelslim, or debugging NPU errors.
This skill manages Ascend NPU-related tasks, troubleshooting, and toolchain usage.
Run at the start of every session before any quantization or inference task:
npu-smi info
Verify the reported device state, then terminate any leftover vLLM and Python processes:
kill -9 $(pgrep -f vllm) 2>/dev/null
kill -9 $(pgrep -f python) 2>/dev/null
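The session-start check can be wrapped in a small guard so that a missing driver tool fails loudly instead of silently. This is a minimal sketch: the function name preflight and its error message are hypothetical, and only npu-smi info comes from the steps above.

```shell
# Hypothetical helper (not part of any Ascend tool): refuse to continue
# when npu-smi is not on PATH, otherwise print the device table.
preflight() {
  if ! command -v npu-smi >/dev/null 2>&1; then
    echo "npu-smi not found on PATH; is the Ascend driver installed?" >&2
    return 1
  fi
  npu-smi info
}

# Run the check; on a host without NPU tooling this prints the fallback note.
preflight || echo "preflight failed: skipping NPU work on this host"
```

Keeping the check in a function lets a wrapper script call it first and stop before any quantization or inference command is attempted.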
ASCEND_RT_VISIBLE_DEVICES controls which NPUs are visible to both vLLM and msmodelslim. Set this before any command that touches NPUs.
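As an illustration, the variable can be exported once at the top of a shell session and sanity-checked before launching anything. The helper count_visible_npus below is hypothetical (it only parses the comma-separated value and does not query the hardware), and the device IDs 0,1 are an example choice.

```shell
# Hypothetical helper: count the device IDs listed in
# ASCEND_RT_VISIBLE_DEVICES (pure string parsing, no hardware query).
count_visible_npus() {
  printf '%s\n' "${ASCEND_RT_VISIBLE_DEVICES:-}" | tr ',' '\n' | grep -c .
}

# Expose only NPUs 0 and 1 to every command started from this shell.
export ASCEND_RT_VISIBLE_DEVICES=0,1
echo "visible NPUs: ${ASCEND_RT_VISIBLE_DEVICES} ($(count_visible_npus) devices)"
```

Printing the count makes it easier to spot a mismatch between the exported list and the number of devices a later command expects.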
All actual run/quantization/inference commands must be saved to a shell script and executed through it. The script must redirect both stdout and stderr to a log file so that output is preserved for debugging.
Template:
cat > run.sh << 'EOF'
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LOG_FILE="${SCRIPT_DIR}/run_$(date +%Y%m%d_%H%M%S).log"
# Environment setup
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
# Run and log
"$@" 2>&1 | tee "$LOG_FILE"
EOF
chmod +x run.sh
./run.sh <your-command>
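The template relies on set -o pipefail so that a failing command is not masked by tee. The sketch below demonstrates the difference, using false as a stand-in for a failing run command; the variable names are illustrative only.

```shell
# Without pipefail, a pipeline reports the exit status of its last
# command (tee), so the failure of `false` is hidden.
status_without=$(bash -c 'false | tee /dev/null; echo $?')

# With pipefail, the pipeline reports the failing command's status.
status_with=$(bash -c 'set -o pipefail; false | tee /dev/null; echo $?')

echo "without pipefail: ${status_without}"
echo "with pipefail:    ${status_with}"
```

Combined with set -e in the template, this means run.sh stops with a non-zero status when the wrapped command fails, while the log file still receives the full output.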
Key points:
- Capture both stdout and stderr with 2>&1 | tee "$LOG_FILE".
- Run chmod +x on the script before execution.
- Do not run the script in the background (no &, no nohup, no run_in_background); run it in the foreground so output streams to the terminal in real time.

For detailed instructions on specific tools, refer to the tool-specific references.

The toolchain packages (vllm, vllm-ascend, msmodelslim, and ais_bench) are installed in editable mode. Before referencing or modifying any of them, run pip show <package> to locate the source directory. Never assume a fixed path.
- Use pip show <package> to find the editable source location for deep debugging.
- Create a dedicated branch before changing any source: git checkout -b debug/<topic>
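Locating an editable checkout can be scripted by parsing the output of pip show. The sketch below assumes a pip version recent enough to print an "Editable project location" field for editable installs and falls back to "Location" otherwise. The helper name pkg_source_dir is hypothetical, and the sample text fed to it is made up for illustration (it is not real output from any machine).

```shell
# Hypothetical helper: read `pip show` output on stdin and print the
# source directory, preferring the editable location when present.
pkg_source_dir() {
  awk '
    /^Editable project location: / { sub(/^Editable project location: /, ""); editable = $0 }
    /^Location: /                  { sub(/^Location: /, ""); location = $0 }
    END { print (editable != "" ? editable : location) }
  '
}

# Illustrative input only: the package name and paths below are invented.
sample='Name: demo-pkg
Location: /usr/lib/python3/site-packages
Editable project location: /workspace/demo-pkg'
printf '%s\n' "$sample" | pkg_source_dir

# Real usage would pipe pip directly, for example:
#   pip show msmodelslim | pkg_source_dir
```

Resolving the directory at run time, rather than hard-coding it, follows the rule above that a fixed path should never be assumed.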