dev-workflow/SKILL.md
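A general-purpose development and validation workflow for developing, testing, and validating AI/ML inference services. It supports multiple hardware backends (Ascend NPU, GPU) and inference engines (vLLM, MindIE), and covers the full process: requirement alignment, code checks, service deployment, performance testing, and result analysis. Use this workflow when the user mentions needs such as "development", "testing", "performance comparison", "service deployment", or "inference validation".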
Before using this workflow, confirm or customize the following configuration:
# === Configurable parameters ===
config:
  # Hardware
  hardware:
    type: "npu"        # npu or gpu
    devices: "0,1"     # comma-separated device IDs
  # Model
  model:
    main: "/data/models/Qwen3-32B"
    draft: null        # draft model for speculative decoding
  # Service
  service:
    port: 8000
    tensor_parallel_size: 2
    max_num_seqs: 16
    speculative_config: null
  # Test
  test:
    concurrent: 4
    input_tokens: 1024
    output_tokens: 1024
    warmup: 3
    iterations: 10
Tip: you can ask Claude to customize parameters, for example "start the service on port 9000 with GPUs 0,1".
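The commands in this workflow refer to configuration values through `${config.…}` placeholders. These are template markers rather than shell variables (bash rejects dotted names inside `${}`), so something has to substitute them before a command runs. A minimal sketch of that substitution, assuming the configuration has been loaded into a nested dict; the `render` and `lookup` helpers are hypothetical and not part of the skill:

```python
import re

# Assumed in-memory form of part of the configuration shown above.
CONFIG = {
    "hardware": {"type": "npu", "devices": "0,1"},
    "model": {"main": "/data/models/Qwen3-32B", "draft": None},
    "service": {"port": 8000, "tensor_parallel_size": 2, "max_num_seqs": 16},
}

# Matches ${config.a.b} and captures the dotted path "a.b".
PLACEHOLDER = re.compile(r"\$\{config\.([A-Za-z0-9_.]+)\}")


def lookup(config, dotted_path):
    """Walk a nested dict along a dotted path such as 'service.port'."""
    value = config
    for key in dotted_path.split("."):
        value = value[key]  # raises KeyError for an unknown key
    return value


def render(template, config):
    """Replace every ${config.a.b} placeholder with its value from config."""
    return PLACEHOLDER.sub(lambda m: str(lookup(config, m.group(1))), template)


if __name__ == "__main__":
    print(render("vllm serve ${config.model.main} --port ${config.service.port}", CONFIG))
```

An unknown key raises `KeyError` instead of silently producing an empty argument. A `null` value such as `model.draft` would render as the string `None`, so a real implementation would likely want to reject it explicitly.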
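Before starting development, follow this process:
After making changes, run:
Commit message format: change summary + date <YYYYMMDD>
# Example
git add <modified files>
git commit -m "feature: add adaptive speculative decoding logic 20260120"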
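The commit message convention above (change summary followed by a `YYYYMMDD` date) is easy to get wrong by hand. A small sketch of a helper that builds such a message; the function name and the `feature` default prefix are assumptions taken from the single example, not a rule stated by the workflow:

```python
from datetime import date


def commit_message(summary, prefix="feature", day=None):
    """Build '<prefix>: <summary> <YYYYMMDD>'; day defaults to today."""
    day = day or date.today()
    return f"{prefix}: {summary} {day.strftime('%Y%m%d')}"


if __name__ == "__main__":
    print(commit_message("add adaptive speculative decoding logic", day=date(2026, 1, 20)))
```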
Confirm which devices are available:
# NPU
python3 scripts/check_npu.py
# GPU
nvidia-smi
Clean up the environment before starting the service:
# Kill leftover server processes
./scripts/kill_server.sh
# Free GPU/NPU memory
./scripts/clear_memory.sh
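Choose the launch command that matches your configuration. The `${config.…}` placeholders stand for values from the configuration above; they are not shell variables.
vLLM service:
# GPU; on Ascend NPU set ASCEND_RT_VISIBLE_DEVICES instead
export CUDA_VISIBLE_DEVICES=${config.hardware.devices}
vllm serve ${config.model.main} \
--tensor-parallel-size ${config.service.tensor_parallel_size} \
--port ${config.service.port} \
--max-num-seqs ${config.service.max_num_seqs}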
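The launch step depends on two choices that come from the configuration: which visible-devices environment variable to set (`CUDA_VISIBLE_DEVICES` for GPU, `ASCEND_RT_VISIBLE_DEVICES` for Ascend NPU) and which values to pass to `vllm serve`. The sketch below assembles both from a config dict. `build_vllm_command` is a hypothetical helper, and the check that the tensor-parallel size does not exceed the number of listed devices is an added assumption for a single-node setup, not something the workflow states:

```python
import shlex

# Environment variable that restricts visible devices, per hardware type.
VISIBLE_DEVICES_ENV = {
    "gpu": "CUDA_VISIBLE_DEVICES",
    "npu": "ASCEND_RT_VISIBLE_DEVICES",
}


def build_vllm_command(config):
    """Return (env, argv) for launching `vllm serve` from a config dict."""
    hardware, service = config["hardware"], config["service"]
    if hardware["type"] not in VISIBLE_DEVICES_ENV:
        raise ValueError(f"unknown hardware type: {hardware['type']!r}")

    device_count = len(hardware["devices"].split(","))
    tp_size = service["tensor_parallel_size"]
    if tp_size > device_count:  # assumed sanity check, single node
        raise ValueError(
            f"tensor_parallel_size={tp_size} exceeds {device_count} listed device(s)"
        )

    env = {VISIBLE_DEVICES_ENV[hardware["type"]]: hardware["devices"]}
    argv = [
        "vllm", "serve", config["model"]["main"],
        "--tensor-parallel-size", str(tp_size),
        "--port", str(service["port"]),
        "--max-num-seqs", str(service["max_num_seqs"]),
    ]
    return env, argv


if __name__ == "__main__":
    example = {
        "hardware": {"type": "gpu", "devices": "0,1"},
        "model": {"main": "/data/models/Qwen3-32B"},
        "service": {"port": 8000, "tensor_parallel_size": 2, "max_num_seqs": 16},
    }
    env, argv = build_vllm_command(example)
    print(env, shlex.join(argv))
```

Returning an argument list rather than a single string lets the caller pass it to `subprocess.run` without shell quoting concerns.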
vLLM speculative decoding service:
# Ascend NPU; on GPU set CUDA_VISIBLE_DEVICES instead
export ASCEND_RT_VISIBLE_DEVICES=${config.hardware.devices}
vllm serve ${config.model.main} \
--tensor-parallel-size ${config.service.tensor_parallel_size} \
--port ${config.service.port} \
--speculative-config '{
"model": "${config.model.draft}",
"num_speculative_tokens": 4,
"method": "eagle3"
}'
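The speculative decoding settings are passed to `vllm serve` as a single JSON string, which is awkward to quote by hand. One way to avoid quoting mistakes is to build the string with `json.dumps`. This is a minimal sketch: `speculative_config_arg` is a hypothetical helper, its defaults mirror the values in the command above, and the example draft path is made up:

```python
import json


def speculative_config_arg(draft_model, num_speculative_tokens=4, method="eagle3"):
    """Return the JSON string for the speculative decoding option."""
    if not draft_model:
        # config.model.draft defaults to null, which cannot be used here.
        raise ValueError("a draft model path is required for speculative decoding")
    return json.dumps(
        {
            "model": draft_model,
            "num_speculative_tokens": num_speculative_tokens,
            "method": method,
        }
    )


if __name__ == "__main__":
    # Hypothetical draft model path, for illustration only.
    print(speculative_config_arg("/data/models/draft-model"))
```

Because `model.draft` is `null` in the default configuration, the helper fails early rather than emitting a config that names no draft model.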
MindIE service:
./scripts/start_mindie.sh \
--model ${config.model.main} \
--port ${config.service.port} \
--device ${config.hardware.type}
Block until the service is ready:
./scripts/wait_for_service.sh \
--port ${config.service.port} \
--timeout 300
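The contents of `wait_for_service.sh` are not shown here, so the following is only an illustration of the blocking-wait idea: poll the port until a TCP connection succeeds or the timeout expires. The function name and the `interval` parameter are assumptions:

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=300.0, interval=1.0):
    """Return True once a TCP connection succeeds, False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry after a pause
    return False


if __name__ == "__main__":
    ready = wait_for_port(8000, timeout=300.0)
    print("service ready" if ready else "timed out waiting for service")
```

An open port only shows that the server process is listening. A stricter check would also request a health or model-listing HTTP endpoint, if the engine provides one, before starting the tests.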
./scripts/run_perf_test.sh \
--port ${config.service.port} \
--concurrent ${config.test.concurrent} \
--input ${config.test.input_tokens} \
--output ${config.test.output_tokens}
./scripts/run_accuracy_test.sh \
--port ${config.service.port} \
--datasets "mmlu,cmmlu"
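| Script | Path | Description |
|------|------|------|
| Resource check | scripts/check_npu.py / scripts/check_gpu.py | Check available devices |
| Environment cleanup | scripts/kill_server.sh | Kill all service processes |
| Start vLLM | scripts/start_vllm.sh | Start the vLLM service |
| Start MindIE | scripts/start_mindie.sh | Start the MindIE service |
| Health check | scripts/wait_for_service.sh | Wait until the service is ready |
| Performance test | scripts/run_perf_test.sh | Run the performance test |
| Accuracy test | scripts/run_accuracy_test.sh | Run the accuracy test |
| Result analysis | scripts/analyze_results.py | Analyze the test results |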
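The output format of the performance test and the logic of `analyze_results.py` are not described here. As an illustration of what a result-analysis step might compute, the sketch below summarizes a list of per-request latencies. The input shape, the function name, and the nearest-rank percentile method are all assumptions:

```python
import math
import statistics


def summarize(latencies_s, output_tokens_per_request):
    """Summarize per-request latencies (in seconds) from one test run."""
    if not latencies_s:
        raise ValueError("no latency samples to summarize")
    ordered = sorted(latencies_s)

    def percentile(p):
        # Nearest-rank percentile: smallest sample covering p% of the data.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    mean = statistics.fmean(ordered)
    return {
        "requests": len(ordered),
        "mean_s": mean,
        "p50_s": percentile(50),
        "p99_s": percentile(99),
        # Average decode rate seen by a single request.
        "tokens_per_s_per_request": output_tokens_per_request / mean,
    }


if __name__ == "__main__":
    # Made-up sample latencies, for illustration only.
    print(summarize([1.0, 2.0, 3.0, 4.0], output_tokens_per_request=1024))
```

When comparing two configurations, the same summary can be computed for each run and the fields compared side by side.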
# NPU example
config:
  hardware:
    type: "npu"
    devices: "14,15"
  model:
    main: "/data2/weights/Qwen_Qwen3-32B"
# GPU example
config:
  hardware:
    type: "gpu"
    devices: "0,1"
  model:
    main: "/data/models/Qwen3-32B"
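The example configurations above set only `hardware` and `model`. If they are read as partial overrides of the default configuration (an assumption; the workflow does not say how the two are combined), the effective configuration can be obtained with a recursive merge. `merge_config` is a hypothetical helper:

```python
import copy

# Assumed subset of the default configuration.
DEFAULTS = {
    "hardware": {"type": "npu", "devices": "0,1"},
    "model": {"main": "/data/models/Qwen3-32B", "draft": None},
    "service": {"port": 8000, "tensor_parallel_size": 2, "max_num_seqs": 16},
}


def merge_config(defaults, overrides):
    """Return a new dict: defaults with overrides applied key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars replace the default
    return merged


if __name__ == "__main__":
    npu_example = {
        "hardware": {"type": "npu", "devices": "14,15"},
        "model": {"main": "/data2/weights/Qwen_Qwen3-32B"},
    }
    print(merge_config(DEFAULTS, npu_example))
```

Sections the override does not mention, such as `service`, keep their default values, and `DEFAULTS` itself is left unmodified.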
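# Use the default configuration
"Start a vLLM service and run a performance test for me"
# Custom configuration
"Run a comparison test on port 9000 with GPUs 0,1"
"Change max_num_seqs to 32 and rerun the test"
"Test MindIE inference performance"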