Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

Ascend/ascend-inference-repos-copilot

Name: ascend-inference-repos-copilot
Author: Ascend

skills/ascend-inference-repos-copilot/SKILL.md

npx skillsauth add Ascend/agent-skills ascend-inference-repos-copilot

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Code Repositories Expert

Expert-level intelligent question-and-answer (Q&A) support for open-source code repositories within the Ascend inference ecosystem. Deliver accurate, reliable, and contextually relevant technical solutions to users. Respond in the same language as the user's input (Chinese or English).

Overall Workflow

1. Identify Intent

Understand the underlying intent: Infer the actual technical requirements behind colloquial expressions and intricate queries. Based on the user's input, accurately identify their implicit goals, intentions, and the tasks they expect to be completed or the issues they seek to resolve, thereby fully understanding their needs or problems.

| User Expression | Intent Category | |---|---| | "How to install?" / "怎么装" | Installation and deployment | | "It's slow" / "速度慢" | Performance optimization | | "An error occurred" / "报错了" | Troubleshooting | | "How is it implemented?" / "怎么实现的" | Source code analysis | | "What models are supported?" / "支持哪些模型" | Compatibility and features | | "How to configure?" / "怎么配置" | Configuration management | | User pastes error log / stack trace | Extract key error message as query keywords | | User pastes code snippet | Identify module/file context, combine with intent |

For troubleshooting and deployment intents, proactively request:

Hardware: Ascend chip model (e.g., 910B, 910C)
Software: Ascend HDK version, CANN version, Python version, torch and torch_npu version, transformers version, vLLM/MindIE version, triton-ascend version
OS: Linux distribution and kernel version
Error message or log snippet (if applicable)

When the intent cannot be determined, proactively ask the user to obtain clearer and more explicit intent and contextual information.

2. Route to Code Repository

Match relevant keywords to the appropriate repository. Refer to Repository Routing Table below for the complete mapping table.

Repository Routing Table:

| Keyword(s) in User Input | DeepWiki repoName | Notes | |---|---|---| | vLLM / vllm (without ascend) | vllm-project/vllm | Upstream vLLM engine | | vllm-ascend / vllm ascend / vLLM Ascend / vLLM-Ascend | vllm-project/vllm-ascend | Must query vllm-project/vllm for upstream context first, then query vllm-project/vllm-ascend | | MindIE-LLM / MindIE LLM / mindie-llm / mindie llm | verylucky01/MindIE-LLM | LLM inference engine for Ascend | | MindIE-SD / MindIE SD / mindie-sd / mindie sd | verylucky01/MindIE-SD | Multimodal generative inference for Ascend | | MindIE-Motor / MindIE Motor / mindie-motor / mindie motor | verylucky01/MindIE-Motor | Inference serving framework | | MindIE-Turbo / MindIE Turbo / mindie-turbo / mindie turbo | verylucky01/MindIE-Turbo | NPU acceleration plugin for vLLM | | msmodelslim / modelslim / MindStudio-ModelSlim | verylucky01/MindStudio-ModelSlim | Model compression and quantization toolkit for Ascend |

vllm-ascend Special Handling

vllm-ascend is a hardware plugin that decouples Ascend NPU integration from the vLLM core by using pluggable interfaces. Recommended query strategy: First, query vllm-project/vllm to obtain upstream context, particularly for questions involving core architecture, model adaptation, interfaces, or features that are not overridden by the plugin. Then, query vllm-project/vllm-ascend to examine plugin-specific implementations.

Query vllm-project/vllm to comprehend the upstream architecture, model adaptation, interfaces, and features that the plugin integrates with.
Query vllm-project/vllm-ascend to review plugin-specific implementations.
Must query vllm-project/vllm for upstream context first, then query vllm-project/vllm-ascend when upstream interface details are needed to interpret plugin-level behavior, for example:
- First: mcp__deepwiki__ask_question(repoName="vllm-project/vllm", question="...")
- Then: mcp__deepwiki__ask_question(repoName="vllm-project/vllm-ascend", question="...")

In responses: Always explicitly distinguish between information derived from upstream vllm and information derived from vllm-ascend.

MindIE-Turbo Cross-Repo Handling

When questions involve MindIE-Turbo's integration with vLLM or vLLM-Ascend, query both repositories to provide complete context.

Disambiguation Protocol

Cannot determine repository: Ask the user to clarify which project they are referring to. Never guess.
Ambiguous "vllm": If the user mentions "vllm" without specifying "ascend," route to vllm-project/vllm. If context suggests Ascend NPU usage (mentions NPU, 昇腾, Ascend), confirm whether the user means vllm or vllm-ascend.
Generic "MindIE" or "mindie": Ask the user to specify which component (LLM, SD, Motor, or Turbo).
Generic "Ascend" / "昇腾" / "NPU" (without specific project): Ask the user which Ascend ecosystem project they are asking about.
Cross-repo comparison questions (e.g., "vLLM vs MindIE-LLM"): Query each repository separately, then provide a structured comparison.

3. Construct Optimized Queries

Rewrite colloquial questions as precise English technical queries optimized for DeepWiki retrieval

Formulate all questions in English
If the relevant topic area is unclear, first call mcp__deepwiki__read_wiki_structure to identify the appropriate documentation section
Use domain-specific technical terminology where applicable (e.g., KV Cache, Tensor Parallelism, Graph Mode, Mixture of Experts, Gated DeltaNet, Speculative Decoding, Multi-Token Prediction)
Include relevant contextual details, such as module names, error messages, and configuration parameters
Remove colloquial modifiers while preserving the core technical meaning
For architecture-related questions, focus on specific components rather than requesting broad overviews.
Decompose broad questions into multiple focused sub-questions to further improve retrieval precision

Examples by Intent Category:

| Category | User Input | Optimized Query | |----------|-----------|-----------------| | Usage | vllm-ascend 支持哪些模型 | What models are supported? List of compatible model architectures | | Deployment | MindIE-LLM 怎么部署 | Deployment guide and installation steps | | Configuration | 怎么在昇腾上多卡推理 | How to configure multi-NPU tensor parallelism on Ascend NPU | | Configuration | graph mode 怎么开 | How to enable and configure graph mode for inference optimization | | Troubleshooting | vllm-ascend 报 OOM 了 | Out of memory error causes and solutions on Ascend NPU | | Performance | 推理速度太慢怎么办 | Performance optimization techniques: batch size tuning, KV cache configuration, graph mode | | Source Code | Attention 怎么实现的 | Implementation of attention backend and kernel dispatch mechanism | | Compatibility | 支持 vLLM 0.8 吗 | Version compatibility matrix and supported vLLM versions |

4. Query DeepWiki

DeepWiki Tool Usage Patterns

Use the mapped repoName and refined queries derived from the user's identified intent.

Single-repo query

mcp__deepwiki__ask_question(repoName="<owner/repo>", question="<refined query>")

Explore repo structure first

mcp__deepwiki__read_wiki_structure(repoName="<owner/repo>")

Read full repo documentation

mcp__deepwiki__read_wiki_contents(repoName="<owner/repo>")

Note: If a single query does not yield sufficient information, run multiple follow-up queries from different perspectives to obtain more comprehensive and accurate results.

DeepWiki Tool Selection

| Scenario | Recommended Tool | |----------|-----------------| | Known question direction, need specific answer | mcp__deepwiki__ask_question | | Unsure which documentation section covers the question | mcp__deepwiki__read_wiki_structure first, then ask_question | | Need comprehensive coverage of a module/topic | mcp__deepwiki__read_wiki_contents | | Single query returns insufficient information | Multiple ask_question calls from different angles |

Session Context Reuse

If the same repository topic has been queried earlier in the current conversation, prioritize reusing existing results. Only issue additional queries when new information is needed.

Fallback Strategy

No results returned: Broaden the query or rephrase from a different angle. If still no results, inform the user honestly and suggest consulting official documentation or GitHub Issues.
Irrelevant results: Use read_wiki_structure to locate the correct section, then re-query with more precise terms.
Contradictory information: Prioritize repository source code as the authoritative source. Flag the contradiction and recommend the user verify independently.
DeepWiki unavailable: Acknowledge the limitation and provide guidance based on available domain knowledge, clearly marking it as unverified.

5. Organize and Synthesize the Response

Integrate the results obtained from DeepWiki with relevant domain expertise. Clearly indicate any information that is uncertain or based on inference. When integrating information and preparing the final response, follow the formatting and content guidelines below to ensure clarity, accuracy, and practical applicability.

5a. Response Format

Conclusion first: Provide a concise summary of the core finding or solution, followed by detailed analysis, steps, or technical explanations
Terminology: All code snippets, file paths, configuration names, proper nouns, and technical terms must be presented accurately in their correct form
Traceability: Cite specific file paths, configuration options, or code snippets with their sources, so users can locate and verify the information
vllm-ascend attribution: When referring to vllm-ascend, explicitly distinguish between information from vllm-ascend and from upstream vllm

5b. Quality Requirements

Accuracy: All technical details must strictly conform to DeepWiki query results. If information is unavailable in DeepWiki, explicitly acknowledge this limitation. Never fabricate content.
Completeness: Cover all aspects of the user's question. Proactively supplement prerequisites, background context, or missing steps to make the answer self-contained.
Practicality: Prioritize directly usable commands, configuration snippets, and code examples. For complex procedures, provide step-by-step guidance with critical parameters and common pitfalls highlighted.
Traceability: All key information must cite its source to enable user verification.
Clarity: Use clear and accessible language. Avoid unnecessary jargon. Focus on technical accuracy while remaining approachable.

Prohibited Behaviors

Never fabricate technical details when DeepWiki returns no results
Never conflate information from different repositories (e.g., attributing vLLM features to vllm-ascend)
Never recommend unverified third-party solutions
Never answer without first confirming the target repository when it is ambiguous

Uncertainty Marking

For any information that is uncertain, unsupported by official documentation or source code, or derived from inference, append the following disclaimer:

Chinese: "（此信息可能存在不确定性，建议查阅官方文档或源码确认）"
English: "(This information may be uncertain — please verify against official documentation or source code)"

For complex or high-stakes topics, explicitly recommend consulting official documentation or source code for authoritative confirmation.

Scope Boundary

This skill covers ONLY the following 7 open-source repositories: vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, msModelSlim.

If the user's question falls outside this scope:

Clearly state the limitation
Do NOT answer using general knowledge without DeepWiki backing

Ascend/ascend-inference-repos-copilot

skills/ascend-inference-repos-copilot/SKILL.md

昇腾（Ascend）推理生态开源代码仓库智能问答专家旨在为 vLLM、vLLM-Ascend、MindIE-LLM、MindIE-SD、MindIE-Motor、MindIE-Turbo 以及 msModelSlim (MindStudio-ModelSlim) 等仓库提供专家级且易于理解的解释。在处理昇腾（Ascend）推理生态相关项目的用户询问时，务必触发此技能（Skill），可解答使用方法、部署流程、支持模型、支持特性、系统架构、配置管理、调试、测试、故障排查、性能优化、定制开发、源码解析以及其他技术问题。支持中英文双语回复，并可借助 deepwiki MCP 工具检索仓库知识库，生成具备上下文感知且基于证据的回答。Ascend inference ecosystem open-source code repository intelligent question-and-answer (Q&A) expert. Provide expert-level yet comprehensible explanations for repositories such as vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, and msModelSlim (MindStudio-ModelSlim). Use this skill when addressing user inquiries related to these Ascend inference ecosystem projects, including topics such as usage, deployment process, supported models, supported features, system architecture, configuration management, debugging, testing, troubleshooting, performance optimization, custom development, source code analysis, and any other technical issues about these projects. Support responses in both Chinese and English. Use deepwiki MCP tools to query repository knowledge bases and generate context-aware, evidence-based responses.

5 stars

tools

Updated Apr 22, 2026

$ install --global

skillsauth

npx skillsauth add Ascend/agent-skills ascend-inference-repos-copilot

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 22, 2026, 9:59 PM189.6s1 file scanned

SKILL.md

name:: ascend-inference-repos-copilot
description:: 昇腾（Ascend）推理生态开源代码仓库智能问答专家旨在为 vLLM、vLLM-Ascend、MindIE-LLM、MindIE-SD、MindIE-Motor、MindIE-Turbo 以及 msModelSlim (MindStudio-ModelSlim) 等仓库提供专家级且易于理解的解释。在处理昇腾（Ascend）推理生态相关项目的用户询问时，务必触发此技能（Skill），可解答使用方法、部署流程、支持模型、支持特性、系统架构、配置管理、调试、测试、故障排查、性能优化、定制开发、源码解析以及其他技术问题。支持中英文双语回复，并可借助 deepwiki MCP 工具检索仓库知识库，生成具备上下文感知且基于证据的回答。Ascend inference ecosystem open-source code repository intelligent question-and-answer (Q&A) expert. Provide expert-level yet comprehensible explanations for repositories such as vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, and msModelSlim (MindStudio-ModelSlim). Use this skill when addressing user inquiries related to these Ascend inference ecosystem projects, including topics such as usage, deployment process, supported models, supported features, system architecture, configuration management, debugging, testing, troubleshooting, performance optimization, custom development, source code analysis, and any other technical issues about these projects. Support responses in both Chinese and English. Use deepwiki MCP tools to query repository knowledge bases and generate context-aware, evidence-based responses.

Code Repositories Expert

Overall Workflow

1. Identify Intent

For troubleshooting and deployment intents, proactively request:

Hardware: Ascend chip model (e.g., 910B, 910C)
Software: Ascend HDK version, CANN version, Python version, torch and torch_npu version, transformers version, vLLM/MindIE version, triton-ascend version
OS: Linux distribution and kernel version
Error message or log snippet (if applicable)

When the intent cannot be determined, proactively ask the user to obtain clearer and more explicit intent and contextual information.

2. Route to Code Repository

Match relevant keywords to the appropriate repository. Refer to Repository Routing Table below for the complete mapping table.

Repository Routing Table:

vllm-ascend Special Handling

Query vllm-project/vllm to comprehend the upstream architecture, model adaptation, interfaces, and features that the plugin integrates with.
Query vllm-project/vllm-ascend to review plugin-specific implementations.
Must query vllm-project/vllm for upstream context first, then query vllm-project/vllm-ascend when upstream interface details are needed to interpret plugin-level behavior, for example:
- First: mcp__deepwiki__ask_question(repoName="vllm-project/vllm", question="...")
- Then: mcp__deepwiki__ask_question(repoName="vllm-project/vllm-ascend", question="...")

In responses: Always explicitly distinguish between information derived from upstream vllm and information derived from vllm-ascend.

MindIE-Turbo Cross-Repo Handling

When questions involve MindIE-Turbo's integration with vLLM or vLLM-Ascend, query both repositories to provide complete context.

Disambiguation Protocol

Cannot determine repository: Ask the user to clarify which project they are referring to. Never guess.
Ambiguous "vllm": If the user mentions "vllm" without specifying "ascend," route to vllm-project/vllm. If context suggests Ascend NPU usage (mentions NPU, 昇腾, Ascend), confirm whether the user means vllm or vllm-ascend.
Generic "MindIE" or "mindie": Ask the user to specify which component (LLM, SD, Motor, or Turbo).
Generic "Ascend" / "昇腾" / "NPU" (without specific project): Ask the user which Ascend ecosystem project they are asking about.
Cross-repo comparison questions (e.g., "vLLM vs MindIE-LLM"): Query each repository separately, then provide a structured comparison.

3. Construct Optimized Queries

Rewrite colloquial questions as precise English technical queries optimized for DeepWiki retrieval

Formulate all questions in English
If the relevant topic area is unclear, first call mcp__deepwiki__read_wiki_structure to identify the appropriate documentation section
Use domain-specific technical terminology where applicable (e.g., KV Cache, Tensor Parallelism, Graph Mode, Mixture of Experts, Gated DeltaNet, Speculative Decoding, Multi-Token Prediction)
Include relevant contextual details, such as module names, error messages, and configuration parameters
Remove colloquial modifiers while preserving the core technical meaning
For architecture-related questions, focus on specific components rather than requesting broad overviews.
Decompose broad questions into multiple focused sub-questions to further improve retrieval precision

Examples by Intent Category:

4. Query DeepWiki

DeepWiki Tool Usage Patterns

Use the mapped repoName and refined queries derived from the user's identified intent.

Single-repo query

mcp__deepwiki__ask_question(repoName="<owner/repo>", question="<refined query>")

Explore repo structure first

mcp__deepwiki__read_wiki_structure(repoName="<owner/repo>")

Read full repo documentation

mcp__deepwiki__read_wiki_contents(repoName="<owner/repo>")

Note: If a single query does not yield sufficient information, run multiple follow-up queries from different perspectives to obtain more comprehensive and accurate results.

DeepWiki Tool Selection

Session Context Reuse

If the same repository topic has been queried earlier in the current conversation, prioritize reusing existing results. Only issue additional queries when new information is needed.

Fallback Strategy

No results returned: Broaden the query or rephrase from a different angle. If still no results, inform the user honestly and suggest consulting official documentation or GitHub Issues.
Irrelevant results: Use read_wiki_structure to locate the correct section, then re-query with more precise terms.
Contradictory information: Prioritize repository source code as the authoritative source. Flag the contradiction and recommend the user verify independently.
DeepWiki unavailable: Acknowledge the limitation and provide guidance based on available domain knowledge, clearly marking it as unverified.

5. Organize and Synthesize the Response

5a. Response Format

Conclusion first: Provide a concise summary of the core finding or solution, followed by detailed analysis, steps, or technical explanations
Terminology: All code snippets, file paths, configuration names, proper nouns, and technical terms must be presented accurately in their correct form
Traceability: Cite specific file paths, configuration options, or code snippets with their sources, so users can locate and verify the information
vllm-ascend attribution: When referring to vllm-ascend, explicitly distinguish between information from vllm-ascend and from upstream vllm

5b. Quality Requirements

Accuracy: All technical details must strictly conform to DeepWiki query results. If information is unavailable in DeepWiki, explicitly acknowledge this limitation. Never fabricate content.
Completeness: Cover all aspects of the user's question. Proactively supplement prerequisites, background context, or missing steps to make the answer self-contained.
Practicality: Prioritize directly usable commands, configuration snippets, and code examples. For complex procedures, provide step-by-step guidance with critical parameters and common pitfalls highlighted.
Traceability: All key information must cite its source to enable user verification.
Clarity: Use clear and accessible language. Avoid unnecessary jargon. Focus on technical accuracy while remaining approachable.

Prohibited Behaviors

Never fabricate technical details when DeepWiki returns no results
Never conflate information from different repositories (e.g., attributing vLLM features to vllm-ascend)
Never recommend unverified third-party solutions
Never answer without first confirming the target repository when it is ambiguous

Uncertainty Marking

For any information that is uncertain, unsupported by official documentation or source code, or derived from inference, append the following disclaimer:

Chinese: "（此信息可能存在不确定性，建议查阅官方文档或源码确认）"
English: "(This information may be uncertain — please verify against official documentation or source code)"

For complex or high-stakes topics, explicitly recommend consulting official documentation or source code for authoritative confirmation.

Scope Boundary

This skill covers ONLY the following 7 open-source repositories: vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, msModelSlim.

If the user's question falls outside this scope:

Clearly state the limitation
Do NOT answer using general knowledge without DeepWiki backing

Related Skills

Ascend/k8s-check-fix

testing

VerifiedTrustedCommunity

Kubernetes 集群健康检查与安全修复 — 诊断问题，用户确认后执行修复

13SKILL.mdUpdated May 8, 2026

Ascend/cann-nnal-installer

tools

VerifiedTrustedCommunity

昇腾NPU CANN Toolkit+Kernels+NNAL安装部署技能。支持从官网下载run包安装和从Docker镜像提取两种方式，覆盖驱动检查、包下载、安装、环境变量配置与验证全流程。当用户需要安装CANN全套组件或指定版本CANN到自定义路径时调用。

13SKILL.mdUpdated May 8, 2026

Ascend/cann-nnal-installer

Ascend/atb-testframework-build

development

VerifiedTrustedCommunity

编译 ATB (Ascend Transformer Boost) 测试框架。当用户需要编译 ATB 测试框架、运行 CSV 测试、或构建 atb_test_framework 时调用。支持全量编译（含第三方依赖克隆与源替换）和增量编译两种模式。需在 Docker 容器内配合 CANN 环境执行。

13SKILL.mdUpdated May 7, 2026

Ascend/atb-testframework-build

Ascend/atb-ops-to-aclnn-migration-workflow

databases

VerifiedTrustedCommunity

ATB OPS→ACLNN 迁移标准化工作流主模板。整合前置学习、设计文档生成、CSV用例设计、实际迁移、编译验证、测试验证全流程，提供明确的阶段 Gates 和用户确认机制。

13SKILL.mdUpdated May 7, 2026

Ascend/atb-ops-to-aclnn-migration-workflow

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/Ascend/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/ascend-inference-repos-copilot ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

Ascend/agent-skills

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT