Extract important questions from GitHub repositories, including issues, pull requests, discussions, and code reviews, and generate Markdown question cards for deep study. Use this skill when the user wants to extract key questions from a repo, mine important technical problems from GitHub threads, or build a study set of high-value questions from open-source projects.
Extract important, high-impact questions from GitHub repositories. The core philosophy is simple: questions are more valuable than answers. A great question reveals the structure of a problem space; answers can always be found later.
Confirm `gh` is available and authenticated:

```bash
gh auth status
```

If not authenticated, ask the user to run `gh auth login` first.
The user provides a GitHub repo, for example https://github.com/owner/repo or owner/repo. Extract OWNER and REPO. Note any user-specified filters such as labels, date range, source types, topic area, or language.
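Here is a minimal sketch of the extraction step. It assumes the input is either an `https://github.com/OWNER/REPO` URL (possibly with a trailing path such as `/issues/123` or a `.git` suffix) or an `OWNER/REPO` shorthand. The function name `parse_repo` and the accepted patterns are illustrative assumptions, not part of the skill. SSH remotes such as `git@github.com:...` are not handled.

```python
import re

# Illustrative patterns (assumptions): a github.com URL, or a bare owner/repo shorthand.
_URL_RE = re.compile(r"^(?:https?://)?(?:www\.)?github\.com/([^/\s]+)/([^/\s?#]+)")
_SHORT_RE = re.compile(r"^([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)$")


def parse_repo(spec: str) -> tuple[str, str]:
    """Return (OWNER, REPO) from a GitHub URL or an owner/repo shorthand."""
    spec = spec.strip().rstrip("/")
    match = _URL_RE.match(spec) or _SHORT_RE.match(spec)
    if match is None:
        raise ValueError(f"not a GitHub repository reference: {spec!r}")
    owner, repo = match.group(1), match.group(2)
    # Clone URLs often end in .git; the API expects the bare repository name.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

For example, `parse_repo("https://github.com/cli/cli/issues/123")` returns `("cli", "cli")`, so a link to a single issue still resolves to its repository.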
Use gh with --json for structured output. Default to about 80 items per source, and adjust based on repo size.
Important: fetch all states, not just resolved items. Unanswered and open questions are often the most important.
```bash
# Closed issues: include both completed and not_planned, since "not_planned" often means hard design trade-offs
gh issue list --repo OWNER/REPO --state closed --limit 80 \
  --json number,title,body,labels,comments,author,closedAt,url,stateReason,reactionGroups

# Open issues: long-standing open issues are often the hardest, most important questions
gh issue list --repo OWNER/REPO --state open --limit 80 \
  --json number,title,body,labels,comments,author,createdAt,url,reactionGroups
```
Fetch comments for a promising issue:

```bash
gh issue view NUMBER --repo OWNER/REPO --json comments,reactionGroups
```
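You can approximate community resonance from the `reactionGroups` and `comments` fields that the commands above already request. The sketch below assumes each `reactionGroups` entry looks like `{"content": "THUMBS_UP", "users": {"totalCount": 5}}` and each comment carries `author.login`. Check this against your own `gh` output. Ranking by reactions first and distinct commenters second is a hypothetical heuristic, not a rule from the skill.

```python
from typing import Any

Issue = dict[str, Any]


def reaction_total(issue: Issue) -> int:
    """Sum reactions across all reaction groups (assumed gh JSON shape)."""
    groups = issue.get("reactionGroups") or []
    return sum((g.get("users") or {}).get("totalCount", 0) for g in groups)


def distinct_commenters(issue: Issue) -> int:
    """Count unique comment authors; many voices outweigh many comments from one person."""
    logins = {(c.get("author") or {}).get("login") for c in issue.get("comments") or []}
    logins.discard(None)  # deleted ("ghost") authors come back without a login
    return len(logins)


def rank_by_resonance(issues: list[Issue]) -> list[Issue]:
    """Order issues by total reactions, then by how many people joined the thread."""
    return sorted(
        issues,
        key=lambda i: (reaction_total(i), distinct_commenters(i)),
        reverse=True,
    )
```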
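Tips:

- `gh label list --repo OWNER/REPO` shows the repo's labels. Labels such as `question`, `help wanted`, `design`, `architecture`, `RFC`, and `discussion` are high-signal.
- Issues closed with `stateReason: "not_planned"` and a deep discussion often reveal important design boundaries and trade-offs.

```bash
# Merged PRs
gh pr list --repo OWNER/REPO --state merged --limit 80 \
  --json number,title,body,labels,comments,reviews,author,mergedAt,url

# Also check closed-unmerged PRs; rejected approaches often surface critical design questions.
# `--state closed` also returns merged PRs, so keep only entries whose mergedAt is null.
# The filter runs after --limit, so raise the limit if too few PRs remain.
gh pr list --repo OWNER/REPO --state closed --limit 40 \
  --json number,title,body,labels,comments,reviews,author,closedAt,mergedAt,url \
  --jq 'map(select(.mergedAt == null))'
```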
Detailed review threads on a specific PR:

```bash
gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate
```
Closed-unmerged PRs are valuable because rejection reasons often reveal architectural constraints.
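To pull the rejected approaches out of a PR listing, you could use a sketch like the following. It keeps PRs whose `mergedAt` is null and orders them by how much conversation they drew. It assumes each record comes from `gh pr list --json ...` with `mergedAt`, `comments`, and `reviews` requested. If `mergedAt` is missing from the field list, every PR would look unmerged. The name `rejected_prs` and the comments-plus-reviews ordering are illustrative choices.

```python
from typing import Any

PullRequest = dict[str, Any]


def rejected_prs(prs: list[PullRequest]) -> list[PullRequest]:
    """Return closed-but-unmerged PRs, most-discussed first.

    Assumes each record includes `mergedAt` (null when the PR was never merged).
    """
    unmerged = [pr for pr in prs if pr.get("mergedAt") is None]
    # Discussion volume = top-level comments plus submitted reviews.
    return sorted(
        unmerged,
        key=lambda pr: len(pr.get("comments") or []) + len(pr.get("reviews") or []),
        reverse=True,
    )
```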
Fetch discussions with GraphQL:

```bash
gh api graphql -f query='
query($owner: String!, $repo: String!, $first: Int!) {
  repository(owner: $owner, name: $repo) {
    discussions(first: $first, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes {
        number
        title
        body
        url
        createdAt
        closedAt
        answer { body author { login } createdAt }
        labels(first: 5) { nodes { name } }
        author { login }
        category { name slug }
        upvoteCount
        comments(first: 15) {
          nodes {
            body
            author { login }
            createdAt
            isAnswer
            replies(first: 10) {
              nodes { body author { login } createdAt }
            }
          }
        }
      }
    }
  }
}
' -f owner=OWNER -f repo=REPO -F first=80
```
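Unanswered discussions are often the most important ones, and a small filter over the GraphQL response above can surface them. This sketch assumes the response was saved as JSON, for example by redirecting the `gh api graphql` output to a file. It keeps discussions whose `answer` is null and orders them by `upvoteCount`. Only question/answer-format categories can have an accepted answer, so the optional `category_slugs` filter keeps announcement or idea threads from counting as "unanswered". The function name and that parameter are hypothetical.

```python
from typing import Any, Iterable, Optional

Discussion = dict[str, Any]


def open_discussions(
    response: dict[str, Any],
    category_slugs: Optional[Iterable[str]] = None,
) -> list[Discussion]:
    """Discussions without an accepted answer, most-upvoted first."""
    nodes = response["data"]["repository"]["discussions"]["nodes"]
    wanted = set(category_slugs) if category_slugs is not None else None
    picked = []
    for d in nodes:
        if d.get("answer") is not None:
            continue  # already has an accepted answer
        slug = (d.get("category") or {}).get("slug")
        if wanted is not None and slug not in wanted:
            continue  # outside the categories the caller cares about
        picked.append(d)
    return sorted(picked, key=lambda d: d.get("upvoteCount", 0), reverse=True)
```

For example, `open_discussions(json.load(f), {"q-a"})`. Here `"q-a"` stands in for whatever slug the category query below reports for the repo's Q&A category.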
To fetch category IDs for filtering:

```bash
gh api graphql -f query='
query($owner: String!, $repo: String!) {
  repository(owner: $owner, name: $repo) {
    discussionCategories(first: 20) {
      nodes { id name slug }
    }
  }
}
' -f owner=OWNER -f repo=REPO
```
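Fetch review comments across all PRs in the repo:

```bash
# `id` is kept so that each reply's in_reply_to_id can be matched to its parent comment.
gh api repos/OWNER/REPO/pulls/comments \
  --paginate \
  --jq '.[] | {id, body, user: .user.login, url: .html_url, path, diff_hunk, author_association, in_reply_to_id, created_at}'
```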
Group by in_reply_to_id to reconstruct threads. Focus on reviewer questions that ask "why", not just "what".
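Here is a sketch of that thread reconstruction. It assumes each comment dict carries the REST API's `id` and `in_reply_to_id` fields; if you use a `--jq` projection, make sure it keeps `id`. Each comment is walked up to its root so that nested replies land in the same thread. The keyword test for "why" is a deliberately crude, hypothetical heuristic.

```python
import re
from collections import defaultdict
from typing import Any

Comment = dict[str, Any]
_WHY = re.compile(r"\bwhy\b", re.IGNORECASE)


def build_threads(comments: list[Comment]) -> dict[int, list[Comment]]:
    """Group review comments into threads keyed by the root comment's id."""
    by_id = {c["id"]: c for c in comments}

    def root_id(c: Comment) -> int:
        seen: set[int] = set()
        # Follow in_reply_to_id upward while the parent is in the fetched set.
        while c.get("in_reply_to_id") in by_id and c["id"] not in seen:
            seen.add(c["id"])
            c = by_id[c["in_reply_to_id"]]
        return c["id"]

    threads: dict[int, list[Comment]] = defaultdict(list)
    for c in comments:
        threads[root_id(c)].append(c)
    for thread in threads.values():
        # ISO 8601 UTC timestamps sort correctly as strings.
        thread.sort(key=lambda item: item.get("created_at") or "")
    return dict(threads)


def why_threads(threads: dict[int, list[Comment]]) -> dict[int, list[Comment]]:
    """Keep threads whose root comment contains the word "why" (crude heuristic)."""
    kept = {}
    for root, thread in threads.items():
        root_comment = next(c for c in thread if c["id"] == root)
        if _WHY.search(root_comment.get("body") or ""):
            kept[root] = thread
    return kept
```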
For a candidate issue, check how often it is referenced by other issues or PRs:
```bash
gh api graphql -f query='
query($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    issue(number: $number) {
      timelineItems(first: 50, itemTypes: [CROSS_REFERENCED_EVENT]) {
        totalCount
        nodes {
          ... on CrossReferencedEvent {
            source {
              ... on Issue { number title url }
              ... on PullRequest { number title url }
            }
          }
        }
      }
    }
  }
}
' -f owner=OWNER -f repo=REPO -F number=NUMBER
```
A high cross-reference count is a strong signal that this question is a central node in the project's problem space.
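To compare several candidates, a sketch like this reads `totalCount` from each response to the query above and ranks the issues by it. It assumes you ran the query once per candidate and kept the parsed JSON keyed by issue number. `totalCount` is used as the cross-reference count because `nodes` is capped at `first: 50`. The distinct referencing numbers are returned too, so you can see which threads point at the issue. The function names are illustrative.

```python
from typing import Any

Response = dict[str, Any]


def cross_reference_summary(response: Response) -> tuple[int, list[int]]:
    """Return (totalCount, sorted distinct numbers of referencing issues/PRs)."""
    items = response["data"]["repository"]["issue"]["timelineItems"]
    numbers = {
        node["source"]["number"]
        for node in items["nodes"]
        if node.get("source") and "number" in node["source"]
    }
    return items["totalCount"], sorted(numbers)


def rank_by_cross_references(responses: dict[int, Response]) -> list[tuple[int, int]]:
    """Rank candidate issue numbers by cross-reference count, highest first."""
    counts = [(number, cross_reference_summary(r)[0]) for number, r in responses.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```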
Read through the fetched data. The goal is to find questions that matter, regardless of whether they have answers.
Tier 1 — Structural importance
Tier 2 — Community resonance
Tier 3 — Depth indicators
- Closed as `not_planned` with substantive discussion

Skip `+1` or "me too" threads with no analytical content.

The raw issue title is often vague or context-dependent. Rewrite each question so that it is:
Example:
"Why does my config fail?" -> "How does the config resolution order work, and what happens when multiple sources conflict?"
Group questions into 3 to 8 categories. Infer from:
Suggested cross-repo categories:
Save a Markdown file like this:
```markdown
# 关键问题 —— {repo_name}

> 提取自 GitHub issues、PRs、discussions 和 code reviews。
> {total_count} 个问题 · {category_count} 个分类
>
> **如何使用:** 这些问题塑造了这个项目。学习这些问题,不只是理解项目“做了什么”,更要理解它“为什么会这样设计”。在展开上下文前,先尝试自己回答每个问题。

---

## {分类名称}

### Q{n}. {简短问题标题}

> **来源:** [{source_type} #{number}]({url})
> **重要性:** {Critical/High/Medium} · **深度:** {Surface/Conceptual/Architectural}
> **状态:** {Answered/Open/Debated}

**问题:**

{用更清晰、自包含的方式重写后的问题。}

**为什么重要:**

{1-2 句话说明为什么这个问题重要。}

<details>
<summary>上下文与讨论</summary>

{总结线程中的关键讨论点。若已回答,包含答案;若存在争议,概括主要立场;若仍未解决,说明难点所在。}

**关键声音:** {谁提出了什么观点,尤其是维护者}

**结果:** [PR #{pr_number}]({pr_url}) —— {一句话说明做了什么改动}

</details>

---
```
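If the cards are assembled programmatically, a renderer along these lines fills in the template above. It is only a sketch. The `QuestionCard` fields are a hypothetical mapping of the template placeholders, and the allowed values come from the levels listed below. The optional outcome is taken as a `(pr_number, pr_url, summary)` tuple. Labels stay in Simplified Chinese to match the template.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IMPORTANCE = ("Critical", "High", "Medium")
DEPTH = ("Surface", "Conceptual", "Architectural")
STATUS = ("Answered", "Open", "Debated")


@dataclass
class QuestionCard:
    n: int
    title: str
    source_type: str  # e.g. "issue", "PR", "discussion"
    number: int
    url: str
    importance: str
    depth: str
    status: str
    question: str
    why_it_matters: str
    context: str
    key_voices: str
    outcome: Optional[Tuple[int, str, str]] = None  # (pr_number, pr_url, summary)

    def __post_init__(self) -> None:
        # Reject values outside the three level scales.
        for value, allowed in (
            (self.importance, IMPORTANCE),
            (self.depth, DEPTH),
            (self.status, STATUS),
        ):
            if value not in allowed:
                raise ValueError(f"{value!r} not in {allowed}")


def render_card(card: QuestionCard) -> str:
    """Render one card in the template's layout, with blank lines between blocks."""
    parts = [
        f"### Q{card.n}. {card.title}",
        "\n".join([
            f"> **来源:** [{card.source_type} #{card.number}]({card.url})",
            f"> **重要性:** {card.importance} · **深度:** {card.depth}",
            f"> **状态:** {card.status}",
        ]),
        "**问题:**",
        card.question,
        "**为什么重要:**",
        card.why_it_matters,
        "<details>\n<summary>上下文与讨论</summary>",
        card.context,
        f"**关键声音:** {card.key_voices}",
    ]
    if card.outcome is not None:
        pr_number, pr_url, summary = card.outcome
        # "\u2014\u2014" is the double dash the template places before the summary.
        parts.append(f"**结果:** [PR #{pr_number}]({pr_url}) \u2014\u2014 {summary}")
    parts += ["</details>", "---"]
    return "\n\n".join(parts) + "\n"
```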
The final output must be written in Simplified Chinese. Keep GitHub usernames, technical terms, code identifiers, and URLs in their original form.
Use these levels:

- **Importance:** Critical, High, Medium
- **Depth:** Surface, Conceptual, Architectural
- **Status:** Answered, Open, Debated

Quality rules:
Save as key_questions_{repo_name}.md in the working directory.
If the total question count exceeds 30, offer to split the output into separate files by category.
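Here is a sketch of the naming and split rule. It assumes the caller has already rendered each card and knows its category. The single-file name follows the rule above. The per-category name `key_questions_{repo_name}_{category}.md` is a hypothetical convention. Because the rule says to offer a split rather than force one, the split happens only when the caller passes `split=True` after the user agrees.

```python
import re
from typing import Dict, List, Tuple

SPLIT_THRESHOLD = 30  # offer a split only when the total exceeds this


def _file_safe(name: str) -> str:
    # Keep Chinese category names intact; only replace whitespace and path separators.
    return re.sub(r"[\s/\\]+", "_", name.strip())


def should_offer_split(total: int) -> bool:
    return total > SPLIT_THRESHOLD


def plan_output_files(
    repo_name: str,
    cards: List[Tuple[str, str]],  # (category, rendered card markdown)
    split: bool = False,
) -> Dict[str, List[str]]:
    """Map each output filename to the cards it should contain, in input order."""
    if not split:
        return {f"key_questions_{repo_name}.md": [md for _, md in cards]}
    files: Dict[str, List[str]] = {}
    for category, md in cards:
        name = f"key_questions_{repo_name}_{_file_safe(category)}.md"
        files.setdefault(name, []).append(md)
    return files
```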
- If `gh` is not authenticated, tell the user to run `gh auth login`.
- If the repository cannot be accessed, check that `gh` is authenticated with access to it.