skills/socratic-paper-reading/SKILL.md
Co-read research papers with the user using a Socratic, multi-pass methodology. The agent handles all mechanical work — extracting structure, looking up terms, tracing references, generating probing questions, maintaining layered notes — while the user retains all interpretive and critical work (understanding, judgment, "if I were writing this..."). Trigger this skill whenever the user shares a research paper (PDF, arXiv link/ID, or paper title) and signals they want to engage with it deeply — phrases like "help me read this paper", "let's go through this paper", "walk me through [paper]", "I want to understand [paper]", or simply uploads a paper without specifying what they want. Especially well-suited to AI infrastructure, reinforcement learning, and embodied intelligence papers, but the methodology generalizes. Do NOT trigger when the user clearly only wants a one-shot summary or has a single specific factual question about a paper — this skill is for sustained co-reading sessions, not quick lookups.
npx skillsauth add zoheth/vidya socratic-paper-readingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A co-reading methodology built on one principle: automate what doesn't grow the user; preserve what does.
The mechanical work of reading a paper — extracting structure, identifying paper type, looking up citations, finding figures, tracking what's been said, generating probing questions — does not improve the user's research taste. The interpretive work — judging whether claims are valid, deciding what's actually novel, imagining alternative framings — does. This skill loads the first onto the agent and protects the second for the user.
The methodology is adapted from Keshav's three-pass method, Andrew Ng's CS230 reading advice, and Shen Xiangyang & Hua Gang's "three layers, four stages, ten questions" framework.
Trigger when:
Do not trigger when:
If unclear, ask once: "Do you want a quick summary, or shall we do a proper read-through together?"
Before entering the phased flow, briefly establish two things:
Two sentences from the user is enough. Don't make this a long interview.
Default sequence is Phase 0 → 1 → 2 → 3 → (optional) 4. At each phase boundary, briefly check whether to proceed, skip, or revisit. Real reading is non-linear; treat phases as states the user moves between, not a forced march.
Without asking the user anything, do a Keshav-style first-pass yourself:
Identify the paper type. For this user's domains, common types include:
Different types have different attack surfaces. State the type up front — it matters for Phase 3.
Extract: title, venue/year, abstract, all figure captions, full section headings, conclusion. Read figure 1 closely — in ML papers it often encodes the entire contribution.
List the claims (at most 3, in plain language, not the paper's marketing phrasing).
List the setup: env / dataset / hardware / model size / number of seeds / open source? / what was held fixed vs. varied?
Note what stood out: a striking number, a non-obvious design choice, an unusual framing, an oddly specific limitation.
Do not produce a "summary" or judgment. This phase is information extraction only. Summarizing is the user's job in Phase 1. The output is a structured note dropped into the notes file (see "Notes file" below).
Hand control back. The user reads abstract + intro + key figures themselves, using the Phase 0 extract as scaffolding (not a substitute).
When they're ready, probe with these questions, in this order:
Do not accept vague answers. The agent's value in this phase is refusing abstraction. If the user says "they improve sample efficiency in RL", probe: "Sample efficiency measured how — env steps, wall-clock, or interactions? Relative to which baseline? On which benchmark?" This pressure is the entire point.
Specific probes refer to specific claims, numbers, or design choices the user just stated. Generic probes ("can you say more?") are not Socratic — they're filler.
If the user can't answer a probe, don't push to a "right answer" — log it under "open questions" in the notes file and let them know it'll come up in Phase 2.
The user reads the body in detail. The agent's role flips: stop driving, become a query handler.
When the user asks about a term, method, citation, or notation:
Do not interrupt with proactive questions in this phase. Reading flow matters. Probing happens at phase boundaries, not mid-pass.
When the user signals end of Phase 2 (or after a natural pause), ask the heavier comprehension questions:
This phase generates a paper-specific critique, not a canned domain checklist. The agent should:
Identify the type of evidence the paper relies on:
Generate paper-specific attack vectors. Read references/attack-surface-seeds.md to prime your thinking with seed examples in the user's domains. Use the seeds as inspiration, not as a fixed checklist — the goal is to find this paper's specific weaknesses, not run a generic audit.
Present 5-8 attack vectors, ranked by severity. For each:
Let the user pick which to investigate. For chosen ones, the agent helps with:
The user's judgment is the output. The agent supplies ammunition.
Default: skip. Only enter Phase 4 if at least one is true:
If entered:
Phase 4 is training, not output. The conversation matters more than any artifact written down.
Maintain a single layered markdown file at paper-notes-<short-title>.md in the working directory. Three layers, in one file, written progressively as the session unfolds.
# <Paper Title>
> arXiv: <id> | venue/year | type: <RL algo / training infra / ...>
> Session date: <YYYY-MM-DD>
## Capsule
[3-5 sentences. Filled in at end of Phase 2. The user's own one-paragraph summary, in their own words — not the agent's.]
## Refined notes
### What it claims
- [bullet list, claims as the user articulated them, not as the paper marketed them]
### What it actually shows
- [load-bearing evidence, with caveats]
### Attack surface
- [from Phase 3, with severity tags: kills / weakens / nitpick]
### Open questions
- [unresolved after Phase 2 / Phase 3]
### What I might use
- [operational — methods to try, framings worth borrowing, citations to follow up]
## Raw log
### Phase 0 extract
[agent's autonomous extraction, dumped here]
### Phase 1 Q&A
[probes asked and answers given]
### Phase 2 lookups
[term definitions and citation summaries as they came up]
### Phase 3 attack analysis
[full attack tree before user's selection]
### Phase 4 (if applicable)
[user's alternative framings, agent's alternatives, user's critiques]
The capsule and refined notes are what the user will reread later (or feed into a later related-work draft). The raw log is for traceability and for resuming a session that gets interrupted. Update the file at every phase transition, not at the end.
references/attack-surface-seeds.md — seed examples for Phase 3 critique generation, organized by paper type (RL algorithm, training infra, inference infra, embodied, benchmarks). Seeds are inspiration, not a checklist. Read this when entering Phase 3.development
Explain code through the lens of Naur's "Programming as Theory Building" — deliver the theory, not a behavioral narration. Use when the user says "explain this in non-code terms", "what's the theory here", or invokes /theory explicitly.
development
Use this skill when the user wants to genuinely understand unfamiliar code in any of three modes — **orienting** (building a working theory of a codebase, library, project, commit, or PR), **debugging** (tracing a bug or unexpected behavior through unfamiliar code), or **extending** (planning a modification, feature addition, or refactor in code they don't fully own yet). Trigger phrases include "help me understand this code", "walk me through this codebase", "why does this commit do X", "something's broken in this module", "I need to add X to this library", "help me figure out where this bug lives", "explain the design of this library", and similar. **The user's goal is NOT a code summary — it's to grow a working theory in their own head, structured both as an adjudicated set of claims AND as a felt sense of the system's overall shape.** Trigger any time the user wants to "understand", "figure out", "debug", "fix", "extend", "modify", "trace", or "make sense of" some code, project, commit, PR, or bug — even when they don't say "theory". Do NOT use for queries answerable by a single docstring or README line.
tools
Describe what this skill does, when it should be used, and the kinds of user requests that should trigger it.
development
Extract important questions from GitHub repositories, including issues, pull requests, discussions, and code reviews, and generate Markdown question cards for deep study. Use this skill when the user wants to extract key questions from a repo, mine important technical problems from GitHub threads, or build a study set of high-value questions from open-source projects.