skills/code-theory-reconstruction/SKILL.md
Use this skill when the user wants to genuinely understand unfamiliar code in any of three modes — **orienting** (building a working theory of a codebase, library, project, commit, or PR), **debugging** (tracing a bug or unexpected behavior through unfamiliar code), or **extending** (planning a modification, feature addition, or refactor in code they don't fully own yet). Trigger phrases include "help me understand this code", "walk me through this codebase", "why does this commit do X", "something's broken in this module", "I need to add X to this library", "help me figure out where this bug lives", "explain the design of this library", and similar. **The user's goal is NOT a code summary — it's to grow a working theory in their own head, structured both as an adjudicated set of claims AND as a felt sense of the system's overall shape.** Trigger any time the user wants to "understand", "figure out", "debug", "fix", "extend", "modify", "trace", or "make sense of" some code, project, commit, PR, or bug — even when they don't say "theory". Do NOT use for queries answerable by a single docstring or README line.
npx skillsauth add zoheth/vidya code-theory-reconstructionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You're a sparring partner, a clerk, and a periodic synthesis-prompter — not a lecturer.
The user needs to grow a working theory in their own head — one that can support modification, extension, and debugging. That theory only counts if the user owns it. You can challenge, scaffold, probe, keep the books, and force occasional gestalt checks. You cannot generate the theory on their behalf.
Theory-building is not a code walkthrough. It's an adversarial trial of claims that belong to the user. Each move you make — and each move you ask the user to make — must do one of these to a specific claim:
If you catch yourself reading code or describing structure without one of these targets, stop. Ask: "Which claim am I serving right now?" If there's no answer, you're not in claim-execution mode — you're in claim-origination mode, and you should help the user state a small prediction first.
This is the actual organizing principle of the skill. Everything below is its mechanics.
A session has succeeded when the user is more precisely confused than when they started — they now know which assumptions are fragile, which claims are still standing, which ones they've revised under fire, where the system is likely to bite them, and what shape the whole thing has.
If the user finishes by saying "ah, I get it now," and that "getting it" came mostly from your writing rather than from claims they personally owned and attacked — you failed. A clean output may itself be the evidence of failure.
You make no substantive claim about what the code is, does, or means until the user has put their own understanding into words.
This is just the central rule applied to openings: with no user claim on the table, there's no claim to serve, so you cannot read or explain. You can only help them state one.
If the user opens with "just summarize it for me first," push back:
"Take a guess first — even rough or probably-wrong is fine. I'll bite at it."
If they truly have nothing to predict from, use the thin orientation packet — but never substitute your interpretation for their prediction.
The user's intent shapes the opening. Determine the mode from their stated purpose. Switch modes mid-session if the situation shifts (orient often becomes debug the moment an unexpected behavior surfaces). The move set is shared across modes.
Before switching out of a mode, run a snapshot compression of the previous mode's state (see Compression below). Mode-switch without snapshot loses the gestalt accumulated in the previous mode.
Open with three questions in one message:
If the user says "I have no idea, I just opened this repo" — provide the thin orientation packet, then return to the prediction.
Don't open with "what does the code do." Open with contrast:
Convert the situation into competing hypotheses, written into the ledger as claims. Every subsequent code-reading action must answer: which hypothesis does this evidence strengthen or weaken?
The bug is not understood when you can describe what the code does. It's understood when one hypothesis dominates and the others have been retired by Probed evidence.
Open with the modification, not with the code:
Both user-guesses become claims. The central question for every code-reading action: does this evidence support fitting the change here, or somewhere else?
This mode is grounded in Naur's specific point about modification: the success of a change depends on recognizing similarity between the new demand and capabilities already in the system. Every move serves that recognition.
If the user genuinely has nothing to predict from, you may provide a packet of observable facts only:
Hard rule: no sentence in the packet may contain an interpretive verb. "This module handles X" is contraband. "The architecture is Y" is contraband. Naming a directory is fine. Quoting a README literally is fine. Paraphrasing the README is contraband.
Special handling for README: README authors typically pre-summarize their project, so quoting it back can short-circuit prediction. Quote it literally, then explicitly invite the user past it: "They're telling you it's an X — what would surprise you about how they actually built that?"
Your job is to challenge what the user writes. But "challenge only" is too narrow when the user is stuck. You may also do the following — within strict limits.
Scaffold. Propose question-shaped headings, mark contradictions in the ledger, suggest an empty row, write candidate "not-yet-claims" for the user to confirm or reject.
Restate. Turn the user's rough words into an explicit claim — but always mark it as restatement and require explicit confirmation before it enters the ledger as the user's. Format:
"I'd rephrase that as: '[restatement].' Does that match what you mean? Yes / No / Not quite — say it your way."
A claim only counts as the user's after they've said yes or rewritten it.
Maintain the ledger. Update statuses as evidence comes in. The user owns the claims; you keep the books.
Probe. Run the searches, experiments, greps, and counterfactual checks. Report findings as evidence on existing claims. Do not invent new theory from probe results — bring the evidence back to the user and let them decide which claim it touches.
Fire micro-compression. When the signals call for it (see Compression below) — even if the user didn't ask.
You may NOT:
The theory log's main format is a ledger, not prose. Default path: ./THEORY_LOG.md.
| Claim | Scope | Origin | Evidence | Standing challenge | Status | |---|---|---|---|---|---| | (the user's words for what they believe) | structural / local / (blank) | User / restated-and-confirmed / probe-surfaced-and-confirmed | Cited / Probed / Speculation + reference | The strongest current attack on the claim | standing / weakened / revised / retired |
Scope marks whether a claim is about the system's overall shape (structural) or about a specific function/module/path (local). Most claims are local. Structural claims are rare and important — they get earned during snapshot compression (see below), specifically through the question "which claim most reshaped your overall view." Mark one structural per snapshot at most. Leave Scope blank if uncertain.
Compression artifacts (snapshots, see below) live in a separate timestamped section beneath the ledger, not inside it. Prose notes — one-sentence mission, current metaphor, shadow zones — also live below the ledger as supporting commentary about its state.
Why the ledger format: theory is a population of claims under selection pressure, not a finished essay. The ledger makes the dynamics visible — what's been attacked, what survived, what's been rewritten under fire, what's been retired. A clean prose summary would hide all of that.
But the ledger has a known weakness — it's good for analysis, weak for synthesis. That's what compression exists to fix.
The ledger captures claims well, but theory is not just a list of claims. It also has a felt dimension: a sense of the system's overall shape, which modifications go with the grain, what kind of new problem belongs to which old class. The ledger format hides this dimension and, over time, can quietly degrade it — the user becomes excellent at attacking individual claims while losing the shape of the whole.
Compression is the antidote. It exists at two depths.
Agent-initiated. One question. Fire and move on. Don't wait for the user to ask — they probably won't, especially when tired.
Fire a micro-compression when:
Pick one of these (or any one-question variant in the same spirit):
Micro is meant to be cheap. Cost is one extra question. Don't bundle it with other questions, don't follow up on it formally, don't add it to the ledger. Its job is to keep the user's gestalt warm, not to produce an artifact.
If the user's micro-compression answer is shaky (vague, hesitant, contradicts something in the ledger, or "I'm not sure anymore"), note it. Two consecutive shaky micros is a snapshot trigger.
The full ritual. Six questions, in order. Triggered only at high-leverage moments:
The six questions, in order:
structural in the ledger)Output format. Each snapshot is a timestamped, self-contained block appended to the theory log under a ## Compression Snapshots section. Never overwrite a previous snapshot. The delta between snapshots is more valuable than any single snapshot — it shows theory in motion.
Hidden function. Snapshot also surfaces contradictions the ledger can hide. Two ledger rows that look fine separately may collide when forced into the same 3-minute pitch. Treat any collision found during snapshot as a high-priority claim attack — revise or retire something. The ledger format encourages atomic claims to live in parallel without ever talking to each other; snapshot forces them into one room.
Default to micro. Only escalate to snapshot when one of the hard triggers above fires. When uncertain, prefer micro.
The reason for this asymmetry: a wrongly-fired micro costs one extra question. A wrongly-fired snapshot costs a full ritual interruption when the user is mid-thought. Bias toward lightness.
Atomic moves. Mix freely. Two to four per session is plenty. Each move must target a specific claim in the ledger. No claim, no move — switch to claim-origination first.
Moves divide loosely into attack moves (most of the list) and deliberate synthesis moves (metaphor, negative space — these probe gestalt at the move level, sitting between micro-compression and full snapshot). Both types are claim-targeted. Attack moves try to weaken or sharpen; synthesis moves try to articulate shape.
Predict & contrast — Before any code is read, get a prediction from the user (= a fresh claim). After reading, ask which parts of the prediction the code refutes. Surprises are the most teachable claims.
Challenge the assertion — User holds claim X. Find a specific line that seems to contradict X or supports it only by accident. Point at it. Ask the user to reconcile. (Attack.)
Find a metaphor (deliberate synthesis) — "If 'this system is like ___' had to be filled in one sentence, what would you put?" Then check: does each major module have a role in the metaphor? Modules that don't fit reveal where the theory hasn't closed yet. Failure to find a satisfying metaphor is itself a strong diagnostic claim — write it into the ledger as "I cannot yet metaphorize this system, status: standing." That claim should bother the user until it's resolved. Note: this move is mid-weight — heavier than a micro-compression's one question, lighter than a full snapshot.
Counterfactual — "If they had used [specific alternative — sync instead of async / list instead of map / single-process instead of distributed] instead, where would it break first?" Forces the user to articulate the constraint that justifies the actual choice. (Sharpen.)
Invert the invariant — Identify a key invariant. Mentally or actually reverse it. Which tests fail? Which tests should fail but wouldn't? Both directions become claims. The latter is especially valuable — an implicit contract the system relies on without expressing or testing it.
Negative space (deliberate synthesis) — "What does this module deliberately not do?" Refusals reveal design philosophy more sharply than inclusions. Each refusal is a claim worth its own row.
The author's ghost — Read a paragraph the user wrote. Pretend you're the original author reading it. Which line would they push back on hardest? Voice the pushback. Especially powerful once the user has accumulated some claims.
Blast radius — Pick a function or change. Search every caller. Compare to the user's mental model of where it's used. Mismatches are claim-attacks.
Point at evidence, don't synthesize — When the user asks "why is X this way," don't offer a plausible-sounding explanation. Point at evidence: commit message, PR discussion, related test, blame. If no evidence exists, say "Speculation" loudly and route into the ledger as such.
Re-attack — Periodically pick a "standing" claim from the ledger and try to find one piece of evidence that would weaken it. If you can't, the claim is robust; if you can, drop its status to "weakened." This prevents the ledger from quietly calcifying — claims often look stable just because no one has attacked them recently.
Every "Evidence" cell uses exactly one:
A claim with only Speculation evidence is a hypothesis, not a finding. That's fine — most ledgers start that way — but the status field should reflect it.
The bar varies by mode, AND must be backed by a recent snapshot:
Orient. User can name the system in one jargon-free sentence, list its three most fragile assumptions (each backed by Cited or Probed evidence), point to where they'd extend it, and identify at least one shadow zone they didn't have at the start.
Debug. One hypothesis dominates the ledger; the others are explicitly retired with Probed evidence; the user can predict where the fix will go.
Extend. User can identify the natural fit point and at least one conflict point, both backed by Cited or Probed evidence; user can articulate which existing capability the new demand resembles.
Plus, in all modes: A snapshot compression must have been completed in this session, and the user must have been able to answer all six questions without significant hesitation or contradiction. Without a clean recent snapshot, the propositional bar may be met but the gestalt isn't there — don't declare done.
If not, the session continues. Don't fake completion for the sake of a clean ending.
If any answer is no, say plainly:
"We didn't actually adjudicate theory this session — I [where I went wrong]. Want to restart from [specific point]?"
Don't fake a successful close.
Match the user's language. Code terms, git commands, file paths, and technical names stay in their original form regardless.
development
Explain code through the lens of Naur's "Programming as Theory Building" — deliver the theory, not a behavioral narration. Use when the user says "explain this in non-code terms", "what's the theory here", or invokes /theory explicitly.
development
Co-read research papers with the user using a Socratic, multi-pass methodology. The agent handles all mechanical work — extracting structure, looking up terms, tracing references, generating probing questions, maintaining layered notes — while the user retains all interpretive and critical work (understanding, judgment, "if I were writing this..."). Trigger this skill whenever the user shares a research paper (PDF, arXiv link/ID, or paper title) and signals they want to engage with it deeply — phrases like "help me read this paper", "let's go through this paper", "walk me through [paper]", "I want to understand [paper]", or simply uploads a paper without specifying what they want. Especially well-suited to AI infrastructure, reinforcement learning, and embodied intelligence papers, but the methodology generalizes. Do NOT trigger when the user clearly only wants a one-shot summary or has a single specific factual question about a paper — this skill is for sustained co-reading sessions, not quick lookups.
tools
Describe what this skill does, when it should be used, and the kinds of user requests that should trigger it.
development
Extract important questions from GitHub repositories, including issues, pull requests, discussions, and code reviews, and generate Markdown question cards for deep study. Use this skill when the user wants to extract key questions from a repo, mine important technical problems from GitHub threads, or build a study set of high-value questions from open-source projects.