pstack/skills/why/SKILL.md
Use for 'why does X work this way', 'why we picked Y', design rationale, regressions, postmortems, or data-backed thresholds. Discovers available MCPs and queries each evidence category (source control, issue tracker, long-form docs, real-time chat, infrastructure observability, error tracking, product analytics warehouse) in parallel, then returns a cited read on decisions and tradeoffs. Use how for runtime behavior.
Investigate the motivation and intent behind code. Why was it built this way? What edge cases were considered? What product, business, or operational constraints shaped the design? What alternatives were rejected, and why?
Companion to the `how` skill. `how` answers what the code does and how it works. `why` answers what forces led to its shape.
Historical context spreads across seven evidence categories: source control history, issue or ticket tracking, long-form documents, real-time team chat, infrastructure observability, error or exception tracking, and product analytics warehouses. You cannot predict from the question alone which one holds the answer, so the skill enumerates available MCPs at run time, maps each to a category, queries all seven in parallel, then synthesizes with explicit confidence calibration. Null results from searched categories are first-class evidence about how the decision was made; report them alongside positive findings. The default is coverage, not minimalism.
Operate as a careful, cautious, precise investigator. Think like a detective piecing together a historical case from fragmentary records. When the record is thin, say so.
Concretely:
This posture is the working method, not a disclaimer.
This skill builds a patchwork understanding from fragmented historical evidence. Tickets go stale. Chat threads get deleted. Commit messages lie. People change their minds between the PR description and the implementation. The original author may have left the company.
Be ruthlessly honest about what you know versus what you're inferring. The goal is not a satisfying story; it is to surface evidence, calibrate confidence, and let the user decide.
Principles:
Read references/epistemics.md for the full confidence framework and phrasing guide. The synthesizer must follow it.
Parse what the user is asking. The target is usually a chunk of code, a pattern, a feature, or a named design decision. The question is usually one of:
If the target is vague ("why do we do it this way?" with no clear referent), make your best guess from conversation context (open files, recent edits, cursor location, what was just discussed). State your interpretation briefly so the user can redirect if you're off, then proceed.
Before spawning investigators, anchor the investigation in concrete code. You need:
(#1234) in the subject line)

Build this inline. It's cheap, and every investigator needs it.
# Blame target lines for last-touch commits
git blame -L <start>,<end> <file>
# Full file history, with patches, through renames
git log --follow -p -- <file>
# Last N commits touching the file, PR numbers visible
git log --oneline -20 -- <file>
# Extract PR numbers from a commit message
git log -1 --format=%B <commit>
Pull PR bodies and discussion via gh for any substantive commits:
gh pr view <number> --json title,body,author,createdAt,mergedAt,labels,closingIssuesReferences,comments,reviews
Capture this as seed context (file paths, symbols, commits, PR numbers, linked ticket IDs). Pass it to the investigators so they don't rediscover it.
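As a rough illustration of the seed-context step, the sketch below pulls PR numbers and ticket IDs out of commit subject lines like those printed by `git log --oneline`. The function name `extract_seed_refs`, the `ENG-4421`-style ticket pattern, and the sample subjects are assumptions made for illustration; they are not part of the skill.

```python
import re

# Assumption: squash-merge subjects carry "(#1234)"; ticket IDs look like "ENG-4421".
PR_PATTERN = re.compile(r"\(#(\d+)\)")
TICKET_PATTERN = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def extract_seed_refs(subjects):
    """Collect PR numbers and ticket IDs from commit subject lines, in first-seen order."""
    prs, tickets = [], []
    for line in subjects:
        for number in PR_PATTERN.findall(line):
            if int(number) not in prs:
                prs.append(int(number))
        for ticket in TICKET_PATTERN.findall(line):
            if ticket not in tickets:
                tickets.append(ticket)
    return {"prs": prs, "tickets": tickets}


# Hypothetical `git log --oneline` output, for illustration only.
sample = [
    "a1b2c3d Add exponential backoff to retries (#49074)",
    "d4e5f6a ENG-4421: cap retry attempts (#47812)",
    "0f9e8d7 Fix typo in comment",
]
seed = extract_seed_refs(sample)
```

The resulting dictionary is the kind of compact seed context that could be handed to every investigator so none of them has to rediscover it.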
Default to the full parallel investigation. Each evidence category lives in a different kind of system, and you cannot tell from the question alone which one holds the answer without looking. So look across every available category, in parallel, by default.
Before spawning investigators, list the available MCPs from the Cursor environment. Use the available-tools map when present. Otherwise inspect the mcps/ directory Cursor exposes for enabled MCP servers.
Map each available MCP to one evidence category:
Source control is always available through git and gh. For the other six, classify using the MCP name, server instructions, tool names, and resource descriptors. If an MCP could fit more than one category, choose the one matching its primary evidence. Record ambiguous cases in the coverage map.
Aim for a complete coverage map, not a minimal one. A null result from an issue tracker is evidence the decision was not ticketed, a useful fact in itself. Document the null, don't skip the search.
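One way to picture the coverage map is a dictionary keyed by the seven categories, where a category with no matching MCP stays in the map as an explicit gap. The sketch below is a simplification under stated assumptions: it classifies by MCP name only (the skill also uses server instructions, tool names, and resource descriptors), takes the first matching category, and the names `NAME_HINTS` and `build_coverage_map` are hypothetical.

```python
CATEGORIES = [
    "source control",
    "issue tracker",
    "long-form docs",
    "real-time chat",
    "infrastructure observability",
    "error tracking",
    "product analytics warehouse",
]

# Assumption: name-based hints only, drawn from the example tools the skill lists.
NAME_HINTS = {
    "issue tracker": ["linear", "jira", "shortcut", "plane"],
    "long-form docs": ["notion", "confluence", "coda"],
    "real-time chat": ["slack", "discord", "teams", "mattermost"],
    "infrastructure observability": ["datadog", "newrelic", "honeycomb", "grafana", "splunk"],
    "error tracking": ["sentry", "rollbar", "bugsnag", "airbrake"],
    "product analytics warehouse": ["databricks", "snowflake", "bigquery", "clickhouse", "redshift"],
}


def build_coverage_map(mcp_names):
    """Map each category to matching MCPs; categories with no match stay in the map."""
    coverage = {category: [] for category in CATEGORIES}
    coverage["source control"] = ["git", "gh"]  # always available per the skill
    unclassified = []
    for name in mcp_names:
        lowered = name.lower()
        matches = [c for c, hints in NAME_HINTS.items() if any(h in lowered for h in hints)]
        if matches:
            coverage[matches[0]].append(name)  # simplification: first match wins
        else:
            unclassified.append(name)
    return coverage, unclassified


# Hypothetical MCP names, for illustration only.
coverage, unclassified = build_coverage_map(["linear-mcp", "Slack", "sentry-prod", "weather"])
gaps = [category for category, names in coverage.items() if not names]
```

Here `gaps` names the categories that have no MCP to query, which is what the final report would need to state rather than silently omit.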
Launch all matching investigators in a single message so they run concurrently. One investigator per category lets each specialize in one tool's query vocabulary and result shape. Don't ask one agent to cover multiple MCPs.
Subagent config (each):
- subagent_type: generalPurpose
- model: composer-2.5-fast
- readonly: false (agent mode). Do not use readonly/Ask mode. It strips MCP access, which disables MCP-backed investigators entirely. The source control investigator would be safe in readonly, but keep modes uniform. Investigators still shouldn't write anything. That's a posture, not a sandbox.

Each investigator gets:
- references/investigator-prompt.md
- references/sources/<source>.md for the selected MCP, adapted from the examples in references/source-playbook.md
- references/sources/incident-postmortem.md if the target code looks defensive (null checks, retry logic, timeout handling, rate limiting, feature flags, egress guards, OOM handlers)

Spawn one investigator per category that has a matching MCP. Each owns exactly one tool or MCP.
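The fan-out itself is done by the agent runtime launching subagents, not by user code. As a loose analogy only, the sketch below shows the same shape with a thread pool: every category starts at once, each worker owns exactly one category, and empty results are kept instead of dropped. The `investigate` function is a hypothetical stand-in with canned data, not a real subagent call.

```python
from concurrent.futures import ThreadPoolExecutor


def investigate(category, seed):
    """Hypothetical stand-in for one investigator; returns findings or an empty list."""
    canned = {"source control": [f"PR #{seed['prs'][0]} description"]}
    return canned.get(category, [])


def run_investigators(categories, seed):
    """Start one investigator per category at once and keep null results in the output."""
    with ThreadPoolExecutor(max_workers=len(categories)) as pool:
        futures = {c: pool.submit(investigate, c, seed) for c in categories}
        return {c: f.result() for c, f in futures.items()}


results = run_investigators(
    ["source control", "issue tracker", "error tracking"],
    {"prs": [49074]},
)
```

Categories that come back empty still appear as keys in `results`, so a null result can be reported as a finding in its own right.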
Each entry lists what the category physically contains and the kind of "why" it uniquely surfaces. Use it to know what to expect back, how to name a gap when a category returns empty, and (only in the rare provably-irrelevant case) to justify a skip. Every category overlaps, but each owns a kind of evidence the others cannot recover.
Source control investigator. Git history, gh for PRs, code comments, tests. Always spawn; the only guaranteed source. Best at surfacing implementation-time rationale captured during review. PR descriptions stating the problem, review threads debating alternatives, inline comments encoding non-obvious constraints, test names that encode motivating edge cases, and commit messages linking tickets or incidents. Most trustworthy because it ties directly to the diff that shipped.
Issue / ticket tracker investigator (e.g. Linear, Jira, GitHub Issues, Plane, Shortcut MCP). Tickets, project docs, status updates, spec attachments. Best at surfacing the product or business forcing function. Customer requests ("Acme needs X for their SOC2 audit"), compliance deadlines, parent-initiative framing ("Q3 enterprise readiness"), ticket-level scope changes, and labels that categorize the motivation (customer:*, incident-followup, compliance, perf-regression). Strongest when the why is external to engineering.
Long-form documents investigator (e.g. Notion, Confluence, Google Docs, Coda MCP). PRDs, specs, RFCs, design docs, ADRs, postmortems, team pages, meeting notes. Best at surfacing long-form design rationale. Problem statements, explicit "alternatives considered" and "rejected approaches" sections, strategy documents that set priorities, ADRs with finalized decisions, and postmortem action items that tie directly to code. Where the why is written out before it becomes code.
Real-time team chat investigator (e.g. Slack, Discord, Microsoft Teams, Mattermost MCP). Feature-name and symbol searches, PR URL mentions, incident channels (#sev-*, #incident-*), author-handle activity around the ship date. Best at surfacing real-time deliberation that never reached a doc. Fire-drill decisions during incidents, Q&A between the PR author and reviewers, casual "we decided X because Y" threads, and rationale for small changes that didn't warrant a PRD. Especially important when the source control, ticket, and doc paper trail is thin.
Infrastructure observability investigator (e.g. Datadog, New Relic, Honeycomb, Grafana, Splunk MCP). Metrics, monitors, dashboards, logs, APM traces, formal incidents. Infra/runtime view. Best at surfacing infrastructure and runtime reality that motivated the code. Monitor thresholds whose numbers match code constants, metric spikes in the window right before a PR merge, dashboards created as postmortem action items, incident timelines that reference the target. Strongest when the target reacts to an infra signal (timeouts, retries, rate limits, circuit breakers).
Error / exception tracking investigator (e.g. Sentry, Rollbar, Bugsnag, Airbrake MCP). Issues, events, stack traces, releases. Best at surfacing the specific exceptions and error trajectories that motivated defensive or corrective code. Stack traces that pass through the target function, issues whose first-seen/last-seen windows bracket the PR ship date, release correlations that show an error stopping at a specific version. Strongest for catch blocks, null guards, type checks, retries, and other defenses.
Product analytics warehouse investigator (e.g. Databricks, Snowflake, BigQuery, ClickHouse, dbt, Redshift MCP). Product-analytics events, experiment and feature-flag exposure tables, usage and billing events, query history, warehouse telemetry. Product/data view. Complements infrastructure observability by covering user behavior and data reality around the ship date rather than infra metrics. Best at surfacing product and data reality that shaped the code. Feature-usage trajectories (a step-function ramp from zero is strong evidence that this PR launched it), experiment/flag exposure data tied to ship decisions, pre-ship distributions that reveal where a threshold constant came from (e.g., limit = 128 * 1024 matching the p99 of an upload-size column), and data-pipeline scale evidence for migrations/backfills. Strongest for flag-gated code, experiment-driven ships, data migrations, and "where did this number come from" questions.
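The "where did this number come from" check mentioned for the analytics warehouse can be sketched as a percentile comparison. This is a minimal sketch under stated assumptions: the upload sizes are synthetic, the 5% tolerance is arbitrary, and the helper names are hypothetical. A real investigation would read the column from the warehouse and treat a match as suggestive, not conclusive.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile (pct as an integer 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def threshold_matches(constant, observed, tolerance=0.05):
    """True when the observed value is within `tolerance` (relative) of the code constant."""
    return abs(constant - observed) <= tolerance * constant


LIMIT = 128 * 1024  # constant from the target code, as in the example above

# Synthetic upload sizes in bytes, invented for illustration: 98 small uploads
# plus two large ones. Real values would come from the warehouse column.
upload_sizes = [10_000] * 98 + [130_000, 900_000]

p99 = nearest_rank_percentile(upload_sizes, 99)
is_match = threshold_matches(LIMIT, p99)
```

A match like this belongs under "What We Can Reasonably Infer" unless some written source states that the constant was derived from the distribution.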
Only skip with an explicit, written justification that goes in the final "Sources Consulted" section. Two valid reasons:
"It's pure feature code, error tracking won't have anything" is not sufficient, and neither is "I doubt long-form docs would have this." Run the search; let the null result speak. The cost of an investigator returning empty is one subagent. The cost of missing a design doc that actually exists is a wrong answer.
If your scope assessment suggests a single-commit trivial target where the PR description already contains the complete answer, you may answer inline only after confirming all seven available category searches would be redundant. Say so explicitly. This should be rare.
Spawn one synthesizer subagent:
- subagent_type: generalPurpose
- model: claude-opus-4-8-thinking-xhigh
- readonly: false (agent mode). The synthesizer's quality check spot-verifies citations, which can require MCP access. Readonly/Ask mode strips MCPs and defeats that.

The synthesizer gets:
- references/epistemics.md
- references/synthesizer-prompt.md

Its job is the final output: a confidence-weighted, evidence-cited narrative with clearly separated "what we know" and "what we're inferring" sections, plus honest acknowledgment of gaps and null-result sources.
Take the synthesizer's output and present it to the user. You may lightly edit for clarity or add context from the conversation, but do not rewrite the confidence language. The epistemic framing is the product. Dropping the hedges to sound more authoritative is the exact failure mode this skill exists to prevent.
The final output uses this structure. Adapt as needed, but keep the confidence separation intact.
The Question. Restate what the user asked, concisely.
The Code in Question. File paths, line ranges, and key symbols. One or two lines so the reader is anchored.
What We Found (direct evidence). Claims with explicit citations (PR #, ticket ID, doc URL, chat permalink, commit hash, code comment with file:line). Each bullet is a thing we have textual evidence for. Use present tense and quote or paraphrase the source.
What We Can Reasonably Infer. Claims well-supported by indirect evidence or combinations of signals, but not explicitly stated anywhere. Each bullet must explain the inference chain: "Given A and B, it's likely that C." Use hedged language ("appears to", "likely", "suggests").
Competing Hypotheses. If the evidence fits multiple stories, list them. For each, give the hypothesis, the evidence for it, and the evidence against it. Don't force a winner when the record doesn't support one. (Skip this section if there's a clear answer.)
What We Don't Know. Explicit gaps. Questions the user asked that the evidence didn't answer. Sources we searched and came up empty. Be specific. "We searched the issue tracker for 'rate limit' and found no ticket discussing this specific threshold" is more useful than "we don't know why."
Sources Consulted. One line per investigator, including the ones that returned nothing. The reader should see at a glance (a) which MCPs were queried, (b) which came back empty, and (c) which were skipped and why. This coverage map lets the user judge breadth and redirect if something obvious was missed.
Format each line as: - <Source>: <what was searched>. <what was found, or "no relevant results," or "skipped. reason">.
Example:
- git log --follow backend/retry.ts, PRs #49074, #47812. Found PR #49074 introduced exponential backoff and linked ENG-4421.
- retry_count metric and monitors around 2024-08-14. Found monitor "Upstream 5xx rate > 1%" created same day as PR #49074.
- retry.ts. Found issue SENTRY-3821 spiking in the week before the PR.
- <your_analytics_db>.<schema>.stg_backend_upstream_retry for the 30-day window around 2024-08-14. Daily failure-classified event count fell from ~1.2k/day pre-PR to <50/day post-PR. Also checked system.query.history for relevant migration queries. None found.

After the Sources Consulted block, if the user's why question is a precursor to actually changing this code, convert the lineage findings into a Preserve / Change / Avoid / Risk constraint set suitable for planning the change.
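The Sources Consulted line format described above can be sketched as a small formatter that treats a finding, a null result, and a justified skip as three outcomes of the same line shape. The function name `format_source_line`, the source labels, and the second sample entry are assumptions made for illustration.

```python
def format_source_line(source, searched, found=None, skipped_reason=None):
    """Render one Sources Consulted line: a finding, a null result, or a justified skip."""
    if skipped_reason:
        outcome = f"skipped. {skipped_reason}"
    elif found:
        outcome = found
    else:
        outcome = "no relevant results"
    # Strip a trailing period so the line always ends with exactly one.
    return f"- {source}: {searched}. {outcome.rstrip('.')}."


lines = [
    format_source_line(
        "Source control",  # label is an assumption; the skill only specifies <Source>
        "git log --follow backend/retry.ts, PRs #49074, #47812",
        found="Found PR #49074 introduced exponential backoff and linked ENG-4421.",
    ),
    # Hypothetical null result, for illustration only.
    format_source_line("Issue tracker", "tickets mentioning 'rate limit'"),
]
```

Keeping the null case in the same code path as the positive case makes it harder to drop an empty source from the coverage map by accident.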
- references/epistemics.md. Confidence tiers and phrasing guide. The synthesizer must follow it.
- references/investigator-prompt.md. Base prompt template for investigator subagents.
- references/source-playbook.md. Index pointing at the category playbooks below.
- references/sources/*.md. One self-contained example playbook per category, plus cross-cutting incident-postmortem.md. Give an investigator the single file that matches its category and adapt it to the available MCP.
- references/synthesizer-prompt.md. Prompt template for the synthesizer subagent, including the output format.