gtd/SKILL.md
Warrant-first research GTD system. Manages the capture-clarify-organize-reflect-engage cycle for causal inference research. Scaffolds hypotheses/, insights/, decisions/ directories. Interrogates conjectures, files results, tracks binding decisions, checks pipeline freshness, drives the courtroom checklist.
npx skillsauth add scunning1975/mixtapetools gtd
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Research with an AI thinking partner is iterated dialogue between human judgment and agent throughput, where every cycle either strengthens a warrant or kills a claim. The harness is whatever machinery makes that dialogue fast, honest, and recoverable.
Four elements:
| Element | Definition | Role of dialogue |
|---|---|---|
| Frame | A question worth asking | Interrogated by dialogue |
| Work | A way to interrogate it | Supervised by dialogue (agents make this cheap) |
| Warrant | A way to know you've earned the answer | Built by dialogue; this is the product |
| Dialogue | The substrate across all three | Human and agent argue their way to claims that hold |
The binding constraint has shifted. Pre-agents, Work was binding (coding, cleaning, drafting). Agents make Work cheap. The binding constraint is now Frame and Warrant — what to ask, and whether you've earned the answer. Design the harness around that reallocation.
/gtd init
Creates the directory structure in the current project:
hypotheses/INDEX.md — DAG of testable claims
insights/INDEX.md — Atomic findings with provenance
decisions/INDEX.md — Binding commitments that constrain the pipeline
dashboard.html — Visual status (serves from localhost)
scripts/build_dashboard_data.py — Regenerates dashboard_data.json
Then asks: "What's the first claim you want to test?"
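A minimal sketch of what the scaffolding step might look like, assuming the three index files listed above are plain Markdown. The `scaffold` function and the starter contents are hypothetical, not the skill's actual implementation; the dashboard files are left out because their contents are not described here.

```python
from pathlib import Path

# Paths come from the list above; the starter contents are placeholders (assumption).
SCAFFOLD = {
    "hypotheses/INDEX.md": "# Hypothesis DAG\n",
    "insights/INDEX.md": "# Insights Log\n",
    "decisions/INDEX.md": "| ID | Decision | Date | Rationale |\n|---|---|---|---|\n",
}


def scaffold(root: Path) -> list[Path]:
    """Create any missing index files under root; never overwrite existing ones."""
    created = []
    for rel, text in SCAFFOLD.items():
        path = root / rel
        if path.exists():
            continue  # keep whatever the project already has
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        created.append(path)
    return created
```

Skipping existing files makes the step safe to rerun in a project that is already set up.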
/gtd conjecture
The clarify step. Adversarial interrogation:
hypotheses/HXX_slug.md and update INDEX.
/gtd insight
File a result:
Writes insights/YYYY-MM-DD_slug.md, updates the linked hypothesis, regenerates dashboard_data.json.
/gtd decide
Commit a binding design choice:
Writes to decisions/INDEX.md. Updates CLAUDE.md if the decision persists across sessions.
/gtd pipeline
Check freshness:
Runs python3 scripts/build_dashboard_data.py and reports.
/gtd status
Quick orientation: hypothesis DAG, pipeline freshness, next actions.
/gtd courtroom
Walk through the DiD checklist stage by stage:
For each: present the exhibit, interrogate it, confirm or flag. Populates the manuscript view as we go. After completion, draft the narrative from confirmed material in the chosen voice.
Every quasi-experimental study presents its case. The courtroom is the general form — not just DiD but any design that requires:
Two cross-cutting standards apply to ALL stages:
hypotheses/HXX_slug.md
---
id: H01a
status: conjecture | testing | confirmed | rejected | complicated
parent: H01
date_proposed: 2026-05-19
---
## Claim
[One sentence, testable.]
## Courtroom
- Estimand: [what parameter]
- Population: [on whom]
- Variation: [what identifies it]
- Falsification: [what kills it]
## Evidence
- [links to insights, added as they accumulate]
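One way to sanity-check a hypothesis file against the template above is to read its header and validate the fields. This is a sketch only: `read_frontmatter` and `check_hypothesis` are hypothetical helpers, the parser handles just flat `key: value` lines, and treating `id`, `status`, and `date_proposed` as required (with `parent` optional for top-level hypotheses) is an assumption.

```python
VALID_STATUSES = {"conjecture", "testing", "confirmed", "rejected", "complicated"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


def check_hypothesis(meta: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the header looks usable."""
    problems = []
    # Assumption: these three fields are required, `parent` is optional.
    for field in ("id", "status", "date_proposed"):
        if field not in meta:
            problems.append(f"missing field: {field}")
    status = meta.get("status")
    if status is not None and status not in VALID_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems
```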
insights/YYYY-MM-DD_slug.md
---
date: 2026-04-10
updates: H01a
result: confirmed | rejected | complicated
stage: [2, 4] # optional — courtroom stage(s) this speaks to. Overrides keyword matching.
script: scripts/r/05_estimate_did.R
output: output/figures/event_study.pdf
---
## Finding
[The fact. Numbers. Script path. What it means for the hypothesis.]
## Key Numbers
[Table with point estimate, SE, CI, p-value, N]
## Context
[Specification details, baseline, relative magnitude]
decisions/INDEX.md
Table format. One row per binding decision:
| ID | Decision | Date | Rationale |
|---|---|---|---|
| D01 | Primary estimator is TWFE with district and week FE | 2026-04-01 | Sufficient pre-periods; no staggered-timing bias |
conjecture → testing: First pipeline script assigned to test this hypothesis
testing → confirmed: Positive evidence + falsification passes (Stages 2-4 confirmed)
testing → rejected: Evidence contradicts + falsification confirms the negative
testing → complicated: Evidence mixed OR falsification fails
complicated → confirmed: Complication resolved (new evidence or new design)
complicated → rejected: Further investigation confirms failure
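The transitions listed above can be sketched as a small lookup table, together with the direct conjecture → rejected jump that the rules allow. `ALLOWED` and `can_transition` are hypothetical names, and treating confirmed and rejected as terminal is an assumption; the document only says that complicated is not terminal.

```python
# Status -> statuses it may move to, per the transition list.
# Assumption: confirmed and rejected are terminal.
ALLOWED = {
    "conjecture": {"testing", "rejected"},
    "testing": {"confirmed", "rejected", "complicated"},
    "complicated": {"confirmed", "rejected"},
    "confirmed": set(),
    "rejected": set(),
}


def can_transition(current: str, new: str, falsification_passed: bool = False) -> bool:
    """Check a status change; confirmed also requires that falsification passed."""
    if new not in ALLOWED.get(current, set()):
        return False
    if new == "confirmed" and not falsification_passed:
        return False
    return True
```

For example, `can_transition("testing", "confirmed")` is refused until the caller passes `falsification_passed=True`, which mirrors the Stage 3 requirement.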
Rules:
- No hypothesis moves to confirmed without passing falsification (Stage 3)
- A hypothesis can move directly from conjecture to rejected (if "kills it" condition met immediately)
- complicated is NOT terminal; it requires resolution
- If any child is complicated, the parent is at most testing

| Level | Name | Contains | Example |
|---|---|---|---|
| 1 | Cleaning | Raw → clean; format standardization | 00_clean_survey.py |
| 2 | Derived | Clean → derived variables; joins, constructs | 02_build_panel.py |
| 3 | Classification | Derived → treatment/control assignment | 03_classify_treated.py |
| 4 | Figures | Descriptive outputs, maps, timelines | 04_descriptive_figures.R |
| 5 | Estimation | Causal inference; the main results | 05_estimate_did.R |
Rules:
- Every output in output/figures/ must map to exactly one pipeline script

Freshness is computed dynamically by comparing file modification times:
- output.mtime >= script.mtime → FRESH (output generated after script was last modified)
- output.mtime < script.mtime → STALE (script changed since output was generated)

Freshness is NEVER stored as a permanent field. It is always computed at runtime by build_dashboard_data.py. The fresh field in insight frontmatter is a snapshot at filing time; the dashboard recomputes it.
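The mtime comparison can be sketched in a few lines. This is not the actual build_dashboard_data.py logic; `freshness` is a hypothetical helper, and the `MISSING` label for an output that does not exist yet is an assumption, since only FRESH and STALE are defined here.

```python
from pathlib import Path


def freshness(script: Path, output: Path) -> str:
    """Compare modification times at call time; nothing is stored."""
    if not output.exists():
        return "MISSING"  # assumption: not a status defined by the document
    if output.stat().st_mtime >= script.stat().st_mtime:
        return "FRESH"  # output generated after the script was last modified
    return "STALE"  # script changed since the output was generated
```

Because the result depends only on the two files' current timestamps, calling it again after editing the script flips the answer without any bookkeeping.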
hypotheses/INDEX.md: Hierarchical DAG
# Hypothesis DAG
## H01 — Main Claim
Status: **testing**
One sentence description.
### H01a — Sub-claim
Status: **confirmed** (date)
One sentence description.
Two levels: parent hypotheses (##) and children (###). Each entry has bold status inline.
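Because the index uses only two heading levels and an inline bold status, it can be read back with two regular expressions. A rough sketch, assuming each heading has the form "ID, separator, title" and each status line starts with "Status:"; `parse_dag` is a hypothetical helper, not part of the skill.

```python
import re

# "## H01 <sep> Main Claim" or "### H01a <sep> Sub-claim"; the separator is any single token.
HEADING = re.compile(r"^(#{2,3})\s+(\S+)\s+\S+\s+(.+)$")
# "Status: **testing**", optionally followed by a date in parentheses.
STATUS = re.compile(r"^Status:\s+\*\*(\w+)\*\*")


def parse_dag(text: str) -> dict[str, dict]:
    """Map hypothesis id -> title, parent id (None for top level), and status."""
    nodes = {}
    parent = None
    current = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            hashes, hid, title = heading.groups()
            if hashes == "##":
                parent = hid  # later ### entries attach to this parent
                nodes[hid] = {"title": title, "parent": None, "status": None}
            else:
                nodes[hid] = {"title": title, "parent": parent, "status": None}
            current = hid
            continue
        status = STATUS.match(line)
        if status and current is not None:
            nodes[current]["status"] = status.group(1)
    return nodes
```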
insights/INDEX.md: Table
# Insights Log
| Date | Finding | Hypothesis | Status |
|---|---|---|---|
| 2026-04-15 | [Placebo is null](file.md) | H01a | confirmed |
| 2026-04-10 | [Urban ATT = 2.3pp](file.md) | H01a | confirmed |
Most recent first. Links to individual insight files.
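Keeping the log most recent first amounts to inserting each new row directly under the table's separator line. A sketch under that assumption; `add_insight_row` is a hypothetical helper, and it does not re-sort, so it relies on insights being filed in date order.

```python
def add_insight_row(index_text: str, date: str, finding: str, link: str,
                    hypothesis: str, status: str) -> str:
    """Insert a new row at the top of the insights table and return the new text."""
    row = f"| {date} | [{finding}]({link}) | {hypothesis} | {status} |"
    lines = index_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("|---"):
            lines.insert(i + 1, row)  # directly below the header separator
            return "\n".join(lines) + "\n"
    raise ValueError("no table separator row found")
```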
When /gtd courtroom confirms a stage:
- build_dashboard_data.py regenerates the JSON

When /gtd courtroom flags a stage as complicated:
- result: complicated
- complicated

Only add hooks for failures that are silently wrong (produce plausible but incorrect output).
Do hook: Classification file changes but county file not rebuilt → wrong treatment set → wrong ATT → presented wrong numbers. Silent failure. Hook it.
Don't hook: Missing figure → LaTeX won't compile. Visible failure. Don't hook it.
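Starter hook (adapt paths to your project). This version assumes the Claude Code settings.json hook schema, where each matcher holds a hooks list of command entries and the tool call arrives as JSON on stdin, so grep reads it directly:
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write",
      "hooks": [{
        "type": "command",
        "command": "if grep -q 'LINCHPIN_FILE_NAME'; then echo '⚠️ PIPELINE DEPENDENCY: Rebuild downstream'; fi"
      }]
    }]
  }
}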
The dashboard (dashboard.html) reads from dashboard_data.json generated by scripts/build_dashboard_data.py. It shows:
Serve with: cd project_root && python3 -m http.server 8080
| GTD Stage | Research Equivalent | Mechanism |
|---|---|---|
| Capture | Ideas emerge through dialogue | The chat itself |
| Clarify | Courtroom checklist + interrogation | /gtd conjecture or /gtd courtroom |
| Organize | Commit to directory | hypotheses/ decisions/ CLAUDE.md |
| Reflect | Dashboard review | dashboard.html |
| Engage | Run the pipeline | scripts/ → output/ |