skills/ds/SKILL.md
This skill should be used when the user asks to 'start data analysis', 'brainstorm analysis approach', 'plan a data project', 'clarify analysis requirements', 'explore this dataset', 'what's in this data', 'what should I analyze here', 'set up a new study', or needs the data science workflow. Exploration starts with questions, not data pulls, and the workflow enforces that ordering.
Before starting, check for an existing handoff:
- `.planning/HANDOFF.md` exists

| Level | Remaining Context | Action |
|-------|------------------|--------|
| Normal | >35% | Proceed normally |
| Warning | 25-35% | Complete current question round, then trigger ds-handoff |
| Critical | ≤25% | Immediately trigger ds-handoff; do not start new question rounds |
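The handoff check and the thresholds in this table can be expressed as two small functions. This is a minimal sketch, assuming remaining context is available as a percentage from 0 to 100 (how it is measured is not specified here); `handoff_exists`, `context_level`, and `ACTIONS` are hypothetical names.

```python
from pathlib import Path

# Actions copied from the context-budget table.
ACTIONS = {
    "Normal": "Proceed normally",
    "Warning": "Complete current question round, then trigger ds-handoff",
    "Critical": "Immediately trigger ds-handoff; do not start new question rounds",
}


def handoff_exists(planning_dir: str = ".planning") -> bool:
    """Return True if a previous session left a HANDOFF.md behind."""
    return (Path(planning_dir) / "HANDOFF.md").exists()


def context_level(remaining_pct: float) -> str:
    """Map remaining context (0-100) to a level.

    Normal: > 35, Warning: 25 < x <= 35, Critical: <= 25.
    """
    if not 0 <= remaining_pct <= 100:
        raise ValueError(f"remaining_pct out of range: {remaining_pct}")
    if remaining_pct > 35:
        return "Normal"
    if remaining_pct > 25:
        return "Warning"
    return "Critical"
```

Note that the boundaries follow the table literally: exactly 35% is still Warning, and exactly 25% is already Critical.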
Refine vague analysis requests into clear objectives through Socratic questioning. NO data exploration, NO coding - just questions and objectives.
Load shared enforcement first.
Auto-load all constraints matching `applies-to: ds`:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py ds
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
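The behavior of `load-constraints.py` is not shown in this skill. As a hypothetical sketch only, assuming each constraint is a markdown file whose leading `---` frontmatter contains a line such as `applies-to: ds, dev`, selecting the constraints for `ds` could look like this. The file format, `applies_to`, and `select_constraints` are all assumptions, not the script's actual interface.

```python
from __future__ import annotations

import re

# Hypothetical frontmatter line, e.g. "applies-to: ds, dev".
_APPLIES_TO = re.compile(r"^applies-to:\s*(.+)$", re.MULTILINE)


def applies_to(text: str) -> set[str]:
    """Return the workflow names listed in a constraint's frontmatter."""
    if not text.startswith("---\n"):
        return set()
    end = text.find("\n---", 4)
    if end == -1:
        return set()
    match = _APPLIES_TO.search(text[4:end])
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def select_constraints(constraints: dict[str, str], workflow: str) -> list[str]:
    """Sorted names of constraints whose applies-to list includes `workflow`."""
    return sorted(
        name for name, text in constraints.items() if workflow in applies_to(text)
    )
```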
<EXTREMELY-IMPORTANT>
## The Iron Law of DS Brainstorming

ASK QUESTIONS BEFORE ANYTHING ELSE. This is not negotiable.

Before loading data, before exploring, before proposing approaches, you MUST:

STOP - You're about to load data or explore before asking questions. Don't do this.
</EXTREMELY-IMPORTANT>
| DO | DON'T |
|----|-------|
| Ask clarifying questions | Load or explore data |
| Understand analysis objectives | Run queries |
| Identify data sources | Profile data (that's /ds-plan) |
| Define success criteria | Create visualizations |
| Ask about constraints | Write analysis code |
| Check if replicating existing analysis | Propose specific methodology |
Brainstorm answers: WHAT and WHY.
Plan answers: HOW (data profile + tasks), in a separate skill.
Use AskUserQuestion immediately:

When multiple analysis questions arise, batch them into ONE AskUserQuestion call.

Batched (fast, one round-trip):
```python
AskUserQuestion(questions=[
    {"question": "Primary dataset?", "options": [{"label": "CRSP"}, {"label": "Compustat"}, {"label": "Both merged"}]},
    {"question": "Sample period?", "options": [{"label": "2000-2024"}, {"label": "2010-2024"}, {"label": "Custom"}]},
    {"question": "Frequency?", "options": [{"label": "Monthly"}, {"label": "Quarterly"}, {"label": "Annual"}]}
])
```
When to batch: after understanding the research question, if 3+ independent questions arise, batch them.

When NOT to batch: if a question's answer changes what other questions to ask (e.g., dataset choice affects available variables).
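The batching rule amounts to grouping questions by dependency: every question whose prerequisites are already answered can go into the same AskUserQuestion call. A minimal sketch, assuming each question is tagged with the ids of the questions it depends on; the `depends_on` mapping and `plan_rounds` are hypothetical and not part of AskUserQuestion itself.

```python
from __future__ import annotations


def plan_rounds(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group questions into rounds; each round becomes one batched call.

    depends_on maps a question id to the ids whose answers it needs first.
    """
    remaining = {q: set(deps) for q, deps in depends_on.items()}
    answered: set[str] = set()
    rounds: list[list[str]] = []
    while remaining:
        # Questions whose prerequisites are all answered are independent
        # of each other, so they can share one call.
        ready = sorted(q for q, deps in remaining.items() if deps <= answered)
        if not ready:
            raise ValueError(f"circular or unknown dependencies: {sorted(remaining)}")
        rounds.append(ready)
        answered.update(ready)
        for q in ready:
            del remaining[q]
    return rounds


rounds = plan_rounds({
    "dataset": set(),
    "period": set(),
    "frequency": set(),
    "variables": {"dataset"},  # which variables exist depends on the dataset
})
# [['dataset', 'frequency', 'period'], ['variables']]
```

The first round has three independent questions, which meets the 3+ threshold for batching; the variables question waits for a second round because its options depend on the dataset answer.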
CRITICAL: Ask early if replicating existing work:
```yaml
AskUserQuestion:
  question: "Are you replicating or extending existing analysis?"
  options:
    - label: "Replicating existing"
      description: "Must match specific methodology/results"
    - label: "Extending existing"
      description: "Building on prior work with modifications"
    - label: "New analysis"
      description: "Fresh analysis, methodology flexible"
```
When replicating:
After objectives are clear:
Use AskUserQuestion to let the user select the preferred approach.

After selecting an approach:
`.planning/SPEC.md`:

```markdown
# Spec: [Analysis Name]

> **For Claude:** After writing this spec, discover and load the ds-plan skill for Phase 2:
> Read `${CLAUDE_SKILL_DIR}/../../skills/ds-plan/SKILL.md` and follow its instructions.

## Objective
[What question this analysis answers]

## Data Sources
- [Source 1]: [location, format, time period]
- [Source 2]: [location, format, time period]

## Requirements
Assign each requirement a unique ID using `CATEGORY-NN` format (e.g., `DATA-01`, `VIZ-02`, `STAT-03`). Categories come from natural groupings in the analysis.

| ID | Requirement | Scope |
|----|-------------|-------|
| [CAT-01] | [Requirement 1] | v1 |
| [CAT-02] | [Requirement 2] | v1 |

Scope: `v1` = must complete, `v2` = nice to have, `out-of-scope` = explicitly excluded.

## Success Criteria
- [ ] [CAT-01] [Criterion]
- [ ] [CAT-02] [Criterion]

## Constraints
- Replication: [yes/no - if yes, reference source]
- Timeline: [deadline]
- Methodology: [required approaches]

## Chosen Approach
[Description of selected approach]

## External Skills Likely In Play
<!-- List plugin skills whose data/tools will be touched. ds-plan Step 5b will Glob their references/ and examples/ before drafting tasks. -->
- [e.g. wrds: holdings/voting data via SAS on WRDS grid]
- [e.g. gemini-batch: LLM extraction for text fields]
- [none]

## Rejected Alternatives
- Option B: [why rejected]
- Option C: [why rejected]
```
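The `CATEGORY-NN` convention and the allowed scope values can be checked mechanically before the spec goes to review. A minimal sketch, assuming the Requirements table keeps the `| ID | Requirement | Scope |` layout from the template and that categories are uppercase letters; `check_requirements` is a hypothetical helper.

```python
from __future__ import annotations

import re

ID_PATTERN = re.compile(r"^[A-Z]+-\d{2}$")  # e.g. DATA-01, VIZ-02
VALID_SCOPES = {"v1", "v2", "out-of-scope"}


def check_requirements(table_markdown: str) -> list[str]:
    """Return problems found in a Requirements table (empty list means OK)."""
    problems: list[str] = []
    seen: set[str] = set()
    for line in table_markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip non-table lines, the header row, and the |----| separator.
        if len(cells) != 3 or cells[0] == "ID" or set(cells[0]) <= {"-"}:
            continue
        req_id, _requirement, scope = cells
        if not ID_PATTERN.match(req_id):
            problems.append(f"bad ID format: {req_id}")
        if req_id in seen:
            problems.append(f"duplicate ID: {req_id}")
        seen.add(req_id)
        if scope not in VALID_SCOPES:
            problems.append(f"bad scope for {req_id}: {scope}")
    return problems
```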
Checkpoint type: human-verify (SPEC.md content is machine-verifiable)
Before transitioning to ds-plan, execute this gate:
1. IDENTIFY → SPEC.md exists at `.planning/SPEC.md`
2. RUN → Read(".planning/SPEC.md")
3. READ → Verify it contains the Objective, Data Sources, Requirements (with CATEGORY-NN IDs), and Success Criteria sections
4. VERIFY → User has confirmed the objectives via an AskUserQuestion response (not agent self-assessment).
   Check: was AskUserQuestion called, and did the user respond affirmatively?
5. CLAIM → Only proceed to ds-plan if ALL checks pass
If ANY check fails, do NOT proceed. Fix the gap first.
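Steps 1-3 of the gate can be automated; step 4 cannot, because it depends on the user's actual AskUserQuestion answer, so it enters as an explicit flag. A minimal sketch, assuming the section headings match the SPEC.md template (`## Objective`, `## Data Sources`, `## Requirements`, `## Success Criteria`); `check_spec_text` and `spec_gate` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

REQUIRED_SECTIONS = ["## Objective", "## Data Sources", "## Requirements", "## Success Criteria"]
REQ_ID = re.compile(r"\b[A-Z]+-\d{2}\b")


def check_spec_text(text: str, user_confirmed: bool) -> list[str]:
    """Steps 3-4: return failed checks for the SPEC.md contents."""
    failures: list[str] = []
    lines = {line.strip() for line in text.splitlines()}
    for heading in REQUIRED_SECTIONS:
        if heading not in lines:
            failures.append(f"missing section: {heading}")
    if not REQ_ID.search(text):
        failures.append("no CATEGORY-NN requirement IDs found")
    # Must reflect the user's AskUserQuestion answer, never self-assessment.
    if not user_confirmed:
        failures.append("objectives not confirmed by the user via AskUserQuestion")
    return failures


def spec_gate(spec_path: str = ".planning/SPEC.md", user_confirmed: bool = False) -> list[str]:
    """Steps 1-2: locate and read SPEC.md; an empty result means proceed."""
    path = Path(spec_path)
    if not path.exists():
        return [f"{spec_path} does not exist"]
    return check_spec_text(path.read_text(encoding="utf-8"), user_confirmed)
```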
Self-assessment is not user confirmation. If the user hasn't explicitly approved the objectives via AskUserQuestion, you haven't finished brainstorm.
Declare brainstorm complete when:
- `.planning/SPEC.md` written

This skill is Phase 1 of the 5-phase /ds workflow:
```
┌──────────────┐    ┌──────────┐    ┌──────────────┐    ┌───────────┐    ┌───────────┐
│ ds-brainstorm│───→│ ds-plan  │───→│ ds-implement │───→│ ds-review │───→│ ds-verify │
│ SPEC.md      │    │ PLAN.md  │    │ LEARNINGS.md │    │ APPROVED? │    │ COMPLETE? │
└──────────────┘    └──────────┘    └──────────────┘    └─────┬─────┘    └─────┬─────┘
                                            ↑                 │                │
                                            └─ CHANGES REQ'D ─┘                │
                                            ↑                                  │
                                            └──── NEEDS WORK ──────────────────┘
```
DO NOT:
After writing SPEC.md, update it with structured frontmatter:
```yaml
---
phase: ds-brainstorm
status: completed
implements: [all requirement IDs assigned in this phase]
requires: [user input]
provides: [.planning/SPEC.md]
affects: [.planning/]
tags: [brainstorm, objectives, requirements]
---
```
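Filling in `implements` by hand is error-prone, so the IDs can be collected from the spec body. A minimal sketch, assuming requirement IDs follow the `CATEGORY-NN` pattern and that any existing leading frontmatter should be replaced rather than duplicated; `add_frontmatter` is hypothetical, and the YAML is written as plain strings to avoid a YAML dependency.

```python
import re

REQ_ID = re.compile(r"\b[A-Z]+-\d{2}\b")


def add_frontmatter(spec_body: str) -> str:
    """Prepend ds-brainstorm frontmatter to a SPEC.md body (idempotent)."""
    # Drop an existing leading frontmatter block so re-running does not stack headers.
    if spec_body.startswith("---\n"):
        end = spec_body.find("\n---\n", 4)
        if end != -1:
            spec_body = spec_body[end + len("\n---\n"):]
    # Unique IDs in order of first appearance.
    ids = list(dict.fromkeys(REQ_ID.findall(spec_body)))
    header = "\n".join([
        "---",
        "phase: ds-brainstorm",
        "status: completed",
        f"implements: [{', '.join(ids)}]",
        "requires: [user input]",
        "provides: [.planning/SPEC.md]",
        "affects: [.planning/]",
        "tags: [brainstorm, objectives, requirements]",
        "---",
    ])
    return header + "\n" + spec_body
```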
One-liner rule: Must be SUBSTANTIVE. Good: "Panel regression study of CEO pay-performance sensitivity using CRSP-Compustat 2000-2024". Bad: "Brainstorm complete".
After completing brainstorm, dispatch the spec reviewer before proceeding:
```
Phase 1: Brainstorm -> SPEC.md written
  -> Dispatch ds-spec-reviewer subagent
  -> If APPROVED -> proceed to ds-plan
  -> If ISSUES_FOUND -> fix SPEC.md -> re-dispatch reviewer (max 5 iterations)
```
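The review loop is a bounded retry. A minimal sketch, assuming the reviewer's verdict arrives as the string `APPROVED` or `ISSUES_FOUND`; `review` and `fix` are hypothetical stand-ins for dispatching the ds-spec-reviewer subagent and editing SPEC.md. What should happen after five rejected reviews is not specified here, so this sketch simply reports failure (for example, to escalate to the user), which is an assumption.

```python
from __future__ import annotations

from typing import Callable


def review_until_approved(
    review: Callable[[], str],
    fix: Callable[[], None],
    max_iterations: int = 5,
) -> tuple[bool, int]:
    """Dispatch the reviewer up to max_iterations times.

    Returns (approved, number_of_reviews_run).
    """
    for attempt in range(1, max_iterations + 1):
        verdict = review()
        if verdict == "APPROVED":
            return True, attempt
        if verdict != "ISSUES_FOUND":
            raise ValueError(f"unexpected reviewer verdict: {verdict!r}")
        if attempt < max_iterations:
            fix()  # address the reported issues before re-dispatching
    return False, max_iterations
```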
Step 1: Discover and load the spec reviewer skill:
Read `${CLAUDE_SKILL_DIR}/../../skills/ds-spec-reviewer/SKILL.md` and follow its instructions.
Step 2: Only after reviewer returns APPROVED, discover and load the next phase:
Read `${CLAUDE_SKILL_DIR}/../../skills/ds-plan/SKILL.md` and follow its instructions.
Fallback (if Read fails): `/ds-plan`
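The discover-then-fallback pattern can be sketched as a path lookup relative to the current skill directory. This assumes, purely for illustration, that `CLAUDE_SKILL_DIR` can be read as an environment variable (in this skill it appears as a substitution variable); `next_phase_instructions` is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def next_phase_instructions(skill_name: str, skill_dir: Optional[str] = None) -> str:
    """Return the sibling skill's SKILL.md text, or its slash-command fallback."""
    base = skill_dir or os.environ.get("CLAUDE_SKILL_DIR", ".")
    # Mirrors ${CLAUDE_SKILL_DIR}/../../skills/<name>/SKILL.md
    path = Path(base) / ".." / ".." / "skills" / skill_name / "SKILL.md"
    try:
        return path.read_text(encoding="utf-8")
    except OSError:
        # Read failed: fall back to invoking the skill as a slash command.
        return f"/{skill_name}"
```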
CRITICAL: Do not skip to analysis implementation. Phase 2 profiles data and breaks down the analysis into discrete, manageable tasks.

CRITICAL: Do not skip spec review. An unreviewed spec means profiling the wrong data and planning the wrong analysis.