skills/ds-validate/SKILL.md
Validate analysis outputs against SPEC.md requirements using DQ checks.
Announce: "Using ds-validate (Phase 3.5) to validate analysis outputs against SPEC.md requirements."
Phase 3.5 of the DS workflow (between implement and review). Maps every SPEC.md requirement to an output artifact and runs data quality checks.
<EXTREMELY-IMPORTANT>
## The Iron Law of Validation

NO REVIEW WITHOUT VALIDATION. This is not negotiable.
ds-review MUST NOT start until .planning/VALIDATION.md confirms all requirements have outputs. Validation is the DS equivalent of test coverage — without it, review is theater.
</EXTREMELY-IMPORTANT>
DS validation does NOT auto-fill gaps. Dev's test-gap-auditor can write missing tests. DS gaps require human judgment — a wrong output means a wrong analysis, not just a missing test. When gaps are found, present them to the user and let the user decide: fix (return to implement) or accept (proceed to review).
Before running runtime DQ checks, run the static analysis constraint check suite:
bash "${CLAUDE_SKILL_DIR}/../../scripts/check-all-ds.sh" "$(pwd)"
This runs all DS constraint check scripts (determinism, join audits, idempotency, error handling, schema contracts, standard errors, visualization integrity).
If any check FAILS: Report the failures in LEARNINGS.md. These are code quality issues in the analysis scripts that must be fixed before proceeding. Dispatch a fix subagent if needed.
If all checks PASS: Proceed to runtime DQ checks.
This flowchart IS the specification. If prose elsewhere and this diagram disagree, the diagram wins.
      ┌────────────────────────────────────────────────┐
      │ 0. RUN static analysis suite (check-all-ds.sh) │
      └───────────────────────┬────────────────────────┘
                    all pass? │
             ┌──── no ────────┴────── yes ──────┐
             ▼                                  ▼
    ┌──────────────────┐    ┌────────────────────────────────────────┐
    │ log to LEARNINGS │    │ 1-4. READ SPEC / PLAN / LEARNINGS,     │
    │ + dispatch fix   │    │      DISCOVER ds-checks.md             │
    │ subagent, re-run │    └───────────────────┬────────────────────┘
    └────────┬─────────┘                        ▼
             │              ┌────────────────────────────────────────┐
             │              │ 5. RUN ds-validate-coverage workflow   │
             │              │ (one read-only validator/requirement   │
             │              │  → JS gate, NOT a hand-tallied score)  │
             │              └───────────────────┬────────────────────┘
             │                                  ▼
             │              ┌────────────────────────────────────────┐
             │              │ 6. RENDER .planning/VALIDATION.md      │
             │              │    from the workflow result            │
             │              └───────────────────┬────────────────────┘
             │                          JS gate │
             │              ┌── gaps_found ─────┴── validated ──┐
             │              ▼                                   ▼
             │   ┌──────────────────────┐           ┌──────────────────────┐
             └──▶│ decision checkpoint: │           │ proceed to ds-review │
                 │ user fix-vs-accept   │           │ (gate: status=       │
                 │ (see Gate section);  │           │  validated)          │
                 │ accept ⇒ flip status │           └──────────────────────┘
                 │ to validated         │
                 └──────────────────────┘
Note: Steps 1-4 stay in this skill as the reading/discovery preamble — the workflow's own Discover phase re-resolves them authoritatively, but reading them here lets the skill present context and decide scope before invoking the workflow.
Read .planning/SPEC.md and extract every requirement:
For each requirement in SPEC.md:
- Extract the requirement description
- Note the success criteria
- Note the expected output (table, figure, file, etc.)
Read .planning/PLAN.md and extract:
Read .planning/LEARNINGS.md and extract:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/ds-checks.md and follow its instructions.
The per-requirement DQ fan-out and the COVERED/PARTIAL/MISSING + validated|gaps_found gate are owned by an ultracode workflow: a script, not hand-dispatched agents. The reason is that the validators return RAW DQ statuses and the gate is computed in pure JS from those statuses, so the model can no longer tally the composite by hand (the old honor-system gate). The workflow also keeps one validation transcript per requirement, isolated from the main context.
1. Resolve the cached workflow path:
WF=$(command ls -d ~/.claude/plugins/cache/edwinhu-plugins/workflows/*/workflows/ds-validate-coverage.js 2>/dev/null | sort -V | tail -1)
# Local-plugin fallback (running from source, cache empty):
[ -z "$WF" ] && WF="${CLAUDE_SKILL_DIR}/../../workflows/ds-validate-coverage.js"
echo "$WF"
2. Run it (full pass first; on a re-run after fixes, pass onlyChecks + priorReviews from the prior result):
Workflow({ scriptPath: "<WF>", args: { projectDir: "<abs project dir>", pluginRoot: "<abs .../workflows dir>" } })
The workflow fans out one read-only validator per in-scope SPEC requirement (running DQ1-DQ5 + M1 from ds-checks.md), then computes — in JS, from raw statuses — each requirement's classification and the overall status. It returns { overallPass, status, counts, scoreTable, findings, reviews, reviewersThatFlagged }.
Do NOT recompute or rationalize the gate — result.status and result.overallPass are computed in JS. Write .planning/VALIDATION.md using result.scoreTable as the Requirements Map, result.counts for the frontmatter totals, and result.findings under DQ Details:
status: <result.status> # validated | gaps_found — verbatim from the workflow
requirements_total / covered / partial / missing: <result.counts>
Requirements Map: <result.scoreTable>
DQ Details: <result.findings>
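As a rough illustration of the rendering step, the sketch below assembles the file text from a workflow result. It is written in Python for readability only. `render_validation` is a hypothetical helper, and the shapes of `counts`, `scoreTable`, and `findings` are assumptions: the source names these fields but does not specify their structure. The `status` value is copied through verbatim and never recomputed.

```python
from datetime import date
from typing import Optional


def render_validation(result: dict, today: Optional[str] = None) -> str:
    """Build VALIDATION.md text from a workflow result (hypothetical helper).

    Assumed shapes, not confirmed by the workflow:
      result["counts"]     -> {"total", "covered", "partial", "missing"} as ints
      result["scoreTable"] -> ready-made markdown table text
      result["findings"]   -> list of finding strings
    """
    counts = result["counts"]
    lines = [
        "---",
        f"status: {result['status']}",  # verbatim from the workflow
        f"date: {today or date.today().isoformat()}",
        f"requirements_total: {counts['total']}",
        f"covered: {counts['covered']}",
        f"partial: {counts['partial']}",
        f"missing: {counts['missing']}",
        "---",
        "# Output Validation",
        "## Requirements Map",
        result["scoreTable"],
        "## DQ Details",
    ]
    lines += [f"- {finding}" for finding in result["findings"]]
    lines += [
        "## Summary",
        f"- Requirements: {counts['total']} total",
        f"- Covered: {counts['covered']}",
        f"- Partial: {counts['partial']}",
        f"- Missing: {counts['missing']}",
    ]
    return "\n".join(lines) + "\n"
```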
The /goal fix loop stays in this skill: if status: gaps_found, present gaps (Step "Gate" below) and let the user decide fix vs accept. On a fix-and-re-validate cycle, re-run the workflow with onlyChecks: <prev result.reviewersThatFlagged> and priorReviews: <prev result.reviews> so unflagged requirements carry forward and only the gaps re-run live.
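The re-run arguments described above can be pictured as a plain mapping built from the previous result. This is a sketch in Python rather than the workflow's own language, and `build_rerun_args` is a hypothetical helper. The key names come from the text above; the contents of `reviewersThatFlagged` and `reviews` are passed through untouched because their structure is not specified here.

```python
def build_rerun_args(project_dir: str, plugin_root: str, prev_result: dict) -> dict:
    """Assemble args for a fix-and-re-validate pass (hypothetical helper)."""
    return {
        "projectDir": project_dir,
        "pluginRoot": plugin_root,
        # Only the requirements that were flagged last time re-run live.
        "onlyChecks": prev_result["reviewersThatFlagged"],
        # Unflagged requirements carry forward from the prior reviews.
        "priorReviews": prev_result["reviews"],
    }
```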
Each requirement is validated at four levels, in order:
| Level | Check | Example |
|-------|-------|---------|
| 1. Exists | Output file/variable present | output/results.csv exists |
| 2. Substantive | Real data, not empty | >0 rows, expected columns present |
| 3. DQ Passes | DQ1-DQ5 pass | No dupes on key, nulls handled, row counts trace |
| 4. Answers Question | Addresses SPEC.md requirement | Table includes specified variables |
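Levels 1 and 2 are mechanical enough to sketch for a CSV output. The example below is a minimal illustration under stated assumptions: the output is a CSV file with a header row, and `check_exists_and_substantive` is a hypothetical helper, not part of the workflow. Levels 3 and 4 depend on ds-checks.md and on the requirement's wording, so they are not covered here.

```python
import csv
from pathlib import Path
from typing import List


def check_exists_and_substantive(path: str, expected_columns: List[str]) -> dict:
    """Level 1 (file exists) and level 2 (>0 rows, expected columns present)."""
    p = Path(path)
    if not p.is_file():
        return {
            "exists": False,
            "substantive": False,
            "rows": 0,
            "missing_columns": list(expected_columns),
        }
    with p.open(newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []  # None when the file is empty
        rows = sum(1 for _ in reader)     # data rows, header excluded
    missing = [c for c in expected_columns if c not in header]
    return {
        "exists": True,
        "substantive": rows > 0 and not missing,
        "rows": rows,
        "missing_columns": missing,
    }
```

A header-only file passes level 1 but fails level 2, which matches the "Real data, not empty" criterion in the table.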
For each requirement, assign a classification:
| Classification | Criteria |
|---------------|----------|
| COVERED | All 4 validation levels pass |
| PARTIAL | Output exists but DQ issues found or doesn't fully address requirement |
| MISSING | No output found for this requirement |
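The classification rule can be written as a small pure function. The sketch below illustrates the rule only; the authoritative computation is the workflow's JS gate, and `classify` with its argument names is hypothetical. It assumes any status other than PASS (for example WARN) counts as a DQ issue, which is consistent with the PARTIAL row in the VALIDATION.md template.

```python
from typing import Dict


def classify(
    output_exists: bool,
    substantive: bool,
    dq_statuses: Dict[str, str],
    answers_question: bool,
) -> str:
    """Map the four validation levels to COVERED / PARTIAL / MISSING."""
    if not output_exists:
        return "MISSING"
    # Assumption: only an exact "PASS" counts as passing a DQ check.
    dq_passes = all(status == "PASS" for status in dq_statuses.values())
    if substantive and dq_passes and answers_question:
        return "COVERED"
    return "PARTIAL"
```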
---
status: validated | gaps_found
date: [ISO 8601]
requirements_total: N
covered: N
partial: N
missing: N
---
# Output Validation
## Requirements Map
| # | Requirement | Output | DQ1 | DQ2 | DQ3 | DQ4 | DQ5 | M1 | Classification |
|---|-------------|--------|-----|-----|-----|-----|-----|----|----------------|
| 1 | [from SPEC] | [path] | PASS | PASS | PASS | PASS | PASS | PASS | COVERED |
| 2 | [from SPEC] | [path] | PASS | WARN | PASS | PASS | PASS | PASS | PARTIAL |
| 3 | [from SPEC] | — | — | — | — | — | — | — | MISSING |
## DQ Details
[For any non-PASS check, include the specific finding]
## Summary
- Requirements: N total
- Covered: X
- Partial: Y
- Missing: Z
## Status Rules

| Condition | Status |
|-----------|--------|
| All requirements COVERED | validated |
| Any PARTIAL or MISSING remain, user has NOT yet decided | gaps_found |
| Gaps remain BUT the user explicitly accepted them | validated (+ ## Accepted Gaps section) |
Status validated means "dispositioned and cleared to proceed" — either clean, OR gaps the user explicitly accepted. The downstream ds-review gate (GATE_STATUS=validated) blocks on gaps_found, so an undispositioned gaps_found cannot silently pass into review. This is the structural backstop for the decision checkpoint below — do not rely on the prose alone.
When the user accepts gaps, rewrite VALIDATION.md frontmatter status: gaps_found → status: validated and append:
## Accepted Gaps
The user reviewed and accepted these gaps on proceeding to review:
- [REQ-ID] [PARTIAL/MISSING]: [what is incomplete and why the user accepted it]
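The three status rules reduce to one condition: gaps that nobody has dispositioned keep the status at gaps_found. A minimal sketch of that logic follows, for illustration only. `overall_status` and the `user_accepted_gaps` flag are hypothetical names, and in practice the workflow computes the pre-acceptance status while the acceptance flip is applied by rewriting the frontmatter.

```python
from typing import List


def overall_status(classifications: List[str], user_accepted_gaps: bool = False) -> str:
    """Return 'validated' or 'gaps_found' from per-requirement classifications."""
    has_gaps = any(c != "COVERED" for c in classifications)
    if has_gaps and not user_accepted_gaps:
        return "gaps_found"  # undispositioned gaps block ds-review
    return "validated"       # clean, or gaps explicitly accepted
```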
When presenting validation results to the user (especially gaps), generate diagnostic plots to accelerate the decision:
| Validation Finding | Diagnostic to Generate |
|-------------------|----------------------|
| DQ2: High-null columns | Missingness heatmap (columns × rows) |
| DQ3: Duplicate rows | Duplicate count bar chart by key columns |
| DQ4: Row count mismatch | Pipeline waterfall chart (stage × row count) |
| DQ5: Suspicious cardinality | Value frequency distribution plot |
| PARTIAL requirements | Side-by-side: expected vs actual output summary |
When to generate: Only at decision checkpoints where the user must choose fix vs accept. Do not generate plots for COVERED requirements (no decision needed).
Format: Inline matplotlib/seaborn plots in notebooks, or saved to scratch/diagnostics/ for script-based workflows.
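Before any plotting, the numbers behind the DQ2 and DQ3 diagnostics can be computed with the standard library alone. The sketch below is illustrative: both helper names are hypothetical, rows are assumed to be dictionaries (for example from `csv.DictReader`), and treating `None` and the empty string as missing is an assumption. The resulting mappings could then be handed to matplotlib or seaborn for the heatmap and bar chart.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def null_fraction_by_column(rows: List[dict]) -> Dict[str, float]:
    """Share of missing values per column (input for a missingness plot)."""
    if not rows:
        return {}
    columns = list(rows[0].keys())
    return {
        col: sum(1 for r in rows if r.get(col) in (None, "")) / len(rows)
        for col in columns
    }


def duplicate_counts(rows: List[dict], key_columns: Sequence[str]) -> Dict[Tuple, int]:
    """Keys that occur more than once, with their counts (input for a bar chart)."""
    counts = Counter(tuple(r.get(k) for k in key_columns) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}
```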
## Gate

Checkpoint type: human-verify (VALIDATION.md status is machine-verifiable)
.planning/VALIDATION.md must exist before proceeding.
- validated: human-verify checkpoint, auto-advanceable; proceed to ds-review.
- gaps_found: decision checkpoint; present gaps to the user before proceeding.

When the user accepts the gaps, flip the VALIDATION.md frontmatter to status: validated and append the ## Accepted Gaps section (see Status Rules) BEFORE proceeding. The ds-review gate hooks on status: validated. Leaving it at gaps_found will (correctly) block review, because an undispositioned gaps_found is indistinguishable from "user never decided."

When the user chooses fix, the cycle ds-validate → ds-implement → ds-validate repeats. This loop is bounded; it does not cycle indefinitely. Track it in .planning/VALIDATE_STATE.md (analogous to ds-review's REVIEW_STATE.md):
---
iteration: 1
max_iterations: 3
status: gaps_found # gaps_found | validated
last_gaps: [REQ-ID, ...] # requirement IDs still PARTIAL/MISSING
---
- Each fix cycle increments iteration.
- If iteration reaches max_iterations and status is still gaps_found, STOP looping. Escalate to the user with a structured choice (AskUserQuestion): fix again (override the cap with explicit instruction), accept remaining gaps (flip to validated + Accepted Gaps), or rethink (return to /ds for re-planning). Do not silently start a 4th fix cycle: repeated failure to close the same gap is a signal the plan or data is wrong, not that one more pass will help.

<EXTREMELY-IMPORTANT>
This is the critical difference from dev-test-gaps. In dev, missing tests can be auto-generated. In DS, missing or wrong outputs mean the analysis itself may be wrong. Only the user can judge whether a gap is acceptable.
</EXTREMELY-IMPORTANT>
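The bounded fix loop can be summarized as a small decision function over the fields in VALIDATE_STATE.md. This is a sketch under one stated assumption: `iteration` counts the fix cycles already used, so the cap is hit when it equals `max_iterations`. The function name and the returned action labels are hypothetical.

```python
def next_action(iteration: int, max_iterations: int, status: str) -> str:
    """Decide what happens after a validation pass."""
    if status == "validated":
        return "proceed_to_review"
    # status is gaps_found from here on.
    if iteration >= max_iterations:
        # Cap reached: the user picks fix again, accept, or rethink.
        return "escalate_to_user"
    # Below the cap: present gaps and let the user choose fix vs accept.
    return "offer_fix_or_accept"
```

With the example state above (`max_iterations: 3`), a third pass that still reports gaps returns `escalate_to_user` rather than starting a fourth cycle.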
After validation is complete, discover and read the ds-review skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-review/SKILL.md and follow its instructions.