skills/phase-gate/SKILL.md
Experimental phase boundary checkpoint. Use before committing API spend, compute, or time to a new experimental phase — especially when scaling from pilot to production, applying a configuration selected on a subset to a larger dataset, or transitioning between optimisation and evaluation phases. Surfaces under-powered assumptions from prior phases and determines whether cheap validation is needed before proceeding. Trigger phrases: 'ready to run', 'start Phase X', 'scale up', 'production run', 'let's proceed with'. Also trigger PROACTIVELY when recognising a phase boundary, even if the user does not invoke it explicitly.
Install this skill globally with one command: `npx skillsauth add saross/personal-assistant phase-gate`
Works with Claude Code, Cursor, and Windsurf.
Structured protocol for catching under-powered assumptions before they become
expensive mistakes. Complements /review-implementation (which checks whether
the approach is optimal) by checking whether the decisions feeding the
approach are validated at the power level required for the next phase.
Human-AI experimental workflows have a specific failure mode: a decision made on a small sample (wide confidence intervals, overlapping alternatives) gets carried forward as settled fact into a larger and more expensive phase. The decision feels validated because it was data-driven — but the data didn't have the statistical power to distinguish between alternatives.
This is distinct from the question of whether the approach itself is optimal, which /review-implementation covers.
The failure mode here is: the right question was asked, the right method was used, but the sample was too small to give a definitive answer, and nobody noticed before scaling up.
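As a rough illustration of this failure mode, the sketch below compares two hypothetical configurations scored on a small pilot sample and again on a larger one. It uses Wilson score intervals and a simple overlap check. Both are assumptions chosen for illustration, not a method prescribed by this skill, and overlapping intervals are only a conservative warning sign, not a formal significance test. The function names and the example counts are invented.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical pilot: configuration A scores 16/20, configuration B scores 13/20.
pilot_a = wilson_interval(16, 20)
pilot_b = wilson_interval(13, 20)

# The same observed rates on a hypothetical sample of 2000 items each.
large_a = wilson_interval(1600, 2000)
large_b = wilson_interval(1300, 2000)

print("pilot overlap:", intervals_overlap(pilot_a, pilot_b))
print("large-sample overlap:", intervals_overlap(large_a, large_b))
```

With 20 items per configuration the intervals overlap, so "A beat B in the pilot" is weak evidence. The same observed rates on 2000 items give clearly separated intervals.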
Work through each step in order. Present findings as a table at the end.
List every prior result or decision that the upcoming phase depends on. Be exhaustive — include decisions that seem obvious.
For each assumption, record:
For each assumption from Step 1:
Flag assumptions where:
Mark each assumption as Validated, Under-powered, or Untested.
For each under-powered or untested assumption:
A validation that costs 1% of the phase budget and takes 10 minutes is almost always worth running.
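One way to make this rule of thumb concrete is an expected-loss comparison: a validation is worth running when it costs less than the expected cost of proceeding on a wrong assumption. The sketch below is a minimal illustration under that assumption. The function name, the `p_wrong` estimate, and the `rework_fraction` parameter are hypothetical and not part of this skill.

```python
def validation_worth_running(
    validation_cost: float,
    phase_budget: float,
    p_wrong: float,
    rework_fraction: float = 1.0,
) -> bool:
    """Return True when the validation costs less than the expected loss.

    p_wrong:          subjective probability that the assumption is wrong (0..1).
    rework_fraction:  share of the phase budget wasted if it is wrong
                      (1.0 means the whole phase must be re-run).
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be between 0 and 1")
    expected_loss = p_wrong * rework_fraction * phase_budget
    return validation_cost < expected_loss


# Hypothetical numbers: a $500 phase and a $5 validation (1% of budget).
print(validation_worth_running(5, 500, p_wrong=0.10))   # 10% chance of a full re-run
print(validation_worth_running(5, 500, p_wrong=0.005))  # assumption is almost certainly fine
```

Under these assumptions a validation at 1% of the budget pays for itself whenever the chance of a full re-run exceeds 1%, which is why cheap checks are rarely worth skipping.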
For each assumption: if validation shows we chose wrong, what changes?
Present the summary table and recommend one of: Proceed, Validate first, or Reconsider.
## Phase Gate: [Phase Name]
| # | Assumption | Source | Status | Validation cost | If wrong... |
|---|------------|--------|--------|-----------------|-------------|
| 1 | ... | ... | Validated | — | — |
| 2 | ... | ... | Under-powered | $X, N min | Re-run phase |
| 3 | ... | ... | Untested | $Y, N min | Partial rework |
**Recommendation**: [Proceed / Validate first / Reconsider]
**Validations to run before proceeding:**
1. [Specific validation, cost, time]
2. [...]
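If the assumptions are tracked programmatically, the table in the template above could be rendered with a small helper like the one below. This is only a sketch: the `Assumption` dataclass, its field names, and `render_phase_gate` are hypothetical, and the recommendation is passed in by the caller rather than computed, since choosing it is a judgement call.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    description: str
    source: str
    status: str           # "Validated", "Under-powered", or "Untested"
    validation_cost: str  # free text, e.g. "$2, 10 min"
    if_wrong: str         # consequence if the assumption turns out wrong


def render_phase_gate(phase_name: str, assumptions: list[Assumption], recommendation: str) -> str:
    """Render the phase-gate summary as a markdown string."""
    lines = [
        f"## Phase Gate: {phase_name}",
        "",
        "| # | Assumption | Source | Status | Validation cost | If wrong... |",
        "|---|------------|--------|--------|-----------------|-------------|",
    ]
    for number, a in enumerate(assumptions, start=1):
        lines.append(
            f"| {number} | {a.description} | {a.source} | {a.status} "
            f"| {a.validation_cost} | {a.if_wrong} |"
        )
    lines += ["", f"**Recommendation**: {recommendation}"]
    return "\n".join(lines)


# Hypothetical example rows.
report = render_phase_gate(
    "Phase 2",
    [
        Assumption("Prompt v3 beats v2", "Pilot, n=20", "Under-powered", "$2, 10 min", "Re-run phase"),
        Assumption("Parser handles all inputs", "Not checked", "Untested", "$0, 5 min", "Partial rework"),
    ],
    "Validate first",
)
print(report)
```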