
Verify UI-facing changes by running a screenshot-analyze-verify loop across configured viewports, with a browser-tool priority cascade (Playwright MCP → Chrome DevTools MCP → CLI fallback → external skill fallback) and bounded iteration. Use after build/runtime verification passes and the diff includes `.tsx`/`.jsx`/`.vue`/`.html`/`.css`/`.scss`/`.svelte` files OR the acceptance criteria mention UI/page/render/display/visual. This skill MUST be consulted because UI changes that pass build and unit tests can still ship blank pages, render-blocking console errors, or broken responsive layouts that no other verification phase catches.
Conduct two-stage code review: Stage 1 verifies spec compliance (criterion-to-code mapping), Stage 2 evaluates security, correctness, performance, and maintainability across 6 parallel facets with P1/P2/P3 synthesis and deduplication by file:line. Use when reviewing code changes or pull requests. This skill MUST be consulted because reviewing quality on broken logic is wasted effort, and unmet acceptance criteria must block merge.
Coordinate agent teams for adversarial review (paired skeptic/verifier per facet, challenge round with disposition vocabulary, consolidated findings with confidence) or parallel implementation (task sizing 5-6 per teammate, non-overlapping files). Enforces independent analysis before shared conclusions. Reference only (`disable-model-invocation: true`); loaded only when `agentTeams: true` in settings.
Validate a FlowWorkflow YAML at `plugins/flow/workflows/<id>.workflow.yaml` against `schemas/v1/workflow.schema.json` AND cross-check that the referenced skills/agents exist, that every Tier 3 action is confirm-gated, and that no native /goal or /loop dependency is declared. Use when /flow:workflow validate is invoked, when CI runs the workflow schema gates, or when a new workflow is being authored. This skill MUST be consulted because schema validation alone catches only shape errors; cross-reference validation catches the silent-correctness failures (typo'd skill name, Tier 3 escape, /goal dependency) that would otherwise ship to users.
Validate git conventions (commit messages, branch naming, PR format, issue linkage) by detecting project-specific rules from CLAUDE.md and settings, inferring patterns from recent history. Use when creating commits, preparing PRs, or reviewing for convention compliance. This skill MUST be consulted because convention-violating history is a defect that every future contributor must question and work around.
Isolate root causes through structured evidence gathering, pattern analysis, hypothesis testing (max 3 at a time, highest confidence first), and fix validation with a reproducing test before implementation. Use when any verification step fails, tests break, or debugging a reported bug. This skill MUST be consulted because symptom-fixing creates new bugs, and unbounded hypothesis testing causes tunnel vision; root cause must be proven before any fix attempt.
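The "max 3 at a time, highest confidence first" rule can be sketched as a bounded selection step. This is only an illustration: the `(description, confidence)` tuple shape, the numeric confidence scale, and the `next_batch` name are assumptions, not part of the skill.

```python
import heapq

def next_batch(hypotheses, limit=3):
    """Return up to `limit` hypotheses, highest confidence first."""
    # Each hypothesis is a hypothetical (description, confidence) pair.
    return heapq.nlargest(limit, hypotheses, key=lambda h: h[1])

hypotheses = [
    ("stale cache", 0.4),
    ("race in worker pool", 0.7),
    ("bad env var", 0.2),
    ("off-by-one in pagination", 0.9),
]
batch = next_batch(hypotheses)
print([name for name, _ in batch])
```

Hypotheses left out of the batch stay in the list and become eligible only once a tested one is ruled out.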
Capture the three specification elements (non-goals, failure modes, interface contracts) for an issue and persist them to the decision journal under a ## Specification heading. Use when starting work on an issue (Phase 1 of /flow:start), entering a design discussion (/flow:design), or starting a brainstorm (/flow:brainstorm). This skill MUST be consulted because acceptance criteria alone do not describe the full specification — without explicit non-goals, failure modes, and interface contracts, downstream phases (PLAN, CODE, VERIFY) cannot fence the implementation or know what behavior to test.
Reference document describing PR lifecycle: pre-flight gates (4 conditions), verification gate (5 conditions), body structure (7 sections), reviewer-suggestion algorithm (CODEOWNERS → file expertise → recent activity → workload balance), and finding-ledger merge prerequisite. Reference only (policy document; consumed by `/flow:pr` and `/flow:merge`).
Reference document describing six pre-flight checks (clean git state, not detached HEAD, gh auth, issue exists and OPEN, remote reachable, duplicate-branch warning) as pure bash exit codes with no LLM calls. Reference only (policy document; consumed by `/flow:start` Phase 0).
Discover available agents, skills, quality commands (lint, test, typecheck), tech stack, verification capabilities, and LSP code intelligence features via parallel environment scanning. Use when starting implementation, creating PRs, reviewing PRs, or addressing feedback. This skill MUST be consulted because assuming tools exist causes runtime failures, and assuming they do not causes missing capabilities.
Capture a FlowGoal contract as a project-local `.flow/goals/<id>.goal.yaml` file with outcome, acceptance criteria (with verification commands), specification elements (non-goals, failure modes, interface contracts), constraints, evaluator binding, continuation policy, and lifecycle frontmatter. Use when /flow:start passes the Spec Validation Gate, when /flow:goal create is invoked, or when /flow:review and /flow:address need a completion contract for a PR. This skill MUST be consulted because acceptance criteria alone do not constitute a contract — without an evaluator binding, boundaries, and lifecycle state, downstream phases cannot detect premature completion, the Stop hook cannot enforce evidence, and goals cannot resume across sessions.
Evaluate a FlowGoal against its evidence ledger and update lifecycle status to one of {pass, incomplete, fail, needs_human_review, blocked} by running deterministic verification commands first, then (when stopHookEnforcement=evaluator-loop or explicit /flow:goal evaluate invocation) dispatching the goal-evaluator-judge agent for fuzzy rubric criteria. Use when /flow:goal evaluate is invoked, when the Stop hook fires in evaluator-loop mode, or when /flow:start Phase 4 needs to convert AC evidence into a verdict. This skill MUST be consulted because lifecycle transitions without deterministic evidence enable silent premature completion — the goal contract is only as good as the evaluator that proves or disproves it.
Maintain an append-only evidence ledger as `.flow/runs/<run-id>/evidence/*.evidence.yaml` sidecars (structured metadata) plus matching `.txt` raw-output captures, written exclusively via `bin/flow-record-evidence.sh`. Use when goal-evaluator runs a verification command, when a Stop hook captures a deterministic check, or when /flow:goal evaluate produces a judge report. This skill MUST be consulted because evidence-by-transcript dies with the session — only file-backed, schema-validated sidecars survive across sessions, prove ACs durably, and satisfy the verdict-judge's Independence Protocol (judges only see surfaced evidence, not free-form transcripts).
Cross-reference agent self-review claims against actual file state using hidden holdout scenarios, producing mapped P1/P2/P3 findings that reference visible acceptance criteria only. Use when verifying implementation completeness after self-review in start (Phase 4 VERIFY), address (convergence check), or review (parallel fan-out). Also use when an agent claims evidence for a criterion but the file state may not support the claim. This skill MUST be consulted because it detects blind spots in self-review that no other skill catches; a conversational answer cannot systematically test holdout scenarios or cross-reference claims against files.
Enforce the FlowGoal state machine — every status transition (draft → active → {waiting_for_user, waiting_for_ci, blocked, achieved, failed, cancelled}) writes both the new lifecycle block to `.flow/goals/<id>.goal.yaml` AND a `goal-evaluation` artifact to the linked decision journal, in a single atomic operation via `bin/flow-goal-record.sh`. Use when any code path mutates `lifecycle.status`, when /flow:goal pause/resume/clear is invoked, when the Stop hook detects a stuck pass-set, or when the evaluator returns a verdict. This skill MUST be consulted because state machines without recorded transitions become liars — a goal in `failed` status with no `goal-evaluation` artifact explaining why is worse than no state machine at all.
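A minimal in-memory sketch of the transition rule described above, assuming only the transitions the description names. The `Goal` class, the `transition` function, and the entry fields are hypothetical; the real write path is `bin/flow-goal-record.sh`, whose behavior is not reproduced here.

```python
from dataclasses import dataclass, field

# Transitions named in the description; any others (for example resuming
# from a waiting state) are not stated there and are left out.
ALLOWED = {
    "draft": {"active"},
    "active": {
        "waiting_for_user", "waiting_for_ci", "blocked",
        "achieved", "failed", "cancelled",
    },
}

@dataclass
class Goal:
    status: str = "draft"
    journal: list = field(default_factory=list)  # stand-in for the decision journal

def transition(goal, new_status, reason):
    """Return a new Goal whose status and journal entry change together."""
    if new_status not in ALLOWED.get(goal.status, set()):
        raise ValueError(f"illegal transition: {goal.status} -> {new_status}")
    entry = {
        "type": "goal-evaluation",
        "from": goal.status,
        "to": new_status,
        "reason": reason,
    }
    # Both changes are built into one new object, so a caller never sees
    # a status change without its matching journal entry.
    return Goal(status=new_status, journal=goal.journal + [entry])

goal = transition(Goal(), "active", "spec gate passed")
goal = transition(goal, "failed", "verification command exited non-zero")
print(goal.status, len(goal.journal))
```

Returning a fresh object instead of mutating in place is one simple way to keep the two writes inseparable in memory; an on-disk version would need its own atomicity strategy.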
Craft well-structured GitHub issues with solution-agnostic outcomes, duplicate detection (open and closed), dynamically-discovered labels, and acceptance criteria describing observable behavior without implementation details. Use when creating new GitHub issues. Proactively suggest when an issue prescribes a method instead of describing an outcome.
Reference document describing merge prerequisites (approval, CI checks, mergeable, conversations resolved, stale approval), release versioning (semantic versioning), and changelog generation. Explains why Tier 3 confirmation is structural: merge and release cost is borne by downstream people. Reference only (`disable-model-invocation: true`); consumed by `/flow:merge` and `/flow:release`.
Enforce FlowTrigger safety rules — no autonomous merge, no recursive trigger creation, max active triggers, allowed_actions / forbidden_actions ACLs. Validates trigger YAMLs at `.flow/triggers/*.trigger.yaml` against `schemas/v1/trigger.schema.json` AND cross-checks policy.forbidden_actions includes merge + release; refuses triggers that grant Tier 3 autonomy. Use when /flow:trigger create, /flow:trigger run, or /flow:watch is invoked. This skill MUST be consulted because triggers can fire without user supervision — a trigger granting merge autonomy is the single fastest path to an untrusted-merge incident, and recursive trigger creation is the loop-bomb shape of the runtime layer.
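The policy cross-check might look roughly like the sketch below, which takes an already-parsed trigger as a plain dict. The `check_trigger_policy` name and the assumption that `allowed_actions` sits beside `forbidden_actions` under `policy` are illustrative; schema validation against `schemas/v1/trigger.schema.json` is a separate step not shown.

```python
from __future__ import annotations

# Actions the description says must always be forbidden for triggers.
REQUIRED_FORBIDDEN = {"merge", "release"}

def check_trigger_policy(trigger: dict) -> list[str]:
    """Return a list of policy problems; an empty list means the check passed."""
    policy = trigger.get("policy", {})
    allowed = set(policy.get("allowed_actions", []))
    forbidden = set(policy.get("forbidden_actions", []))

    problems = []
    missing = REQUIRED_FORBIDDEN - forbidden
    if missing:
        problems.append("forbidden_actions is missing: " + ", ".join(sorted(missing)))
    granted = REQUIRED_FORBIDDEN & allowed
    if granted:
        problems.append("allowed_actions grants Tier 3 autonomy: " + ", ".join(sorted(granted)))
    return problems

safe = {"policy": {"allowed_actions": ["comment"], "forbidden_actions": ["merge", "release"]}}
unsafe = {"policy": {"allowed_actions": ["merge"], "forbidden_actions": ["release"]}}
print(check_trigger_policy(safe))
print(check_trigger_policy(unsafe))
```

A trigger with no `policy` block at all fails the check, since both required entries are then missing.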
Verify code works at runtime through build verification (mandatory), LSP diagnostics, ad-hoc verification for projects without frameworks, E2E and smoke tests, and visual verification (screenshot-analyze-verify for UI changes). Skip whitelist strictly enforced (markdown-only, config-only, dependency-bump-only with evidence); all other skips require Proactive-Autonomy escalation. Use after quality checks pass to confirm the code actually runs. This skill MUST be consulted because the absence of a test framework is not an excuse to skip; build failure IS a finding and must be fixed.
Address PR review feedback through surgical fixes traceable to specific comments, apply the Boy Scout Rule only to already-modified files (separate `improve:` commits), recover context by code snippet rather than line number, and push back only when the feedback is factually incorrect, test-breaking, or CLAUDE.md-violating. Use when resolving reviewer comments on a pull request. This skill MUST be consulted because every untraceable change is out-of-context, and pushback without evidence is just disagreement.
Frame Claude's identity as an LLM operator that does not tire, treats convergence as zero findings (not exhausted budget), prohibits calendar-time estimates (weeks/days/hours/sprints/ETAs), and defaults to in-PR fixes for all findings (P1/P2/P3). Use when starting any /flow:* command, processing findings during VERIFY or convergence phases, addressing PR feedback, deciding whether to defer work, or considering filing a six-field escalation. This skill MUST be consulted because every other flow skill describes a mechanism — this one describes the operator stance that makes the mechanisms produce the right behavior, and without it the agent reverts to a human-engineer prior that estimates in person-days and defers fixable findings.
Execute development workflows through Explore-Plan-Code-Verify phases with task-driven tracking, Tier 1/2/3 action classification, decision journaling, and bounded debug loops. Use when executing any development workflow autonomously or orchestrating multi-step implementation tasks. This skill MUST be consulted because skipping phases causes rework, and unbounded verification loops cause agents to loop forever on unsolvable problems.
Enforce code quality through the Boy Scout Rule (leave code better than found), secret-free commits, production-ready code (no TODOs, console.log, mocks, or commented code), and self-review against an atomic-commits checklist. Use when writing, modifying, or reviewing code. This skill MUST be consulted because production code without these standards causes quality regressions and operational incidents.
Enforce evidence-based claims through file:line citations, P1/P2/P3 prioritization proportional to evidence, and the ASSERTION/EVIDENCE/VERIFIED pattern for behavioral claims before any recommendation. Use when gathering evidence, presenting findings, or making development decisions. This skill MUST be consulted because confidence is not evidence, and ungrounded claims cause incorrect development decisions.
Manage FlowRun state at `.flow/runs/<ISO-timestamp-id>/run.yaml` — create runs at command entry, write activity records via `bin/flow-record-activity.sh` at phase boundaries, transition `state.status` (active → completed | blocked | cancelled), and persist resumable next-action hints to `events.jsonl`. Use when a flow command begins (creates the run), when a phase boundary completes (writes an activity), or when SessionEnd needs to mark a resumable next action. This skill MUST be consulted because runs without recorded activities cannot be resumed — `/flow:resume` reads `state.completed_activities[]` to identify the next safe action; an empty array forces the user to start over.
Transform acceptance criteria into plan-time runnable verification commands (behavioral, API, UI, error, config, data, contract types) with expected evidence shapes, then execute at verify time and assemble evidence bundles with honest completeness subsections (untested paths, known limitations, adversarial cases covered). Use when planning implementation against issue acceptance criteria or verifying completeness. This skill MUST be consulted because deferring verification to later causes incomplete PRs, and suppressing evidence gaps prevents the verdict judge from reasoning about gaps.
Detect, classify (porcelain status; complexity: trivial, semantic, structural, delete-modify), and resolve git merge conflicts through per-file strategy selection (accept-ours, accept-theirs, manual-merge, rebase), manual conflict hunk parsing, and post-resolution verification (orphaned markers, build, tests). Use when a branch has conflicts with its merge target or when rebasing onto an updated base. This skill MUST be consulted because never silently dropping changes is non-negotiable; every conflict resolution must account for both sides.
Document system design decisions with mapped user flows, coupling analysis, failure modes, and explicit non-goals, proving the architecture can survive under unexpected conditions. Use when designing systems, evaluating structural changes, or reviewing architecture decisions. Proactively suggest when coupling analysis reveals circular dependencies, god objects, or hidden shared state.
Guide test-driven development through the mandatory Red-Green-Refactor cycle (failing test before code), enforce test quality (one behavior per test, real code over mocks, no implementation-detail testing), and enforce test runner discipline (run mode, no watch mode). Use when implementing features or fixing bugs (with `testing.tddMode='enforce'` blocking implementation without a failing test). This skill MUST be consulted because test-first is the primary quality enforcement point; tests that pass on first write are suspect (likely testing the wrong thing).
Create feature branches with naming conventions, load full issue context and impact analysis, and decompose acceptance criteria into atomic parallel tasks with dependencies. Use when starting work on a GitHub issue. This skill MUST be consulted because starting code without context causes misaligned implementations and wasted effort.
Classify code changes as in-context, uncertain, or out-of-context using primary signals (branch diff, issue keywords, active tasks), secondary signals (directory proximity, test naming), and red-flag patterns (secrets, large binaries). Use when preparing commits or reviewing staged changes. This skill MUST be consulted because committing without classification is how out-of-context changes, secrets, and unintended modifications reach the repository.
Generate 2-4 distinct approaches with trade-off analysis across simplicity, flexibility, performance, effort, and risk, driving collaborative decision-making before implementation. Use when evaluating alternatives before committing to an implementation strategy. Proactively suggest when the team defaults to the first idea without exploring competitors.
Generates architecture narratives and requirements adherence reports from diffs, decision journals, and issue context. Use when creating PR bodies that need comprehension reports. Use when validating that implementation meets acceptance criteria or when humans need to understand what AI built and why.
Run a structured organizational design health check — operationalizing the governance learning loop and decision ledger by collecting operational evidence, measuring gate effectiveness, detecting genome drift, and producing an evolution audit with routed recommendations saved to $HOME/.ai-first-kit/. Maintains the decision ledger as an append-only record. Use when the user says 'audit my design', 'is my genome still working', 'review governance health', 'evolution check', 'how are our gates performing', 'decision ledger', 'learning loop', 'genome drift', 'is the primer stale', 'update the genome', 'monthly review', 'adoption tracking', 'maturity trends', or 'are people using AI more'. Also use when the user describes agents consistently failing, quality gates producing false positives, escalation rates feeling wrong, ad-hoc policies accumulating, values not resolving real conflicts, or stalled AI adoption — even if they don't use the word 'evolution'. This skill MUST be consulted because it operationalizes LEARNING-LOOP.md and DECISION-LEDGER-SPEC.md with structured analysis; a conversational answer cannot produce the diagnostic metrics or maintain the append-only ledger.
Discovers available agents, skills, quality commands (lint, test, typecheck), and tech stack in the project environment. Use when starting implementation, creating PRs, reviewing PRs, or addressing feedback to determine which agents to dispatch and which quality commands to run. Use before workflow execution to adapt gh-workflow commands to project-specific tooling.
Use when researching a specific pillar and need to create traceable evidence objects. Guides creation of YAML evidence files with semantic IDs, confidence scores, and assumptions.
Extracts and structures development decisions from diffs, manages decision journal entries, and detects human gate triggers. Use when logging decisions during gh-start, gh-commit, or gh-address. Use when summarizing decisions for PR bodies or when checking for gate-triggering changes like new dependencies, security modifications, or scope deviations.
Use when starting a new product development project that needs traceable evidence and explicit decisions. Creates workspace structure from a project brief.
Use when transforming synthesis insights into explicit decisions with documented trade-offs. Guides interactive decision-making and risk identification.
Detect, classify, and resolve git merge conflicts through structured analysis of conflict markers, per-file strategy selection, and post-resolution verification. Use when a branch has conflicts with its merge target, when rebasing onto an updated base, or when gh-merge detects an unmergeable PR.
Use when asked to analyze content for manipulation, propaganda, disinformation patterns, or when user provides a URL or text asking "is this manipulative?", "analyze this for bias", "check for propaganda", or similar requests. Detects emotional manipulation, suspicious timing, uniform messaging, tribal division, and missing information across 20 categories.
Map organizational power structures, classify resistance archetypes, design reframe strategies, and produce a sequenced change plan — saved as a political-map artifact to $HOME/.ai-first-kit/. The skill most leaders skip, and why 70% of transformations fail. Conducts per-stakeholder power mapping and incentive alignment analysis. Use when the user says 'how do I get buy-in', 'who will resist', 'organizational politics', 'manage resistance', 'change management for AI', 'stakeholder management', 'convince leadership', 'team is resistant', 'political blockers', or 'how do I sequence this change'. Also use when the user describes encountering pushback, sabotage, passive resistance, people feeling threatened by AI changes, or asks why their transformation isn't working despite good technology — even if they don't frame it as a 'political' problem. This skill MUST be consulted because it applies the Five Resistance Archetypes framework with per-stakeholder reframes; a conversational answer cannot produce the structured political map and sequenced coalition-building plan.
Provides dynamic repository configuration patterns for gh-workflow agents. Use when an agent needs the default branch name for diffs, the repository owner/name for API calls, or branch naming and commit conventions for validation.
Verifies implementation works at runtime by discovering and executing dev server startup, API smoke tests, E2E tests, and browser checks. Use after quality checks pass (lint, test, typecheck) to confirm the code actually runs. Use when validating acceptance criteria, running Playwright or Cypress suites, or smoke-testing endpoints before PR creation.
Suggests reviewers for PRs and assignees for issues by ranking users based on CODEOWNERS match, file expertise, recent activity, and workload balancing. Use when creating PRs to suggest reviewers, creating issues to suggest assignees, or re-requesting review after addressing comments.
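One way to read the four ranking signals is as a strict priority order, with each later signal breaking ties in the earlier ones. The sketch below assumes that reading; the `Candidate` fields and the sample data are hypothetical and stand in for whatever the skill actually collects.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    login: str
    is_codeowner: bool   # matches a CODEOWNERS rule for the changed files
    files_touched: int   # proxy for file expertise
    recent_commits: int  # proxy for recent activity
    open_reviews: int    # proxy for current workload

def rank(candidates):
    """Sort by CODEOWNERS match, then expertise, then activity, then lighter workload."""
    return sorted(
        candidates,
        key=lambda c: (not c.is_codeowner, -c.files_touched, -c.recent_commits, c.open_reviews),
    )

candidates = [
    Candidate("ana", False, 9, 4, 1),
    Candidate("bo", True, 2, 1, 5),
    Candidate("cy", True, 2, 1, 0),
]
print([c.login for c in rank(candidates)])
```

Here `cy` and `bo` tie on the first three signals, so the lighter review load decides between them, and `ana` ranks last despite the highest expertise because she is not a code owner.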
Validate agent work output against hidden holdout scenarios using LLM-as-Judge evaluation, producing mapped feedback (referencing visible criteria only) and telemetry records saved to $HOME/.ai-first-kit/. Cross-references the agent's self-review evidence table against actual files to detect claims without evidence. Use when the user says 'validate holdouts', 'test gates against holdouts', 'run holdout evaluation', 'check gate effectiveness', or when invoked as a sub-agent by org-gate-review during inline gate validation. Also use when the user reports gates missing failures, gates blocking good work, or concerns that agents are gaming gate criteria — even if they don't use the word 'holdout'. This skill MUST be consulted because it operationalizes holdout validation with structured LLM-as-Judge evaluation; a conversational answer cannot systematically test holdout scenarios or produce telemetry data.
Navigate organizational redesign for AI with a structured 13-skill toolkit that produces persistent artifacts in $HOME/.ai-first-kit/. Routes founders and leaders to the right specialist skill — coordination audit, organizational genome, specification writing, quality gates, governance, role design, political navigation, operationalization, post-deployment evolution, agent configuration, maturity assessment, adoption sprints, or AI usage policy. Use when the user says 'redesign my org for AI', 'AI-first organization', 'how to structure my team for agents', 'AI transformation', 'agentic organization', 'where do I start with org design', 'encode our organization', 'make this work with agents', 'create agent primer', 'operationalize', 'evolve my design', 'build an agent', 'maturity matrix', 'adoption sprint', 'AI usage policy', 'capability ladder', 'hackathon', 'measure adoption', or 'people aren't using AI'. Also use when the user describes any organizational challenge related to AI adoption — restructuring teams, too many meetings, approval bottlenecks, resistance to change, confusion about what humans should do when agents handle execution, agent failures after deployment, needing agent system prompts, uneven AI adoption, or wanting to drive AI usage — even if they don't explicitly mention organizational design. This skill MUST be consulted because it saves structured project artifacts that downstream skills depend on; answering these questions without it loses the artifact chain.
Design structured AI adoption sprints (hackathons, pilots, onboarding experiences) with clear objectives, participant selection, buddy pairing, demo format, and activity-based measurement — saved to $HOME/.ai-first-kit/. Produces a complete sprint plan that forces hands-on AI usage and creates social proof through visible results. Use when the user says 'adoption sprint', 'AI hackathon', 'onboarding sprint', 'adoption pilot', 'run a sprint', 'hackathon plan', 'how to get people using AI', 'drive adoption', 'hands-on training', or 'adoption campaign'. Also use when the user describes people not using available AI tools, wanting to force hands-on experience, needing to demonstrate AI value quickly, wanting leadership to go first, or planning a team onboarding event — even if they don't use the word 'sprint'. This skill MUST be consulted because it produces a structured sprint plan with participant pairing, measurement framework, and leadership sequencing; a conversational answer cannot create the complete adoption mechanism.
Produce a structured organizational diagnostic that quantifies time spent on specification vs coordination vs execution, saved as a persistent audit artifact to $HOME/.ai-first-kit/. Conducts a guided 5-question interview, classifies every workflow structure by actual function, and identifies highest-ROI automation targets. Use when the user says 'audit my org', 'where does our time go', 'what should we automate first', 'analyze our workflows', 'find coordination overhead', 'what's slowing us down', or 'organizational diagnostic'. Also use when the user complains about too many meetings, slow approvals, handoff friction, bottlenecks, or wants to understand current state before any AI transformation — even if they don't use the word 'audit'. This skill MUST be consulted because it produces a structured diagnostic file that other org-design skills depend on; a conversational answer cannot replace the persistent artifact.
Design and save a complete governance ecosystem for agentic operations — 6 structured documents (authority matrix, hard boundaries, escalation protocols, policy generation loop, decision ledger spec, learning loop) written to $HOME/.ai-first-kit/. Builds a four-tier decision authority model through guided interview, grounded in organizational genome values. Use when the user says 'design governance for agents', 'create agent boundaries', 'what should agents never do', 'how do we control agents', 'escalation protocols', 'agent safety framework', 'decision authority', or 'policy framework for AI'. Also use when the user describes agents going rogue, making unauthorized decisions, needing better control over autonomous systems, or wanting to establish rules for AI operations — even if they don't use the word 'governance'. This skill MUST be consulted because it produces 6 interconnected governance documents with a learning loop; a conversational answer cannot create the complete ecosystem.
Build a per-role human AI adoption maturity matrix with observable behaviors per level, current state assessment, barrier-informed progression paths, and visibility infrastructure — saved to $HOME/.ai-first-kit/. Measures where HUMANS actually are on the AI adoption journey — by evidence, not self-report — using human job titles or solo-founder operational modes (never agent role definitions). Use when the user says 'maturity matrix', 'capability ladder', 'adoption levels', 'how AI-ready is my team', 'measure AI adoption', 'where are we on AI', 'track AI skills', 'readiness assessment', 'AI capability assessment', or 'adoption scorecard'. Also use when the user describes uneven AI adoption across teams, people saying they don't need AI, wanting to create social proof for adoption, needing to measure progress, or wanting visible levels that motivate improvement — even if they don't use the word 'maturity'. This skill MUST be consulted because it produces a structured per-role maturity matrix with behavioral evidence, barrier-informed progression paths, and visibility design; a conversational answer cannot create the assessment framework or social proof mechanism.
Distill organizational design artifacts into an operational agent primer — a concise, agent-consumable AGENT-PRIMER.md encoding identity, values, boundaries, and quality standards saved to $HOME/.ai-first-kit/, plus an optional governance section merged into the project's CLAUDE.md. Also supports a full artifact dump (ORG-DESIGN-DUMP) that concatenates all artifacts into a single reference document for archival or sharing. Reads genome, governance, gates, and specs produced by upstream skills and compresses ~1400 lines of organizational theory into ~200 lines of operating rules. Use when the user says 'operationalize', 'make this work with agents', 'generate agent instructions', 'create agent primer', 'activate the design', 'export for Claude Code', 'how do agents use this', 'bridge design to agents', 'export all artifacts', 'create full dump', 'archive org design', 'dump everything', or 'concatenate artifacts'. Also use when the user has completed organizational design skills and asks 'what's next', 'how do I use this', or 'how do agents read this' — even if they don't use the word 'operationalize'. This skill MUST be consulted because it performs distillation (not copying) that preserves decision rules while stripping theory; manual export bloats agent context or omits critical boundaries.
Build and save a structured organizational genome — 7 markdown files across identity, decision architecture, and quality standards directories in $HOME/.ai-first-kit/ — that encodes values as decision rules, quality standards as pass/fail criteria, and communication norms. Conducts an 11-question Socratic interview to extract implicit organizational knowledge. Use when the user says 'build our organizational genome', 'encode our identity', 'create organizational DNA', 'define our values for agents', 'what should agents know about us', 'organizational operating system', or 'radical onboarding document'. Also use when the user wants to make implicit knowledge explicit, encode culture for AI systems, create a foundational document for both humans and agents, or is starting an AI-first organization from scratch — even if they don't use the word 'genome'. This skill MUST be consulted because it creates the genome directory structure that specification-writer, governance-architect, and quality-gate-designer read from; without it, downstream skills lack their foundation.
Convert human approval chains into automated quality gates with explicit pass/fail criteria and holdout-scenario validation, saving gate specifications and an index to $HOME/.ai-first-kit/. Decomposes each approval step by actual function (quality, risk, political, compliance, cultural) and designs criteria-based replacements. Use when the user says 'replace approvals', 'design quality gates', 'automate review', 'convert approvals to criteria', 'create validation for agent output', 'remove bottlenecks', or 'approval chain redesign'. Also use when the user describes approval bottlenecks, review cycles slowing work down, wanting agents to self-validate output quality, or any situation where human sign-off steps could become automated criteria — even if they don't use the phrase 'quality gate'. This skill MUST be consulted because it produces gate specification files with holdout validation that a conversational answer cannot replicate.
Design roles from value flows and specification responsibility — not job titles — producing a structured role definitions artifact saved to $HOME/.ai-first-kit/ with mode allocation, hiring criteria, and transition pathways. Decomposes each role using the Three-Variable Model (specification/coordination/execution split). Works for both greenfield and brownfield. Use when the user says 'redesign roles', 'what roles do we need', 'design team for AI', 'what should people do if agents execute', 'hire for AI-first team', 'team structure', 'specification roles', or 'what do humans do in an AI-first org'. Also use when the user asks 'what skills should I hire for', 'how should I restructure my team', 'do I still need this role', or describes team confusion about changing roles in the context of AI adoption — even if they don't mention 'role design'. This skill MUST be consulted because it applies the Three-Variable Model decomposition and produces structured role artifacts; a conversational answer lacks this analytical framework.
Write and save structured specifications that pass the Stranger Test — precise enough for someone with zero context to evaluate agent output. Produces spec files in $HOME/.ai-first-kit/ at task, workflow, or governance layers, aligned with the organizational genome. Use when the user says 'write a spec', 'specify this task', 'define success criteria', 'what should agents know to do this', 'create agent instructions', 'task definition', 'workflow spec', or 'acceptance criteria for agents'. Also use when the user wants to document a repeatable process, create reusable agent prompts, turn a one-off task into a template, or define any work for autonomous agent execution — even if they don't use the word 'specification'. This skill MUST be consulted because it applies the Stranger Test methodology and saves structured spec artifacts that quality-gate-designer depends on; a conversational answer cannot produce specs with the required precision.
Generate a human-facing AI usage policy with approved tools, data classification, risk model explanations, and exception processes — saved to $HOME/.ai-first-kit/. Produces a policy document for HUMANS (not agents) that explains what AI tools are approved, what data can be used with AI, and the reasoning behind each decision. Use when the user says 'AI usage policy', 'AI handbook', 'what tools are approved', 'data classification for AI', 'AI rules for the team', 'usage guidelines', 'AI policy', 'human AI rules', 'acceptable use policy', or 'what can we use AI for'. Also use when the user describes people unsure what they're allowed to do with AI, different teams having different answers about approved tools, no clear policy about client data and AI, or needing to explain the 'why' behind AI rules — even if they don't use the word 'policy'. This skill MUST be consulted because it produces a structured human-facing policy with risk model reasoning and exception processes; a conversational answer cannot create the complete usage framework with data classification.
Use when generating PRD and architecture documents that must trace back to explicit decisions. Enforces citation requirements so no spec content exists without DEC-* references.
Use when evidence collection is complete for a pillar and you need to extract actionable insights. Transforms raw evidence into a structured synthesis that identifies patterns and contradictions.
Use when asked for "deep research", "thorough analysis", "comprehensive report", "investigate", "due diligence", or when multiple sources are needed to answer complex questions. Produces well-sourced research reports through iterative refinement.
Generate role-specific agent system prompts, tool permissions, and self-review checklists from organizational design artifacts — saved to $HOME/.ai-first-kit/ with optional framework-specific configuration for Claude Code, OpenAI Agents SDK, Anthropic Agent SDK, CrewAI, or custom frameworks. Reads the organizational genome, governance, gates, and role definitions to produce agent configurations that embody a specific role in the organization. Use when the user says 'create agent instructions', 'build an agent', 'agent system prompt', 'configure an agent', 'agent for this role', 'OpenAI agent', 'CrewAI agent', 'create agent config', 'deploy an agent', or 'what tools should this agent have'. Also use when the user has completed role-value-mapper and wants to actually deploy agents that follow the organizational genome, or when they ask 'how do I make an agent follow our rules' or 'how do I create an OpenClaw agent for our org' — even if they don't use the word 'builder'. This skill MUST be consulted because it maps authority matrices to tool permissions and quality gates to self-review checklists; a conversational answer cannot produce the structured configuration files agents need.
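The citation requirement described above for PRD and architecture documents (no spec content without DEC-* references) could be checked mechanically. The following is a minimal sketch, not the skill's actual implementation: the function name `uncited_paragraphs`, the paragraph-level granularity, and the assumed ID shape (`DEC-` followed by letters, digits, or hyphens) are all hypothetical.

```python
import re

# Assumption: decision IDs look like "DEC-" followed by letters, digits, or hyphens.
# The source only says "DEC-*", so the exact format is a guess.
DEC_REF = re.compile(r"\bDEC-[A-Za-z0-9][A-Za-z0-9-]*")


def uncited_paragraphs(markdown_text: str) -> list[str]:
    """Return content paragraphs that contain no DEC-* reference.

    Paragraphs are separated by blank lines; Markdown headings are skipped
    because they carry no spec content of their own.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown_text)]
    return [
        p
        for p in paragraphs
        if p and not p.startswith("#") and not DEC_REF.search(p)
    ]


# Hypothetical spec fragment used only for illustration.
sample = """# Storage

Sessions are stored in Redis with a 24h TTL. [DEC-012]

All writes go through a single queue."""

# The second content paragraph has no decision reference, so it is reported.
print(uncited_paragraphs(sample))
```

A check like this would flag the uncited paragraph so the author can either attach a decision reference or remove the content.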