skills/ax-audit/SKILL.md
Audits agentic applications across two layers: agent-native architecture (tool parity, atomic tool granularity, context injection, explicit completion signals, approval gates) and agentic experience design (confidence cues, escape hatches, intent handshake, memory visibility, adaptive canvas). 23 rules across 4 feature playbooks (agent chat, tool execution, agent config, agent dashboard). Produces a 3-tier ship-readiness verdict (release-blocker / fix-this-sprint / backlog) plus an AX Relationship Summary naming the evolution stage, trust signal, and key gap. Use before merging an agentic feature PR, or when asked "is this agent-native?", "AX review", "AX critique", "critique this AI feature", "does this earn user trust?", "is this design actually agentic?", "trust review", "AX patterns check", or "audit this for AX". For traditional UX audits (forms, states, focus, async, microcopy) use ux-audit; for broad web UI quality (accessibility, layout, typography) use ui-audit.
npx skillsauth add mblode/agent-skills ax-auditInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Feature-level reviewer for applications where an agent acts on the user's behalf. Answers one question: does this agent earn trust, and where does it break?
rules-arch/, trust and relationship design in rules-ax/), ending in a ship-readiness verdict and an AX Relationship Summary.ux-audit); broad web UI quality: accessibility, layout, typography, performance (use ui-audit); agent instruction-file quality (use agents-md).If the scope contains no agentic features (only forms, lists, modals), stop and route to ux-audit. Running AX rules against traditional UI produces only noise.
Copy and track this checklist:
AX Audit progress:
- [ ] Step 1: Scope, via `git diff --name-only main` (PR mode) or explicit path (full sweep)
- [ ] Step 2: Detect agentic features per references/feature-playbooks.md
- [ ] Step 3: Run each detected feature's playbook in order, plus the diff-wide checks
- [ ] Step 4: For each check, load the rule file and follow its detection recipe
- [ ] Step 5: Tier each finding per references/ship-readiness.md (rule override table wins)
- [ ] Step 6: Render verdict + findings + AX Relationship Summary per references/output-format.md
- [ ] Step 7: Run the audit self-check and report its evidence counts
Step notes:
references/feature-playbooks.md. Four feature types: agent chat/copilot, agent tool execution, agent config, agent dashboard.parity-orphan-ui-action) runs on every PR-mode audit regardless of detected features.| Layer | Folder | Rules | Question it answers | Category index |
|---|---|---|---|---|
| 1: Agent-native architecture | rules-arch/ | 11 | Can the agent do what the user can do? Are tools atomic? Does the agent know what exists? Is completion explicit? | rules-arch/_sections.md |
| 2: Agentic experience | rules-ax/ | 12 | Does the agent earn trust? Can the user interrupt, undo, push back? Is memory visible? | rules-ax/_sections.md |
Load rules-arch/<category>-<slug>.md or rules-ax/<category>-<slug>.md when a playbook check names it. Categories: arch = parity, granularity, context, comm; ax = trust, control, context, comm. The layers share the comm and context prefixes but the rules are distinct: rules-arch/comm-no-approval-gate.md (orchestrator code has no gate logic) is not rules-ax/control-no-approval-gate.md (approval UI doesn't match the stakes).
Every finding gets exactly one tier (full trigger lists in references/ship-readiness.md):
release-blocker, fix before merge: no escape hatch, silent execution, heuristic completion, broken parity, ungated high-stakes actionsfix-this-sprint, merge with a tracked issue: no confidence cues, no intent handshake, opaque memory, bundled config toolsbacklog, ship and track: static canvas, no generative momentum, static API mapping, no checkpoint/resumeTier precedence: a rule's own surface-override table > the generic surface bump in references/ship-readiness.md > the rule's defaultTier. Apply at most one adjustment, never stack the generic bump on top of a rule's explicit override.
Verdict: ✅ READY (0 blockers, ≤3 sprint) · ⚠️ READY WITH FOLLOW-UP (0 blockers, ≥4 sprint) · ❌ NOT READY (≥1 blocker) · 🚫 INCOMPLETE (self-check failed).
Rendered after findings whenever any agentic feature was detected. Findings are for engineers; this summary is for designers and PMs, so never skip it. Four fields:
references/ax-evolution-curve.md)| File | Read when |
|---|---|
| references/feature-playbooks.md | Steps 2-3: detection heuristics, per-feature ordered checks, diff-wide checks |
| references/ship-readiness.md | Step 5: tier triggers, precedence, verdict logic |
| references/output-format.md | Step 6: findings JSON schema, summary schema, terminal rendering |
| references/agent-native-principles.md | A Layer 1 finding needs deeper grounding: parity, granularity, CRUD completeness, context patterns, approval matrices, checkpoint/resume |
| references/ax-evolution-curve.md | Writing the evolution-stage field of the AX Relationship Summary |
| rules-arch/_sections.md | Orienting in Layer 1 categories and their default tiers |
| rules-ax/_sections.md | Orienting in Layer 2 categories, default tiers, and co-firing rule pairs |
comm-no-intent-handshake defaults to fix-this-sprint but its own table says release-blocker on tool execution. Stacking the generic "+1 tier on tool execution" bump on top of explicit overrides double-upgrades backlog findings into blockers.AbortController.abort() is a false affordance. control-no-escape-hatch still fails: verify the abort() call, not the button label, or the audit passes a UI that lies to users.rg -l <feature-pattern>), then check each for the counter-pattern, and cite the file list as evidence.detection: observational rules cannot fail on grep evidence alone. granularity-static-api-mapping, trust-no-uncertainty-markers, control-over-conversational, and comm-no-generative-momentum require interaction-flow judgment; with static evidence only, return unknown with a reason instead of fail.ax-audit-ignore:<slug> comments count as suppressed, not pass. Report the suppressed count in the verdict block; a suppression with no trailing reason is itself worth a warn.ux-audit territory; duplicating them teaches engineers to dismiss the whole AX report.comm-no-generative-momentum and granularity-static-api-mapping default to backlog. Promoting cosmetic findings to blocker trains the team to ignore ❌ verdicts.Self-flag the audit INCOMPLETE if any of these are true, and include the counts as evidence in the report (planned vs. run rules per playbook, unknown rate, suppressed count):
unknownfail/warn finding lacks file:line evidence or a fix snippetux-audit: traditional UX quality on the same surfaces (run both on agentic features; ax-audit covers the agent layer, ux-audit the rest)ui-audit, broad web UI quality: accessibility, layout, typography, performanceagents-md: audit CLAUDE.md / AGENTS.md agent instruction filesdefine-architecture: repo structure and module boundariesdevelopment
Designs and builds UI end to end, from visual direction (palettes, type scales, design tokens, layout systems, landing-page CRO strategy, brand kits) to Tailwind implementation with the ui.sh design guideline system, including multiple variants with an in-browser picker, semantic markup scaffolds from screenshots, retrofitting dark mode or responsive behavior, and componentizing or canonicalizing Tailwind code. Use when asked to "build a landing page", "create a dashboard", "make this look good", "make this look premium", "pick a visual style", "design the UI for", "show me 3 hero options", "improve conversions", "create a brand kit", "turn this screenshot into markup", "add dark mode", "make a dark version of this image", "make this responsive", "fix this on mobile", "componentize this page", "clean up the Tailwind", or any prompt that designs, creates, or refines UI code. For auditing existing UI use ui-audit; for motion use ui-animation; for landing page copy use copywriting.
development
Collaborative interrogation that produces an implementation plan before any code is written. Explores the codebase and relevant docs first, asks one question at a time with a concrete recommended answer, grills the rationale behind documented decisions, flags fuzzy terminology, and walks a decision tree until shared understanding is reached, then writes a plan file. First step of the shipping pipeline; it creates plans, plan-reviewer stress-tests them, pr-creator opens the PR. Use when asked to "create a plan", "help me think through this", "plan this feature", "I want to build X", "grill me", "grill with docs", "understand the docs", "unpack the decisions", "brainstorm a spec", "what should the plan be", "think this through with me", or before starting any non-trivial implementation.
development
--- name: pr-reviewer description: Reviews the current local diff or branch and returns a read-only, severity-tiered findings report. It never edits files. Four modes: standard bug and compliance review, structural quality, AI slop detection, and whole-codebase security audit. Use when asked to run /pr-reviewer, "review my changes", or "code review" before commit, push, or handoff. "Thermo-nuclear review", "structural review", "deep code quality audit", "harsh maintainability review", and "code
development
--- name: ux-audit description: Feature-level UX audit for React/Next.js code, diff-aware by default. Catches what Lighthouse, axe, ESLint, and Storybook miss: state-coverage gaps (missing loading/empty/error), form data loss on validation, double-submit, broken focus management, optimistic UI without rollback, stale async responses, skeleton-induced layout shift, and vague microcopy. 33 modern failure-mode rules plus 30 Laws of UX rules across 12 feature playbooks. Produces a 3-tier ship-readin