skills/compliance-audit/SKILL.md
Auto-invoke after any do-work, tdd, systematic-debugging, or review-pr-copilot task completes to review the diff against active rules and skills, flag violations, update the skill if a gap is found, and close the loop between 'rule was loaded' and 'rule was followed'.
npx skillsauth add arndvs/ctrlshft compliance-audit
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Output "Read Compliance Audit skill." to chat to acknowledge you read this file.
Runs after a task completes. Reviews the actual diff against the rules and skills that were active during the session. Flags violations explicitly. Updates the skill if the gap is structural.
This skill exists because "Read X" confirms a rule was loaded into context — not that it was followed. The compliance audit closes that gap.
Auto-invoke after:
- /do-work completes and produces a commit
- /tdd completes a red-green-refactor cycle
- systematic-debugging produces a fix
- review-pr-copilot finishes a round (catches self-initiated changes bundled into Copilot-comment commits; see PR #68 dogfood). Specifically check:
Can also be invoked manually: /compliance-audit to review the most recent commit against active context.
Before reviewing the diff, establish what rules and skills were in scope.
# What contexts were active during this session?
echo $ACTIVE_CONTEXTS
# What rule files are currently loaded?
ls ~/dotfiles/rules/
# What skills were explicitly invoked?
# (check the session transcript for "Read X" outputs)
Produce a list:
- Active contexts ($ACTIVE_CONTEXTS)
- Rule files (rules/ matching active contexts)
# Most recent commit diff
git diff HEAD~1 HEAD
# Or staged changes if not yet committed
git diff --cached
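The two diff commands above can be wrapped in one helper. This is a minimal sketch, assuming the audit should prefer staged changes when any exist and otherwise review the most recent commit; the function name `audit_diff` is hypothetical and not part of the skill.

```bash
# Hypothetical helper: print the diff the audit should review.
# Assumption: staged changes take priority over the last commit.
audit_diff() {
  # `git diff --cached --quiet` exits non-zero when staged changes exist.
  if ! git diff --cached --quiet; then
    git diff --cached
  else
    git diff HEAD~1 HEAD
  fi
}
```

Calling `audit_diff | less` from the repository root would then show whichever diff applies.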
The HITL-tier-classification and HITL-deferral-path checks under the review-pr-copilot trigger evaluate PR thread replies, not diffs. When auditing a review-pr-copilot round you must fetch those threads before scoring.
Required tools (MCP tools may be deferred — use tool_search if not loaded):
- github-pull-request_currentActivePullRequest (preferred, returns thread node IDs)
- gh api graphql fallback (CLI command, not an MCP tool; run via terminal when the PR-extension cache is stale)
- mcp_github_pull_request_read (method get_review_comments) for comment bodies, degraded mode only (see below)
Fetch pattern:
gh api graphql -f query='query {
  repository(owner:"<owner>",name:"<repo>"){
    pullRequest(number:<N>){
      reviewThreads(first:50){
        nodes{
          id isResolved
          firstComment: comments(first:1){ nodes{ databaseId path body author{login} } }
          lastComments: comments(last:10){ nodes{ databaseId path body author{login} } }
        }
      }
    }
  }
}'
The query aliases two comment windows per thread: firstComment (the original Copilot review comment) and lastComments (the 10 most recent replies). In practice this covers all review threads — Copilot PRs rarely exceed 50 threads or 10 replies per thread. If a PR does exceed these limits, increase the first: / last: values or paginate using pageInfo { hasNextPage endCursor }. Pair firstComment.nodes[0] (Copilot's original) with the latest agent-authored entry in lastComments (scan from the end for a comment whose author.login matches the PR author or agent). For each thread the agent's reply touches, capture: thread ID, original Copilot comment body, the agent's reply body, and whether the thread is resolved. Pass this set to Phase 3 — the HITL checks key off reply text (presence of arithmetic, presence of effort-vs-subjectivity reasoning, presence of Filed as #N link or HITL-blocking declaration).
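The pairing step described above could be sketched with `jq`. This assumes the JSON response of the fetch pattern is piped in on stdin and that `jq` is installed; the function name `pair_thread_replies` and the output field names (`thread_id`, `resolved`, `original`, `reply`) are illustrative choices, not something the skill defines.

```bash
# Hypothetical helper: pair each thread's original comment with the latest
# reply written by the agent. Reads the GraphQL response on stdin.
# $1 = login of the PR author or agent (assumption: a single login suffices).
# `reply` is null when no agent-authored comment is in the lastComments window.
pair_thread_replies() {
  jq -c --arg agent "$1" '
    .data.repository.pullRequest.reviewThreads.nodes[]
    | {
        thread_id: .id,
        resolved: .isResolved,
        original: .firstComment.nodes[0].body,
        reply: ([.lastComments.nodes[] | select(.author.login == $agent)]
                | last | .body)
      }'
}
```

Each output line is one JSON object per thread, which can be handed to the Phase 3 checks. Threads with a null `reply` were not touched by the agent.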
If the PR has no Copilot review threads, the HITL checks are vacuously passed — note "no HITL replies to audit" and continue to the diff-based checks in Phase 3.
Degraded mode: if both currentActivePullRequest and gh api graphql are unavailable (e.g. gh CLI not installed), fall back to mcp_github_pull_request_read (method get_review_comments). This API returns comment bodies but not isResolved state or GraphQL thread IDs, so the HITL-deferral-path check ("issue link + thread resolved") cannot be fully validated. In degraded mode, check reply bodies for arithmetic and Filed as #N links as normal, but mark the resolved-state check as "UNCLEAR — thread resolution not verifiable without GraphQL", then continue to Phase 3.
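How the resolved-state check degrades can be sketched as a small scoring function. It models only the "issue link + thread resolved" form quoted above and applies only to replies that defer work; replies that use a HITL-blocking declaration are not modeled. The function name `score_deferral_path` and the `unknown` sentinel for degraded mode are hypothetical.

```bash
# Hypothetical scorer for the deferral-path check.
# $1 = agent reply body
# $2 = "true", "false", or "unknown" (assumed sentinel for degraded mode)
score_deferral_path() {
  local reply=$1 resolved=$2
  # The reply must carry a "Filed as #N" issue link.
  if ! printf '%s\n' "$reply" | grep -Eq 'Filed as #[0-9]+'; then
    echo "VIOLATION"
  elif [ "$resolved" = "true" ]; then
    echo "PASS"
  elif [ "$resolved" = "unknown" ]; then
    # Status text copied from the degraded-mode instruction.
    echo "UNCLEAR — thread resolution not verifiable without GraphQL"
  else
    echo "VIOLATION"
  fi
}
```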
For each active rule and instruction file:
Output format per rule:
[RULE] global.instructions.md — Surgical Changes
[STATUS] ✓ PASS
[EVIDENCE] Only files mentioned in the task were modified. No reformatting of adjacent code.
[RULE] instructions/nextjs.instructions.md — Server Components by default
[STATUS] ✗ VIOLATION
[EVIDENCE] /app/components/UserCard.tsx added "use client" without justification.
Server component was appropriate here — no client-side interactivity required.
[SEVERITY] Medium — pattern could spread if uncorrected
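To keep findings consistent with the format above, a small formatter may help. This is a sketch only; `print_finding` is a hypothetical name, and it assumes severity is printed only when supplied, as in the violation example.

```bash
# Hypothetical formatter for one audit finding.
# $1 = rule, $2 = PASS | VIOLATION | UNCLEAR, $3 = evidence, $4 = severity (optional)
print_finding() {
  local mark
  case $2 in
    PASS)      mark="✓" ;;
    VIOLATION) mark="✗" ;;
    *)         mark="⚠" ;;
  esac
  printf '[RULE] %s\n[STATUS] %s %s\n[EVIDENCE] %s\n' "$1" "$mark" "$2" "$3"
  # Severity line is emitted only when a fourth argument is given.
  if [ -n "${4:-}" ]; then
    printf '[SEVERITY] %s\n' "$4"
  fi
}
```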
Severity levels:
COMPLIANCE SUMMARY
──────────────────
Session: [task description]
Commit: [hash]
Rules checked: [n]
Skills checked: [n]
Results:
✓ PASS: [n]
✗ VIOLATION: [n]
⚠ UNCLEAR: [n] (rule ambiguous — couldn't verify either way)
Violations:
[list with severity]
Overall: PASS / FAIL / PARTIAL
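The Results counts in the summary can be derived from the per-rule findings. The sketch below assumes the findings are available as text on stdin in the `[STATUS]` format shown earlier; `tally_findings` is a hypothetical name. It deliberately does not compute the Overall line, since this section does not define how counts map to PASS, FAIL, or PARTIAL.

```bash
# Hypothetical tally: count [STATUS] lines in per-rule findings read from stdin.
tally_findings() {
  awk '
    /^\[STATUS\] .* PASS$/      { pass++ }
    /^\[STATUS\] .* VIOLATION$/ { fail++ }
    /^\[STATUS\] .* UNCLEAR$/   { unclear++ }
    END {
      printf "✓ PASS: %d\n✗ VIOLATION: %d\n⚠ UNCLEAR: %d\n", pass, fail, unclear
    }'
}
```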
If a violation reveals that the active skill or rule doesn't clearly prohibit the behavior, update the skill inline.
Decision rule:
Update format:
## ⚠ Known failure mode — [date]
**Situation:** [what happened]
**Rule that should have caught it:** [rule name]
**Why it didn't:** [ambiguity / gap in the wording]
**Fix:** [clarified instruction added below]
---
Add the fix directly to the relevant section of the skill or rule, not in a separate "known issues" block at the bottom.
Append to working/compliance-log.md:
## [date] — [task name] — [PASS/FAIL/PARTIAL]
Commit: [hash]
Active contexts: [list]
Violations: [n] ([severity summary])
[brief description of any violations and disposition]
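Appending an entry in the format above might look like the following. This is a sketch under assumptions: the helper name `append_compliance_log`, the `LOG_FILE` override, and the ISO date from `date +%F` are all illustrative choices, since the template only says `[date]`.

```bash
# Hypothetical helper: append one audit entry to the compliance log.
# $1 = task name, $2 = PASS | FAIL | PARTIAL, $3 = commit hash,
# $4 = active contexts, $5 = violation count with severity summary, $6 = notes
# LOG_FILE is an assumed override; the default is the path named above.
append_compliance_log() {
  local log=${LOG_FILE:-working/compliance-log.md}
  mkdir -p "$(dirname "$log")"
  {
    # Heading separators copied from the entry format above.
    printf '## %s — %s — %s\n' "$(date +%F)" "$1" "$2"
    printf 'Commit: %s\n' "$3"
    printf 'Active contexts: %s\n' "$4"
    printf 'Violations: %s\n' "$5"
    printf '%s\n\n' "$6"
  } >> "$log"
}
```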
Then push the result to the HUD daemon:
source ~/dotfiles/bin/write-hud-state.sh
update_hud_compliance <pass_count> <fail_count> <warn_count>
For each violation found, also emit individual events:
write_hud_event "fail" "VIOLATION — <rule_file> — <title> — <severity>"
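Emitting one event per violation can be sketched as a loop. It assumes `write_hud_event` is defined by `write-hud-state.sh` as used above, and that violations are available as `rule_file|title|severity` lines on stdin; that intermediate format and the name `emit_violation_events` are hypothetical.

```bash
# Load the HUD helpers only if the script is present on this machine.
if [ -f "$HOME/dotfiles/bin/write-hud-state.sh" ]; then
  source "$HOME/dotfiles/bin/write-hud-state.sh"
fi

# Hypothetical loop: emit one HUD event per violation.
# Reads "rule_file|title|severity" lines on stdin (assumed format).
emit_violation_events() {
  local rule_file title severity
  while IFS='|' read -r rule_file title severity; do
    # Skip blank lines.
    [ -n "$rule_file" ] || continue
    write_hud_event "fail" "VIOLATION — $rule_file — $title — $severity"
  done
}
```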
This log becomes the stress test baseline and the honest answer to "has this been tested."
The audit produces:
- A compliance report in the session
- A log entry in working/compliance-log.md
The report goes in the session. The log persists across sessions. Over time the log is the empirical record of compliance rate.
This audit cannot catch everything. It reviews the diff against stated rules — it cannot detect subtle semantic violations (e.g., code that technically compiles but violates the spirit of the architecture). The goal is not perfect verification. It's making violations visible and recoverable rather than silent.
The compliance rate over time is the real metric. A system with documented 85% compliance and a known improvement path is more trustworthy than one that claims 100% with no verification.
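One way to read a compliance rate off the log is to count entry headings. The sketch below takes one possible reading, the share of logged audits whose heading ends in PASS, and assumes every entry heading follows the `## [date] — [task name] — [PASS/FAIL/PARTIAL]` format. The name `compliance_rate` is hypothetical, and a per-rule rate would need a different calculation.

```bash
# Hypothetical metric: share of logged audits whose heading ends in PASS.
# $1 = log path (defaults to working/compliance-log.md)
compliance_rate() {
  awk '
    /^## .* (PASS|FAIL|PARTIAL)$/ { total++; if ($NF == "PASS") pass++ }
    END {
      if (total == 0) { print "no audits logged"; exit }
      printf "%d/%d audits passed (%.0f%%)\n", pass, total, 100 * pass / total
    }' "${1:-working/compliance-log.md}"
}
```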