.claude/skills/backpack-code-review-checklist/SKILL.md
Multi-agent review orchestrator for Backpack component PRs. Runs 5 parallel specialist agents, then confidence-scores findings to reduce false positives. Use for PR review, Constitution compliance checks, and pre-merge validation.
This skill reviews Backpack component PRs by dispatching 5 parallel specialist agents,
each focused on a narrow domain. A separate scoring pass filters false positives. The
confidence threshold is configurable (default 75, override via /backpack-code-review-checklist threshold=80).
- Phase 0: Detect review mode + early-exit check
- Phase 1: Lightweight metadata (PR number, SHA, file list, reference docs)
- Phase 2: Launch 5 specialist agents IN PARALLEL (agents self-fetch data)
- Phase 3: Confidence scoring (batch)
- Phase 4: Filter (>= threshold), format, and output
- Phase 5: Autopost gate (internal)
Key architecture decision: Agents have full tool access (Bash, Read, Grep, Glob, gh CLI).
Each agent fetches its own data (diff, files, history) independently. This enables true
parallelism — all 5 agents start working immediately without waiting for a central data
collection phase. The orchestrator's role is coordination and synthesis, not data fetching.
Before doing anything else, determine the review mode and check if review is needed.
Step 0.1 — Determine review mode:
- **PR mode**: the user message contains a `github.com/.../pull/NNN` URL.
  - Fetch metadata: `gh pr view NNN --repo Skyscanner/backpack --json headRefOid,files,state,isDraft,body`
  - Link format: `https://github.com/Skyscanner/backpack/blob/[HEAD_COMMIT_SHA]/[PATH]#L[START]-L[END]`
  - Posting requires `BACKPACK_REVIEW_AUTOPOST=true`. Issues scoring between the threshold and 90 require human confirmation before posting.
- **Local mode**: no PR URL.
  - Use `git diff main...HEAD` to get changes.
  - Link format: `[path/file.tsx:29](path/file.tsx#L29)`

Step 0.2 — Early exit check (PR mode only). Skip review if:
If any condition is met, inform the user and stop.
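Step 0.1 can be sketched as a small helper that picks the mode from the user's message. This is a minimal illustration: the function name `detectReviewMode`, the URL regex, and the return shape are all hypothetical. Only the `gh pr view` and `git diff` commands come from the steps above.

```typescript
// Hypothetical helper for Step 0.1: choose PR mode or local mode from the
// user's message. Names and return shape are illustrative assumptions.
type ReviewMode =
  | { kind: "pr"; prNumber: number; fetchCommand: string }
  | { kind: "local"; fetchCommand: string };

// Matches URLs such as https://github.com/Skyscanner/backpack/pull/1234
const PR_URL = /github\.com\/[^\/\s]+\/[^\/\s]+\/pull\/(\d+)/;

function detectReviewMode(userMessage: string): ReviewMode {
  const match = PR_URL.exec(userMessage);
  if (match) {
    const prNumber = Number(match[1]);
    return {
      kind: "pr",
      prNumber,
      // Lightweight metadata only; agents fetch the diff themselves.
      fetchCommand: `gh pr view ${prNumber} --repo Skyscanner/backpack --json headRefOid,files,state,isDraft,body`,
    };
  }
  // No PR URL: review the current branch against main.
  return { kind: "local", fetchCommand: "git diff main...HEAD" };
}

const example = detectReviewMode("Please review https://github.com/Skyscanner/backpack/pull/1234");
console.log(example.kind, example.fetchCommand);
```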
The orchestrator collects only lightweight metadata — not the diff itself. Agents will
fetch their own data using their full tool access (Bash, Read, Grep, Glob, gh CLI).
Step 1.1 — Collect PR metadata (already done in Phase 0):
Step 1.2 — Read reference documents (skim for relevant sections):

- `.specify/memory/constitution.md`: Core principles I-XIII
- `CODE_REVIEW_GUIDELINES.md`: Quality standards
- `decisions/`: Relevant decision records (`modern-sass-api.md`, `accessibility-tests.md`, etc.)

Step 1.3 — Summarise what this PR does in 2-3 sentences based on the file list and PR body.
That's it. The orchestrator does NOT fetch the diff, does NOT build history context, and does NOT do any chunking. Each agent fetches exactly what it needs.
Agent pruning: Based on the file list from Phase 1, determine which agents to launch. Not all agents are needed for every PR:
| Agent | Launch when | Skip when |
|-------|------------|-----------|
| Agent 1 (Constitution & API) | Always | Never — applies to all PRs |
| Agent 2 (Sass & Token) | Changed files include .scss | No .scss files in diff |
| Agent 3 (A11y & Testing) | Always | Never — test coverage applies to all PRs |
| Agent 4 (History) | Changed files exist on main (i.e. not all-new files) | All files are new (no history to check) |
| Agent 5 (Bug Scanner) | Always | Never — bug scanning applies to all PRs |
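
The pruning table above reduces to a few conditions. In this sketch, `selectAgents` and the `existsOnMain` callback are hypothetical. `existsOnMain` stands in for a check such as `git show main:<path>` exiting with code 0. The agent labels reuse the `source` values from the issue JSON.

```typescript
// Hypothetical sketch of the agent-pruning table.
type AgentSource = "constitution" | "sass-tokens" | "a11y-testing" | "history" | "bug-scan";

function selectAgents(changedFiles: string[], existsOnMain: (path: string) => boolean): AgentSource[] {
  const agents: AgentSource[] = ["constitution"]; // Agent 1: always
  if (changedFiles.some((f) => f.endsWith(".scss"))) {
    agents.push("sass-tokens"); // Agent 2: only when SCSS files changed
  }
  agents.push("a11y-testing"); // Agent 3: always
  if (changedFiles.some((f) => existsOnMain(f))) {
    agents.push("history"); // Agent 4: skipped when every changed file is new
  }
  agents.push("bug-scan"); // Agent 5: always
  return agents;
}
```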
Launch the selected agents in a single message using multiple Agent tool calls.
Each agent receives from the orchestrator: (a) the PR number (or "local mode"), (b) the head commit SHA, (c) the list of changed file paths, (d) the PR summary from Phase 1, and (e) its domain-specific Backpack rules.
Each agent self-fetches its own data using Bash, Read, Grep, Glob, and gh CLI.
Agents should use scoped fetches to minimise redundant data across agents. For example, use `git diff main...HEAD -- '*.scss'` in local mode. In PR mode, filter the output of `gh pr diff [N]` down to the relevant files, since `gh pr diff` itself takes no path filter. Agents can do iterative investigation (reading additional files, checking mixin sources, examining package exports) without being limited to pre-injected context.
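In PR mode, one way to scope a fetch is to post-filter the unified diff by file path. This is a minimal sketch: `filterDiffByPath` is a hypothetical helper, and it assumes standard `diff --git a/<path> b/<path>` section headers as printed by `gh pr diff` and `git diff`.

```typescript
// Hypothetical helper: keep only the per-file sections of a unified diff
// whose path satisfies `keep`.
function filterDiffByPath(diff: string, keep: (path: string) => boolean): string {
  const out: string[] = [];
  let keeping = false;
  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // Header looks like: diff --git a/<path> b/<path>
      const bPath = line.split(" b/").pop() ?? "";
      keeping = keep(bPath);
    }
    if (keeping) out.push(line);
  }
  return out.join("\n");
}
```

For example, Agent 2 could pass the output of `gh pr diff [N] --repo Skyscanner/backpack` together with `(p) => p.endsWith(".scss")`.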
Phase 2 requirements:

- If two issues share the same `file` AND overlapping `startLine`–`endLine` ranges AND a similar title (semantic match), merge them into one. Keep the issue with the more specific `rule_id`; if equal, keep the one from the higher-priority source (constitution > sass-tokens > a11y-testing > history > bug-scan).

Each agent MUST return a JSON array of issues:
```json
[
{
"title": "Brief issue title (max 10 words)",
"explanation": "What is wrong, why it matters, what to use instead",
"file": "packages/bpk-component-foo/src/BpkFoo.tsx",
"startLine": 42,
"endLine": 45,
"source": "constitution|sass-tokens|a11y-testing|history|bug-scan",
"rule_id": "constitution.xi.classname-restriction",
"rule": "Constitution XI — className restriction",
"supporting_lines": [
{ "file": "packages/bpk-component-foo/src/BpkFoo.tsx", "startLine": 42, "endLine": 45 }
]
}
]
```
If an agent finds no issues, it returns [].
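The issue contract and the merge rule from the Phase 2 requirements can be sketched in TypeScript. Two parts are assumptions, not Backpack rules. First, the "semantic match" on titles is approximated with word-overlap (Jaccard) similarity. Second, a `rule_id` with more dot-separated segments is treated as "more specific". All function names are hypothetical.

```typescript
// Shape of one finding, mirroring the JSON contract above.
interface LineRange { file: string; startLine: number; endLine: number }
interface Issue extends LineRange {
  title: string;
  explanation: string;
  source: "constitution" | "sass-tokens" | "a11y-testing" | "history" | "bug-scan";
  rule_id: string;
  rule: string;
  supporting_lines: LineRange[];
}

// Tie-break order from the merge rule: earlier entries win.
const SOURCE_PRIORITY: Issue["source"][] = ["constitution", "sass-tokens", "a11y-testing", "history", "bug-scan"];

const overlaps = (a: LineRange, b: LineRange) =>
  a.file === b.file && a.startLine <= b.endLine && b.startLine <= a.endLine;

// ASSUMPTION: "semantic match" approximated by Jaccard similarity of title words.
function similarTitles(a: string, b: string, minJaccard = 0.5): boolean {
  const words = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const wa = words(a);
  const wb = words(b);
  const shared = Array.from(wa).filter((w) => wb.has(w)).length;
  const union = new Set(Array.from(wa).concat(Array.from(wb))).size;
  return union > 0 && shared / union >= minJaccard;
}

// ASSUMPTION: more dot-separated segments = more specific rule_id.
const specificity = (ruleId: string) => ruleId.split(".").length;

function preferred(a: Issue, b: Issue): Issue {
  const sa = specificity(a.rule_id);
  const sb = specificity(b.rule_id);
  if (sa !== sb) return sa > sb ? a : b;
  return SOURCE_PRIORITY.indexOf(a.source) <= SOURCE_PRIORITY.indexOf(b.source) ? a : b;
}

// Collapse duplicates reported by different agents into one issue each.
function mergeDuplicates(issues: Issue[]): Issue[] {
  const kept: Issue[] = [];
  for (const issue of issues) {
    const i = kept.findIndex((k) => overlaps(k, issue) && similarTitles(k.title, issue.title));
    if (i === -1) kept.push(issue);
    else kept[i] = preferred(kept[i], issue);
  }
  return kept;
}
```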
**Agent 1 (Constitution & API)**

Scope: Naming, license, API encapsulation, TypeScript, documentation, and design approval checks.
Prompt to give this agent:
You are reviewing a Backpack design system PR. Your ONLY job is to check Constitution compliance for naming, licensing, and API design. Return issues as JSON. This agent intentionally covers API/TS/docs/design checks. Sass/token checks are handled by Agent 2, and accessibility/testing checks are handled by Agent 3.
- PR number: [NUMBER] (repo: Skyscanner/backpack), or "local mode" with `git diff main...HEAD`
- Head commit SHA: [SHA]
- Changed files: [INSERT LIST]
- PR summary: [INSERT]

Step 1: Fetch the diff and read changed files. Use `gh pr diff [NUMBER] --repo Skyscanner/backpack` or `git diff main...HEAD`. Focus on `.ts`, `.tsx`, `README.md`, and `examples/` files relevant to your scope. Read changed files directly with the Read tool for deeper inspection.

Step 2: Check each changed file against these rules:
Naming & File Conventions (Constitution II)
- Component files: PascalCase (`BpkFoo.tsx`)
- Style files: `.module.scss` matching the component name
- Test files: `*-test.tsx` and `accessibility-test.tsx`
- Package names: `bpk-component-[name]` (kebab-case)
- CSS classes: BEM with `bpk-` prefix (`bpk-foo__element`)

License Headers (NON-NEGOTIABLE)
- ALL `.ts`, `.tsx`, `.scss`, `.js` files must have the Apache 2.0 header
- Must contain "Copyright 2016 Skyscanner Ltd"
- Check with: `grep -L "Copyright 2016 Skyscanner" [files]`

API Encapsulation (Constitution XI — CRITICAL)
- NEW components MUST NOT accept `className` or `style` props
- Correct pattern: `Omit<ComponentPropsWithoutRef<'div'>, 'children' | 'className' | 'style'>`
- Wrong pattern: bare `ComponentPropsWithoutRef<'div'>`, which leaks `className`
- Existing components may grandfather `className`. Determine whether a file is new with `git show main:[filepath]`: exit code 0 = existing (grandfathered), exit code 128 = new file (must enforce restriction)
- Accessibility props (e.g. `accessibilityLabel`) must be REQUIRED, not optional

TypeScript (Constitution V)
- All new code in TypeScript
- Proper prop type interfaces
- JSDoc/TSDoc comments for public APIs
- `@deprecated` tags for deprecated APIs

Documentation (Constitution IX)
- README.md with usage examples
- Storybook stories in `examples/`
- British English for prose
- Public props documented with JSDoc/TSDoc

Design Approval (Constitution X — CONDITIONAL)
- Only required when the PR includes visual changes: `.scss` files, `.stories.tsx` files, new component directories, or Figma references in the description.
- Skip for: pure bug fixes, dependency bumps, snapshot-only updates, docs-only PRs, test-only changes, and refactors with no visual impact.
- When required: check the PR description for design approval evidence. Missing design approval on a visual-change PR = blocking issue.
Only flag issues in changed files. Ignore pre-existing violations. Return a JSON array of issues. If none are found, return `[]`.
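Two of Agent 1's checks are mechanical enough to sketch: the license header and the "is design approval required?" condition. Both helpers are hypothetical. Detecting a new component directory from a new `packages/bpk-component-*/index.ts(x)` file is an assumption for illustration. The Figma check is a plain case-insensitive text match.

```typescript
// Hypothetical helpers for two Agent 1 checks.
const LICENSED_EXTENSIONS = [".ts", ".tsx", ".scss", ".js"];

// License headers: every .ts/.tsx/.scss/.js file must carry the Skyscanner notice.
function missingLicenseHeader(path: string, contents: string): boolean {
  const needsHeader = LICENSED_EXTENSIONS.some((ext) => path.endsWith(ext));
  return needsHeader && !contents.includes("Copyright 2016 Skyscanner Ltd");
}

// Design approval (Constitution X) is only required for visual changes.
// ASSUMPTION: a new component directory shows up as a new package index file;
// `newFiles` lists changed paths that do not exist on main.
function requiresDesignApproval(changedFiles: string[], newFiles: string[], prBody: string): boolean {
  const touchesVisuals = changedFiles.some((f) => f.endsWith(".scss") || f.endsWith(".stories.tsx"));
  const addsComponentDir = newFiles.some((f) => /^packages\/bpk-component-[a-z0-9-]+\/index\.tsx?$/.test(f));
  const mentionsFigma = /figma/i.test(prBody);
  return touchesVisuals || addsComponentDir || mentionsFigma;
}
```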
**Agent 2 (Sass & Token)**

Scope: Modern Sass API, token usage, mixin investigation, private token detection.
Prompt to give this agent:
You are reviewing Backpack SCSS changes. Your ONLY job is to check Sass API and token compliance. Return issues as JSON.
- PR number: [NUMBER] (repo: Skyscanner/backpack), or "local mode"
- Head commit SHA: [SHA]
- Changed files: [INSERT LIST]
- PR summary: [INSERT]
Step 1: Fetch the diff and read changed SCSS files. Use `gh pr diff [NUMBER]` or `git diff main...HEAD`. Focus on `.scss` files relevant to your scope. Read the full content of each changed SCSS file with the Read tool.

Step 2: Check each changed SCSS file against these rules:
Modern Sass API (Constitution III — NON-NEGOTIABLE)
- Must use `@use` syntax, NEVER `@import`
- Granular imports: `@use '../../bpk-mixins/tokens'`, `@use '../../bpk-mixins/utils'`
- Namespace prefixes: `tokens.bpk-spacing-md()`, `tokens.$bpk-core-primary-day`
- CSS Modules (`.module.scss`)
- All sizing in `rem`, not `px` or `em`

Step 3: Mixin investigation (CRITICAL — use your tools): for these known high-risk CSS patterns (not every CSS property): `:hover`, `transition`, `z-index`, `::before`/`::after` pseudo-elements
- Grep `packages/bpk-mixins/` for an existing mixin that abstracts this pattern
- Read 2-3 similar existing components to see how they handle the same pattern
- If a mixin exists and the new code bypasses it, flag it as a violation

Known mixin mappings (not exhaustive — always search):
- `:hover` -> `@include utils.bpk-hover { }` (gates behind `.bpk-no-touch-support`)
- `transition: ... 0.2s` -> `tokens.$bpk-duration-sm` token
- `::before` touch-target -> `@include utils.bpk-touch-tappable`

Token Usage (Constitution III)
- All visual params must use design tokens (no magic numbers)
- Do NOT use `$bpk-private-*` tokens from other components
- Verify that a token's SEMANTIC meaning matches the usage context, not just its colour value:
  - `$bpk-text-disabled-*` = disabled/non-interactive elements only
  - `$bpk-text-secondary-*` = active but de-emphasised interactive elements
  - `$bpk-surface-hero-*` = hero/prominent background areas
  - `$bpk-status-danger-*` = error/destructive states
  - `$bpk-core-accent-*` = selected/primary action states

Step 4: Package import investigation: for each `import X from '../../bpk-component-Y'`:
- Read `packages/bpk-component-Y/index.tsx` to see the full export list
- Look for size/variant suffixes (Large, Small, OnDark, V2)
- Verify the imported variant matches context
Only flag issues in changed lines. Ignore pre-existing violations. Return a JSON array of issues. If none are found, return `[]`.
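Some of Agent 2's rules can be pre-screened line by line before the agent does its real investigation. This sketch is hypothetical and deliberately crude. The regexes only narrow where to look. Any `px`/`em` hit, `@import`, or foreign `$bpk-private-*` token still needs the agent to read the file and the mixins before it is reported. Deriving the owning component from the package name is also an assumption.

```typescript
// Hypothetical line-level pre-scan for a few Agent 2 rules.
interface ScssFinding { line: number; rule: string; text: string }

function scanScss(componentPackage: string, source: string): ScssFinding[] {
  // e.g. "bpk-component-card-button" -> "card-button"
  const ownComponent = componentPackage.replace(/^bpk-component-/, "");
  const findings: ScssFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    if (/^\s*@import\b/.test(text)) {
      findings.push({ line, rule: "Use @use, never @import", text });
    }
    // A digit directly followed by px or em (so "1rem" does not match).
    if (/\d(px|em)\b/.test(text)) {
      findings.push({ line, rule: "Size in rem, not px or em", text });
    }
    // Private tokens are only allowed for the component that owns them.
    const privateToken = /\$bpk-private-([a-z0-9-]+)/g;
    let match: RegExpExecArray | null;
    while ((match = privateToken.exec(text)) !== null) {
      if (!match[1].startsWith(`${ownComponent}-`)) {
        findings.push({ line, rule: "Private token from another component", text });
      }
    }
  });
  return findings;
}
```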
**Agent 3 (A11y & Testing)**

Scope: Accessibility compliance, test coverage, snapshot currency.
Prompt to give this agent:
You are reviewing Backpack accessibility and testing. Your ONLY job is to check a11y and test compliance. Return issues as JSON.
- PR number: [NUMBER] (repo: Skyscanner/backpack), or "local mode"
- Head commit SHA: [SHA]
- Changed files: [INSERT LIST]
- PR summary: [INSERT]
Step 1: Fetch the diff and read changed component and test files. Use `gh pr diff` or `git diff main...HEAD`. Read the full content of changed TSX and test files. Also read related test files that may not be in the diff.

Step 2: Check accessibility (Constitution IV — NON-NEGOTIABLE):
- An `accessibility-test.tsx` file must exist for any component
- Must use `jest-axe` for automated checks
- Tests must exercise the public interface
- Check for: keyboard navigation, ARIA labels, touch targets >= 44x44px
- Verify colour contrast considerations in SCSS
Step 3: Check testing coverage (Constitution VIII):
- Unit tests (Jest + Testing Library) exist for all new code
- Coverage thresholds: branches >= 70%, functions/lines/statements >= 75%
- Storybook stories exist in the `examples/` directory

Step 4: Snapshot currency (commonly missed): after ANY change to rendered output, snapshots MUST be regenerated.
- Read the `.snap` file for this component
- Verify it matches the current component output
- Look for stale attributes or class names
Only flag issues in changed files. Ignore pre-existing violations. Return a JSON array of issues. If none are found, return `[]`.
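The Constitution VIII thresholds can be checked mechanically against a coverage report. This sketch assumes the agent has the `total` entry of Jest's `json-summary` coverage reporter output (`coverage/coverage-summary.json`), where each metric carries a `pct` field. The helper names are hypothetical.

```typescript
// Hypothetical check of Constitution VIII coverage thresholds.
type Metric = "branches" | "functions" | "lines" | "statements";
type CoverageTotals = Record<Metric, { pct: number }>;

const MIN_PCT: Record<Metric, number> = { branches: 70, functions: 75, lines: 75, statements: 75 };

// Returns one human-readable line per metric below its threshold.
function coverageShortfalls(total: CoverageTotals): string[] {
  return (Object.keys(MIN_PCT) as Metric[])
    .filter((metric) => total[metric].pct < MIN_PCT[metric])
    .map((metric) => `${metric}: ${total[metric].pct}% < ${MIN_PCT[metric]}%`);
}

// Constitution IV: every component needs an accessibility-test.tsx.
// ASSUMPTION: the caller passes all files in the component directory, not just the diff.
function hasAccessibilityTest(componentFiles: string[]): boolean {
  return componentFiles.some((f) => f.endsWith("accessibility-test.tsx"));
}
```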
**Agent 4 (History)**

Scope: Git blame, past PR comments, recurring patterns.
Prompt to give this agent:
You are analysing the git history of files changed in a Backpack PR to find context-based issues. Return issues as JSON.
- PR number: [NUMBER] (repo: Skyscanner/backpack), or "local mode"
- Head commit SHA: [SHA]
- Changed files: [INSERT LIST]
- PR summary: [INSERT]
Step 1: Fetch the diff. Use `gh pr diff` or `git diff main...HEAD`.

Step 2: For each changed file, investigate history using your tools (`-i` makes `--grep` case-insensitive, so git's default `Revert "..."` subjects match):

```bash
git log --oneline -10 -- [file]
git log --oneline --all -i --grep="revert" -- [file]
gh pr list --repo Skyscanner/backpack --state merged --limit 10 --search "[filename]"
```

Step 3: For the most relevant past PRs, check their review comments:

```bash
gh pr view [PAST_PR_NUMBER] --repo Skyscanner/backpack --comments
```

Step 4: Analyse patterns:
- Check if recently reverted code is being reintroduced
- Identify hotspot files (frequent recent changes = higher scrutiny)
- Check if past review comments flagged the same patterns now being introduced
Only report issues directly relevant to the current PR's changes. Do not flag pre-existing issues unrelated to this PR. Return a JSON array of issues. If none are found, return `[]`.
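The revert and hotspot signals in Step 4 can be pre-computed from `git log --oneline` output. This sketch is hypothetical. It assumes the caller runs a time-windowed log such as `git log --oneline --since="3 months ago" -- <file>`, so that commit count approximates "frequent recent changes". The default `hotspotThreshold` of 5 is an illustrative cut-off, not a Backpack rule.

```typescript
// Hypothetical summary of `git log --oneline` output for one file.
interface HistorySignal { commits: number; reverts: string[]; hotspot: boolean }

function summariseHistory(gitLogOneline: string, hotspotThreshold = 5): HistorySignal {
  const entries = gitLogOneline.split("\n").map((l) => l.trim()).filter(Boolean);
  // Each line is "<short-sha> <subject>"; keep subjects that mention a revert.
  const reverts = entries
    .map((l) => l.slice(l.indexOf(" ") + 1))
    .filter((subject) => /\brevert/i.test(subject));
  return { commits: entries.length, reverts, hotspot: entries.length >= hotspotThreshold };
}
```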
**Agent 5 (Bug Scanner)**

Scope: Shallow scan of the diff only, looking for obvious logic bugs.
Prompt to give this agent:
You are scanning a Backpack PR diff for obvious bugs. Your ONLY job is to find logic errors, not style issues. Return issues as JSON.
- PR number: [NUMBER] (repo: Skyscanner/backpack), or "local mode"
- Head commit SHA: [SHA]
- Changed files: [INSERT LIST]
- PR summary: [INSERT]
Step 1: Fetch the diff. Use `gh pr diff` or `git diff main...HEAD`. Focus ONLY on the changed lines.

Step 2: Scan for bugs:
- Logic errors (wrong condition, off-by-one, missing null check)
- Missing event handler cleanup (addEventListener without removeEventListener)
- React-specific bugs (missing deps in useEffect, stale closures, key prop issues)
- Type mismatches that TypeScript might not catch (wrong enum variant, swapped args)
- Accessibility bugs (onClick without onKeyDown, missing role, wrong ARIA attribute)
- CSS bugs (z-index conflicts, missing overflow handling, RTL issues)
Do NOT flag:
- Style issues or nitpicks
- Pre-existing issues not introduced in this PR
- Things a linter or type checker would catch
- General code quality opinions
- Missing tests (Agent 3 handles this)
- Token or Sass issues (Agent 2 handles this)
Be conservative. Only flag things you are confident are actual bugs. Return a JSON array of issues. If none are found, return `[]`.
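One bug class from Step 2, missing listener cleanup, lends itself to a cheap heuristic over the added lines. This sketch is hypothetical and only produces leads. A cleanup may live in unchanged context lines (for example, an existing `useEffect` return), so the agent must read the surrounding code before reporting anything.

```typescript
// Hypothetical pre-filter: event names registered in added lines with no
// matching removeEventListener in the same added lines.
function eventNames(lines: string[], method: "addEventListener" | "removeEventListener"): string[] {
  const names: string[] = [];
  for (const line of lines) {
    // Matches e.g. addEventListener('resize' or addEventListener("keydown"
    const pattern = new RegExp(method + "\\(\\s*['\"]([\\w:-]+)['\"]", "g");
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(line)) !== null) names.push(m[1]);
  }
  return names;
}

function listenersWithoutCleanup(addedLines: string[]): string[] {
  const removed = new Set(eventNames(addedLines, "removeEventListener"));
  return eventNames(addedLines, "addEventListener").filter((name) => !removed.has(name));
}
```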
After all 5 agents return, collect every issue into a single list.
Scoring dispatch policy:
- `len(issues) <= 15`: launch parallel scoring agents, one per issue, all in one turn.
- `len(issues) > 15`: the orchestrator scores all issues directly in a single pass (no sub-agents), using the same scoring rubric below but processing them as a batch.

Phase 3 requirements:

- Each scoring agent receives the issue together with its metadata (`rule_id`, `supporting_lines`).
- Confidence threshold: default `75`. Override it inline, e.g. `/backpack-code-review-checklist threshold=80`; the orchestrator parses `threshold=N` and uses that value throughout.

Scoring prompt for each issue:
Score this issue on a scale from 0-100 for confidence that it is a real, actionable issue:
- Issue: [TITLE + EXPLANATION]
- Code: [RELEVANT SNIPPET]
- Rule reference: [CONSTITUTION/DECISION SECTION, if any]
- Issue metadata: [RULE_ID + SUPPORTING_LINES]
Scoring rubric:
- 0: False positive. Does not stand up to scrutiny, or is a pre-existing issue.
- 25: Might be real but could be a false positive. If stylistic, not explicitly required by Constitution or decisions/.
- 50: Real issue but minor. A nitpick or unlikely to matter in practice.
- 75: Verified real issue. Constitution or decisions/ explicitly requires this. The PR's approach is insufficient. Will impact functionality or consistency.
- 100: Certain. NON-NEGOTIABLE violation (license header, className leak in new component, missing accessibility test). Verified by reading the actual code.
For Constitution/decision issues: verify the rule ACTUALLY says what the issue claims.
- If you cannot find the rule verbatim, score 0.
- If the rule exists but the violation requires interpretation (not explicitly stated), score 50 max.
- Score >= 75 only if you have read the exact rule text AND confirmed the changed code contradicts it.
For bug issues: verify the bug can actually occur given the surrounding code context.
For history issues: verify the past feedback is relevant to the current change.
Return ONLY a JSON object:
{"score": NUMBER, "confidence_explanation": "brief explanation", "rule_id": "string", "supporting_lines": [{"file":"...","startLine":1,"endLine":2}]}
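The bookkeeping around scoring (threshold parsing, the 15-issue dispatch split, and the filter and human-gate rules) can be sketched as plain functions. The names are hypothetical. Falling back to the default when the value is outside 0-100 is an added assumption. The scores themselves still come from the scoring agents or the orchestrator's batch pass.

```typescript
// Hypothetical sketch of Phase 3/4 routing and filtering.
const DEFAULT_THRESHOLD = 75;
const HUMAN_GATE_CEILING = 90; // scores in [threshold, 90) need human confirmation
const MAX_PER_ISSUE_AGENTS = 15;

// Reads an inline "threshold=N" argument, e.g. "/backpack-code-review-checklist threshold=80".
// ASSUMPTION: values outside 0-100 fall back to the default.
function parseThreshold(invocation: string): number {
  const m = /\bthreshold=(\d{1,3})\b/.exec(invocation);
  const value = m ? Number(m[1]) : DEFAULT_THRESHOLD;
  return value >= 0 && value <= 100 ? value : DEFAULT_THRESHOLD;
}

function scoringStrategy(issueCount: number): "per-issue-agents" | "orchestrator-batch" {
  return issueCount <= MAX_PER_ISSUE_AGENTS ? "per-issue-agents" : "orchestrator-batch";
}

interface Scored { title: string; score: number }

// Drop issues below the threshold; mark the rest that fall under 90 for a human gate.
function applyThreshold<T extends Scored>(issues: T[], threshold: number) {
  return issues
    .filter((i) => i.score >= threshold)
    .map((i) => ({ ...i, requires_human_gate: i.score < HUMAN_GATE_CEILING }));
}
```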
False positive patterns to score 0:
Filter:
- `threshold` = value from the inline argument, or `75` if not specified
- Drop issues with `score < threshold`
- Mark issues with `threshold <= score < 90` as `requires_human_gate=true`

Visibility mode: output only the `### Code review` block.

Agent status (always include): after aggregation, record which agents returned successfully and which failed. An agent counts as successful if it returned `[]` or a valid JSON array of issues.

If no issues remain:
```markdown
### Code review
No issues found. Checked by N/5 agents (Constitution, Sass, A11y, History, Bug Scanner).
[If any agent failed: "Note: Agent N ([name]) failed — [brief reason]. Remaining agents completed successfully."]
🤖 Generated with [Claude Code](https://claude.ai/code)
```
If issues remain, format as flat numbered list:
```markdown
### Code review
Found N issues (reviewed by M/5 agents, filtered by confidence scoring):
[If any agent failed: "Note: Agent N ([name]) failed — [brief reason]."]
1. [Concise title] — [explanation: what is wrong, why it matters, what to use instead.
Reference correct pattern from codebase if one exists.]
[link to offending lines — format depends on PR vs local mode]
[if human-gated] Gate rationale: [confidence_explanation]
2. [Next issue] — [explanation]
[link]
[if human-gated] Gate rationale: [confidence_explanation]
🤖 Generated with [Claude Code](https://claude.ai/code)
```
Internal metadata (source, confidence, rule_id, human_gate, confidence_explanation)
may be used by the orchestrator for gating decisions, but should not be printed in final output.
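The agent-status bookkeeping behind the "reviewed by M/5 agents" line can be sketched as follows. An agent counts as successful only if its reply parses as a JSON array (`[]` included). `aggregate` and the reply shape are hypothetical.

```typescript
// Hypothetical aggregation of raw agent replies into success/failure status.
interface AgentReply { name: string; raw: string }
interface AgentStatus {
  succeeded: string[];
  failed: { name: string; reason: string }[];
  issues: unknown[];
}

function aggregate(replies: AgentReply[]): AgentStatus {
  const status: AgentStatus = { succeeded: [], failed: [], issues: [] };
  for (const { name, raw } of replies) {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (!Array.isArray(parsed)) throw new Error("reply is not a JSON array");
      status.succeeded.push(name);
      status.issues.push(...parsed);
    } catch (err) {
      // Invalid JSON or a non-array reply both count as a failed agent.
      status.failed.push({ name, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return status;
}
```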
Link format rules:

- PR mode: GitHub permalink with the full SHA:
  `https://github.com/Skyscanner/backpack/blob/[HEAD_COMMIT_SHA]/[FILE_PATH]#L[START]-L[END]`
- Local mode: VSCode-clickable link:
  `[packages/bpk-component-foo/src/BpkFoo.tsx:42](packages/bpk-component-foo/src/BpkFoo.tsx#L42)`

Autopost policy:

- Only post to the PR when `BACKPACK_REVIEW_AUTOPOST=true`.
- If any issue has `requires_human_gate=true`, require explicit confirmation before posting.

After passing these guardrails, post to the PR:

```bash
gh pr review NNN --comment --body "$(cat <<'EOF'
[review body]
EOF
)"
```
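The two link formats and the autopost policy above can be sketched as small helpers. The function names are hypothetical. The `env` parameter stands in for `process.env`, so the sketch does not depend on Node typings.

```typescript
// Hypothetical helpers for link formatting and the autopost gate.
type LinkMode = { kind: "pr"; headSha: string } | { kind: "local" };

function issueLink(mode: LinkMode, file: string, start: number, end: number): string {
  if (mode.kind === "pr") {
    // Full commit SHA, so the permalink keeps pointing at the reviewed lines.
    return `https://github.com/Skyscanner/backpack/blob/${mode.headSha}/${file}#L${start}-L${end}`;
  }
  // Local mode: VSCode-clickable relative link.
  return `[${file}:${start}](${file}#L${start})`;
}

// Post only when explicitly enabled, and never past a human gate without confirmation.
function mayAutopost(
  env: Record<string, string | undefined>,
  issues: { requires_human_gate: boolean }[],
  humanConfirmed: boolean,
): boolean {
  if (env.BACKPACK_REVIEW_AUTOPOST !== "true") return false;
  return humanConfirmed || !issues.some((i) => i.requires_human_gate);
}
```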
Output rules:
- For `requires_human_gate=true` issues, include `Gate rationale: [confidence_explanation]`.
- Do not print Phase 0-5 headings or internal tables.

Phase 5 (autopost gate) is internal by default. Do not print it in normal output mode.
Before finalising output, confirm all of the following:
- `className`/`style` leakage was checked for new components
- Autopost gating was respected (`BACKPACK_REVIEW_AUTOPOST`, human gate for 75-90)
- Only the `gh` CLI was used: no external services beyond GitHub.

The following sections provide reference material. Agent prompts include the key rules inline; these sections provide additional detail for edge cases.
New components (after Constitution ratification):
- Must NOT accept `className` or `style` props
- Use `Omit<ComponentPropsWithoutRef<'div'>, 'children' | 'className' | 'style'>`

Existing components (grandfathered):
- May keep `className`/`style` for backward compatibility

Token Reference: `backpack-foundations/base.common.js`
| Category | Pattern | Example |
|----------|---------|---------|
| Core | $bpk-core-* | $bpk-core-primary-day |
| Surface | $bpk-surface-* | $bpk-surface-default-day |
| Text | $bpk-text-* | $bpk-text-primary-day |
| Status | $bpk-status-* | $bpk-status-danger-spot-day |
| Line | $bpk-line-* | $bpk-line-day |
| Spacing | functions | tokens.bpk-spacing-base() |
| Private | $bpk-private-[component]-* | DO NOT use cross-component |
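
Backpack tokens use RGB notation (`rgb(239, 243, 248)`). When matching Figma/design colours, convert the hex value to RGB before searching `base.common.js`.

Common anti-patterns:
- Making `accessibilityLabel` optional "to be flexible"
- Hand-writing CSS patterns when `bpk-mixins/` abstracts them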
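Figma usually shows colours as hex, while the tokens store `rgb(...)` strings. A conversion like the one below produces a string that can be searched for in `base.common.js`. `hexToRgb` is a hypothetical helper, and it assumes the tokens file uses the `rgb(r, g, b)` spacing shown above.

```typescript
// Hypothetical helper: convert a 6-digit hex colour to Backpack's rgb() notation.
function hexToRgb(hex: string): string {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error(`Expected a 6-digit hex colour, got "${hex}"`);
  const n = parseInt(m[1], 16);
  return `rgb(${(n >> 16) & 255}, ${(n >> 8) & 255}, ${n & 255})`;
}

console.log(hexToRgb("#EFF3F8")); // rgb(239, 243, 248)
```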