codex-audit/SKILL.md
Deep code audit via Codex with full-access sandbox and validation gate. Triggers on "codex audit", "deep review", "audit with codex", "thorough code review", "/codex-audit". Unlike /codex, this gives Codex full access to run tests and explore, then validates every finding before presenting.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add aromanarguello/roman-skills codex-audit
Deep code review via Codex CLI with a validation gate. Codex gets full repo access to run tests, read files, and search — then Claude cross-checks every finding against actual source and project docs before presenting results.
# Try uncommitted changes first
DIFF=$(git diff HEAD)

# Fall back to last commit if working tree is clean
if [ -z "$DIFF" ]; then
  DIFF=$(git diff HEAD~1 HEAD)
  SCOPE="last commit"
else
  SCOPE="uncommitted changes"
fi
Also run `git log --oneline -5` to understand recent commit context.
Review the diff and write a 2-3 sentence summary of what changed. This becomes the <DIFF_SUMMARY> passed to Codex.
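When drafting the summary, it can help to first see which files the diff touches. The following is a minimal sketch, assuming a standard unified diff on stdin; `changed_files` is a hypothetical helper name and not part of the skill itself. An added line whose own content begins with `++ ` would be misread as a file header, so treat the result as a hint rather than ground truth.

```shell
# List the files touched by a unified diff read from stdin.
# changed_files is a hypothetical helper, not something the skill defines.
changed_files() {
  grep '^+++ ' |                          # keep only the "new file" header lines
    sed -e 's|^+++ b/||' -e 's|^+++ ||' | # strip the "+++ b/" (or bare "+++ ") prefix
    grep -v '^/dev/null$' |               # deleted files have no new path
    sort -u
}
```

Typical use would be `git diff HEAD | changed_files`, or `git diff HEAD~1 HEAD | changed_files` when the working tree is clean.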
Run via a Bash subagent (Task tool) to keep output isolated from main context:
Task tool:
  description: "Codex deep audit"
  prompt: |
    Run this command and return the full output:
    git diff HEAD | codex exec --full-auto \
      -s danger-full-access \
      -C "$(pwd)" \
      "You are a code reviewer. The following diff is piped to your stdin. Review it for: bugs, security issues, performance problems, logic errors, and style concerns. Be specific about file names and line numbers. You can read files, run tests, search the web for relevant docs, or do whatever you need for a thorough review. If everything looks good, say so. Here is context about what changed: <DIFF_SUMMARY>" \
      -o /tmp/codex-audit-out.txt 2>/dev/null; cat /tmp/codex-audit-out.txt
    If git diff HEAD is empty, use: git diff HEAD~1 HEAD | codex exec ...
    Return the complete output.
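The command above leaves two things to the caller: filling in `<DIFF_SUMMARY>` and switching to the last commit when the working tree is clean. A possible wrapper is sketched below. The function names `build_review_prompt` and `run_audit` are hypothetical; the `codex exec` flags are copied from the command above, not checked against the Codex CLI.

```shell
# Build the review prompt with the diff summary filled in.
# (Hypothetical helper; the prompt text is the one used by this skill.)
build_review_prompt() {
  local summary=$1
  printf '%s %s\n' \
    "You are a code reviewer. The following diff is piped to your stdin. Review it for: bugs, security issues, performance problems, logic errors, and style concerns. Be specific about file names and line numbers. You can read files, run tests, search the web for relevant docs, or do whatever you need for a thorough review. If everything looks good, say so. Here is context about what changed:" \
    "$summary"
}

# Run the audit on uncommitted changes, or on the last commit if there are none.
# (Hypothetical wrapper; flags mirror the command shown in this skill.)
run_audit() {
  local summary=$1
  local out=${2:-/tmp/codex-audit-out.txt}
  local diff
  diff=$(git diff HEAD)
  if [ -z "$diff" ]; then
    diff=$(git diff HEAD~1 HEAD)
  fi
  # stderr is discarded so Codex's progress output stays out of the context.
  printf '%s\n' "$diff" | codex exec --full-auto \
    -s danger-full-access \
    -C "$(pwd)" \
    "$(build_review_prompt "$summary")" \
    -o "$out" 2>/dev/null
  cat "$out"
}
```

A call might look like `run_audit "Renamed the retry helper and tightened its timeout handling."`. Defining the functions has no side effects; nothing runs until `run_audit` is invoked.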
This is the critical step. Codex operates with zero project context — it WILL flag intentional design choices as bugs and invent issues from misreadings.
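Part of the cross-check can be mechanized: pull every `file:line` reference out of the Codex output and print the code at that location, so each claim is read against real source. This is a rough sketch under stated assumptions. `extract_refs` and `show_context` are hypothetical helpers, and the pattern only recognizes references shaped like `path/name.ext:123`, so it can miss other formats or match unrelated text such as `host.com:80`.

```shell
# Extract unique "path:line" references from Codex output on stdin.
# (Hypothetical helper; assumes references look like src/app.py:42.)
extract_refs() {
  grep -oE '[A-Za-z0-9_./-]+\.[A-Za-z0-9]+:[0-9]+' | sort -u
}

# Print the cited line plus `radius` lines of context on each side (default 3).
# A missing file is reported, since it may indicate an invented reference.
show_context() {
  local ref=$1
  local radius=${2:-3}
  local file=${ref%:*}    # everything before the last colon
  local line=${ref##*:}   # everything after the last colon
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  local start=$((line - radius))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  local end=$((line + radius))
  sed -n "${start},${end}p" "$file"
}
```

For example, `extract_refs < /tmp/codex-audit-out.txt | while read -r ref; do echo "== $ref"; show_context "$ref"; done` would print a snippet per reference. Judging whether the finding is real still requires reading that code and the project docs.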
For EACH finding Codex returns:
Read the actual source it cites, then check the project docs (docs/, README, design docs): is this an intentional choice?
Classify each finding:
Present findings grouped by classification:
For each confirmed finding: what it is, where (file:line), why you believe it's legitimate.
For each finding that failed validation: what Codex flagged, why you think it's wrong (with evidence).
If Codex found nothing, or all findings failed validation, say so explicitly.
Then ask: "Want me to address any of these?"
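The grouped presentation described above could be rendered from a simple list of validated findings. The sketch below assumes tab-separated input of the form `status<TAB>file:line<TAB>note`. The `confirmed` and `rejected` status labels and the `render_report` name are hypothetical, chosen for illustration only.

```shell
# Render validated findings grouped by classification.
# Input on stdin: "<status>\t<file:line>\t<note>" per line, where status is
# "confirmed" or "rejected" (hypothetical labels, not defined by the skill).
render_report() {
  awk -F '\t' '
    $1 == "confirmed" { confirmed[++c] = $2 ": " $3 }
    $1 == "rejected"  { rejected[++r]  = $2 ": " $3 }
    END {
      if (c == 0) {
        # Say so explicitly when nothing survived validation.
        print "No findings survived validation."
      } else {
        print "Confirmed:"
        for (i = 1; i <= c; i++) print "  - " confirmed[i]
      }
      if (r > 0) {
        print "Rejected:"
        for (i = 1; i <= r; i++) print "  - " rejected[i]
      }
    }'
}
```

Keeping rejected findings in the report, with the evidence in the note column, lets the reader see what Codex flagged and why it was dismissed.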
| Mistake | Fix |
|---------|-----|
| Trusting Codex output blindly | ALWAYS validate — step 4 is not optional |
| Running in main context | Use a Bash subagent via Task tool |
| Forgetting `2>/dev/null` | Redirect stderr, or Codex thinking tokens flood context |
| Forgetting `-o` flag | Write the result to a file with `-o`, or output mixes with metadata noise |
| Skipping project doc check | Codex lacks context — you must supply it |
| Labeling style nits as confirmed | Only confirm meaningful issues |