skills/codex-code-review/SKILL.md
Second-opinion code review from OpenAI Codex CLI. Structures feedback as CRITICAL/IMPROVEMENTS/POSITIVE.
npx skillsauth add notque/claude-code-toolkit codex-code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Invoke OpenAI's Codex CLI (GPT-5.4 with maximum reasoning effort) to get an independent second opinion on code changes. Claude orchestrates the review: scoping what to review, constructing the prompt, invoking Codex in a read-only sandbox, then critically assessing the feedback before presenting it to the user.
The value is cross-model perspective. Codex has access to the git repo and filesystem directly, so it can read diffs, browse files, and understand context without Claude having to embed everything in the prompt.
Goal: Determine exactly what Codex should review before constructing the prompt.
Step 1: Ask the user what to review (if not already clear from context).
Common scoping patterns:
| User says | Scope |
|-----------|-------|
| "review my changes" | git diff (unstaged) or git diff --staged |
| "review the last commit" | git diff HEAD~1 |
| "review this PR" / "review this branch" | git diff main...HEAD (or appropriate base branch) |
| "review [file or directory]" | Specific paths |
| "review everything" | Full git diff main...HEAD |
Step 2: Identify focus areas (optional).
If the user mentioned specific concerns (performance, security, error handling), note them for the prompt. If not, let Codex do a general review.
Step 3: Gather context summary.
Build a brief context block for Codex that includes:
This context block goes into the Codex prompt. Keep it short because Codex can read the actual code itself -- the context just orients it.
Gate: You know what to review and have a context summary. Proceed to Phase 2.
Goal: Run codex exec with the right flags and a well-constructed prompt.
Step 1: Create temp file for output.
TMPFILE=$(mktemp /tmp/codex-review.XXXXXXXX)
Use exactly this pattern (no extension after the X's). macOS mktemp breaks
when you append an extension like .md after the template characters.
Step 2: Construct the prompt.
The prompt to Codex should have four parts:
Tell Codex to structure its output as:
## CRITICAL ISSUES
[Issues that would cause bugs, security vulnerabilities, data loss, or crashes.
Empty section if none found.]
## IMPROVEMENTS
[Suggestions for better code quality, performance, readability, or maintainability.
Not blocking but worth considering.]
## POSITIVE NOTES
[What's done well -- good patterns, clean abstractions, solid test coverage.
Reinforcing good practices matters.]
## SUMMARY
[2-3 sentence overall assessment: is this change ready, what's the biggest risk,
what's the biggest strength.]
Step 3: Run codex exec.
Preferred: Use codex exec review (purpose-built review subcommand):
codex exec review \
--base main \
-m gpt-5.4 \
-c 'model_reasoning_effort="xhigh"' \
--ephemeral \
--dangerously-bypass-approvals-and-sandbox \
-o "$TMPFILE"
The review subcommand handles diff computation internally. Use --base for
branch reviews, --commit <SHA> for single commits, or --uncommitted for
unstaged changes. Note: --base/--commit and a custom [PROMPT] argument
are mutually exclusive -- you cannot pass both.
Fallback: Use codex exec with a custom prompt when you need focus areas
or custom instructions:
codex exec \
-m gpt-5.4 \
-c 'model_reasoning_effort="xhigh"' \
--ephemeral \
--dangerously-bypass-approvals-and-sandbox \
-o "$TMPFILE" \
"YOUR_CONSTRUCTED_PROMPT_HERE"
For multi-line prompts, use a heredoc:
codex exec \
-m gpt-5.4 \
-c 'model_reasoning_effort="xhigh"' \
--ephemeral \
--dangerously-bypass-approvals-and-sandbox \
-o "$TMPFILE" \
"$(cat <<'PROMPT'
[Your constructed prompt here]
PROMPT
)"
Flag explanation:
-m gpt-5.4 -- use GPT-5.4 for maximum review quality-c 'model_reasoning_effort="xhigh"' -- maximize reasoning depth so Codex
doesn't shortcut analysis of complex code paths--ephemeral -- don't persist the Codex session (this is a one-shot review)--dangerously-bypass-approvals-and-sandbox -- bypass the bwrap sandbox
which fails in containerized/VM environments with "loopback: Failed
RTM_NEWADDR: Operation not permitted". This is safe because Claude Code
already provides external sandboxing and the review is read-only by nature.
The -s read-only flag is incompatible with this bypass (use one or the
other, not both)-o "$TMPFILE" -- capture Codex's output to the temp file for processingStep 4: Check exit code.
If codex exec exits non-zero, report the error to the user and stop. Do not
retry automatically because Codex failures are typically due to API issues,
authentication problems, or prompt-length limits that won't resolve on retry.
Read the stderr output and include it in your error report so the user can diagnose (e.g., rate limiting, invalid API key, model unavailable).
Gate: Codex exited 0 and $TMPFILE contains review output. Proceed to Phase 3.
Goal: Critically evaluate Codex's feedback before presenting it. Claude is the reviewer of the reviewer -- Codex provides signal, not authority.
Step 1: Read the output.
cat "$TMPFILE"
Step 2: Assess each finding.
For every issue Codex raised, evaluate:
| Question | Action | |----------|--------| | Is this actually correct? | Read the relevant code yourself. Codex may have misread the logic, missed context from other files, or misunderstood the intent. | | Is this already handled elsewhere? | Check if the concern is addressed in code Codex didn't examine (e.g., middleware, error handling upstream). | | Does this apply to this project's conventions? | Check the project's CLAUDE.md or style guides. What's an anti-pattern generally may be the accepted convention here. | | Is the severity right? | Codex may flag a style preference as critical, or miss that a "minor" issue is actually a data-loss risk in this context. |
Step 3: Classify each finding.
After assessment, classify each Codex finding into one of:
Step 4: Add Claude's own observations.
If you noticed issues that Codex missed, add them. The cross-model approach works both ways -- each model catches things the other doesn't.
Gate: Every Codex finding has been assessed. Proceed to Phase 4.
Goal: Present the unified review to the user.
Structure the report as:
## Codex Code Review (GPT-5.4 xhigh)
**Scope**: [what was reviewed -- e.g., "git diff main...HEAD (12 files)"]
### Critical Issues
[Agreed findings at critical severity, with Claude's assessment]
### Improvements
[Agreed suggestions, modified suggestions with Claude's notes]
### Positive Notes
[What both models agree is well-done]
### Filtered Findings
[Findings Claude disagreed with, plus reasoning -- keep brief]
### Claude's Additional Observations
[Issues Claude found that Codex missed, if any]
### Summary
[Combined assessment from both models]
Clean up the temp file:
rm -f "$TMPFILE"
Codex is a second opinion, not a senior reviewer with merge authority. Its findings need the same scrutiny you'd apply to any automated tool output. It can hallucinate issues, misread control flow, or flag patterns that are intentional in this codebase.
If Codex fails (non-zero exit, empty output, garbled response), report it to the user and let them decide. Automatic retry loops burn API quota and rarely fix the underlying problem (auth issues, rate limits, model errors).
Codex has filesystem access in its sandbox. Tell it which git command to run
(e.g., "run git diff main...HEAD to see the changes") rather than pasting the
diff into the prompt. Embedded diffs waste tokens, lose formatting, and hit
prompt length limits on large changes.
The entire value of this skill is cross-model triangulation. If Claude just
passes through Codex output verbatim, the user could have run codex exec
themselves. The assessment phase is where Claude adds context, catches errors,
and filters noise.
Even if a finding is clearly correct and the fix is obvious, present it to the user and let them decide. The user invoked this skill for a review, not for autonomous code changes.
Error: codex: command not found
npm install -g @openai/codex
(or check their installation docs). Verify with codex --version.Error: Non-zero exit code from codex exec
Error: Empty output file
$TMPFILE exists but is empty vs.
doesn't exist. Include any stderr output.Error: Output is not in expected format
documentation
Document translation: quick/normal/refined modes with chunked parallel subagents and glossary support.
development
AI image generation: Gemini and Nano Banana backends; single/series/batch workflows with prompt-to-disk.
testing
Unified voice content generation pipeline with mandatory validation and joy-check. 13-phase pipeline: LOAD, GROUND, STATS-CHECKPOINT, GENERATE, HOOK-GATE, VALIDATE, REFINE, VARIETY-GATE, JOY-CHECK, ANTI-AI, CLOSE-GATE, OUTPUT, CLEANUP. Use when writing articles, blog posts, or any content that uses a voice profile. Use for "write article", "blog post", "write in voice", "generate content", "draft article", "write about".
documentation
Critique-and-rewrite loop for voice fidelity validation.