.claude/skills/multi-ai-code-review/SKILL.md
Multi-perspective code review using Claude, Gemini, and Codex as specialized agents. 5-dimensional analysis (security, performance, maintainability, correctness, style) with LLM-as-judge consensus, quality scoring, and CI/CD integration. Use when reviewing PRs, auditing code quality, preparing production releases, or establishing code review workflows.
npx skillsauth add adaptationio/skrillz multi-ai-code-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
multi-ai-code-review provides comprehensive code review using multiple AI models as specialized agents, each analyzing code from a different perspective. It is based on 2024-2025 best practices for AI-assisted code review.
Purpose: Multi-perspective code quality assessment using AI ensemble with human oversight
Pattern: Task-based (5 independent review dimensions + orchestration)
Key Principles (validated by tri-AI research):
Quality Targets:
Use multi-ai-code-review when:
When NOT to Use:
| Dimension | Agent | Focus | Weight |
|-----------|-------|-------|--------|
| Security | Security Specialist | OWASP Top 10, secrets, injection | 25% |
| Performance | Performance Engineer | Complexity, memory, latency | 20% |
| Maintainability | Architect | Patterns, modularity, DRY | 25% |
| Correctness | QA Engineer | Logic, edge cases, tests | 20% |
| Style | Nitpicker | Naming, formatting, conventions | 10% |
| Level | Action | Examples |
|-------|--------|----------|
| Critical | Block merge | SQL injection, exposed secrets, data loss |
| High | Require fix | Race conditions, missing auth, memory leaks |
| Medium | Suggest fix | Code duplication, missing tests, complexity |
| Low | Optional | Style issues, naming, minor refactors |
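The severity table can be read as a lookup from the worst finding to a merge action. The following is a minimal sketch of that reading; the `Severity` enum and the `required_action` helper are illustrative names, not part of the skill.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table, ordered so that max() picks the worst."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Actions copied from the severity table; the mapping name is hypothetical.
ACTIONS = {
    Severity.CRITICAL: "Block merge",
    Severity.HIGH: "Require fix",
    Severity.MEDIUM: "Suggest fix",
    Severity.LOW: "Optional",
}


def required_action(severities):
    """Return the action for the most severe finding, or None if the list is empty."""
    if not severities:
        return None
    return ACTIONS[max(severities)]


print(required_action([Severity.LOW, Severity.HIGH]))  # Require fix
```

Ordering the enum by integer value keeps the "worst finding wins" rule to a single `max()` call.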
Time: 2-5 minutes
Automation: 80%
Purpose: Fast security-focused review
Process:
Review this code for security vulnerabilities:
- SQL injection
- XSS vulnerabilities
- Hardcoded secrets/API keys
- Authentication bypasses
- Authorization flaws
- Input validation gaps
- Insecure dependencies
Code:
[PASTE CODE OR DIFF]
For each issue found, provide:
- Severity (Critical/High/Medium)
- Location (file:line)
- Description (what's wrong)
- Fix (specific code change)
gemini -p "Verify these security findings. Are any false positives?
[PASTE CLAUDE FINDINGS]
Code context:
[PASTE RELEVANT CODE]"
Time: 10-30 minutes
Automation: 60%
Purpose: Full multi-dimensional review
Process:
Step 1: Gather Context
# Get PR diff
git diff main...HEAD > /tmp/pr_diff.txt
# Identify affected areas
grep -E "^(\\+\\+\\+|---)" /tmp/pr_diff.txt | head -20
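If the affected files are needed programmatically rather than by eye, the same information can be pulled out of the diff text. This is a rough sketch assuming a standard unified diff as produced by `git diff`; the `changed_files` helper is a hypothetical name. Deleted files (whose new side is `/dev/null`) are skipped, and an added content line that happens to start with `++ b/` would be misread as a header.

```python
import re


def changed_files(diff_text):
    """Return paths named on '+++ b/...' header lines, in first-seen order."""
    files = []
    for line in diff_text.splitlines():
        match = re.match(r"^\+\+\+ b/(.+)$", line)
        if match and match.group(1) not in files:
            files.append(match.group(1))
    return files


sample = """\
diff --git a/auth.py b/auth.py
--- a/auth.py
+++ b/auth.py
@@ -1 +1 @@
-old
+new
diff --git a/api.py b/api.py
--- a/api.py
+++ b/api.py
@@ -1 +1 @@
-x
+y
"""
print(changed_files(sample))  # ['auth.py', 'api.py']
```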
Step 2: Run Parallel Agent Reviews
Use Task tool to launch parallel agents:
Launch 3 parallel review agents:
Agent 1 (Security):
"Review this diff for security issues. Focus on:
- OWASP Top 10 vulnerabilities
- Authentication/authorization
- Input validation
- Secrets exposure
Diff: [DIFF]"
Agent 2 (Maintainability):
"Review this diff for maintainability. Focus on:
- Design patterns used correctly
- Code duplication (DRY)
- Modularity and cohesion
- Documentation quality
Diff: [DIFF]"
Agent 3 (Correctness):
"Review this diff for correctness. Focus on:
- Logic errors
- Edge cases not handled
- Test coverage gaps
- Error handling
Diff: [DIFF]"
Step 3: Orchestrate & Deduplicate
Synthesize findings from all agents:
[PASTE ALL AGENT OUTPUTS]
Tasks:
1. Remove duplicate findings
2. Rank by severity (Critical > High > Medium > Low)
3. Group by file
4. Generate summary table
5. Create final report with consensus issues only
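The first three synthesis tasks (deduplicate, rank, group by file) are mechanical enough to sketch in code. The version below assumes each agent's output has already been parsed into dicts with the hypothetical keys `file`, `line`, `severity`, and `title`. Treating two findings as duplicates when file, line, and title match is a simplifying assumption, since real agent outputs rarely agree on wording. Consensus filtering (task 5) is not covered here.

```python
from collections import defaultdict

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def synthesize(findings):
    """Deduplicate findings, rank them by severity, and group them by file."""
    unique = {}
    for finding in findings:
        key = (finding["file"], finding["line"], finding["title"].lower())
        kept = unique.get(key)
        # When agents disagree on severity, keep the more severe copy.
        if kept is None or (
            SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[kept["severity"]]
        ):
            unique[key] = finding

    ranked = sorted(
        unique.values(),
        key=lambda f: (SEVERITY_ORDER[f["severity"]], f["file"], f["line"]),
    )

    by_file = defaultdict(list)
    for finding in ranked:
        by_file[finding["file"]].append(finding)
    return dict(by_file)


agent_findings = [
    {"file": "auth.py", "line": 45, "severity": "High", "title": "SQL Injection"},
    {"file": "auth.py", "line": 45, "severity": "Critical", "title": "SQL injection"},
    {"file": "api.py", "line": 12, "severity": "Medium", "title": "Missing test"},
]
report = synthesize(agent_findings)
print(list(report))  # ['auth.py', 'api.py']
```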
Step 4: Generate Report
Output format:
## PR Review Summary
| File | Risk | Issues | Critical | High | Medium |
|------|------|--------|----------|------|--------|
| auth.py | High | 3 | 1 | 2 | 0 |
| api.py | Medium | 2 | 0 | 1 | 1 |
### Critical Issues (Block Merge)
1. **[auth.py:45]** SQL Injection vulnerability
- Why: User input directly in query
- Fix: Use parameterized queries
### High Issues (Require Fix)
...
### Consensus Score: 73/100
- Security: 65/100
- Performance: 80/100
- Maintainability: 70/100
- Correctness: 75/100
- Style: 85/100
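The per-file summary table in this report format can be generated from a flat list of findings. The sketch below assumes findings are dicts with the hypothetical keys `file` and `severity`. It omits the Risk column because the source does not state how risk is derived from the counts.

```python
from collections import Counter


def summary_table(findings):
    """Render a Markdown table of issue counts per file."""
    counts = {}
    for finding in findings:
        counts.setdefault(finding["file"], Counter())[finding["severity"]] += 1

    lines = [
        "| File | Issues | Critical | High | Medium |",
        "|------|--------|----------|------|--------|",
    ]
    for file, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for severities that never occurred.
        lines.append(
            f"| {file} | {total} | {c['Critical']} | {c['High']} | {c['Medium']} |"
        )
    return "\n".join(lines)


sample_findings = [
    {"file": "auth.py", "severity": "Critical"},
    {"file": "auth.py", "severity": "High"},
    {"file": "auth.py", "severity": "High"},
    {"file": "api.py", "severity": "High"},
    {"file": "api.py", "severity": "Medium"},
]
print(summary_table(sample_findings))
```

With the sample findings, the rows match the counts shown for `auth.py` and `api.py` in the example report.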
Time: 5-15 minutes
Automation: 70%
Purpose: High-confidence findings through consensus
Process:
Claude Analysis:
Analyze this code for issues. Rate severity 1-10 for each:
[CODE]
Gemini Analysis (via CLI):
gemini -p "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
Codex Analysis (via CLI):
codex "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
Given these analyses from 3 AI models:
Claude: [FINDINGS]
Gemini: [FINDINGS]
Codex: [FINDINGS]
Identify issues where at least 2 models agree:
1. List consensus findings
2. Average severity scores
3. Note any disagreements
4. Final verdict for each issue
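The "at least 2 models agree" rule and the severity averaging can also be applied deterministically once each model's findings are normalized. This sketch assumes every model's output has been reduced to a mapping from a shared issue identifier to a 1-10 severity. That normalization step is the hard part and is not shown. The `consensus` function and the issue identifiers are hypothetical.

```python
from statistics import mean


def consensus(analyses, min_agree=2):
    """Keep issues reported by at least `min_agree` models and average their severity.

    `analyses` maps a model name to {issue_id: severity on a 1-10 scale}.
    """
    votes = {}
    for model, issues in analyses.items():
        for issue, severity in issues.items():
            votes.setdefault(issue, {})[model] = severity

    result = {}
    for issue, by_model in votes.items():
        if len(by_model) >= min_agree:
            result[issue] = {
                "models": sorted(by_model),
                "avg_severity": round(mean(by_model.values()), 1),
            }
    return result


analyses = {
    "claude": {"sql-injection": 9, "long-function": 4},
    "gemini": {"sql-injection": 8},
    "codex": {"sql-injection": 10, "long-function": 3, "unused-import": 2},
}
agreed = consensus(analyses)
print(sorted(agreed))  # ['long-function', 'sql-injection']
```

Issues reported by a single model (here `unused-import`) are dropped from the consensus set. They could still be listed separately as disagreements.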
Time: 15-30 minutes
Automation: 40%
Purpose: Educational code review for learning
Process:
Review this code in mentorship mode. For a developer learning [LANGUAGE/FRAMEWORK]:
Code: [CODE]
For each finding:
1. **What's the issue** (be encouraging, not critical)
2. **Why it matters** (explain the underlying concept)
3. **How to improve** (show before/after with explanation)
4. **Learn more** (link to relevant documentation)
Also highlight:
- What was done well
- Good patterns to continue using
- Growth opportunities
Tone: Supportive and educational, never condescending.
Time: 30-60 minutes
Automation: 50%
Purpose: Comprehensive review before production
Process:
# Identify all changes since last release
git diff v1.0.0...HEAD --stat
git log v1.0.0..HEAD --oneline
## Pre-Release Audit: v1.1.0
### Security Clearance: PASS ✓
- No critical vulnerabilities
- All high issues resolved
- Secrets audit: Clean
### Performance Assessment: PASS ✓
- No new N+1 queries
- Response time within SLA
- Memory usage stable
### Test Coverage: 82% (target: 80%)
- Critical paths: 95%
- Edge cases: 78%
### Release Recommendation: APPROVED
| Task | Primary | Verification | Speed |
|------|---------|--------------|-------|
| Security scan | Claude | Gemini | Fast |
| Architecture review | Claude | Codex | Medium |
| Logic validation | Codex | Claude | Medium |
| Style checking | Gemini | Claude | Fast |
| Performance analysis | Claude | Codex | Medium |
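An orchestration script could encode the routing table as a plain lookup. This is only a sketch: the `ROUTING` dict mirrors the table, while the `route` helper and its error handling are assumptions.

```python
# (primary, verifier) pairs copied from the routing table.
ROUTING = {
    "security scan": ("Claude", "Gemini"),
    "architecture review": ("Claude", "Codex"),
    "logic validation": ("Codex", "Claude"),
    "style checking": ("Gemini", "Claude"),
    "performance analysis": ("Claude", "Codex"),
}


def route(task):
    """Return (primary, verifier) for a task name, ignoring case and padding."""
    try:
        return ROUTING[task.strip().lower()]
    except KeyError:
        raise ValueError(f"no routing rule for task: {task!r}") from None


primary, verifier = route("Security scan")
print(primary, verifier)  # Claude Gemini
```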
Launch Multi-Agent Review:
# Using Task tool for parallel execution
# Each agent reviews independently, orchestrator synthesizes
Gemini Quick Check:
gemini -p "Quick security scan of this code: [CODE]"
Codex Deep Analysis:
codex "Analyze this code architecture and suggest improvements: [CODE]"
# .github/workflows/ai-review.yml
name: Multi-AI Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get PR Diff
        run: |
          git diff origin/main...HEAD > pr_diff.txt
      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: "claude-sonnet-4-5-20250929"
          review_level: "detailed"
      - name: Post Summary
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## AI Review Summary\n${process.env.REVIEW_SUMMARY}`
            })
# Block merge for critical issues
quality_gates:
  critical_issues: 0      # Must be zero
  high_issues: 3          # Max allowed
  coverage_minimum: 80    # Percent
  score_minimum: 70       # Out of 100
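One way to enforce these gates in a CI step is a small check that returns every violated threshold. The sketch below hard-codes the four values from the configuration. The `check_gates` function is a hypothetical helper, and it reads "Max allowed" as an inclusive upper bound.

```python
# Thresholds copied from the quality_gates configuration.
QUALITY_GATES = {
    "critical_issues": 0,
    "high_issues": 3,
    "coverage_minimum": 80,
    "score_minimum": 70,
}


def check_gates(critical, high, coverage, score, gates=QUALITY_GATES):
    """Return a list of human-readable gate failures; an empty list means pass."""
    failures = []
    if critical > gates["critical_issues"]:
        failures.append(f"{critical} critical issue(s), max {gates['critical_issues']}")
    if high > gates["high_issues"]:
        failures.append(f"{high} high issue(s), max {gates['high_issues']}")
    if coverage < gates["coverage_minimum"]:
        failures.append(f"coverage {coverage}% below {gates['coverage_minimum']}%")
    if score < gates["score_minimum"]:
        failures.append(f"score {score} below {gates['score_minimum']}")
    return failures


print(check_gates(critical=0, high=2, coverage=82, score=73))  # []
```

Returning all failures at once, rather than stopping at the first, gives the PR author the full list in one run.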
Overall = (Security × 0.25) + (Performance × 0.20) +
(Maintainability × 0.25) + (Correctness × 0.20) +
(Style × 0.10)
| Score | Grade | Status |
|-------|-------|--------|
| ≥90 | A | Excellent - Ship it |
| 80-89 | B | Good - Minor fixes |
| 70-79 | C | Acceptable - Address issues |
| 60-69 | D | Needs work - Significant fixes |
| <60 | F | Failing - Major revision needed |
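As a worked example of the weighted formula and grade bands, the sketch below scores the sample dimension values used in the report format (65, 80, 70, 75, 85). The function names are illustrative. Non-integer scores between bands (for example 89.5) are assigned to the lower band, which is an assumption the table does not settle.

```python
# Weights from the scoring formula; they sum to 1.0.
WEIGHTS = {
    "security": 0.25,
    "performance": 0.20,
    "maintainability": 0.25,
    "correctness": 0.20,
    "style": 0.10,
}


def overall_score(scores):
    """Weighted sum of per-dimension scores (each 0-100), rounded to 2 decimals."""
    return round(sum(scores[dim] * weight for dim, weight in WEIGHTS.items()), 2)


def grade(score):
    """Map an overall score to the letter grade from the grade table."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


sample_scores = {
    "security": 65,
    "performance": 80,
    "maintainability": 70,
    "correctness": 75,
    "style": 85,
}
total = overall_score(sample_scores)
print(total, grade(total))  # 73.25 C
```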
User: Review this PR for my authentication module
Claude: I'll perform a comprehensive multi-dimensional review.
[Launches parallel agents for security, maintainability, correctness]
## PR Review: Authentication Module
### Critical Issues (1)
1. **[auth.py:67]** Password stored in plaintext
- Severity: Critical
- Consensus: 3/3 models agree
- Fix: Use bcrypt hashing
```python
# Before
user.password = request.password

# After
import bcrypt
user.password = bcrypt.hashpw(request.password.encode(), bcrypt.gensalt())
```
Resolve critical security issues before merging.
---
## Related Skills
- **multi-ai-testing**: Generate tests for reviewed code
- **multi-ai-verification**: Validate fixes
- **multi-ai-implementation**: Implement suggested fixes
- **codex-review**: Codex-specific review patterns
- **review-multi**: Skill-specific reviews
---
## References
- `references/security-checklist.md` - OWASP Top 10 checklist
- `references/performance-patterns.md` - Performance anti-patterns
- `references/ci-cd-integration.md` - Full CI/CD setup guide