plugins/specweave/skills/judge-llm/SKILL.md
Ultrathink LLM-as-Judge validation of completed work. Uses extended thinking and Opus model for thorough, independent evaluation. Use when saying "judge my code", "judge-llm", "deep validate", or as part of sw:done closure.
npx skillsauth add anton-abyzov/specweave plugins/specweave/skills/judge-llm
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
ULTRATHINK BY DEFAULT - Validate completed work using extended thinking and the LLM-as-Judge pattern. Provides an independent second opinion separate from sw:grill.
| Aspect | sw:grill | sw:judge-llm |
|--------|-------------|-----------------|
| Execution | In-session (same context) | Separate Opus API call |
| Context | Shares conversation context | Fresh context (no bias) |
| Thinking | Standard reasoning | Extended thinking / ultrathink |
| Output | Confidence-scored findings | Structured verdict + score |
| Domain | Generic code review | Built-in domain criteria |
Key value: Independent perspective with fresh model context catches issues that in-session review may miss.
TypeScript: src/core/skills/skill-judge.ts
API key: ANTHROPIC_API_KEY
Logs: .specweave/logs/judge-llm.log

# DEFAULT: Ultrathink validation (recommended)
sw:judge-llm src/file.ts
sw:judge-llm "src/**/*.ts"
# Validate git changes (ultrathink by default)
sw:judge-llm --staged # Staged changes
sw:judge-llm --last-commit # Last commit
sw:judge-llm --diff main # Diff vs branch
# Quick mode (ONLY if you need speed over thoroughness)
sw:judge-llm src/file.ts --quick
# Timeout control (default: 60s)
sw:judge-llm src/file.ts --timeout 120000
# Additional options
sw:judge-llm src/file.ts --strict # Fail on any concern
sw:judge-llm src/file.ts --fix # Include fix suggestions
sw:judge-llm src/file.ts --export # Export report to markdown
sw:judge-llm src/file.ts --verbose # Show progress to console
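
The flags above can be read as one options object. The following is a minimal sketch of that idea, not the skill's actual parser: `JudgeOptions`, `Target`, and `parseJudgeArgs` are hypothetical names, and the `--timeout` value is assumed to be in milliseconds (the example passes `120000` against a 60s default).

```typescript
// Hypothetical option shape; field names are illustrative, not the real API.
type Target =
  | { kind: "paths"; patterns: string[] }
  | { kind: "staged" }
  | { kind: "last-commit" }
  | { kind: "diff"; branch: string };

interface JudgeOptions {
  target: Target;
  quick: boolean;
  timeoutMs: number;
  strict: boolean;
  fix: boolean;
  exportReport: boolean;
  verbose: boolean;
}

const DEFAULT_TIMEOUT_MS = 60_000; // "default: 60s" in the usage notes

function parseJudgeArgs(argv: readonly string[]): JudgeOptions {
  const patterns: string[] = [];
  const opts: JudgeOptions = {
    target: { kind: "paths", patterns },
    quick: false,
    timeoutMs: DEFAULT_TIMEOUT_MS,
    strict: false,
    fix: false,
    exportReport: false,
    verbose: false,
  };
  const queue = [...argv];
  let arg: string | undefined;
  while ((arg = queue.shift()) !== undefined) {
    switch (arg) {
      case "--staged": opts.target = { kind: "staged" }; break;
      case "--last-commit": opts.target = { kind: "last-commit" }; break;
      case "--diff": {
        const branch = queue.shift();
        if (branch === undefined) throw new Error("--diff requires a branch");
        opts.target = { kind: "diff", branch };
        break;
      }
      case "--timeout": {
        // Assumption: the value is given in milliseconds.
        const ms = Number(queue.shift());
        if (!Number.isFinite(ms) || ms <= 0) {
          throw new Error("--timeout requires a positive number of milliseconds");
        }
        opts.timeoutMs = ms;
        break;
      }
      case "--quick": opts.quick = true; break;
      case "--strict": opts.strict = true; break;
      case "--fix": opts.fix = true; break;
      case "--export": opts.exportReport = true; break;
      case "--verbose": opts.verbose = true; break;
      default:
        if (arg.startsWith("--")) throw new Error(`Unknown flag: ${arg}`);
        patterns.push(arg); // anything else is treated as a file path or glob
    }
  }
  return opts;
}
```

For instance, `parseJudgeArgs(["src/file.ts", "--timeout", "120000"])` keeps the path as the target and overrides only the timeout.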
This skill uses the Anthropic API directly (NOT your Claude Code subscription). Each evaluation costs approximately $0.01-0.05 depending on code size.
Before invoking the Anthropic API, you MUST check consent:
- .specweave/config.json → check externalModels.consent field
- "always-allow" → proceed silently
- "never" → skip API call, use in-session ultrathink evaluation instead
- "ask" (default):
  - "anthropic" is in externalModels.allowedProviders
  - grantStandingConsent('anthropic', projectRoot) from src/core/llm/consent.ts, then proceed
- No ANTHROPIC_API_KEY set → falls back to pattern matching automatically (no cost, no consent needed)

Determine what to validate:

- --staged: get staged git changes
- --last-commit: get files from last commit
- --diff <branch>: get diff against branch

Use extended thinking for deep LLM-as-Judge evaluation via the Opus model:
Claude MUST use ultrathink/extended thinking to:
1. DEEP READ: Thoroughly understand all code, context, and intent
2. MULTI-DIMENSIONAL ANALYSIS: Evaluate across ALL dimensions:
- Correctness: Does it work exactly as intended?
- Completeness: ALL edge cases handled? ALL requirements met?
- Security: ANY vulnerabilities? OWASP Top 10 checked?
- Performance: Algorithmic complexity? Memory usage? Bottlenecks?
- Maintainability: Clean? Clear? Follows conventions?
- Testability: Can it be tested? Are tests adequate?
- Error handling: All failure modes covered?
3. CRITICAL EVALUATION: Weigh ALL findings by severity
4. REASONED VERDICT: Form verdict based on thorough analysis
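
To make the checklist concrete, here is a rough sketch of how the seven dimensions might be assembled into a judge prompt. The document does not give the real prompt wording, so `DIMENSIONS`, `buildJudgePrompt`, and the instruction text are all assumptions made for illustration.

```typescript
// The seven dimensions listed above, paired with their guiding questions.
const DIMENSIONS: ReadonlyArray<readonly [string, string]> = [
  ["Correctness", "Does it work exactly as intended?"],
  ["Completeness", "ALL edge cases handled? ALL requirements met?"],
  ["Security", "ANY vulnerabilities? OWASP Top 10 checked?"],
  ["Performance", "Algorithmic complexity? Memory usage? Bottlenecks?"],
  ["Maintainability", "Clean? Clear? Follows conventions?"],
  ["Testability", "Can it be tested? Are tests adequate?"],
  ["Error handling", "All failure modes covered?"],
];

interface SourceFile {
  path: string;
  content: string;
}

// Hypothetical helper: the instruction wording here is illustrative only.
function buildJudgePrompt(files: readonly SourceFile[]): string {
  const checklist = DIMENSIONS
    .map(([name, question], i) => `${i + 1}. ${name}: ${question}`)
    .join("\n");
  const sources = files
    .map((f) => `--- ${f.path} ---\n${f.content}`)
    .join("\n\n");
  return [
    "Read the code below thoroughly, then evaluate it on every dimension:",
    checklist,
    "Weigh all findings by severity and end with exactly one verdict:",
    "APPROVED, CONCERNS, or REJECTED.",
    "",
    sources,
  ].join("\n");
}
```

Keeping the dimensions in one array means the checklist in the prompt cannot drift from the list the skill documents.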
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
JUDGE-LLM VERDICT: APPROVED | CONCERNS | REJECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Mode: ULTRATHINK (extended thinking)
Confidence: 0.XX
Files Analyzed: N
REASONING:
[Detailed chain-of-thought from extended thinking]
ISSUES (if any):
CRITICAL: [title]
[description]
[file:line]
[suggestion]
HIGH: [title]
...
LOW: [title]
...
VERDICT: [summary sentence]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Verdict | Meaning | Action |
|---------|---------|--------|
| APPROVED | Work is solid, no significant issues | Safe to proceed |
| CONCERNS | Issues found worth addressing | Review and fix recommended |
| REJECTED | Critical issues found | MUST fix before proceeding |
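
The verdict-to-action mapping can be sketched in a few lines. This is an illustration under assumptions, not the skill's code: `parseVerdictLine` and `isFailure` are hypothetical helpers, and the non-strict behavior (only REJECTED fails) is inferred from the `--strict` flag being described as "Fail on any concern".

```typescript
type Verdict = "APPROVED" | "CONCERNS" | "REJECTED";

// Action text taken from the verdict table.
const ACTIONS: Record<Verdict, string> = {
  APPROVED: "Safe to proceed",
  CONCERNS: "Review and fix recommended",
  REJECTED: "MUST fix before proceeding",
};

// Extracts the verdict from a header line such as "JUDGE-LLM VERDICT: CONCERNS".
// Returns null when the line does not carry exactly one known verdict.
function parseVerdictLine(line: string): Verdict | null {
  const m = /^JUDGE-LLM VERDICT:\s*(APPROVED|CONCERNS|REJECTED)\s*$/.exec(line.trim());
  return m ? (m[1] as Verdict) : null;
}

// Assumption: without --strict only REJECTED fails; with --strict, CONCERNS fails too.
function isFailure(verdict: Verdict, strict: boolean): boolean {
  return verdict === "REJECTED" || (strict && verdict === "CONCERNS");
}
```

A caller could then print `ACTIONS[verdict]` and set a non-zero exit code when `isFailure` returns true.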
Rubric Integration: If rubric.md exists in the increment directory:
- Evaluator: sw:judge-llm
- [x] PASS if verdict is not REJECTED, [!] FAIL — verdict: REJECTED otherwise

After evaluation (including consent-denied fallback), you MUST write a JSON report. The CLI checks for this file during closure.
Path: .specweave/increments/<id>/reports/judge-llm-report.json
Full evaluation report:
{
  "version": "1.0",
  "incrementId": "<id>",
  "timestamp": "<ISO-8601>",
  "verdict": "APPROVED|CONCERNS|REJECTED",
  "score": 87,
  "mode": "ultrathink|quick|pattern-match",
  "timedOut": false,
  "duration_ms": 45000,
  "consentStatus": "granted",
  "summary": "..."
}
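
The `consentStatus` and `mode` fields in this report follow from the consent check described earlier. A minimal sketch of that decision is below. It rests on several assumptions that the document does not confirm: the missing-key check runs first, an "anthropic" entry in `allowedProviders` counts as standing consent under "ask", and the names `ExternalModelsConfig`, `Route`, and `routeEvaluation` are hypothetical.

```typescript
type ConsentSetting = "always-allow" | "never" | "ask";

// Assumed shape of the externalModels section of .specweave/config.json.
interface ExternalModelsConfig {
  consent?: ConsentSetting;
  allowedProviders?: string[];
}

type Route = "call-api" | "in-session" | "prompt-user" | "pattern-match";

function routeEvaluation(cfg: ExternalModelsConfig, hasApiKey: boolean): Route {
  // Without an API key there is nothing to consent to: pattern matching, no cost.
  if (!hasApiKey) return "pattern-match";

  const consent = cfg.consent ?? "ask"; // "ask" is the documented default
  if (consent === "always-allow") return "call-api";
  if (consent === "never") return "in-session";

  // "ask": assumption - a provider already listed has standing consent,
  // otherwise the user must be prompted before any API call is made.
  const allowed = cfg.allowedProviders ?? [];
  return allowed.includes("anthropic") ? "call-api" : "prompt-user";
}
```

Returning a route instead of calling the API directly keeps the decision testable without network access or a real key.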
Consent denied / no API key:
{
  "version": "1.0",
  "incrementId": "<id>",
  "timestamp": "<ISO-8601>",
  "verdict": "WAIVED",
  "consentStatus": "denied",
  "reason": "External API consent denied by user"
}
A WAIVED verdict is accepted by the CLI and does not block closure.
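
The two report shapes can be modeled as a discriminated union on `verdict`. The sketch below is illustrative only: the type and function names are hypothetical, the field types are inferred from the sample JSON, and `blocksClosure` assumes that only REJECTED blocks closure (the document states that WAIVED does not, and says nothing explicit about CONCERNS).

```typescript
interface FullReport {
  version: "1.0";
  incrementId: string;
  timestamp: string; // ISO-8601
  verdict: "APPROVED" | "CONCERNS" | "REJECTED";
  score: number;
  mode: "ultrathink" | "quick" | "pattern-match";
  timedOut: boolean;
  duration_ms: number;
  consentStatus: string; // the sample shows "granted"
  summary: string;
}

interface WaivedReport {
  version: "1.0";
  incrementId: string;
  timestamp: string; // ISO-8601
  verdict: "WAIVED";
  consentStatus: "denied";
  reason: string;
}

type JudgeReport = FullReport | WaivedReport;

// Report location, following the documented path pattern.
function reportPath(incrementId: string): string {
  return `.specweave/increments/${incrementId}/reports/judge-llm-report.json`;
}

// Builds the consent-denied report; `now` is injectable so tests are deterministic.
function waivedReport(incrementId: string, reason: string, now: Date = new Date()): WaivedReport {
  return {
    version: "1.0",
    incrementId,
    timestamp: now.toISOString(),
    verdict: "WAIVED",
    consentStatus: "denied",
    reason,
  };
}

// Assumption: only REJECTED blocks closure; WAIVED is documented as accepted.
function blocksClosure(report: JudgeReport): boolean {
  return report.verdict === "REJECTED";
}
```

Serializing with `JSON.stringify(report, null, 2)` and writing the result to `reportPath(id)` would produce a file in the shape the CLI is described as checking.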
Progress is logged to .specweave/logs/judge-llm.log. The default 60s timeout aborts a stuck evaluation (timedOut: true).
- sw:grill - Confidence-scored pre-ship quality gate (in-session)
- sw:validate - Rule-based increment validation
- sw:done - Increment closure (runs both grill and judge-llm)