
Name: plugins/specweave/skills/judge-llm
Author: anton-abyzov
File: plugins/specweave/skills/judge-llm/SKILL.md
Install: npx skillsauth add anton-abyzov/specweave plugins/specweave/skills/judge-llm


Ultrathink LLM-as-Judge Validation

ULTRATHINK BY DEFAULT: validates completed work using extended thinking and the LLM-as-Judge pattern, giving an independent second opinion that is separate from sw:grill.

How It Differs from sw:grill

| Aspect | sw:grill | sw:judge-llm |
|--------|----------|--------------|
| Execution | In-session (same context) | Separate Opus API call |
| Context | Shares conversation context | Fresh context (no bias) |
| Thinking | Standard reasoning | Extended thinking / ultrathink |
| Output | Confidence-scored findings | Structured verdict + score |
| Domain | Generic code review | Built-in domain criteria |

Key value: an independent perspective from a fresh model context can catch issues that in-session review may miss.

Implementation

TypeScript: src/core/skills/skill-judge.ts

- Uses the Anthropic SDK with the user's ANTHROPIC_API_KEY
- AbortController-based timeout to prevent stuck states (default: 60s)
- Progress logging to .specweave/logs/judge-llm.log
- Falls back to basic pattern matching if no API key is set
- Domain-specific evaluation criteria (frontend, backend, mobile, infrastructure, testing, ML)
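The AbortController timeout can be pictured with the minimal sketch below. It is not the code in skill-judge.ts: `withTimeout`, `TimedResult`, and the task callback are hypothetical names. The idea is that one controller's signal is handed to the evaluation call and also raced against it, so even a call that ignores the signal cannot hold the caller past the deadline, and the result carries the `timedOut` flag that the report later records.

```typescript
// Sketch only: withTimeout, TimedResult and the task callback are hypothetical,
// not the actual exports of src/core/skills/skill-judge.ts.

interface TimedResult<T> {
  value?: T;
  timedOut: boolean;
}

// Rejects as soon as the signal aborts, so a task that ignores the signal
// still cannot hold the caller past the deadline.
function rejectOnAbort(signal: AbortSignal): Promise<never> {
  return new Promise((_, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true });
  });
}

// Runs `task` with an AbortSignal that fires after `timeoutMs` (default 60 s).
// The task should pass the signal to its network call so the request itself
// is cancelled rather than merely abandoned.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 60_000,
): Promise<TimedResult<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const value = await Promise.race([task(controller.signal), rejectOnAbort(controller.signal)]);
    return { value, timedOut: false };
  } catch (err) {
    // Any failure after the deadline is reported as a timeout (timedOut: true).
    if (controller.signal.aborted) return { timedOut: true };
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

Racing against the signal, instead of trusting the task to honor it, is the design choice that makes "prevent stuck states" hold regardless of how the underlying request API reacts to aborts.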

Usage

# DEFAULT: Ultrathink validation (recommended)
sw:judge-llm src/file.ts
sw:judge-llm "src/**/*.ts"

# Validate git changes (ultrathink by default)
sw:judge-llm --staged           # Staged changes
sw:judge-llm --last-commit      # Last commit
sw:judge-llm --diff main        # Diff vs branch

# Quick mode (ONLY if you need speed over thoroughness)
sw:judge-llm src/file.ts --quick

# Timeout control in milliseconds (default: 60000 ms, i.e. 60s)
sw:judge-llm src/file.ts --timeout 120000

# Additional options
sw:judge-llm src/file.ts --strict   # Fail on any concern
sw:judge-llm src/file.ts --fix      # Include fix suggestions
sw:judge-llm src/file.ts --export   # Export report to markdown
sw:judge-llm src/file.ts --verbose  # Show progress to console
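To make the flag semantics concrete (in particular that `--timeout` takes milliseconds, so `120000` doubles the 60s default), here is a hypothetical parser sketch. The `JudgeOptions` shape and `parseJudgeArgs` are illustrative assumptions; the skill's real argument handling is not shown in this document.

```typescript
// Hypothetical option names; the skill's real argument handling may differ.
interface JudgeOptions {
  targets: string[]; // file paths or quoted globs such as "src/**/*.ts"
  source: "files" | "staged" | "last-commit" | "diff" | "context";
  diffBase?: string; // branch passed to --diff
  mode: "ultrathink" | "quick";
  timeoutMs: number; // --timeout takes milliseconds; 60000 ms is the 60 s default
  strict: boolean;
  fix: boolean;
  exportReport: boolean;
  verbose: boolean;
}

function parseJudgeArgs(argv: string[]): JudgeOptions {
  const opts: JudgeOptions = {
    targets: [],
    source: "context", // no args: validate recent work in the conversation
    mode: "ultrathink", // ultrathink unless --quick is given
    timeoutMs: 60_000,
    strict: false,
    fix: false,
    exportReport: false,
    verbose: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--staged": opts.source = "staged"; break;
      case "--last-commit": opts.source = "last-commit"; break;
      case "--diff":
        opts.diffBase = argv[++i];
        if (!opts.diffBase) throw new Error("--diff requires a branch name");
        opts.source = "diff";
        break;
      case "--quick": opts.mode = "quick"; break;
      case "--timeout": {
        const ms = Number(argv[++i]);
        if (!Number.isFinite(ms) || ms <= 0) throw new Error("--timeout expects milliseconds");
        opts.timeoutMs = ms;
        break;
      }
      case "--strict": opts.strict = true; break;
      case "--fix": opts.fix = true; break;
      case "--export": opts.exportReport = true; break;
      case "--verbose": opts.verbose = true; break;
      default:
        if (arg.startsWith("--")) throw new Error(`Unknown flag: ${arg}`);
        opts.targets.push(arg);
    }
  }
  // Explicit paths take precedence over the conversation-context default.
  if (opts.source === "context" && opts.targets.length > 0) opts.source = "files";
  return opts;
}
```

For example, `parseJudgeArgs(["src/file.ts", "--timeout", "120000"])` would yield `source: "files"` and `timeoutMs: 120000` under these assumptions.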

External API Cost Consent (MANDATORY)

This skill calls the Anthropic API directly (NOT through your Claude Code subscription). Each evaluation costs approximately $0.01-0.05, depending on code size.

Before invoking the Anthropic API, you MUST check consent:

Read .specweave/config.json and check the externalModels.consent field:

- "always-allow" → proceed silently.
- "never" → skip the API call and use in-session ultrathink evaluation instead.
- "ask" (default):
  - Check whether "anthropic" is in externalModels.allowedProviders.
  - If YES → proceed silently (standing permission).
  - If NO → ASK USER: "Judge-LLM will call the Anthropic API using your ANTHROPIC_API_KEY. This costs ~$0.01-0.05 per evaluation. Proceed? (yes/no/always)"
    - "yes" → proceed this time only.
    - "no" → skip the API call and use in-session ultrathink instead.
    - "always" → run grantStandingConsent('anthropic', projectRoot) from src/core/llm/consent.ts, then proceed.

If ANTHROPIC_API_KEY is not set, the skill falls back to pattern matching automatically (no cost, no consent needed).
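The consent rules above amount to a small decision table. The sketch below encodes them as pure functions so each branch can be checked; `resolveConsent`, `applyAnswer`, and the decision labels are hypothetical names, and checking for a missing API key first is an assumption (no API call can happen without one, so no consent is needed).

```typescript
// Sketch of the consent rules above; names are hypothetical, not the skill's real API.
type ConsentSetting = "always-allow" | "never" | "ask";

// The relevant slice of .specweave/config.json (externalModels).
interface ExternalModelsConfig {
  consent?: ConsentSetting;
  allowedProviders?: string[];
}

type ConsentDecision =
  | "call-api" // proceed silently
  | "ask-user" // prompt: yes / no / always
  | "in-session" // skip the API call, use in-session ultrathink
  | "pattern-match"; // no API key: free fallback, no consent needed

function resolveConsent(cfg: ExternalModelsConfig | undefined, hasApiKey: boolean): ConsentDecision {
  // Assumption: a missing key short-circuits everything else.
  if (!hasApiKey) return "pattern-match";
  const setting = cfg?.consent ?? "ask"; // "ask" is the default
  if (setting === "always-allow") return "call-api";
  if (setting === "never") return "in-session";
  return (cfg?.allowedProviders ?? []).includes("anthropic") ? "call-api" : "ask-user";
}

type UserAnswer = "yes" | "no" | "always";

// grantStanding = true means the caller should then run
// grantStandingConsent('anthropic', projectRoot) from src/core/llm/consent.ts.
function applyAnswer(answer: UserAnswer): { decision: ConsentDecision; grantStanding: boolean } {
  switch (answer) {
    case "yes":
      return { decision: "call-api", grantStanding: false };
    case "always":
      return { decision: "call-api", grantStanding: true };
    case "no":
      return { decision: "in-session", grantStanding: false };
  }
}
```

Keeping the decision pure (no prompting, no file writes) means the prompt and the standing-consent write stay at the call site, where they are easy to audit.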

Workflow

Step 1: Gather Input

Determine what to validate:

- If file paths are provided: read those files.
- If --staged: get staged git changes.
- If --last-commit: get files from the last commit.
- If --diff <branch>: get the diff against that branch.
- If no args: validate recent work in the conversation context.
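One way to map the git-based inputs onto plain git commands is sketched below. The mapping is an assumption, not taken from the skill: `gitArgsFor` and `InputSource` are hypothetical, and `--diff` is read here as the three-dot form (changes on the current branch since it diverged from the base).

```typescript
// Hypothetical mapping from input mode to git arguments; the skill may use other commands.
type InputSource =
  | { kind: "files"; paths: string[] }
  | { kind: "staged" }
  | { kind: "last-commit" }
  | { kind: "diff"; base: string }
  | { kind: "context" };

// Returns the git arguments that list the files to review, or null when git is not needed.
function gitArgsFor(source: InputSource): string[] | null {
  switch (source.kind) {
    case "staged":
      // Files in the index (what `git commit` would record).
      return ["diff", "--cached", "--name-only"];
    case "last-commit":
      // Files touched by HEAD; --root also handles a repository's first commit.
      return ["diff-tree", "--no-commit-id", "--name-only", "-r", "--root", "HEAD"];
    case "diff":
      // Three-dot form: changes on the current branch since it diverged from `base`.
      return ["diff", "--name-only", `${source.base}...HEAD`];
    case "files":
    case "context":
      // Nothing to ask git: read the given paths, or use the conversation.
      return null;
  }
}
```

In Node, the returned array could be passed to `execFileSync("git", args, { encoding: "utf8" })` from `node:child_process`; using an argument array rather than a shell string avoids quoting problems with branch names.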

Step 2: Ultrathink Analysis (Default)

Use extended thinking for deep LLM-as-Judge evaluation via the Opus model:

Claude MUST use ultrathink/extended thinking to:

1. DEEP READ: Thoroughly understand all code, context, and intent
2. MULTI-DIMENSIONAL ANALYSIS: Evaluate across ALL dimensions:
   - Correctness: Does it work exactly as intended?
   - Completeness: ALL edge cases handled? ALL requirements met?
   - Security: ANY vulnerabilities? OWASP Top 10 checked?
   - Performance: Algorithmic complexity? Memory usage? Bottlenecks?
   - Maintainability: Clean? Clear? Follows conventions?
   - Testability: Can it be tested? Are tests adequate?
   - Error handling: All failure modes covered?
3. CRITICAL EVALUATION: Weigh ALL findings by severity
4. REASONED VERDICT: Form verdict based on thorough analysis

Step 3: Return Verdict

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
JUDGE-LLM VERDICT: APPROVED | CONCERNS | REJECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Mode: ULTRATHINK (extended thinking)
Confidence: 0.XX
Files Analyzed: N

REASONING:
[Detailed chain-of-thought from extended thinking]

ISSUES (if any):
  CRITICAL: [title]
   [description]
   [file:line]
   [suggestion]

  HIGH: [title]
   ...

  LOW: [title]
   ...

VERDICT: [summary sentence]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
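If the verdict is assembled programmatically, a renderer for the template above might look like the following sketch. The `JudgeResult` and `Issue` shapes and `renderVerdict` are hypothetical; only the printed labels come from the template. It handles ultrathink mode only and sorts issues most severe first.

```typescript
// Hypothetical data shape; the skill's internal types are not documented here.
type Verdict = "APPROVED" | "CONCERNS" | "REJECTED";
type Severity = "CRITICAL" | "HIGH" | "LOW";

interface Issue {
  severity: Severity;
  title: string;
  description: string;
  location: string; // "file:line"
  suggestion: string;
}

interface JudgeResult {
  verdict: Verdict;
  confidence: number; // 0..1, printed with two decimals
  filesAnalyzed: number;
  reasoning: string;
  issues: Issue[];
  summary: string;
}

const RULE = "━".repeat(57); // width chosen arbitrarily
const SEVERITY_ORDER: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, LOW: 2 };

function renderVerdict(r: JudgeResult): string {
  const lines: string[] = [
    RULE,
    `JUDGE-LLM VERDICT: ${r.verdict}`,
    RULE,
    "",
    "Mode: ULTRATHINK (extended thinking)",
    `Confidence: ${r.confidence.toFixed(2)}`,
    `Files Analyzed: ${r.filesAnalyzed}`,
    "",
    "REASONING:",
    r.reasoning,
  ];
  if (r.issues.length > 0) {
    lines.push("", "ISSUES:");
    // Most severe first, matching the CRITICAL, HIGH, LOW order of the template.
    const sorted = [...r.issues].sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
    for (const issue of sorted) {
      lines.push(
        `  ${issue.severity}: ${issue.title}`,
        `   ${issue.description}`,
        `   ${issue.location}`,
        `   ${issue.suggestion}`,
        "",
      );
    }
  } else {
    lines.push("");
  }
  lines.push(`VERDICT: ${r.summary}`, RULE);
  return lines.join("\n");
}
```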

Verdict Meanings

| Verdict | Meaning | Action |
|---------|---------|--------|
| APPROVED | Work is solid, no significant issues | Safe to proceed |
| CONCERNS | Issues found worth addressing | Review and fix recommended |
| REJECTED | Critical issues found | MUST fix before proceeding |
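The table leaves the exact thresholds to the judge's reasoning. As an illustration only, here is a hypothetical policy that maps severities onto the three verdicts. The MEDIUM level, the rule that LOW-only findings still approve, and treating `--strict` as turning CONCERNS into REJECTED are all assumptions, not documented behavior.

```typescript
// Hypothetical verdict policy; the skill's real thresholds are not documented.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type Verdict = "APPROVED" | "CONCERNS" | "REJECTED";

interface Issue {
  severity: Severity;
  title: string;
}

function deriveVerdict(issues: Issue[], strict = false): Verdict {
  // Any critical issue: must be fixed before proceeding.
  if (issues.some((i) => i.severity === "CRITICAL")) return "REJECTED";
  // HIGH or MEDIUM findings are worth addressing; strict mode treats them as a failure.
  const hasConcerns = issues.some((i) => i.severity === "HIGH" || i.severity === "MEDIUM");
  if (hasConcerns) return strict ? "REJECTED" : "CONCERNS";
  // No issues, or LOW-only nits: nothing significant.
  return "APPROVED";
}
```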

Rubric Integration

If rubric.md exists in the increment directory:

- Load the file and find all criteria where Evaluator: sw:judge-llm.
- After determining your verdict, update each matching criterion: [x] PASS if the verdict is not REJECTED, otherwise [!] FAIL — verdict: REJECTED.

If rubric.md does not exist, proceed with the existing behavior (no error).
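The rubric update is a line-level rewrite. This document does not show the rubric.md layout, so the sketch below assumes, hypothetically, one criterion per line with an unchecked `[ ]` box and an `Evaluator: sw:judge-llm` marker on the same line. `updateRubric` is an illustrative name. Only unchecked boxes are replaced, so re-running it does not overwrite an earlier result.

```typescript
// ASSUMED rubric format (hypothetical): one criterion per line, e.g.
//   - [ ] R-03 No unhandled promise rejections (Evaluator: sw:judge-llm)
// Only lines naming sw:judge-llm as evaluator are touched.
type Verdict = "APPROVED" | "CONCERNS" | "REJECTED";

function updateRubric(markdown: string, verdict: Verdict): string {
  // Marker strings as given in the rubric rules above.
  const mark = verdict === "REJECTED" ? "[!] FAIL — verdict: REJECTED" : "[x] PASS";
  return markdown
    .split("\n")
    .map((line) =>
      /Evaluator:\s*sw:judge-llm(?![\w-])/.test(line)
        ? line.replace(/\[ \]/, mark) // first unchecked box on the line only
        : line,
    )
    .join("\n");
}
```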

Persistent Report (MANDATORY)

After evaluation (including consent-denied fallback), you MUST write a JSON report. The CLI checks for this file during closure.

Path: .specweave/increments/<id>/reports/judge-llm-report.json

Full evaluation report:

{
  "version": "1.0",
  "incrementId": "<id>",
  "timestamp": "<ISO-8601>",
  "verdict": "APPROVED|CONCERNS|REJECTED",
  "score": 87,
  "mode": "ultrathink|quick|pattern-match",
  "timedOut": false,
  "duration_ms": 45000,
  "consentStatus": "granted",
  "summary": "..."
}

Consent denied / no API key:

{
  "version": "1.0",
  "incrementId": "<id>",
  "timestamp": "<ISO-8601>",
  "verdict": "WAIVED",
  "consentStatus": "denied",
  "reason": "External API consent denied by user"
}

The CLI accepts a WAIVED verdict; it does not block closure.
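The two report shapes can be typed as a discriminated union on `verdict`, which makes it hard to write a WAIVED report with full-evaluation fields, or the reverse. The field names below follow the JSON examples; `reportPath`, `waivedReport`, and `serializeReport` are hypothetical helpers.

```typescript
// Field names follow the JSON examples above; helper names are hypothetical.
type Verdict = "APPROVED" | "CONCERNS" | "REJECTED";

interface FullReport {
  version: "1.0";
  incrementId: string;
  timestamp: string; // ISO-8601
  verdict: Verdict;
  score: number;
  mode: "ultrathink" | "quick" | "pattern-match";
  timedOut: boolean;
  duration_ms: number;
  consentStatus: "granted";
  summary: string;
}

interface WaivedReport {
  version: "1.0";
  incrementId: string;
  timestamp: string;
  verdict: "WAIVED"; // accepted by the CLI; does not block closure
  consentStatus: "denied";
  reason: string;
}

type JudgeReport = FullReport | WaivedReport;

function reportPath(incrementId: string): string {
  return `.specweave/increments/${incrementId}/reports/judge-llm-report.json`;
}

function waivedReport(incrementId: string, now: Date = new Date()): WaivedReport {
  return {
    version: "1.0",
    incrementId,
    timestamp: now.toISOString(),
    verdict: "WAIVED",
    consentStatus: "denied",
    reason: "External API consent denied by user",
  };
}

// Pretty-printed JSON body to write at reportPath(incrementId).
function serializeReport(report: JudgeReport): string {
  return JSON.stringify(report, null, 2) + "\n";
}
```

In Node, writing the file would typically be `mkdirSync(dirname(p), { recursive: true })` followed by `writeFileSync(p, serializeReport(report))`, using `node:fs` and `node:path`.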

Visibility & Stuck Detection

Progress is logged to .specweave/logs/judge-llm.log. If an evaluation stalls, the default 60s timeout aborts it and the report records "timedOut": true.

Related Skills

- sw:grill - Confidence-scored pre-ship quality gate (in-session)
- sw:validate - Rule-based increment validation
- sw:done - Increment closure (runs both grill and judge-llm)


Skill description: Ultrathink LLM-as-Judge validation of completed work. Uses extended thinking and the Opus model for thorough, independent evaluation. Use when saying "judge my code", "judge-llm", "deep validate", or as part of sw:done closure.

124 stars · Category: tools · Updated Apr 15, 2026


Security Scan Results

3 of 9 scanners reported clean.

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Score |
|---------|---------|--------|-------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 24, 2026, 8:47 PM · duration 2.1s · 1 file scanned

Allowed tools (SKILL.md frontmatter): Read, Grep, Glob, Bash



Install manually


# Clone the repo
git clone https://github.com/anton-abyzov/specweave.git

# Copy into Claude Code skills folder (global)
cp -r specweave/plugins/specweave/skills/judge-llm ~/.claude/skills/


Repository: anton-abyzov/specweave

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT