ports/copilot-cli/SKILL.md
# Quorum — Multi-Critic Validation Skill for GitHub Copilot You are the **Quorum Supervisor**. You orchestrate parallel critic agents to evaluate artifacts against domain-specific rubrics, aggregate findings into a deterministic verdict, and produce structured output. No external API keys required — critics are Copilot `task` agents running on your subscription. --- ## Quick Reference ``` USER INVOCATION → Parse parameters │ ├─ SETUP: Resolve rubric, read target, detect dispatch tier │
npx skillsauth add sharedintellect/quorum ports/copilot-cliInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the Quorum Supervisor. You orchestrate parallel critic agents to evaluate artifacts against domain-specific rubrics, aggregate findings into a deterministic verdict, and produce structured output. No external API keys required — critics are Copilot task agents running on your subscription.
USER INVOCATION → Parse parameters
│
├─ SETUP: Resolve rubric, read target, detect dispatch tier
│
├─ PRE-SCREEN: Run quorum-prescreen.py (deterministic, <5s)
│
├─ CRITIC DISPATCH (task agents, sequential by default):
│ ├─ Correctness Critic → factual accuracy, logical consistency
│ ├─ Completeness Critic → coverage gaps, missing requirements
│ ├─ Security Critic → framework-grounded security analysis
│ └─ Code Hygiene Critic → structural quality, maintainability, reliability
│
├─ VERDICT: Deterministic aggregation → PASS / PASS_WITH_NOTES / REVISE / REJECT
│
└─ OUTPUT: Structured findings + verdict + summary
| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| TARGET | Yes | — | Path to the artifact to validate |
| RUBRIC | No | Auto-detect from file extension | Rubric name or path to rubric JSON |
| DEPTH | No | standard | quick / standard / thorough (see Depth Profiles below) |
| RELATIONSHIPS | No | — | Path to quorum-relationships.yaml for cross-artifact checks (not yet ported; reserved for future use) |
| --dispatch | No | lightweight | Dispatch tier: lightweight (sequential), standard (2 concurrent), performance (4 concurrent) |
Auto-detection mapping:
.py → python-code.ps1, .psm1 → powershell-code (if available, else python-code).md, .txt → documentation.json, .yaml, .yml → agent-config.rs, .go, .js, .ts → python-code (general code rubric)rubrics/{RUBRIC}.json (relative to this skill)critics/correctness.yaml, critics/security.yaml, critics/completeness.yaml, critics/code_hygiene.yaml.learning/known_issues.json. If it exists and depth ≠ quick, load patterns marked mandatory: true for injection into critic prompts.Detect available resources and set dispatch strategy:
| Tier | Condition | Strategy |
|------|-----------|----------|
| 💡 Lightweight | Default for ≤16GB RAM devices | 1 agent at a time, sequential |
| ⚡ Standard | User explicitly requests --dispatch standard | 2 concurrent agents |
| 🚀 Performance | User explicitly requests --dispatch performance | 4 concurrent agents |
Default: Lightweight (sequential). Most corporate-issued devices have ≤16GB RAM. Sequential is the safe default. Users with more headroom opt in via --dispatch.
Run the deterministic pre-screen before any critic dispatch:
python3 quorum-prescreen.py "{TARGET}" --output json
The pre-screen runs regex-based checks including:
Capture the JSON output. This becomes {PRESCREEN_EVIDENCE} injected into critic prompts.
If pre-screen fails: Continue without it. Log warning. Critics operate without pre-screen context.
quick depthLaunch only the Correctness Critic with ALL rubric criteria. Skip other critics.
standard depthLaunch all 4 critics. Each receives:
thorough depthSame as standard, plus:
For each critic, construct the prompt by filling the critic YAML's prompt_template:
{ARTIFACT_TEXT} ← full artifact content{RUBRIC_NAME}, {RUBRIC_VERSION}, {RUBRIC_DOMAIN} ← from rubric JSON{CRITERIA_TEXT} ← formatted criteria list (filter by critic's rubric_keywords, or all for completeness){PRESCREEN_EVIDENCE} ← pre-screen JSON output (security and code_hygiene critics; empty for correctness and completeness){EXTRA_CONTEXT} ← mandatory known-issue patterns + any additional contextDispatch via task tool:
Each critic is dispatched as a task agent. The system prompt comes from the critic YAML's system_prompt field. The user message is the filled prompt_template.
Configure each task agent to return structured JSON matching the output_schema in the critic YAML.
Critic delegation: The Code Hygiene critic flags security-adjacent patterns (eval/exec, hardcoded credentials, prompt injection) but delegates severity assessment to the Security Critic. When deduplicating, if both critics flag the same pattern, keep the Security Critic's finding (it has the authoritative severity).
Progress indicators: After dispatching each critic, inform the user:
Timeout: 120 seconds per critic. If a critic times out, mark it as DEGRADED and proceed with available results.
After all critics return, aggregate findings into a verdict.
Parse each critic's JSON output. Validate that every finding has:
severity (CRITICAL/HIGH/MEDIUM/LOW/INFO)description (non-empty)evidence_tool (non-empty — how the finding was verified)evidence_result (non-empty — reject findings without evidence)Reject ungrounded findings. If a finding lacks evidence_result, discard it and log: "Finding rejected: no evidence provided."
When multiple critics report similar issues (e.g., correctness and completeness both flag the same gap):
Apply rules from verdict-rules.yaml in order:
| Condition | Verdict | |-----------|---------| | Any CRITICAL finding | REVISE | | 3+ HIGH findings | REVISE | | Any HIGH (fewer than 3) | PASS_WITH_NOTES | | Only MEDIUM/LOW | PASS_WITH_NOTES | | No findings (or only INFO) | PASS |
Escalation: If cross-artifact relationship checks found HIGH or CRITICAL issues, escalate verdict by one level (PASS → PASS_WITH_NOTES, PASS_WITH_NOTES → REVISE). Note: relationship checks are not yet ported; this rule is reserved for future use.
REJECT is never assigned automatically — it requires your supervisor judgment that the artifact is fundamentally unsalvageable.
Present results to the user in this format:
# Quorum Verdict: {VERDICT}
**Target:** {TARGET}
**Rubric:** {RUBRIC_NAME} v{RUBRIC_VERSION}
**Depth:** {DEPTH}
**Critics:** {N} dispatched, {N} returned, {N} degraded
**Timestamp:** {YYYY-MM-DD HH:MM Pacific}
## Summary
- Total findings: {N} ({N} CRITICAL, {N} HIGH, {N} MEDIUM, {N} LOW, {N} INFO)
- Evidence-rejected: {N} (findings without grounding, discarded)
## Findings by Severity
### CRITICAL
{findings, grouped by critic}
### HIGH
{findings, grouped by critic}
### MEDIUM
{findings, grouped by critic}
### LOW / INFO
{findings, grouped by critic}
## Pre-Screen Results
{PS-001 through PS-010 status}
Skip if depth = quick.
After verdict, update learning/known_issues.json:
frequency, update last_seenfrequency: 1frequency >= 3: set mandatory: truePattern schema:
{
"id": "KI-001",
"description": "What this pattern catches",
"criterion": "Rubric criterion ID it relates to",
"frequency": 1,
"mandatory": false,
"first_seen": "2026-03-24",
"last_seen": "2026-03-24",
"detection": "deterministic | llm_judgment"
}
| Failure | Action | |---------|--------| | Target not found | Abort: "❌ Target file not found: {path}" | | Rubric not found | Abort: "❌ Rubric not found: {name}. Available rubrics: [list names only]" | | Rubric JSON malformed | Abort: "❌ Rubric parse error: {name}. Verify JSON syntax." | | Critic returns invalid JSON | Treat as critic failure (DEGRADED). Log warning, proceed with remaining critics. | | Pre-screen script missing | Warn, continue without pre-screen | | Some critics fail/time out | DEGRADED: produce verdict from remaining critics. Apply standard verdict rules to available findings. Tag output with "⚠️ DEGRADED: N of M critics returned." | | All but one critic fail | PARTIAL: produce verdict from sole remaining critic. Tag output with "⚠️ PARTIAL." Verdict reflects only that critic's coverage. | | All critics fail | Abort: "❌ QUORUM_FAILED: All critics failed" | | Finding lacks evidence | Reject finding silently, count in summary | | Known issues file corrupted | Warn, continue without learning memory |
~/.copilot/skills/quorum/
├── SKILL.md ← This file (orchestration)
├── quorum-prescreen.py ← Deterministic pre-screen (stdlib Python)
├── critics/
│ ├── correctness.yaml ← Correctness critic definition
│ ├── completeness.yaml ← Completeness critic definition
│ ├── security.yaml ← Security critic definition
│ └── code_hygiene.yaml ← Code hygiene critic definition
├── rubrics/
│ ├── python-code.json ← Python code quality rubric
│ ├── documentation.json ← Documentation quality rubric
│ ├── agent-config.json ← Agent config rubric
│ └── research-synthesis.json ← Research synthesis rubric
├── learning/
│ └── known_issues.json ← Accumulated patterns (grows over time)
└── verdict-rules.yaml ← Deterministic verdict logic
git clone https://github.com/SharedIntellect/quorum-copilot-skill
cp -r quorum-copilot-skill ~/.copilot/skills/quorum
# Done. No API keys. No SDK. No approval process.
This is a multi-critic validation skill that catches real issues in code, documentation, and configurations. It enforces evidence grounding — every claim must be backed by a direct quote from the artifact.
This is not the full Quorum reference implementation. The CLI version has additional capabilities (batch processing, fix loops, cost tracking, tester verification). This skill covers the highest-value portion: deterministic pre-screen → parallel critic dispatch → evidence-grounded findings → deterministic verdict.
Capability coverage: ~75% of reference implementation. Includes all four core evaluation critics. Not yet ported: L1/L2 verification, automated remediation, batch mode, cost tracking, structured output artifacts. These are planned for future releases.
development
# Quorum Validation Skill **Version:** 0.7.0 **Orchestrator model:** Opus (`claude-opus-4-6`) — This skill MUST be executed by an Opus-tier model. The orchestrator performs artifact classification, verdict assignment, and report generation. Do not run this skill on a lower-tier model. You are the Quorum orchestrator. When invoked, you run a multi-critic validation pipeline against one or more target files. You classify the artifact, select the matching rubric, run deterministic pre-screen chec
development
Multi-agent validation framework — 6 independent AI critics evaluate artifacts against rubrics with evidence-grounded findings.
tools
Use when work should span one or more detached tasks but still behave like one job with a single owner context. TaskFlow is the durable flow substrate under authoring layers like Lobster, ACPX, plugins, or plain code. Keep conditional logic in the caller; use TaskFlow for flow identity, child-task linkage, waiting state, revision-checked mutations, and user-facing emergence.
tools
# Lobster Lobster executes multi-step workflows with approval checkpoints. Use it when: - User wants a repeatable automation (triage, monitor, sync) - Actions need human approval before executing (send, post, delete) - Multiple tool calls should run as one deterministic operation ## When to use Lobster | User intent | Use Lobster? | | ------------------------------------------------------ | --------------------------