SKILL.md

name:: verify
type:: skill
category:: instruction
description:: Judgement-based QA pass. Does this artifact meet its goal and serve its user? Demands excellence, not compliance. Owned by marsha; reads the spec's Fitness Rubric (designed upstream via /design-rubric).
modifies_files:: true
needs_task:: true
mode:: execution
allowed-tools:: Task,Read,Glob,Grep
version:: 2.0.0
permalink:: skills-verify

Judgement-Based Verification Guidelines

Conduct rigorous QA reviews of artifacts to ensure correctness, complete implementation, and fitness for purpose.

Step 0 — Premise Test (forced; runs BEFORE you read the diff)

Before you read a single line of the diff, judge the premise from the task + diffstat alone. Write the sharp principal's one-sentence snap reaction to the question "was this a good idea, in this shape?" verbatim, as forcing-check item 0. You cannot emit a PASS verdict without it. A bad premise is a FAIL regardless of test coverage: green tests are the expected surface of a bad premise, not a mitigant. Diffstat-first ordering is mandatory, because reading the code first is exactly what lets a clean, well-tested surface launder a bad premise.

Full definition, the verbatim prompt, the never-a-checklist hard rule, and the worked specimen live in the canonical reference: [[premise-test.md]]. (FAIL is the local rejection token here; the arch-fit lens emits 🔴 REJECT for the same call.)

Core Directives

Default posture: assume it's broken. The burden is on the artifact to prove it works — not on you to prove it doesn't.

1. Verify Evidence: Read files, run code, and inspect actual outputs directly. Do not rely on agent summaries. Cite exact file paths, line numbers, or logs.
2. Classify the Bar:
   - Mechanical Bar: Verify against Acceptance Criteria (AC). Verdict: PASS, FAIL, or REVISE.
   - Fitness / Mixed Bar: Verify against the AC and the spec's ## Fitness Rubric. If the rubric is missing on a fitness task, return REVISE and state that the fitness rubric is missing.
3. Completeness check: Apply the completeness heuristic before signing off:
   - Check freshness of inputs read.
   - Verify changes are complete across all callsites.
   - Acknowledge known limitations or constraints.
4. Project-rule check: If .agents/rules/RULES.md exists in this repo, read it before judging. Apply its rules with the same class/instance discipline as AXIOMS.md. Project-rule violations belong under Process Compliance in the report, cited by {#slug}. RULES.md is not the only standard: for a content/instruction artifact (skill, agent body, prompt, doc, spec) also identify the skill that owns the quality standard for that artifact type and verify against it, e.g. /craft for instruction, agent-definition, skill, or prompt edits. The governing standard often lives in a skill, not RULES.md.
5. Forcing Checks: Write explicit answers for each in the report before a PASS verdict:
   - Premise Test (step 0, before reading the diff): State verbatim the sharp-principal reaction from task + diffstat alone (see Step 0). A bad premise is a FAIL regardless of test coverage; you cannot reach PASS without writing it.
   - Sentinel / Empty-State Audit: Count and list empty/sentinel fields (e.g. DERIVER_MISSING, N/A, TODO). Fail if primary value-signals are missing.
   - Principal's-Eye Top-Line Read: State verbatim the most prominent headline element and verify correctness for the end-user. For "show me my X" surfaces, this means reproducing the principal's literal view (his account, host, launch-context) and confirming HIS OWN instance is present; a generic instance is a FAIL (see /design-rubric self-instance requirement).
   - Floor vs Ceiling: State verbatim: "exceptional, or merely working?" Merely working is not a PASS on fitness tasks.
6. No Anchoring/Bias:
   - If you participated in designing or iterating on this artifact, you are disqualified from reviewing it for fitness.
   - Dispatches must be neutral (do not pre-state expected verdicts).
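
The Sentinel / Empty-State Audit above can be sketched roughly as follows. This is a minimal illustration, not part of the skill: the function name `audit_sentinels`, the flat-dict record shape, and the choice to treat `None` and the empty string as sentinels are all assumptions. Only DERIVER_MISSING, N/A, and TODO come from the text.

```python
from typing import Any

# DERIVER_MISSING, N/A and TODO come from the directive text;
# treating "" (and None, below) as sentinels is an assumption.
SENTINELS = {"DERIVER_MISSING", "N/A", "TODO", ""}


def audit_sentinels(record: dict[str, Any], primary_fields: set[str]) -> dict[str, Any]:
    """Count and list sentinel fields; flag FAIL when a primary field is one."""

    def is_sentinel(value: Any) -> bool:
        if value is None:
            return True
        return isinstance(value, str) and value.strip() in SENTINELS

    empty = sorted(name for name, value in record.items() if is_sentinel(value))
    # A primary field counts as missing if it is absent or holds a sentinel.
    missing_primary = sorted(
        f for f in primary_fields if f not in record or f in empty
    )
    return {
        "count": len(empty),
        "fields": empty,
        "missing_primary": missing_primary,
        "verdict": "FAIL" if missing_primary else "OK",
    }
```

The returned count and field list map onto forcing-check item 1 in the report; which fields count as primary remains a judgement the reviewer has to make per artifact.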

Data Pipeline Verification

For any artifact with computed, aggregated, or derived output (dashboards, reports, metrics), trace source → output: confirm the source is real, populated, and fresh; independently cross-verify the values against that source; disable any fallback to prove the primary path works alone (a fallback silently masks a broken primary); and check behaviour under load. The question is not "did output appear?" but "is this the RIGHT data?" — plausible-looking output is the most dangerous kind of incorrect output.
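
As one possible shape for the source-to-output trace, the sketch below recomputes a reported total directly from source rows and checks that the source is populated and fresh. Everything named here is hypothetical: the `cross_verify` function, the `amount` column, the 24-hour freshness window, and the tolerance. It does not cover the fallback-disabling or load checks, which depend on the system under review.

```python
import math
from datetime import datetime, timedelta, timezone


def cross_verify(source_rows, reported_total, fetched_at, now,
                 max_age=timedelta(hours=24), tol=1e-9):
    """Return a list of problems; an empty list means no mismatch was found."""
    problems = []
    if not source_rows:
        problems.append("source is empty")
    if now - fetched_at > max_age:
        problems.append("source is stale")
    # Recompute independently from the source rather than trusting the output.
    recomputed = sum(row["amount"] for row in source_rows)
    if not math.isclose(recomputed, reported_total, abs_tol=tol):
        problems.append(f"reported {reported_total} != recomputed {recomputed}")
    return problems
```

An empty result only says the reported number agrees with this particular source snapshot; whether that source is the right one is still a reviewer call.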

HALT Triggers (Immediate FAIL)

Stop evaluation immediately and write a FAIL verdict if any of the following occur:

- Bad premise: a sharp principal would not have built this, or not in this shape (step-0 Premise Test failed; full definition in [[premise-test.md]]). FAIL regardless of green tests; test-passing is the expected surface of this failure, not a mitigant.
- Primary fields rendering as sentinels/placeholders.
- Headline element is wrong for the end user.
- Repeated or empty section headers.
- Placeholder text ({variable}, TODO, FIXME) in production.
- Overlapping/clipped text in rendered visual output.
- Suspiciously short output for complex operations.
- Silent error swallowing (try/except without logging).
- Test suite checking existence instead of content.
- Data that looks plausible but does not match its source.
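
For the silent-error-swallowing trigger, a rough first-pass scan of Python source might look like the sketch below. It uses the standard-library `ast` module and flags only `except` handlers whose body is nothing but `pass` or `...`. That is a narrower heuristic than "try/except without logging", so a clean result does not clear the artifact. The function names are hypothetical.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` expression statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_silent_handlers(source: str) -> list[int]:
    """Return line numbers of except handlers that do nothing at all."""
    silent = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            silent.append(node.lineno)
    return sorted(silent)
```

Each returned line number is a candidate to cite under Concrete observations after reading the surrounding code.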

Verdict Format

Output reports exactly in this format:

## Verification Report

**Bar:** [mechanical / fitness / mixed]
**Verdict:** [PASS / FAIL / REVISE]

### Concrete observations

[Observed bugs/defects, file paths, line numbers, and log excerpts]

### Forcing checks

0. **Premise test (before reading the diff):** [verbatim sharp-principal reaction from task + diffstat alone — "was this a good idea, in this shape?" A bad premise -> FAIL regardless of tests; cannot reach PASS without this line]
1. **Sentinel/empty-state audit:** [count + list of sentinels/placeholders. If primary signals absent -> FAIL]
2. **Principal's-eye top-line read:** [headline element quoted, and whether correct]
3. **Floor vs ceiling:** [verbatim "exceptional, or merely working?"]

### Process compliance

[Project-rule violations cited by `{#slug}` from `.agents/rules/RULES.md` if present, or "RULES.md absent — skipped"]

### Judgement

[Prose evaluation against AC, Red Flags, and/or Fitness Rubric dimensions]

### Recommendation

[If FAIL/REVISE: specific remediation steps and user impact]
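
The rule that a PASS verdict requires a written answer for every forcing check could be enforced mechanically along these lines. This is a sketch under assumptions: the `ForcingChecks` dataclass and both function names are hypothetical, and the check only confirms that each answer is non-empty. It says nothing about whether the answers are good, which stays a judgement call.

```python
from dataclasses import dataclass, fields


@dataclass
class ForcingChecks:
    # Field order mirrors forcing-check items 0 to 3 in the report format.
    premise_test: str = ""
    sentinel_audit: str = ""
    top_line_read: str = ""
    floor_vs_ceiling: str = ""


def unanswered(checks: ForcingChecks) -> list[str]:
    """Names of forcing checks with no written answer, in report order."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name).strip()]


def may_emit_pass(checks: ForcingChecks) -> bool:
    """A PASS is only allowed once every forcing check has an answer."""
    return not unanswered(checks)
```

A report writer could call `may_emit_pass` before filling in the Verdict line and list the output of `unanswered` when it returns False.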

Browser-Driven UI Verification

For web applications:

1. Navigate to the URL and wait for page-ready.
2. Capture screenshots at 1920×1080 resolution.
3. Save screenshots to $AOPS_SESSIONS/qa-screenshots/YYYY-MM-DD/.
4. Apply visual analysis checks for layout and legibility defects.
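
The screenshot location and resolution above can be pinned down in a small helper so every run writes to the same place. This sketch assumes nothing about which browser automation tool is used. The names `VIEWPORT` and `screenshot_dir` are hypothetical, and the optional `sessions_root` argument exists only so the helper can be exercised without the environment variable set.

```python
import os
from datetime import date
from pathlib import Path
from typing import Optional

# Resolution taken from the steps above.
VIEWPORT = {"width": 1920, "height": 1080}


def screenshot_dir(day: date, sessions_root: Optional[str] = None) -> Path:
    """Build $AOPS_SESSIONS/qa-screenshots/YYYY-MM-DD for the given day."""
    root = sessions_root if sessions_root is not None else os.environ["AOPS_SESSIONS"]
    return Path(root) / "qa-screenshots" / day.isoformat()
```

The helper only builds the path; creating the directory and capturing the image are left to whatever tool drives the browser.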
