skills/mb-red-verify/SKILL.md
Adversarial semantic verification for one TASK-* so teams can catch solutions that pass process checks but are still wrong in substance.
- TASK-* for semantic correctness, hidden failure modes, and systemic harm.
- TASK_ID, task intent, actual change surface, tests/evidence, and only then relevant spec reconciliation.
- red-verification.md, a concise semantic-risk report, and follow-up bugs/tasks when concerns are serious.

Catch changes that are "disciplined but wrong":

- mb-verify checks acceptance criteria and evidence-backed task completion.
- mb-review reviews Memory Bank quality, planning, and discipline in fresh context.
- mb-red-verify asks: "Is this solution actually right in substance?"

Status transitions have two modes.

Scheduler mode:

- /autopilot and /autonomous own task status transitions.
- /execute returns scoped implementation handoff; it does not close tasks.
- /verify gives functional verdict/evidence; in scheduler mode it does not close/fail/block/promote.
- /red-verify gives semantic verdict for T2/T3; in scheduler mode it does not close/fail/block/promote.
- .memory-bank/tasks/TASK-*.task.json record before /mb-sync.
- /mb-sync records/reconciles already-written task state. It does not decide closure/failure/blocking/promotion and must not sync a decision that exists only in scheduler context.
- VERDICT: PASS plus SEMANTIC_VERDICT: semantic-pass before scheduler marks done.
- HUMAN_CHECKPOINT: done and ROLLBACK_RECOVERY_NOTE: present.

Manual mode:
- /execute -> /verify for one TASK.
- explicit standalone owner means either the user directly asked the current top-level agent to close the task, or the top-level agent/orchestrator explicitly runs a manual workflow for one TASK and records that it owns closure. Subagents/worker prompts do not silently become closure owners.
- /verify PASS may mark T0 / T1 status: done only when explicit closure ownership is present and completed evidence has been written to the task record verify field and the compact/full protocol required by tier.
- /verify records VERDICT: PASS, evidence, and a closure recommendation, leaves status unchanged, and tells the scheduler/owner to close.
- T2 / T3 manual closure requires /verify PASS plus /red-verify / mb-red-verify SEMANTIC_VERDICT: semantic-pass before status: done or /mb-sync; if semantic-pass is absent, leave closure pending or blocked, not done.
- semantic-concern in manual mode means do not trust the existing done state without human review / follow-up.
- mode field is used.

- mb-verify should usually run first.
- tier. Authoritative red-verification routing is only task.tier; the old risk / risk.level model is invalid.
- T2 / T3, linked SDD specs are present in task richer fields, feature spec_design_links, or spec-index.md; if absent, stop and route back to /spec-improve or /spec-auto.
- T2 / T3 require this pass before scheduler marks done.
- T2 / T3 after mb-verify PASS and before final closure / /mb-sync; T0 / T1 usually skip it unless their real scope grew beyond the recorded tier.
- T0 / T1 usually skip it unless scope has grown and the tier is updated first.

Create or update:
- .protocols/<TASK_ID>/red-verification.md

Store a concise report in:

- .tasks/<TASK_ID>/<TASK_ID>-S-RED-VERIFY-final-report-docs-01.md

If concerns are material:

- .memory-bank/bugs/BUG-<short>.md
- .task.json records indexed in .memory-bank/tasks/index.json

Use:

- ./references/shared-protocols-red-verification-template.md
- ./agents/red-verifier.md

Do not start by over-trusting the same full spec context the implementer used.
Prime in this order:
contracts/*, states/*, domains/*, runbooks/*, invariants)

This keeps the verifier from merely confirming the workflow surface.
Use mb-red-verify when:
- task.tier is T2 or T3

Usually skip it for:
Read only what you need:
- .protocols/<TASK_ID>/plan.md
- .protocols/<TASK_ID>/progress.md
- .protocols/<TASK_ID>/verification.md if it exists

Challenge the solution from multiple angles:
Then inspect the smallest sufficient spec subset:
- .memory-bank/spec-index.md and linked SDD specs for T2 / T3
- contracts/*
- domains/*
- states/*
- runbooks/*
- requirements.md
- invariants.md

If code and specs disagree, record the drift explicitly rather than silently choosing one side.
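One way to keep drift explicit is to record both sides in a small structure and render it into the report. The sketch below is only an illustration: the SpecDrift class, its field names, the output line format, and the example path contracts/orders.md are all hypothetical and are not defined by this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecDrift:
    """One disagreement between a spec and the code (hypothetical record shape)."""

    spec_path: str  # spec file that makes the claim
    spec_says: str  # behavior the spec describes
    code_does: str  # behavior observed in the actual change

    def to_report_line(self) -> str:
        # Record both sides; deliberately do not pick a winner here.
        return (
            f"- DRIFT {self.spec_path}: spec says {self.spec_says!r}, "
            f"code does {self.code_does!r} (unresolved)"
        )


drift = SpecDrift(
    spec_path="contracts/orders.md",  # hypothetical example path
    spec_says="cancel is idempotent",
    code_does="second cancel raises an error",
)
print(drift.to_report_line())
```

Keeping both claims in one line makes it harder for the report to quietly side with either the spec or the code.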
The output must be concise and high-signal. Include:
For T3, also cover critical/security/runtime/recovery concerns and confirm exact marker lines HUMAN_CHECKPOINT: done and ROLLBACK_RECOVERY_NOTE: present are present before closure.
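The marker check for T3 can be sketched as a plain line comparison. The two marker strings are taken from the text above; the function name, the choice to pass the text in as a string, and the decision to ignore surrounding whitespace are assumptions made for illustration, since the skill does not say which file holds the markers.

```python
from typing import List

# Marker lines quoted from the T3 requirement above.
REQUIRED_T3_MARKERS = (
    "HUMAN_CHECKPOINT: done",
    "ROLLBACK_RECOVERY_NOTE: present",
)


def missing_t3_markers(text: str) -> List[str]:
    """Return the required markers that do not appear as a whole line in text."""
    # Assumption: surrounding whitespace is ignored; everything else must match exactly.
    lines = {line.strip() for line in text.splitlines()}
    return [marker for marker in REQUIRED_T3_MARKERS if marker not in lines]


sample = "HUMAN_CHECKPOINT: done\nROLLBACK_RECOVERY_NOTE: pending\n"
print(missing_t3_markers(sample))
```

An empty result means both markers are present as exact lines; a near match such as "ROLLBACK_RECOVERY_NOTE: pending" is reported as missing.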
- semantic-pass: no substantive concerns found; scheduler closure-eligible when mb-verify also has PASS; manual T2 / T3 closure is eligible when mb-verify also has PASS
- semantic-concern: not proven wrong, but blocked or human-review-required; in manual mode, do not trust existing done without human review / follow-up
- semantic-fail: substantively wrong, systemically harmful, or too risky to accept; recommend or apply task status: failed according to active workflow ownership and explicit closure ownership

When invoked by /autopilot or /autonomous, mb-red-verify must not independently close the task, write done, write failed, block dependents, or promote dependents. It writes the semantic verdict and returns the recommended status/dependent action to the scheduler.
For semantic-concern, recommend blocking task/dependents, reopening from done, or leaving the task pending human review. If human review accepts the concern, record owner/reason and repeat mb-red-verify; scheduler normal done requires semantic-pass.
For semantic-fail, file or recommend a bug, recommend follow-up tasks, recommend or apply status: failed according to active workflow ownership and explicit closure ownership, and stop downstream progression through the scheduler/explicit standalone owner.
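The closure gate and verdict handling described above can be summarized in a short sketch. It assumes a hypothetical TaskEvidence summary object; the field names and the recommendation labels closure-eligible and blocked-pending-human-review are invented for illustration. The functions only compute eligibility and a recommendation. Writing the status stays with the scheduler or the explicit standalone owner.

```python
from dataclasses import dataclass
from typing import Optional

RED_VERIFY_TIERS = {"T2", "T3"}  # tiers that need a semantic verdict before done


@dataclass
class TaskEvidence:
    """Hypothetical summary of what has been recorded for one task."""

    tier: str                               # "T0".."T3", read from task.tier
    verify_verdict: Optional[str] = None    # "PASS" when mb-verify passed
    semantic_verdict: Optional[str] = None  # semantic-pass / semantic-concern / semantic-fail
    t3_markers_present: bool = False        # both T3 marker lines were found


def closure_eligible(ev: TaskEvidence) -> bool:
    """True when the recorded evidence would allow the closure owner to write done."""
    if ev.verify_verdict != "PASS":
        return False
    if ev.tier in RED_VERIFY_TIERS and ev.semantic_verdict != "semantic-pass":
        return False
    if ev.tier == "T3" and not ev.t3_markers_present:
        return False
    return True


def recommended_status(semantic_verdict: str) -> str:
    """Map a semantic verdict to a recommendation; the caller does not write it."""
    recommendations = {
        "semantic-pass": "closure-eligible",
        "semantic-concern": "blocked-pending-human-review",
        "semantic-fail": "failed",
    }
    return recommendations[semantic_verdict]


print(closure_eligible(TaskEvidence("T2", "PASS", "semantic-concern")))
print(recommended_status("semantic-concern"))
```

Separating eligibility from the act of writing a status mirrors the ownership rule: a red-verify pass invoked by a scheduler returns its recommendation and leaves the task record's status alone.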
- red-verification.md exists and is substance-focused.
- /verify.