bcbeidel/verify-work

Name: verify-work
Author: bcbeidel

plugins/work/skills/verify-work/SKILL.md

npx skillsauth add bcbeidel/wos verify-work

Verify Work

Verify that completed work meets validation criteria — either from a plan's Validation section or from a hypothesis built from git diff, project conventions, and project docs.

Announce at start: "I'm using the verify-work skill to verify this work."

When to use

Also fires when the user phrases the request as:

"check my work"
"verify my changes"
"does this look right"
"check if done"
"are we done"
"did it work"

Trust Model

Plan files are trusted user input — the user authored the plan (or approved its generation in /work:plan-work) and the Validation section is part of that authored content. Commands listed there run with the agent's normal permissions; the user's review of the plan is the authorization for those commands. This skill does not execute arbitrary content from anywhere else.

Workflow

1. Determine Mode

Plan mode: A plan file path was provided, or the user references a plan. Read the plan file. Locate the Validation section. If it contains criteria, proceed to Step 2 (Plan Preconditions).

If the Validation section is missing or empty, stop and report: "This plan has no validation criteria. Add concrete criteria to the Validation section before running validation."

Ad-hoc mode: No plan provided or referenced. Proceed to Step 1b (Build Hypothesis).
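The mode decision can be sketched as a small function. This is a minimal illustration rather than the skill's own implementation: it assumes the plan is a Markdown file whose criteria sit under a heading named `Validation`, and the names `extract_validation_section` and `determine_mode` are hypothetical.

```python
import re
from typing import Optional

NO_CRITERIA_MESSAGE = (
    "This plan has no validation criteria. Add concrete criteria to the "
    "Validation section before running validation."
)


def extract_validation_section(plan_text: str) -> str:
    """Return the text under a 'Validation' heading, or '' if absent.

    Assumption: the plan uses Markdown headings ('#', '##', ...).
    """
    body = []
    in_section = False
    for line in plan_text.splitlines():
        heading = re.match(r"^#+\s+(.*)$", line)
        if heading:
            # Any heading ends the current section; only 'Validation' opens it.
            in_section = heading.group(1).strip().lower() == "validation"
            continue
        if in_section:
            body.append(line)
    return "\n".join(body).strip()


def determine_mode(plan_text: Optional[str]) -> str:
    """Return 'plan' or 'ad-hoc'; raise if a plan has no criteria."""
    if plan_text is None:
        return "ad-hoc"
    if not extract_validation_section(plan_text):
        raise ValueError(NO_CRITERIA_MESSAGE)
    return "plan"
```

Raising on an empty section mirrors the "stop and report" rule: a plan without criteria never falls through to ad-hoc mode.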

1b. Build Hypothesis (ad-hoc mode)

Gather signals from three sources. See adhoc-validation for the full protocol.

- Git diff: Run `git diff main...HEAD --stat`, `git diff --stat`, and `git diff --cached --stat`. Categorize changed files (source, tests, config, docs).
- Config files: Scan for project config files to discover available checks (test runners, linters, type checkers, build tools). Only propose checks for tools actually configured.
- Project docs: Read CLAUDE.md, AGENTS.md, README.md, and CONTRIBUTING.md for explicit test/lint/build commands and conventions.
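As a rough sketch of the "categorize changed files" step, the snippet below groups paths parsed from `git diff --stat` output. The suffix sets and path heuristics are assumptions chosen for illustration, and the function names are hypothetical; the full protocol lives in adhoc-validation. Renamed files and paths that git abbreviates in the stat output are not handled.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Assumed heuristics; adjust per project.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
CONFIG_SUFFIXES = {".toml", ".yaml", ".yml", ".json", ".ini", ".cfg"}


def categorize(path: str) -> str:
    """Classify one path as tests, docs, config, or source."""
    p = PurePosixPath(path)
    name = p.name.lower()
    if "tests" in p.parts or name.startswith("test_") or "_test." in name:
        return "tests"
    if p.suffix.lower() in DOC_SUFFIXES or "docs" in p.parts:
        return "docs"
    if p.suffix.lower() in CONFIG_SUFFIXES:
        return "config"
    return "source"


def parse_stat_paths(stat_output: str) -> List[str]:
    """Extract file paths from `git diff --stat` output."""
    paths = []
    for line in stat_output.splitlines():
        if "|" not in line:
            continue  # skips the trailing "N files changed" summary line
        paths.append(line.split("|", 1)[0].strip())
    return paths


def group_changes(stat_output: str) -> Dict[str, List[str]]:
    """Group changed paths by category."""
    groups = defaultdict(list)
    for path in parse_stat_paths(stat_output):
        groups[categorize(path)].append(path)
    return dict(groups)
```

The grouped result maps directly onto the "Changes detected" counts presented to the user.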

1c. Present and Confirm (ad-hoc mode)

Present the hypothesis:

Based on your changes and project setup, here's what I'd validate:

Changes detected:
- [N] source files modified ([list key files])
- [N] test files modified
- [N] doc files modified

Proposed checks:
1. [auto] `command` — description
2. [auto] `command` — description
3. [human] Description of qualitative check

Add, remove, or modify any of these? Or confirm to run.

Every proposed check must cite its signal source (git diff, config file, or project doc). Wait for user confirmation before executing.
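One way to enforce the "every check cites its signal source" rule is to make the source a required field and reject anything else at render time. The sketch below is illustrative only: `ProposedCheck`, `render_hypothesis`, and the trailing `(source: ...)` suffix are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_SOURCES = {"git diff", "config file", "project doc"}


@dataclass
class ProposedCheck:
    kind: str          # "auto" or "human"
    description: str
    source: str        # which signal justified this check
    command: str = ""  # empty for human checks


def render_hypothesis(checks: List[ProposedCheck]) -> str:
    """Render the proposed checks, refusing any without a valid source."""
    lines = ["Proposed checks:"]
    for i, check in enumerate(checks, start=1):
        if check.source not in ALLOWED_SOURCES:
            raise ValueError(f"check {i} does not cite a valid signal source")
        if check.kind == "auto":
            lines.append(
                f"{i}. [auto] `{check.command}` - {check.description} "
                f"(source: {check.source})"
            )
        else:
            lines.append(f"{i}. [human] {check.description} (source: {check.source})")
    lines.append("Add, remove, or modify any of these? Or confirm to run.")
    return "\n".join(lines)
```

Nothing here executes a command; execution still waits on the user's confirmation.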

2. Plan Preconditions (plan mode only)

Check that all task checkboxes are complete:

bash <plugin-skills-dir>/start-work/scripts/check_tasks_complete.sh <path>

The script exits 0 with "OK: all tasks complete" if all boxes are checked. It exits 1 and prints the open task lines if any remain. If tasks remain, report them and stop:

"[N] task(s) incomplete. Complete all tasks before validating."

Do not proceed with partial validation.
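For readers who want to see the shape of this precondition, here is a Python sketch of the same idea. It is not the contents of `check_tasks_complete.sh`, which are not shown here; it assumes tasks are Markdown checkboxes (`- [ ]` and `- [x]`) and scans the whole file rather than only the Tasks section.

```python
import re
from typing import List

# Assumption: an open task is a list item with an empty Markdown checkbox.
OPEN_TASK = re.compile(r"^\s*[-*]\s+\[ \]\s+.*$")


def open_tasks(plan_text: str) -> List[str]:
    """Return the lines that still contain an unchecked box."""
    return [line for line in plan_text.splitlines() if OPEN_TASK.match(line)]


def precondition_report(plan_text: str) -> str:
    """Return the OK message, or the open tasks plus the stop message."""
    remaining = open_tasks(plan_text)
    if not remaining:
        return "OK: all tasks complete"
    listing = "\n".join(remaining)
    return (
        f"{listing}\n{len(remaining)} task(s) incomplete. "
        "Complete all tasks before validating."
    )
```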

3. Classify Criteria

Tag each numbered item in the Validation section:

- Automated: item contains a runnable command in a code block
- Human: item describes an observable condition requiring judgment
- Mixed: item has both a runnable command and a judgment component
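The three tags reduce to two yes/no questions: is there a command, and is there a judgment component. A minimal sketch follows, assuming each criterion has already been parsed into a hypothetical `Criterion` record; how the command and judgment text are extracted from the plan is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    text: str
    command: Optional[str] = None   # runnable command, if any
    judgment: Optional[str] = None  # condition a human must confirm, if any


def classify(criterion: Criterion) -> str:
    """Tag a criterion as automated, human, or mixed."""
    if criterion.command and criterion.judgment:
        return "mixed"
    if criterion.command:
        return "automated"
    return "human"
```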

4. Run Automated Checks

Execute commands from automated and mixed criteria in priority order (numbered list order). Capture exit code and output per criterion.

- Exit code 0 = pass
- Non-zero exit code = fail (but read the output; see automated-validation for interpretation nuance)

For mixed criteria, run the automated part first. If it fails, mark the criterion as failed without proceeding to the human component.
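Running checks in list order while capturing both exit code and output might look like the sketch below. It uses the standard-library `subprocess.run`; the `CheckResult` and `run_checks` names are hypothetical, and commands are passed as argument lists on the assumption that shell features are not needed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    command: str
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        # Exit code 0 = pass; the output should still be read afterwards.
        return self.exit_code == 0


def run_checks(commands: List[List[str]]) -> List[CheckResult]:
    """Run each command in list order, capturing exit code and output."""
    results = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(
            CheckResult(" ".join(argv), proc.returncode, proc.stdout + proc.stderr)
        )
    return results


if __name__ == "__main__":
    demo = run_checks([[sys.executable, "-c", "print('ok')"]])
    print(demo[0].passed, demo[0].output.strip())
```

Note that `subprocess.run` raises `FileNotFoundError` when the executable does not exist. A fuller version would catch that and report the check as blocked by the environment instead of failed.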

5. Present Human Criteria

Show the full numbered list with results:

Validation Results:
1. [PASS] `python -m pytest tests/ -v` — 42 passed
2. [FAIL] `ruff check src/` — 3 errors found
3. [PENDING] All API responses use consistent error format
4. [PENDING] Documentation covers all new endpoints

Ask the user to confirm each pending (human) criterion. Default: present the full list and ask for confirmation on all pending items at once. If the user prefers one-by-one, switch to that mode.

See human-validation for presentation patterns, judgment vs. confirmation criteria, and escalation.

6. Handle Failures

If any criterion failed:

1. Report which criteria failed with evidence (command output, user rejection reason)
2. Load failure-diagnosis
3. Classify the gap type: integration gap, specification drift, or missing cross-cutting concern

Plan mode:

- Suggest 1-3 new tasks to close the gap, formatted to match the plan's existing task style
- Keep plan in executing state
- Ask user: add suggested tasks to plan, or abandon?

If the user adds tasks, insert them into the plan's Tasks section (before the Validation heading, not after it). This is critical — assess_plan.py only parses tasks under task-related headings. Tasks appended after the Validation heading will be invisible to the execution tooling. Update the plan file and save. The plan returns to active execution.
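The placement rule can be illustrated with a short helper that splices new tasks in just above the Validation heading. This is a sketch under stated assumptions: the plan is Markdown, tasks are `- [ ]` checkboxes, and the Tasks section sits immediately before Validation. If another section lies between them, the tasks would land in that section instead. `insert_tasks` is a hypothetical name.

```python
from typing import List


def insert_tasks(plan_text: str, new_tasks: List[str]) -> str:
    """Insert unchecked tasks directly before the Validation heading."""
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        is_heading = line.startswith("#")
        if is_heading and line.lstrip("#").strip().lower() == "validation":
            # Back up over blank lines so the tasks join the existing list.
            j = i
            while j > 0 and not lines[j - 1].strip():
                j -= 1
            block = [f"- [ ] {task}" for task in new_tasks]
            return "\n".join(lines[:j] + block + [""] + lines[i:]) + "\n"
    raise ValueError("plan has no Validation heading")
```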

Ad-hoc mode:

- Present specific, actionable suggestions to fix each failure
- Offer to re-run validation after fixes are applied

No plan file to update — suggestions are conversational.

7. On Success

When all criteria pass (automated + human confirmed) and none are marked uncertain:

Plan mode:

1. Update plan frontmatter: `status: completed`
2. Output structured summary:

Validation Complete — ALL PASSED

Plan: [plan name]
Criteria: [N] total ([A] automated, [H] human, [M] mixed)

Results:
1. [PASS] criterion description
2. [PASS] criterion description (human-confirmed)
...

Status updated: executing → completed

Ad-hoc mode:

Output results summary:

Validation Complete — ALL PASSED

Criteria: [N] total ([A] automated, [H] human)

Results:
1. [PASS] criterion description
2. [PASS] criterion description (human-confirmed)
...
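For the plan-mode status update in this step, a minimal sketch is shown below. It assumes the plan opens with a YAML frontmatter block delimited by `---` lines and containing a `status:` key; `mark_completed` is a hypothetical helper, not part of the skill's tooling.

```python
import re


def mark_completed(plan_text: str) -> str:
    """Move the frontmatter status from executing to completed."""
    match = re.match(r"^---\n(.*?)\n---\n", plan_text, flags=re.DOTALL)
    if not match:
        raise ValueError("plan has no frontmatter block")
    frontmatter = match.group(1)
    # Only an executing plan may be marked completed.
    if not re.search(r"^status:\s*executing\s*$", frontmatter, flags=re.MULTILINE):
        raise ValueError("plan is not in the executing state")
    updated = re.sub(
        r"^status:.*$",
        "status: completed",
        frontmatter,
        count=1,
        flags=re.MULTILINE,
    )
    return "---\n" + updated + "\n---\n" + plan_text[match.end():]
```

Refusing to touch a plan that is not in the executing state keeps the transition one-directional, matching the "executing → completed" line in the summary.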

Key Instructions

1. All tasks must be complete before validating (plan mode). The precondition check (Step 2) enforces this. Partial validation produces misleading results.
2. Run automated checks before human checks. Automated results inform human judgment. If tests fail, asking the user to judge code quality is premature.
3. Read command output, not just exit codes. A passing exit code with warning output may still indicate problems. A failing exit code from a missing environment is "blocked," not "failed."
4. Plan stays in executing on failure (plan mode). Never mark a plan as failed or completed when validation criteria fail. Add tasks to address gaps or abandon with a reason.
5. Run the criteria as given, not your own. In plan mode, run what the plan author wrote. In ad-hoc mode, run what the user confirmed. Do not invent additional criteria or skip criteria you consider redundant.
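The "read command output, not just exit codes" instruction can be made concrete with a small interpreter over exit code plus output. The marker strings below are assumptions picked for illustration (simple substring matching would, for instance, flag "0 warnings"); the skill's actual guidance is in automated-validation.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    PASS_WITH_WARNINGS = "pass (warnings)"
    FAIL = "fail"
    BLOCKED = "blocked"


# Hypothetical markers; real projects would tune these.
BLOCKED_MARKERS = (
    "command not found",
    "no such file or directory",
    "modulenotfounderror",
)
WARNING_MARKERS = ("warning",)


def interpret(exit_code: int, output: str) -> Outcome:
    """Combine exit code and output into a four-way outcome."""
    text = output.lower()
    if exit_code != 0:
        # A missing tool or environment blocks the check; it does not fail it.
        if any(marker in text for marker in BLOCKED_MARKERS):
            return Outcome.BLOCKED
        return Outcome.FAIL
    if any(marker in text for marker in WARNING_MARKERS):
        return Outcome.PASS_WITH_WARNINGS
    return Outcome.PASS
```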

Anti-Pattern Guards

- Skipping human criteria: automated-only validation misses qualitative concerns. Present human criteria even when all automated checks pass.
- Diagnosing without evidence: when reporting failures, include command output, error messages, or specific observations. "It didn't work" is not a diagnosis.
- Running quality judgment before structural checks pass: if plan structure is malformed (missing status, wrong task count), quality checks produce meaningless results. Structural preconditions (Step 2) must pass before any judgment-based criterion runs. A malformed plan is not "mostly validated."

Handoff

Chainable to: finish-work (on pass), start-work (on fail)

Verifies completed work against validation criteria. Works in two modes: with a plan (runs the plan's Validation section) or ad-hoc (builds checks from git diff, project config, and project docs). Use when the user wants to "verify the work", "validate the work", or "run checks", or after completing all tasks in a plan.

1 star

development

Updated May 8, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add bcbeidel/wos verify-work

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

- Trivy: Container and dependency vulnerability scanner. Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): Open source security scanning. Skipped (50%)
- Socket.dev: Supply chain security analysis. Skipped (50%)
- VirusTotal: Multi-engine malware detection. Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: May 9, 2026, 8:52 AM · 143.6s · 1 file scanned

SKILL.md

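name: verify-work
description: >
  Verifies completed work against validation criteria. Works in two modes: with a plan (runs the plan's Validation section) or ad-hoc (builds checks from git diff, project config, and project docs). Use when the user wants to "verify the work", "validate the work", or "run checks", or after completing all tasks in a plan.
argument-hint: [plan file path (optional)]
user-invocable: true
license: MIT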

Install manually

# Clone the repo
git clone https://github.com/bcbeidel/wos.git

# Copy into Claude Code skills folder (global)
cp -r wos/plugins/work/skills/verify-work ~/.claude/skills/

Repository: bcbeidel/wos (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT