Validate Feature

Deploy the feature locally with DEBUG logging, run security scans and behavioral tests against live services, check CI/CD status, and verify OpenSpec spec compliance. Produces a structured validation report and posts it to the PR.

Arguments

$ARGUMENTS - OpenSpec change-id (required), optionally followed by flags:

--skip-e2e or --skip-playwright — skip the Playwright E2E phase
--skip-ci — skip the CI/CD status check
--skip-security — skip the Security Scan phase
--phase <name>[,<name>] — run only specified phases (e.g., --phase smoke,security)

Valid phase names: deploy, smoke, gen-eval, security, e2e, architecture, spec, logs, ci

Prerequisites

Feature branch exists with implementation commits (default openspec/<change-id>, or the operator-mandated branch when OPENSPEC_BRANCH_OVERRIDE is set)
Docker/docker-compose installed and running (for Deploy phase)
Approved OpenSpec proposal exists at openspec/changes/<change-id>/
Run /implement-feature first if no implementation exists

Provider-Neutral Dispatch

When validation delegates checks or evidence review, treat the provider-neutral dispatch adapter as the canonical cross-provider path. Claude Code, Codex, and Gemini/Jules are first-class providers when configured; Claude-specific harness examples are adapter internals, with inline validation as the fallback.

OpenSpec Execution Preference

Use OpenSpec-generated runtime assets first, then CLI fallback:

Claude: .claude/commands/opsx/*.md or .claude/skills/openspec-*/SKILL.md
Codex: .codex/skills/openspec-*/SKILL.md
Gemini: .gemini/commands/opsx/*.toml or .gemini/skills/openspec-*/SKILL.md
Fallback: direct openspec CLI commands

Coordinator Integration (Optional)

Use docs/coordination-detection-template.md as the shared detection preamble.

Detect transport and capability flags at skill start
Execute hooks only when the matching CAN_* flag is true
If coordinator is unavailable, continue with standalone behavior

Local CLI Mutation Boundary

Validation writes reports, evidence, logs, and sometimes follow-up artifacts. In local CLI execution, validation MUST run inside the feature worktree or another managed worktree before the first write:

eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" setup "$CHANGE_ID")"
cd "$WORKTREE_PATH"
skills/.venv/bin/python skills/shared/checkout_policy.py require-mutation

Read-only CI status checks may inspect from the shared checkout, but any report or evidence file must be written in a worktree so it lands on the PR branch.

Steps

0. Detect Coordinator and Recall Memory

At skill start, run the coordination detection preamble and set:

COORDINATOR_AVAILABLE
COORDINATION_TRANSPORT (mcp|http|none)
CAN_LOCK, CAN_QUEUE_WORK, CAN_HANDOFF, CAN_MEMORY, CAN_GUARDRAILS

If CAN_MEMORY=true, recall relevant validation history:

MCP path: recall
HTTP path: "<skill-base-dir>/../coordination-bridge/scripts/coordination_bridge.py" try_recall(...)

On recall failure/unavailability, continue with validation and log informationally.

1. Determine Change ID and Configuration

# Parse change-id from argument or current branch
BRANCH=$(git branch --show-current)
CHANGE_ID=${ARGUMENTS%% --*}  # Everything before first flag
CHANGE_ID=${CHANGE_ID:-$(echo $BRANCH | sed 's/^openspec\///')}

# Detect worktree context and resolve OpenSpec path
# Note: detect auto-discovers context from the working directory;
# agent-id information is available via the worktree registry if needed.
eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" detect)"
PROJECT_ROOT="${MAIN_REPO:-$(git rev-parse --show-toplevel)}"

Parse flags from $ARGUMENTS:

--skip-e2e or --skip-playwright → set SKIP_E2E=true
--skip-ci → set SKIP_CI=true
--skip-security → set SKIP_SECURITY=true
--phase <names> → set PHASES to comma-separated list; only run those phases

If --phase is provided, only the listed phases execute. If --phase includes phases other than deploy, assume services are already running (skip deploy and teardown).

2. Verify Prerequisites

# Resolve the expected feature branch — honors registry + OPENSPEC_BRANCH_OVERRIDE
eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" resolve-branch "$CHANGE_ID")"
FEATURE_BRANCH="$BRANCH"

# Verify on feature branch
CURRENT_BRANCH="$(git branch --show-current)"
if [[ "$CURRENT_BRANCH" != "$FEATURE_BRANCH" ]]; then
  echo "ERROR: on '$CURRENT_BRANCH' but expected '$FEATURE_BRANCH' (source: $BRANCH_SOURCE)" >&2
  exit 1
fi

# Verify proposal exists
openspec show $CHANGE_ID

# Verify implementation commits exist
COMMIT_COUNT=$(git log --oneline main..HEAD | wc -l)
if [ "$COMMIT_COUNT" -eq 0 ]; then
  echo "ERROR: No implementation commits found on this branch."
  echo "Run /implement-feature $CHANGE_ID first."
  exit 1
fi

# Check Docker availability (only if Deploy phase will run)
if docker info > /dev/null 2>&1; then
  echo "Docker is available"
else
  echo "ERROR: Docker is not available. Install Docker Desktop or start the Docker daemon."
  echo "  macOS: brew install --cask docker"
  echo "  Linux: sudo systemctl start docker"
  exit 1
fi

If not on the feature branch, check out $FEATURE_BRANCH (which honors OPENSPEC_BRANCH_OVERRIDE). If no implementation commits exist, abort with guidance.

2.5. Prepare Validation Artifacts

Preferred path:

Use runtime-native verify/continue workflow (opsx:verify equivalent) for artifact guidance.

CLI fallback path:

openspec instructions validation-report --change "$CHANGE_ID"
openspec instructions architecture-impact --change "$CHANGE_ID"
openspec status --change "$CHANGE_ID"

Ensure validation-report.md and architecture-impact.md are updated in the change directory as part of this validation run.

3. Deploy Phase

Phase name: deploy Criticality: Critical (stops validation on failure)

# Find docker-compose file
COMPOSE_FILE=$(find "$PROJECT_ROOT" -maxdepth 2 -name "docker-compose.yml" | head -1)

if [ -z "$COMPOSE_FILE" ]; then
  echo "SKIP: No docker-compose.yml found. Skipping Deploy phase."
  echo "  Smoke tests will run against already-running services."
  DEPLOY_SKIPPED=true
else
  COMPOSE_DIR=$(dirname "$COMPOSE_FILE")
  LOG_FILE="/tmp/validate-feature-${CHANGE_ID}-$(date +%s).log"

  echo "Starting services with DEBUG logging..."
  echo "  Compose file: $COMPOSE_FILE"
  echo "  Log file: $LOG_FILE"

  # Start services with DEBUG logging, redirect output to log file.
  #
  # When the compose file gates the API server behind a profile (e.g. the
  # agent-coordinator's `coordinator-api` service uses `profiles: [api]` so it
  # doesn't auto-start during simple `docker compose up`), pass
  # `COMPOSE_PROFILES` so the API process IS started here — without it the
  # smoke + e2e phases get connection-refused on the API port. Multiple
  # profiles can be comma-separated (`api,langfuse`).
  AGENT_COORDINATOR_DB_PORT=${AGENT_COORDINATOR_DB_PORT:-54322} \
  AGENT_COORDINATOR_REST_PORT=${AGENT_COORDINATOR_REST_PORT:-8081} \
  AGENT_COORDINATOR_REALTIME_PORT=${AGENT_COORDINATOR_REALTIME_PORT:-4000} \
  COMPOSE_PROFILES=${COMPOSE_PROFILES:-api} \
  LOG_LEVEL=DEBUG docker-compose -f "$COMPOSE_FILE" up -d --build 2>&1 | tee "$LOG_FILE"

  # Wait for health checks
  echo "Waiting for services to be healthy..."
  docker-compose -f "$COMPOSE_FILE" ps

  # Wait for PostgreSQL health check (up to 30 seconds)
  for i in $(seq 1 30); do
    if docker-compose -f "$COMPOSE_FILE" exec -T postgres pg_isready -U postgres > /dev/null 2>&1; then
      echo "PostgreSQL is ready"
      break
    fi
    sleep 1
  done

  # Wait for REST API health endpoint (up to 30 seconds — the API container
  # may need build time on first run + warmup before /health flips to 200)
  for i in $(seq 1 30); do
    if curl -sf http://localhost:${AGENT_COORDINATOR_REST_PORT:-8081}/health > /dev/null 2>&1; then
      echo "REST API is ready"
      break
    fi
    sleep 1
  done

  # Collect running container logs in background
  docker-compose -f "$COMPOSE_FILE" logs -f >> "$LOG_FILE" 2>&1 &
  LOG_PID=$!

  DEPLOY_RESULT="pass"
fi

If Deploy fails, report the failure with Docker logs and skip to Teardown.

4. Smoke Phase

Phase name: smoke Criticality: Critical (stops validation on failure)

Run the reusable pytest smoke test suite against the live services. The suite is configurable via environment variables so it works with any deployed HTTP API.

# Configure for the target API (adjust per project)
export API_BASE_URL="${API_BASE_URL:-http://localhost:8000}"
export API_HEALTH_ENDPOINT="${API_HEALTH_ENDPOINT:-/health}"
export API_READY_ENDPOINT="${API_READY_ENDPOINT:-/ready}"
export API_AUTH_HEADER="${API_AUTH_HEADER:-X-Admin-Key}"
export API_AUTH_VALUE="${API_AUTH_VALUE:-$ADMIN_API_KEY}"
export API_PROTECTED_ENDPOINT="${API_PROTECTED_ENDPOINT:-/api/v1/settings/prompts}"
export API_CORS_ORIGIN="${API_CORS_ORIGIN:-http://localhost:5173}"

# Run smoke tests
SKILL_DIR="$(git rev-parse --show-toplevel)/skills/validate-feature"
pytest "$SKILL_DIR/scripts/smoke_tests/" -v --tb=short 2>&1
SMOKE_EXIT=$?

if [ $SMOKE_EXIT -eq 0 ]; then
  SMOKE_RESULT="pass"
elif [ $SMOKE_EXIT -eq 5 ]; then
  # Exit code 5 = no tests collected (services not running, all skipped)
  SMOKE_RESULT="skip"
  echo "SKIP: Services not running — smoke tests auto-skipped"
else
  SMOKE_RESULT="fail"
  SMOKE_FAILED=true
fi

The smoke tests cover:

Health: Health and readiness endpoints respond with 2xx
Auth enforcement: No credentials → 401/403, valid credentials → 2xx, garbage credentials rejected
CORS: Preflight returns correct Access-Control-* headers (skipped if CORS not configured)
Error sanitization: Error responses don't leak filesystem paths, stack traces, internal IPs, or credentials
Security headers: Content-Type set correctly, Server header not overly detailed, no X-Powered-By

If Smoke fails (SMOKE_EXIT != 0 and != 5), stop validation and skip to Teardown.

4b. Gen-Eval Phase (Optional)

Phase name: gen-eval Criticality: Non-critical (continues on failure)

Run generator-evaluator testing when interface descriptors exist for the project. This phase auto-detects descriptor files and selects between two modes:

cli-augmented mode when both an interface descriptor AND an OpenSpec change directory at openspec/changes/<change-id>/specs/ exist. Gen-eval is invoked with --mode cli-augmented --openspec-change <change-id> so the generator seeds scenarios from the change's WHEN/THEN spec blocks.
template-only mode (existing fallback) when descriptors exist but no OpenSpec change directory. Requires no CLI or SDK dependencies.
Skipped when no descriptors are found.

# Auto-detect gen-eval descriptors
GENEVAL_DESCRIPTORS=$(find "$PROJECT_ROOT" -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null)

if [ -z "$GENEVAL_DESCRIPTORS" ]; then
  echo "SKIP: No gen-eval descriptors found. Skipping gen-eval phase."
  GENEVAL_RESULT="skip"
else
  # Mode selection: cli-augmented requires both descriptor AND OpenSpec change dir
  GENEVAL_CHANGE_DIR="$PROJECT_ROOT/openspec/changes/$CHANGE_ID/specs"
  if [ -d "$GENEVAL_CHANGE_DIR" ]; then
    GENEVAL_MODE_FLAGS="--mode cli-augmented --openspec-change $CHANGE_ID"
    GENEVAL_MODE_LABEL="mode=cli-augmented"
    echo "gen-eval: $GENEVAL_MODE_LABEL (descriptor + OpenSpec change present at $GENEVAL_CHANGE_DIR)"
  else
    GENEVAL_MODE_FLAGS="--mode template-only --no-services"
    GENEVAL_MODE_LABEL="mode=template-only"
    echo "gen-eval: $GENEVAL_MODE_LABEL (no OpenSpec change at openspec/changes/$CHANGE_ID/specs/, falling back to template-only)"
  fi

  echo "Running gen-eval testing ($GENEVAL_MODE_LABEL)..."
  GENEVAL_FAILED=false

  for DESCRIPTOR in $GENEVAL_DESCRIPTORS; do
    echo "  Descriptor: $DESCRIPTOR"
    # Resolve the module root (parent of evaluation/) and cd into it
    GENEVAL_MODULE_ROOT=$(dirname "$(dirname "$(dirname "$(dirname "$DESCRIPTOR")")")")
    GENEVAL_PYTHON="$GENEVAL_MODULE_ROOT/.venv/bin/python"
    if [ ! -f "$GENEVAL_PYTHON" ]; then GENEVAL_PYTHON="python3"; fi
    (cd "$GENEVAL_MODULE_ROOT" && "$GENEVAL_PYTHON" -m evaluation.gen_eval \
      --descriptor "$DESCRIPTOR" \
      $GENEVAL_MODE_FLAGS \
      --report-format both \
      --output-dir "$PROJECT_ROOT/openspec/changes/$CHANGE_ID" 2>&1)
    GENEVAL_EXIT=$?

    if [ $GENEVAL_EXIT -ne 0 ]; then
      GENEVAL_FAILED=true
      echo "  gen-eval: FAIL for $DESCRIPTOR (exit $GENEVAL_EXIT)"
    else
      echo "  gen-eval: PASS for $DESCRIPTOR"
    fi
  done

  if [ "$GENEVAL_FAILED" = true ]; then
    GENEVAL_RESULT="fail"
    echo "Gen-eval: FAIL — One or more descriptors had failures (non-blocking)"
  else
    GENEVAL_RESULT="pass"
    echo "Gen-eval: PASS — All descriptors passed"
  fi
fi

Gen-eval failures are non-critical and do not block validation. Results are included in the validation report for informational purposes. cli-augmented mode failures (e.g., from prompt-injection attempts caught by the parser) still degrade gracefully — the validate-feature pipeline continues to subsequent phases.

5. Security Phase

Phase name: security Criticality: Non-critical (continues on failure)

Run security scanners (OWASP Dependency-Check and ZAP) against the live deployment using the existing security-review orchestrator.

# Skip if --skip-security flag was provided
if [ "$SKIP_SECURITY" = true ]; then
  echo "SKIP: Security phase skipped (--skip-security flag)"
  SECURITY_RESULT="skip"
else
  echo "Running security scans against live deployment..."

  # Invoke the security-review orchestrator with the live API target
  python3 skills/security-review/scripts/main.py \
    --repo . \
    --out-dir docs/security-review \
    --zap-target "http://localhost:${AGENT_COORDINATOR_REST_PORT:-3000}" \
    --change "$CHANGE_ID" \
    --allow-degraded-pass 2>&1
  SECURITY_EXIT=$?

  if [ $SECURITY_EXIT -eq 0 ]; then
    SECURITY_RESULT="pass"
    echo "Security: PASS — No threshold findings"
  elif [ $SECURITY_EXIT -eq 10 ]; then
    SECURITY_RESULT="fail"
    echo "Security: FAIL — Threshold findings detected"
  elif [ $SECURITY_EXIT -eq 11 ]; then
    SECURITY_RESULT="degraded"
    echo "Security: INCONCLUSIVE — Scanners degraded (check prerequisites)"
  else
    SECURITY_RESULT="fail"
    echo "Security: ERROR — Unexpected exit code $SECURITY_EXIT"
  fi
fi

The Security phase reuses the /security-review skill's scripts without requiring a separate invocation. The --allow-degraded-pass flag ensures missing prerequisites (Java, container runtime) degrade gracefully instead of blocking validation.

6. E2E Phase

Phase name: e2e Criticality: Non-critical (continues on failure)

# Skip if --skip-e2e flag was provided
if [ "$SKIP_E2E" = true ]; then
  echo "SKIP: E2E phase skipped (--skip-e2e flag)"
else
  # Check if pytest-playwright is installed
  if python3 -c "import playwright" 2>/dev/null; then
    PLAYWRIGHT_AVAILABLE=true
  else
    PLAYWRIGHT_AVAILABLE=false
  fi

  # Check if E2E tests exist
  E2E_DIR=$(find "$PROJECT_ROOT" -path "*/tests/e2e" -type d | head -1)

  if [ -z "$E2E_DIR" ]; then
    echo "SKIP: No tests/e2e/ directory found. Skipping E2E phase."
  elif [ "$PLAYWRIGHT_AVAILABLE" = false ]; then
    echo "SKIP: pytest-playwright not installed. To install:"
    echo "  pip install pytest-playwright"
    echo "  playwright install chromium"
  else
    echo "Running E2E tests from $E2E_DIR..."
    pytest "$E2E_DIR" -v --tb=short 2>&1
    E2E_EXIT=$?

    if [ $E2E_EXIT -eq 0 ]; then
      E2E_RESULT="pass"
    else
      E2E_RESULT="fail"
    fi
  fi
fi

6b. Architecture Diagnostics Phase

Phase name: architecture Criticality: Non-critical (continues on failure)

Run architecture flow validation against the changed files:

# Get changed files relative to main
CHANGED_FILES=$(git diff --name-only main...HEAD | tr '\n' ',')

if [ -f "<skill-base-dir>/../validate-flows/scripts/validate_flows.py" ] && [ -f "docs/architecture-analysis/architecture.graph.json" ]; then
  echo "Running architecture validation on changed files..."
  python3 "<skill-base-dir>/../validate-flows/scripts/validate_flows.py" \
    --graph docs/architecture-analysis/architecture.graph.json \
    --output docs/architecture-analysis/architecture.diagnostics.json \
    --files "$CHANGED_FILES" 2>&1
  ARCH_EXIT=$?

  if [ $ARCH_EXIT -eq 0 ]; then
    ARCH_RESULT="pass"
    ARCH_ERRORS=$(python3 -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['errors'])" 2>/dev/null || echo 0)
    ARCH_WARNINGS=$(python3 -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['warnings'])" 2>/dev/null || echo 0)
    if [ "$ARCH_ERRORS" -gt 0 ]; then
      ARCH_RESULT="fail"
    elif [ "$ARCH_WARNINGS" -gt 0 ]; then
      ARCH_RESULT="warn"
    fi
  else
    ARCH_RESULT="fail"
  fi
else
  echo "SKIP: Architecture validation not available (missing scripts or artifacts)"
  echo "  Run 'make architecture' to generate architecture artifacts"
  ARCH_RESULT="skip"
fi

Report architecture diagnostics including broken flows, missing test coverage, orphaned code, and disconnected endpoints.

7. Spec Compliance Phase (via Change Context)

Phase name: spec Criticality: Non-critical (continues on failure) — EXCEPT the task-drift gate (7.0) which is CRITICAL within this phase.

7.0. Task Checkbox Drift Gate (CRITICAL)

Before verifying requirements against the live system, enforce that tasks.md reflects commit reality. Drift between the two is a spec-compliance failure: the plan document either overstates completeness (dangerous) or understates it (bookkeeping debt that breaks archive-time invariants).

# Run from the change's worktree or feature branch
TASKS_FILE="openspec/changes/<change-id>/tasks.md"
UNCHECKED=$(grep -cE "^\s*- \[ \]" "$TASKS_FILE" 2>/dev/null || echo 0)

# Count commits on the feature branch since divergence from main
# (excludes commits on main that the branch inherited)
COMMIT_COUNT=$(git rev-list --count main..HEAD 2>/dev/null || echo 0)

if [ "$UNCHECKED" -gt 0 ] && [ "$COMMIT_COUNT" -gt 0 ]; then
  echo "FAIL: task checkbox drift detected"
  echo "  $TASKS_FILE has $UNCHECKED unchecked boxes"
  echo "  branch has $COMMIT_COUNT commit(s) since main"
  echo ""
  echo "Either:"
  echo "  (a) complete the remaining tasks — in which case this validation run is premature"
  echo "  (b) reconcile — flip checkboxes for tasks whose code has landed (new commit, do NOT amend)"
  echo "  (c) defer — move genuinely-skipped tasks to deferred-tasks.md"
  echo ""
  echo "Do not proceed to requirement verification with drifted tasks.md."
  exit 1  # CRITICAL failure — halts the spec phase
fi

Why this is CRITICAL: Archive validation (openspec archive) checks the tasks artifact's overall status, not individual checkboxes — meaning drift can slip through archive-time and leave inaccurate history. Catching it here, before the archive path, ensures the spec phase is the single source of truth for "does the plan document match what was built?" Per the incident log, specialized-workflow-agents shipped 29 tasks' worth of implementation to main with 0/29 checkboxes flipped because the validation gate didn't catch the drift (this check was added 2026-04-22 in response).

In CI-vs-local behavior: In local validation, exit 1 halts the phase immediately. In CI-invoked validation (where halting would abort merge-gate automation unhelpfully), record the drift as a CRITICAL finding in validation-report.md under "Phase Results" with Result=fail and Details listing the specific unchecked task IDs — do not silently continue.

7.1. Requirement Traceability (per-requirement live verification)

Use the change-context.md traceability matrix as the spec compliance artifact:

Read change-context.md from the change directory ($OPENSPEC_PATH/changes/<change-id>/change-context.md).
- If it does not exist (pre-existing change implemented before this artifact was introduced), generate the skeleton now: read spec delta files from specs/, extract SHALL/MUST clauses, and create rows with Req ID, Spec Source, Description, and Test(s) derived from git diff --name-only main..HEAD.
For each row in the Requirement Traceability Matrix, verify the requirement against the live system:
- API scenarios: Make HTTP requests to the running service and verify responses
- MCP tool scenarios: Invoke MCP tools via the Python module and check results
- Database scenarios: Query PostgreSQL directly and verify state
- Configuration scenarios: Check file existence, content, or environment variables
Update the Evidence column for each row:
- pass <short-SHA> — requirement verified successfully against the live system
- fail <short-SHA> — requirement verification failed (include brief reason)
- deferred <reason> — cannot verify in this environment (e.g., requires production)
Update Coverage Summary with final counts: requirements traced, tests mapped, evidence collected, gaps, and deferred items.

Report results sourced from the updated change-context.md:

Spec Compliance Results (from change-context.md):
  ✓ skill-workflow.1: Change context artifact generated during implementation
  ✓ skill-workflow.2: 3-phase incremental generation
  ✗ skill-workflow.3: TDD enforcement — test written after implementation
  ✓ skill-workflow.4: Validation report references change-context.md

7.5. Work Package Evidence Completeness [local-parallel+]

Phase name: evidence Criticality: Non-critical (continues on failure)

This phase runs only when work-packages.yaml exists at openspec/changes/<change-id>/. It audits that all work packages produced valid results and contract compliance evidence is present.

For each work package, validate its result (if artifacts/<package-id>/work-queue-result.json exists):

skills/.venv/bin/python "<skill-base-dir>/../validate-packages/scripts/validate_work_result.py" \
  artifacts/<package-id>/result.json

Checks per package:

Result JSON validates against work-queue-result.schema.json
contracts_revision and plan_revision match work-packages.yaml
scope_check.passed is true
verification.passed is true
No unresolved escalations with disposition fix or escalate

Cross-package consistency:

No two packages report modifications to the same file
All packages used the same contracts_revision and plan_revision

If change-context.md exists, populate the Evidence column from work-queue results.

8. Log Analysis Phase

Phase name: logs Criticality: Non-critical (continues on failure)

Scan the collected log file for warning signs:

if [ -f "$LOG_FILE" ]; then
  echo "Analyzing logs: $LOG_FILE"
  echo "Log file size: $(wc -l < "$LOG_FILE") lines"

  # Count by severity
  WARNINGS=$(grep -c -i "WARNING" "$LOG_FILE" 2>/dev/null || echo 0)
  ERRORS=$(grep -c -i "ERROR" "$LOG_FILE" 2>/dev/null || echo 0)
  CRITICALS=$(grep -c -i "CRITICAL" "$LOG_FILE" 2>/dev/null || echo 0)

  # Check for specific patterns
  DEPRECATIONS=$(grep -c -i "deprecat" "$LOG_FILE" 2>/dev/null || echo 0)
  STACK_TRACES=$(grep -c "Traceback" "$LOG_FILE" 2>/dev/null || echo 0)
  UNHANDLED=$(grep -c "unhandled\|uncaught" "$LOG_FILE" 2>/dev/null || echo 0)

  echo "  Warnings: $WARNINGS"
  echo "  Errors: $ERRORS"
  echo "  Critical: $CRITICALS"
  echo "  Deprecations: $DEPRECATIONS"
  echo "  Stack traces: $STACK_TRACES"
  echo "  Unhandled exceptions: $UNHANDLED"

  # Show context for errors and critical entries
  if [ "$ERRORS" -gt 0 ] || [ "$CRITICALS" -gt 0 ]; then
    echo ""
    echo "Error/Critical entries with context:"
    grep -n -i -B2 -A2 "ERROR\|CRITICAL" "$LOG_FILE" | head -50
  fi

  # Show deprecation warnings
  if [ "$DEPRECATIONS" -gt 0 ]; then
    echo ""
    echo "Deprecation notices:"
    grep -n -i "deprecat" "$LOG_FILE" | head -20
  fi
else
  echo "SKIP: No log file available (Deploy phase was skipped or no services started)"
fi

Categorize findings by severity:

Critical: CRITICAL log entries, unhandled exceptions, stack traces
Warning: WARNING entries, deprecation notices
Info: Unusual patterns, high log volume from specific components

9. CI/CD Status Phase

Phase name: ci Criticality: Non-critical (continues on failure)

# Skip if --skip-ci flag was provided
if [ "$SKIP_CI" = true ]; then
  echo "SKIP: CI/CD check skipped (--skip-ci flag)"
else
  # Check if GitHub remote is configured
  if git remote get-url origin > /dev/null 2>&1; then
    # Check if PR exists for this branch (uses resolved FEATURE_BRANCH)
    PR_URL=$(gh pr view "$FEATURE_BRANCH" --json url --jq '.url' 2>/dev/null)

    if [ -n "$PR_URL" ]; then
      echo "PR found: $PR_URL"
      echo ""
      echo "CI/CD Check Status:"
      gh pr checks "$FEATURE_BRANCH" 2>/dev/null || echo "  No CI checks configured yet"
    else
      echo "No PR found for $FEATURE_BRANCH"
      echo "Checking latest workflow runs..."
      gh run list --branch "$FEATURE_BRANCH" --limit 3 2>/dev/null || echo "  No workflow runs found"
    fi
  else
    echo "SKIP: No GitHub remote configured"
  fi
fi

10. Teardown

Stop services and clean up:

# Only teardown if we started services (Deploy phase ran)
if [ "$DEPLOY_SKIPPED" != true ] && [ -n "$COMPOSE_FILE" ]; then
  echo "Stopping services..."

  # Stop background log collection
  if [ -n "$LOG_PID" ]; then
    kill $LOG_PID 2>/dev/null
  fi

  # Stop docker-compose services
  docker-compose -f "$COMPOSE_FILE" down

  echo "Services stopped"
fi

# Handle log file
if [ -f "$LOG_FILE" ]; then
  if [ "$ALL_PHASES_PASSED" = true ]; then
    rm "$LOG_FILE"
    echo "Log file removed (all phases passed)"
  else
    echo "Log file preserved for inspection: $LOG_FILE"
  fi
fi

11. Validation Report

Produce a structured summary of all phases:

## Validation Report: <change-id>

**Date**: YYYY-MM-DD HH:MM:SS
**Commit**: <short SHA>
**Branch**: <resolved feature branch — openspec/<change-id> by default, or operator override>

### Phase Results

✓ Deploy: Services started (N containers, DEBUG logging enabled)
✓ Smoke: All health checks passed (API, MCP, database)
○ Gen-Eval: Skipped (no descriptors found) _or_ ✓ Gen-Eval: 12/12 scenarios passed (template-only)
✓ Security: PASS — No threshold findings (dependency-check: ok, zap: ok)
✗ E2E: 3/5 tests passed, 2 failures
  - test_login_flow: TimeoutError on /api/auth
  - test_dashboard_load: Element not found: #stats-panel
✓ Architecture: No broken flows (2 warnings: orphaned functions)
✓ Spec Compliance: 8/8 requirements verified (see change-context.md)
⚠ Log Analysis: 3 warnings found
  - [WARNING] Deprecated function call: old_api_handler (line 142)
✓ CI/CD: All checks passing

### Result

**PASS** — Ready for `/cleanup-feature <change-id>`

_or_

**FAIL** — Address findings, then re-run `/validate-feature` or `/iterate-on-implementation`

Use these symbols:

✓ — Phase passed
✗ — Phase failed
⚠ — Phase passed with warnings
○ — Phase skipped

12. Persist Report

Write the validation report to the OpenSpec change directory:

REPORT_FILE="$OPENSPEC_PATH/changes/$CHANGE_ID/validation-report.md"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
COMMIT_SHA=$(git rev-parse --short HEAD)

# Write report (overwrites previous)
cat > "$REPORT_FILE" << EOF
# Validation Report: $CHANGE_ID

**Date**: $TIMESTAMP
**Commit**: $COMMIT_SHA
**Branch**: $FEATURE_BRANCH

## Phase Results

<phase results from Step 10>

## Result

<PASS or FAIL with guidance>
EOF

echo "Report written to: $REPORT_FILE"

13. PR Comment

Post the validation report as a PR comment:

PR_NUMBER=$(gh pr view "$FEATURE_BRANCH" --json number --jq '.number' 2>/dev/null)

if [ -n "$PR_NUMBER" ]; then
  gh pr comment "$PR_NUMBER" --body "$(cat <<EOF
## 🔍 Automated Validation Report

<contents of validation report from Step 10>

---
_Generated by \`/validate-feature $CHANGE_ID\` at $TIMESTAMP_
EOF
)"
  echo "Report posted to PR #$PR_NUMBER"
else
  echo "SKIP: No PR found for $FEATURE_BRANCH — report not posted"
  echo "  Create a PR first, then re-run to post the report"
fi

14. Append Session Log

Construct a PhaseRecord for the Validation phase and call write_both(). Validation phases are typically read-only; emphasize the validation outcomes (pass/fail summary, waivers, deferred issues) in the structured fields rather than free-form prose.

Capture from this validation:

Decisions — Decisions about waivers granted, phases run/skipped, or deferred issues. For clean passes, leave empty (the rendered markdown omits the Decisions section).
Open Questions — Questions raised by validation that need follow-up (e.g., "Is X behavior intentional?").
Completed Work — Phases that ran and passed (e.g., ["spec", "evidence", "deploy", "smoke"]).
Next Steps — Recommended next workflow step (cleanup, iterate-on-implementation, etc.).
Summary — 2–3 sentences: what was validated, pass/fail summary, any waivers.

Persist via PhaseRecord.write_both():

python3 - <<'EOF'
import sys
sys.path.insert(0, "skills/session-log/scripts")
from phase_record import PhaseRecord, Decision

record = PhaseRecord(
    change_id="<change-id>",
    phase_name="Validation",
    agent_type="<agent-type>",
    summary="<2-3 sentences: phases run, pass/fail summary, waivers>",
    decisions=[
        Decision(title="<title>", rationale="<rationale>"),
    ],
    completed_work=["spec", "evidence"],   # phases that passed
    next_steps=["/cleanup-feature <change-id>"],
)
result = record.write_both()
print(f"markdown_path={result.markdown_path}")
print(f"sanitized={result.sanitized}")
print(f"handoff_id={result.handoff_id or '(local fallback)'}")
print(f"handoff_local_path={result.handoff_local_path}")
for w in result.warnings:
    print(f"WARN: {w}", file=sys.stderr)
EOF

write_both() runs three best-effort steps internally: append rendered markdown → sanitize in-place → coordinator handoff (or local fallback at openspec/changes/<change-id>/handoffs/validation-<N>.json). Each step logs warnings on failure but does not raise.

Commit and push (validate-feature is read-only, so this needs a dedicated commit):

git add "openspec/changes/<change-id>/session-log.md"
git commit -m "chore: append validation session log for <change-id>"
git push

After Validation

If all phases PASS:

Ready for cleanup:
/cleanup-feature <change-id>

If phases FAIL:

Option 1: Fix findings and re-validate:
/iterate-on-implementation <change-id>
/validate-feature <change-id>

Option 2: Re-run specific failing phases:
/validate-feature <change-id> --phase smoke,spec

Option 3: Skip non-critical failures and proceed:
/cleanup-feature <change-id>

Present the validation report and let the user decide the next step.

Output

Validation report printed to console
Report persisted to openspec/changes/<change-id>/validation-report.md
Report posted as PR comment (if PR exists)
Services cleaned up (if Deploy phase ran)
Log file preserved (if failures occurred) or removed (if all passed)

If CAN_MEMORY=true, remember validation outcomes (phase pass/fail, key regressions, and next actions):

MCP path: remember
HTTP path: "<skill-base-dir>/../coordination-bridge/scripts/coordination_bridge.py" try_remember(...)

Next Step

After validation passes:

/cleanup-feature <change-id>

Validate Feature

Arguments

$ARGUMENTS - OpenSpec change-id (required), optionally followed by flags:

--skip-e2e or --skip-playwright — skip the Playwright E2E phase
--skip-ci — skip the CI/CD status check
--skip-security — skip the Security Scan phase
--phase <name>[,<name>] — run only specified phases (e.g., --phase smoke,security)

Valid phase names: deploy, smoke, gen-eval, security, e2e, architecture, spec, logs, ci

Prerequisites

Feature branch exists with implementation commits (default openspec/<change-id>, or the operator-mandated branch when OPENSPEC_BRANCH_OVERRIDE is set)
Docker/docker-compose installed and running (for Deploy phase)
Approved OpenSpec proposal exists at openspec/changes/<change-id>/
Run /implement-feature first if no implementation exists

Provider-Neutral Dispatch

OpenSpec Execution Preference

Use OpenSpec-generated runtime assets first, then CLI fallback:

Claude: .claude/commands/opsx/*.md or .claude/skills/openspec-*/SKILL.md
Codex: .codex/skills/openspec-*/SKILL.md
Gemini: .gemini/commands/opsx/*.toml or .gemini/skills/openspec-*/SKILL.md
Fallback: direct openspec CLI commands

Coordinator Integration (Optional)

Use docs/coordination-detection-template.md as the shared detection preamble.

Detect transport and capability flags at skill start
Execute hooks only when the matching CAN_* flag is true
If coordinator is unavailable, continue with standalone behavior

Local CLI Mutation Boundary

eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" setup "$CHANGE_ID")"
cd "$WORKTREE_PATH"
skills/.venv/bin/python skills/shared/checkout_policy.py require-mutation

Read-only CI status checks may inspect from the shared checkout, but any report or evidence file must be written in a worktree so it lands on the PR branch.

Steps

0. Detect Coordinator and Recall Memory

At skill start, run the coordination detection preamble and set:

COORDINATOR_AVAILABLE
COORDINATION_TRANSPORT (mcp|http|none)
CAN_LOCK, CAN_QUEUE_WORK, CAN_HANDOFF, CAN_MEMORY, CAN_GUARDRAILS

If CAN_MEMORY=true, recall relevant validation history:

MCP path: recall
HTTP path: "<skill-base-dir>/../coordination-bridge/scripts/coordination_bridge.py" try_recall(...)

On recall failure/unavailability, continue with validation and log informationally.

1. Determine Change ID and Configuration

# Parse change-id from argument or current branch
BRANCH=$(git branch --show-current)
CHANGE_ID=${ARGUMENTS%% --*}  # Everything before first flag
CHANGE_ID=${CHANGE_ID:-$(echo $BRANCH | sed 's/^openspec\///')}

# Detect worktree context and resolve OpenSpec path
# Note: detect auto-discovers context from the working directory;
# agent-id information is available via the worktree registry if needed.
eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" detect)"
PROJECT_ROOT="${MAIN_REPO:-$(git rev-parse --show-toplevel)}"

Parse flags from $ARGUMENTS:

--skip-e2e or --skip-playwright → set SKIP_E2E=true
--skip-ci → set SKIP_CI=true
--skip-security → set SKIP_SECURITY=true
--phase <names> → set PHASES to comma-separated list; only run those phases

If --phase is provided, only the listed phases execute. If --phase includes phases other than deploy, assume services are already running (skip deploy and teardown).

2. Verify Prerequisites

# Resolve the expected feature branch — honors registry + OPENSPEC_BRANCH_OVERRIDE
eval "$(python3 "<skill-base-dir>/../worktree/scripts/worktree.py" resolve-branch "$CHANGE_ID")"
FEATURE_BRANCH="$BRANCH"

# Verify on feature branch
CURRENT_BRANCH="$(git branch --show-current)"
if [[ "$CURRENT_BRANCH" != "$FEATURE_BRANCH" ]]; then
  echo "ERROR: on '$CURRENT_BRANCH' but expected '$FEATURE_BRANCH' (source: $BRANCH_SOURCE)" >&2
  exit 1
fi

# Verify proposal exists
openspec show $CHANGE_ID

# Verify implementation commits exist
COMMIT_COUNT=$(git log --oneline main..HEAD | wc -l)
if [ "$COMMIT_COUNT" -eq 0 ]; then
  echo "ERROR: No implementation commits found on this branch."
  echo "Run /implement-feature $CHANGE_ID first."
  exit 1
fi

# Check Docker availability (only if Deploy phase will run)
if docker info > /dev/null 2>&1; then
  echo "Docker is available"
else
  echo "ERROR: Docker is not available. Install Docker Desktop or start the Docker daemon."
  echo "  macOS: brew install --cask docker"
  echo "  Linux: sudo systemctl start docker"
  exit 1
fi

If not on the feature branch, check out $FEATURE_BRANCH (which honors OPENSPEC_BRANCH_OVERRIDE). If no implementation commits exist, abort with guidance.

2.5. Prepare Validation Artifacts

Preferred path:

Use runtime-native verify/continue workflow (opsx:verify equivalent) for artifact guidance.

CLI fallback path:

openspec instructions validation-report --change "$CHANGE_ID"
openspec instructions architecture-impact --change "$CHANGE_ID"
openspec status --change "$CHANGE_ID"

Ensure validation-report.md and architecture-impact.md are updated in the change directory as part of this validation run.

3. Deploy Phase

Phase name: deploy Criticality: Critical (stops validation on failure)

# Find docker-compose file
COMPOSE_FILE=$(find "$PROJECT_ROOT" -maxdepth 2 -name "docker-compose.yml" | head -1)

if [ -z "$COMPOSE_FILE" ]; then
  echo "SKIP: No docker-compose.yml found. Skipping Deploy phase."
  echo "  Smoke tests will run against already-running services."
  DEPLOY_SKIPPED=true
else
  COMPOSE_DIR=$(dirname "$COMPOSE_FILE")
  LOG_FILE="/tmp/validate-feature-${CHANGE_ID}-$(date +%s).log"

  echo "Starting services with DEBUG logging..."
  echo "  Compose file: $COMPOSE_FILE"
  echo "  Log file: $LOG_FILE"

  # Start services with DEBUG logging, redirect output to log file.
  #
  # When the compose file gates the API server behind a profile (e.g. the
  # agent-coordinator's `coordinator-api` service uses `profiles: [api]` so it
  # doesn't auto-start during simple `docker compose up`), pass
  # `COMPOSE_PROFILES` so the API process IS started here — without it the
  # smoke + e2e phases get connection-refused on the API port. Multiple
  # profiles can be comma-separated (`api,langfuse`).
  AGENT_COORDINATOR_DB_PORT=${AGENT_COORDINATOR_DB_PORT:-54322} \
  AGENT_COORDINATOR_REST_PORT=${AGENT_COORDINATOR_REST_PORT:-8081} \
  AGENT_COORDINATOR_REALTIME_PORT=${AGENT_COORDINATOR_REALTIME_PORT:-4000} \
  COMPOSE_PROFILES=${COMPOSE_PROFILES:-api} \
  LOG_LEVEL=DEBUG docker-compose -f "$COMPOSE_FILE" up -d --build 2>&1 | tee "$LOG_FILE"

  # Wait for health checks
  echo "Waiting for services to be healthy..."
  docker-compose -f "$COMPOSE_FILE" ps

  # Wait for PostgreSQL health check (up to 30 seconds)
  for i in $(seq 1 30); do
    if docker-compose -f "$COMPOSE_FILE" exec -T postgres pg_isready -U postgres > /dev/null 2>&1; then
      echo "PostgreSQL is ready"
      break
    fi
    sleep 1
  done

  # Wait for REST API health endpoint (up to 30 seconds — the API container
  # may need build time on first run + warmup before /health flips to 200)
  for i in $(seq 1 30); do
    if curl -sf http://localhost:${AGENT_COORDINATOR_REST_PORT:-8081}/health > /dev/null 2>&1; then
      echo "REST API is ready"
      break
    fi
    sleep 1
  done

  # Collect running container logs in background
  docker-compose -f "$COMPOSE_FILE" logs -f >> "$LOG_FILE" 2>&1 &
  LOG_PID=$!

  DEPLOY_RESULT="pass"
fi

If Deploy fails, report the failure with Docker logs and skip to Teardown.

4. Smoke Phase

Phase name: smoke Criticality: Critical (stops validation on failure)

Run the reusable pytest smoke test suite against the live services. The suite is configurable via environment variables so it works with any deployed HTTP API.

# Configure for the target API (adjust per project)
export API_BASE_URL="${API_BASE_URL:-http://localhost:8000}"
export API_HEALTH_ENDPOINT="${API_HEALTH_ENDPOINT:-/health}"
export API_READY_ENDPOINT="${API_READY_ENDPOINT:-/ready}"
export API_AUTH_HEADER="${API_AUTH_HEADER:-X-Admin-Key}"
export API_AUTH_VALUE="${API_AUTH_VALUE:-$ADMIN_API_KEY}"
export API_PROTECTED_ENDPOINT="${API_PROTECTED_ENDPOINT:-/api/v1/settings/prompts}"
export API_CORS_ORIGIN="${API_CORS_ORIGIN:-http://localhost:5173}"

# Run smoke tests
SKILL_DIR="$(git rev-parse --show-toplevel)/skills/validate-feature"
pytest "$SKILL_DIR/scripts/smoke_tests/" -v --tb=short 2>&1
SMOKE_EXIT=$?

if [ $SMOKE_EXIT -eq 0 ]; then
  SMOKE_RESULT="pass"
elif [ $SMOKE_EXIT -eq 5 ]; then
  # Exit code 5 = no tests collected (services not running, all skipped)
  SMOKE_RESULT="skip"
  echo "SKIP: Services not running — smoke tests auto-skipped"
else
  SMOKE_RESULT="fail"
  SMOKE_FAILED=true
fi

The smoke tests cover:

Health: Health and readiness endpoints respond with 2xx
Auth enforcement: No credentials → 401/403, valid credentials → 2xx, garbage credentials rejected
CORS: Preflight returns correct Access-Control-* headers (skipped if CORS not configured)
Error sanitization: Error responses don't leak filesystem paths, stack traces, internal IPs, or credentials
Security headers: Content-Type set correctly, Server header not overly detailed, no X-Powered-By

If Smoke fails (SMOKE_EXIT != 0 and != 5), stop validation and skip to Teardown.

4b. Gen-Eval Phase (Optional)

Phase name: gen-eval Criticality: Non-critical (continues on failure)

Run generator-evaluator testing when interface descriptors exist for the project. This phase auto-detects descriptor files and selects between two modes:

cli-augmented mode when both an interface descriptor AND an OpenSpec change directory at openspec/changes/<change-id>/specs/ exist. Gen-eval is invoked with --mode cli-augmented --openspec-change <change-id> so the generator seeds scenarios from the change's WHEN/THEN spec blocks.
template-only mode (existing fallback) when descriptors exist but no OpenSpec change directory. Requires no CLI or SDK dependencies.
Skipped when no descriptors are found.

# Auto-detect gen-eval descriptors
GENEVAL_DESCRIPTORS=$(find "$PROJECT_ROOT" -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null)

if [ -z "$GENEVAL_DESCRIPTORS" ]; then
  echo "SKIP: No gen-eval descriptors found. Skipping gen-eval phase."
  GENEVAL_RESULT="skip"
else
  # Mode selection: cli-augmented requires both descriptor AND OpenSpec change dir
  GENEVAL_CHANGE_DIR="$PROJECT_ROOT/openspec/changes/$CHANGE_ID/specs"
  if [ -d "$GENEVAL_CHANGE_DIR" ]; then
    GENEVAL_MODE_FLAGS="--mode cli-augmented --openspec-change $CHANGE_ID"
    GENEVAL_MODE_LABEL="mode=cli-augmented"
    echo "gen-eval: $GENEVAL_MODE_LABEL (descriptor + OpenSpec change present at $GENEVAL_CHANGE_DIR)"
  else
    GENEVAL_MODE_FLAGS="--mode template-only --no-services"
    GENEVAL_MODE_LABEL="mode=template-only"
    echo "gen-eval: $GENEVAL_MODE_LABEL (no OpenSpec change at openspec/changes/$CHANGE_ID/specs/, falling back to template-only)"
  fi

  echo "Running gen-eval testing ($GENEVAL_MODE_LABEL)..."
  GENEVAL_FAILED=false

  for DESCRIPTOR in $GENEVAL_DESCRIPTORS; do
    echo "  Descriptor: $DESCRIPTOR"
    # Resolve the module root (parent of evaluation/) and cd into it
    GENEVAL_MODULE_ROOT=$(dirname "$(dirname "$(dirname "$(dirname "$DESCRIPTOR")")")")
    GENEVAL_PYTHON="$GENEVAL_MODULE_ROOT/.venv/bin/python"
    if [ ! -f "$GENEVAL_PYTHON" ]; then GENEVAL_PYTHON="python3"; fi
    (cd "$GENEVAL_MODULE_ROOT" && "$GENEVAL_PYTHON" -m evaluation.gen_eval \
      --descriptor "$DESCRIPTOR" \
      $GENEVAL_MODE_FLAGS \
      --report-format both \
      --output-dir "$PROJECT_ROOT/openspec/changes/$CHANGE_ID" 2>&1)
    GENEVAL_EXIT=$?

    if [ $GENEVAL_EXIT -ne 0 ]; then
      GENEVAL_FAILED=true
      echo "  gen-eval: FAIL for $DESCRIPTOR (exit $GENEVAL_EXIT)"
    else
      echo "  gen-eval: PASS for $DESCRIPTOR"
    fi
  done

  if [ "$GENEVAL_FAILED" = true ]; then
    GENEVAL_RESULT="fail"
    echo "Gen-eval: FAIL — One or more descriptors had failures (non-blocking)"
  else
    GENEVAL_RESULT="pass"
    echo "Gen-eval: PASS — All descriptors passed"
  fi
fi

5. Security Phase

Phase name: security Criticality: Non-critical (continues on failure)

Run security scanners (OWASP Dependency-Check and ZAP) against the live deployment using the existing security-review orchestrator.

# Skip if --skip-security flag was provided
if [ "$SKIP_SECURITY" = true ]; then
  echo "SKIP: Security phase skipped (--skip-security flag)"
  SECURITY_RESULT="skip"
else
  echo "Running security scans against live deployment..."

  # Invoke the security-review orchestrator with the live API target
  python3 skills/security-review/scripts/main.py \
    --repo . \
    --out-dir docs/security-review \
    --zap-target "http://localhost:${AGENT_COORDINATOR_REST_PORT:-3000}" \
    --change "$CHANGE_ID" \
    --allow-degraded-pass 2>&1
  SECURITY_EXIT=$?

  if [ $SECURITY_EXIT -eq 0 ]; then
    SECURITY_RESULT="pass"
    echo "Security: PASS — No threshold findings"
  elif [ $SECURITY_EXIT -eq 10 ]; then
    SECURITY_RESULT="fail"
    echo "Security: FAIL — Threshold findings detected"
  elif [ $SECURITY_EXIT -eq 11 ]; then
    SECURITY_RESULT="degraded"
    echo "Security: INCONCLUSIVE — Scanners degraded (check prerequisites)"
  else
    SECURITY_RESULT="fail"
    echo "Security: ERROR — Unexpected exit code $SECURITY_EXIT"
  fi
fi

6. E2E Phase

Phase name: e2e Criticality: Non-critical (continues on failure)

# Skip if --skip-e2e flag was provided
if [ "$SKIP_E2E" = true ]; then
  echo "SKIP: E2E phase skipped (--skip-e2e flag)"
else
  # Check if pytest-playwright is installed
  if python3 -c "import playwright" 2>/dev/null; then
    PLAYWRIGHT_AVAILABLE=true
  else
    PLAYWRIGHT_AVAILABLE=false
  fi

  # Check if E2E tests exist
  E2E_DIR=$(find "$PROJECT_ROOT" -path "*/tests/e2e" -type d | head -1)

  if [ -z "$E2E_DIR" ]; then
    echo "SKIP: No tests/e2e/ directory found. Skipping E2E phase."
  elif [ "$PLAYWRIGHT_AVAILABLE" = false ]; then
    echo "SKIP: pytest-playwright not installed. To install:"
    echo "  pip install pytest-playwright"
    echo "  playwright install chromium"
  else
    echo "Running E2E tests from $E2E_DIR..."
    pytest "$E2E_DIR" -v --tb=short 2>&1
    E2E_EXIT=$?

    if [ $E2E_EXIT -eq 0 ]; then
      E2E_RESULT="pass"
    else
      E2E_RESULT="fail"
    fi
  fi
fi

6b. Architecture Diagnostics Phase

Phase name: architecture Criticality: Non-critical (continues on failure)

Run architecture flow validation against the changed files:

# Get changed files relative to main
CHANGED_FILES=$(git diff --name-only main...HEAD | tr '\n' ',')

if [ -f "<skill-base-dir>/../validate-flows/scripts/validate_flows.py" ] && [ -f "docs/architecture-analysis/architecture.graph.json" ]; then
  echo "Running architecture validation on changed files..."
  python3 "<skill-base-dir>/../validate-flows/scripts/validate_flows.py" \
    --graph docs/architecture-analysis/architecture.graph.json \
    --output docs/architecture-analysis/architecture.diagnostics.json \
    --files "$CHANGED_FILES" 2>&1
  ARCH_EXIT=$?

  if [ $ARCH_EXIT -eq 0 ]; then
    ARCH_RESULT="pass"
    ARCH_ERRORS=$(python3 -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['errors'])" 2>/dev/null || echo 0)
    ARCH_WARNINGS=$(python3 -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['warnings'])" 2>/dev/null || echo 0)
    if [ "$ARCH_ERRORS" -gt 0 ]; then
      ARCH_RESULT="fail"
    elif [ "$ARCH_WARNINGS" -gt 0 ]; then
      ARCH_RESULT="warn"
    fi
  else
    ARCH_RESULT="fail"
  fi
else
  echo "SKIP: Architecture validation not available (missing scripts or artifacts)"
  echo "  Run 'make architecture' to generate architecture artifacts"
  ARCH_RESULT="skip"
fi

Report architecture diagnostics including broken flows, missing test coverage, orphaned code, and disconnected endpoints.

7. Spec Compliance Phase (via Change Context)

Phase name: spec Criticality: Non-critical (continues on failure) — EXCEPT the task-drift gate (7.0) which is CRITICAL within this phase.

7.0. Task Checkbox Drift Gate (CRITICAL)

# Run from the change's worktree or feature branch
TASKS_FILE="openspec/changes/<change-id>/tasks.md"
UNCHECKED=$(grep -cE "^\s*- \[ \]" "$TASKS_FILE" 2>/dev/null || echo 0)

# Count commits on the feature branch since divergence from main
# (excludes commits on main that the branch inherited)
COMMIT_COUNT=$(git rev-list --count main..HEAD 2>/dev/null || echo 0)

if [ "$UNCHECKED" -gt 0 ] && [ "$COMMIT_COUNT" -gt 0 ]; then
  echo "FAIL: task checkbox drift detected"
  echo "  $TASKS_FILE has $UNCHECKED unchecked boxes"
  echo "  branch has $COMMIT_COUNT commit(s) since main"
  echo ""
  echo "Either:"
  echo "  (a) complete the remaining tasks — in which case this validation run is premature"
  echo "  (b) reconcile — flip checkboxes for tasks whose code has landed (new commit, do NOT amend)"
  echo "  (c) defer — move genuinely-skipped tasks to deferred-tasks.md"
  echo ""
  echo "Do not proceed to requirement verification with drifted tasks.md."
  exit 1  # CRITICAL failure — halts the spec phase
fi

7.1. Requirement Traceability (per-requirement live verification)

Use the change-context.md traceability matrix as the spec compliance artifact:

Read change-context.md from the change directory ($OPENSPEC_PATH/changes/<change-id>/change-context.md).
- If it does not exist (pre-existing change implemented before this artifact was introduced), generate the skeleton now: read spec delta files from specs/, extract SHALL/MUST clauses, and create rows with Req ID, Spec Source, Description, and Test(s) derived from git diff --name-only main..HEAD.
For each row in the Requirement Traceability Matrix, verify the requirement against the live system:
- API scenarios: Make HTTP requests to the running service and verify responses
- MCP tool scenarios: Invoke MCP tools via the Python module and check results
- Database scenarios: Query PostgreSQL directly and verify state
- Configuration scenarios: Check file existence, content, or environment variables
Update the Evidence column for each row:
- pass <short-SHA> — requirement verified successfully against the live system
- fail <short-SHA> — requirement verification failed (include brief reason)
- deferred <reason> — cannot verify in this environment (e.g., requires production)
Update Coverage Summary with final counts: requirements traced, tests mapped, evidence collected, gaps, and deferred items.

Report results sourced from the updated change-context.md:

Spec Compliance Results (from change-context.md):
  ✓ skill-workflow.1: Change context artifact generated during implementation
  ✓ skill-workflow.2: 3-phase incremental generation
  ✗ skill-workflow.3: TDD enforcement — test written after implementation
  ✓ skill-workflow.4: Validation report references change-context.md

7.5. Work Package Evidence Completeness [local-parallel+]

Phase name: evidence Criticality: Non-critical (continues on failure)

This phase runs only when work-packages.yaml exists at openspec/changes/<change-id>/. It audits that all work packages produced valid results and contract compliance evidence is present.

For each work package, validate its result (if artifacts/<package-id>/work-queue-result.json exists):

skills/.venv/bin/python "<skill-base-dir>/../validate-packages/scripts/validate_work_result.py" \
  artifacts/<package-id>/result.json

Checks per package:

Result JSON validates against work-queue-result.schema.json
contracts_revision and plan_revision match work-packages.yaml
scope_check.passed is true
verification.passed is true
No unresolved escalations with disposition fix or escalate

Cross-package consistency:

No two packages report modifications to the same file
All packages used the same contracts_revision and plan_revision

If change-context.md exists, populate the Evidence column from work-queue results.

8. Log Analysis Phase

Phase name: logs Criticality: Non-critical (continues on failure)

Scan the collected log file for warning signs:

if [ -f "$LOG_FILE" ]; then
  echo "Analyzing logs: $LOG_FILE"
  echo "Log file size: $(wc -l < "$LOG_FILE") lines"

  # Count by severity
  WARNINGS=$(grep -c -i "WARNING" "$LOG_FILE" 2>/dev/null || echo 0)
  ERRORS=$(grep -c -i "ERROR" "$LOG_FILE" 2>/dev/null || echo 0)
  CRITICALS=$(grep -c -i "CRITICAL" "$LOG_FILE" 2>/dev/null || echo 0)

  # Check for specific patterns
  DEPRECATIONS=$(grep -c -i "deprecat" "$LOG_FILE" 2>/dev/null || echo 0)
  STACK_TRACES=$(grep -c "Traceback" "$LOG_FILE" 2>/dev/null || echo 0)
  UNHANDLED=$(grep -c "unhandled\|uncaught" "$LOG_FILE" 2>/dev/null || echo 0)

  echo "  Warnings: $WARNINGS"
  echo "  Errors: $ERRORS"
  echo "  Critical: $CRITICALS"
  echo "  Deprecations: $DEPRECATIONS"
  echo "  Stack traces: $STACK_TRACES"
  echo "  Unhandled exceptions: $UNHANDLED"

  # Show context for errors and critical entries
  if [ "$ERRORS" -gt 0 ] || [ "$CRITICALS" -gt 0 ]; then
    echo ""
    echo "Error/Critical entries with context:"
    grep -n -i -B2 -A2 "ERROR\|CRITICAL" "$LOG_FILE" | head -50
  fi

  # Show deprecation warnings
  if [ "$DEPRECATIONS" -gt 0 ]; then
    echo ""
    echo "Deprecation notices:"
    grep -n -i "deprecat" "$LOG_FILE" | head -20
  fi
else
  echo "SKIP: No log file available (Deploy phase was skipped or no services started)"
fi

Categorize findings by severity:

Critical: CRITICAL log entries, unhandled exceptions, stack traces
Warning: WARNING entries, deprecation notices
Info: Unusual patterns, high log volume from specific components

9. CI/CD Status Phase

Phase name: ci Criticality: Non-critical (continues on failure)

# Skip if --skip-ci flag was provided
if [ "$SKIP_CI" = true ]; then
  echo "SKIP: CI/CD check skipped (--skip-ci flag)"
else
  # Check if GitHub remote is configured
  if git remote get-url origin > /dev/null 2>&1; then
    # Check if PR exists for this branch (uses resolved FEATURE_BRANCH)
    PR_URL=$(gh pr view "$FEATURE_BRANCH" --json url --jq '.url' 2>/dev/null)

    if [ -n "$PR_URL" ]; then
      echo "PR found: $PR_URL"
      echo ""
      echo "CI/CD Check Status:"
      gh pr checks "$FEATURE_BRANCH" 2>/dev/null || echo "  No CI checks configured yet"
    else
      echo "No PR found for $FEATURE_BRANCH"
      echo "Checking latest workflow runs..."
      gh run list --branch "$FEATURE_BRANCH" --limit 3 2>/dev/null || echo "  No workflow runs found"
    fi
  else
    echo "SKIP: No GitHub remote configured"
  fi
fi

10. Teardown

Stop services and clean up:

# Only teardown if we started services (Deploy phase ran)
if [ "$DEPLOY_SKIPPED" != true ] && [ -n "$COMPOSE_FILE" ]; then
  echo "Stopping services..."

  # Stop background log collection
  if [ -n "$LOG_PID" ]; then
    kill $LOG_PID 2>/dev/null
  fi

  # Stop docker-compose services
  docker-compose -f "$COMPOSE_FILE" down

  echo "Services stopped"
fi

# Handle log file
if [ -f "$LOG_FILE" ]; then
  if [ "$ALL_PHASES_PASSED" = true ]; then
    rm "$LOG_FILE"
    echo "Log file removed (all phases passed)"
  else
    echo "Log file preserved for inspection: $LOG_FILE"
  fi
fi

11. Validation Report

Produce a structured summary of all phases:

## Validation Report: <change-id>

**Date**: YYYY-MM-DD HH:MM:SS
**Commit**: <short SHA>
**Branch**: <resolved feature branch — openspec/<change-id> by default, or operator override>

### Phase Results

✓ Deploy: Services started (N containers, DEBUG logging enabled)
✓ Smoke: All health checks passed (API, MCP, database)
○ Gen-Eval: Skipped (no descriptors found) _or_ ✓ Gen-Eval: 12/12 scenarios passed (template-only)
✓ Security: PASS — No threshold findings (dependency-check: ok, zap: ok)
✗ E2E: 3/5 tests passed, 2 failures
  - test_login_flow: TimeoutError on /api/auth
  - test_dashboard_load: Element not found: #stats-panel
✓ Architecture: No broken flows (2 warnings: orphaned functions)
✓ Spec Compliance: 8/8 requirements verified (see change-context.md)
⚠ Log Analysis: 3 warnings found
  - [WARNING] Deprecated function call: old_api_handler (line 142)
✓ CI/CD: All checks passing

### Result

**PASS** — Ready for `/cleanup-feature <change-id>`

_or_

**FAIL** — Address findings, then re-run `/validate-feature` or `/iterate-on-implementation`

Use these symbols:

✓ — Phase passed
✗ — Phase failed
⚠ — Phase passed with warnings
○ — Phase skipped

12. Persist Report

Write the validation report to the OpenSpec change directory:

REPORT_FILE="$OPENSPEC_PATH/changes/$CHANGE_ID/validation-report.md"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
COMMIT_SHA=$(git rev-parse --short HEAD)

# Write report (overwrites previous)
cat > "$REPORT_FILE" << EOF
# Validation Report: $CHANGE_ID

**Date**: $TIMESTAMP
**Commit**: $COMMIT_SHA
**Branch**: $FEATURE_BRANCH

## Phase Results

<phase results from Step 10>

## Result

<PASS or FAIL with guidance>
EOF

echo "Report written to: $REPORT_FILE"

13. PR Comment

Post the validation report as a PR comment:

PR_NUMBER=$(gh pr view "$FEATURE_BRANCH" --json number --jq '.number' 2>/dev/null)

if [ -n "$PR_NUMBER" ]; then
  gh pr comment "$PR_NUMBER" --body "$(cat <<EOF
## 🔍 Automated Validation Report

<contents of validation report from Step 10>

---
_Generated by \`/validate-feature $CHANGE_ID\` at $TIMESTAMP_
EOF
)"
  echo "Report posted to PR #$PR_NUMBER"
else
  echo "SKIP: No PR found for $FEATURE_BRANCH — report not posted"
  echo "  Create a PR first, then re-run to post the report"
fi

14. Append Session Log

Capture from this validation:

Decisions — Decisions about waivers granted, phases run/skipped, or deferred issues. For clean passes, leave empty (the rendered markdown omits the Decisions section).
Open Questions — Questions raised by validation that need follow-up (e.g., "Is X behavior intentional?").
Completed Work — Phases that ran and passed (e.g., ["spec", "evidence", "deploy", "smoke"]).
Next Steps — Recommended next workflow step (cleanup, iterate-on-implementation, etc.).
Summary — 2–3 sentences: what was validated, pass/fail summary, any waivers.

Persist via PhaseRecord.write_both():

python3 - <<'EOF'
import sys
sys.path.insert(0, "skills/session-log/scripts")
from phase_record import PhaseRecord, Decision

record = PhaseRecord(
    change_id="<change-id>",
    phase_name="Validation",
    agent_type="<agent-type>",
    summary="<2-3 sentences: phases run, pass/fail summary, waivers>",
    decisions=[
        Decision(title="<title>", rationale="<rationale>"),
    ],
    completed_work=["spec", "evidence"],   # phases that passed
    next_steps=["/cleanup-feature <change-id>"],
)
result = record.write_both()
print(f"markdown_path={result.markdown_path}")
print(f"sanitized={result.sanitized}")
print(f"handoff_id={result.handoff_id or '(local fallback)'}")
print(f"handoff_local_path={result.handoff_local_path}")
for w in result.warnings:
    print(f"WARN: {w}", file=sys.stderr)
EOF

Commit and push (validate-feature is read-only, so this needs a dedicated commit):

git add "openspec/changes/<change-id>/session-log.md"
git commit -m "chore: append validation session log for <change-id>"
git push

After Validation

If all phases PASS:

Ready for cleanup:
/cleanup-feature <change-id>

If phases FAIL:

Option 1: Fix findings and re-validate:
/iterate-on-implementation <change-id>
/validate-feature <change-id>

Option 2: Re-run specific failing phases:
/validate-feature <change-id> --phase smoke,spec

Option 3: Skip non-critical failures and proceed:
/cleanup-feature <change-id>

Present the validation report and let the user decide the next step.

Output

Validation report printed to console
Report persisted to openspec/changes/<change-id>/validation-report.md
Report posted as PR comment (if PR exists)
Services cleaned up (if Deploy phase ran)
Log file preserved (if failures occurred) or removed (if all passed)

If CAN_MEMORY=true, remember validation outcomes (phase pass/fail, key regressions, and next actions):

MCP path: remember
HTTP path: "<skill-base-dir>/../coordination-bridge/scripts/coordination_bridge.py" try_remember(...)

Next Step

After validation passes:

/cleanup-feature <change-id>

Adoption

jankneumann/validate-feature

$ install --global

Security Scan Results

SKILL.md

Validate Feature

Arguments

Prerequisites

Provider-Neutral Dispatch

OpenSpec Execution Preference

Coordinator Integration (Optional)

Local CLI Mutation Boundary

Steps

0. Detect Coordinator and Recall Memory

1. Determine Change ID and Configuration

2. Verify Prerequisites

2.5. Prepare Validation Artifacts

3. Deploy Phase

4. Smoke Phase

4b. Gen-Eval Phase (Optional)

5. Security Phase

6. E2E Phase

6b. Architecture Diagnostics Phase

7. Spec Compliance Phase (via Change Context)

7.0. Task Checkbox Drift Gate (CRITICAL)

7.1. Requirement Traceability (per-requirement live verification)

7.5. Work Package Evidence Completeness [local-parallel+]

8. Log Analysis Phase

9. CI/CD Status Phase

10. Teardown

11. Validation Report

12. Persist Report

13. PR Comment

14. Append Session Log

After Validation

Output

Next Step

Related Skills

jankneumann/review-artifacts

jankneumann/coordinator-task-status-renderer

jankneumann/missing-tail-block

jankneumann/missing-keys

jankneumann/validate-feature

$ install --global

Security Scan Results

SKILL.md

Validate Feature

Arguments

Prerequisites

Provider-Neutral Dispatch

OpenSpec Execution Preference

Coordinator Integration (Optional)

Local CLI Mutation Boundary

Steps

0. Detect Coordinator and Recall Memory

1. Determine Change ID and Configuration

2. Verify Prerequisites

2.5. Prepare Validation Artifacts

3. Deploy Phase

4. Smoke Phase

4b. Gen-Eval Phase (Optional)

5. Security Phase

6. E2E Phase

6b. Architecture Diagnostics Phase

7. Spec Compliance Phase (via Change Context)

7.0. Task Checkbox Drift Gate (CRITICAL)

7.1. Requirement Traceability (per-requirement live verification)

7.5. Work Package Evidence Completeness [local-parallel+]

8. Log Analysis Phase

9. CI/CD Status Phase

10. Teardown

11. Validation Report

12. Persist Report

13. PR Comment

14. Append Session Log

After Validation

Output

Next Step

Related Skills

jankneumann/review-artifacts