framework_eng/skills/tool-usage/review/cross-provider-review/SKILL.md
Use for advisory second-opinion review between model families. Routes GPT/Codex primary agents to Claude/Opus review and vice versa; supports sandboxed sessions and a lifecycle of follow-up, debate, sync, status, log, stats, show, and close.
A single skill for obtaining a cross-family second opinion. The reviewer is an advisory layer, not the final authority, and must not edit the real project.
AI governance classification: advice-only. Owner: orchestrator/primary agent. HITL is required where workflow,
product, or architectural approval gates require it. Quality signal: evidence-backed findings, a clear primary-agent
position, review trace, and observable lifecycle/cleanup.
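
Adapters (routing picks the reviewer from the other model family):
- .agents/skills/cross-provider-review/scripts/claude_opus_review.py - Claude/Opus reviewer, used for GPT/Codex primary agents.
- .agents/skills/cross-provider-review/scripts/codex_review.py - Codex reviewer, used for Claude primary agents.

The skill works in two modes with different verdict semantics: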
- Advisory mode: the verdict is a second opinion; the primary agent keeps the final decision.
- Gate mode (task finalization): `verdict: PASS` is a mandatory condition for completion. The orchestrator uses this mode exactly once, at the end of the task, instead of a final advisory review.

The mode is fixed in the introductory prompt (via `--constraints` / `--review-ask`): the reviewer must know explicitly whether its verdict is blocking or advisory.
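
The routing rule (a GPT/Codex primary agent gets a Claude/Opus reviewer, and vice versa) fits in a small shell helper. This is a minimal sketch: the function name `pick_adapter` and the family labels it accepts are hypothetical; only the two script paths come from this skill.

```bash
# Hypothetical helper: map the primary agent's model family to the reviewer
# adapter from the OTHER family. Family labels are illustrative assumptions.
ADAPTER_DIR=".agents/skills/cross-provider-review/scripts"

pick_adapter() {
  case "$1" in
    gpt|codex)   echo "$ADAPTER_DIR/claude_opus_review.py" ;;  # GPT/Codex primary -> Claude/Opus reviewer
    claude|opus) echo "$ADAPTER_DIR/codex_review.py" ;;        # Claude primary -> Codex reviewer
    *)           echo "unknown primary family: $1" >&2; return 1 ;;
  esac
}
```

For example, `adapter=$(pick_adapter codex)` followed by `"$adapter" start ...` keeps the routing decision in one place.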
- references/review-prompt.md - default shape for advisory review (task artifacts and acceptance-bound reviews). It can also be used in a simplified form for free-form opinion review or idea critique, as long as the read-only and evidence boundaries are explicit.
- references/finalization-prompt.md - template for gate mode (task finalization). It has a strict structure: a bidirectional rule compliance check, goal verification with a traceability table, an anti-deception checklist, and an iterative protocol with escalation to the user after 3 rounds.

Both adapters support the same lifecycle:
- `start`: creates `.review-sandboxes/<review_id>/workspace`, materializes focused paths or the full context (by default via hardlinks: almost instant, with no extra disk usage), and launches the reviewer.
- `ask`: continues a saved session.
- `debate`: discusses one specific finding.
- `sync`: updates the sandbox from the real source paths.
- `status`: shows phase, heartbeat, pid, logs, timeout, result preview, and live progress counters.
- `log`: shows the prompt/response history.
- `stats`: shows available token/cost stats, raw event stats, and tool-call counters.
- `show`: shows review metadata, cumulative stats, and runtime state as a single JSON payload.
- `close`: closes the review and, by default, removes the sandbox; use `--keep-sandbox` only for forensic/debug purposes.

Status interpretation: a moving heartbeat means the process is alive; a stale heartbeat without stdout/stderr growth is practical evidence of a stuck state; `phase=timeout` means a single invocation exceeded the timeout.
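
The default hardlink materialization, its copy fallback, and the `--copy-mode` switch can be illustrated for a single file. This is a hedged sketch of the technique, not the adapters' actual (Python) implementation; `materialize_file` is a hypothetical name, and its third argument simply mirrors the `hardlink`/`copy` values of `--copy-mode`.

```bash
# Sketch: place one source file into the sandbox workspace.
# Prefers a hardlink (same inode, no extra data blocks) and falls back to a
# byte copy when linking fails, e.g. across devices or on unsupported filesystems.
materialize_file() {
  local src="$1" dst="$2" mode="${3:-hardlink}"
  mkdir -p "$(dirname "$dst")" || return 1
  if [ "$mode" = hardlink ] && ln "$src" "$dst" 2>/dev/null; then
    echo hardlink
  else
    cp -p "$src" "$dst" && echo copy   # explicit copy mode, or hardlink not possible
  fi
}
```

Because a hardlink shares its inode with the real file, a write through the sandbox path would change the project file; this is why the strictly read-only reviewer configuration is what makes hardlinks safe.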

Mandatory sandbox cleanup (`close`)

Level: CRITICAL / MUST. Sandbox cleanup is tied EXCLUSIVELY to an explicit `close` call. The adapters have NO automatic cleanup: no `atexit`, no signal handler, no TTL/age sweep, and no orphaned-sandbox collection at `start`. If the agent never reaches `close` (crash, yield, error/FAIL branch, escalation, forgetfulness), the `.review-sandboxes/<review_id>/` directory, together with its full-context source mirror, stays on disk indefinitely. In practice, this has already left dozens of orphaned directories that had to be deleted manually.
MUST for every start:
| Requirement | Description |
|-----------|----------|
| Start↔close pairing | Every start MUST have a paired close in the same agent session. start without a guaranteed close is forbidden |
| close on all branches | close is called on ANY termination path: PASS, FAIL, user escalation, review refusal, adapter error. Not just on the happy path |
| close in finalization | Before writing final-report.md, the agent MUST ensure that all open reviews are closed (see checkpoint below) |
| Cleanup report | The final report/context records the cleanup status of each `review_id`: `closed` or, rarely, `kept --keep-sandbox: <forensic reason>` |
| --keep-sandbox only when justified | Use ONLY for forensic/debug with an explicit written reason. Default is a normal close with deletion |
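
When review steps are driven from a single shell script, one way to pair every `start` with a `close` on all exit paths is an `EXIT` trap registered before `start`. A minimal sketch under that assumption: `with_review` is a hypothetical helper, and the adapter is called only with subcommands and flags documented in this skill (`start`, `close`, `--review-id`, `--question`).

```bash
# Hypothetical helper: with_review ADAPTER REVIEW_ID QUESTION CMD [ARGS...]
# The body runs in a subshell, so the EXIT trap fires when the helper finishes,
# whether CMD succeeds, fails, or start itself errors out.
with_review() (
  adapter=$1 review_id=$2 question=$3
  shift 3
  trap '"$adapter" close "$review_id"' EXIT     # close on every exit path
  "$adapter" start --review-id "$review_id" --question "$question" || exit 1
  "$@"                                          # review steps (ask, sync, debate...)
)
```

A trap does not run if the shell is killed with SIGKILL, so it complements the checkpoint below rather than replacing it.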
✅ CHECKPOINT before task completion (MUST run):
```bash
# 1. List all sandboxes in the project that are still open:
ls -1 .review-sandboxes/ 2>/dev/null
# 2. Close each remaining <review_id>:
<adapter-script> close <review_id>
# 3. Confirm the directory is empty (expected output: 0):
ls -1 .review-sandboxes/ 2>/dev/null | wc -l
```
If step 3 returns anything other than 0, the task is NOT considered complete from the cleanup perspective: close the
remaining reviews and only then close the task. A non-empty .review-sandboxes/ at the time of the final report is a
violation of this skill.
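
The checkpoint can also be scripted. A sketch, assuming one adapter script can close all remaining sandboxes (with mixed Claude and Codex reviews, run it per adapter or close them individually); `close_all_sandboxes` is a hypothetical name. It prints the remaining count and fails unless that count is 0.

```bash
# Sketch: close_all_sandboxes ADAPTER [SANDBOX_ROOT]
# Closes every sandbox directory still present, then re-checks (step 3).
close_all_sandboxes() {
  local adapter="$1" root="${2:-.review-sandboxes}" dir id remaining
  [ -d "$root" ] || { echo 0; return 0; }
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue                # unmatched glob: nothing to close
    id=$(basename "$dir")
    "$adapter" close "$id" || echo "close failed: $id" >&2
  done
  remaining=$(ls -1 "$root" | wc -l | tr -d ' ')
  echo "$remaining"                          # expected: 0
  [ "$remaining" -eq 0 ]
}
```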

Claude/Opus reviewer (for GPT/Codex primary agents). Start:
```bash
.agents/skills/cross-provider-review/scripts/claude_opus_review.py start \
  --full-context \
  --task "<task>" \
  --goal "<review focus>" \
  --requirements "<requirements>" \
  --constraints "Second-opinion review only. Do not implement fixes." \
  --primary-target "<file>" \
  --changed-files <file1> <file2> \
  --open-concerns "<concerns>" \
  --review-ask "Review this artifact as a second opinion. Order findings by severity." \
  --question "Perform a second-opinion review of the current work."
```
Focused/free-form:
```bash
.agents/skills/cross-provider-review/scripts/claude_opus_review.py start \
  --question "Review this idea and identify the strongest counterarguments." path/to/file.md
```

Codex reviewer (for Claude primary agents). Start:
```bash
.agents/skills/cross-provider-review/scripts/codex_review.py start \
  --full-context \
  --task "<task>" \
  --goal "<review focus>" \
  --artifact-type "<code|tests|architecture|policy|prompt>" \
  --requirements "<requirements>" \
  --constraints "Second-opinion review only. Do not implement fixes." \
  --primary-target "<file>" \
  --changed-files <file1> <file2> \
  --open-concerns "<concerns>" \
  --review-ask "Review this artifact as a second opinion. Order findings by severity." \
  --question "Perform a second-opinion review of the current work."
```
Focused/free-form:
```bash
.agents/skills/cross-provider-review/scripts/codex_review.py start \
  --question "Review this idea and identify the strongest counterarguments." path/to/file.md
```

After `start`, both adapters use the same lifecycle. In the examples below, `<adapter-script>` means the script selected by routing:
- .agents/skills/cross-provider-review/scripts/claude_opus_review.py
- .agents/skills/cross-provider-review/scripts/codex_review.py

```bash
<adapter-script> ask REVIEW_ID --question "..."
<adapter-script> debate REVIEW_ID --issue "F-01" --finding "..." --position "..."
<adapter-script> sync REVIEW_ID
<adapter-script> status REVIEW_ID
<adapter-script> log REVIEW_ID
<adapter-script> stats REVIEW_ID
<adapter-script> show REVIEW_ID
<adapter-script> close REVIEW_ID
```
Use status while a blocking review is running for a long time. Use sync after changing source artifacts and before
follow-up or delta review. Use log, stats, and show for trace/debug; these are not mandatory commands on every
happy path. Use close --keep-sandbox only for rare forensic/debug cases.
`status.runtime.progress` and `stats` include adapter-observable activity for Claude and Codex reviews:
- `raw_events`: number of JSON events from the CLI;
- `event_types`: event counters by type;
- `tool_calls_total`: total number of unique observed tool/function calls;
- `tool_calls_by_name`: counters for tool/function calls by tool name;
- `unique_tool_call_ids`: number of unique tool/function call ids, if the CLI provides ids;
- `tool_result_events`: observed tool/function result events;
- `permission_denials`: observed permission-denial events;
- `server_tool_use`: provider-reported server-side tool counters, if available.

This is runtime observability, not a replacement for reviewer conclusions. The counters help distinguish a truly active review from a process where only the heartbeat changes.
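
The heartbeat, output, and counter signals can be combined into a rough three-way classification. A sketch only: `classify_review` and its labels are hypothetical, the caller is assumed to extract the numbers from two consecutive `status` snapshots, and no specific status field names are assumed.

```bash
# Sketch: classify_review HB_AGE_SEC STALE_AFTER_SEC OUTPUT_GROWTH EVENT_GROWTH
#   OUTPUT_GROWTH: stdout+stderr bytes added between two snapshots
#   EVENT_GROWTH:  raw_events (or tool-call) counter increase between snapshots
classify_review() {
  local hb_age="$1" stale_after="$2" out_growth="$3" event_growth="$4"
  if [ "$hb_age" -gt "$stale_after" ] && [ "$out_growth" -eq 0 ]; then
    echo stuck            # stale heartbeat and no output growth
  elif [ "$out_growth" -eq 0 ] && [ "$event_growth" -eq 0 ]; then
    echo heartbeat-only   # process alive, but no observable review activity
  else
    echo active
  fi
}
```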

- `--review-id`: set a stable ID for task traceability.
- `--timeout-sec`: change the timeout of a single reviewer invocation.
- `--copy-mode {hardlink,copy}`: sandbox materialization mode. `hardlink` (default) is almost instant and uses ~0 bytes on disk; `copy` makes a full byte copy. `hardlink` automatically falls back to `copy` on cross-device or unsupported filesystems.
- `--keep-sandbox`: preserve review files on `close`, only for forensic/debug purposes.
- `--artifact-type`, `--skills`, `--reasoning-effort`.
- `--model`.

For per-artifact acceptance-bound review (advisory mode):
- Use `status` if the review takes a long time.
- Number the findings (`F-01`...) if the reviewer did not do so.
- Record the primary agent's position on each finding: `agree`, `partial`, `disagree`, `withdrawn`, or `out_of_scope`.
- `C-01`... if needed.
- Run `sync` before follow-up or delta review.
- Use `ask` for follow-up/delta review, and `debate` only for specific disputed finding IDs.
- Use `log`, `stats`, or `show` when trace/debug evidence is needed.

Finalization gate protocol: used by the orchestrator once, at the end of the task. Unlike the advisory protocol, here the reviewer has the final word.
Prerequisite: the orchestrator must assemble a complete evidence pack (see references/finalization-prompt.md, section "Input data"). If any item is missing, the reviewer responds with `verdict: FAIL` in the first round.
Steps:
1. Start the review with the template from references/finalization-prompt.md. In `--constraints`, specify: "Finalization gate mode. Verdict is blocking, not advisory. Use bidirectional rule compliance check."
2. The reviewer answers with `verdict: PASS | FAIL` and `iteration: N of 3`.
3. `verdict: PASS`: the task may be closed. Record the `review_id` in final-report.md, in the `cross_provider_review` block.
4. `verdict: FAIL`: address the findings with evidence-based fixes (diff, new stdout, clarified log). Use `ask` for the next round.
5. If `iteration: 3` and the verdict is not PASS, the reviewer issues `escalate_to_user: true` with a `dispute_summary`. The orchestrator must escalate to the user, passing the `dispute_summary` verbatim. The user's decision is final.
6. Complete the task (including `close`) only after a documented PASS verdict or a user override.

Forbidden:
- Closing the task (final-report.md + reporting "done" to the user) without `verdict: PASS` or a user override.

Sandbox isolation:
- ... `sync`. The reviewers themselves are strictly read-only (see below), so hardlinks are safe: writing through them is impossible.
- The sandbox excludes `.git`, `.venv`, `.review-sandboxes`, `node_modules`, `__pycache__`, common build outputs, and `.claude`, `.codex`, `.cursor`, `.windsurf`, `.idea`, so the reviewer does not pick up hooks, permissions, or MCP configs from the real project.
- Codex: `--sandbox read-only`; the kernel-level sandbox blocks any writes regardless of what the model wants.
- Claude: `--tools=Read,Grep,Glob,LS`, `--permission-mode plan` (plan-only mode without write/edit), and `--strict-mcp-config` (without `--mcp-config`, this means no MCP servers at all). This is a three-layer, permission-level guarantee.
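
For the finalization gate protocol, the orchestrator's reaction to one reviewer response can be sketched as follows. It assumes the response carries plain `verdict:`, `iteration: N of 3`, and `escalate_to_user:` lines as the gate steps describe; actual reviewer formatting may differ, and `gate_next_step` and its output labels are hypothetical.

```bash
# Sketch: gate_next_step RESPONSE_TEXT -> close-task | next-round | escalate | invalid
gate_next_step() {
  local response="$1" verdict iteration
  verdict=$(printf '%s\n' "$response" \
    | awk -F': *' '$1 == "verdict" { sub(/[ \t]+$/, "", $2); print $2; exit }')
  iteration=$(printf '%s\n' "$response" \
    | awk -F': *' '$1 == "iteration" { split($2, n, " "); print n[1]; exit }')
  case "$verdict" in
    PASS)
      echo close-task ;;   # record review_id in final-report.md, then close
    FAIL)
      if printf '%s\n' "$response" | grep -q '^escalate_to_user: *true' \
         || [ "${iteration:-0}" -ge 3 ]; then
        echo escalate      # pass dispute_summary to the user verbatim
      else
        echo next-round    # fix with evidence, then ask for the next round
      fi ;;
    *)
      echo invalid         # no recognizable verdict line
      return 1 ;;
  esac
}
```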