openclaw-skills/arena/SKILL.md
Specialist that orchestrates codex exec and the Antigravity CLI through two paradigms: COMPETE (generate multiple variants, compare them, select the best) and COLLABORATE (decompose a task across engines, integrate the results). Supports Solo, Team, and Quick execution modes.
npx skillsauth add seaworld008/commonly-used-high-value-skills arena
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
"Arena orchestrates external engines — through competition or collaboration, the best outcome emerges."
Orchestrator not player · Right paradigm for task · Play to engine strengths · Data-driven decisions · Cost-aware quality · Specification clarity first
Use Arena when the task needs:
Route elsewhere when the task is primarily:
Builder · Forge · Judge · Sherpa · Sentinel
| Condition | COMPETE | COLLABORATE |
|-----------|---------|-------------|
| Purpose | Compare approaches → select best | Divide work → integrate all |
| Same spec to all | Yes | No (each gets a subtask) |
| Result | Pick winner, discard rest | Merge all into unified result |
| Best for | Quality comparison, uncertain approach | Complex features, multi-part tasks |
| Engine count | 1+ (Self-Competition with 1) | 2+ |
COMPETE when: multiple valid approaches, quality comparison, high uncertainty. COLLABORATE when: independent subtasks, engine strengths match parts, all results needed.
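The COMPETE/COLLABORATE rule of thumb above can be sketched as a simple signal count. This is a minimal illustration, not Arena's actual selector: the TaskTraits field names are hypothetical labels for the listed signals, and breaking ties toward COMPETE is an assumption based on Compete Mode being the default recipe.

```python
from dataclasses import dataclass


@dataclass
class TaskTraits:
    # Hypothetical flags mirroring the signals listed above (not an Arena API).
    multiple_valid_approaches: bool = False
    needs_quality_comparison: bool = False
    high_uncertainty: bool = False
    independent_subtasks: bool = False
    engine_strengths_match_parts: bool = False
    all_results_needed: bool = False


def choose_paradigm(t: TaskTraits) -> str:
    """Pick the paradigm with more matching signals; ties go to COMPETE."""
    compete = sum([t.multiple_valid_approaches, t.needs_quality_comparison, t.high_uncertainty])
    collaborate = sum([t.independent_subtasks, t.engine_strengths_match_parts, t.all_results_needed])
    return "COLLABORATE" if collaborate > compete else "COMPETE"


print(choose_paradigm(TaskTraits(independent_subtasks=True, all_results_needed=True)))
```

In practice the signals come from reading the spec at SPEC, so a real decision would weigh them rather than count them equally.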
| Mode | COMPETE | COLLABORATE |
|------|---------|-------------|
| Solo | Sequential variant comparison | Sequential subtask execution |
| Team | Parallel variant generation | Parallel subtask execution |
| Quick | Lightweight 2-variant comparison | Lightweight 2-subtask execution |
Solo: sequential CLI runs, 2 variants or subtasks. Team: parallel execution via the Agent Teams API + git worktree, 3+ variants or subtasks. Quick: ≤ 3 files, ≤ 2 criteria, ≤ 50 lines.
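A minimal sketch of mode selection from the limits above. The function and parameter names are hypothetical, and checking Quick first (only when there are 2 or fewer variants or subtasks, since Quick is described as a 2-variant / 2-subtask mode) is an assumption.

```python
# Quick Mode limits quoted from the mode summary above.
QUICK_MAX_FILES = 3
QUICK_MAX_CRITERIA = 2
QUICK_MAX_LINES = 50


def select_mode(files: int, criteria: int, changed_lines: int, units: int) -> str:
    """Return "Quick", "Team", or "Solo".

    units is the number of variants (COMPETE) or subtasks (COLLABORATE).
    """
    small = (
        files <= QUICK_MAX_FILES
        and criteria <= QUICK_MAX_CRITERIA
        and changed_lines <= QUICK_MAX_LINES
    )
    if small and units <= 2:
        return "Quick"
    if units >= 3:
        return "Team"  # parallel via worktrees
    return "Solo"      # sequential CLI runs


print(select_mode(files=2, criteria=1, changed_lines=30, units=2))
```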
See references/engine-cli-guide.md (Solo) · references/team-mode-guide.md (Team) · references/evaluation-framework.md + references/collaborate-mode-guide.md (Quick).
Run codex review on every variant before evaluation.
Treat _common/OPUS_48_AUTHORING.md principles P3 and P5 as critical for Arena. P3: eagerly Read target engine capabilities, context limits, and prior variant history at SPEC; engine selection must be grounded in each engine's actual strengths and cost profile. P5: think step by step at the COMPETE vs COLLABORATE paradigm choice, when scoring variants on behavioral divergence, and when validating the specification before EXECUTE; the SPEC phase is the highest-leverage point for preventing failures.
P2 (recommended): produce a calibrated comparison report that preserves variant scores, divergence points, and the spec-compliance verdict. P1 (recommended): front-load the paradigm, engine roster, and decision criteria at SPEC.
Agent role boundaries → _common/BOUNDARIES.md
Run each variant or subtask on its own branch (arena/variant-{engine} / arena/task-{name}).
Use git worktree for Team Mode.
references/evaluation-framework.md.
.agents/PROJECT.md.
Base Engine Policy (2026-05): The default baseline is Codex (always) plus a Claude subagent (host) for the dual-engine path; agy is an optional add-on for tri-engine diversity when it is AVAILABLE at PREFLIGHT. agy v1.0.x has silent runtime failures (quota / OAuth / executor / subagent timeout) that make a hard dependency on it brittle, so recipes must work in Codex-only or Codex + Claude-subagent mode when agy is unavailable. See _common/MULTI_ENGINE_RECIPE.md §Base Engine Policy.
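For Team Mode, each variant gets its own arena/variant-{engine} branch checked out in its own git worktree. The sketch below only builds the git worktree add -b <branch> <path> <commit-ish> commands without running them; the ../arena-worktrees root and the function name are hypothetical choices, not Arena conventions.

```python
import shlex
from typing import Iterable, List


def worktree_commands(engines: Iterable[str], base: str = "HEAD",
                      root: str = "../arena-worktrees") -> List[List[str]]:
    """Build one `git worktree add -b <branch> <path> <base>` argv per engine."""
    return [
        ["git", "worktree", "add", "-b", f"arena/variant-{e}", f"{root}/variant-{e}", base]
        for e in engines
    ]


for argv in worktree_commands(["codex", "claude-subagent"]):
    print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps the commands safe to hand to subprocess.run without shell quoting issues.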
Engine count matrix:
| Engines AVAILABLE | Recommended path |
|-------------------|------------------|
| Codex + Claude + agy | Cross-Engine Competition with 3 engines (full diversity) |
| Codex + Claude (default baseline) | Cross-Engine Competition with 2 engines (codex variant + Claude subagent variant) OR Self-Competition with Codex (2-3 approach variants); pick per task |
| Codex only | Self-Competition (approach hints / model variants / prompt verbosity) |
| 0 engines | ABORT → notify user |
See references/engine-cli-guide.md → "Self-Competition Mode" for strategy templates.
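The engine count matrix maps directly onto a lookup over the engines found AVAILABLE at PREFLIGHT. This is a sketch; the engine identifiers are taken from the engines_used values in the _STEP_COMPLETE schema, and raising an error for combinations the matrix does not list (for example agy without Codex) is an assumption.

```python
from typing import Iterable

KNOWN_ENGINES = {"codex", "claude-subagent", "agy"}


def recommend_path(available: Iterable[str]) -> str:
    """Map the set of AVAILABLE engines to the matching matrix row."""
    engines = set(available) & KNOWN_ENGINES
    if not engines:
        return "ABORT: notify user"
    if engines == KNOWN_ENGINES:
        return "Cross-Engine Competition (3 engines)"
    if engines == {"codex", "claude-subagent"}:
        return "Cross-Engine Competition (2 engines) or Codex Self-Competition"
    if engines == {"codex"}:
        return "Self-Competition (Codex)"
    # The matrix has no row for combinations without Codex.
    raise ValueError(f"no matrix row for {sorted(engines)}")


print(recommend_path(["codex", "claude-subagent"]))
```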
SPEC → SCOPE LOCK → EXECUTE → REVIEW → EVALUATE → ADOPT → VERIFY
COMPETE: SPEC → SCOPE LOCK → EXECUTE → REVIEW → EVALUATE → [REFINE] → ADOPT → VERIFY
Validate the spec → lock allowed/forbidden files → run engines on isolated branches (Solo: sequential; Team: parallel + worktrees) → apply a quality gate per variant (scope + tests + build + codex review + criteria) → score against weighted criteria → optionally REFINE (score band 2.5–4.0, max 2 iterations) → select the winner with a rationale → verify build + tests + security.
See references/engine-cli-guide.md · references/team-mode-guide.md · references/evaluation-framework.md.
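The EVALUATE and REFINE steps can be sketched as a weighted sum plus a band check. Assumptions: scores use a 0–5 scale, the refine band 2.5–4.0 is inclusive, and the criteria names below are hypothetical; the real criteria and weights live in references/evaluation-framework.md.

```python
from typing import Dict

REFINE_LOW, REFINE_HIGH = 2.5, 4.0  # refine band quoted above
MAX_REFINE_ITERATIONS = 2


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-criterion scores; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return sum(scores[c] * w for c, w in weights.items())


def should_refine(score: float, iterations_done: int) -> bool:
    """Refine only mid-band variants, and never more than MAX_REFINE_ITERATIONS times."""
    return REFINE_LOW <= score <= REFINE_HIGH and iterations_done < MAX_REFINE_ITERATIONS


def pick_winner(variant_scores: Dict[str, float]) -> str:
    """Highest weighted score wins; the ADOPT rationale still has to be written."""
    return max(variant_scores, key=variant_scores.get)


weights = {"correctness": 0.5, "readability": 0.3, "performance": 0.2}  # hypothetical
codex = weighted_score({"correctness": 4, "readability": 3, "performance": 2}, weights)
print(round(codex, 2), should_refine(codex, iterations_done=0))
```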
| Phase | Required action | Key rule | Read |
|-------|-----------------|----------|------|
| SPEC | Validate specification completeness | Clear spec before any execution | references/engine-cli-guide.md |
| SCOPE LOCK | Lock allowed/forbidden files per variant/task | No engine writes outside scope | references/engine-cli-guide.md |
| EXECUTE | Run engines on isolated branches | Solo: sequential, Team: parallel+worktrees | references/team-mode-guide.md |
| REVIEW | Quality gate per variant (scope+test+build+review+criteria) | Every variant passes gate | references/evaluation-framework.md |
| EVALUATE | Score weighted criteria, optional refine | Evidence-based selection | references/evaluation-framework.md |
| ADOPT | Select winner with rationale | Document why | references/evaluation-framework.md |
| VERIFY | Verify build+tests+security | No regressions | references/engine-cli-guide.md |
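The SCOPE LOCK rule ("no engine writes outside scope") can be checked mechanically at REVIEW. A minimal sketch, assuming the changed-file list comes from something like git diff --name-only against the base branch and that scopes are expressed as glob patterns; the patterns and function name are hypothetical.

```python
from fnmatch import fnmatch
from typing import List, Sequence


def scope_violations(changed: Sequence[str], allowed: Sequence[str],
                     forbidden: Sequence[str]) -> List[str]:
    """Return changed paths that hit a forbidden pattern or match no allowed one.

    Note: fnmatch's "*" also matches "/", so "src/*" covers nested paths.
    """
    bad = []
    for path in changed:
        hits_forbidden = any(fnmatch(path, p) for p in forbidden)
        inside_allowed = any(fnmatch(path, p) for p in allowed)
        if hits_forbidden or not inside_allowed:
            bad.append(path)
    return bad


print(scope_violations(["src/app.py", "src/secrets.py", "README.md"],
                       allowed=["src/*"], forbidden=["src/secrets.py"]))
```

A non-empty result should fail the variant's quality gate before any scoring happens.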
COLLABORATE: SPEC → DECOMPOSE → SCOPE LOCK → EXECUTE → REVIEW → INTEGRATE → VERIFY
Validate spec → Split into non-overlapping subtasks by engine strength → Lock per-subtask scopes → Run on arena/task-{id} branches → Quality gate per subtask → Merge all in dependency order (Arena resolves conflicts) → Full verification (build+tests+codex review+interface check).
See references/collaborate-mode-guide.md.
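The INTEGRATE step merges subtasks "in dependency order", which is a topological sort over the subtask graph. This sketch uses Python's standard graphlib module (3.9+); the subtask ids are hypothetical, and the function only computes the order of arena/task-{id} branches rather than performing any merges.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def merge_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order arena/task-{id} branches so each subtask merges after its dependencies.

    deps maps a subtask id to the ids it depends on. A cycle raises
    graphlib.CycleError, which signals that the decomposition needs rework.
    """
    return [f"arena/task-{tid}" for tid in TopologicalSorter(deps).static_order()]


print(merge_order({"schema": set(), "api": {"schema"}, "ui": {"api"}}))
```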
| Recipe | Subcommand | Default? | When to Use | Read First |
|--------|-----------|---------|-------------|------------|
| Compete Mode | compete | ✓ | Multi-variant comparison (selection) | references/evaluation-framework.md |
| Collaborate Mode | collaborate | | Engine-divided integration | references/collaborate-mode-guide.md |
| Solo Mode | solo | | Single-engine execution | references/engine-cli-guide.md |
| Quick Mode | quick | | Lightweight comparison | references/evaluation-framework.md |
Parse the first token of user input.
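If it matches a subcommand in the recipe table, run that recipe; otherwise use the default recipe (compete = Compete Mode) and apply the normal SPEC → SCOPE LOCK → EXECUTE → REVIEW → EVALUATE → ADOPT → VERIFY workflow.
| Signal | Approach | Primary output | Read next |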
|--------|----------|----------------|-----------|
| compete, compare, variant, best approach | COMPETE paradigm | Winning variant + evaluation report | references/evaluation-framework.md |
| collaborate, decompose, multi-part, integrate | COLLABORATE paradigm | Integrated implementation | references/collaborate-mode-guide.md |
| quick, small change, ≤3 files | Quick mode | Lightweight comparison/integration | references/evaluation-framework.md |
| team, parallel, 3+ variants | Team mode | Parallel execution report | references/team-mode-guide.md |
| self-competition, single engine | Self-Competition | Best variant from single engine | references/engine-cli-guide.md |
| calibrate, learning, effectiveness | CALIBRATE workflow | AES report + adaptation | references/execution-learning.md |
| unclear engine orchestration request | Auto-select paradigm + mode | Implementation + evaluation | references/engine-cli-guide.md |
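First-token dispatch with compete as the default recipe (marked ✓ in the recipe table) can be sketched as follows. Case-insensitive matching and collapsing whitespace in the remaining task text are assumptions of this sketch, not documented behavior.

```python
from typing import Tuple

RECIPES = {
    "compete": "Compete Mode",
    "collaborate": "Collaborate Mode",
    "solo": "Solo Mode",
    "quick": "Quick Mode",
}
DEFAULT_RECIPE = "compete"  # marked as the default in the recipe table


def parse_recipe(user_input: str) -> Tuple[str, str]:
    """Split off a leading subcommand; fall back to the default recipe."""
    tokens = user_input.split()
    if tokens and tokens[0].lower() in RECIPES:
        return tokens[0].lower(), " ".join(tokens[1:])
    return DEFAULT_RECIPE, " ".join(tokens)


print(parse_recipe("collaborate build the login flow"))
```

Inputs without a subcommand then go through the signal table above to refine paradigm and mode.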
Every deliverable must include:
Learning from execution outcomes across sessions. Details: references/execution-learning.md
CALIBRATE: COLLECT → EVALUATE → EXTRACT → ADAPT → SAFEGUARD → RECORD
| Trigger | Condition | Scope |
|---------|-----------|-------|
| AT-01 | Session execution complete | Lightweight |
| AT-02 | Same engine+task_type fails/low-score 3+ times | Full |
| AT-03 | User overrides paradigm or engine selection | Full |
| AT-04 | Quality feedback from Judge | Medium |
| AT-05 | Lore execution pattern notification | Medium |
| AT-06 | 30+ days since last CALIBRATE review | Full |
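The trigger table can be evaluated as a set of independent checks. In this sketch the function names and keyword arguments are hypothetical, and resolving several fired triggers to the broadest scope is an assumption rather than a documented rule.

```python
from datetime import date
from typing import List, Optional

TRIGGER_SCOPE = {
    "AT-01": "Lightweight", "AT-02": "Full", "AT-03": "Full",
    "AT-04": "Medium", "AT-05": "Medium", "AT-06": "Full",
}
SCOPE_RANK = {"Lightweight": 1, "Medium": 2, "Full": 3}


def fired_triggers(*, last_review: date, today: date, session_complete: bool = False,
                   repeat_failures: int = 0, user_override: bool = False,
                   judge_feedback: bool = False, lore_notification: bool = False) -> List[str]:
    """Return the IDs of the CALIBRATE triggers whose conditions hold."""
    fired = []
    if session_complete:
        fired.append("AT-01")
    if repeat_failures >= 3:  # same engine + task_type failing or scoring low
        fired.append("AT-02")
    if user_override:
        fired.append("AT-03")
    if judge_feedback:
        fired.append("AT-04")
    if lore_notification:
        fired.append("AT-05")
    if (today - last_review).days >= 30:
        fired.append("AT-06")
    return fired


def calibrate_scope(fired: List[str]) -> Optional[str]:
    """Broadest scope among fired triggers, or None if nothing fired."""
    if not fired:
        return None
    return max((TRIGGER_SCOPE[t] for t in fired), key=SCOPE_RANK.__getitem__)
```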
AES = 0.30 × Win_Clarity + 0.25 × Engine_Fitness + 0.20 × Cost_Efficiency + 0.15 × Paradigm_Fitness + 0.10 × User_Autonomy. Safety guardrails: at most 3 parameter adaptations per session, snapshot before adapting, mandatory Lore sync, and the evaluation framework stays invariant. → references/execution-learning.md
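The AES formula and the per-session adaptation cap can be sketched as below. Assumptions: each component is already normalized to [0, 1], and the snake_case keys are hypothetical spellings of the component names. The mapping from the numeric score to the A–F grade used in _STEP_COMPLETE is not given here, so the sketch stops at the number.

```python
from typing import Dict, Sequence

AES_WEIGHTS = {
    "win_clarity": 0.30,
    "engine_fitness": 0.25,
    "cost_efficiency": 0.20,
    "paradigm_fitness": 0.15,
    "user_autonomy": 0.10,
}
MAX_ADAPTATIONS_PER_SESSION = 3


def aes(components: Dict[str, float]) -> float:
    """Weighted AES over components normalized to [0, 1]."""
    missing = AES_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing AES components: {sorted(missing)}")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    return sum(w * components[k] for k, w in AES_WEIGHTS.items())


def check_adaptation_budget(param_changes: Sequence[str]) -> None:
    """Enforce the 'at most 3 parameter adaptations per session' guardrail."""
    if len(param_changes) > MAX_ADAPTATIONS_PER_SESSION:
        raise ValueError(f"{len(param_changes)} adaptations exceed the per-session limit")
```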
Receives: Nexus (task routing, execution context), Sherpa (task decomposition), Scout (bug investigation), Spark (feature proposals), Lore (execution patterns), Judge (code quality assessment)
Sends: Nexus (execution reports, paradigm effectiveness data), Guardian (PR preparation, merge candidates), Radar (test verification), Judge (quality review requests), Sentinel (security review), Lore (engine proficiency data, paradigm patterns)
Overlap boundaries:
| Direction | Handoff | Purpose |
|-----------|---------|---------|
| Nexus → Arena | NEXUS_TO_ARENA_CONTEXT | Task routing with execution context |
| Sherpa → Arena | SHERPA_TO_ARENA_HANDOFF | Task decomposition for execution |
| Scout → Arena | SCOUT_TO_ARENA_HANDOFF | Bug investigation for fix comparison |
| Arena → Nexus | ARENA_TO_NEXUS_HANDOFF | Execution report, paradigm used |
| Arena → Guardian | ARENA_TO_GUARDIAN_HANDOFF | Winner branch for PR preparation |
| Arena → Radar | ARENA_TO_RADAR_HANDOFF | Test verification requests |
| Arena → Lore | ARENA_TO_LORE_HANDOFF | Engine proficiency data, AES trends |
| Arena → Judge | ARENA_TO_JUDGE_HANDOFF | Quality review of winning variant |
| Judge → Arena | QUALITY_FEEDBACK | Execution quality assessment |
| Reference | Read this when |
|-----------|----------------|
| references/engine-cli-guide.md | You need CLI commands, prompt construction, self-competition, or multi-variant matrix. |
| references/team-mode-guide.md | You need Team Mode lifecycle, worktree setup, or teammate prompts. |
| references/evaluation-framework.md | You need scoring criteria, REFINE framework, or Quick Mode evaluation. |
| references/collaborate-mode-guide.md | You need COLLABORATE decomposition, templates, or Quick Collaborate. |
| references/decision-templates.md | You need AUTORUN YAML templates (_AGENT_CONTEXT, _STEP_COMPLETE). |
| references/question-templates.md | You need INTERACTION_TRIGGERS question templates. |
| references/execution-learning.md | You need CALIBRATE workflow, AES scoring, learning triggers, Engine Proficiency Matrix, adaptation rules, or safety guardrails. |
| references/multi-engine-anti-patterns.md | You need multi-engine orchestration anti-patterns (MO-01–10), distributed system principles, failure mode matrix, or reliability patterns. |
| references/ai-code-quality-assurance.md | You need AI-generated code quality statistics (2025-2026), problem categories (QA-01–08), defense-in-depth model, or review strategy. |
| references/engine-prompt-optimization.md | You need GOLDE framework, engine-specific optimization, or prompt anti-patterns (PE-01–10). |
| references/competitive-development-patterns.md | You need cooperative patterns (CP-01–08), COMPETE/COLLABORATE design analysis, diversity strategy, or paradigm selection optimization. |
| _common/OPUS_48_AUTHORING.md | You are sizing the comparison report, deciding adaptive thinking depth at paradigm selection, or front-loading paradigm/engines/criteria at SPEC. Critical for Arena: P3, P5. |
| _common/PROOF_CARRYING.md | You are invoked in COMPETE mode from nexus acceptance Phase 2A as the Dual-Implementation Oracle for in-scope domains (money / authz / state-machine / inventory / regulated). AI-A on engine E1 + AI-B on engine E2 + AI-C (adversarial reviewer) on engine E3 with different LLM families per G4 diversity requirement. AI-A and AI-B receive spec in different forms (NL vs formal vs decision table). Triangulate against Source-of-Truth Spec (G10), not against each other only — "diff = 0" alone does NOT auto-pass. |
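The Dual-Implementation Oracle rule that "diff = 0" alone does not pass can be illustrated with a small triangulation check. This is a sketch under stated assumptions: the spec is modeled as an executable oracle function, and the 2% fee rule in the usage lines is a hypothetical example of a money-domain spec that both implementations got wrong in the same way.

```python
from typing import Any, Callable, Iterable, List, Tuple


def triangulate(impl_a: Callable[[Any], Any], impl_b: Callable[[Any], Any],
                spec_oracle: Callable[[Any], Any],
                inputs: Iterable[Any]) -> List[Tuple[Any, Any, Any, Any]]:
    """Check both implementations against the spec, not only against each other.

    Returns (input, a_out, b_out, expected) for every input where either
    implementation deviates from the spec oracle.
    """
    failures = []
    for x in inputs:
        expected = spec_oracle(x)
        a_out, b_out = impl_a(x), impl_b(x)
        if a_out != expected or b_out != expected:
            failures.append((x, a_out, b_out, expected))
    return failures


def spec_fee(cents: int) -> int:
    return cents * 2 // 100  # hypothetical spec: 2% fee


def shared_bug(cents: int) -> int:
    return cents * 3 // 100  # both engines implemented 3%


# A and B agree with each other on every input, yet both fail against the spec.
print(triangulate(shared_bug, shared_bug, spec_fee, [100, 250]))
```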
Journal (.agents/arena.md): CRITICAL LEARNINGS only — engine performance, spec patterns, cost optimizations, evaluation insights.
.agents/PROJECT.md: | YYYY-MM-DD | Arena | (action) | (files) | (outcome) |
_common/OPERATIONAL.md
See _common/AUTORUN.md for the protocol (_AGENT_CONTEXT input, mode semantics, error handling).
Arena-specific _STEP_COMPLETE output schema:
```yaml
_STEP_COMPLETE:
  Agent: Arena
  Status: SUCCESS | PARTIAL | BLOCKED | FAILED
  Output:
    deliverable: [artifact path or inline]
    artifact_type: "[COMPETE Winner | COLLABORATE Integration | Evaluation Report]"
    parameters:
      paradigm: "[COMPETE | COLLABORATE]"
      mode: "[Solo | Team | Quick]"
      engines_used: ["[codex | agy | claude-subagent]"]
      variant_count: "[number]"
      winner: "[engine or hybrid]"
      aes_score: "[A | B | C | D | F]"
  Handoff: "[target agent or N/A]"
  Next: Guardian | Radar | Judge | Sentinel | Lore | DONE
  Reason: [Why this next step]
```
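A lightweight validator for the enumerated fields of the schema above. It assumes the YAML has already been parsed into a dict (for example with a YAML library) and takes the mapping under _STEP_COMPLETE; the function name is hypothetical, and free-text fields such as deliverable, winner, and Reason are not checked.

```python
from typing import Any, Dict, List

STATUSES = {"SUCCESS", "PARTIAL", "BLOCKED", "FAILED"}
NEXT_STEPS = {"Guardian", "Radar", "Judge", "Sentinel", "Lore", "DONE"}
ENGINES = {"codex", "agy", "claude-subagent"}


def validate_step_complete(report: Dict[str, Any]) -> List[str]:
    """Return schema problems; an empty list means the enumerated fields look valid."""
    errors = []
    if report.get("Agent") != "Arena":
        errors.append("Agent must be Arena")
    if report.get("Status") not in STATUSES:
        errors.append("invalid Status")
    params = report.get("Output", {}).get("parameters", {})
    if params.get("paradigm") not in {"COMPETE", "COLLABORATE"}:
        errors.append("invalid paradigm")
    if params.get("mode") not in {"Solo", "Team", "Quick"}:
        errors.append("invalid mode")
    engines = params.get("engines_used", [])
    if not engines or not set(engines) <= ENGINES:
        errors.append("engines_used must be a non-empty subset of codex/agy/claude-subagent")
    if params.get("aes_score") not in {"A", "B", "C", "D", "F"}:
        errors.append("invalid aes_score")
    if report.get("Next") not in NEXT_STEPS:
        errors.append("invalid Next")
    return errors
```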
When input contains ## NEXUS_ROUTING, return via ## NEXUS_HANDOFF (canonical schema in _common/HANDOFF.md).