skills/golem-powers/phoenix-human-view/SKILL.md
The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).
`npx skillsauth add etanhey/golems phoenix-human-view`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Documentation-of-standard. This contract was extracted from ~25 live Etan corrections during the gen-10 Phoenix sprint (gen-10 weave #15, imp10) and is ALREADY ENCODED in shipped code (status table below). Any new Phoenix/eval view MUST satisfy it; any review of one checks against it.
A human-eval view is a reading surface for a human, not a database admin panel.
`<local-command-stdout>`, `<command-name>`, harness caveats) are ⚙️ COMMAND turns, never 🧑 USER turns: mislabeling poisons human judgment of "what the user said."

Paths below are relative to skills/golem-powers/skill-creator/; the Phoenix pipeline lives inside the skill-creator skill, not at repo root.
| Contract item | State | Where |
|---|---|---|
| JSONL→trace ingest (source-of-truth transcripts, not screen scrape) | ✅ #462 | scripts/jsonl_to_phoenix_traces.py |
| Mobile-first annotation view | ✅ #457 | scripts/phoenix_mobile.py + static/phoenix-mobile/ |
| Auto-critic judge write-back + badges | ✅ #458/#459 | scripts/phoenix_auto_critic.py |
| Identity chips (repo/agent/model/role) | ✅ #460 | session cards (scripts/phoenix_mobile.py) |
| Tiny frozen starter dataset (suite-versioned) | ✅ #453-#455 | frozen cmux usage starter set |
| Capture v2 agent identity | ✅ #461 | scripts/cmux_capture.py |
| Project switcher (cmux ↔ coach, one server, :6043 retires) | ⏳ PHX-LEAD | phx-lead-gen2-kickoff.md item 1 |
| Turn-type taxonomy (USER/ORCHESTRATOR/COMMAND/ASSISTANT/TOOL) | ⏳ PHX-LEAD | kickoff item 2 (fixes contract item 10) |
| Thinking-collapse (default collapsed, global toggle) | ⏳ PHX-LEAD | kickoff item 3 |
| Tool-usage filters | ⏳ PHX-LEAD | kickoff item 4 |
| Mobile-from-work persistence (:6042/:6043 → launchd service) | ⏳ open loop | gen-10 weave open-loop #3 — the SAFE always-on (orc C12) |
Update this table when PHX items land — a stale ✅/⏳ here misleads every future view PR.
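
The turn-type rule behind the taxonomy row above (harness output is a ⚙️ COMMAND turn, never a 🧑 USER turn) could be sketched roughly as follows. This is a minimal illustration, not the shipped PHX-LEAD implementation. The `classify_turn` function, the lowercase role strings, and the choice to match on the opening tag are all assumptions made for the example. Only the five turn-type names and the two tag names come from this contract. Harness caveats are not detected here because their format is not specified.

```python
import re
from enum import Enum


class TurnType(Enum):
    """The five turn types named in the status table."""
    USER = "USER"
    ORCHESTRATOR = "ORCHESTRATOR"
    COMMAND = "COMMAND"
    ASSISTANT = "ASSISTANT"
    TOOL = "TOOL"


# Harness markers named in the contract. Matching on the opening tag
# anywhere in the text is an assumption of this sketch.
HARNESS_TAG = re.compile(r"<(local-command-stdout|command-name)\b")

# Hypothetical mapping from transcript role strings to turn types.
ROLE_TO_TURN = {
    "user": TurnType.USER,
    "orchestrator": TurnType.ORCHESTRATOR,
    "assistant": TurnType.ASSISTANT,
    "tool": TurnType.TOOL,
}


def classify_turn(role: str, text: str) -> TurnType:
    """Label one transcript turn for the replay view."""
    if role not in ROLE_TO_TURN:
        raise ValueError(f"unknown role: {role!r}")
    # Harness output arrives under the user role but is not human speech,
    # so it is relabeled before the role mapping applies.
    if role == "user" and HARNESS_TAG.search(text):
        return TurnType.COMMAND
    return ROLE_TO_TURN[role]
```

Checking the harness tags before the role mapping is the point of the sketch: a turn that arrives under the user role but carries harness output never reaches the USER label.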
suite_version.

/ui-ux-pro-max, /interaction-design, /html-dashboard for polish.

| Skill | Relationship |
|---|---|
| /pr-loop | Visual Self-QA Gate enforces the screenshot-gate on every view PR |
| /never-fabricate | R7 receipts for the visual claims; R9 for dataset counts |
| /skill-creator | Phoenix scripts live in its scripts/; eval datasets follow its RED/GREEN discipline |
| /orc | C12: launchd is the allowed always-on for mobile-from-work persistence |