etanhey/phoenix-human-view

Name: phoenix-human-view
Author: etanhey

skills/golem-powers/phoenix-human-view/SKILL.md

npx skillsauth add etanhey/golems phoenix-human-view

phoenix-human-view — the human-eval UX contract

Documentation-of-standard. This contract was extracted from ~25 live Etan corrections during the gen-10 Phoenix sprint (gen-10 weave #15, imp10) and is largely ALREADY ENCODED in shipped code (the status table below marks what has shipped and what is still pending). Any new Phoenix/eval view MUST satisfy it; any review of one checks against it.

The contract

A human-eval view is a reading surface for a human, not a database admin panel.

1. Turn-by-turn scrollable REPLAY, not a scorecard. The human reads the conversation as it happened, with a CLI-transcript feel, and judges in place. Aggregate scores are secondary chrome, never the primary surface.
2. Fewer, human-readable columns. Show what a human reads (who said what, when, verdict). Everything else is detail-on-demand.
3. IDs are hidden but copyable. Session/trace/chunk IDs never occupy reading space, but one tap copies them (debugging needs the ID, reading never does).
4. Thinking and tool churn collapse by default. Consecutive thinking bundles under one collapsible "💭 Thinking" header, default COLLAPSED, global toggle. Auto-critic flags stay visible even when their turn is collapsed.
5. Identity chips: repo + agent + model + role. Every session/turn carries its identity as compact chips: who ran, where, on what model, in which role.
6. Tool filters. Filter sessions/turns by which tools were used.
7. Tiny frozen starter dataset. Human grading starts on a small FROZEN set (suite-versioned), not a firehose. Frozen = re-gradable = comparable.
8. Mark-wrong-in-thread. The human flags a wrong turn WHERE THEY READ IT (in the replay), not in a separate form.
9. Mobile-first. Etan grades from his phone at work. Every view ships a working mobile layout, screenshot-verified on both desktop and mobile (pr-loop Visual Self-QA Gate: clicked-into desktop + mobile shots before merge).
10. Turn-type honesty. Local-command artifacts (<local-command-stdout>, <command-name>, harness caveats) are ⚙️ COMMAND turns, never 🧑 USER turns. Mislabeling poisons human judgment of "what the user said."
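
Three of the contract items above (thinking collapse, tool filters, turn-type honesty) are mechanical enough to sketch in code. The following is a minimal illustration only, not the shipped logic in scripts/phoenix_mobile.py: the Turn dataclass, its field names, and the functions turn_type, bundle_thinking, and filter_by_tools are all hypothetical. Only the two tag markers come from the contract; markers for "harness caveats" are not specified there and are left out, as is the ORCHESTRATOR type from the pending taxonomy.

```python
from dataclasses import dataclass

# The two artifact tags named in the contract. Assumption: a real harness
# may emit more markers (e.g. for caveats) that would need adding here.
COMMAND_MARKERS = ("<local-command-stdout>", "<command-name>")


@dataclass
class Turn:
    """Hypothetical in-memory turn; field names are illustrative."""
    role: str              # raw transcript role, e.g. "user" or "assistant"
    text: str
    kind: str = "text"     # assumed values: "text", "thinking", "tool"
    tools: tuple = ()      # names of tools this turn invoked
    flagged: bool = False  # set when an auto-critic flagged the turn


def turn_type(turn):
    """Display label for a turn; command artifacts are never USER turns."""
    if turn.role == "user":
        if any(marker in turn.text for marker in COMMAND_MARKERS):
            return "COMMAND"
        return "USER"
    if turn.kind == "tool":
        return "TOOL"
    return "ASSISTANT"


def bundle_thinking(turns):
    """Group consecutive thinking turns into one bundle, collapsed by default.

    The bundle carries a `flagged` bit so a view can keep an auto-critic
    flag visible while the bundle itself stays collapsed.
    """
    out = []
    for turn in turns:
        if turn.kind != "thinking":
            out.append(turn)
            continue
        last = out[-1] if out else None
        if isinstance(last, dict) and last["type"] == "thinking_bundle":
            bundle = last
        else:
            bundle = {"type": "thinking_bundle", "collapsed": True,
                      "flagged": False, "turns": []}
            out.append(bundle)
        bundle["turns"].append(turn)
        bundle["flagged"] = bundle["flagged"] or turn.flagged
    return out


def filter_by_tools(turns, wanted):
    """Keep only turns that used at least one of the wanted tools."""
    wanted = set(wanted)
    return [turn for turn in turns if wanted & set(turn.tools)]
```

In this sketch the classification keys off the raw text rather than the transcript role, since the contract's point is that a harness can emit command output under the user role. A real view would likely need a richer rule set than substring checks.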

Status — shipped vs pending (verified 2026-06-05)

Paths below are relative to skills/golem-powers/skill-creator/ — the Phoenix pipeline lives inside the skill-creator skill, not at repo root.

| Contract item | State | Where |
|---|---|---|
| JSONL→trace ingest (source-of-truth transcripts, not screen scrape) | ✅ #462 | scripts/jsonl_to_phoenix_traces.py |
| Mobile-first annotation view | ✅ #457 | scripts/phoenix_mobile.py + static/phoenix-mobile/ |
| Auto-critic judge write-back + badges | ✅ #458/#459 | scripts/phoenix_auto_critic.py |
| Identity chips (repo/agent/model/role) | ✅ #460 | session cards (scripts/phoenix_mobile.py) |
| Tiny frozen starter dataset (suite-versioned) | ✅ #453-#455 | frozen cmux usage starter set |
| Capture v2 agent identity | ✅ #461 | scripts/cmux_capture.py |
| Project switcher (cmux ↔ coach, one server, :6043 retires) | ⏳ PHX-LEAD | phx-lead-gen2-kickoff.md item 1 |
| Turn-type taxonomy (USER/ORCHESTRATOR/COMMAND/ASSISTANT/TOOL) | ⏳ PHX-LEAD | kickoff item 2 (fixes contract item 10) |
| Thinking-collapse (default collapsed, global toggle) | ⏳ PHX-LEAD | kickoff item 3 |
| Tool-usage filters | ⏳ PHX-LEAD | kickoff item 4 |
| Mobile-from-work persistence (:6042/:6043 → launchd service) | ⏳ open loop | gen-10 weave open-loop #3: the SAFE always-on (orc C12) |

Update this table when PHX items land — a stale ✅/⏳ here misleads every future view PR.

Hard rules for builders

Screenshot-gate every view PR: clicked-into desktop + mobile shots → orc → Etan's 👍 before merge (HOLD final design sign-off for Etan — never autonomous).
Never log raw finding/chunk/personal text into datasets (PII-by-log rule, eval-harness collab :286). Synthetic docs are QUARANTINED by suite_version.
/yash-upstream to Arize Phoenix is HOLD-for-Etan — outward-facing.
Apply /ui-ux-pro-max, /interaction-design, /html-dashboard for polish.
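
One way to read the no-raw-text rule above is that a dataset row carries identifiers, a verdict, and at most a digest of the text, never the text itself. The sketch below illustrates that reading under stated assumptions: the function names, the row fields, and the idea of excluding quarantined suites by a suite_version set are all hypothetical and not taken from the eval-harness code.

```python
import hashlib


def dataset_row(session_id, turn_index, turn_type, verdict, suite_version, raw_text):
    """Build a dataset row that omits the raw text (hypothetical schema).

    Only a SHA-256 digest of the text is stored, so two gradings of the
    same turn can still be matched up. Note that a digest of short or
    guessable text is still a fingerprint, so this is a floor, not a
    complete PII control.
    """
    return {
        "session_id": session_id,
        "turn_index": turn_index,
        "turn_type": turn_type,
        "verdict": verdict,
        "suite_version": suite_version,
        "text_sha256": hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
    }


def exclude_quarantined(rows, quarantined_suites):
    """Drop rows whose suite_version is in the quarantined set.

    Assumption: quarantine is expressed as a set of suite_version values,
    e.g. the versions under which synthetic docs were generated.
    """
    return [row for row in rows if row["suite_version"] not in quarantined_suites]
```

Keeping the session ID and turn index in the row means a grader can still jump back to the replay to read the turn in context, which is where the text lives.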

Integration

| Skill | Relationship |
|---|---|
| /pr-loop | Visual Self-QA Gate enforces the screenshot-gate on every view PR |
| /never-fabricate | R7 receipts for the visual claims; R9 for dataset counts |
| /skill-creator | Phoenix scripts live in its scripts/; eval datasets follow its RED/GREEN discipline |
| /orc | C12: launchd is the allowed always-on for mobile-from-work persistence |

The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).

3 stars

tools

Updated Jun 7, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 7, 2026, 3:06 AM (11.8s, 3 files scanned)

Related Skills

etanhey/mac-systems

tools

Verified Trusted Community

macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.

3 SKILL.md Updated Jun 7, 2026

etanhey/judge-fleet

development

Verified Trusted Community

Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.

3 SKILL.md Updated Jun 7, 2026

etanhey/fleet-wrap

development

Verified Trusted Community

Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).

3 SKILL.md Updated Jun 7, 2026

etanhey/drive-usage

development

Verified Trusted Community

Brain Drive filing discipline — where every artifact goes + how to name it. Use WHENEVER touching Google Drive / Brain Drive: uploading, creating folders, saving research prompts/results, audits, plans, transcripts, dashboards, or when about to leave a durable artifact in docs.local/. Teaches the numbered folder model (01_STANDARDS / 02_GROUNDING / 03_RESEARCH / 04_INGEST / 06_ARCHIVE), date-prefixed naming, and the rule: FILE durable artifacts in the right Drive folder — docs.local/ is cache-only. NOT for querying Drive via Gemini (use /braindrive) or web research (use /gemini-research); for >100KB heavy archival defer to /google-drive-archive.

3 SKILL.md Updated Jun 7, 2026

Install manually

# Clone the repo
git clone https://github.com/etanhey/golems.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r golems/skills/golem-powers/phoenix-human-view ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

etanhey/golems

3 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT