skills/grill-ai-mastery/SKILL.md
Hybrid interview that probes AI-engineering mastery by tip-vocabulary depth — entity referencing, loop closure, observability, harness improvement — not by token usage or LOC. Start collaborative (two-way tip exchange), escalate to adversarial probing when depth is lacking. Trigger when the user says "interview me on AI", "stress-test my Claude usage", "evaluate this candidate's AI engineering", or otherwise asks for an AI-collab skill assessment. User-only — never auto-invoke.
npx skillsauth add outlinedriven/odin-codex-plugin grill-ai-masteryInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Probe AI mastery by what the subject names, not by how much they generate. The premise from the chat that prompted this skill: token usage and LOC are noise; concrete tip vocabulary (URL-as-entity-ref, loop closure, observability) is signal.
| Skill | Anchor | Posture |
| ------------------------ | ---------------------------------------------- | ----------------------------------------------- |
| grill-ai-mastery | AI-collab tip vocabulary tree (this file) | Hybrid: collaborative → adversarial |
| grill-me | Any plan/design under test | Linear adversarial, recommendation per question |
| request-refactor-plan | A refactor in particular | Adversarial interview specific to refactoring |
This skill is the AI-mastery anchor; grill-me is the domain-agnostic version. Pick by what's being assessed.
Open by asking the subject to name a tip they actually use when collaborating with an LLM. Two-way: surface one of yours back as a counter-tip. The exchange is the assessment, not a quiz. Watch for:
Stay collaborative as long as the depth matches the level the assessment is calibrated to.
Promote to adversarial questioning when any of these signals fire:
In adversarial mode, walk the tip-vocabulary tree:
Recommend per the AskUserQuestion contract below — mark the option with (Recommended) and place it first. The recommendation gives the subject a calibration point without grading on a hidden rubric.
AskUserQuestion tool contract (Claude Code reference)This protocol assumes a single "ask user" tool with the contract below. Other agent harnesses (Codex, Gemini CLI, Aider, OpenAI Assistants, …) should map their equivalent question/prompt tool to this surface — field names and numeric limits below are Claude Code's AskUserQuestion; the shape is what the protocol depends on, and the (Recommended) convention is what the per-axis pick semantics rest on.
Bad shape — never generate this:
Which of these defaults should I override before I lock in the plan?
❯ 1. [ ] Diff-only mode
2. [ ] Include root prompts
3. [ ] system-prompt-baseline.md wins on conflict
4. [ ] Bump plugin manifests
This is a single multiSelect: true question where unticked = "default stands". It collapses four independent axes into one checkbox list. Never generate this shape.
Correct shape — one single-select question per axis:
Q1 — Scope (single-select)
❯ Diff-only mode (Recommended) — propagate only recently-new baseline
Full block alignment — full sweep across all blocks
Q2 — Roots (single-select)
❯ Skip root prompts (Recommended) — derivative artifacts
Include root prompts — also touch ODD/{GENERIC,COMPACTED,MINIMAL}
Q3 — Conflict policy (single-select)
❯ Preserve target policy (Recommended) — non-conflicting only
system-prompt-baseline.md wins — override divergent target policy
Q4 — Manifests (single-select)
❯ Skip bump (Recommended) — sibling-harness scope
Bump minor — semver per system-prompt-baseline.md memory note
Positive routing rule: When the brief calls for the user to rarely have to type, route the intent into N per-axis single-select questions (≤4 per fire) — each axis's (Recommended) option carries the default. Ticking (Recommended) is accepting the default.
Never use multiSelect for axis-with-default override semantics. Reserve multiSelect strictly for additive picks (feature toggles, optional sub-tasks).
Per fire (one tool call):
questions array — minItems: 1, maxItems: 4. All questions in the array render as one batched UI; one user round-trip per fire.Per question:
question — full sentence ending in ?header — short chip label, ≤ 12 charactersmultiSelect — boolean (default false). false = single-pick (mutually exclusive options); true = subset of additive items (feature toggles, optional sub-tasks)options — array, minItems: 2, maxItems: 4Per option:
label — 1-5 words; the chip text the user sees and ticks. Mark the recommended choice by appending (Recommended) to its label and placing it first in the array.description — explanation of the trade-off / consequence; the one-sentence rationale lives here.preview — optional rendered content (markdown, monospace box). Single-select only (tool constraint). Use for visual comparisons (layout mockups, code diffs, file trees); skip when the difference is purely conceptual.Built-in escapes (do not duplicate):
annotations response field.Plan-mode caveat:
ExitPlanMode is for.ai-collab-protocols as a starting point rather than continuing the probe.grill-ai-mastery and grill-me share the grill- prefix and will tab-complete adjacent. Both exist intentionally: grill-me for general design grilling, this skill for AI-collab assessment specifically. Confirm with the user which they meant when invocation is ambiguous.
testing
ODIN's compress-operations dispatcher under the Compressor/Extender role. Invoke on "tidy", "clean up", "tidy this file/memory/workspace/git/docs", or when active context (current file, diff, stack, memory directory) has structural rot to resolve before touching behavior. Detects target domain from context and routes to the sibling skill. Requires explicit target or clear active-context signal — do not invoke speculatively.
development
Cross-domain taste skill — apply distinctive judgment to any artifact (prose, code, design, decisions) instead of converging to AI defaults. Two modes — `audit` (judge work against the two-sided charter and portable anchors) and `anchor` (load register before producing). Auto-detects by phrasing; override via `/taste audit | anchor`. Trigger on "is this slop?", "overkill?", "elegant?", "taste-test this".
tools
One-shot bootstrap of strict-mode tooling per ecosystem plus per-task GOALS.md scaffolding so an agentic loop can self-verify. Writes typechecker/linter/schema-validator config for TS (strict + noUncheckedIndexedAccess + exactOptionalPropertyTypes), Python (Pyright strict, Ruff strict), Rust (Clippy deny-correctness), Go (golangci-lint with staticcheck), OCaml (dune --release); establishes `.agent-tasks/<id>/GOALS.md` per-task convention distinct from project-stable AGENTS.md. C++/Java/Kotlin and framework specifics (Spring Boot, Nest, React-strict) are out of scope. Trigger on new project bootstrap, agentic-task setup, "make this self-verifying", "set the loop's goal", "scaffold goals for this issue". Pairs with `llm-self-loop` runtime.
tools
Install git pre-commit hooks via the project's hook tool — Husky+lint-staged (JS), pre-commit (Python/OCaml), lefthook (Go), cargo-husky (Rust). Use when the user wants commit-time formatting, linting, type-checking, or test gates. Detects ecosystem first.