skills/dev-record/SKILL.md
Record agent activity during Claude Code sessions. Captures plans, human input, agent decisions, and deviations via hooks and agent self-reporting.
Install this skill globally with one command: `npx skillsauth add nakane1chome/claude-skills dev-record`. Works with Claude Code, Cursor, and Windsurf.
Run the action specified by $ARGUMENTS (either setup or status).
Passive recording of agent activity during Claude Code sessions. Captures what was planned, what happened, what was decided, and where the agent deviated.
Note: This is a record-only skill — it stores raw data but does not compute trends or analysis. Consumers (other skills, scripts, or humans) interpret the records.
Auto-loaded hooks: Running `setup` registers hooks directly in the project's `.claude/settings.json`. No `--plugin-dir` flag is required; hooks fire automatically for every session in the project.
## setup

Initialize the current project for dev-record.
Run `./install.sh` from this skill's directory.
The script:
- Adds `audit/ops_record/` to `.gitignore`
- Appends the self-reporting instructions to `CLAUDE.md`
- Creates the `audit/ops_record/` and `audit/dev_record/` directories
- Installs the `audit/agent-report.sh` helper for agent self-reporting
- Registers hooks in `.claude/settings.json`; hooks fire automatically from this point on, no `--plugin-dir` needed

## status

Report the current state of dev-record in the project.
- Check that `.gitignore` includes `audit/ops_record/`
- Count the `.json` summary files in `audit/dev_record/` and report the number of recorded sessions
- Find the latest `.json` summary file (by filename sort) and display it

Primary records (captured automatically by hooks):
| Record | Hook Event | What it captures |
|--------|------------|------------------|
| Human input | UserPromptSubmit | Every user prompt verbatim |
| Agent decisions | PreToolUse | Every tool call the agent attempts |
| Decision outcomes | PostToolUse | Whether each tool call succeeded or was denied |
| Plan snapshots | PostToolUse | Transcript path captured when agent exits plan mode |
| Session boundaries | SessionEnd | Session summary with raw counts |
| Hook-detected anomalies | PreToolUse / PostToolUse | stop_ignored, hallucinated_path, repeated_failure, regression_unlabelled (see Limitations below) |
| Plan-vs-actual file diff | SessionEnd | unrecorded_deviation — files in plan but not touched, or files touched but not in plan (see Plan File Diff Detection) |
| Token usage | SessionEnd | Per-session input/output/cache token totals extracted from Claude Code session log |
| Estimated cost | SessionEnd | Computed from token counts and model pricing (labeled estimated_cost_usd) |
| Context compactions | SessionEnd | When the context window was compressed — trigger type and pre-compaction token count |
Secondary metrics (derived from primary records — raw counts, not computed rates):
| Metric | Source |
|--------|--------|
| tool_attempts | Count of PreToolUse events |
| tool_rejections | Count of tool calls denied by the user |
| corrections | User prompts that immediately follow a tool rejection * |
| user_prompts | Total human inputs in the session |
* corrections is estimated. May overcount after benign failures (e.g. a grep with no results) and undercount approach-level corrections where the developer redirects without a tool rejection.
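The counts in the table can be reproduced from an event log by a consumer script. The following is a minimal sketch, not the skill's own implementation: it assumes the event shapes shown in this document's event log format, and it assumes that "immediately follows" means the previous logged event was a `tool_result` with `success` set to `false`. The function name `secondary_metrics` is hypothetical.

```python
import json


def secondary_metrics(lines):
    """Count raw secondary metrics from an iterable of JSONL event lines."""
    metrics = {
        "tool_attempts": 0,
        "tool_rejections": 0,
        "corrections": 0,
        "user_prompts": 0,
    }
    previous_was_rejection = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("type")
        content = event.get("content", {})
        is_rejection = kind == "tool_result" and content.get("success") is False
        if kind == "tool_call":
            metrics["tool_attempts"] += 1
        elif is_rejection:
            metrics["tool_rejections"] += 1
        elif kind == "user_prompt":
            metrics["user_prompts"] += 1
            # Assumption: a correction is a prompt directly after a rejection.
            if previous_was_rejection:
                metrics["corrections"] += 1
        previous_was_rejection = is_rejection
    return metrics
```

Because the rule only looks at event adjacency, it shows the same overcounting and undercounting described in the footnote above.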
Agent self-reported events (hooks cannot detect intent — the agent must self-report):
| Event | When to report |
|-------|----------------|
| plan_stated | Agent commits to an approach — especially when implementing from a design document or specification, where the plan emerges from reading rather than from formal plan mode |
| plan_deviation | Agent makes a decision that differs from the committed plan |
| declined_difficult | Agent declines work because it would be difficult |
| ignored_prior_failure | Agent discovers a prior failure and chooses not to resolve it |
| scope_creep | Agent performs work not requested and not part of the committed plan |
| observation_misread_as_instruction | Agent acted on something stated as an observation as if it were a directive |
Limitation: Self-reporting is least reliable for the exact situations it's designed to capture. An agent that declines difficult work may rationalize it as "out of scope" rather than flag it. Treat self-reported events as a lower bound, not a complete record. The developer should review sessions and append additional `agent_report` entries for events the agent missed.
Hook detection limitations: Hook-detected anomaly events (`stop_ignored`, `repeated_failure`, `regression_unlabelled`) use `tool_response.success = false` to indicate a failed tool call. In practice, `success = false` means the user denied the tool call, not that the command exited with a non-zero status. These detectors therefore identify repeated permission denials, not execution failures. Stop-word matching (`stop_ignored`) uses a fixed word list and will produce false positives for prompts that use these words in a non-imperative context (e.g. "don't worry, proceed").
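To see why a fixed word list produces false positives, consider this small sketch of whole-word matching. The word list here is hypothetical (the skill's actual list is not shown in this document), and the function name `contains_stop_word` is invented for illustration.

```python
import re

# Hypothetical word list; the skill's real list may differ.
STOP_WORDS = ("stop", "don't", "wait", "cancel")


def contains_stop_word(prompt, stop_words=STOP_WORDS):
    """Return True if any stop word appears as a whole word in the prompt."""
    text = prompt.lower()
    return any(
        re.search(r"(?<![\w'])" + re.escape(word) + r"(?![\w'])", text)
        for word in stop_words
    )
```

A matcher like this flags "don't worry, proceed" just as it flags "stop editing that file", because it sees the word and not the intent behind it.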
Session metrics limitations: Token usage, cost, and compaction data are extracted from Claude Code's native session log (`~/.claude/projects/`). If the session log is not accessible (e.g. different user, remote execution), these fields will be `null` in the summary. The `estimated_cost_usd` value uses a hardcoded pricing table and may not reflect current or discounted rates. Compaction events record `pre_tokens` but not the post-compaction count (Claude Code does not log it).
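A cost estimate of this kind is a weighted sum of token counts. The sketch below shows the arithmetic under stated assumptions: the prices are placeholders and not real rates, the model name `example-model` is hypothetical, and the token field names are taken from the session summary format in this document. Returning `None` mirrors the `null` fields described above.

```python
# Placeholder USD prices per million tokens. These are NOT real rates.
PRICING_PER_MTOK = {
    "example-model": {
        "input_tokens": 10.0,
        "output_tokens": 50.0,
        "cache_read_input_tokens": 1.0,
        "cache_creation_input_tokens": 12.5,
    }
}


def estimate_cost_usd(model, token_usage, pricing=PRICING_PER_MTOK):
    """Return an estimated cost, or None when usage or pricing is unavailable."""
    if token_usage is None or model not in pricing:
        return None
    rates = pricing[model]
    # Multiply each token count by its per-million rate, then scale down.
    total = sum(token_usage.get(field, 0) * rate for field, rate in rates.items())
    return round(total / 1_000_000, 6)
```

Any hardcoded table like this goes stale when prices change, which is why the summary field is labeled as an estimate.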
Subagent tracking: Hook payloads do not include agent or subagent identity. Tool calls from subagents appear as normal events, indistinguishable from main-agent calls. Subagent attribution would require Claude Code to add agent context to hook payloads.
## Plan File Diff Detection

When a plan snapshot exists for the session, the SessionEnd hook automatically compares the plan's file list against actual git changes. This catches structural deviations the agent failed to self-report.
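The comparison can be sketched as follows. This is an illustration and not the code in `finalize-session.sh`: it assumes the plan lists files in the first column of a markdown table, that the changed files arrive as a list (for example from `git diff --name-only`), and that paths match when one is a path suffix of the other. All function names are hypothetical.

```python
def files_from_plan_table(plan_markdown):
    """Collect the first-column entries of markdown table rows in a plan."""
    files = []
    for line in plan_markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip().strip("`") for cell in line.strip("|").split("|")]
        first = cells[0]
        # Skip the header row and the |---|---| separator row.
        if not first or first.lower() == "file" or set(first) <= set("-: "):
            continue
        files.append(first)
    return files


def paths_match(plan_path, git_path):
    """Treat paths as equal when one is a path suffix of the other (assumption)."""
    a, b = plan_path.strip("/"), git_path.strip("/")
    return a == b or a.endswith("/" + b) or b.endswith("/" + a)


def plan_vs_actual(plan_files, changed_files, excluded=("audit/", ".claude/")):
    """Return (planned but untouched, touched but unplanned) file lists."""
    untouched = [
        p for p in plan_files
        if not any(paths_match(p, c) for c in changed_files)
    ]
    unplanned = [
        c for c in changed_files
        if not c.startswith(excluded)
        and not any(paths_match(p, c) for p in plan_files)
    ]
    return untouched, unplanned
```

Each entry in either returned list would correspond to one `unrecorded_deviation` event.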
How it works:
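- The `ExitPlanMode` hook (in `record-tool-call.sh`) saves the plan content to `audit/plans/`
- `finalize-session.sh` parses the plan for a markdown file table (`| File | Action |`)
- It runs `git diff --name-only` to get the files actually changed during the session
- It emits `unrecorded_deviation` events for:
  - files in the plan but not touched
  - files touched but not in the plan
- The events appear in the summary's `agent_reports` array and in the `-events.jsonl` artifact

Limitations:
- Only plans captured through plan mode are compared (plans stated without `ExitPlanMode` are not checked)
- Plan and git paths can differ in form (the plan may list `src/bus/bus.hpp` while git uses full paths)
- `audit/`, `.claude/`, and common non-source paths are excluded from the unplanned check

Dev-record produces two tiers of data, split across two directories: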
| Tier | Directory | Contents | Retention |
|------|-----------|----------|-----------|
| Project artifacts | audit/dev_record/ | Session summaries (.json), extracted agent reports and plan snapshots (-events.jsonl) | Permanent — commit to version control |
| Operational detail | audit/ops_record/ | Full event logs (.jsonl) — individual tool calls, tool results, user prompts | Gitignored — subject to retention policy |
Project artifacts are the agent equivalent of design decision records (DDRs) and task conversations in human-led projects. They capture why decisions were made and should live alongside project documentation.
Operational detail is equivalent to ephemeral communication (chat messages, verbal discussions). It is useful for debugging and review but is not a first-order project record. The setup action adds `audit/ops_record/` to `.gitignore`.
Event log (operational) — one JSONL file per session at audit/ops_record/<timestamp>-<session-id>.jsonl (e.g. 20260216T143022Z-abc123.jsonl). The YYYYMMDDTHHMMSSZ prefix ensures chronological sorting.
{"timestamp": "ISO8601", "session_id": "str", "type": "user_prompt", "content": {"prompt": "..."}}
{"timestamp": "ISO8601", "session_id": "str", "type": "tool_call", "content": {"tool": "Edit", "input": {}}}
{"timestamp": "ISO8601", "session_id": "str", "type": "tool_result", "content": {"tool": "Edit", "success": true}}
{"timestamp": "ISO8601", "session_id": "str", "type": "plan_snapshot", "content": {"transcript_path": "..."}}
{"timestamp": "ISO8601", "session_id": "str", "type": "agent_report", "content": {"event": "plan_stated", "detail": "..."}}
{"timestamp": "ISO8601", "session_id": "str", "type": "agent_report", "content": {"event": "plan_deviation", "detail": "..."}}
{"timestamp": "ISO8601", "session_id": "str", "type": "compaction", "content": {"trigger": "auto", "pre_tokens": 167238}}
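Since every line is a standalone JSON object and the filename carries the session start time, a consumer needs very little code to read a log. The sketch below assumes the `<timestamp>-<session-id>.jsonl` naming and the `type` field shown above; the helper names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def parse_log_name(filename):
    """Split '<timestamp>-<session-id>.jsonl' into (UTC datetime, session id)."""
    stem = filename.rsplit(".", 1)[0]
    # Split on the first hyphen only, so session ids may contain hyphens.
    stamp, session_id = stem.split("-", 1)
    started = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return started, session_id


def count_event_types(lines):
    """Tally events by their 'type' field, skipping blank lines."""
    return Counter(json.loads(line)["type"] for line in lines if line.strip())
```

Sorting the filenames as plain strings gives the same order as sorting by the parsed timestamps, which is what the fixed-width prefix is for.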
Session summary (project artifact) — written at session end to audit/dev_record/<timestamp>-<session-id>.json.
Extracted events (project artifact) — agent reports, plan snapshots, and compaction events extracted to audit/dev_record/<timestamp>-<session-id>-events.jsonl. Only created if the session contains any of these event types.
{
  "session_id": "string",
  "started": "ISO8601",
  "ended": "ISO8601",
  "tool_attempts": 0,
  "tool_rejections": 0,
  "corrections": 0,
  "user_prompts": 0,
  "agent_reports": [],
  "plan_snapshots": 0,
  "model": "claude-opus-4-6",
  "token_usage": {
    "input_tokens": 0,
    "output_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0
  },
  "estimated_cost_usd": 0.0,
  "compactions": {
    "count": 0,
    "events": [{"timestamp": "ISO8601", "trigger": "auto", "pre_tokens": 0}]
  }
}
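A consumer that wants the session count and the most recent summary, as the `status` action reports them, could work along these lines. This is a sketch under the assumption that summaries are the `.json` files in `audit/dev_record/` and that filename order is chronological; the function names are hypothetical.

```python
import json
from pathlib import Path


def summary_files(record_dir):
    """Return session summary paths sorted by filename (oldest first)."""
    # "*.json" does not match the "-events.jsonl" artifacts.
    return sorted(Path(record_dir).glob("*.json"), key=lambda p: p.name)


def latest_summary(record_dir):
    """Return (session count, latest summary dict or None)."""
    files = summary_files(record_dir)
    if not files:
        return 0, None
    return len(files), json.loads(files[-1].read_text(encoding="utf-8"))
```

Typical use would be `count, summary = latest_summary("audit/dev_record")`, after which fields such as `summary["tool_attempts"]` can be read directly.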
The setup action appends the following to the project's `CLAUDE.md`. This instructs the agent to self-report events that hooks cannot detect. The key design principle is to log BEFORE proceeding: the reporting requirement is anchored to the decision point rather than relying on the agent remembering to report after the fact.
## Dev Record — Agent Self-Reporting
You MUST log self-report events using the helper script. These events are
extracted to `audit/dev_record/` at session end as permanent project artifacts.
Failure to report is itself a deviation.
**When to report — log BEFORE proceeding, not after:**
- **Plan stated**: You commit to an approach. Log BEFORE you start implementing.
- **Plan deviation**: You decide to skip, change, or add a step. Log BEFORE making the change.
- **Declined as difficult**: You decline work because it would be difficult.
- **Ignored prior failure**: You discover a prior failure and choose not to resolve it.
- **Scope creep**: You perform work not requested and not part of the committed plan.
- **Observation misread as instruction**: You acted on something stated as observation as if it were a directive.
**How to report:**
```bash
bash audit/agent-report.sh "SESSION_ID" "EVENT_TYPE" "brief description"
```
Where EVENT_TYPE is one of: `plan_stated`, `plan_deviation`, `declined_difficult`, `ignored_prior_failure`, `scope_creep`, `observation_misread_as_instruction`
**Decision-point triggers** — if you are about to do any of these, FIRST log a `plan_deviation`:
- Skip a file listed in the plan
- Create a file not listed in the plan
- Use a different approach than what the plan describes
- Change the interface or API from what was planned
Scripts live in hooks/ within this plugin directory. When loaded via --plugin-dir, Claude Code resolves ${CLAUDE_PLUGIN_ROOT}/hooks/ to find them — no copying into the project required.
| Script | Hook Event | Purpose |
|--------|------------|---------|
| record-prompt.sh | UserPromptSubmit | Log human input |
| record-tool-call.sh | PreToolUse | Log agent tool decisions |
| record-tool-result.sh | PostToolUse | Log outcomes, detect plan exits |
| finalize-session.sh | SessionEnd | Extract project artifacts from ops_record to dev_record; plan-vs-actual file diff; session metrics extraction |
| extract-session-metrics.sh | (called by finalize) | Read Claude session log for token usage, cost estimate, compaction events |
All scripts require `jq`. Each script exits 0 (non-blocking) and appends to a per-session JSONL file, so concurrent sessions write to separate files without conflict.
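The non-blocking, append-only pattern can be illustrated in a few lines. Note that this sketch is written in Python and not in the shell-and-`jq` style the skill's scripts use, and it is not a copy of `record-prompt.sh`. It assumes the hook payload arrives as JSON on standard input with `session_id` and `prompt` fields; the log path and function names are hypothetical.

```python
import json
import sys
from datetime import datetime, timezone


def prompt_event(payload, now=None):
    """Build a user_prompt event line from a hook payload dict."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "session_id": payload.get("session_id", "unknown"),
        "type": "user_prompt",
        "content": {"prompt": payload.get("prompt", "")},
    })


def main(stdin=sys.stdin, log_path="audit/ops_record/example.jsonl"):
    """Append one event; swallow errors so the hook never blocks the session."""
    try:
        payload = json.load(stdin)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(prompt_event(payload) + "\n")
    except Exception:
        pass  # Recording is best-effort; a failure must not stop the agent.
    return 0
```

Returning 0 on every path is the point of the design: a recording failure costs one missing log line and leaves the session itself unaffected.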
Hook paths are resolved at install time and written as absolute paths in .claude/settings.json. If you move the plugin directory, re-run setup to update the paths.
| Goal | Use |
|------|-----|
| Record raw session data (plans, input, decisions) | dev-record |
| Track token usage and estimated cost per session | dev-record (extracted from Claude session logs) |
| Visual session replay with token attribution | claude-devtools (complementary — reads ~/.claude/ logs directly) |
| Analyze trends and measure agent improvement | A project-specific retrospective skill consuming dev-record data |
| Review a document for quality | review-steps, strong-edit |