skills/inspect-agent-run-with-prefactor-cli/SKILL.md
Use when performing root-cause analysis on a Prefactor agent run: bad output, surprising behavior, high cost, incomplete work, downvotes, or anything worth investigating. Run it in the agent's own codebase. The user provides the agent instance ID (and the agent ID if needed).
Perform root-cause analysis on a specific agent run using Prefactor span data.
The user has already decided something in a run is wrong or worth investigating. They paste identifiers and a symptom; your job is to analyze the spans and deliver a findings report — what happened, what caused it, and where in the codebase it connects.
Do not prescribe fixes or recommended next actions. The human team owns remediation; your output equips their post-mortem with evidence and source locations.
Typical handoff:
Run this skill from the repository that deployed the agent under investigation, not from the Prefactor SDK repo or an unrelated project. Assume you are in the correct repo and therefore have access to the full codebase that produced the traces.
Spans show runtime behavior. The agent's codebase holds the prompts, tool implementations, schemas, guardrails, and wiring that explain why the system behaved that way. Root-cause analysis needs both:
When a span points at a tool call, find that tool's handler in this repo. When an LLM span shows a bloated prompt, find where that context is assembled. When schema fields look wrong or missing, find the registered span types here. Human-factor whys often resolve in this codebase even when spans only show the symptom.
If the workspace does not match the agent that produced the run, say so before drawing conclusions — you may be missing the source of truth for prompts, tools, and schemas.
This skill is for real root-cause analysis, not trigger spotting.
Trigger vs root cause
Human factors
Five whys (internal reasoning only)
What automation is good for here
Do not treat span data as a substitute for a post-mortem conversation. Use it to make the conversation sharper.
Collect before starting:
prefactor or a package-manager launcher.
PREFACTOR_API_URL / PREFACTOR_API_TOKEN.
agent_context.

If the user only has an agent ID, list instances and pick the run that matches their time window or description.
Start with the context export for the instance under investigation:
prefactor --profile <profile> agent_instances agent_context <agent-instance-id> --output ./tmp/prefactor-agent-context.json
If agent_context is not available on that profile, use span listing with summaries:
prefactor --profile <profile> agent_spans list \
--agent_instance_id <agent-instance-id> \
--start_time <iso-start> \
--end_time <iso-end> \
--include_summaries
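The start and end values above are ISO timestamps. If the user only gives an approximate time for the run, a small helper can turn it into a window. This is a minimal sketch: `isoWindow` and its padding parameter are hypothetical names, not part of the Prefactor CLI or SDK, and whether the CLI accepts the fractional seconds that `toISOString` emits is an assumption to check against your installation.

```typescript
// Hypothetical helper, not part of the Prefactor CLI or SDK.
// Builds an ISO 8601 window centered on the time the user reported.
function isoWindow(
  reportedAt: string,
  paddingMinutes: number,
): { start: string; end: string } {
  const center = new Date(reportedAt).getTime();
  if (Number.isNaN(center)) {
    throw new Error(`Not a parseable timestamp: ${reportedAt}`);
  }
  const paddingMs = paddingMinutes * 60 * 1000;
  return {
    start: new Date(center - paddingMs).toISOString(),
    end: new Date(center + paddingMs).toISOString(),
  };
}

// Made-up example time: a 30-minute margin on each side.
const exampleWindow = isoWindow("2025-01-15T10:30:00Z", 30);
// exampleWindow.start is "2025-01-15T10:00:00.000Z"
// exampleWindow.end   is "2025-01-15T11:00:00.000Z"
```

A wider margin costs more spans to read but lowers the risk of cutting off the start of the run.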
Read spans in execution order. Build a timeline: user input → agent decisions → LLM calls → tool calls → intermediate outputs → final response.
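Ordering the spans can be sketched as a sort by start time followed by one line per span. The field names below (`id`, `type`, `started_at`, `summary`) are assumptions for illustration, not the documented export schema, and the sample spans are made up. Adjust both to whatever the exported JSON actually contains.

```typescript
// Assumed span shape; field names are illustrative, not the real export schema.
interface SpanRecord {
  id: string;
  type: string; // e.g. "agent", "llm", "tool"
  started_at: string; // ISO 8601 timestamp
  summary?: string;
}

// Sort a copy of the spans by start time and render one line per span.
function buildTimeline(spans: SpanRecord[]): string[] {
  return [...spans]
    .sort((a, b) => Date.parse(a.started_at) - Date.parse(b.started_at))
    .map((span, index) => {
      const summary = span.summary ?? "(no summary)";
      return `${index + 1}. [${span.type}] ${span.id}: ${summary}`;
    });
}

// Made-up sample data, deliberately out of order.
const sampleSpans: SpanRecord[] = [
  { id: "span-c", type: "tool", started_at: "2025-01-15T10:00:05Z", summary: "lookup returned 0 rows" },
  { id: "span-a", type: "agent", started_at: "2025-01-15T10:00:00Z", summary: "user asked for order status" },
  { id: "span-b", type: "llm", started_at: "2025-01-15T10:00:02Z" },
];

const timelineLines = buildTimeline(sampleSpans);
// timelineLines[0] is "1. [agent] span-a: user asked for order status"
```

Keeping the span ID on every line makes it easy to cite the same IDs later in the report.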
Note where the user-visible outcome diverged from source data. That divergence point is usually near the trigger, not the root cause.
Not every agent delegates work. Some runs are fully self-contained; others spawn subagents, jobs, or linked runs. Treat linked instances as conditional follow-up, not a required step.
Follow linked runs only when span payloads point to another instance that may explain user-visible output, missing output, errors, cost spikes, or incomplete work. Linked instance IDs often appear under keys such as subagent_agent_instance_id, agent_instance_id, jobId, outputs, or result_payload.
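Scanning payloads for those keys by hand is error-prone when they are deeply nested. A minimal sketch of a recursive search follows. It assumes the identifier-like keys hold plain strings and that `outputs` and `result_payload` are containers to descend into; `findLinkedIds` is a hypothetical helper, and the sample payload is made up.

```typescript
// Keys named in the text above that are assumed to hold an identifier string.
const LINK_ID_KEYS = new Set([
  "subagent_agent_instance_id",
  "agent_instance_id",
  "jobId",
]);

// Walk any JSON value and collect string values stored under those keys,
// skipping the ID of the instance already under investigation.
function findLinkedIds(
  payload: unknown,
  ownInstanceId: string,
  found: Set<string> = new Set<string>(),
): Set<string> {
  if (Array.isArray(payload)) {
    for (const item of payload) {
      findLinkedIds(item, ownInstanceId, found);
    }
  } else if (payload !== null && typeof payload === "object") {
    const record = payload as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (LINK_ID_KEYS.has(key) && typeof value === "string") {
        if (value !== ownInstanceId) found.add(value);
      } else {
        // Containers such as outputs or result_payload are covered here.
        findLinkedIds(value, ownInstanceId, found);
      }
    }
  }
  return found;
}

// Made-up payload for illustration.
const samplePayload = {
  outputs: [{ subagent_agent_instance_id: "inst-child-1" }],
  result_payload: { jobId: "job-42", nested: { agent_instance_id: "inst-main" } },
};

const linkedIds = Array.from(findLinkedIds(samplePayload, "inst-main")).sort();
// linkedIds is ["inst-child-1", "job-42"]
```

A hit only means the ID is a candidate. Whether it is worth following still depends on whether it could explain the symptom.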
When a linked instance looks relevant:
prefactor --profile <profile> agent_instances agent_context <linked-agent-instance-id> --output /tmp/prefactor-linked-agent-context.json
Use the same span-listing fallback when agent_context is unavailable. Stop expanding the graph when additional linked runs no longer change the story.
If no delegation signals appear in spans, report linked instances as none / not applicable and continue analysis on the main instance.
For each item, report findings tied to specific span IDs, or mark the item as not observed:
Multiple independent faults can coexist. Do not collapse them into a single "the cause."
After span analysis, use five whys as a private thinking method — not as report structure.
Walk the chain internally:
Continue beyond five if the chain is still shallow. Stop when you reach a systemic condition a human would need to change.
Then synthesize: distill each confirmed or strongly inferred conclusion into a named finding in What we found. Include triggers, root causes, and contributing factors as separate findings when they are distinct — but never as a "why #1 → why #2" chain.
When linked instances were inspected, fold that into findings or the timeline; do not add a separate linked-instances section.
Deliver a findings report, not an action plan. Use these exact sections, even when the user asks for a concise answer:
One short paragraph: what happened, what the user experienced, and your verdict on why. No why chain here — just the conclusion a reader needs first.
Concrete issues as a flat list of named findings. Each finding gets:
Include triggers, root causes, and contributing factors as separate findings when they are distinct. Do not output the five-whys chain itself.
Ordered key events with span IDs and one-line summaries. Include linked-instance events here when relevant.
Explicit gaps that set honest expectations: missing context, human/process factors, intent behind prompts or schemas, conclusions that could not be verified because the workspace is not the agent's codebase.
A trimmed table of source locations directly relevant to the findings above — not a fix list, and not an exhaustive tour of the repo. For each entry:
Cover prompt templates, tool handlers, validators, and span schema registration only when a finding depends on them. Use code citations when referencing source.
Ground every technical claim in span IDs, instance IDs, summaries, token counts, payload sizes, or concrete payload/result values.
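Token counts and payload sizes are easier to cite when they are computed once per span. The sketch below assumes each span carries optional `input_tokens`, `output_tokens`, and `payload` fields; those names are hypothetical and should be mapped to the real export. Payload size is measured in characters of serialized JSON, which is a rough proxy and not a byte count. The sample spans are made up.

```typescript
// Assumed span fields; names are illustrative, not the real export schema.
interface MeasuredSpan {
  id: string;
  type: string;
  input_tokens?: number;
  output_tokens?: number;
  payload?: unknown;
}

interface EvidenceRow {
  spanId: string;
  type: string;
  totalTokens: number;
  payloadChars: number; // length of the serialized JSON, not bytes
}

// One row per span, largest token consumers first.
function evidenceRows(spans: MeasuredSpan[]): EvidenceRow[] {
  return spans
    .map((span) => ({
      spanId: span.id,
      type: span.type,
      totalTokens: (span.input_tokens ?? 0) + (span.output_tokens ?? 0),
      payloadChars:
        span.payload === undefined ? 0 : JSON.stringify(span.payload).length,
    }))
    .sort((a, b) => b.totalTokens - a.totalTokens);
}

// Made-up sample data.
const measuredRows = evidenceRows([
  { id: "span-1", type: "llm", input_tokens: 1200, output_tokens: 300, payload: { prompt: "hi" } },
  { id: "span-2", type: "llm", input_tokens: 9000, output_tokens: 150 },
  { id: "span-3", type: "tool", payload: [1, 2, 3] },
]);
// measuredRows[0] is { spanId: "span-2", type: "llm", totalTokens: 9150, payloadChars: 0 }
```

Rows like these give each claim about cost or prompt bloat a span ID and a number to point at.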