codex/skills/tracepact/SKILL.md
Analyze OpenAI Responses API logs, Agents SDK traces, OpenTelemetry/LangSmith/Langfuse/custom spans, transcripts, and agent-loop code as effectful execution graphs. Use for agentic latency reduction, serial round-trip elimination, prompt-cache stability, model routing, tool-loop optimization, speculative execution validation, or proof-carrying rewrite design. Produces Latency Treaty IR, critical path, counterfactual schedules, prioritized fixes, instrumentation gaps, and CI-ready regression checks.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add tkersey/dotfiles tracepact
Agentic latency is not primarily a “slow model” problem. It is a coordination problem among typed effects.
Treat every model call, tool call, retrieval, guardrail, handoff, cache lookup, UI update, human approval, retry, and state rehydration as an effectful operation with a contract:
The skill’s job is to compile the observed run into a Latency Treaty IR: a machine-readable execution contract plus a counterfactual rewrite plan. The output should not merely say “parallelize tools.” It should state which dependency, policy, authority, replay, or quality invariant makes the rewrite legal.
Most trace audits summarize what happened. This skill asks what needed to happen.
For each span, classify it into one of these roles:
The final answer should be a rewrite package: an explanation, a trace-derived treaty, a counterfactual schedule, patch-level implementation guidance, and validation tests.
Accept any of the following:
previous_response_id chains.
If timings or token counts are missing, perform a qualitative treaty analysis and mark all savings as inferred. Never invent precise millisecond savings.
Every audit must produce these sections, even if some are short:
Create a normalized span table. Capture fields when available:
- span_id, parent_id, trace_id, turn_id, agent, phase, name
- kind: model, tool, retrieval, guardrail, handoff, router, planner, cache, state, ui, sleep, retry, human, other
- start_ms, end_ms, duration_ms, status
- depends_on: hard dependency span IDs, or inferred dependencies with reasons
- consumed_by: later spans that use this output
- user_visible: none, progress, partial, final, irreversible_action
- model, reasoning_effort, service_tier, transport, parallel_tool_calls, structured_output, streaming_used
- input_tokens, cached_tokens, output_tokens, reasoning_tokens, tool_schema_tokens, retrieval_tokens
- prompt_hash, static_prefix_hash, dynamic_suffix_hash, prompt_cache_key, cache_retention
- tool_name, tool_args_hash, tool_result_bytes, tool_result_tokens
- mutability: read_only, idempotent_write, external_write, payment_or_irreversible, unknown
- usage_policy: copyable, replayable, affine, linear, ephemeral, unknown
- freshness_policy: fresh_required, stale_ok, cache_ok, replay_only, unknown
- authority_policy: provider/capability/route/approval constraints
- quality_policy: schema, eval threshold, confidence, abstention, or review requirement

For each operation, infer an effect treaty:
operation: normalized effect surface
provider_offer: who can satisfy it, at what latency/token/quality/authority cost
morphism_offer: legal adaptation to another provider, schema, tool, model, cache, or deterministic function
route: chosen provider path in the observed run
usage: copyable | replayable | affine | linear | ephemeral | unknown
freshness: fresh_required | replay_ok | cache_ok | speculative_ok | unknown
branch_policy: unrestricted | replay_only | single_live_branch | split_required | no_branch | host_owned | unknown
authority: none | read | scoped_write | user_approval | irreversible | unknown
blocking_reason: data_dependency | safety_gate | authority | freshness | branch_policy | output_schema | latency_accident | none
rewrite_legality: why the faster schedule is safe, or what proof is missing
A treaty is useful only if it distinguishes real constraints from accidental ordering. A database write may be linear and must stay ordered. A read-only retrieval is usually replayable/cacheable and can often move earlier, parallelize, or compress. A model router may be deterministic enough to replace with code. A policy check may remain mandatory but run concurrently or stream its result.
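As a minimal sketch of how a treaty can separate real constraints from accidental ordering, the snippet below models a subset of the treaty fields and applies a deliberately conservative reorder rule. The names `EffectTreaty`, `REAL_CONSTRAINTS`, and `may_reorder` are hypothetical and not part of scripts/tracepact.py. The rule itself is an assumption: it treats every blocking reason other than latency_accident or none as a hard stop, even though a policy check may in practice run concurrently.

```python
from dataclasses import dataclass

# Hypothetical model of a treaty; field values reuse the vocabulary above.
@dataclass(frozen=True)
class EffectTreaty:
    operation: str
    usage: str = "unknown"
    freshness: str = "unknown"
    branch_policy: str = "unknown"
    authority: str = "unknown"
    blocking_reason: str = "none"

# Assumption: these blocking reasons are treated as real constraints.
REAL_CONSTRAINTS = {
    "data_dependency", "safety_gate", "authority",
    "freshness", "branch_policy", "output_schema",
}

def may_reorder(t: EffectTreaty) -> tuple[bool, str]:
    """Return (legal, reason) for moving the operation earlier or in parallel."""
    if "unknown" in (t.usage, t.authority, t.branch_policy):
        return False, "proof missing: treaty has unknown fields"
    if t.usage not in {"copyable", "replayable"}:
        return False, f"usage '{t.usage}' must keep its observed order"
    if t.authority not in {"none", "read"}:
        return False, f"authority '{t.authority}' must keep its observed order"
    if t.blocking_reason in REAL_CONSTRAINTS:
        return False, f"real constraint: {t.blocking_reason}"
    return True, "accidental ordering: replayable read with no real constraint"

db_write = EffectTreaty("db.write", usage="linear",
                        branch_policy="no_branch", authority="scoped_write")
retrieval = EffectTreaty("docs.search", usage="replayable", freshness="cache_ok",
                         branch_policy="unrestricted", authority="read",
                         blocking_reason="latency_accident")
print(may_reorder(db_write))
print(may_reorder(retrieval))
```

Returning the reason alongside the verdict keeps the rewrite_legality field populated either way: with the justification when the move is legal, or with the missing proof when it is not.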
Compute three paths:
For each path, report:
The central question: which spans are causal witnesses for useful user value? Optimize those first.
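One way to approximate a critical path from normalized spans is a longest-path walk over the depends_on graph. The sketch below assumes the graph is acyclic and non-empty and that each span carries duration_ms and depends_on; the function name `critical_path` and the sample durations are illustrative, not measured, and this is not necessarily how scripts/tracepact.py computes it.

```python
from functools import lru_cache

def critical_path(spans: dict[str, dict]) -> tuple[float, list[str]]:
    """spans maps span_id -> {"duration_ms": float, "depends_on": [span_id, ...]}.

    Returns (total_ms, ordered span ids) for the longest dependency chain.
    Assumes the dependency graph is acyclic and spans is non-empty.
    """
    @lru_cache(maxsize=None)
    def finish(span_id: str) -> tuple[float, tuple[str, ...]]:
        span = spans[span_id]
        best_ms, best_path = 0.0, ()
        # Pick the slowest dependency chain feeding this span.
        for dep in span["depends_on"]:
            ms, path = finish(dep)
            if ms > best_ms:
                best_ms, best_path = ms, path
        return best_ms + span["duration_ms"], best_path + (span_id,)

    total, path = max((finish(s) for s in spans), key=lambda r: r[0])
    return total, list(path)

# Made-up durations for illustration only.
sample = {
    "plan":     {"duration_ms": 400,  "depends_on": []},
    "retrieve": {"duration_ms": 900,  "depends_on": ["plan"]},
    "db_read":  {"duration_ms": 300,  "depends_on": ["plan"]},
    "final":    {"duration_ms": 1000, "depends_on": ["retrieve", "db_read"]},
}
total_ms, path = critical_path(sample)
print(total_ms, path)
```

In this sample, db_read is off the critical path, so shortening it would not change the final latency; retrieve is the span to optimize first.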
Produce at least three candidate rewrites when evidence permits:
A counterfactual schedule must identify all changed dependencies:
Before: A(model planner) -> B(tool retrieve) -> C(model summarize) -> D(tool DB read) -> E(model final)
After: A'(small router + query plan) ─┬─ B(retrieve compressed)
                                      ├─ D(DB read)
                                      └─ safety/policy precheck
                                      -> E'(final with structured context)
Legality: B and D are read-only, independent, replayable, and their results are both consumed only by E'.
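The before/after schedule above can be compared numerically once durations are known. The following sketch computes the earliest finish time of each schedule from its dependency edges. All durations are made up for illustration, so the resulting difference is an inferred figure, not a measured saving; `makespan` is a hypothetical helper name.

```python
def makespan(durations: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Earliest finish time when each span starts as soon as its dependencies end."""
    finish: dict[str, float] = {}

    def end(span: str) -> float:
        if span not in finish:
            start = max((end(d) for d in deps.get(span, [])), default=0.0)
            finish[span] = start + durations[span]
        return finish[span]

    return max(end(s) for s in durations)

# Illustrative durations in ms (assumed, not taken from any real trace).
before_durations = {"A": 800, "B": 600, "C": 900, "D": 300, "E": 1200}
before_deps = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}

after_durations = {"A'": 200, "B": 600, "D": 300, "precheck": 150, "E'": 1200}
after_deps = {
    "B": ["A'"], "D": ["A'"], "precheck": ["A'"],
    "E'": ["B", "D", "precheck"],  # E' waits for all three parallel branches
}

before_ms = makespan(before_durations, before_deps)
after_ms = makespan(after_durations, after_deps)
print(f"before={before_ms} ms, after={after_ms} ms, inferred saving={before_ms - after_ms} ms")
```

The computation says nothing about legality. It only quantifies a schedule whose changed dependencies have already been justified by the treaty.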
Use these issue families. Each finding must cite evidence from the trace or mark itself as inferred.
- ACCIDENTAL_SERIALIZATION: independent model/tool spans are serialized by orchestration rather than data/policy dependency.
- ROUND_TRIP_AMPLIFICATION: the system turns one decision into repeated model → app → model loops.
- HANDOFF_LATENCY: subagents exchange summaries rather than capabilities/results, adding turns without new evidence.
- STATE_REHYDRATION_TAX: every turn reloads history, state, tools, or schemas instead of using stable conversation state or compact capsules.
- POLLING_SLEEP_GAP: fixed sleeps or polling delays are on the critical path.
- LATE_USER_VALUE: no useful UI output appears until after invisible internal work.
- EXCESSIVE_OUTPUT_TOKENS: long generation dominates wall-clock or user-perceived latency.
- UNNECESSARY_REASONING_MODEL: high-reasoning/planner model used for deterministic, simple, or well-defined work.
- MISSING_MODEL_ROUTING: no fast path for easy requests.
- PLANNER_EXECUTOR_OVERKILL: planner call plus executor call where a single structured call or deterministic router would suffice.
- REPAIR_LOOP: free-form output causes parse/validation retries; structured outputs or tighter schemas would remove retries.
- DUPLICATE_TOOL_CALL: same or equivalent tool/args called repeatedly without new information.
- SERIAL_READ_ONLY_TOOLS: read-only tools run one after another despite no dependency.
- TOOL_RESULT_BLOAT: raw tool/retrieval output is sent to the model instead of a bounded answer slice.
- OVERBROAD_TOOL_SURFACE: too many tools or large schemas loaded into every call; use tool search, narrower schemas, or a dispatcher.
- MUTATION_WITHOUT_TREATY: mutating tool lacks explicit authority, idempotency, branch, replay, and rollback semantics.
- CACHE_PREFIX_VOLATILITY: dynamic content appears before static instructions/tools/examples/schemas.
- LOW_CACHED_TOKEN_RATIO: long prompts have low cached_tokens / input_tokens.
- CONTEXT_BLOAT: repeated history, HTML/noisy retrieval, giant screenshots/files, or unconsumed context inflates prefill.
- TOOL_SCHEMA_TOKEN_TAX: tool schemas consume a material fraction of input tokens but only a few are relevant.
- GUARDRAIL_IN_WRONG_PLACE: required safety check is late, duplicated, or blocks work that could safely proceed.
- AUTHORITY_TOO_BROAD: the agent holds more tool authority than the current request needs, forcing slow review or riskier policy checks.
- SPECULATION_WITHOUT_CANCEL: speculative work is started but cannot be cancelled, bounded, or discarded safely.
- REPLAY_POLICY_MISSING: cache/replay could reduce latency, but freshness and branch semantics are undefined.

When the stack uses OpenAI models or SDKs, inspect:
- cached_tokens, select prompt_cache_key granularity, avoid timestamps/randomized JSON/tool-order changes in the prefix.
- previous_response_id.

When the system resembles Ability’s defunctionalized Program/Session architecture, lean into it:
- TreatyResolver-like logic as the place to choose direct handling, adapted handling, replay, cache, or rejection before a model or provider call.

The strongest rewrite often converts an unstructured agent loop into a typed protocol with explicit provider offers:
Natural-language agent loop -> typed effect surface -> treaty resolver -> provider harness -> journaled replay -> model only at entropy-bearing decisions.
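The resolver step in this pipeline could look roughly like the sketch below, which picks among direct handling, adapted handling, replay, cache, or rejection before any model or provider call is made. Everything here is hypothetical: `EffectRequest`, `resolve`, the journal and cache dictionaries, and the adapter map are illustrative stand-ins, not Ability's or this skill's actual API. The check order (authority first, then replay, cache, direct, adapted) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EffectRequest:
    operation: str
    args_hash: str
    freshness: str  # e.g. fresh_required | replay_ok | cache_ok
    authority: str  # e.g. none | read | scoped_write

def resolve(req: EffectRequest, *, granted: set[str], journal: dict, cache: dict,
            providers: set[str], adapters: dict[str, str]) -> tuple[str, Optional[str]]:
    """Return (route, payload) where route is reject/replay/cache/direct/adapted."""
    key = (req.operation, req.args_hash)
    if req.authority not in granted:
        return "reject", None                 # authority is checked before anything else
    if req.freshness == "replay_ok" and key in journal:
        return "replay", journal[key]         # journaled result, no provider call
    if req.freshness == "cache_ok" and key in cache:
        return "cache", cache[key]
    if req.operation in providers:
        return "direct", req.operation
    adapted = adapters.get(req.operation)
    if adapted in providers:
        return "adapted", adapted             # legal morphism to another provider
    return "reject", None

route = resolve(
    EffectRequest("wiki.search", "h3", "fresh_required", "read"),
    granted={"none", "read"},
    journal={},
    cache={},
    providers={"docs.search", "kb.lookup"},
    adapters={"wiki.search": "kb.lookup"},
)
print(route)
```

Because the resolver is plain deterministic code, the model is consulted only when no route resolves the request, which matches the goal of reserving model calls for entropy-bearing decisions.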
Score each finding:
- critical_path_impact: ttfu, final, irreversible, throughput, cost_only, unknown
- severity: critical, high, medium, low
- expected_savings: exact measured ms, inferred range, or qualitative low|medium|high
- confidence: high, medium, low
- implementation_effort: XS, S, M, L, XL
- quality_risk: low, medium, high
- safety_risk: low, medium, high
- proof_needed: dependency, authority, freshness, branch, eval, or instrumentation proof required
- rollback: flag, config, route fallback, or kill switch

Prioritize by:
critical-path impact × expected savings × confidence ÷ (implementation effort × quality/safety risk)
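The prioritization formula can be turned into a sortable number once each categorical field is mapped to a weight. The weights below are arbitrary assumptions chosen only to make the sketch runnable; the skill does not specify them. Combining quality and safety risk by taking the larger of the two is likewise an assumed interpretation.

```python
# Assumed weights; tune or replace them for a real audit.
IMPACT = {"ttfu": 1.0, "final": 1.0, "irreversible": 0.8,
          "throughput": 0.5, "cost_only": 0.25, "unknown": 0.1}
SAVINGS = {"low": 1.0, "medium": 2.0, "high": 3.0}
CONFIDENCE = {"low": 0.3, "medium": 0.6, "high": 1.0}
EFFORT = {"XS": 1.0, "S": 2.0, "M": 3.0, "L": 5.0, "XL": 8.0}
RISK = {"low": 1.0, "medium": 2.0, "high": 3.0}

def priority(finding: dict) -> float:
    """impact x savings x confidence / (effort x risk), per the formula above."""
    risk = max(RISK[finding["quality_risk"]], RISK[finding["safety_risk"]])
    numerator = (IMPACT[finding["critical_path_impact"]]
                 * SAVINGS[finding["expected_savings"]]
                 * CONFIDENCE[finding["confidence"]])
    return numerator / (EFFORT[finding["implementation_effort"]] * risk)

findings = [
    {"id": "SERIAL_READ_ONLY_TOOLS", "critical_path_impact": "ttfu",
     "expected_savings": "high", "confidence": "high",
     "implementation_effort": "S", "quality_risk": "low", "safety_risk": "low"},
    {"id": "TOOL_SCHEMA_TOKEN_TAX", "critical_path_impact": "cost_only",
     "expected_savings": "low", "confidence": "low",
     "implementation_effort": "XL", "quality_risk": "high", "safety_risk": "low"},
]
ranked = sorted(findings, key=priority, reverse=True)
print([f["id"] for f in ranked])
```

The score is only meaningful for ordering findings within one audit; the absolute value carries no unit.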
Every bold rewrite needs a validation plan:
Be blunt and specific. The best report feels like a senior systems reviewer marking up a trace, not a generic LLM prompt critique.
Use language such as:
Avoid vague recommendations such as “optimize prompts,” “use caching,” or “parallelize” without the treaty proof that makes the recommendation legal.
- scripts/tracepact.py: normalize JSON/JSONL traces, compute critical-path approximations, infer effect treaties, detect latency pathologies, and emit Markdown/JSON/Graphviz DOT.
- scripts/openai_instrumentation_snippets.py: print Python/TypeScript snippets for trace fields this skill expects.
- schemas/normalized_trace.schema.json: event schema for normalized spans.
- schemas/latency_treaty.schema.json: schema for treaty IR.
- schemas/latency_report.schema.json: schema for final audit reports.
- playbooks/: deeper review methods and patch patterns.
- examples/: sample trace, treaty, and report.

Do not recommend removing safety checks. Instead, make them explicit effects and optimize their placement, batching, streaming, or determinism.
Do not recommend speculative execution of externally mutating tools unless idempotency, authority, branch policy, cancellation, and rollback semantics are explicit.
Do not claim exact latency savings without measured timings. Use inferred ranges and state assumptions.
Do not leak private trace contents into third-party tools. Redact secrets, PII, prompts, tool outputs, and customer data before sharing traces outside the trusted environment.
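A redaction pass before sharing might look like the sketch below. It replaces assumed sensitive fields with a short SHA-256 digest, so duplicate payloads stay detectable, and masks token-like strings elsewhere. The field names in `SENSITIVE_KEYS` and the secret pattern are assumptions to adapt to the actual trace schema. The pattern does not catch PII in free text, and digests of short or guessable values can be reversed by brute force.

```python
import hashlib
import re

# Assumed names of fields that may hold prompts, tool payloads, or customer data.
SENSITIVE_KEYS = {"prompt", "input", "output", "tool_args", "tool_result"}
# Assumed shapes of secrets: "sk-" style keys and bearer tokens.
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9_-]{8,}|Bearer\s+\S+)")

def redact_span(span: dict) -> dict:
    """Return a copy of the span that is safer to share outside the trusted environment."""
    clean = {}
    for key, value in span.items():
        if key in SENSITIVE_KEYS:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
            clean[key] = f"<redacted sha256:{digest}>"
        elif isinstance(value, str):
            clean[key] = SECRET_PATTERN.sub("<secret>", value)
        else:
            clean[key] = value  # timings and token counts pass through unchanged
    return clean

span = {
    "span_id": "s1",
    "duration_ms": 420,
    "prompt": "customer email: a@b.c",
    "name": "call with sk-abcdefgh12345678",
}
print(redact_span(span))
```

Timings, token counts, and hashes survive redaction, which is enough for critical-path and cache analysis without exposing the underlying content.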