Agent Architecture Audit

A diagnostic workflow for agent systems that hide failures behind wrapper layers, stale memory, retry loops, or transport/rendering mutations.

When to Activate

MANDATORY for:

Releasing any agent or LLM-powered application to production
Shipping features with tool calling, memory, or multi-step workflows
Agent behavior degrades after adding wrapper layers
User reports "the agent is getting worse" or "tools are flaky"
Same model works in playground but breaks inside your wrapper
Debugging agent behavior for more than 15 minutes without finding root cause

Especially critical when:

You've added new prompt layers, tool definitions, or memory systems
Different agents in your system behave inconsistently
The model was fine yesterday but is hallucinating today
You suspect hidden repair/retry loops silently mutating responses

Do not use for:

General code debugging — use agent-introspection-debugging
Code review — use language-specific reviewer agents
Security scanning — use security-review or security-review/scan
Agent performance benchmarking — use agent-eval
Writing new features — use the appropriate workflow skill

The 12-Layer Stack

Every agent system has these layers. Any of them can corrupt the answer:

| # | Layer | What Goes Wrong | |---|-------|----------------| | 1 | System prompt | Conflicting instructions, instruction bloat | | 2 | Session history | Stale context injection from previous turns | | 3 | Long-term memory | Pollution across sessions, old topics in new conversations | | 4 | Distillation | Compressed artifacts re-entering as pseudo-facts | | 5 | Active recall | Redundant re-summary layers wasting context | | 6 | Tool selection | Wrong tool routing, model skips required tools | | 7 | Tool execution | Hallucinated execution — claims to call but doesn't | | 8 | Tool interpretation | Misread or ignored tool output | | 9 | Answer shaping | Format corruption in final response | | 10 | Platform rendering | Transport-layer mutation (UI, API, CLI mutates valid answers) | | 11 | Hidden repair loops | Silent fallback/retry agents running second LLM pass | | 12 | Persistence | Expired state or cached artifacts reused as live evidence |

Common Failure Patterns

1. Wrapper Regression

The base model produces correct answers, but the wrapper layers make it worse.

Symptoms:

Model works fine in playground or direct API call, breaks in your agent
Added a new prompt layer, existing behavior degraded
Agent sounds confident but is confidently wrong
"It was working before the last update"

2. Memory Contamination

Old topics leak into new conversations through history, memory retrieval, or distillation.

Symptoms:

Agent brings up unrelated past topics
User corrections don't stick (old memory overwrites new)
Same-session artifacts re-enter as pseudo-facts
Memory grows without bound, degrading response quality over time

3. Tool Discipline Failure

Tools are declared in the prompt but not enforced in code. The model skips them or hallucinates execution.

Symptoms:

"Must use tool X" in prompt, but model answers without calling it
Tool results look correct but were never actually executed
Different tools fight over the same responsibility
Model uses tool when it shouldn't, or skips it when it must

4. Rendering/Transport Corruption

The agent's internal answer is correct, but the platform layer mutates it during delivery.

Symptoms:

Logs show correct answer, user sees broken output
Markdown rendering, JSON parsing, or streaming fragments corrupt valid responses
Hidden fallback agent quietly replaces the answer before delivery
Output differs between terminal and UI

5. Hidden Agent Layers

Silent repair, retry, summarization, or recall agents run without explicit contracts.

Symptoms:

Output changes between internal generation and user delivery
"Auto-fix" loops run a second LLM pass the user doesn't know about
Multiple agents modify the same output without coordination
Answers get "smoothed" or "corrected" by invisible layers

Audit Workflow

Phase 1: Scope

Define what you're auditing:

Target system — what agent application?
Entrypoints — how do users interact with it?
Model stack — which LLM(s) and providers?
Symptoms — what does the user report?
Time window — when did it start?
Layers to audit — which of the 12 layers apply?

Phase 2: Evidence Collection

Gather evidence from the codebase:

Source code — agent loop, tool router, memory admission, prompt assembly
Logs — historical session traces, tool call records
Config — prompt templates, tool schemas, provider settings
Memory files — SOPs, knowledge bases, session archives

Use rg to search for anti-patterns:

# Tool requirements expressed only in prompt text (not code)
rg "must.*tool|必须.*工具|required.*call" --type md

# Tool execution without validation
rg "tool_call|toolCall|tool_use" --type py --type ts

# Hidden LLM calls outside main agent loop
rg "completion|chat\.create|messages\.create|llm\.invoke"

# Memory admission without user-correction priority
rg "memory.*admit|long.*term.*update|persist.*memory" --type py --type ts

# Fallback loops that run additional LLM calls
rg "fallback|retry.*llm|repair.*prompt|re-?prompt" --type py --type ts

# Silent output mutation
rg "mutate|rewrite.*response|transform.*output|shap" --type py --type ts

Phase 3: Failure Mapping

For each finding, document:

Symptom — what the user sees
Mechanism — how the wrapper causes it
Source layer — which of the 12 layers
Root cause — the deepest cause
Evidence — file:line or log:row reference
Confidence — 0.0 to 1.0

Phase 4: Fix Strategy

Default fix order (code-first, not prompt-first):

Code-gate tool requirements — enforce in code, not just prompt text
Remove or narrow hidden repair agents — make fallback explicit with contracts
Reduce context duplication — same info through prompt + history + memory + distillation
Tighten memory admission — user corrections > agent assertions
Tighten distillation triggers — don't compress what shouldn't be compressed
Reduce rendering mutation — pass-through, don't transform
Convert to typed JSON envelopes — structured internal flow, not freeform prose

Severity Model

| Level | Meaning | Action | |-------|---------|--------| | critical | Agent can confidently produce wrong operational behavior | Fix before next release | | high | Agent frequently degrades correctness or stability | Fix this sprint | | medium | Correctness usually survives but output is fragile or wasteful | Plan for next cycle | | low | Mostly cosmetic or maintainability issues | Backlog |

Output Format

Present findings to the user in this order:

Severity-ranked findings (most critical first)
Architecture diagnosis (which layer corrupted what, and why)
Ordered fix plan (code-first, not prompt-first)

Do not lead with compliments or summaries. If the system is broken, say so directly.

Quick Diagnostic Questions

When auditing an agent system, answer these:

| # | Question | If Yes → | |---|----------|----------| | 1 | Can the model skip a required tool and still answer? | Tool not code-gated | | 2 | Does old conversation content appear in new turns? | Memory contamination | | 3 | Is the same info in system prompt AND memory AND history? | Context duplication | | 4 | Does the platform run a second LLM pass before delivery? | Hidden repair loop | | 5 | Does the output differ between internal generation and user delivery? | Rendering corruption | | 6 | Are "must use tool X" rules only in prompt text? | Tool discipline failure | | 7 | Can the agent's own monologue become persistent memory? | Memory poisoning |

Anti-Patterns to Avoid

Avoid blaming the model before falsifying wrapper-layer regressions.
Avoid blaming memory without showing the contamination path.
Do not let a clean current state erase a dirty historical incident.
Do not treat markdown prose as a trustworthy internal protocol.
Do not accept "must use tool" in prompt text when code never enforces it.
Keep findings direct, evidence-backed, and severity-ranked.

Report Schema

Audits should produce structured reports following this shape:

{
  "schema_version": "ecc.agent-architecture-audit.report.v1",
  "executive_verdict": {
    "overall_health": "high_risk",
    "primary_failure_mode": "string",
    "most_urgent_fix": "string"
  },
  "scope": {
    "target_name": "string",
    "model_stack": ["string"],
    "layers_to_audit": ["string"]
  },
  "findings": [
    {
      "severity": "critical|high|medium|low",
      "title": "string",
      "mechanism": "string",
      "source_layer": "string",
      "root_cause": "string",
      "evidence_refs": ["file:line"],
      "confidence": 0.0,
      "recommended_fix": "string"
    }
  ],
  "ordered_fix_plan": [
    { "order": 1, "goal": "string", "why_now": "string", "expected_effect": "string" }
  ]
}

Related Skills

agent-introspection-debugging — Debug agent runtime failures (loops, timeouts, state errors)
agent-eval — Benchmark agent performance head-to-head
security-review — Security audit for code and configuration
autonomous-agent-harness — Set up autonomous agent operations
agent-harness-construction — Build agent harnesses from scratch

Agent Architecture Audit

A diagnostic workflow for agent systems that hide failures behind wrapper layers, stale memory, retry loops, or transport/rendering mutations.

When to Activate

MANDATORY for:

Releasing any agent or LLM-powered application to production
Shipping features with tool calling, memory, or multi-step workflows
Agent behavior degrades after adding wrapper layers
User reports "the agent is getting worse" or "tools are flaky"
Same model works in playground but breaks inside your wrapper
Debugging agent behavior for more than 15 minutes without finding root cause

Especially critical when:

You've added new prompt layers, tool definitions, or memory systems
Different agents in your system behave inconsistently
The model was fine yesterday but is hallucinating today
You suspect hidden repair/retry loops silently mutating responses

Do not use for:

General code debugging — use agent-introspection-debugging
Code review — use language-specific reviewer agents
Security scanning — use security-review or security-review/scan
Agent performance benchmarking — use agent-eval
Writing new features — use the appropriate workflow skill

The 12-Layer Stack

Every agent system has these layers. Any of them can corrupt the answer:

Common Failure Patterns

1. Wrapper Regression

The base model produces correct answers, but the wrapper layers make it worse.

Symptoms:

Model works fine in playground or direct API call, breaks in your agent
Added a new prompt layer, existing behavior degraded
Agent sounds confident but is confidently wrong
"It was working before the last update"

2. Memory Contamination

Old topics leak into new conversations through history, memory retrieval, or distillation.

Symptoms:

Agent brings up unrelated past topics
User corrections don't stick (old memory overwrites new)
Same-session artifacts re-enter as pseudo-facts
Memory grows without bound, degrading response quality over time

3. Tool Discipline Failure

Tools are declared in the prompt but not enforced in code. The model skips them or hallucinates execution.

Symptoms:

"Must use tool X" in prompt, but model answers without calling it
Tool results look correct but were never actually executed
Different tools fight over the same responsibility
Model uses tool when it shouldn't, or skips it when it must

4. Rendering/Transport Corruption

The agent's internal answer is correct, but the platform layer mutates it during delivery.

Symptoms:

Logs show correct answer, user sees broken output
Markdown rendering, JSON parsing, or streaming fragments corrupt valid responses
Hidden fallback agent quietly replaces the answer before delivery
Output differs between terminal and UI

5. Hidden Agent Layers

Silent repair, retry, summarization, or recall agents run without explicit contracts.

Symptoms:

Output changes between internal generation and user delivery
"Auto-fix" loops run a second LLM pass the user doesn't know about
Multiple agents modify the same output without coordination
Answers get "smoothed" or "corrected" by invisible layers

Audit Workflow

Phase 1: Scope

Define what you're auditing:

Target system — what agent application?
Entrypoints — how do users interact with it?
Model stack — which LLM(s) and providers?
Symptoms — what does the user report?
Time window — when did it start?
Layers to audit — which of the 12 layers apply?

Phase 2: Evidence Collection

Gather evidence from the codebase:

Source code — agent loop, tool router, memory admission, prompt assembly
Logs — historical session traces, tool call records
Config — prompt templates, tool schemas, provider settings
Memory files — SOPs, knowledge bases, session archives

Use rg to search for anti-patterns:

# Tool requirements expressed only in prompt text (not code)
rg "must.*tool|必须.*工具|required.*call" --type md

# Tool execution without validation
rg "tool_call|toolCall|tool_use" --type py --type ts

# Hidden LLM calls outside main agent loop
rg "completion|chat\.create|messages\.create|llm\.invoke"

# Memory admission without user-correction priority
rg "memory.*admit|long.*term.*update|persist.*memory" --type py --type ts

# Fallback loops that run additional LLM calls
rg "fallback|retry.*llm|repair.*prompt|re-?prompt" --type py --type ts

# Silent output mutation
rg "mutate|rewrite.*response|transform.*output|shap" --type py --type ts

Phase 3: Failure Mapping

For each finding, document:

Symptom — what the user sees
Mechanism — how the wrapper causes it
Source layer — which of the 12 layers
Root cause — the deepest cause
Evidence — file:line or log:row reference
Confidence — 0.0 to 1.0

Phase 4: Fix Strategy

Default fix order (code-first, not prompt-first):

Code-gate tool requirements — enforce in code, not just prompt text
Remove or narrow hidden repair agents — make fallback explicit with contracts
Reduce context duplication — same info through prompt + history + memory + distillation
Tighten memory admission — user corrections > agent assertions
Tighten distillation triggers — don't compress what shouldn't be compressed
Reduce rendering mutation — pass-through, don't transform
Convert to typed JSON envelopes — structured internal flow, not freeform prose

Severity Model

Output Format

Present findings to the user in this order:

Severity-ranked findings (most critical first)
Architecture diagnosis (which layer corrupted what, and why)
Ordered fix plan (code-first, not prompt-first)

Do not lead with compliments or summaries. If the system is broken, say so directly.

Quick Diagnostic Questions

When auditing an agent system, answer these:

Anti-Patterns to Avoid

Avoid blaming the model before falsifying wrapper-layer regressions.
Avoid blaming memory without showing the contamination path.
Do not let a clean current state erase a dirty historical incident.
Do not treat markdown prose as a trustworthy internal protocol.
Do not accept "must use tool" in prompt text when code never enforces it.
Keep findings direct, evidence-backed, and severity-ranked.

Report Schema

Audits should produce structured reports following this shape:

{
  "schema_version": "ecc.agent-architecture-audit.report.v1",
  "executive_verdict": {
    "overall_health": "high_risk",
    "primary_failure_mode": "string",
    "most_urgent_fix": "string"
  },
  "scope": {
    "target_name": "string",
    "model_stack": ["string"],
    "layers_to_audit": ["string"]
  },
  "findings": [
    {
      "severity": "critical|high|medium|low",
      "title": "string",
      "mechanism": "string",
      "source_layer": "string",
      "root_cause": "string",
      "evidence_refs": ["file:line"],
      "confidence": 0.0,
      "recommended_fix": "string"
    }
  ],
  "ordered_fix_plan": [
    { "order": 1, "goal": "string", "why_now": "string", "expected_effect": "string" }
  ]
}

Related Skills

agent-introspection-debugging — Debug agent runtime failures (loops, timeouts, state errors)
agent-eval — Benchmark agent performance head-to-head
security-review — Security audit for code and configuration
autonomous-agent-harness — Set up autonomous agent operations
agent-harness-construction — Build agent harnesses from scratch

Adoption

affaan-m/agent-architecture-audit

$ install --global

Security Scan Results

SKILL.md

Agent Architecture Audit

When to Activate

The 12-Layer Stack

Common Failure Patterns

1. Wrapper Regression

2. Memory Contamination

3. Tool Discipline Failure

4. Rendering/Transport Corruption

5. Hidden Agent Layers

Audit Workflow

Phase 1: Scope

Phase 2: Evidence Collection

Phase 3: Failure Mapping

Phase 4: Fix Strategy

Severity Model

Output Format

Quick Diagnostic Questions

Anti-Patterns to Avoid

Report Schema

Related Skills

Related Skills

affaan-m/dynamic-workflow-mode

affaan-m/react-testing

affaan-m/react-performance

affaan-m/react-patterns

affaan-m/agent-architecture-audit

$ install --global

Security Scan Results

SKILL.md

Agent Architecture Audit

When to Activate

The 12-Layer Stack

Common Failure Patterns

1. Wrapper Regression

2. Memory Contamination

3. Tool Discipline Failure

4. Rendering/Transport Corruption

5. Hidden Agent Layers

Audit Workflow

Phase 1: Scope

Phase 2: Evidence Collection

Phase 3: Failure Mapping

Phase 4: Fix Strategy

Severity Model

Output Format

Quick Diagnostic Questions

Anti-Patterns to Avoid

Report Schema

Related Skills

Related Skills

affaan-m/dynamic-workflow-mode

affaan-m/react-testing

affaan-m/react-performance

affaan-m/react-patterns