skills/ai-agents-architect/SKILL.md
Use when deciding WHETHER to build an AI agent (vs pipeline/chain), choosing an agent architecture pattern (ReAct, Plan-Execute, routing, multi-agent), designing tool schemas for agents, or debugging agent failures (loops, hallucinated tool calls, degraded tool selection). Use when the question is about agent DESIGN, not implementation. NEVER for implementing specific agent frameworks (use agent-development, agents-crewai). NEVER for agent memory design (use agent-memory-systems). NEVER for agent evaluation (use agent-evaluation).
Install this skill globally with one command: `npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit ai-agents-architect`. Works with Claude Code, Cursor, and Windsurf.
Think like an architect who has shipped agents to production and learned that most agent failures are architecture failures — the wrong pattern for the problem, too many tools, no escape hatches. The hardest decision is usually "should this be an agent at all?"
Every agent adds cost you must justify:
| Tax | What It Costs | Typical Impact |
|-----|--------------|----------------|
| Latency | Each reasoning step = 1-5s LLM call | 5-step task = 5-25s minimum |
| Token cost | Reasoning + tool descriptions + history per step | 3-10x vs single LLM call |
| Unpredictability | Non-deterministic paths through tools | Same input, different results |
| Debuggability | Multi-step traces hard to reproduce | 10x debugging time |
| Failure surface | Each step can fail, hallucinate, or loop | Compound failure rates |
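The latency and failure-surface rows both grow with step count. A small sketch of that arithmetic, assuming steps run sequentially and fail independently; the 95% per-step success rate is an illustrative assumption, not a figure from the table:

```python
def latency_range(steps: int, min_s: float = 1.0, max_s: float = 5.0) -> tuple[float, float]:
    """Lower and upper bound on total LLM latency for sequential steps."""
    return steps * min_s, steps * max_s


def end_to_end_success(steps: int, per_step_success: float) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return per_step_success ** steps


print(latency_range(5))                        # (5.0, 25.0)
print(round(end_to_end_success(5, 0.95), 3))   # 0.774
```

Even with steps that each look reliable in isolation, a five-step run under these assumptions completes cleanly only about three times in four.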
Should this be an agent?
│
├─ Is the task STATIC (same steps every time)?
│ └─ YES → Use a deterministic pipeline. No agent needed.
│ (ETL, format conversion, template filling)
│
├─ Does it need CONDITIONAL logic but predictable branches?
│ └─ YES → Use a chain/router. Still no agent.
│ (Classify → route to handler, if/else workflows)
│
├─ Does it need to DISCOVER what to do based on results?
│ └─ YES → This is an agent use case.
│ (Research tasks, debugging, multi-step problem solving)
│
└─ Does it need to ADAPT its plan mid-execution?
  └─ YES → This is a strong agent use case.
  (Complex reasoning, open-ended exploration)
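The tree above can be read as a first-match rule list. A minimal sketch, assuming hypothetical boolean flags for the four questions; the fallback for a task that matches none of them is not specified by the tree and is an assumption here:

```python
from dataclasses import dataclass


@dataclass
class TaskTraits:
    # Hypothetical flags mirroring the four questions in the tree.
    static_steps: bool = False
    predictable_branches: bool = False
    discovers_next_step: bool = False
    adapts_plan: bool = False


def recommend(traits: TaskTraits) -> str:
    """Walk the questions top to bottom and return the first match."""
    if traits.static_steps:
        return "deterministic pipeline"
    if traits.predictable_branches:
        return "chain/router"
    if traits.discovers_next_step:
        return "agent"
    if traits.adapts_plan:
        return "agent (strong case)"
    # Assumption: nothing matched, so fall back to the cheapest option.
    return "single LLM call"


print(recommend(TaskTraits(predictable_branches=True)))
```

Checking the cheap options first keeps the agent as the last resort, which is the point of the tree.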
The brutal truth: 70% of "agent" projects in production are pipelines with an LLM call in the middle. They don't need ReAct loops, tool registries, or memory systems. They need a well-written prompt and a json.loads().
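A minimal sketch of that "prompt plus `json.loads()`" shape. The invoice-extraction task, the field names, and the `call_llm` callable (a stand-in for whatever model client is in use) are all assumptions made for illustration:

```python
import json
from typing import Callable


def extract_invoice(text: str, call_llm: Callable[[str], str]) -> dict:
    """Fixed pipeline: build prompt, call the model once, parse and validate."""
    prompt = (
        "Extract the vendor and total from this invoice. "
        'Reply with JSON only, e.g. {"vendor": "...", "total": 0.0}.\n\n' + text
    )
    raw = call_llm(prompt)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {raw!r}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("vendor", "total") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Stand-in for a real model client so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return '{"vendor": "Acme", "total": 41.5}'


print(extract_invoice("Acme Corp, total due 41.50", fake_llm))
```

There is no loop, no tool registry, and no memory: the control flow is ordinary code, and the model fills in exactly one step.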
| Pattern | Best For | Avoid When | Typical Steps |
|---------|----------|------------|---------------|
| ReAct | Exploratory tasks, tool-heavy work | Deterministic sequences, >10 steps | 3-8 |
| Plan-Execute | Complex multi-step tasks with clear subgoals | Simple tasks, rapidly changing context | 5-20 |
| Routing | Classification → specialized handler | Tasks needing iteration or discovery | 1-2 |
| Multi-Agent | Distinct roles with different tool sets | When single agent with role-switching works | Varies |
| OODA | Real-time reactive systems, monitoring | Batch processing, one-shot tasks | Continuous |
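ReAct (Reason-Act-Observe) is the default pattern, but it has specific failure modes: it breaks on long tasks (more than about 10 steps), precise deterministic sequences, and high-branching decisions.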
Plan-Execute is better when:
- Task has >5 clear sub-steps
- Steps have dependencies (step 3 needs step 1's output)
- You want human review of the plan before execution
- Failure at step N shouldn't require restarting from step 1
ReAct is better when:
- You don't know how many steps are needed
- Each step's action depends on what you discover
- The task is exploratory (research, debugging)
- Speed matters more than predictability
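The Plan-Execute point about not restarting from step 1 can be sketched with a results dictionary that doubles as a checkpoint. This is a toy under stated assumptions: the step functions and the `execute_plan` helper are hypothetical, and a real planner would generate the plan with an LLM rather than hard-code it:

```python
from typing import Callable

Step = Callable[[dict], object]  # a step reads earlier results, returns its own


def execute_plan(plan: list[tuple[str, Step]], results: dict) -> dict:
    """Run steps in order, skipping any whose result is already stored.

    `results` is the checkpoint: after a failure it still holds every
    completed step, so a retry resumes at the step that failed.
    """
    for name, step in plan:
        if name in results:
            continue  # completed in an earlier run
        results[name] = step(results)
    return results


calls = []

def fetch(results):
    calls.append("fetch")
    return [3, 1, 2]

def flaky_sort(results):
    calls.append("sort")
    if calls.count("sort") == 1:
        raise RuntimeError("transient failure")
    return sorted(results["fetch"])  # depends on the first step's output

plan = [("fetch", fetch), ("sort", flaky_sort)]
checkpoint: dict = {}
try:
    execute_plan(plan, checkpoint)
except RuntimeError:
    pass  # first run fails at "sort"; "fetch" is already checkpointed
execute_plan(plan, checkpoint)  # resumes at "sort" without re-fetching
print(checkpoint["sort"], calls)  # [1, 2, 3] ['fetch', 'sort', 'sort']
```

Because the plan is an explicit list, it can also be shown to a human for review before `execute_plan` is ever called.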
Tool descriptions matter more than the system prompt. The agent reads tool descriptions at every step to decide which tool to call. Bad descriptions = wrong tool selection = agent failure.
BAD tool description:
"search" - "Searches for things"
GOOD tool description:
"search_knowledge_base" - "Search internal knowledge base for
product documentation and support articles. Returns top 5 matching
documents with relevance scores. Use for: customer questions about
product features, troubleshooting steps, pricing info. Do NOT use
for: general web search, competitor info, real-time data."
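As a sketch, the GOOD description could be packaged as a tool definition and checked with a tiny linter. The name / description / JSON Schema `parameters` layout is a common shape rather than any specific SDK's API, and the `query` parameter, the `lint_tool` helper, and its thresholds are assumptions made for illustration:

```python
# Hypothetical tool definition; field layout is generic, not a specific SDK.
search_knowledge_base = {
    "name": "search_knowledge_base",
    "description": (
        "Search internal knowledge base for product documentation and "
        "support articles. Returns top 5 matching documents with relevance "
        "scores. Use for: customer questions about product features, "
        "troubleshooting steps, pricing info. Do NOT use for: general web "
        "search, competitor info, real-time data."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural-language search query."},
        },
        "required": ["query"],
    },
}


def lint_tool(tool: dict) -> list[str]:
    """Flag the weaknesses the BAD example shows. Thresholds are arbitrary."""
    problems = []
    if "_" not in tool["name"]:
        problems.append("name is a single generic word")
    desc = tool["description"]
    if len(desc.split()) < 15:
        problems.append("description is too short to guide selection")
    if "Do NOT use" not in desc:
        problems.append("description does not say when NOT to use the tool")
    return problems


print(lint_tool(search_knowledge_base))
print(lint_tool({"name": "search", "description": "Searches for things"}))
```

The first call reports no problems; the second flags all three checks, matching the BAD example.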
The rules:
- Name tools specifically: `search_knowledge_base`, not `kb` or `search`

| Tools Available | Selection Accuracy | Impact |
|----------------|-------------------|--------|
| 1-5 | ~95% correct | Reliable |
| 6-15 | ~85% correct | Acceptable |
| 16-30 | ~65% correct | Frequent wrong tool |
| 30+ | ~40% correct | Agent is guessing |
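Solutions when you have many tools: filter the tool list per step, keep each agent to 5-10 focused tools, or split conflicting tool sets across multiple agents.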
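One option this document names elsewhere is filtering tools per step, so the model only ever sees a short list. A minimal sketch, assuming each tool carries hypothetical `tags` and the current step is labelled with the tags it needs; the registry contents are made up:

```python
def filter_tools(tools: list[dict], step_tags: set[str], limit: int = 5) -> list[dict]:
    """Keep only tools whose tags overlap the current step, capped at `limit`."""
    matching = [t for t in tools if step_tags & set(t["tags"])]
    # Rank by overlap so the most relevant tools survive the cap.
    matching.sort(key=lambda t: len(step_tags & set(t["tags"])), reverse=True)
    return matching[:limit]


# Hypothetical registry; names and tags are invented for illustration.
registry = [
    {"name": "search_knowledge_base", "tags": ["support", "docs"]},
    {"name": "create_refund", "tags": ["billing"]},
    {"name": "lookup_invoice", "tags": ["billing", "support"]},
    {"name": "restart_server", "tags": ["ops"]},
]

visible = filter_tools(registry, {"billing"})
print([t["name"] for t in visible])  # ['create_refund', 'lookup_invoice']
```

How the step gets its tags (a cheap classifier, a routing rule, the plan itself) is left open here.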
Do you need multiple agents?
│
├─ Do different parts need DIFFERENT tool sets?
│ ├─ YES and tools would conflict → Multi-agent
│ └─ YES but tools are compatible → Single agent, more tools (if <15)
│
├─ Do different parts need DIFFERENT system prompts?
│ ├─ YES, fundamentally different personas → Multi-agent
│ └─ YES, minor tone shifts → Single agent with role-switching
│
├─ Do parts need to run in PARALLEL?
│ ├─ YES → Multi-agent (parallel execution)
│ └─ NO → Likely single agent
│
└─ Is the task DECOMPOSABLE into independent subtasks?
  ├─ YES, clean boundaries → Multi-agent with orchestrator
  └─ NO, tightly coupled → Single agent
| Pattern | How It Works | Failure Mode |
|---------|-------------|-------------|
| Orchestrator | Central agent delegates to specialists | Orchestrator becomes bottleneck; misunderstands specialist output |
| Pipeline | Agent A's output feeds Agent B | No feedback loop; error in A propagates silently |
| Debate | Multiple agents critique each other | Converges to consensus mush; tokens explode |
| Hierarchical | Manager agents supervise worker agents | Over-engineering; each layer adds latency + cost |
Default to orchestrator pattern. It's the simplest to debug, easiest to extend, and has the clearest failure modes. Only use other patterns when orchestrator demonstrably fails.
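A minimal sketch of the orchestrator shape, with plain functions standing in for LLM-backed agents. The router, the specialist names, and the result dictionary layout are all hypothetical:

```python
from typing import Callable

Specialist = Callable[[str], str]


def orchestrate(task: str, route: Callable[[str], str],
                specialists: dict[str, Specialist]) -> dict:
    """Central coordinator: pick a specialist, delegate, check what came back."""
    name = route(task)
    if name not in specialists:
        return {"ok": False, "error": f"no specialist named {name!r}"}
    output = specialists[name](task)
    # Misreading specialist output is a listed failure mode, so the sketch
    # validates the reply instead of passing it through blindly.
    if not output.strip():
        return {"ok": False, "specialist": name, "error": "empty reply"}
    return {"ok": True, "specialist": name, "output": output}


# Stand-ins for LLM-backed components (hypothetical names).
def keyword_router(task: str) -> str:
    return "billing" if "invoice" in task.lower() else "research"

specialists = {
    "billing": lambda task: f"[billing] handled: {task}",
    "research": lambda task: f"[research] handled: {task}",
}

print(orchestrate("Find invoice 1042", keyword_router, specialists))
```

Every delegation passes through one function, which is what makes this pattern comparatively easy to trace and debug.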
| Failure | Symptom | Root Cause | Fix |
|---------|---------|------------|-----|
| Infinite loop | Agent repeats same action | No loop detection, bad stop condition | Max iterations + action deduplication |
| Hallucinated tool call | Agent fabricates tool output without calling it | Tool description unclear, or model confused | Verify tool was actually called in traces |
| Tool selection drift | Agent picks wrong tool increasingly | Context window filling with irrelevant history | Summarize history, filter tools per step |
| Plan abandonment | Agent ignores its own plan mid-execution | New observation contradicts plan, no replan logic | Explicit replan trigger when observations diverge |
| Graceless failure | Agent errors out with no useful output | No fallback, no partial result handling | Return partial results + clear error context |
| Silent wrong answer | Agent confidently returns incorrect result | No verification step, no self-check | Add verification tool, structured self-critique |
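The "max iterations + action deduplication" fix for infinite loops can be sketched as a guard around the agent loop. The `next_action` and `execute` callables, the action dictionary shape, and the `finish` tool name are assumptions for illustration:

```python
import json
from typing import Callable


def run_with_loop_guard(next_action: Callable[[list], dict],
                        execute: Callable[[dict], str],
                        max_iterations: int = 8) -> dict:
    """Agent loop that stops on a repeated action or after max_iterations."""
    history: list[tuple[dict, str]] = []
    seen: set[str] = set()
    for _ in range(max_iterations):
        action = next_action(history)
        if action["tool"] == "finish":
            return {"status": "done", "answer": action["args"], "steps": len(history)}
        # Canonical key so identical tool + args are recognised as a repeat.
        key = json.dumps(action, sort_keys=True)
        if key in seen:
            return {"status": "loop_detected", "repeated": action, "steps": len(history)}
        seen.add(key)
        history.append((action, execute(action)))
    return {"status": "max_iterations", "steps": len(history)}


# Hypothetical policy that gets stuck issuing the same search forever.
def stuck_policy(history):
    return {"tool": "search", "args": {"query": "agent loops"}}

result = run_with_loop_guard(stuck_policy, lambda action: "no results")
print(result["status"], result["steps"])  # loop_detected 1
```

Exact-match deduplication only catches literal repeats; near-duplicate actions would still run until `max_iterations` stops them.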
Every agent MUST have a way to gracefully give up:
After N failed attempts at the same sub-task:
1. Return what you HAVE accomplished (partial results)
2. Explain what you COULDN'T do and why
3. Suggest what a human should do next
4. Do NOT retry the same failing action
Without escape hatches, agents loop until they hit token limits, waste money, and return nothing useful.
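The four escape-hatch rules map onto a result type that always has room for partial output. A minimal sketch, assuming each sub-task comes with a list of alternative strategies; `AgentResult`, `run_subtasks`, and the example sub-tasks are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentResult:
    completed: dict = field(default_factory=dict)   # 1. partial results
    failed: dict = field(default_factory=dict)      # 2. what failed and why
    next_steps: list = field(default_factory=list)  # 3. suggestions for a human


def run_subtasks(subtasks: dict[str, list[Callable[[], object]]],
                 max_attempts: int = 3) -> AgentResult:
    """Try each sub-task's strategies in order, never the same one twice (rule 4).

    After max_attempts the sub-task is recorded as failed instead of raising.
    """
    result = AgentResult()
    for name, strategies in subtasks.items():
        errors = []
        for strategy in strategies[:max_attempts]:
            try:
                result.completed[name] = strategy()
                break
            except Exception as exc:  # broad on purpose: last line of defence
                errors.append(str(exc))
        else:
            reason = errors[-1] if errors else "no strategy available"
            result.failed[name] = errors
            result.next_steps.append(f"Investigate {name!r} manually: {reason}")
    return result


def broken_api():
    raise TimeoutError("CRM API timed out")

def broken_cache():
    raise KeyError("customer not in cache")

report = run_subtasks({
    "fetch_orders": [lambda: ["order-1", "order-2"]],
    "fetch_customer": [broken_api, broken_cache],
})
print(report.completed)
print(report.failed)
print(report.next_steps)
```

The caller still receives the orders that were fetched, along with the reasons the customer lookup failed and a pointer for a human.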
| Rationalization | When It Appears | Why It's Wrong |
|----------------|-----------------|----------------|
| "Let's build an agent for this" | Starting any LLM task | Most tasks are pipelines. Ask "does this need to discover what to do?" first. |
| "More tools = more capable" | Designing agent tool set | More tools = worse selection accuracy. 5-10 focused tools beat 30 unfocused ones. |
| "We need multiple agents" | Complex task decomposition | Single agent with role-switching handles most cases. Multi-agent adds communication overhead. |
| "ReAct handles everything" | Choosing architecture | ReAct breaks on long tasks, precise sequences, and high-branching decisions. Match pattern to task. |
| "The agent will figure it out" | Skipping tool description quality | Tool descriptions are the agent's primary decision input. Vague descriptions = random tool selection. |