Design production-grade agentic AI architectures with separated cognition/execution layers, typed tool interfaces, multi-agent topologies, and enterprise hardening. Use when: 'design an agent system', 'build a multi-agent architecture', 'add governance to my AI pipeline', 'harden my LLM agent for production', 'create a tool registry for agents', 'architect agent-to-agent coordination'.
This skill enables Claude to architect goal-directed agentic AI systems by applying the reference architecture from Alenezi (2026). Instead of treating LLM agents as monolithic prompt-response functions, this skill teaches you to decompose agent systems into separated layers — cognitive reasoning, control flow, memory, tool execution, and governance — connected by typed contracts. It covers single-agent loop design, multi-agent topology selection with failure-mode awareness, and a concrete enterprise hardening checklist for production deployment.
The paper's core insight is that production-grade LLM agents must separate cognition (the LLM's reasoning) from control flow (planning, retries, circuit breakers), memory (tiered storage with access control), tool execution (sandboxed, typed, versioned), and governance (policy gates, audit, RBAC). This separation mirrors how web services matured: monolithic CGI scripts gave way to layered architectures with typed APIs, registries, and middleware. The same evolution applies to agents.
The architecture implements a goal-directed loop (perceive → plan → act → reflect) bounded by explicit resource budgets (max steps, token caps, cost limits, time limits). Every side-effecting action passes through a policy enforcement gateway before execution. Tools are not ad-hoc function calls — they are registry entries with typed schemas, version tracking, and sandboxed execution under least privilege. This makes every agent run auditable and reproducible.
For multi-agent systems, the paper provides a topology taxonomy with mapped failure modes. An orchestrator-worker topology risks silent worker failure (mitigated by heartbeats and ACK/NACK). A swarm topology risks herding behavior (mitigated by entropy-preserving incentives). Choosing the right topology is an architectural decision with direct reliability consequences, not a stylistic preference.
Identify the goal structure. Determine whether the task is single-goal (one agent loop suffices) or multi-goal/decomposable (requires multi-agent coordination). Map user intent to explicit goals and constraints using the BDI frame: Beliefs (world state + memory), Desires (goals + policies), Intentions (planned actions).
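The BDI frame can be held in a small typed container. The minimal sketch below uses the hypothetical names `Goal`, `BDIState` and `needs_multi_agent`. Its decomposition test ("more than one goal means multi-agent") is a deliberately crude, illustrative heuristic and is not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Goal:
    description: str
    constraints: list[str] = field(default_factory=list)


@dataclass
class BDIState:
    # Beliefs: current world state plus facts recalled from memory
    beliefs: dict[str, Any] = field(default_factory=dict)
    # Desires: the goals to achieve and the policies that bound them
    desires: list[Goal] = field(default_factory=list)
    policies: list[str] = field(default_factory=list)
    # Intentions: actions the agent has committed to, in order
    intentions: list[str] = field(default_factory=list)

    def needs_multi_agent(self) -> bool:
        # Hypothetical heuristic: several independent goals suggest decomposition
        # across agents; a single goal fits one agent loop.
        return len(self.desires) > 1


state = BDIState(
    beliefs={"ticket_id": 42, "customer_tier": "gold"},
    desires=[Goal("resolve the ticket", ["reply within SLA"])],
    policies=["no PII in outbound messages"],
)
```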
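Design the layered stack. Create separate layers for each agent: cognitive reasoning, control flow, memory, tool execution, and governance. Each layer talks to the others only through typed contracts.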
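One way to enforce the separation is to give every layer an explicit interface, so that the control-flow code depends only on contracts. This is a sketch built on `typing.Protocol`. The layer names and method signatures (`propose_action`, `recall`, `execute`, `authorize`) are illustrative assumptions, not an API defined by the paper.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Cognition(Protocol):
    """Reasoning only: turns context into a proposed action; never executes it."""
    def propose_action(self, context: dict[str, Any]) -> dict[str, Any]: ...


@runtime_checkable
class Memory(Protocol):
    def recall(self, query: str, scope: str) -> list[str]: ...
    def store(self, item: str, scope: str) -> None: ...


@runtime_checkable
class ToolExecutor(Protocol):
    """Runs registered tools in a sandbox; knows nothing about reasoning."""
    def execute(self, tool: str, args: dict[str, Any]) -> dict[str, Any]: ...


@runtime_checkable
class Governance(Protocol):
    """Policy gate consulted before any side-effecting action."""
    def authorize(self, action: dict[str, Any]) -> bool: ...


@dataclass
class AgentStack:
    # Control flow (the loop, retries, budgets) lives in the code that owns
    # this stack; every other layer is reached only through its interface.
    cognition: Cognition
    memory: Memory
    tools: ToolExecutor
    governance: Governance


class StubCognition:
    def propose_action(self, context: dict[str, Any]) -> dict[str, Any]:
        return {"type": "final", "answer": f"done: {context.get('goal', '')}"}


class DenyAllGovernance:
    def authorize(self, action: dict[str, Any]) -> bool:
        return False
```

`runtime_checkable` only checks that the methods exist, not their signatures, so a static type checker is still needed to enforce the contracts fully.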
Define typed tool interfaces. For every tool the agent can invoke, create a schema specifying: input types and required fields, output types, preconditions, idempotency guarantees, version identifier, and required permissions. Treat the tool registry as an API gateway — tools are discoverable, versioned, and access-controlled.
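A minimal sketch of such a registry, assuming simple Python-type schemas instead of a full schema language. `ToolSpec`, `ToolRegistry` and `check_schema` are hypothetical names. The registry checks permissions, preconditions, input schema and output schema around every invocation, and looks tools up by name and version.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    input_schema: dict[str, type]    # field -> required Python type
    output_schema: dict[str, type]
    preconditions: tuple[str, ...] = ()
    idempotent: bool = False
    permissions: tuple[str, ...] = ()


def check_schema(payload: dict[str, Any], schema: dict[str, type], where: str) -> None:
    for name, expected in schema.items():
        if name not in payload:
            raise TypeError(f"{where}: missing field {name!r}")
        if not isinstance(payload[name], expected):
            raise TypeError(f"{where}: {name!r} is not {expected.__name__}")


class ToolRegistry:
    """Discoverable, versioned, access-controlled tool catalogue."""

    def __init__(self) -> None:
        self._tools: dict[tuple[str, str], tuple[ToolSpec, Callable[..., dict]]] = {}

    def register(self, spec: ToolSpec, fn: Callable[..., dict]) -> None:
        key = (spec.name, spec.version)
        if key in self._tools:
            raise ValueError(f"{spec.name}@{spec.version} is already registered")
        self._tools[key] = (spec, fn)

    def list_tools(self) -> list[str]:
        return sorted(f"{n}@{v}" for n, v in self._tools)

    def invoke(self, name: str, version: str, args: dict[str, Any],
               granted: set[str], facts: set[str]) -> dict:
        spec, fn = self._tools[(name, version)]
        if missing := set(spec.permissions) - granted:
            raise PermissionError(f"{name}: missing permissions {sorted(missing)}")
        if unmet := set(spec.preconditions) - facts:
            raise RuntimeError(f"{name}: unmet preconditions {sorted(unmet)}")
        check_schema(args, spec.input_schema, f"{name} input")
        result = fn(**args)
        check_schema(result, spec.output_schema, f"{name} output")
        return result


registry = ToolRegistry()
registry.register(
    ToolSpec("search_kb", "1.2.0",
             input_schema={"query": str, "max_results": int},
             output_schema={"articles": list, "scores": list},
             preconditions=("authenticated",),
             idempotent=True, permissions=("kb:read",)),
    # Stand-in implementation; a real tool would query the knowledge base.
    lambda query, max_results: {"articles": [f"KB article on {query}"][:max_results],
                                "scores": [0.9][:max_results]},
)
```

Keying on `(name, version)` lets two versions of a tool coexist while callers migrate. Sandboxing is out of scope for this sketch.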
Implement the agent loop with budgeted autonomy. Code the core loop:
```
initialize state from goal
for step in 1..K_MAX:
    context = build_context(state, memory, policies)
    action = llm.propose_action(context)
    if violates_policy(action):
        action = repair_or_escalate(action)
    if is_tool_call(action):
        result = execute_tool(action, sandbox, schema_validated=True)
        update_state(result)
        write_to_memory(action, result)
    elif is_final_answer(action):
        return action
    if budget_exhausted(tokens, cost, time, tool_calls):
        break
return graceful_degradation_response()
```
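A runnable rendering of the loop above, assuming the LLM is passed in as a plain callable and tools are a name-to-function map. `Budget`, `Usage` and `run_agent` are hypothetical names. Unlike `repair_or_escalate`, this sketch always escalates a blocked action and does not attempt a repair.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Budget:
    max_steps: int = 8
    max_tokens: int = 4_000
    max_cost_usd: float = 0.50
    max_seconds: float = 30.0
    max_tool_calls: int = 5


@dataclass
class Usage:
    tokens: int = 0
    cost_usd: float = 0.0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def exhausted(self, b: Budget) -> bool:
        return (self.tokens >= b.max_tokens
                or self.cost_usd >= b.max_cost_usd
                or self.tool_calls >= b.max_tool_calls
                or time.monotonic() - self.started >= b.max_seconds)


Action = dict[str, Any]


def run_agent(goal: str,
              propose_action: Callable[[dict], Action],
              tools: dict[str, Callable[..., Any]],
              is_allowed: Callable[[Action], bool],
              budget: Optional[Budget] = None) -> dict:
    budget = budget or Budget()
    state: dict[str, Any] = {"goal": goal, "observations": []}
    memory: list[tuple[Action, Any]] = []
    usage = Usage()
    for _ in range(budget.max_steps):
        action = propose_action(state)        # cognition proposes; it never executes
        usage.tokens += action.get("tokens", 0)
        usage.cost_usd += action.get("cost_usd", 0.0)
        if not is_allowed(action):            # policy gate before any side effect
            # This sketch always escalates; a fuller version would try a repair first.
            return {"status": "escalated", "blocked": action}
        if action["type"] == "tool_call":
            usage.tool_calls += 1
            result = tools[action["tool"]](**action["args"])
            state["observations"].append(result)
            memory.append((action, result))
        elif action["type"] == "final":
            return {"status": "done", "answer": action["answer"], "trace": memory}
        if usage.exhausted(budget):
            break
    # Graceful degradation: hand back what was learned instead of failing silently.
    return {"status": "degraded", "partial": state["observations"]}


script = iter([
    {"type": "tool_call", "tool": "search_kb", "args": {"query": "refund"}, "tokens": 300},
    {"type": "final", "answer": "See the KB article on refunds.", "tokens": 120},
])
out = run_agent("answer a refund question",
                propose_action=lambda s: next(script),
                tools={"search_kb": lambda query: {"hits": [query]}},
                is_allowed=lambda a: a.get("tool") != "delete_account")
```

The scripted `propose_action` stands in for a real model call, which makes the loop's control flow testable without an LLM.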
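Select a multi-agent topology (if multi-agent). Choose based on task structure and on the failure modes each topology carries (for example, orchestrator-worker versus swarm, as described in the topology taxonomy above).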
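Wire failure-mode mitigations into the architecture. For each topology, implement its specific mitigations, such as heartbeats and ACK/NACK against silent worker failure in orchestrator-worker setups, and entropy-preserving incentives against herding in swarms.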
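For the orchestrator-worker case, the heartbeat and ACK/NACK mitigation can be sketched as follows. Workers that miss their heartbeat window are presumed dead and their in-flight tasks are requeued. An explicit NACK also requeues the task. The class names, the timeout policy and the injectable `clock` are illustrative assumptions. Swarm mitigations are not covered here.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Reply(Enum):
    ACK = "ack"
    NACK = "nack"


@dataclass
class WorkerRecord:
    last_heartbeat: float
    in_flight: set[str] = field(default_factory=set)


class Orchestrator:
    def __init__(self, heartbeat_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = heartbeat_timeout
        self.clock = clock
        self.workers: dict[str, WorkerRecord] = {}
        self.retry_queue: list[str] = []

    def register(self, worker_id: str) -> None:
        self.workers[worker_id] = WorkerRecord(self.clock())

    def heartbeat(self, worker_id: str) -> None:
        self.workers[worker_id].last_heartbeat = self.clock()

    def dispatch(self, worker_id: str, task_id: str) -> None:
        self.workers[worker_id].in_flight.add(task_id)

    def on_reply(self, worker_id: str, task_id: str, reply: Reply) -> None:
        self.workers[worker_id].in_flight.discard(task_id)
        if reply is Reply.NACK:           # explicit failure: requeue for another worker
            self.retry_queue.append(task_id)

    def reap_silent_workers(self) -> list[str]:
        """Treat workers past the heartbeat timeout as dead and requeue their tasks."""
        now = self.clock()
        dead = [w for w, rec in self.workers.items()
                if now - rec.last_heartbeat > self.timeout]
        for w in dead:
            self.retry_queue.extend(sorted(self.workers.pop(w).in_flight))
        return dead
```

Injecting the clock lets the timeout logic be tested without sleeping.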
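Implement the governance layer. Apply the enterprise hardening checklist (Table 2 of the paper): a policy enforcement gate before every side-effecting action, audit trails, and RBAC.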
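A minimal sketch of a deny-by-default policy gateway. It combines a role-to-tool RBAC table with an append-only audit log written as JSON lines. `PolicyGateway` and its fields are hypothetical. A production gate would also evaluate argument-level policies and persist the log outside the process.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PolicyGateway:
    role_grants: dict[str, set[str]]          # RBAC: role -> tools it may call
    side_effecting: set[str]                  # tools that change external state
    audit_log: list[str] = field(default_factory=list)

    def authorize(self, agent_id: str, role: str, tool: str,
                  args: dict[str, Any]) -> bool:
        # Deny by default: unknown roles and ungranted tools are refused.
        allowed = tool in self.role_grants.get(role, set())
        # Every decision, allow or deny, is appended as one JSON line.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "agent": agent_id, "role": role, "tool": tool,
            "args": args, "side_effecting": tool in self.side_effecting,
            "decision": "allow" if allowed else "deny",
        }, sort_keys=True))
        return allowed


gw = PolicyGateway(role_grants={"support_agent": {"search_kb", "send_response"}},
                   side_effecting={"send_response", "escalate"})
```

Logging denials as well as approvals is what makes a blocked action visible during trace review.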
Configure memory with access control. Implement tiered memory with PII filtering, retention policies, and policy-aware retrieval. Episodic memory should summarize past interactions indexed by task and time. Semantic memory should use vector stores with access scoping.
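A dependency-free sketch of scoped memory with PII redaction on write, time-based retention, and scope-filtered retrieval indexed by task and time. Two simplifications are assumptions for illustration. First, keyword overlap replaces vector similarity. Second, the regexes are toy PII detectors, not production-grade ones. `ScopedMemory` and `redact_pii` are hypothetical names.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Illustrative patterns only; production PII detection needs a dedicated tool.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d -]{7,}\d")


def redact_pii(text: str) -> str:
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


@dataclass
class MemoryItem:
    text: str
    scope: str       # access scope, e.g. tenant or team
    task: str        # episodic index: which task produced this memory
    created: float   # episodic index: when it was written


@dataclass
class ScopedMemory:
    retention_seconds: float
    clock: Callable[[], float] = time.time
    items: list[MemoryItem] = field(default_factory=list)

    def write(self, text: str, scope: str, task: str) -> None:
        # PII is filtered on the way in, so it never reaches storage.
        self.items.append(MemoryItem(redact_pii(text), scope, task, self.clock()))

    def purge_expired(self) -> None:
        cutoff = self.clock() - self.retention_seconds
        self.items = [i for i in self.items if i.created >= cutoff]

    def recall(self, query: str, allowed_scopes: set[str],
               task: Optional[str] = None) -> list[str]:
        self.purge_expired()                  # retention is enforced before every read
        words = set(query.lower().split())
        # Keyword overlap stands in for vector similarity to keep the sketch dependency-free.
        return [i.text for i in self.items
                if i.scope in allowed_scopes
                and (task is None or i.task == task)
                and words & set(i.text.lower().split())]
```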
Set up CI/CD evaluation. Create a continuous eval pipeline with regression benchmarks, safety tests (prompt injection, adversarial inputs), and schema contract tests for all tool interfaces.
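The schema contract tests are the easiest part of this pipeline to automate. A sketch using `unittest` follows. The `search_kb` here is a stand-in, and `SEARCH_KB_OUTPUT` simply mirrors the registry entry's output schema. Both are assumptions to be replaced with the real tool and schema. Safety and regression suites are not shown. In CI, a file like this would typically run via `python -m unittest`.

```python
import unittest

# Hypothetical contract for search_kb 1.2.0, mirroring the registry entry's output_schema.
SEARCH_KB_OUTPUT = {"articles": list, "scores": list}


def search_kb(query: str, max_results: int = 3) -> dict:
    """Stand-in for the real tool; point the tests at the production implementation."""
    articles = [f"article about {query}"][:max_results]
    return {"articles": articles, "scores": [1.0] * len(articles)}


def contract_violations(output: dict, schema: dict) -> list[str]:
    problems = [f"missing {k}" for k in schema if k not in output]
    problems += [f"{k} is {type(output[k]).__name__}, expected {t.__name__}"
                 for k, t in schema.items()
                 if k in output and not isinstance(output[k], t)]
    return problems


class SearchKbContract(unittest.TestCase):
    def test_output_matches_schema(self):
        self.assertEqual(contract_violations(search_kb("refund"), SEARCH_KB_OUTPUT), [])

    def test_one_score_per_article(self):
        out = search_kb("refund", max_results=1)
        self.assertEqual(len(out["articles"]), len(out["scores"]))

    def test_zero_results_still_well_formed(self):
        self.assertEqual(contract_violations(search_kb("refund", max_results=0),
                                             SEARCH_KB_OUTPUT), [])
```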
Validate with trace analysis. Run the system end-to-end and verify that every trace contains: the complete action sequence, policy decisions at each step, resource consumption metrics, and a reproducible execution path.
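Trace completeness can be checked mechanically. The sketch below assumes a hypothetical trace format, and every field name in it is an assumption rather than a standard. Each run records model and tool versions plus a seed for replay. Each step records the action, the policy decision and resource metrics. `trace_gaps` lists whatever is missing.

```python
# Hypothetical trace format: the field names are assumptions, not a standard.
REQUIRED_RUN_FIELDS = {"run_id", "model_version", "tool_versions", "seed", "steps"}
REQUIRED_STEP_FIELDS = {"action", "policy_decision", "tokens", "cost_usd", "latency_ms"}


def trace_gaps(trace: dict) -> list[str]:
    """List everything a trace lacks for audit and replay; empty means complete."""
    gaps = [f"run missing {f}" for f in sorted(REQUIRED_RUN_FIELDS - set(trace))]
    for i, step in enumerate(trace.get("steps", [])):
        gaps += [f"step {i} missing {f}"
                 for f in sorted(REQUIRED_STEP_FIELDS - set(step))]
    return gaps


good_trace = {
    "run_id": "r1", "model_version": "model-x", "tool_versions": {"search_kb": "1.2.0"},
    "seed": 7,
    "steps": [{"action": "search_kb", "policy_decision": "allow",
               "tokens": 310, "cost_usd": 0.002, "latency_ms": 840}],
}
```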
Example 1: Designing a customer support agent system
User: "Design an agent architecture for handling customer support tickets — it needs to read tickets, query our knowledge base, escalate to humans when needed, and log everything for compliance."
Approach:
- Control flow: `CLASSIFY → SEARCH_KB → DRAFT_RESPONSE → REVIEW → RESPOND | ESCALATE`
- Typed tools: `read_ticket(id) → Ticket`, `search_kb(query) → Article[]`, `send_response(ticket_id, message) → Ack`, `escalate(ticket_id, reason) → Ack`
- Governance: `send_response` to verified agent identity, audit trail on every action

Output structure:
```python
# Tool registry entry example
tool_registry = {
    "search_kb": {
        "version": "1.2.0",
        "input_schema": {"query": "string", "max_results": "int", "filters": "dict?"},
        "output_schema": {"articles": "Article[]", "scores": "float[]"},
        "preconditions": ["authenticated", "ticket_context_loaded"],
        "idempotent": True,
        "permissions": ["kb:read"],
        "sandbox": "network_restricted"
    }
}
```
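The control flow `CLASSIFY → SEARCH_KB → DRAFT_RESPONSE → REVIEW → RESPOND | ESCALATE` can be enforced as an explicit state machine, so the LLM cannot skip the review step. This sketch reads the notation literally and allows escalation only from `REVIEW`. A team that wants earlier escalation would add those edges to the hypothetical `TRANSITIONS` table.

```python
from enum import Enum, auto


class TicketState(Enum):
    CLASSIFY = auto()
    SEARCH_KB = auto()
    DRAFT_RESPONSE = auto()
    REVIEW = auto()
    RESPOND = auto()
    ESCALATE = auto()


S = TicketState
TRANSITIONS = {
    S.CLASSIFY: {S.SEARCH_KB},
    S.SEARCH_KB: {S.DRAFT_RESPONSE},
    S.DRAFT_RESPONSE: {S.REVIEW},
    S.REVIEW: {S.RESPOND, S.ESCALATE},
    S.RESPOND: set(),      # terminal
    S.ESCALATE: set(),     # terminal: a human takes over
}


class TicketWorkflow:
    def __init__(self) -> None:
        self.state = S.CLASSIFY
        self.history = [self.state]

    def advance(self, nxt: TicketState) -> None:
        if nxt not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {nxt.name}")
        self.state = nxt
        self.history.append(nxt)   # doubles as an audit trail of the control flow

    @property
    def done(self) -> bool:
        return not TRANSITIONS[self.state]
```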
Example 2: Choosing a multi-agent topology for a data pipeline
User: "I have an ETL pipeline where different data sources need different extraction logic, then everything gets transformed and loaded. Should this be one agent or multiple?"
Approach:
```
ExtractorOutput = { records: Record[], source_id: str, schema_version: str, extraction_ts: datetime }
```
Output: Architecture diagram description + implementation skeleton with router logic, solver registration, and typed message contracts.
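A skeleton of the router, solver registration, and typed `ExtractorOutput` contract might look like the following. It assumes one registered extractor per source type. It also assumes a single downstream transform stage, because transformation in this example does not need per-source logic. That is an assumption about this pipeline, not a rule. `Router`, `to_warehouse_rows` and the extractors are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class ExtractorOutput:
    records: list[dict]
    source_id: str
    schema_version: str
    extraction_ts: datetime


class Router:
    """Routes each source type to the extractor (solver) registered for it."""

    def __init__(self) -> None:
        self._solvers: dict[str, Callable[[str], ExtractorOutput]] = {}

    def register(self, source_type: str):
        def decorator(fn: Callable[[str], ExtractorOutput]):
            self._solvers[source_type] = fn
            return fn
        return decorator

    def route(self, source_type: str, source_id: str) -> ExtractorOutput:
        if source_type not in self._solvers:
            raise LookupError(f"no extractor registered for {source_type!r}")
        out = self._solvers[source_type](source_id)
        if not isinstance(out, ExtractorOutput):    # enforce the message contract
            raise TypeError(f"{source_type} extractor broke the ExtractorOutput contract")
        return out


router = Router()


@router.register("csv")
def extract_csv(source_id: str) -> ExtractorOutput:
    rows = [{"id": 1, "amount": 9.5}]               # placeholder for real CSV parsing
    return ExtractorOutput(rows, source_id, "v1", datetime.now(timezone.utc))


@router.register("rest_api")
def extract_api(source_id: str) -> ExtractorOutput:
    rows = [{"id": "a-17", "status": "open"}]       # placeholder for real API paging
    return ExtractorOutput(rows, source_id, "v3", datetime.now(timezone.utc))


def to_warehouse_rows(batches: list[ExtractorOutput]) -> list[dict]:
    # Shared transform stage: tag each record with its lineage for auditability.
    return [dict(r, _source=b.source_id, _schema=b.schema_version)
            for b in batches for r in b.records]
```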
Example 3: Hardening an existing agent for production
User: "I have a LangChain agent that works in dev. What do I need before deploying to production?"
Approach:
Output: Prioritized checklist with specific code changes mapped to files in the existing codebase.
Paper: Alenezi, M. (2026). "From Prompt-Response to Goal-Directed Systems: The Evolution of Agentic AI Software Architecture." arXiv:2602.10479v1. https://arxiv.org/abs/2602.10479v1
What to look for: Section 3 for the reference architecture and Algorithm 1 (agent loop pseudocode), Table 1 for the multi-agent failure mode taxonomy, and Table 2 for the complete enterprise hardening checklist with verification methods and evidence requirements.