skills/agentic-infrastructure-2026/SKILL.md
Build and adopt production AI agent infrastructure in 2026. Covers framework selection (LangGraph, CrewAI, AutoGen, MCP), orchestration patterns, evaluation, observability, memory systems, and tool use. Also covers the SOCIAL dimension: how to sell agent infrastructure internally, change management, measuring ROI, building trust in autonomous systems, and scaling adoption across teams. Activate on: "agent infrastructure", "agent framework comparison", "which agent framework", "sell AI tools internally", "agent adoption", "agent observability", "agent evaluation", "MCP architecture", "agentic mesh", "enterprise AI agents", "AI change management", "agent ROI". NOT for: building specific agents (use ai-engineer), designing agent behavior patterns (use agentic-patterns), prompt tuning (use prompt-engineer).
npx skillsauth add curiositech/windags-skills agentic-infrastructure-2026
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are an expert in building, evaluating, and socializing AI agent infrastructure. You understand both the technical landscape (frameworks, protocols, observability) and the organizational challenge (adoption, ROI, trust).
If complex multi-step workflows with conditional branching:
If multi-agent collaboration on shared tasks:
If Microsoft ecosystem/.NET shop:
If need tool interoperability across providers:
If simple assistant with file retrieval:
If single-purpose agent:
Simple: User → Agent → Tool → Response
If multi-step workflow:
LangGraph: User → State Machine → [Tool A → Decision → Tool B] → Response
If team collaboration needed:
Agentic Mesh: User → LangGraph Orchestrator → CrewAI Teams → MCP Tools
If engineering leadership audience:
If product leadership audience:
If security/compliance audience:
If executive leadership audience:
If token usage > 100k/minute:
If multiple teams using agents:
If production deployment:
If single conversation:
If multi-turn session:
If user personalization needed:
If debugging/auditing required:
Symptoms: Team picks LangGraph before defining workflows, gets stuck in configuration hell
Detection Rule: If you're reading framework docs before writing requirements, you're here
Fix:
Symptoms: Agents spending $2+ per simple task, slow response times, hitting token limits
Detection Rule: If MCP tool schemas consume >50% of context before real work, you're here
Fix:
Symptoms: Agents failing silently, impossible debugging, no cost visibility
Detection Rule: If you're using console.log to debug agent behavior, you're here
Fix:
Symptoms: Great demos, no production usage, teams reverting to manual processes
Detection Rule: If pilot has been "almost ready" for >3 months, you're here
Fix:
Symptoms: $500+ surprise bills, agents in infinite loops, no budget controls
Detection Rule: If you don't know your cost-per-task within $1, you're here
Fix:
Scenario: Engineering team wants agent to help with code reviews, reduce reviewer burden
Decision Process:
Framework Selection: Complex workflow (read PR → analyze diff → check standards → generate feedback)
Architecture Design:
PR Created → LangGraph State Machine:
├─ Fetch diff (GitHub MCP)
├─ Security scan (if contains auth/secrets)
├─ Style check (if language = Python/JS)
├─ Test coverage (if tests modified)
└─ Generate review comment
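The conditional branches in the diagram above can be sketched as a plain Python planning function. This is an illustration only, not LangGraph code: the `PullRequest` fields, the step names, and the keyword matching for "auth/secrets" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical PR summary; field names are illustrative."""
    language: str
    changed_files: list[str] = field(default_factory=list)
    diff_text: str = ""


def plan_review_steps(pr: PullRequest) -> list[str]:
    """Return the ordered steps the state machine would run for this PR."""
    steps = ["fetch_diff"]  # always first (GitHub MCP in the diagram)

    # Security scan only when the diff mentions auth or secrets
    # (naive substring match, assumed for illustration).
    lowered = pr.diff_text.lower()
    if "auth" in lowered or "secret" in lowered:
        steps.append("security_scan")

    # Style check only for Python/JS.
    if pr.language.lower() in {"python", "js", "javascript"}:
        steps.append("style_check")

    # Coverage check only when test files were modified.
    if any("test" in path.lower() for path in pr.changed_files):
        steps.append("test_coverage")

    steps.append("generate_review_comment")  # always last
    return steps


pr = PullRequest(
    language="Python",
    changed_files=["app/login.py", "tests/test_login.py"],
    diff_text="+ def check_auth_token(token): ...",
)
print(plan_review_steps(pr))
```

Keeping the branch conditions in one pure function makes them easy to unit-test before they are wired into whichever framework runs the actual nodes.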
Pilot Scope: Start with one repo, non-critical reviews only
ROI Calculation:
Before: 45 min avg per review × $75/hour = $56.25 per review
After: 10 min human ($12.50 at $75/hour) + $2.50 agent cost = $15.00 per review
Savings: $41.25 per review × 200 reviews/month = $8,250/month
Infrastructure cost: $1,200/month (LangSmith + compute)
Net savings: $7,050/month
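The ROI arithmetic can be recomputed directly from the stated inputs. The sketch below assumes the 10 minutes of human time is billed at the same $75/hour as the original review; the variable names are illustrative.

```python
HOURLY_RATE = 75.00        # $/hour for reviewer time, from the scenario
MINUTES_BEFORE = 45        # average human minutes per review before the agent
MINUTES_AFTER = 10         # human minutes per review with the agent
AGENT_COST = 2.50          # $ agent cost per review
REVIEWS_PER_MONTH = 200
INFRA_COST = 1200.00       # $/month (LangSmith + compute)


def human_cost(minutes: float, hourly_rate: float) -> float:
    """Dollar cost of `minutes` of human time at `hourly_rate`."""
    return minutes * hourly_rate / 60


cost_before = human_cost(MINUTES_BEFORE, HOURLY_RATE)
cost_after = human_cost(MINUTES_AFTER, HOURLY_RATE) + AGENT_COST
savings_per_review = cost_before - cost_after
gross_monthly = savings_per_review * REVIEWS_PER_MONTH
net_monthly = gross_monthly - INFRA_COST

print(f"Before: ${cost_before:.2f} per review")
print(f"After: ${cost_after:.2f} per review")
print(f"Savings: ${savings_per_review:.2f} per review")
print(f"Gross: ${gross_monthly:,.2f}/month")
print(f"Net: ${net_monthly:,.2f}/month")
```

Keeping the inputs as named constants makes it straightforward to rerun the calculation with a team's own baseline instead of the scenario's numbers.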
What Novice Misses: Would build complex multi-agent system, skip evaluation pipeline
What Expert Catches: Start simple, measure everything, expand gradually
Outcome: 67% time reduction in review cycle, 89% of agent suggestions accepted by humans
Scenario: Team has AutoGen v0.2 multi-agent research system, needs production reliability
Decision Process:
Migration Trigger: AutoGen conversations unpredictable, hard to debug, no state persistence
Framework Analysis:
Migration Strategy:
Phase 1: Parallel implementation (both systems running)
Phase 2: A/B test same research tasks
Phase 3: Quality comparison (accuracy, cost, reliability)
Phase 4: Full cutover
Key Differences:
AutoGen Pattern:
Agent A: "Here's my analysis"
Agent B: "I disagree because..."
Agent A: "Good point, let me revise..."
(continues until timeout/consensus)
LangGraph Pattern:
State: {question, analyses[], consensus_needed}
Node: Analyst → analysis
Node: Critic → critique
Edge: If critique_score > 0.8 → Consensus, else → Analyst
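The state, nodes, and conditional edge above can be sketched in plain Python without any framework. This is not LangGraph's API: `run_graph`, the stub `analyst` and `critic` callables, and the `max_rounds` cap are assumptions added for illustration (the cap is there so the loop cannot run until timeout).

```python
from typing import Callable, TypedDict


class ResearchState(TypedDict):
    question: str
    analyses: list[str]
    consensus_needed: bool


def run_graph(
    state: ResearchState,
    analyst: Callable[[ResearchState], str],
    critic: Callable[[ResearchState, str], float],
    threshold: float = 0.8,
    max_rounds: int = 5,  # assumed safety cap, not part of the original pattern
) -> ResearchState:
    """Loop Analyst -> Critic until critique_score > threshold or the cap hits."""
    for _ in range(max_rounds):
        analysis = analyst(state)            # Node: Analyst -> analysis
        state["analyses"].append(analysis)
        score = critic(state, analysis)      # Node: Critic -> critique score
        if score > threshold:                # Edge: -> Consensus
            state["consensus_needed"] = False
            return state
        # Edge: else -> back to Analyst
    return state  # consensus_needed stays True; caller can escalate to a human


# Stub nodes standing in for model calls.
def analyst(state: ResearchState) -> str:
    return f"draft {len(state['analyses']) + 1}"


def critic(state: ResearchState, analysis: str) -> float:
    return 0.9 if len(state["analyses"]) >= 2 else 0.6


final = run_graph(
    {"question": "Is X viable?", "analyses": [], "consensus_needed": True},
    analyst,
    critic,
)
print(final["analyses"], final["consensus_needed"])
```

Because every transition is an explicit branch on recorded state, a failed run can be replayed from the stored `analyses` list rather than reconstructed from a free-form conversation.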
What Novice Misses: Would rewrite everything at once, no comparison metrics
What Expert Catches: Run systems in parallel, measure quality differences, gradual migration
Outcome: 73% fewer failed research runs, 45% cost reduction, deterministic execution paths
Scenario: 5 engineering teams want agent infrastructure, no central coordination
Decision Process:
Problem: Each team building isolated solutions, duplicated effort, no learning transfer
Solution: Centralized AI Studio providing shared infrastructure
Studio Architecture:
AI Studio provides:
├─ Pre-built MCP servers (GitHub, Jira, Slack, AWS)
├─ Evaluation harness templates
├─ Cost monitoring dashboard
├─ Agent deployment pipeline
└─ Best practices documentation
Teams consume:
├─ SDK for their language/framework
├─ Pre-configured observability
├─ Shared tool protocols
└─ Cost quotas and guardrails
Rollout Strategy:
Success Metrics:
Technical:
- Time to first working agent: 3 days → 1 day
- Code reuse across teams: 0% → 70%
- Infrastructure cost per team: $5k → $1.2k
Organizational:
- Teams actively using agents: 1 → 5
- Cross-team knowledge sharing: Weekly demos
- Executive confidence: Quarterly ROI reports
What Novice Misses: Would let teams build in isolation, reinvent wheels
What Expert Catches: Central platform creates network effects, reduces duplicated learning
Outcome: 5 teams deployed production agents in 6 months, 80% infrastructure code reuse
Technical Infrastructure:
[ ] Framework selected with documented decision criteria (use case fit)
[ ] MCP tool servers configured with lazy loading (< 50% context consumption)
[ ] Observability pipeline operational (LangSmith/Braintrust/Langfuse)
[ ] Evaluation suite covering unit/trajectory/end-to-end testing
[ ] Cost controls active (per-task caps, daily quotas, kill switches)
[ ] Memory architecture documented (working/short-term/long-term boundaries)
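As one possible shape for the cost-controls item in the checklist above, the sketch below combines a per-task cap, a daily quota, and a kill switch in a single in-memory guard. `CostGuard` and `BudgetExceeded` are hypothetical names, the dollar limits are made up, and a real deployment would also need a daily reset and shared storage across processes.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a spend would break a cap or the kill switch is on."""


class CostGuard:
    def __init__(self, per_task_cap: float, daily_quota: float) -> None:
        self.per_task_cap = per_task_cap
        self.daily_quota = daily_quota
        self.spent_today = 0.0
        self.killed = False
        self._task_spend: dict[str, float] = {}

    def record(self, task_id: str, cost: float) -> None:
        """Record a spend in dollars, or raise before it is counted."""
        if self.killed:
            raise BudgetExceeded("kill switch is on; no further spend allowed")
        task_total = self._task_spend.get(task_id, 0.0) + cost
        if task_total > self.per_task_cap:
            raise BudgetExceeded(f"task {task_id} would exceed its per-task cap")
        if self.spent_today + cost > self.daily_quota:
            self.killed = True  # trip the kill switch until someone resets it
            raise BudgetExceeded("daily quota would be exceeded")
        self._task_spend[task_id] = task_total
        self.spent_today += cost


guard = CostGuard(per_task_cap=2.50, daily_quota=10.00)
guard.record("review-1", 1.00)
guard.record("review-1", 1.00)
try:
    guard.record("review-1", 1.00)  # would bring the task to $3.00
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
print(f"spent today: ${guard.spent_today:.2f}")
```

Checking the caps before the spend is counted means an agent stuck in a loop is stopped at the limit rather than one call past it.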
Organizational Readiness:
[ ] Pilot scoped to single team, single workflow (not enterprise-wide)
[ ] ROI measurement framework defined with baseline metrics
[ ] Stakeholder communication tailored per audience (eng/product/exec/security)
[ ] Human-in-the-loop approval gates visible and documented
[ ] Success criteria defined with binary pass/fail conditions
[ ] Adoption expansion plan documented (pilot → scale pathway)
Production Readiness:
[ ] Security review completed (PII filtering, audit trails, access controls)
[ ] Error handling documented (retry logic, circuit breakers, escalation)
[ ] Performance benchmarks established (latency SLAs, throughput targets)
[ ] Incident response procedures defined (who gets paged, rollback plan)
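For the error-handling item in the checklist above, a circuit breaker around tool calls might look like the following minimal sketch. The class name, thresholds, and injectable `clock` are assumptions for illustration; production systems usually rely on an established resilience library rather than a hand-rolled breaker.

```python
import time
from typing import Any, Callable


class CircuitOpen(RuntimeError):
    """Raised when calls are rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("tool temporarily disabled; escalate or retry later")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # any success closes the circuit
        return result


# Demo with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])


def failing_tool() -> str:
    raise ValueError("tool down")


for _ in range(2):
    try:
        breaker.call(failing_tool)
    except ValueError:
        pass

try:
    breaker.call(lambda: "ok")
except CircuitOpen as exc:
    print(f"rejected: {exc}")

now[0] = 11.0  # after the reset window, a trial call is allowed
print(breaker.call(lambda: "ok"))
```

Rejecting calls while the circuit is open gives the escalation path a clear signal (`CircuitOpen`) instead of letting an agent keep retrying a tool that is known to be failing.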
This skill is NOT for:
Building specific agent behaviors → Use agentic-patterns instead
Implementing RAG systems or chatbots → Use ai-engineer instead
Prompt optimization and tuning → Use prompt-engineer instead
DAG workflow design → Use windags-architect instead
LLM fine-tuning or model training → Use domain-specific skills
Delegate to other skills when:
agentic-patterns
ai-engineer + prompt-engineer
windags-architect
change-management (if exists)