skills/ai-engineer/SKILL.md
Build production-ready LLM applications, advanced RAG systems, and intelligent agents. Implements vector search, multimodal AI, agent orchestration, and enterprise AI integrations. Use PROACTIVELY for LLM features, chatbots, AI agents, or AI-powered applications.
Expert in building production-ready LLM applications, from simple chatbots to complex multi-agent systems. Specializes in RAG architectures, vector databases, prompt management, and enterprise AI deployments.
Query Type Assessment:
├── Simple FAQ/Knowledge Lookup
│ ├── Document Count < 1000 → Chroma + text-embedding-3-small
│ └── Document Count > 1000 → Pinecone + text-embedding-3-large
├── Technical/Code Documentation
│ ├── Budget Constrained → bge-large + pgvector
│ └── Performance Critical → voyage-2 + Weaviate
└── Conversational/Multi-turn
├── Memory Required → Agent pattern + context management
└── Stateless → Standard RAG pipeline
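The Query Type Assessment tree can be encoded as a small routing function. This is a minimal sketch: the `QueryProfile` shape and `recommendStack` name are hypothetical, and the tree does not say where exactly 1,000 documents belongs, so treating it as "small" is an assumption.

```typescript
// Hypothetical encoding of the "Query Type Assessment" tree; the profile
// shape and function name are illustrative, not part of any library.
type QueryProfile =
  | { kind: "faq"; documentCount: number }
  | { kind: "technical"; budgetConstrained: boolean }
  | { kind: "conversational"; needsMemory: boolean };

function recommendStack(profile: QueryProfile): string {
  switch (profile.kind) {
    case "faq":
      // The tree leaves exactly 1,000 documents unassigned; treating it as
      // "small" here is an assumption.
      return profile.documentCount <= 1000
        ? "Chroma + text-embedding-3-small"
        : "Pinecone + text-embedding-3-large";
    case "technical":
      return profile.budgetConstrained
        ? "bge-large + pgvector"
        : "voyage-2 + Weaviate";
    case "conversational":
      return profile.needsMemory
        ? "Agent pattern + context management"
        : "Standard RAG pipeline";
  }
}
```

A discriminated union keeps each branch's inputs separate, so a technical-docs query cannot accidentally be routed on document count.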
Reranking Decision:
├── Precision Critical (legal, medical) → Always use Cohere Rerank
├── Latency budget < 200ms → Skip reranking, tune retrieval
├── Budget Constrained → Cross-encoder (bge-reranker-large)
└── Default → Cohere Rerank with top-10 → top-3
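One way to apply the Reranking Decision rules is to check them top to bottom, so precision-critical domains still rerank under a tight latency budget. This sketch is hypothetical: the plan shape is illustrative, and only the default "top-10 → top-3" numbers come from the tree; the candidate counts on the other branches are assumptions.

```typescript
// Hypothetical reranking planner mirroring the tree's rule order.
interface RerankContext {
  precisionCritical: boolean; // e.g. legal or medical answers
  latencyBudgetMs: number;    // end-to-end budget for the request
  budgetConstrained: boolean;
}

interface RerankPlan {
  reranker: "cohere-rerank" | "bge-reranker-large" | "none";
  candidates: number; // retrieved chunks passed to the reranker
  keep: number;       // chunks that survive into the prompt
}

function chooseReranking(ctx: RerankContext): RerankPlan {
  // Rules are evaluated in the order the tree lists them.
  if (ctx.precisionCritical) {
    return { reranker: "cohere-rerank", candidates: 10, keep: 3 };
  }
  if (ctx.latencyBudgetMs < 200) {
    // No reranker: retrieval itself must return the final top-3 (assumed count).
    return { reranker: "none", candidates: 3, keep: 3 };
  }
  if (ctx.budgetConstrained) {
    return { reranker: "bge-reranker-large", candidates: 10, keep: 3 };
  }
  return { reranker: "cohere-rerank", candidates: 10, keep: 3 };
}
```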
Database Selection:
├── Existing Postgres → pgvector extension
├── Need Hybrid Search → Weaviate or Qdrant
├── Managed Service → Pinecone
└── Self-hosted/Local → Chroma or Qdrant
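The Database Selection list reads naturally as a precedence chain. In this hypothetical sketch an existing Postgres deployment wins even when hybrid search is also needed; that ordering is an assumption taken from the order of the list, not a stated rule.

```typescript
// Hypothetical vector-database chooser; requirement names are illustrative.
interface DbRequirements {
  existingPostgres: boolean;
  needsHybridSearch: boolean;
  wantsManagedService: boolean;
}

function chooseVectorDb(r: DbRequirements): string[] {
  // Reuse existing infrastructure first (assumed precedence).
  if (r.existingPostgres) return ["pgvector"];
  if (r.needsHybridSearch) return ["Weaviate", "Qdrant"];
  if (r.wantsManagedService) return ["Pinecone"];
  // Fallback: self-hosted or local development.
  return ["Chroma", "Qdrant"];
}
```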
Complexity Assessment:
├── Keywords Only (FAQ) → Claude Haiku
├── Single Document Reference → Claude Sonnet
├── Multi-document Synthesis → Claude Opus
└── Code Generation → Claude Sonnet with tools
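The Complexity Assessment maps cleanly onto a lookup table. In this sketch the tier strings are placeholders rather than real model IDs, and `classifyComplexity` is a hypothetical heuristic, based on whether code is requested and how many distinct source documents the answer needs, that the tree itself does not specify.

```typescript
// Hypothetical model router for the "Complexity Assessment" tree. Tier names
// are placeholders; map them to your provider's concrete model IDs.
type Complexity = "keywords" | "single-document" | "multi-document" | "code-generation";

interface ModelRoute {
  tier: "claude-haiku" | "claude-sonnet" | "claude-opus";
  withTools: boolean;
}

const ROUTES: Record<Complexity, ModelRoute> = {
  keywords: { tier: "claude-haiku", withTools: false },
  "single-document": { tier: "claude-sonnet", withTools: false },
  "multi-document": { tier: "claude-opus", withTools: false },
  "code-generation": { tier: "claude-sonnet", withTools: true },
};

interface RequestShape {
  isFaqLookup: boolean;       // keyword-style FAQ question
  wantsCode: boolean;         // user asked for generated code
  distinctSourceDocs: number; // documents the answer must draw on
}

// Illustrative heuristic (an assumption, not from the tree).
function classifyComplexity(req: RequestShape): Complexity {
  if (req.wantsCode) return "code-generation";
  if (req.isFaqLookup) return "keywords";
  return req.distinctSourceDocs > 1 ? "multi-document" : "single-document";
}

function routeModel(req: RequestShape): ModelRoute {
  return ROUTES[classifyComplexity(req)];
}
```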
Token Budget Check:
├── < 1K tokens → Any model
├── 1K-4K tokens → Sonnet/GPT-4
├── 4K-32K tokens → Claude Opus
└── > 32K tokens → Chunk and summarize first
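A Token Budget Check needs a token count before any model is chosen. This sketch assumes the rough "about 4 characters per token" heuristic for English text and reads "1K" as 1,000; a production system should use the target model's own tokenizer. The function names are hypothetical.

```typescript
// Rough estimate: ~4 characters per token for English prose (assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type BudgetDecision = "any-model" | "sonnet-or-gpt-4" | "claude-opus" | "chunk-and-summarize";

// Thresholds follow the tree; "1K" is read as 1,000 tokens (assumption).
function tokenBudgetDecision(tokens: number): BudgetDecision {
  if (tokens < 1000) return "any-model";
  if (tokens <= 4000) return "sonnet-or-gpt-4";
  if (tokens <= 32000) return "claude-opus";
  return "chunk-and-summarize";
}

// Naive fixed-size chunking for the "> 32K" branch; each chunk would then be
// summarized separately before the final call.
function chunkByTokens(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * 4;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

Fixed-size slicing can split sentences; splitting on paragraph or heading boundaries is usually preferable when the document structure is available.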
Task Classification:
├── Static Knowledge Query → Pure RAG
├── Need External APIs → Agent with tools
├── Multi-step Reasoning → Agent with planning
├── Real-time Data Required → Agent with live tools
└── Simple Q&A → RAG with fallback to agent
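Task Classification can be sketched as a classifier plus the "RAG with fallback to agent" pattern. Everything here is hypothetical: the signal names, the choice to check the most specific signal first when several apply, and the 0.6 confidence floor are all assumptions.

```typescript
// Hypothetical encoding of the "Task Classification" tree.
interface TaskSignals {
  needsRealtimeData: boolean;
  multiStep: boolean;
  needsExternalApis: boolean;
  simpleQa: boolean;
}

type Pattern =
  | "agent-with-live-tools"
  | "agent-with-planning"
  | "agent-with-tools"
  | "rag-with-agent-fallback"
  | "pure-rag";

function classifyTask(s: TaskSignals): Pattern {
  // Most specific signal first (assumed precedence).
  if (s.needsRealtimeData) return "agent-with-live-tools";
  if (s.multiStep) return "agent-with-planning";
  if (s.needsExternalApis) return "agent-with-tools";
  if (s.simpleQa) return "rag-with-agent-fallback";
  return "pure-rag"; // static knowledge query
}

interface RagAnswer {
  text: string;
  confidence: number; // 0..1, however your pipeline scores answers
}

// Try retrieval first; escalate to the agent when RAG has no answer or
// its confidence is below the floor.
function answerWithFallback(
  query: string,
  rag: (q: string) => RagAnswer | null,
  agent: (q: string) => string,
  minConfidence = 0.6,
): string {
  const result = rag(query);
  if (result !== null && result.confidence >= minConfidence) return result.text;
  return agent(query);
}
```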
Symptoms: Good retrieval precision but poor answer relevance; users describe answers as "close but not quite right".
Detection Rule: Semantic similarity > 0.8 but user satisfaction < 60%.
Root Cause: Query and document embeddings are optimized for different semantic spaces.
Fix: Switch to a domain-specific embedding model, or implement query expansion with synonyms.
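The detection rule and the query-expansion fix can be sketched as follows. The snapshot fields are hypothetical stand-ins for whatever similarity and satisfaction metrics you log, and the synonym map is hand-maintained illustrative data, not a library feature.

```typescript
// Hypothetical quality snapshot; field names are illustrative.
interface QualitySnapshot {
  meanSimilarity: number;   // mean top-k similarity of retrieved chunks
  satisfactionRate: number; // share of positive user ratings, 0..1
}

// Detection rule: similarity > 0.8 but satisfaction < 60%.
function hasEmbeddingMismatch(s: QualitySnapshot): boolean {
  return s.meanSimilarity > 0.8 && s.satisfactionRate < 0.6;
}

// Query expansion with a synonym map: append synonyms of each query term,
// de-duplicating while keeping first-seen order.
function expandQuery(query: string, synonyms: Map<string, string[]>): string {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const expanded = new Set<string>(terms);
  for (const term of terms) {
    for (const syn of synonyms.get(term) ?? []) {
      expanded.add(syn.toLowerCase());
    }
  }
  return Array.from(expanded).join(" ");
}
```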
Symptoms: Responses become generic, the model ignores specific retrieved context, and answers are inconsistent.
Detection Rule: Context utilization ratio < 30% and response generality score > 0.7.
Root Cause: Too many irrelevant chunks dilute the relevant information.
Fix: Enforce a stricter relevance threshold (> 0.8) and select context dynamically.
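The fix above can be sketched as threshold filtering followed by greedy, budget-aware context selection. The `ScoredChunk` shape is hypothetical, and token cost uses the rough 4-characters-per-token estimate as an assumption.

```typescript
// Hypothetical retrieved chunk with a relevance score in 0..1.
interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Keep only chunks above the relevance threshold, then add them
// highest-score first until the token budget is spent.
function selectContext(chunks: ScoredChunk[], maxTokens: number, minScore = 0.8): ScoredChunk[] {
  const estimateTokens = (t: string) => Math.ceil(t.length / 4); // rough heuristic
  const candidates = chunks
    .filter((c) => c.score > minScore)
    .sort((a, b) => b.score - a.score);
  const selected: ScoredChunk[] = [];
  let used = 0;
  for (const c of candidates) {
    const cost = estimateTokens(c.text);
    if (used + cost > maxTokens) continue; // skip chunks that would overflow
    selected.push(c);
    used += cost;
  }
  return selected;
}
```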
Symptoms: The agent invents API calls, references non-existent functions, or gets stuck in infinite retry loops.
Detection Rule: Tool call success rate < 50%, or iteration count > 0.8 × max_iterations.
Root Cause: The tool schemas the model was trained on differ from the ones actually implemented.
Fix: Add a tool validation layer and explicit error-handling instructions to the agent's system prompt.
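A tool validation layer checks each model-proposed call against the registered tools before executing it and returns a readable error the model can act on. This is a minimal sketch: the schema format is a hypothetical simplification (required arguments with primitive types only), not any provider's tool-definition format.

```typescript
// Hypothetical, simplified tool schema: required args and their JS types.
type ArgType = "string" | "number" | "boolean";

interface ToolSchema {
  name: string;
  required: Record<string, ArgType>;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type Validation = { ok: true } | { ok: false; error: string };

function validateToolCall(call: ToolCall, registry: Map<string, ToolSchema>): Validation {
  const schema = registry.get(call.name);
  if (schema === undefined) {
    const known = Array.from(registry.keys()).join(", ");
    // Fed back to the model so it can correct itself instead of retrying blindly.
    return { ok: false, error: `Unknown tool "${call.name}". Available tools: ${known}.` };
  }
  for (const arg of Object.keys(schema.required)) {
    if (!Object.prototype.hasOwnProperty.call(call.args, arg)) {
      return { ok: false, error: `Missing required argument "${arg}" for ${call.name}.` };
    }
    const expected = schema.required[arg];
    if (typeof call.args[arg] !== expected) {
      return { ok: false, error: `Argument "${arg}" must be a ${expected}.` };
    }
  }
  return { ok: true };
}

// Detection rule: success rate < 50% or iterations > 0.8 * max_iterations.
function toolLoopUnhealthy(successes: number, attempts: number, iteration: number, maxIterations: number): boolean {
  const successRate = attempts === 0 ? 1 : successes / attempts;
  return successRate < 0.5 || iteration > maxIterations * 0.8;
}
```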
Symptoms: Retrieval quality declines gradually over time, with seasonal performance drops.
Detection Rule: Monthly average retrieval@5 drops more than 10% below baseline.
Root Cause: Domain language evolves while the embedding model stays static.
Fix: Set up an embedding-model retraining pipeline, or switch to adaptive embeddings.
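The drift rule can be checked by averaging daily retrieval@5 measurements per calendar month and comparing each month to a baseline. This sketch reads "drops > 10%" as a relative drop, which is an assumption (an absolute 10-point drop is the other plausible reading); the record shape is hypothetical.

```typescript
// Hypothetical daily metric record; date formatted "YYYY-MM-DD".
interface DailyMetric {
  date: string;
  retrievalAt5: number;
}

function monthlyAverages(metrics: DailyMetric[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const m of metrics) {
    const month = m.date.slice(0, 7); // "YYYY-MM"
    const acc = sums.get(month) ?? { total: 0, n: 0 };
    acc.total += m.retrievalAt5;
    acc.n += 1;
    sums.set(month, acc);
  }
  const averages = new Map<string, number>();
  sums.forEach((v, month) => averages.set(month, v.total / v.n));
  return averages;
}

// Months whose average fell more than maxRelativeDrop below baseline.
function driftedMonths(metrics: DailyMetric[], baseline: number, maxRelativeDrop = 0.1): string[] {
  const flagged: string[] = [];
  monthlyAverages(metrics).forEach((avg, month) => {
    if ((baseline - avg) / baseline > maxRelativeDrop) flagged.push(month);
  });
  return flagged.sort();
}
```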
Symptoms: P95 latency creeps up gradually and users complain about slow responses.
Detection Rule: P95 response time > 2× baseline for 7 consecutive days.
Root Cause: Vector index degradation, context-size inflation, or model endpoint saturation.
Fix: Schedule index optimization, prune context, and load-balance across multiple models.
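The latency rule needs a per-day P95 and a count of consecutive breaching days. This sketch uses the nearest-rank percentile method (one common definition; monitoring tools may interpolate instead), and the function names are hypothetical.

```typescript
// Nearest-rank P95: the smallest sample with at least 95% of samples at or
// below it. Integer arithmetic avoids floating-point rounding in the rank.
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Alert when daily P95 exceeds factor x baseline for `days` consecutive days.
function latencyAlert(dailySamples: number[][], baselineP95: number, days = 7, factor = 2): boolean {
  let streak = 0;
  for (const samples of dailySamples) {
    streak = p95(samples) > factor * baselineP95 ? streak + 1 : 0;
    if (streak >= days) return true;
  }
  return false;
}
```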
Initial Requirements: "Build a chatbot that can answer questions about our 500-page product documentation"
Step 1: Architecture Decision
Step 2: Implementation Walkthrough
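// Novice approach - basic similarity search, no relevance filtering
const chunks = await vectorDb.query(queryEmbedding, { topK: 5 });
// Expert approach - vectorDb, reranker, and fallbackToGeneralSupport are
// placeholders for your vector store client, reranker, and escalation path
async function retrieveContext(query, queryEmbedding) {
  // Over-fetch candidates and enforce a minimum relevance threshold
  const rawChunks = await vectorDb.query(queryEmbedding, {
    topK: 20,
    threshold: 0.7, // Ensure minimum relevance
  });
  // Expert adds the reranking step a novice would skip
  const reranked = await reranker.rank(query, rawChunks);
  const finalChunks = reranked.slice(0, 3);
  // Expert includes fallback handling when nothing relevant survives
  if (finalChunks.length === 0) {
    return await fallbackToGeneralSupport(query);
  }
  return finalChunks;
}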
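To make the walkthrough runnable end to end, the sketch below replaces `vectorDb.query`, `reranker.rank`, and `fallbackToGeneralSupport` with in-memory stand-ins. The cosine-similarity search is standard, but the "reranker" here is a plain lexical-overlap scorer used only as a placeholder for a real cross-encoder or hosted rerank API; all names are hypothetical.

```typescript
interface Doc { id: string; text: string; embedding: number[] }
interface Hit { doc: Doc; score: number }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// In-memory stand-in for vectorDb.query: threshold, then top-k by similarity.
function queryVectors(docs: Doc[], q: number[], topK: number, threshold: number): Hit[] {
  return docs
    .map((doc) => ({ doc, score: cosine(doc.embedding, q) }))
    .filter((h) => h.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Placeholder reranker: counts query terms appearing in each chunk.
function rerank(query: string, hits: Hit[]): Hit[] {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
  const overlap = (text: string) =>
    text.toLowerCase().split(/\W+/).filter((t) => terms.has(t)).length;
  return hits
    .map((h) => ({ doc: h.doc, score: overlap(h.doc.text) }))
    .sort((a, b) => b.score - a.score);
}

type Retrieval = { kind: "context"; chunks: Doc[] } | { kind: "fallback" };

// Over-fetch 20 at threshold 0.7, rerank, keep 3, else signal fallback.
function retrieve(query: string, queryEmbedding: number[], docs: Doc[]): Retrieval {
  const raw = queryVectors(docs, queryEmbedding, 20, 0.7);
  const final = rerank(query, raw).slice(0, 3);
  if (final.length === 0) return { kind: "fallback" };
  return { kind: "context", chunks: final.map((h) => h.doc) };
}
```

Returning a tagged result instead of calling the escalation path directly keeps retrieval testable and lets the caller decide how to hand off to general support.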
Step 3: Performance Optimization Discovery
Step 4: Failure Scenario Handling
Final Architecture: Pinecone + local reranker + agent escalation = 89% automation rate at 2.1s P95 latency
Do NOT use this skill for:
Prompt Engineering Tasks → Use prompt-engineer instead
ML Model Training/Fine-tuning → Use ml-engineer instead
Data Pipeline Engineering → Use data-pipeline-engineer instead
Infrastructure/DevOps → Use backend-architect instead
Analytics and Monitoring Setup → Use chatbot-analytics instead
Delegate When:
ml-engineer, prompt-engineer, backend-architect, chatbot-analytics