.claude/skills/llm-app-patterns/SKILL.md
Provides architectural patterns for LLM-powered applications including prompt engineering, RAG, agent loops, and evaluation. Use when building LLM-based features or when the user mentions LLM app architecture, prompt design, or AI system patterns.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add tranhieutt/software_development_department llm-app-patterns
| Pattern | Use when | Cost |
|---|---|---|
| Simple RAG | FAQ, docs Q&A | Low |
| Hybrid RAG (semantic + BM25) | Mixed query types | Medium |
| Function calling | Structured tool use | Low |
| ReAct agent | Multi-step reasoning | Medium |
| Plan-and-execute | Complex decomposable tasks | High |
| Multi-agent | Research, critique-refine | Very High |
CHUNK_CONFIG = {
    "chunk_size": 512,     # tokens; a reasonable starting point for most docs
    "chunk_overlap": 50,   # tokens shared by neighboring chunks, prevents context loss at boundaries
    "separators": ["\n\n", "\n", ". ", " "],
}
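To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It assumes the text has already been tokenized into a list and it ignores the `separators` setting entirely; `chunk_tokens` is a hypothetical helper, not part of any library.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Placeholder tokens stand in for real tokenizer output (an assumption of this sketch).
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last 50 tokens of one chunk reappear as the first 50 of the next.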
# Basic: semantic search
results = vector_db.similarity_search(embed(query), top_k=5)

# Better: hybrid (semantic + keyword, merged via reciprocal rank fusion)
# alpha weights the two rankings: 1.0 = semantic only, 0.0 = BM25 only, 0.5 = balanced
def hybrid_search(query, alpha=0.5):
    semantic = vector_db.similarity_search(embed(query), top_k=5)
    return rrf_merge(semantic, bm25_search(query), alpha)

# Best for recall: multi-query (3 variations, flattened then deduplicated)
queries = llm.generate_variations(query, n=3)
results = deduplicate([doc for q in queries for doc in semantic_search(q)])
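The `rrf_merge` helper is not defined above, so here is one possible sketch of it. It assumes both inputs are ranked lists of document ids, best first, and it applies `alpha` as a weight on each list's reciprocal-rank score. Standard RRF is unweighted; the weighting and the constant `k=60` (a commonly used default) are assumptions of this example.

```python
def rrf_merge(semantic_ids, keyword_ids, alpha=0.5, k=60):
    """Weighted reciprocal rank fusion over two ranked lists of document ids."""
    scores = {}
    for weight, ranking in ((alpha, semantic_ids), (1.0 - alpha, keyword_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document scores weight / (k + rank) for each list it appears in.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties are broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


merged = rrf_merge(["a", "b", "c"], ["b", "d", "a"])
print(merged)
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.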
RAG_PROMPT = """Answer based ONLY on the context below.
If insufficient, say "I don't have enough information."
Context: {context}
Question: {question}
Answer:"""
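A short sketch of how retrieved chunks might be packed into this template. The numbering scheme, the `max_chars` budget, and the `build_rag_prompt` name are illustrative assumptions; the template string is copied from above so the block runs on its own.

```python
RAG_PROMPT = """Answer based ONLY on the context below.
If insufficient, say "I don't have enough information."
Context: {context}
Question: {question}
Answer:"""


def build_rag_prompt(question, chunks, max_chars=4000):
    """Join retrieved chunks into a numbered context, stopping at a character budget."""
    parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # drop lower-ranked chunks rather than overflow the budget
        parts.append(entry)
        used += len(entry)
    return RAG_PROMPT.format(context="\n".join(parts), question=question)


prompt = build_rag_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which passage supported its answer.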
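def run_tool_loop(question, max_steps=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):  # cap iterations so a confused model cannot loop forever
        response = llm.chat(messages=messages, tools=TOOLS, tool_choice="auto")
        if not response.tool_calls:
            return response.content
        # Record the assistant turn so each tool result has a matching call in the history
        messages.append({"role": "assistant", "content": response.content, "tool_calls": response.tool_calls})
        for call in response.tool_calls:
            result = execute_tool(call.name, call.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    raise RuntimeError(f"No final answer after {max_steps} tool rounds")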
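The loop relies on an `execute_tool` function that is not shown. One way it could look is a registry lookup that returns errors as text, so the model sees the failure and can retry. The registry, the `get_weather` stub, and the JSON-string argument format are all assumptions of this sketch.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool for illustration; a real tool would call an external service."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool(name, arguments):
    """Dispatch a tool call and report failures as text instead of raising."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        # Arguments may arrive as a JSON string or as an already-parsed dict.
        args = json.loads(arguments) if isinstance(arguments, str) else arguments
        return func(**args)
    except Exception as exc:  # surface bad JSON or bad arguments to the model
        return f"Error: {type(exc).__name__}: {exc}"


print(execute_tool("get_weather", '{"city": "Hanoi"}'))
```

Returning the error string keeps the conversation going, whereas an uncaught exception would abort the whole loop.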
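import hashlib
import json

def get_or_generate(prompt, model, **kwargs):
    # Only cache deterministic calls; sampled outputs differ from run to run
    deterministic = kwargs.get("temperature", 1.0) == 0
    if deterministic:
        raw = f"{model}:{prompt}:{json.dumps(kwargs, sort_keys=True)}"
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if (cached := redis.get(key)) is not None:
            return cached  # bytes unless the client was created with decode_responses=True
    response = llm.generate(prompt, model=model, **kwargs)
    if deterministic:
        redis.setex(key, 3600, response)  # expire after 1 hour
    return response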
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=1, min=4, max=60), stop=stop_after_attempt(5))
def call_llm(prompt):
    return llm.generate(prompt)

# Fallback chain: try each model in order, re-raise the last error if all fail
def generate_with_fallback(prompt, primary, fallbacks):
    last_error = None
    for model in [primary] + fallbacks:
        try:
            return llm.generate(prompt, model=model)
        except (RateLimitError, APIError) as exc:
            last_error = exc
    raise last_error
Latency : p50, p99 response time
Quality : satisfaction (thumbs), task completion %, hallucination rate
Cost : cost_per_request, tokens_per_request, cache_hit_rate
Health : error_rate, timeout_rate, retry_rate
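As a rough illustration of how some of these metrics could be computed from per-request logs, the sketch below assumes each log record is a dict with `latency_ms`, `tokens`, `error`, and `cache_hit` fields. The field names and the nearest-rank percentile method are assumptions, not something the skill prescribes.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests):
    """Aggregate per-request records into latency, cost, and health metrics."""
    n = len(requests)
    latencies = [r["latency_ms"] for r in requests]
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "error_rate": sum(r["error"] for r in requests) / n,
        "cache_hit_rate": sum(r["cache_hit"] for r in requests) / n,
        "tokens_per_request": sum(r["tokens"] for r in requests) / n,
    }


# Synthetic log: 100 requests, every 10th fails, every 4th is served from cache.
log = [
    {"latency_ms": i + 1, "tokens": 200, "error": i % 10 == 0, "cache_hit": i % 4 == 0}
    for i in range(100)
]
summary = summarize(log)
print(summary)
```

Tracking p99 alongside p50 matters because slow retries and long generations tend to hide in the tail.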
| Model | Dims | Cost (per 1M tokens) | Use |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Most cases |
| text-embedding-3-large | 3072 | $0.13 | High accuracy |
| bge-large (local) | 1024 | Free | Self-hosted |
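For a sense of scale, the prices in the table can be turned into a one-off indexing estimate. The corpus size below (100,000 chunks of 512 tokens) is a made-up example, and the prices are copied from the table and may change. "Free" for the local model ignores hardware cost.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "bge-large": 0.0,  # self-hosted: no per-token fee, compute not counted here
}


def embedding_cost(num_chunks, tokens_per_chunk, model):
    """Estimate the API cost in USD of embedding a corpus once."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]


for model in PRICE_PER_1M_TOKENS:
    print(model, round(embedding_cost(100_000, 512, model), 2))
```

At these rates the embedding bill for indexing is usually small next to generation cost, so model choice tends to hinge on retrieval quality and vector storage size (3072 dimensions take twice the space of 1536).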