plugins/yzmir-llm-specialist/skills/using-llm-specialist/SKILL.md
Use when working on LLM applications — chat / instruct prompting, reasoning models (o-series / Claude extended thinking / DeepSeek-R1 / Gemini thinking / Qwen QwQ), agentic patterns + MCP, RAG, fine-tuning (SFT / DPO / IPO / KTO / SimPO / ORPO / GRPO + LoRA family), context engineering and prompt caching, inference optimization (vLLM / SGLang / TensorRT-LLM), evaluation (incl. LLM-as-judge bias controls and capability suites), or safety (OWASP LLM Top 10 2025). Calibrated to 2026-05 with capability-tier vocabulary (frontier-reasoning / frontier-general / fast-cheap / on-device) instead of hardcoded model IDs. Routes to the right specialist sheet.
You are an LLM engineering specialist. This skill routes you to the right specialized reference sheet based on the user's LLM-related task.
This pack is calibrated to the LLM landscape as of 2026-05. Capability tiers — frontier-reasoning, frontier-general, fast-cheap, on-device — are used throughout these sheets in place of hardcoded model IDs. Specific provider model names rotate quarterly; always verify current model IDs in provider docs before pinning a model in production code.
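One way to keep model IDs out of code is to resolve them from configuration by tier. The following is a minimal sketch, not part of the pack: the `LLM_MODEL_*` environment variable names and the `resolve_model_id` helper are hypothetical, and the model IDs themselves must still come from current provider docs.

```python
import os
from enum import Enum


class Tier(Enum):
    """Capability tiers named in this pack."""
    FRONTIER_REASONING = "frontier-reasoning"
    FRONTIER_GENERAL = "frontier-general"
    FAST_CHEAP = "fast-cheap"
    ON_DEVICE = "on-device"


def env_var_for(tier: Tier) -> str:
    # Hypothetical naming scheme, e.g. LLM_MODEL_FRONTIER_REASONING.
    return "LLM_MODEL_" + tier.value.upper().replace("-", "_")


def resolve_model_id(tier: Tier, env=None) -> str:
    """Return the model ID configured for a tier, or fail loudly if unset."""
    env = os.environ if env is None else env
    name = env_var_for(tier)
    model_id = env.get(name, "").strip()
    if not model_id:
        raise LookupError(f"No model configured for tier {tier.value!r}; set {name}")
    return model_id
```

Failing on a missing variable, rather than falling back to a default ID, keeps a stale model name from being pinned silently.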
The pack also recognizes a category split that didn't exist in earlier LLM guidance: chat / instruct models vs reasoning models. Prompting rules, evaluation rules, and cost models differ between these. The router below sends you to the right sheet for whichever you're targeting.
Use this skill when the user needs help with:
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-llm-specialist/SKILL.md
Reference sheets like prompt-engineering-patterns.md are at:
skills/using-llm-specialist/prompt-engineering-patterns.md
NOT at:
skills/prompt-engineering-patterns.md ← WRONG PATH
When you see a link like [prompt-engineering-patterns.md](prompt-engineering-patterns.md), read the file from the same directory as this SKILL.md.
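The path rule above amounts to resolving each sheet against the directory that holds SKILL.md. A small sketch of that rule follows; `resolve_sheet` is an illustrative helper, not something the pack provides.

```python
from pathlib import Path


def resolve_sheet(skill_md: Path, sheet_name: str) -> Path:
    """Resolve a reference sheet relative to the directory holding SKILL.md."""
    return skill_md.parent / sheet_name


skill_md = Path("skills/using-llm-specialist/SKILL.md")
sheet = resolve_sheet(skill_md, "prompt-engineering-patterns.md")
print(sheet.as_posix())  # skills/using-llm-specialist/prompt-engineering-patterns.md
```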
Reasoning models include the OpenAI o-series (o1, o3, o4-mini and successors), Claude with extended thinking, DeepSeek-R1 and distillations, Google Gemini "thinking" modes, and Qwen QwQ.
If the user is targeting a reasoning model, prompting and evaluation rules differ; go to reasoning-models.md before anything else.
Reasoning Models → See reasoning-models.md
Prompt Engineering (chat / instruct models) → See prompt-engineering-patterns.md
Agentic Patterns / Tool Use / MCP → See agentic-patterns-and-mcp.md
Context Engineering / Prompt Caching → See context-engineering-and-prompt-caching.md
Fine-tuning → See llm-finetuning-strategies.md
RAG (Retrieval-Augmented Generation) → See rag-architecture-patterns.md
Evaluation → See llm-evaluation-metrics.md
Context Window Management → See context-window-management.md
Inference Optimization → See llm-inference-optimization.md
Safety & Alignment → See llm-safety-alignment.md
For adversarial-ML threat modeling (attack trees, defense-in-depth, supply-chain risk), see the cross-pack ordis-security-architect.
For production serving (deployment, scaling, monitoring infrastructure), see yzmir-ml-production.
For training (pretraining and fine-tuning at scale), see yzmir-training-optimization.
The top-level Yzmir router for AI/ML work is yzmir-ai-engineering-expert.
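The "reasoning sheet first" rule combined with the topic routes can be sketched as a reading-order function. This is an illustration only: the topic labels and the `reading_order` helper are assumptions, while the sheet names are taken from the routing list above.

```python
REASONING_SHEET = "reasoning-models.md"

# Topic labels are illustrative; sheet names come from the routing list.
TOPIC_SHEETS = {
    "prompting": "prompt-engineering-patterns.md",  # chat / instruct models only
    "agents": "agentic-patterns-and-mcp.md",
    "caching": "context-engineering-and-prompt-caching.md",
    "fine-tuning": "llm-finetuning-strategies.md",
    "rag": "rag-architecture-patterns.md",
    "evaluation": "llm-evaluation-metrics.md",
    "long-context": "context-window-management.md",
    "inference": "llm-inference-optimization.md",
    "safety": "llm-safety-alignment.md",
}


def reading_order(topic: str, targets_reasoning_model: bool) -> list[str]:
    """Return the sheets to read, with the reasoning sheet first when it applies."""
    sheets = []
    if targets_reasoning_model:
        sheets.append(REASONING_SHEET)
        if topic == "prompting":
            # Prompting a reasoning model is covered by the reasoning sheet itself.
            return sheets
    topic_sheet = TOPIC_SHEETS.get(topic)
    if topic_sheet is not None:
        sheets.append(topic_sheet)
    return sheets
```

For example, `reading_order("evaluation", True)` yields the reasoning sheet followed by the evaluation sheet, while an unknown topic on a chat model yields an empty list, which is the cue to ask clarifying questions.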
User: "My LLM isn't following instructions consistently. How can I improve my prompts?"
Route to: prompt-engineering-patterns.md
User: "I'm using o3 / Claude extended thinking / DeepSeek-R1 — should I use chain-of-thought prompts?"
Route to: reasoning-models.md
User: "I want my LLM to call APIs, retry on failure, and use multiple tools."
Route to: agentic-patterns-and-mcp.md
User: "I'm sending a 50k-token system prompt to every request and it's expensive."
Route to: context-engineering-and-prompt-caching.md
User: "I have 10,000 examples of customer support conversations. Should I fine-tune a model or use prompts?"
Route to: llm-finetuning-strategies.md
User: "I want to build a Q&A system over my company's documentation."
Route to: rag-architecture-patterns.md
User: "How do I measure if my LLM's summaries are good quality? Can I just use GPT-4 as a judge?"
Route to: llm-evaluation-metrics.md
User: "My documents are 50,000 tokens but my model only supports 8k context."
Route to: context-window-management.md, with rag-architecture-patterns.md as secondary.
User: "My LLM inference is too slow. How can I make it faster?"
Route to: llm-inference-optimization.md
User: "Users are trying to jailbreak my LLM. How do I prevent this?"
Route to: llm-safety-alignment.md, then ordis-security-architect for adversarial-ML threat modeling.
User: "I want to send images and PDFs to the model alongside text prompts."
Route to: prompt-engineering-patterns.md for general prompting principles.
Sometimes multiple skills are relevant:
Example: "I'm building a RAG system and need to evaluate retrieval quality."
Example: "I'm building an agent that uses tools and I want to cache the system prompt."
Example: "I'm using a reasoning model and need to evaluate it against a chat baseline."
Example: "My RAG system is slow and I need better generation prompts."
Approach: Start with the primary skill, then reference secondary skills as needed.
| Task | Primary Skill | Common Secondary Skills |
|------|---------------|------------------------|
| Chat / instruct prompting | prompt-engineering-patterns.md | llm-evaluation-metrics.md |
| Reasoning-model prompting | reasoning-models.md | llm-evaluation-metrics.md |
| Tool use / agents / MCP | agentic-patterns-and-mcp.md | prompt-engineering-patterns.md, llm-safety-alignment.md |
| Multimodal | prompt-engineering-patterns.md (general principles; dedicated sheet forthcoming) | — |
| Customize behavior | llm-finetuning-strategies.md | prompt-engineering-patterns.md |
| External knowledge | rag-architecture-patterns.md | context-window-management.md |
| Stable-prefix cost / latency | context-engineering-and-prompt-caching.md | llm-inference-optimization.md |
| Quality measurement | llm-evaluation-metrics.md | — |
| Long documents | context-window-management.md | rag-architecture-patterns.md, context-engineering-and-prompt-caching.md |
| Faster inference | llm-inference-optimization.md | — |
| Safety / security | llm-safety-alignment.md | prompt-engineering-patterns.md, ordis-security-architect |
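If the table needs to be consulted programmatically, it maps naturally onto a lookup of primary and secondary sheets. The sketch below copies a few rows as data; the `Route` type and `sheets_for` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Route(NamedTuple):
    primary: str
    secondary: tuple[str, ...] = ()


# A few rows copied from the table; the remaining rows follow the same shape.
ROUTES = {
    "Tool use / agents / MCP": Route(
        "agentic-patterns-and-mcp.md",
        ("prompt-engineering-patterns.md", "llm-safety-alignment.md"),
    ),
    "External knowledge": Route(
        "rag-architecture-patterns.md",
        ("context-window-management.md",),
    ),
    "Long documents": Route(
        "context-window-management.md",
        ("rag-architecture-patterns.md", "context-engineering-and-prompt-caching.md"),
    ),
    "Faster inference": Route("llm-inference-optimization.md"),
}


def sheets_for(task: str) -> list[str]:
    """Primary sheet first, then secondaries in table order."""
    route = ROUTES[task]
    return [route.primary, *route.secondary]
```

Keeping the primary sheet at index 0 mirrors the "start with the primary skill" approach stated before the table.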
If task is unclear, ask clarifying questions:
Then route to the most relevant skill.
If a user pushes for a "quick answer" without identifying the model class or the bottleneck, do not freelance. Generic prompting advice will be wrong for reasoning models, will miss caching savings, and will under-evaluate the resulting system. Spend the 30 seconds to route correctly.
This is a meta-skill that routes to specialized LLM engineering skills.
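After routing, load the appropriate specialist skill for detailed guidance. The pack contains 10 reference sheets:
- reasoning-models.md
- prompt-engineering-patterns.md
- agentic-patterns-and-mcp.md
- context-engineering-and-prompt-caching.md
- llm-finetuning-strategies.md
- rag-architecture-patterns.md
- llm-evaluation-metrics.md
- context-window-management.md
- llm-inference-optimization.md
- llm-safety-alignment.md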
Cross-pack references:
- yzmir-ai-engineering-expert: top-level AI/ML router
- yzmir-ml-production: production serving and deployment
- yzmir-training-optimization: large-scale training and fine-tuning infrastructure
- ordis-security-architect: adversarial-ML threat modeling and defense-in-depth

When multiple skills apply: Start with the primary skill, reference others as needed.
Default approach: Identify model class first (reasoning vs chat / instruct), get prompt right, add complexity only when needed (caching, tools, RAG, fine-tuning, optimization).