engineering/ai-ml-engineering/skills/prompt-engineering/SKILL.md
This skill should be used when the user asks about "prompt engineering", "prompt design", "system prompt", "few-shot examples", "chain of thought", "CoT", "zero-shot", "one-shot", "few-shot", "prompt optimization", "prompt iteration", "improve accuracy of LLM", "make the model follow instructions", "structured output", "JSON mode", "function calling", "tool calling", "prompt template", "LangChain", "LlamaIndex", "prompt injection", "jailbreak defense", "RAG prompt", "retrieval augmented generation prompt", "evaluation of prompts", or "why is the LLM not doing what I want". Also trigger for "my prompt gives inconsistent results", "the model ignores my instructions", or "how do I write a better system prompt".
npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library prompt-engineeringInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design, iterate, and optimize prompts for large language models to achieve consistent, accurate, and production-ready results.
LLMs respond to precise, imperative instructions. Vague requests produce vague results.
| Vague | Specific | |-------|---------| | "Summarize this" | "Summarize this in exactly 3 bullet points, each under 20 words, focusing on action items" | | "Extract information" | "Extract the customer name, order ID, and complaint category. Return ONLY a JSON object matching the schema below" | | "Write a response" | "Write a professional 3-sentence email response acknowledging the complaint, apologizing without admitting liability, and offering a call within 24 hours" |
The model reads top-to-bottom. Structure matters:
Primacy effect: Instructions at the beginning set the frame. Recency effect: Instructions at the end are freshest in context.
Critical constraints → put at END, after the task description. Role and context → put at BEGINNING.
When precision matters, show don't tell. Three well-chosen examples outperform a paragraph of description.
ROLE: You are [specific expert role with domain and context].
TASK: [One paragraph describing the exact task, in imperative voice.]
CONTEXT: [Any background the model needs. Remove anything the model can infer.]
RULES:
1. [Hard constraint — frame as prohibition or requirement]
2. [Hard constraint]
3. [Hard constraint]
OUTPUT FORMAT:
[Exact format. Use code block for JSON schemas or templates.]
EXAMPLES:
Input: [example input 1]
Output: [example output 1]
Input: [example input 2]
Output: [example output 2]
Now process the following:
[{{USER_INPUT}}]
A well-designed role activates relevant knowledge and sets tone.
Format: You are [job title] at [organization type] with [years] of experience [specialization]. You [key behavioral trait].
Examples:
// Generic (bad)
You are a helpful assistant.
// Specific (good)
You are a senior backend engineer at a high-traffic SaaS company with 10 years of experience
in distributed systems. You prioritize correctness and operational safety over cleverness.
// For a support agent
You are a customer support specialist for a B2B software company. You are empathetic, clear,
and solution-oriented. You never promise what you can't deliver and always set accurate expectations.
// For document analysis
You are a paralegal with expertise in commercial contract review. You are precise, cite specific
clauses when relevant, and flag ambiguity rather than guessing intent.
For complex reasoning tasks (math, logic, multi-step analysis), explicit reasoning dramatically improves accuracy.
Simply adding "Think step by step" at the end of a prompt improves reasoning:
User: James has 3 apples. He gives half to Sarah and then buys 4 more. How many does he have?
Without CoT: 5 (often wrong on harder problems)
With CoT prompt:
"Think through this step by step before giving your answer."
Model response:
Step 1: James starts with 3 apples.
Step 2: He gives half to Sarah: 3 / 2 = 1.5. Since we're dealing with whole apples, assume 1 or 2...
[correct reasoning follows]
Show the reasoning pattern, not just the answer:
Example 1:
Question: If a store offers 20% off and then an additional 10% off, what's the total discount?
Reasoning:
- First discount: 100% × 0.80 = 80% of original price
- Second discount: 80% × 0.90 = 72% of original price
- Total: paid 72%, so discount is 28%
Answer: 28% (not 30% — discounts don't add, they compound)
Example 2:
Question: [new question]
Reasoning:
Force structured output for machine-readable responses.
Extract the following from the customer complaint and return ONLY a valid JSON object.
Do not include any text, explanation, or markdown outside the JSON.
Schema:
{
"customer_name": string | null,
"order_id": string | null,
"issue_category": "billing" | "shipping" | "product" | "account" | "other",
"sentiment": "angry" | "frustrated" | "neutral" | "satisfied",
"urgency": 1 | 2 | 3 | 4 | 5,
"key_complaint": string // One sentence
}
Complaint: {{COMPLAINT_TEXT}}
# OpenAI function calling
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": complaint_text}],
tools=[{
"type": "function",
"function": {
"name": "extract_complaint",
"description": "Extract structured data from a customer complaint",
"parameters": {
"type": "object",
"properties": {
"customer_name": {"type": "string"},
"issue_category": {"type": "string", "enum": ["billing", "shipping", "product", "account", "other"]},
"urgency": {"type": "integer", "minimum": 1, "maximum": 5},
},
"required": ["issue_category", "urgency"],
}
}
}],
tool_choice={"type": "function", "function": {"name": "extract_complaint"}},
)
Function calling is more reliable than asking for JSON in the prompt — the model is trained specifically for this format.
For RAG systems, design the prompt to use retrieved context faithfully:
You are a support agent for Acme Corp. Answer the customer's question using ONLY the information
in the provided context. Do not use outside knowledge.
Rules:
1. If the answer is in the context, answer directly and cite the source section.
2. If the answer is NOT in the context, say: "I don't have information about that in our documentation. Let me connect you with a specialist."
3. Never guess or make up product details.
4. Keep answers under 150 words unless the question requires more detail.
CONTEXT:
{{RETRIEVED_CHUNKS}}
CUSTOMER QUESTION:
{{QUESTION}}
Common RAG prompt mistakes:
Users can try to override your system prompt with inputs like "Ignore previous instructions and..."
Defensive techniques:
// Sandwich defense — repeat critical rules after the user input
System: You are a customer support agent for Acme Corp. Only answer questions about Acme products.
User input goes here.
IMPORTANT REMINDER: You may only discuss Acme Corp products and services.
If the user asks about anything else, politely redirect.
// Input sanitization in code
def sanitize_user_input(text: str) -> str:
# Encode potentially dangerous phrases
dangerous_patterns = [
"ignore previous instructions",
"ignore all instructions",
"disregard your",
"you are now",
"pretend you are",
]
for pattern in dangerous_patterns:
if pattern.lower() in text.lower():
return "[Input contained policy-violating content and was blocked]"
return text
Evaluate prompts systematically, not by vibe:
def evaluate_prompt(prompt_template: str, test_cases: list[dict]) -> dict:
"""
test_cases: list of {"input": str, "expected_output": str, "criteria": list[str]}
"""
results = []
for case in test_cases:
filled_prompt = prompt_template.replace("{{INPUT}}", case["input"])
actual_output = call_llm(filled_prompt)
# Evaluate each criterion
case_results = {
"input": case["input"],
"output": actual_output,
"criteria": {}
}
for criterion in case["criteria"]:
# Use LLM-as-judge or hard-coded checks
passed = evaluate_criterion(actual_output, criterion, case["expected_output"])
case_results["criteria"][criterion] = passed
results.append(case_results)
# Aggregate
all_criteria = set(c for r in results for c in r["criteria"])
summary = {
c: sum(r["criteria"].get(c, False) for r in results) / len(results)
for c in all_criteria
}
return {"per_case": results, "aggregate": summary}
Key criteria to evaluate:
| Technique | Token savings | Quality impact | |-----------|--------------|----------------| | Shorter system prompt (remove redundancy) | 20–50% | Minimal if done carefully | | Few-shot → zero-shot (with CoT) | 30–70% | Test carefully | | Use smaller model for simpler tasks | N/A | Test accuracy threshold | | Prompt caching (Anthropic / OpenAI) | Up to 90% on repeated prefix | None | | Response caching for identical inputs | 100% on cache hit | None (verification required) | | Batch requests | Reduces overhead | None |
Always measure: cheaper prompts that produce wrong results cost more in the long run.
For an extensive prompt pattern catalog and optimization recipes, see:
references/prompt-patterns-catalog.md — 50+ annotated prompt patterns covering chain-of-thought, self-consistency, tool use, structured output, and domain-specific templatestesting
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.
tools
Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.
testing
This skill should be used when scientists need help with research problem selection, project ideation, troubleshooting stuck projects, or strategic scientific decisions. Use this skill when users ask to pitch a new research idea, work through a project problem, evaluate project risks, plan research strategy, navigate decision trees, or get help choosing what scientific problem to work on. Typical requests include "I have an idea for a project", "I'm stuck on my research", "help me evaluate this project", "what should I work on", or "I need strategic advice about my research".
development
Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.