skills/prompt-engineering/SKILL.md
Prompt templates, few-shot examples, chain-of-thought, structured output, evals
npx skillsauth add rubicanjr/FinCognis prompt-engineering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
SYSTEM_PROMPT = """You are a {role} specialized in {domain}.
## Task
{task_description}
## Rules
{numbered_rules}
## Output Format
{format_spec}
## Examples
{few_shot_examples}
"""
def build_few_shot_prompt(task: str, examples: list[dict], query: str) -> str:
    # Each example dict needs "input" and "output" keys.
    prompt = f"Task: {task}\n\n"
    for i, ex in enumerate(examples, 1):
        prompt += f"Example {i}:\nInput: {ex['input']}\nOutput: {ex['output']}\n\n"
    # End on "Output:" so the model continues in the demonstrated format.
    prompt += f"Now process:\nInput: {query}\nOutput:"
    return prompt

# Usage
examples = [
    {"input": "The food was great", "output": '{"sentiment": "positive", "confidence": 0.95}'},
    {"input": "Terrible service", "output": '{"sentiment": "negative", "confidence": 0.90}'},
    {"input": "It was okay", "output": '{"sentiment": "neutral", "confidence": 0.70}'},
]
prompt = build_few_shot_prompt("Classify sentiment as JSON", examples, "Really loved it!")
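With a chat-style API, the same examples can be passed as alternating user and assistant turns instead of one concatenated string. A sketch under that assumption; `build_few_shot_messages` is a hypothetical helper, and the `role`/`content` message shape follows the convention used by chat completion APIs.

```python
def build_few_shot_messages(task: str, examples: list[dict], query: str) -> list[dict]:
    """Return few-shot examples as chat messages, ending with the real query."""
    messages = [{"role": "system", "content": task}]
    for ex in examples:
        # Each example becomes one user turn and the assistant reply to imitate.
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    messages.append({"role": "user", "content": query})
    return messages


few_shot = [
    {"input": "The food was great", "output": '{"sentiment": "positive", "confidence": 0.95}'},
    {"input": "Terrible service", "output": '{"sentiment": "negative", "confidence": 0.90}'},
]
messages = build_few_shot_messages("Classify sentiment as JSON", few_shot, "Really loved it!")
for message in messages:
    print(message["role"], "->", message["content"])
```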
Analyze this code for security vulnerabilities.
Think step by step:
1. Identify all user inputs
2. Trace each input through the code
3. Check if any input reaches a sensitive operation without sanitization
4. For each vulnerability found, classify severity (critical/high/medium/low)
5. Suggest a fix for each vulnerability
Code:
{code}
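The chain-of-thought prompt above can be rendered from a list of steps, which keeps the reasoning steps editable in one place. A sketch only; `COT_TEMPLATE`, `STEPS`, `build_cot_prompt`, and the sample snippet are hypothetical names and data.

```python
from typing import Sequence

COT_TEMPLATE = (
    "Analyze this code for security vulnerabilities.\n"
    "Think step by step:\n"
    "{steps}\n"
    "Code:\n"
    "{code}"
)

STEPS = (
    "Identify all user inputs",
    "Trace each input through the code",
    "Check if any input reaches a sensitive operation without sanitization",
    "For each vulnerability found, classify severity (critical/high/medium/low)",
    "Suggest a fix for each vulnerability",
)


def build_cot_prompt(code: str, steps: Sequence[str] = STEPS) -> str:
    """Number the steps and insert the code under review."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    # str.format does not re-parse substituted values, so braces inside
    # `code` are inserted literally and cannot break the template.
    return COT_TEMPLATE.format(steps=numbered, code=code)


snippet = 'os.system(f"ping {host}")'  # hypothetical code under review
prompt = build_cot_prompt(snippet)
print(prompt)
```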
from collections import Counter

# `llm` (an async client with a `generate` method) and `extract_final_answer`
# are assumed to be defined elsewhere; neither is shown in this skill.
async def self_consistent_answer(question: str, n_paths: int = 5) -> str:
    answers = []
    for _ in range(n_paths):
        response = await llm.generate(
            f"Think step by step and answer: {question}\n\nFinal answer:",
            temperature=0.7,  # Higher temp for diversity
        )
        final = extract_final_answer(response)
        answers.append(final)
    # Majority vote over the extracted answers
    most_common = Counter(answers).most_common(1)[0][0]
    return most_common
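The self-consistency function relies on an `extract_final_answer` helper that is not shown. One possible version, together with a runnable vote over canned responses, is sketched below. `FakeLLM` is a hypothetical stand-in for a real client, the "Final answer:" marker is an assumed convention, and this variant takes the client as a parameter and samples the paths concurrently with `asyncio.gather`, which differs from the sequential loop above.

```python
import asyncio
import re
from collections import Counter


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'Final answer:' marker, normalized for voting."""
    matches = re.findall(r"final answer:\s*(.+)", response, flags=re.IGNORECASE)
    if matches:
        answer = matches[-1]
    else:
        # No marker: fall back to the last non-empty line, or "" for empty input.
        lines = response.strip().splitlines()
        answer = lines[-1] if lines else ""
    # Normalize so "51." and "51" count as the same vote.
    return answer.strip().rstrip(".").lower()


class FakeLLM:
    """Hypothetical stand-in for a model client: returns canned replies in order."""

    def __init__(self, responses: list[str]):
        self._responses = list(responses)
        self._index = 0

    async def generate(self, prompt: str, temperature: float = 0.0) -> str:
        response = self._responses[self._index % len(self._responses)]
        self._index += 1
        return response


async def self_consistent_answer(llm, question: str, n_paths: int = 5) -> str:
    prompt = f"Think step by step and answer: {question}\n\nFinal answer:"
    # Each reasoning path is independent, so the calls can run concurrently.
    responses = await asyncio.gather(
        *(llm.generate(prompt, temperature=0.7) for _ in range(n_paths))
    )
    votes = Counter(extract_final_answer(r) for r in responses)
    return votes.most_common(1)[0][0]


canned = [
    "17 * 3 = 51. Final answer: 51",
    "Final answer: 51.",
    "I think 17 * 3 is 41.\nFinal answer: 41",
    "final answer: 51",
    "Final answer: 51",
]
answer = asyncio.run(self_consistent_answer(FakeLLM(canned), "What is 17 * 3?"))
print(answer)
```

Normalizing answers before counting matters: without it, trivially different strings split the vote.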
from pydantic import BaseModel, Field
from openai import OpenAI

class CodeReview(BaseModel):
    # A bare dict leaves the shape of each issue unspecified; a nested model
    # per issue is usually a safer fit for strict structured output.
    issues: list[dict] = Field(description="List of issues found")
    severity: str = Field(description="Overall severity: low|medium|high|critical")
    summary: str = Field(description="One-line summary")
    suggestions: list[str] = Field(description="Improvement suggestions")

client = OpenAI()  # Reads OPENAI_API_KEY from the environment
# `code` is the source text to review, defined elsewhere.
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Review code and output structured analysis."},
        {"role": "user", "content": f"Review this code:\n```\n{code}\n```"},
    ],
    response_format=CodeReview,
)
review = response.choices[0].message.parsed  # CodeReview instance, or None if the model refused
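When a provider returns raw JSON text rather than a parsed object, the same shape can be checked by hand before use. A minimal sketch using only the standard library; `parse_review` and `ReviewError` are hypothetical names, and the field list mirrors the `CodeReview` model above.

```python
import json

SEVERITIES = {"low", "medium", "high", "critical"}
REQUIRED_FIELDS = {"issues": list, "severity": str, "summary": str, "suggestions": list}


class ReviewError(ValueError):
    """Raised when model output does not match the expected review shape."""


def parse_review(raw: str) -> dict:
    """Parse model output as JSON and check the fields a CodeReview would have."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ReviewError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ReviewError("top-level value must be a JSON object")
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            raise ReviewError(f"{key!r} is missing or not a {expected_type.__name__}")
    if data["severity"] not in SEVERITIES:
        raise ReviewError(f"unknown severity: {data['severity']!r}")
    return data


raw_output = '{"issues": [], "severity": "low", "summary": "Looks fine", "suggestions": []}'
review_data = parse_review(raw_output)
print(review_data["severity"], "-", review_data["summary"])
```

A failed check is a natural place to retry the request with the error message appended to the prompt.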
<task>Analyze the following error log and extract structured information.</task>
<rules>
- Extract timestamp, severity, service name, and error message
- Classify root cause category
- Output in the specified XML format
</rules>
<input>
{error_log}
</input>
<output_format>
<analysis>
<timestamp>ISO 8601</timestamp>
<severity>ERROR|WARN|FATAL</severity>
<service>service name</service>
<message>error message</message>
<root_cause>category</root_cause>
<suggested_fix>actionable fix</suggested_fix>
</analysis>
</output_format>
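Output that follows the XML format above can be read back with the standard library. A sketch, assuming the model wraps its answer in a single `<analysis>` element; `parse_analysis` and the sample reply are hypothetical.

```python
import xml.etree.ElementTree as ET

FIELDS = ("timestamp", "severity", "service", "message", "root_cause", "suggested_fix")


def parse_analysis(model_output: str) -> dict:
    """Extract the <analysis> block from a model reply and return its fields."""
    start = model_output.find("<analysis>")
    end = model_output.find("</analysis>")
    if start == -1 or end == -1:
        raise ValueError("no <analysis> block found")
    # Slice out just the block so surrounding chatter does not break parsing.
    # ET.fromstring raises ParseError if the block itself is malformed,
    # for example when it contains an unescaped "&".
    root = ET.fromstring(model_output[start:end + len("</analysis>")])
    # Missing child elements come back as empty strings.
    return {name: (root.findtext(name) or "").strip() for name in FIELDS}


# Hypothetical model reply used for illustration.
reply = """Here is the analysis:
<analysis>
<timestamp>2024-05-01T12:30:00Z</timestamp>
<severity>ERROR</severity>
<service>payments</service>
<message>Connection refused</message>
<root_cause>network</root_cause>
<suggested_fix>Check the database host and port</suggested_fix>
</analysis>"""
result = parse_analysis(reply)
print(result)
```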
import json

class PromptEvaluator:
    def __init__(self, test_cases: list[dict]):
        self.test_cases = test_cases  # [{"input": ..., "expected": ..., "criteria": ...}]

    async def evaluate(self, prompt_template: str) -> dict:
        # `llm` is assumed to be an async client defined elsewhere.
        results = []
        for case in self.test_cases:
            prompt = prompt_template.format(**case["input"])
            output = await llm.generate(prompt)
            score = self._score(output, case["expected"], case.get("criteria", {}))
            results.append({"input": case["input"], "output": output, "score": score})
        return {
            "avg_score": sum(r["score"] for r in results) / len(results),
            "pass_rate": sum(1 for r in results if r["score"] >= 0.8) / len(results),
            "failures": [r for r in results if r["score"] < 0.8],
        }

    def _score(self, output: str, expected: str, criteria: dict) -> float:
        # `expected` is accepted but not used by these criteria checks.
        scores = []
        if "contains" in criteria:
            scores.append(1.0 if criteria["contains"] in output else 0.0)
        if "format" in criteria:
            scores.append(1.0 if self._check_format(output, criteria["format"]) else 0.0)
        if "max_length" in criteria:
            scores.append(1.0 if len(output) <= criteria["max_length"] else 0.0)
        # No criteria given: return a neutral 0.5 rather than a pass or a fail.
        return sum(scores) / len(scores) if scores else 0.5

    def _check_format(self, output: str, fmt: str) -> bool:
        # Not defined in the original; assumed here to support only "json".
        if fmt == "json":
            try:
                json.loads(output)
                return True
            except json.JSONDecodeError:
                return False
        return False
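To show what a test case and the resulting report look like, here is a compact, self-contained run of the same criteria-based scoring against a stubbed model. `fake_generate`, `score_output`, `run_eval`, and the test cases are all hypothetical; a real run would call an actual model instead of the stub.

```python
import asyncio
import json


def is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def score_output(output: str, criteria: dict) -> float:
    """Average of pass/fail checks, using the same criteria keys as above."""
    checks = []
    if "contains" in criteria:
        checks.append(criteria["contains"] in output)
    if "format" in criteria:
        # Only "json" is supported in this sketch; other formats fail the check.
        checks.append(criteria["format"] == "json" and is_json(output))
    if "max_length" in criteria:
        checks.append(len(output) <= criteria["max_length"])
    return sum(checks) / len(checks) if checks else 0.5


async def fake_generate(prompt: str) -> str:
    """Stand-in for a model call: canned replies keyed on the prompt text."""
    if "great" in prompt:
        return '{"sentiment": "positive"}'
    return "I think this is negative."


async def run_eval(template: str, cases: list[dict], threshold: float = 0.8) -> dict:
    results = []
    for case in cases:
        output = await fake_generate(template.format(**case["input"]))
        score = score_output(output, case.get("criteria", {}))
        results.append({"input": case["input"], "output": output, "score": score})
    return {
        "avg_score": sum(r["score"] for r in results) / len(results),
        "pass_rate": sum(r["score"] >= threshold for r in results) / len(results),
        "failures": [r for r in results if r["score"] < threshold],
    }


cases = [
    {"input": {"text": "The food was great"},
     "criteria": {"contains": "positive", "format": "json", "max_length": 80}},
    {"input": {"text": "Terrible service"},
     "criteria": {"contains": "negative", "format": "json"}},
]
report = asyncio.run(run_eval("Classify sentiment as JSON: {text}", cases))
print(report["avg_score"], report["pass_rate"], len(report["failures"]))
```

The second case fails because the stubbed reply mentions "negative" but is not valid JSON, so it passes one of two checks and falls below the 0.8 threshold.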