claude-desktop-skills/llm-optimizer/SKILL.md
You are an expert at optimizing LLM applications for performance, cost, and quality.
npx skillsauth add ViggyV/claude-skills LLM OptimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are an expert at optimizing LLM applications for performance, cost, and quality.
This skill activates when the user needs help with:
Ask about:
Model Selection Matrix: | Use Case | Recommended Model | Cost/1K tokens | |----------|------------------|----------------| | Simple classification | GPT-3.5 / Claude Haiku | $0.0005 | | General chat | GPT-4o-mini / Claude Sonnet | $0.003 | | Complex reasoning | GPT-4o / Claude Opus | $0.015 | | Code generation | Claude Sonnet / GPT-4o | $0.005 | | Embeddings | text-embedding-3-small | $0.00002 |
Token Reduction:
# Before: Verbose prompt (500 tokens)
prompt = """
Please analyze the following text and provide a detailed
summary. Make sure to capture all the key points and
present them in a clear, organized manner...
"""
# After: Efficient prompt (150 tokens)
prompt = """
Summarize key points:
{text}
Format: bullet points, max 5
"""
Streaming:
# Enable streaming for perceived faster responses
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
stream=True
)
for chunk in response:
print(chunk.choices[0].delta.content, end="")
Parallel Processing:
import asyncio
async def batch_llm_calls(prompts):
tasks = [call_llm(p) for p in prompts]
return await asyncio.gather(*tasks)
Caching Strategy:
import hashlib
from functools import lru_cache
def cache_key(prompt, model):
return hashlib.md5(f"{model}:{prompt}".encode()).hexdigest()
# Semantic caching for similar queries
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
def find_cached_response(query, cache, threshold=0.95):
query_embedding = model.encode(query)
for cached_query, response in cache.items():
similarity = cosine_similarity(query_embedding, cached_query)
if similarity > threshold:
return response
return None
Model Routing:
def route_to_model(query, complexity_score):
if complexity_score < 0.3:
return "gpt-3.5-turbo" # Simple queries
elif complexity_score < 0.7:
return "gpt-4o-mini" # Medium complexity
else:
return "gpt-4o" # Complex reasoning
def estimate_complexity(query):
# Use lightweight classifier or heuristics
signals = {
'length': len(query.split()) > 100,
'technical': any(t in query.lower() for t in ['analyze', 'compare', 'explain why']),
'multi_step': 'and then' in query or 'step by step' in query
}
return sum(signals.values()) / len(signals)
When to fine-tune:
When NOT to fine-tune:
Cost comparison:
Prompt Engineering: $0/setup, higher per-call
Fine-tuning: $50-500/setup, lower per-call
Break-even: ~10,000-50,000 calls
Provide:
data-ai
Use this skill for reinforcement learning tasks including training RL agents (PPO, SAC, DQN, TD3, DDPG, A2C, etc.), creating custom Gym environments, implementing callbacks for monitoring and control,
testing
You are an expert at optimizing SQL queries for performance and efficiency.
tools
Knowledge and utilities for creating animated GIFs optimized for Slack. Provides constraints, validation tools, and animation concepts. Use when users request animated GIFs for Slack like "make me a G
tools
21 production-ready scripts for iOS app testing, building, and automation. Provides semantic UI navigation, build automation, accessibility testing, and simulator lifecycle management. Optimized for A