SKILLS/agent-orchestration-improve-agent/SKILL.md
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
npx skillsauth add pinkpixel-dev/skills-collection-1 agent-orchestration-improve-agentInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
[Extended thinking: Agent optimization requires a data-driven approach combining performance metrics, user feedback analysis, and advanced prompt engineering techniques. Success depends on systematic evaluation, targeted improvements, and rigorous testing with rollback capabilities for production safety.]
Comprehensive analysis of agent performance using context-manager for historical data collection.
Use: context-manager
Command: analyze-agent-performance $ARGUMENTS --days 30
Collect metrics including:
Identify recurring patterns in user interactions:
Categorize failures by root cause:
Generate quantitative baseline metrics:
Performance Baseline:
- Task Success Rate: [X%]
- Average Corrections per Task: [Y]
- Tool Call Efficiency: [Z%]
- User Satisfaction Score: [1-10]
- Average Response Latency: [Xms]
- Token Efficiency Ratio: [X:Y]
Apply advanced prompt optimization techniques using prompt-engineer agent.
Implement structured reasoning patterns:
Use: prompt-engineer
Technique: chain-of-thought-optimization
Curate high-quality examples from successful interactions:
Example structure:
Good Example:
Input: [User request]
Reasoning: [Step-by-step thought process]
Output: [Successful response]
Why this works: [Key success factors]
Bad Example:
Input: [Similar request]
Output: [Failed response]
Why this fails: [Specific issues]
Correct approach: [Fixed version]
Strengthen agent identity and capabilities:
Implement self-correction mechanisms:
Constitutional Principles:
1. Verify factual accuracy before responding
2. Self-check for potential biases or harmful content
3. Validate output format matches requirements
4. Ensure response completeness
5. Maintain consistency with previous responses
Add critique-and-revise loops:
Optimize response structure:
Comprehensive testing framework with A/B comparison.
Create representative test scenarios:
Test Categories:
1. Golden path scenarios (common successful cases)
2. Previously failed tasks (regression testing)
3. Edge cases and corner scenarios
4. Stress tests (complex, multi-step tasks)
5. Adversarial inputs (potential breaking points)
6. Cross-domain tasks (combining capabilities)
Compare original vs improved agent:
Use: parallel-test-runner
Config:
- Agent A: Original version
- Agent B: Improved version
- Test set: 100 representative tasks
- Metrics: Success rate, speed, token usage
- Evaluation: Blind human review + automated scoring
Statistical significance testing:
Comprehensive scoring framework:
Task-Level Metrics:
Quality Metrics:
Performance Metrics:
Structured human review process:
Safe rollout with monitoring and rollback capabilities.
Systematic versioning strategy:
Version Format: agent-name-v[MAJOR].[MINOR].[PATCH]
Example: customer-support-v2.3.1
MAJOR: Significant capability changes
MINOR: Prompt improvements, new examples
PATCH: Bug fixes, minor adjustments
Maintain version history:
Progressive deployment strategy:
Quick recovery mechanism:
Rollback Triggers:
- Success rate drops >10% from baseline
- Critical errors increase >5%
- User complaints spike
- Cost per task increases >20%
- Safety violations detected
Rollback Process:
1. Detect issue via monitoring
2. Alert team immediately
3. Switch to previous stable version
4. Analyze root cause
5. Fix and re-test before retry
Real-time performance tracking:
Agent improvement is successful when:
After 30 days of production use:
Establish regular improvement cadence:
Remember: Agent optimization is an iterative process. Each cycle builds upon previous learnings, gradually improving performance while maintaining stability and safety.
testing
When the user wants a full ASO health audit, review their App Store listing quality, or diagnose why their app isn't ranking. Also use when the user mentions "ASO audit", "ASO score", "why am I not ranking", "listing review", or "optimize my app store page". For keyword-specific research, see keyword-research. For metadata writing, see metadata-optimization.
testing
Clarify requirements before implementing. Use when serious doubts arise.
tools
Complete reference and build guide for ASI:One (ASI1) — the AI platform by Fetch.ai built for agentic, Web3-native applications. Use this skill IMMEDIATELY and ALWAYS when the user mentions ASI1, ASI:One, Fetch.ai AI API, building with ASI1, integrating ASI:One, asking about ASI1 models, tool calling with ASI1, ASI1 image generation, ASI1 agentic LLM, Agentverse, uagents, Agent Chat Protocol, structured output with ASI1, or OpenAI-compatible wrappers for ASI1. Also trigger when the user says things like "use ASI1 instead of OpenAI", "build an app with ASI:One", "ASI1 API", or references docs.asi1.ai. This skill covers everything needed to build production apps - setup, all models, all API features, tool calling, image gen, agentic orchestration, structured data, session management, streaming, LangChain integration, uagents / Agent Chat Protocol, and TypeScript/Node.js patterns.
data-ai
When the user wants to analyze their own app's actual performance data from App Store Connect — real downloads, revenue, IAP, subscriptions, trials, or country breakdowns synced via Appeeky Connect. Use when the user asks about "my downloads", "my revenue", "how is my app performing", "ASC data", "sales and trends", "my subscription numbers", "App Store Connect metrics", or wants to compare periods or top markets. For third-party app estimates, see app-analytics. For subscription analytics depth, see monetization-strategy.