packages/skills/skills/evaluation/SKILL.md
# Evaluation Methods for Agent Systems

## Prerequisites

- Understanding of agent architectures
- Familiarity with evaluation metrics
Install this skill globally with one command: `npx skillsauth add mediar-ai/skillhubz packages/skills/skills/evaluation`. Works with Claude Code, Cursor, and Windsurf.
Evaluate agent systems with multi-dimensional rubrics accounting for non-determinism and multiple valid paths.
## Instructions

### Key Insight: Performance Drivers

Research found three factors explain 95% of performance variance:

| Factor | Variance | Implication |
|--------|----------|-------------|
| Token usage | 80% | More tokens = better performance |
| Tool calls | ~10% | More exploration helps |
| Model choice | ~5% | Better models multiply efficiency |
| Dimension | Measures |
|-----------|----------|
| Factual accuracy | Claims match ground truth |
| Completeness | Output covers requested aspects |
| Citation accuracy | Citations match sources |
| Source quality | Uses appropriate primary sources |
| Tool efficiency | Right tools, reasonable count |
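One way to turn the rubric dimensions into a single number is a weighted average of per-dimension scores. The sketch below is illustrative only: the dimension keys mirror the table, but the weights, the 0-to-1 scale, and the function name `weighted_rubric_score` are assumptions, not something the skill prescribes.

```python
# Dimension names mirror the rubric table; the weights are illustrative assumptions.
RUBRIC_WEIGHTS = {
    "factual_accuracy": 0.30,
    "completeness": 0.25,
    "citation_accuracy": 0.15,
    "source_quality": 0.15,
    "tool_efficiency": 0.15,
}


def weighted_rubric_score(scores, weights=None):
    """Combine per-dimension scores in [0, 1] into one weighted average."""
    weights = RUBRIC_WEIGHTS if weights is None else weights
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {scores[name]}")
    # Normalize by the total weight so the weights need not sum to exactly 1.
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical scores for one agent run.
example_scores = {
    "factual_accuracy": 0.9,
    "completeness": 0.8,
    "citation_accuracy": 1.0,
    "source_quality": 0.6,
    "tool_efficiency": 0.5,
}
print(round(weighted_rubric_score(example_scores), 3))
```

Keeping the per-dimension scores alongside the aggregate is usually worthwhile, since a single number hides which dimension caused a regression.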
Complexity Stratification:
LLM-as-Judge: Scalable, consistent judgments.
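A minimal sketch of an LLM-as-judge loop, under stated assumptions: the prompt wording, the JSON reply format, and the names `build_judge_prompt`, `judge_output`, and `fake_model` are all hypothetical. The model call is injected as a plain callable so that any client can be plugged in; the stub here stands in for a real model.

```python
import json
from typing import Callable


def build_judge_prompt(task: str, output: str, dimensions: list[str]) -> str:
    """Assemble a grading prompt that asks for one score per rubric dimension."""
    dimension_list = "\n".join(f"- {name}" for name in dimensions)
    return (
        "You are grading an agent's output against a rubric.\n"
        f"Task:\n{task}\n\n"
        f"Agent output:\n{output}\n\n"
        "Score each dimension from 0.0 to 1.0:\n"
        f"{dimension_list}\n\n"
        "Reply with a single JSON object mapping each dimension name to its score."
    )


def judge_output(
    task: str,
    output: str,
    dimensions: list[str],
    call_model: Callable[[str], str],
) -> dict[str, float]:
    """Ask the judge for scores and validate the reply before trusting it."""
    raw_reply = call_model(build_judge_prompt(task, output, dimensions))
    parsed = json.loads(raw_reply)
    scores = {}
    for name in dimensions:
        value = float(parsed[name])  # KeyError if the judge skipped a dimension
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score out of range: {value}")
        scores[name] = value
    return scores


def fake_model(prompt: str) -> str:
    """Stand-in for a real model client; always returns the same scores."""
    return json.dumps({"factual_accuracy": 0.9, "completeness": 0.7})


scores = judge_output(
    task="Summarize the release notes.",
    output="The release adds two features.",
    dimensions=["factual_accuracy", "completeness"],
    call_model=fake_model,
)
print(scores)
```

Validating the reply matters because a judge model can omit a dimension or return an out-of-range value; failing loudly keeps malformed judgments out of the aggregate.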
Human Evaluation: Catches what automation misses.
End-State Evaluation: For agents that mutate state.
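End-state evaluation can be sketched as comparing the state the agent left behind with the state the task required, ignoring the path taken. The example below assumes the state can be represented as a flat dictionary; `evaluate_end_state` and the field names in the usage example are hypothetical.

```python
from typing import Any


def evaluate_end_state(
    final_state: dict[str, Any], expected: dict[str, Any]
) -> dict[str, Any]:
    """Check only the keys the task cares about; extra keys in final_state are ignored."""
    mismatches = {}
    for key, want in expected.items():
        if key not in final_state:
            mismatches[key] = {"expected": want, "actual": "<missing>"}
        elif final_state[key] != want:
            mismatches[key] = {"expected": want, "actual": final_state[key]}
    return {"passed": not mismatches, "mismatches": mismatches}


# Hypothetical task: the agent should close a ticket and assign it to "ops".
expected_state = {"ticket_status": "closed", "assignee": "ops"}
state_after_run = {"ticket_status": "closed", "assignee": "dev", "comments": 3}

result = evaluate_end_state(state_after_run, expected_state)
print(result["passed"], result["mismatches"])
```

Because only the final state is checked, two runs that reach the same outcome through different tool-call sequences are scored identically, which fits agents with multiple valid paths.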
Source: muratcankoylan/Agent-Skills-for-Context-Engineering