
Live benchmark protocol for Switchboard's MCP server. Runs real tool-calling sequences against enabled integrations, tracks failure metrics, and identifies impediments to successful LLM tool usage. Use when: "benchmark", "test the MCP", "run user stories", "smoke test integrations", after adding/changing integrations or tools, after changing compaction specs or search logic, before releases. Not for unit testing (use make test) or load testing.
Improve an existing Switchboard integration adapter's LLM usability through tool description enrichment, field compaction refinement, and response tuning. Use when: "optimize integration", "improve tool descriptions", "extend compaction", "make integration better for LLMs", after user story mapping, or when an LLM is making wrong tool choices or passing wrong IDs. Not for adding new integrations (use add-integration) or fixing bugs.
Submit a PR review as inline GitHub comments on specific files and lines using the gh CLI.
Use when adding a new external API integration to Switchboard, scaffolding an integration adapter, or deciding between SDK vs raw HTTP for a new service. Not for modifying existing integrations or fixing bugs in current adapters.
Review a GitHub pull request for the Switchboard Go MCP server project. Enforces idiomatic Go, project conventions (hexagonal architecture, dispatch maps, port interfaces), test coverage, build/lint verification, and production readiness.
Cross-model search quality benchmark for Switchboard's tool discovery. Dispatches identical search scenarios to opus, sonnet, and haiku in parallel, compiles a comparison table, and identifies optimization opportunities. Use when: "benchmark search", "test search quality", "run search benchmark", after changing scoring logic, synonyms, stop words, IDF, or tool descriptions, after adding new integrations, or when evaluating Phase 2 tag impact. Also use when the user mentions "search hit rate", "search recall", or "did search get better/worse". Not for full MCP smoke tests (use mcp-benchmark) or unit testing (use make test).