ai-ml/dev-signal/.agent/skills/gcp-agent-eval-engine-runner/SKILL.md
Boilerplate for a production evaluation runner that performs parallel inference, captures reasoning traces via SSE, and integrates with the Vertex AI Gen AI Evaluation service.
npx skillsauth add googlecloudplatform/devrel-demos gcp-agent-eval-engine-runner
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides the "engine" for your automated evaluation pipeline. Grounded in evaluation_blog.md, it runs hundreds of parallel requests against a shadow revision and captures the full "Thinking Process" (reasoning trace) along the way.
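One way to bound the parallelism described above is asyncio.Semaphore. The following is a minimal sketch, not the code shipped in the skill's script: run_throttled, fake_agent, and max_concurrency are hypothetical names, and fake_agent stands in for the real HTTP call to the shadow revision.

```python
import asyncio


async def run_throttled(prompts, call_agent, max_concurrency=10):
    """Run call_agent over prompts with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt):
        # Each task waits here until one of the semaphore slots is free.
        async with semaphore:
            return await call_agent(prompt)

    # gather returns results in input order. With return_exceptions=True,
    # a failed call shows up as an exception object in the result list
    # instead of being raised on the first failure.
    return await asyncio.gather(
        *(guarded(p) for p in prompts), return_exceptions=True
    )


async def fake_agent(prompt):
    # Hypothetical stand-in for a request to the shadow revision.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


if __name__ == "__main__":
    results = asyncio.run(run_throttled(["q1", "q2", "q3"], fake_agent, max_concurrency=2))
    print(results)
```

Keeping the limit as a parameter makes it easy to tune against whatever capacity the shadow revision actually has.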
Ask Antigravity to:
- Use asyncio.Semaphore to throttle requests (preventing an accidental DDoS of the shadow service).
- Call the POST /run_sse endpoint to stream intermediate events.
- Add response and intermediate_events to the input dataset.
- Submit the results through the create_evaluation_run API.

Refer to scripts/evaluate_agent_boilerplate.py for the core implementation.
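The trace-capture step can be illustrated with a small parser for a Server-Sent Events stream. This is a sketch under stated assumptions, not the contents of scripts/evaluate_agent_boilerplate.py: the event schema of /run_sse is not described here, so the example assumes each event's data field is a JSON object and that the last event carries the final answer under a hypothetical "text" key. parse_sse_events and attach_trace are illustrative names.

```python
import json


def parse_sse_events(lines):
    """Collect JSON payloads from SSE 'data:' lines; a blank line ends an event."""
    events, buffer = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            events.append(json.loads("\n".join(buffer)))
            buffer = []
        # Comment lines (starting with ':') and other fields are ignored.
    if buffer:  # the stream ended without a trailing blank line
        events.append(json.loads("\n".join(buffer)))
    return events


def attach_trace(row, events):
    """Return a copy of row with response and intermediate_events added.

    Assumption: the last event holds the final answer under 'text'.
    """
    enriched = dict(row)
    enriched["intermediate_events"] = events[:-1]
    enriched["response"] = events[-1].get("text", "") if events else ""
    return enriched


if __name__ == "__main__":
    stream = ['data: {"step": "thinking"}\n', "\n", 'data: {"text": "done"}\n', "\n"]
    print(attach_trace({"prompt": "hi"}, parse_sse_events(stream)))
```

Returning a copy rather than mutating the input row keeps the original dataset intact if a run has to be retried.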
devops
Standardizes the creation of Sensitive Data Protection (DLP) templates for PII and credential redaction.
development
Implements the "Defense-in-Depth" integration pattern in Python (intercepting prompts, parsing filter results).
testing
Configures Model Armor security policies (Prompt Injection, Jailbreak, RAI filters).
tools
Assists developers in collecting and structuring a library of diverse examples ("Golden Dataset") required for data-driven evaluation, including tool trajectories.