.claude/skills/cuyamaca-runtime-agent/SKILL.md
Build the runtime agent loop for Cuyamaca — the agentic control loop where the runtime model reads sensor context, decides tool calls, writes serial commands, and iterates until the user stops it. Use this skill whenever the user wants to implement the runtime window, build the agent loop, assemble multimodal context for the runtime model, implement tool call dispatch via serial, add the kill button, or references "phase 7", "runtime agent", "agent loop", "runtime window", "runtime model", "tool calling", "kill button", "agentic loop", "multimodal context", or "control loop". Also trigger when the user asks about feeding sensor data to a vision model, executing tool calls as serial commands, or the observe-decide-act cycle. This skill assumes Phase 6 is complete (serial communication, sensor parsing, sensor visualization).
This skill builds the core agentic control loop: the runtime model observes sensor state, camera frames, and sensor visualizations, decides which tools to call, the backend translates those tool calls into serial commands, the board executes and reports back, and the loop repeats. This is the phase where the robot actually moves.
The runtime window is a separate Tauri window that opens after a successful flash. It is independent from the project window — closing it terminates the serial session but does not close the project.
// src-tauri/src/commands/runtime.rs
#[tauri::command]
pub async fn open_runtime_window(
app: tauri::AppHandle,
state: tauri::State<'_, AppState>,
) -> Result<(), String> {
// Verify:
// 1. A sketch has been flashed
// 2. The runtime model slot is configured
// 3. The serial port is available
// Open serial connection
// Start sensor state collection
// Create new Tauri window
let runtime_window = tauri::WebviewWindowBuilder::new(
&app,
"runtime",
tauri::WebviewUrl::App("runtime.html".into()),
)
.title("Cuyamaca — Runtime")
.inner_size(1100.0, 700.0)
.min_inner_size(800.0, 500.0)
.build()
.map_err(|e| e.to_string())?;
Ok(())
}
The runtime window uses a separate HTML entry point (runtime.html) that renders the runtime-specific layout. This keeps the project window and runtime window as independent React roots.
┌─────────────────────────────────────┬──────────────────┐
│ │ Serial Monitor │
│ Chat (model + tool call pills) │ (raw output) │
│ │ │
│ │ ──────────────── │
│ │ Sensor State │
│ │ (parsed, live) │
│ │ │
│ │ ──────────────── │
│ │ Sensor Viz │
│ ───────────────────────────────── │ (images) │
│ [Input capsule] [KILL] │ │
└─────────────────────────────────────┴──────────────────┘
Left side (flex, ~65%): chat interface with tool call pills. Right side (fixed, ~35%): serial monitor, sensor state, sensor visualizations stacked vertically with adjustable splits.
The context assembler builds the input for each runtime model turn. It combines structured text, sensor visualization images, and camera frames into a single CompletionRequest.
// src-tauri/src/services/context.rs
pub struct ContextAssembler;
impl ContextAssembler {
pub fn assemble(
sensor_state: &SensorStateStore,
sensor_viz: Option<&[u8]>, // PNG bytes
camera_frame: Option<&[u8]>, // JPEG bytes
tools: &[ToolDefinition],
conversation: &[ChatMessage],
user_message: &str,
manifest: &Manifest,
) -> CompletionRequest {
let system_prompt = build_runtime_system_prompt(manifest, tools);
let mut messages = conversation.to_vec();
// Build the user turn with multimodal content
let mut content_parts = Vec::new();
// 1. Structured sensor state as text
let sensor_text = sensor_state.format_for_model();
content_parts.push(ContentPart::Text {
text: sensor_text,
});
// 2. Sensor visualization image (if spatial sensors present)
if let Some(viz_bytes) = sensor_viz {
let base64 = base64::Engine::encode(
&base64::engine::general_purpose::STANDARD,
viz_bytes,
);
content_parts.push(ContentPart::Image {
data: base64,
media_type: "image/png".to_string(),
});
}
// 3. Camera frame (if camera component present)
if let Some(frame_bytes) = camera_frame {
let base64 = base64::Engine::encode(
&base64::engine::general_purpose::STANDARD,
frame_bytes,
);
content_parts.push(ContentPart::Image {
data: base64,
media_type: "image/jpeg".to_string(),
});
}
// 4. User message
content_parts.push(ContentPart::Text {
text: user_message.to_string(),
});
messages.push(ChatMessage {
role: "user".to_string(),
content: MessageContent::Multimodal(content_parts),
});
CompletionRequest {
messages,
system_prompt: Some(system_prompt),
temperature: Some(0.3),
max_tokens: Some(1024),
tools: Some(tools.to_vec()),
}
}
}
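The assembler leans on `sensor_state.format_for_model()` to turn live readings into text. The source does not show that method, so the following is only a minimal sketch of one way it could work. It assumes readings are kept as a map from sensor ID to a numeric value with a unit; the `Reading` struct and the free function `format_for_model` are hypothetical stand-ins, not the project's actual `SensorStateStore` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical reading type; the real SensorStateStore may look different.
pub struct Reading {
    pub value: f64,
    pub unit: &'static str,
}

/// Render one `- id: value unit` line per sensor. BTreeMap iterates in key
/// order, so the model sees the same layout on every turn.
pub fn format_for_model(readings: &BTreeMap<String, Reading>) -> String {
    if readings.is_empty() {
        return "Sensor state: (no readings yet)".to_string();
    }
    let mut out = String::from("Sensor state:\n");
    for (id, reading) in readings {
        out.push_str(&format!("- {}: {:.1} {}\n", id, reading.value, reading.unit));
    }
    out
}
```

A stable ordering and a fixed number of decimals keep the text block small and make consecutive turns easy for the model to compare.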
You are controlling a robot through serial commands. You observe sensor data and camera images, decide what actions to take, and call the available tools to control the hardware.
Hardware: {manifest_summary}
Rules:
- Always check sensor data before and after actions
- If any sensor indicates danger (obstacle too close, tilt too steep), call stop immediately
- Explain what you observe and why you're taking each action
- If you're unsure about a sensor reading, call read_sensor_state to get a fresh reading
- Never move without checking distance sensors first
You can call multiple tools in sequence. After each tool call, you'll receive updated sensor data.
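The prompt above contains a `{manifest_summary}` placeholder that `build_runtime_system_prompt` has to fill. A rough sketch of that step follows. It assumes the manifest has already been reduced to a one-line summary string and that tools are listed by name and description; `ToolSummary`, the `&str` parameter, and the "Available tools" section are illustrative assumptions, since the real function takes the project's `Manifest` and `ToolDefinition` types.

```rust
/// Hypothetical, trimmed-down tool description used only in this sketch.
pub struct ToolSummary {
    pub name: String,
    pub description: String,
}

/// A subset of the rules from the prompt text, kept short for the example.
const RULES: [&str; 2] = [
    "Always check sensor data before and after actions",
    "Never move without checking distance sensors first",
];

/// Fill the prompt template with a hardware summary and a tool listing.
pub fn build_runtime_system_prompt(manifest_summary: &str, tools: &[ToolSummary]) -> String {
    let mut prompt = String::new();
    prompt.push_str("You are controlling a robot through serial commands.\n");
    prompt.push_str(&format!("Hardware: {}\n", manifest_summary));
    prompt.push_str("Rules:\n");
    for rule in RULES {
        prompt.push_str(&format!("- {}\n", rule));
    }
    prompt.push_str("Available tools:\n");
    for tool in tools {
        prompt.push_str(&format!("- {}: {}\n", tool.name, tool.description));
    }
    prompt
}
```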
When the runtime model returns tool calls, the dispatcher translates them into serial commands:
// src-tauri/src/services/tool_dispatch.rs
pub struct ToolDispatcher {
    tools: Vec<ToolDefinition>,
    // pub(crate) so AgentLoop::kill (in agent.rs) can write the emergency stop directly
    pub(crate) serial: Arc<SerialManager>,
}
impl ToolDispatcher {
pub async fn execute(&self, tool_call: &ToolCall) -> Result<ToolResult, String> {
match tool_call.name.as_str() {
// Lifecycle tools — handled by the app, not serial
"read_sensor_state" => self.handle_read_sensor_state().await,
"wait_milliseconds" => self.handle_wait(tool_call).await,
"get_camera_frame" => self.handle_get_camera_frame().await,
"end_session" => self.handle_end_session().await,
// Domain tools — translated to serial commands
_ => self.handle_serial_tool(tool_call).await,
}
}
async fn handle_serial_tool(&self, tool_call: &ToolCall) -> Result<ToolResult, String> {
// Find the tool definition
let tool_def = self.tools.iter()
.find(|t| t.name == tool_call.name)
.ok_or_else(|| format!("Unknown tool: {}", tool_call.name))?;
// Build the CMD string from the tool's serial_command template
let cmd = build_serial_command(&tool_def.serial_command, &tool_call.arguments)?;
// Write to serial
self.serial.send_command(&cmd)?;
// Wait briefly for the board to acknowledge
tokio::time::sleep(std::time::Duration::from_millis(100)).await;
Ok(ToolResult {
tool_name: tool_call.name.clone(),
success: true,
output: format!("Sent: {}", cmd),
})
}
async fn handle_wait(&self, tool_call: &ToolCall) -> Result<ToolResult, String> {
let ms = tool_call.arguments["milliseconds"]
.as_u64()
.unwrap_or(1000);
tokio::time::sleep(std::time::Duration::from_millis(ms)).await;
Ok(ToolResult {
tool_name: "wait_milliseconds".to_string(),
success: true,
output: format!("Waited {}ms", ms),
})
}
async fn handle_end_session(&self) -> Result<ToolResult, String> {
// Signal the agent loop to terminate
Ok(ToolResult {
tool_name: "end_session".to_string(),
success: true,
output: "Session ended by model".to_string(),
})
}
}
fn build_serial_command(template: &str, arguments: &serde_json::Value) -> Result<String, String> {
// template: "CMD:move_forward:speed={speed}"
// arguments: {"speed": 80}
// result: "CMD:move_forward:speed=80"
let mut cmd = template.to_string();
if let Some(obj) = arguments.as_object() {
for (key, value) in obj {
let placeholder = format!("{{{}}}", key);
let value_str = match value {
serde_json::Value::Number(n) => n.to_string(),
serde_json::Value::String(s) => s.clone(),
serde_json::Value::Bool(b) => b.to_string(),
_ => value.to_string(),
};
cmd = cmd.replace(&placeholder, &value_str);
}
}
Ok(cmd)
}
#[derive(Debug, Clone, Serialize)]
pub struct ToolResult {
pub tool_name: String,
pub success: bool,
pub output: String,
}
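One gap in `build_serial_command` as written: if the model omits an argument, the function still returns `Ok` and a literal `{speed}` would be written to the board. A possible guard, sketched here with the standard library only, is to scan the substituted command for leftover placeholders before sending. The helper names `unfilled_placeholders` and `ensure_fully_substituted` are hypothetical, not part of the project.

```rust
/// Return the names of any `{placeholder}` tokens still present in `cmd`.
pub fn unfilled_placeholders(cmd: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = cmd;
    while let Some(open) = rest.find('{') {
        // '{' and '}' are single-byte ASCII, so these slices stay on char boundaries.
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                names.push(after[..close].to_string());
                rest = &after[close + 1..];
            }
            None => break, // unmatched '{': nothing more to extract
        }
    }
    names
}

/// Reject a command that still contains template placeholders.
pub fn ensure_fully_substituted(cmd: &str) -> Result<(), String> {
    let missing = unfilled_placeholders(cmd);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("Missing arguments for: {}", missing.join(", ")))
    }
}
```

Calling `ensure_fully_substituted(&cmd)?` just before `send_command` would turn a missing argument into a tool error the model can react to. Note that a string argument that itself contains braces would trip this check, so it suits numeric and simple identifier arguments best.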
The agent loop is the core runtime cycle. It runs as a background task, orchestrated by the Rust backend:
// src-tauri/src/services/agent.rs
pub struct AgentLoop {
runtime_model: Box<dyn ModelProvider>,
tool_dispatcher: ToolDispatcher,
context_assembler: ContextAssembler,
sensor_state: Arc<Mutex<SensorStateStore>>,
camera: Option<CameraService>,
sensor_viz: SensorVizRenderer,
conversation: Vec<ChatMessage>,
manifest: Manifest,
tools: Vec<ToolDefinition>,
running: Arc<AtomicBool>,
event_tx: mpsc::Sender<AgentEvent>,
}
impl AgentLoop {
pub async fn run_turn(&mut self, user_message: &str) -> Result<(), String> {
self.running.store(true, Ordering::SeqCst);
loop {
if !self.running.load(Ordering::SeqCst) {
break; // killed by user
}
// 1. Collect current context
let sensor_state = self.sensor_state.lock().await;
let sensor_viz = self.sensor_viz.render(&sensor_state, &self.manifest);
let camera_frame = if let Some(ref cam) = self.camera {
cam.capture_frame().await.ok()
} else {
None
};
// 2. Assemble the completion request
let request = ContextAssembler::assemble(
&sensor_state,
sensor_viz.as_deref(),
camera_frame.as_deref(),
&self.tools,
&self.conversation,
user_message,
&self.manifest,
);
drop(sensor_state);
// 3. Call the runtime model
let response = self.runtime_model.complete(request).await?;
// 4. Process the response
if let Some(text) = &response.content.as_str().filter(|s| !s.is_empty()) {
// Model has a text response — send to chat UI
self.event_tx.send(AgentEvent::ModelResponse(text.to_string())).await
.map_err(|e| e.to_string())?;
self.conversation.push(ChatMessage {
role: "assistant".to_string(),
content: MessageContent::Text(text.to_string()),
});
}
// 5. Execute tool calls if any
if let Some(tool_calls) = &response.tool_calls {
if tool_calls.is_empty() {
break; // No more tool calls — turn is done
}
for tool_call in tool_calls {
// Notify UI of tool call
self.event_tx.send(AgentEvent::ToolCallStarted {
tool_name: tool_call.name.clone(),
arguments: tool_call.arguments.clone(),
}).await.map_err(|e| e.to_string())?;
// Execute
let result = self.tool_dispatcher.execute(tool_call).await?;
// Check for end_session
if tool_call.name == "end_session" {
self.event_tx.send(AgentEvent::SessionEnded).await
.map_err(|e| e.to_string())?;
self.running.store(false, Ordering::SeqCst);
break;
}
// Notify UI of result
self.event_tx.send(AgentEvent::ToolCallCompleted {
tool_name: tool_call.name.clone(),
success: result.success,
output: result.output.clone(),
}).await.map_err(|e| e.to_string())?;
// Add tool result to conversation for next iteration
self.conversation.push(ChatMessage {
role: "tool".to_string(),
content: MessageContent::Text(
serde_json::to_string(&result).unwrap_or_default()
),
});
}
// Continue the loop — model may want to call more tools
// after observing the results
continue;
}
// No tool calls and model responded with text — turn is done
break;
}
Ok(())
}
pub fn kill(&self) {
self.running.store(false, Ordering::SeqCst);
// Send emergency stop to the board
let _ = self.tool_dispatcher.serial.send_command("CMD:stop");
}
}
#[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase", tag = "event", content = "data")]
pub enum AgentEvent {
ModelResponse(String),
ToolCallStarted { tool_name: String, arguments: serde_json::Value },
ToolCallCompleted { tool_name: String, success: bool, output: String },
SessionEnded,
Error(String),
}
The agent loop is NOT autonomous by default. It runs one "turn" per user message. Within a turn the model may iterate multiple times: calling tools (for example move_forward), observing results, and calling more tools. This is the "agentic" part. But a new user message is required to start a new turn; the model does not autonomously decide to keep acting after finishing a turn.
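The turn semantics can be sketched without any model or serial port. The example below is a simplified, synchronous stand-in for `run_turn`: the closure `next_step` plays the role of the runtime model and `execute` plays the role of the dispatcher. `ModelStep` and both closure signatures are assumptions made for illustration only.

```rust
/// What a (mock) model returns on each iteration of a turn.
pub struct ModelStep {
    pub text: Option<String>,
    pub tool_calls: Vec<String>,
}

/// Run one turn: keep asking the model until it stops requesting tools.
/// Returns a transcript of assistant text and tool results, in order.
pub fn run_turn<M, E>(mut next_step: M, mut execute: E) -> Vec<String>
where
    M: FnMut(&[String]) -> ModelStep,
    E: FnMut(&str) -> String,
{
    let mut transcript: Vec<String> = Vec::new();
    loop {
        let step = next_step(&transcript);
        if let Some(text) = step.text {
            transcript.push(format!("assistant: {}", text));
        }
        if step.tool_calls.is_empty() {
            break; // turn is over; a new user message is needed to continue
        }
        for call in &step.tool_calls {
            let result = execute(call);
            transcript.push(format!("tool {}: {}", call, result));
        }
    }
    transcript
}
```

With a scripted model that answers "move forward", then "stop", then plain text, the loop runs three iterations and exits on the first response that carries no tool calls.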
use tauri::ipc::Channel;

#[tauri::command]
pub async fn runtime_send_message(
    state: tauri::State<'_, AppState>,
    message: String,
    on_event: Channel<AgentEvent>,
) -> Result<(), String> {
    // Get the agent loop from state
    // Run a turn with the user's message
    // Stream AgentEvents to the frontend via Channel
    todo!("run a turn and forward AgentEvents through on_event")
}

#[tauri::command]
pub async fn runtime_kill(
    state: tauri::State<'_, AppState>,
) -> Result<(), String> {
    // Kill the agent loop
    // Send CMD:stop to the board
    // Close serial connection
    // Close the runtime window
    todo!("call kill_runtime, then close the runtime window")
}
The left side of the runtime window is a chat interface similar to the code chat (Phase 4) but with additional elements:
┌───────────────────────────────────────┐
│ I can see clear space ahead. Moving │
│ forward at 60% speed. │
│ │
│ ┌─ ◉ move_forward speed=60 ──────┐ │
│ └─────────────────────────────────┘ │
│ │
│ Obstacle detected at 12cm. Stopping. │
│ │
│ ┌─ ◉ stop ───────────────────────┐ │
│ └─────────────────────────────────┘ │
│ │
│ I've stopped. The front distance │
│ sensor reads 12cm. Should I turn │
│ and find an alternate path? │
└───────────────────────────────────────┘
Tool call pill styling:
Same design as the code chat input but with the Kill button adjacent:
While the agent loop is running (executing a turn), the input is disabled and shows a pulsing cyan border.
While the model is thinking:
The Kill button is the most important safety control. It must be always reachable, always functional, and always fast.
// Kill is NOT an async operation — it must complete immediately
pub fn kill_runtime(state: &AppState) {
// 1. Set the running flag to false (stops the agent loop)
if let Some(ref agent) = state.agent_loop {
agent.kill();
}
// 2. Send CMD:stop to the board (synchronous serial write)
if let Some(ref serial) = state.serial_manager {
serial.send_command("CMD:stop").ok(); // ignore errors — best effort
}
// 3. Close serial connection
if let Some(ref serial) = state.serial_manager {
serial.stop();
}
}
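The kill path works because the running flag is a shared atomic that the loop polls, so setting it never blocks. A small self-contained sketch of that pattern follows, using a plain thread in place of the async agent loop; `spawn_worker` and `kill` are hypothetical names for this example.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawn a loop that does one unit of work per iteration while `running` is true.
pub fn spawn_worker(
    running: Arc<AtomicBool>,
    iterations: Arc<AtomicU32>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while running.load(Ordering::SeqCst) {
            iterations.fetch_add(1, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(5)); // stands in for a model call
        }
    })
}

/// Clear the flag. This returns immediately; the worker exits at its next check.
pub fn kill(running: &AtomicBool) {
    running.store(false, Ordering::SeqCst);
}
```

The flag only stops the loop at its next check, so an in-flight model call is not interrupted by it. That is why the emergency `CMD:stop` is written to the board directly rather than waiting for the loop to notice.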
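Escape key binding: Bind Escape to the kill path for the runtime window. Tauri v2's tauri::WindowEvent covers window-level events (resize, move, focus, close request) and has no keyboard-input variant, so treat the Rust hook below as pseudocode for a native key hook rather than code that compiles as written:
// In the runtime window setup
// Pseudocode: KeyboardInput, PhysicalKey, and KeyCode are not part of Tauri's WindowEvent API
runtime_window.on_window_event(move |event| {
    if let tauri::WindowEvent::KeyboardInput { event, .. } = event {
        if event.physical_key == PhysicalKey::Code(KeyCode::Escape) {
            kill_runtime(&state);
        }
    }
});
Bind it on the frontend side as well, where keydown events are delivered to the webview: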
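import { useEffect } from "react";
import { invoke } from "@tauri-apps/api/core";

// Inside the runtime window's root component
useEffect(() => {
  const handler = (e: KeyboardEvent) => {
    if (e.key === "Escape") {
      // Fire and forget: the kill command must not wait on UI state
      void invoke("runtime_kill");
    }
  };
  window.addEventListener("keydown", handler);
  return () => window.removeEventListener("keydown", handler);
}, []);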
Update the flash success flow from Phase 5:
After a successful flash, the transition into the runtime goes through open_runtime_window.
Do NOT auto-start the runtime. The user explicitly transitions by clicking.
move_forward with low speed
stop
Type "Explore the area: move forward, check for obstacles, turn if blocked."
Model doesn't call tools: Ensure the tools are included in the CompletionRequest. Check that the tool definitions match the provider's expected format. Ollama uses a different tool format than OpenAI — the provider trait must translate.
Agent loop never terminates a turn: Add a maximum iteration count (e.g., 10 tool calls per turn). If the model keeps calling tools, force-stop and tell the user.
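A minimal sketch of such a cap, assuming it counts tool calls rather than model calls. `TurnBudget` is a hypothetical helper, not something the project defines.

```rust
/// Caps how many tool calls a single turn may execute.
pub struct TurnBudget {
    used: u32,
    max: u32,
}

impl TurnBudget {
    pub fn new(max: u32) -> Self {
        TurnBudget { used: 0, max }
    }

    /// Record one tool call. Returns the number of calls still allowed,
    /// or an error message once the cap has been reached.
    pub fn spend(&mut self) -> Result<u32, String> {
        if self.used >= self.max {
            return Err(format!(
                "Turn stopped: reached the limit of {} tool calls",
                self.max
            ));
        }
        self.used += 1;
        Ok(self.max - self.used)
    }
}
```

In the agent loop, `spend()` would be called before each `tool_dispatcher.execute`. On `Err`, the loop could emit `AgentEvent::Error` with the message and break out of the turn.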
Sensor data is stale in model context: The context is assembled at the start of each model call. If the model calls a tool and wants fresh data, it should call read_sensor_state. The context assembler uses the latest data at assembly time.
Camera frames are too large: Resize JPEG frames to 320×240 before including in context. Large images consume too many tokens.
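The target size should preserve the frame's aspect ratio so the image is not distorted. The sketch below only computes the output dimensions; the actual pixel resampling would be done by whatever image library the project uses, which the source does not name. `fit_within` is a hypothetical helper.

```rust
/// Scale (width, height) down to fit inside max_w x max_h, preserving aspect
/// ratio. Never upscales. Returns (0, 0) if any input dimension is zero.
pub fn fit_within(width: u32, height: u32, max_w: u32, max_h: u32) -> (u32, u32) {
    if width == 0 || height == 0 || max_w == 0 || max_h == 0 {
        return (0, 0);
    }
    if width <= max_w && height <= max_h {
        return (width, height); // already small enough
    }
    // Compare width/max_w with height/max_h by cross-multiplying in u64
    // to avoid floating point and overflow.
    let (w, h) = (width as u64, height as u64);
    let (mw, mh) = (max_w as u64, max_h as u64);
    if w * mh >= h * mw {
        // Width is the limiting side.
        let new_h = (h * mw / w).max(1);
        (max_w, new_h as u32)
    } else {
        // Height is the limiting side.
        let new_w = (w * mh / h).max(1);
        (new_w as u32, max_h)
    }
}
```

For example, a 1920×1080 frame fitted into 320×240 comes out as 320×180, while a 640×480 frame maps to exactly 320×240.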
Runtime model is text-only: The settings UI should have warned the user in Phase 2. If a text-only model is used, camera frames and sensor viz are silently dropped. The model still works with structured text sensor data.
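One way to implement the drop is a filter over the content parts just before the request is sent. The sketch below mirrors the shape of `ContentPart` used by the context assembler, but the `supports_vision` flag and the `filter_for_model` function are assumptions. Returning the number of dropped images lets the caller log or surface the drop instead of leaving it fully silent.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text { text: String },
    Image { data: String, media_type: String },
}

/// Return the parts to send, plus how many image parts were removed.
/// Vision-capable models get the parts back unchanged.
pub fn filter_for_model(
    parts: Vec<ContentPart>,
    supports_vision: bool,
) -> (Vec<ContentPart>, usize) {
    if supports_vision {
        return (parts, 0);
    }
    let before = parts.len();
    let kept: Vec<ContentPart> = parts
        .into_iter()
        .filter(|part| matches!(part, ContentPart::Text { .. }))
        .collect();
    let dropped = before - kept.len();
    (kept, dropped)
}
```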
complete method so the full response + tool calls arrive together. Streaming makes tool call parsing unreliable.