.claude/skills/cuyamaca-llm-abstraction/SKILL.md
Build the multi-provider LLM abstraction layer for Cuyamaca with two independent model slots — a code model and a runtime model. Use this skill whenever the user wants to add LLM provider support, integrate Ollama or external APIs (OpenAI, Anthropic, Google, Mistral), set up the two-model-slot architecture, implement streaming chat with multiple providers, or references "phase 2", "LLM abstraction", "model providers", "code model", "runtime model", "multi-provider", or "Ollama integration" in the context of Cuyamaca. Also trigger when the user asks about supporting multiple LLM backends, API key storage in OS keychain, or model selection logic. This skill assumes Phase 1 is complete (Tauri v2 scaffold, IPC bridge verified, three-panel layout).
This skill builds the LLM service layer for Cuyamaca. Unlike Sierra (which only talks to Ollama), Cuyamaca supports multiple LLM providers and has two independent model slots: a code model for sketch generation/modification and a runtime model for agentic hardware control. Each slot can be configured to use any supported provider.
This skill produces:

- A `ModelProvider` trait in Rust with `complete` (non-streaming) and `complete_stream` (streaming) methods
- A `ModelSlot` abstraction representing a configured model (provider + model name + API key)
- Two independent slots, `code_model` and `runtime_model`

Add to `src-tauri/Cargo.toml`:

```toml
[dependencies]
# serde (with the "derive" feature) is assumed to come from the Phase 1 scaffold.
reqwest = { version = "0.12", features = ["json", "stream"] }
futures-util = "0.3"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
async-trait = "0.1"
tauri-plugin-store = "2"
```
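`tauri-plugin-store` persists the model slot configuration (which provider, which model). API keys go in the OS keychain instead, never in the store plugin's file.

For keychain access, use either Tauri's Stronghold plugin or the `keyring` crate. The simplest approach for v2 is `keyring`:

```toml
keyring = { version = "3", features = ["apple-native", "windows-native", "sync-secret-service"] }
```

The `keyring` crate provides cross-platform OS keychain access (macOS Keychain, Windows Credential Manager, Secret Service on Linux). Version 3 enables no platform store by default; without these features it falls back to an in-memory mock store and keys are not persisted.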
Create the file structure:

```
src-tauri/src/
├── services/
│   ├── mod.rs
│   ├── provider.rs        # ModelProvider trait + shared types
│   ├── ollama.rs          # Ollama implementation
│   ├── openai.rs          # OpenAI implementation
│   ├── anthropic.rs       # Anthropic implementation
│   ├── google.rs          # Google Gemini implementation
│   ├── mistral.rs         # Mistral implementation
│   └── model_manager.rs   # ModelSlot, slot management, factory
```
### Provider trait and shared types (`services/provider.rs`)

```rust
use serde::{Deserialize, Serialize};
use tokio::sync::mpsc;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChatMessage {
    pub role: String, // "user", "assistant", "system"
    pub content: MessageContent,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum MessageContent {
    Text(String),
    Multimodal(Vec<ContentPart>),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum ContentPart {
    #[serde(rename = "text")]
    Text { text: String },
    #[serde(rename = "image")]
    Image { data: String, media_type: String }, // base64
}

#[derive(Debug, Clone, Serialize)]
pub struct StreamChunk {
    pub content: String,
    pub done: bool,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CompletionRequest {
    pub messages: Vec<ChatMessage>,
    pub system_prompt: Option<String>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub tools: Option<Vec<ToolDefinition>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value, // JSON Schema
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CompletionResponse {
    pub content: String,
    pub tool_calls: Option<Vec<ToolCall>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCall {
    pub name: String,
    pub arguments: serde_json::Value,
}

#[async_trait::async_trait]
pub trait ModelProvider: Send + Sync {
    /// Non-streaming completion. Used by the code model for sketch generation.
    async fn complete(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse, String>;

    /// Streaming completion. Used by the runtime model and chat views.
    async fn complete_stream(
        &self,
        request: CompletionRequest,
        tx: mpsc::Sender<StreamChunk>,
    ) -> Result<CompletionResponse, String>;

    /// Check if the provider is reachable and the configured model exists.
    async fn is_healthy(&self) -> bool;

    /// List available models for this provider.
    async fn list_models(&self) -> Result<Vec<ModelInfo>, String>;

    /// Whether this provider supports multimodal input (images).
    fn supports_multimodal(&self) -> bool;
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ModelInfo {
    pub id: String,
    pub name: String,
    pub multimodal: bool,
}
```
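Providers whose `supports_multimodal()` is false (Mistral, or a text-only Ollama model) can still receive a `MessageContent::Multimodal` value. A minimal std-only sketch of one way to flatten it, assuming a hypothetical `to_text_only` helper and simplified copies of the enums above without the serde derives. Replacing images with a placeholder is an illustrative choice, not something the design above prescribes:

```rust
// Simplified mirrors of `ContentPart` / `MessageContent` (no serde derives),
// so the sketch compiles with the standard library alone.
#[derive(Debug, Clone)]
pub enum ContentPart {
    Text { text: String },
    Image { data: String, media_type: String },
}

#[derive(Debug, Clone)]
pub enum MessageContent {
    Text(String),
    Multimodal(Vec<ContentPart>),
}

/// Hypothetical helper: reduce a message to plain text for a text-only provider.
/// Text parts are joined with newlines; each image becomes a short placeholder.
pub fn to_text_only(content: &MessageContent) -> String {
    match content {
        MessageContent::Text(text) => text.clone(),
        MessageContent::Multimodal(parts) => parts
            .iter()
            .map(|part| match part {
                ContentPart::Text { text } => text.clone(),
                // The base64 payload is dropped; only its MIME type is mentioned.
                ContentPart::Image { media_type, .. } => {
                    format!("[image omitted: {media_type}]")
                }
            })
            .collect::<Vec<_>>()
            .join("\n"),
    }
}
```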
Key design decisions:

- `MessageContent` is an enum so the same trait handles text-only input (code model) and multimodal input (runtime model with camera frames and sensor visualizations).
- `CompletionResponse` returns both text content and optional tool calls. The code model uses the text content for sketch generation; the runtime model uses tool calls for serial commands.
- `complete` is non-streaming: the code model doesn't need token-by-token display, since the user reviews the full output as a diff.
- `complete_stream` is streaming: the runtime model streams responses in the chat view.
- `supports_multimodal()` is used by the settings UI to filter models for the runtime slot (only multimodal models are valid for runtime).

### Ollama provider

`services/ollama.rs` hits Ollama's native `/api/chat` endpoint.
Key details:
Struct:

```rust
pub struct OllamaProvider {
    client: reqwest::Client,
    base_url: String,
    model: String,
}
```
`complete` implementation:

Post to `/api/chat` with `"stream": false`. Parse the single JSON response.

`complete_stream` implementation:

Post to `/api/chat` with `"stream": true` and read the NDJSON body line by line. Each line looks like:

```json
{"model":"llama3.2","message":{"role":"assistant","content":"token"},"done":false}
```

The final line has `"done": true`. Buffer the bytes and split on `\n` to handle partial lines.
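The partial-line buffering can be sketched with the standard library alone. `NdjsonBuffer` is a hypothetical name; in the real provider each chunk would come from reqwest's byte stream, and each returned line would be parsed with `serde_json` to read the `message.content` and `done` fields shown above:

```rust
/// Accumulates raw bytes from a streaming HTTP body and yields complete
/// NDJSON lines. A network chunk can end mid-line, so any trailing partial
/// line stays in the buffer until the next chunk completes it.
#[derive(Default)]
pub struct NdjsonBuffer {
    pending: Vec<u8>,
}

impl NdjsonBuffer {
    /// Append a chunk and return every complete, non-empty line it finished.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Drain everything up to and including each '\n' now in the buffer.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            // Decode only complete lines, so a multi-byte UTF-8 character
            // split across two chunks is never decoded half-way.
            let decoded = String::from_utf8_lossy(&line[..line.len() - 1]);
            let trimmed = decoded.trim();
            if !trimmed.is_empty() {
                lines.push(trimmed.to_string());
            }
        }
        lines
    }
}
```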
`list_models` implementation:

`GET /api/tags` returns `{"models": [{"name": "llama3.2", ...}]}`. Map each entry to `ModelInfo`, and mark models as multimodal based on known multimodal model families (llava, bakllava, moondream, Llama 3.2 Vision published as `llama3.2-vision`).
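A sketch of the family check, assuming the prefix list is kept as a hardcoded constant. The list itself is an assumption to adapt; note that Ollama publishes its Llama vision model as `llama3.2-vision`, so a literal `llama-vision` prefix would not match it. The hypothetical `is_known_multimodal` strips an optional `namespace/` and the `:tag` suffix before comparing:

```rust
/// Hypothetical list of Ollama model families known to accept images.
const MULTIMODAL_PREFIXES: &[&str] = &["llava", "bakllava", "moondream", "llama3.2-vision"];

/// Returns true if an Ollama model name as reported by `/api/tags`
/// (e.g. `llava:13b`) belongs to one of the known multimodal families.
pub fn is_known_multimodal(model_name: &str) -> bool {
    // Drop an optional `namespace/` prefix, then the `:tag` suffix.
    let base = model_name.rsplit('/').next().unwrap_or(model_name);
    let base = base.split(':').next().unwrap_or(base).to_ascii_lowercase();
    MULTIMODAL_PREFIXES
        .iter()
        .any(|prefix| base.starts_with(*prefix))
}
```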
`is_healthy` implementation:

`GET /`: a 200 response means Ollama is running.
Multimodal support:

Ollama's `/api/chat` accepts `"images": ["base64..."]` in message objects. When the request contains `ContentPart::Image`, convert it to Ollama's image format. `supports_multimodal()` returns true only if the configured model is in the known multimodal list.
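Converting a `ContentPart` list into Ollama's message shape amounts to separating text from images. A std-only sketch with a hypothetical `to_ollama_parts` function and a simplified `ContentPart` copy; the resulting `content` and `images` fields would then be serialized into the `/api/chat` message object:

```rust
// Simplified mirror of `ContentPart` (no serde derives).
#[derive(Debug, Clone)]
pub enum ContentPart {
    Text { text: String },
    Image { data: String, media_type: String },
}

/// Ollama's message shape: text goes in `content`, and images go in a
/// separate `images` array of base64 strings.
#[derive(Debug, PartialEq)]
pub struct OllamaMessageParts {
    pub content: String,
    pub images: Vec<String>,
}

pub fn to_ollama_parts(parts: &[ContentPart]) -> OllamaMessageParts {
    let mut texts = Vec::new();
    let mut images = Vec::new();
    for part in parts {
        match part {
            ContentPart::Text { text } => texts.push(text.as_str()),
            // Ollama takes no per-image MIME type, so `media_type` is dropped.
            ContentPart::Image { data, .. } => images.push(data.clone()),
        }
    }
    OllamaMessageParts {
        content: texts.join("\n"),
        images,
    }
}
```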
Each provider follows the same pattern: translate CompletionRequest into the provider's API format, make the HTTP call, translate the response back to CompletionResponse.
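The transport half of that pattern (endpoint plus auth) differs per provider, following the details listed for each provider below. A sketch, assuming a hypothetical `ExternalProvider` enum and `request_target` helper; the returned headers would be attached with reqwest's `RequestBuilder::header`, and for OpenAI, Anthropic and Mistral streaming is requested with `"stream": true` in the JSON body rather than in the URL:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExternalProvider {
    OpenAI,
    Anthropic,
    Google,
    Mistral,
}

/// Target URL plus provider-specific headers for one request.
pub struct RequestTarget {
    pub url: String,
    pub headers: Vec<(&'static str, String)>,
}

pub fn request_target(
    provider: ExternalProvider,
    model: &str,
    api_key: &str,
    stream: bool,
) -> RequestTarget {
    match provider {
        ExternalProvider::OpenAI | ExternalProvider::Mistral => {
            let url = if provider == ExternalProvider::OpenAI {
                "https://api.openai.com/v1/chat/completions"
            } else {
                "https://api.mistral.ai/v1/chat/completions"
            };
            // Both use a bearer token; the model name goes in the JSON body.
            RequestTarget {
                url: url.to_string(),
                headers: vec![("Authorization", format!("Bearer {api_key}"))],
            }
        }
        ExternalProvider::Anthropic => RequestTarget {
            url: "https://api.anthropic.com/v1/messages".to_string(),
            headers: vec![
                ("x-api-key", api_key.to_string()),
                ("anthropic-version", "2023-06-01".to_string()),
            ],
        },
        ExternalProvider::Google => {
            // Gemini selects streaming by method; `alt=sse` requests SSE framing.
            let method = if stream {
                "streamGenerateContent?alt=sse&"
            } else {
                "generateContent?"
            };
            RequestTarget {
                url: format!(
                    "https://generativelanguage.googleapis.com/v1beta/models/{model}:{method}key={api_key}"
                ),
                headers: Vec::new(),
            }
        }
    }
}
```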
### OpenAI (`services/openai.rs`)

- Endpoint: `https://api.openai.com/v1/chat/completions`
- Auth: `Authorization: Bearer {api_key}`
- Streaming: SSE with `data: {"choices":[{"delta":{"content":"token"}}]}` lines
- Images: `{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}` in content arrays
- `supports_multimodal()`: true for gpt-4o models

### Anthropic (`services/anthropic.rs`)

- Endpoint: `https://api.anthropic.com/v1/messages`
- Auth: `x-api-key: {api_key}`, `anthropic-version: 2023-06-01`
- System prompt: top-level `system` field, not in `messages`
- Streaming: SSE `event: content_block_delta` events containing `{"delta":{"text":"token"}}`
- Images: `{"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}}` in content arrays
- `supports_multimodal()`: true for claude-sonnet and claude-opus models

### Google Gemini (`services/google.rs`)

- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`
- Auth: `?key={api_key}` query parameter
- Streaming: `:streamGenerateContent` endpoint with `alt=sse` in the query string to get SSE framing (without it the body is streamed as a single JSON array)
- Images: `{"inlineData": {"mimeType": "image/jpeg", "data": "base64..."}}` in `parts`
- `supports_multimodal()`: true for gemini-1.5-pro and gemini-2.0-flash

### Mistral (`services/mistral.rs`)

- Endpoint: `https://api.mistral.ai/v1/chat/completions`
- Auth: `Authorization: Bearer {api_key}`
- `supports_multimodal()`: false

All external providers share:

- An `api_key: String` field stored in the struct

Factor common SSE parsing into a shared utility if the implementations are too repetitive.
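A shared SSE utility mostly needs to split the body into events: `event:` and `data:` fields accumulate until a blank line ends the event. A minimal sketch with hypothetical `SseParser` and `SseEvent` types; it takes already-decoded text for brevity (a version fed from reqwest's byte stream should buffer bytes until a full line arrives) and ignores `id:` and `retry:` fields:

```rust
/// One Server-Sent Event: optional `event:` name plus its joined `data:` lines.
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    pub event: Option<String>,
    pub data: String,
}

#[derive(Default)]
pub struct SseParser {
    buffer: String,
    event: Option<String>,
    data_lines: Vec<String>,
}

impl SseParser {
    /// Feed a chunk of body text; returns every event completed by a blank line.
    pub fn push(&mut self, chunk: &str) -> Vec<SseEvent> {
        self.buffer.push_str(chunk);
        let mut events = Vec::new();
        while let Some(pos) = self.buffer.find('\n') {
            let raw: String = self.buffer.drain(..=pos).collect();
            // Accept both "\n" and "\r\n" line endings.
            let line = raw.trim_end_matches(|c: char| c == '\r' || c == '\n');
            if line.is_empty() {
                // Blank line: dispatch the accumulated event, if it has data.
                if !self.data_lines.is_empty() {
                    events.push(SseEvent {
                        event: self.event.take(),
                        data: self.data_lines.join("\n"),
                    });
                    self.data_lines.clear();
                }
                self.event = None;
            } else if let Some(value) = line.strip_prefix("data:") {
                // A single space after the colon is not part of the value.
                self.data_lines
                    .push(value.strip_prefix(' ').unwrap_or(value).to_string());
            } else if let Some(value) = line.strip_prefix("event:") {
                self.event = Some(value.strip_prefix(' ').unwrap_or(value).to_string());
            }
            // Other fields and `:` comment lines are ignored.
        }
        events
    }
}
```

Each event's `data` is then parsed with `serde_json`. An OpenAI stream ends with a `data: [DONE]` payload, which should be treated as end-of-stream rather than parsed as JSON.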
### Model manager

`services/model_manager.rs` manages the two model slots and creates provider instances.
```rust
use serde::{Deserialize, Serialize};

use crate::services::provider::ModelProvider;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SlotConfig {
    pub provider: ProviderType,
    pub model: String,
}

// Serialized as "ollama", "openai", ... to match the frontend `SlotConfig` type.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum ProviderType {
    Ollama,
    OpenAI,
    Anthropic,
    Google,
    Mistral,
}

pub struct ModelManager {
    code_model: Option<Box<dyn ModelProvider>>,
    runtime_model: Option<Box<dyn ModelProvider>>,
    code_config: Option<SlotConfig>,
    runtime_config: Option<SlotConfig>,
}
```
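How the two slots and the runtime-slot warning could fit together, as a synchronous sketch: `SlotProvider` is a hypothetical stand-in for the async `ModelProvider` trait that keeps only the methods slot management needs, and the factory step is left out:

```rust
/// Hypothetical synchronous stand-in for `ModelProvider`.
pub trait SlotProvider: Send + Sync {
    fn model_name(&self) -> &str;
    fn supports_multimodal(&self) -> bool;
}

#[derive(Default)]
pub struct ModelManager {
    code_model: Option<Box<dyn SlotProvider>>,
    runtime_model: Option<Box<dyn SlotProvider>>,
}

impl ModelManager {
    pub fn configure_code_model(&mut self, provider: Box<dyn SlotProvider>) {
        self.code_model = Some(provider);
    }

    /// Stores the runtime provider even when it is text-only, but returns a
    /// warning in that case (warn, do not block).
    pub fn configure_runtime_model(&mut self, provider: Box<dyn SlotProvider>) -> Option<String> {
        let warning = (!provider.supports_multimodal()).then(|| {
            format!(
                "{} is text-only; the runtime agent will not receive camera frames or sensor images",
                provider.model_name()
            )
        });
        self.runtime_model = Some(provider);
        warning
    }

    pub fn code_model(&self) -> Result<&dyn SlotProvider, String> {
        self.code_model
            .as_deref()
            .ok_or_else(|| "code model slot is not configured".to_string())
    }

    pub fn runtime_model(&self) -> Result<&dyn SlotProvider, String> {
        self.runtime_model
            .as_deref()
            .ok_or_else(|| "runtime model slot is not configured".to_string())
    }
}
```

Returning the warning as a value lets the caller surface it (log it or pass it to the frontend) while the provider is still stored.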
The `ModelManager` provides:

- `configure_code_model(config: SlotConfig, api_key: Option<String>)`: creates the appropriate provider instance and stores it
- `configure_runtime_model(config: SlotConfig, api_key: Option<String>)`: same, but validates that the provider and model support multimodal input. It warns (does not block) if the model is text-only.
- `code_model()`: returns a reference to the code model provider, or an error if not configured
- `runtime_model()`: returns a reference to the runtime model provider, or an error if not configured
- `factory(provider: ProviderType, model: String, api_key: Option<String>, base_url: Option<String>) -> Box<dyn ModelProvider>`: constructs the right provider implementation

### API key storage

API keys are stored in the OS keychain, not in plaintext config files.
Use the `keyring` crate:

```rust
use keyring::Entry;

fn store_api_key(provider: &str, key: &str) -> Result<(), String> {
    let entry = Entry::new("cuyamaca", provider).map_err(|e| e.to_string())?;
    entry.set_password(key).map_err(|e| e.to_string())?;
    Ok(())
}

fn get_api_key(provider: &str) -> Result<Option<String>, String> {
    let entry = Entry::new("cuyamaca", provider).map_err(|e| e.to_string())?;
    match entry.get_password() {
        Ok(key) => Ok(Some(key)),
        Err(keyring::Error::NoEntry) => Ok(None),
        Err(e) => Err(e.to_string()),
    }
}
```
With the code above, every key is stored under the keychain service `cuyamaca`, with the provider id (`openai`, `anthropic`, `google`, `mistral`) as the account name.
Slot configurations (which provider, which model) are persisted via tauri-plugin-store to a JSON file. API keys never go in that file.
```rust
// commands/models.rs
// `ModelInfo` comes from services/provider.rs. `AppState`, `ProviderInfo` and
// `SlotConfigResponse` are app types not shown in this phase; the `todo!()`
// bodies only keep the skeleton compiling until each step is filled in.
use crate::services::provider::ModelInfo;

#[tauri::command]
pub async fn list_providers() -> Vec<ProviderInfo> {
    // Return the static list of supported providers with their model lists
    todo!()
}

#[tauri::command]
pub async fn configure_model_slot(
    state: tauri::State<'_, AppState>,
    slot: String, // "code" or "runtime"
    provider: String,
    model: String,
    api_key: Option<String>,
) -> Result<(), String> {
    // Store API key in keychain if provided
    // Create provider instance via factory
    // Update the appropriate slot in ModelManager
    // Persist slot config to store
    todo!()
}

#[tauri::command]
pub async fn get_slot_config(
    state: tauri::State<'_, AppState>,
    slot: String,
) -> Result<Option<SlotConfigResponse>, String> {
    // Return current config (provider + model, NOT the API key)
    todo!()
}

#[tauri::command]
pub async fn check_model_health(
    state: tauri::State<'_, AppState>,
    slot: String,
) -> Result<bool, String> {
    // Call is_healthy() on the appropriate slot's provider
    todo!()
}

#[tauri::command]
pub async fn list_ollama_models(
    state: tauri::State<'_, AppState>,
) -> Result<Vec<ModelInfo>, String> {
    // Hit Ollama's /api/tags to list locally available models
    todo!()
}
```
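The commands receive `slot` and `provider` as plain strings, so they have to map them onto typed values and reject anything else. A std-only sketch with hypothetical `parse_slot`, `parse_provider` and `requires_api_key` helpers; the accepted strings mirror the frontend's `"code" | "runtime"` and lowercase provider names:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Slot {
    Code,
    Runtime,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProviderType {
    Ollama,
    OpenAI,
    Anthropic,
    Google,
    Mistral,
}

pub fn parse_slot(slot: &str) -> Result<Slot, String> {
    match slot {
        "code" => Ok(Slot::Code),
        "runtime" => Ok(Slot::Runtime),
        other => Err(format!("unknown model slot `{other}` (expected \"code\" or \"runtime\")")),
    }
}

pub fn parse_provider(provider: &str) -> Result<ProviderType, String> {
    match provider {
        "ollama" => Ok(ProviderType::Ollama),
        "openai" => Ok(ProviderType::OpenAI),
        "anthropic" => Ok(ProviderType::Anthropic),
        "google" => Ok(ProviderType::Google),
        "mistral" => Ok(ProviderType::Mistral),
        other => Err(format!("unsupported provider `{other}`")),
    }
}

/// Ollama runs locally without a key; every external provider needs one,
/// either passed to `configure_model_slot` or already in the keychain.
pub fn requires_api_key(provider: ProviderType) -> bool {
    provider != ProviderType::Ollama
}
```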
Do NOT add a send_completion command yet. The code model completion is used in Phase 4 (code generation) and the runtime model completion is used in Phase 7 (agent loop). This phase only sets up the abstraction and configuration.
```typescript
// src/commands/models.ts
import { invoke } from "@tauri-apps/api/core";

export interface SlotConfig {
  provider: "ollama" | "openai" | "anthropic" | "google" | "mistral";
  model: string;
}

export interface ModelInfo {
  id: string;
  name: string;
  multimodal: boolean;
}

export async function configureModelSlot(
  slot: "code" | "runtime",
  provider: SlotConfig["provider"],
  model: string,
  apiKey?: string,
): Promise<void> {
  // Tauri maps the camelCase `apiKey` key to the Rust `api_key` parameter.
  return invoke<void>("configure_model_slot", { slot, provider, model, apiKey });
}

export async function getSlotConfig(
  slot: "code" | "runtime",
): Promise<SlotConfig | null> {
  return invoke<SlotConfig | null>("get_slot_config", { slot });
}

export async function checkModelHealth(
  slot: "code" | "runtime",
): Promise<boolean> {
  return invoke<boolean>("check_model_health", { slot });
}

export async function listOllamaModels(): Promise<ModelInfo[]> {
  return invoke<ModelInfo[]>("list_ollama_models");
}
```
Update the sidebar service health indicators from Phase 1:

- /). Green if reachable, red if not.

Poll health every 30 seconds or on window focus.
### Verification

- `listOllamaModels()` returns the locally available models
- `configureModelSlot("code", "ollama", "llama3.2")` succeeds
- `getSlotConfig("code")` returns the config
- `checkModelHealth("code")` returns true
- `configureModelSlot("code", "openai", "gpt-4o", "sk-...")` succeeds and the key is stored in the OS keychain (verify via Keychain Access on macOS or Credential Manager on Windows)

Keychain permission prompt on macOS: The first time the app accesses the keychain, macOS may prompt the user to allow access. This is expected behavior.
Ollama multimodal model detection: There's no reliable API to check whether an Ollama model is multimodal. Maintain a hardcoded list of known multimodal model family prefixes (llava, bakllava, moondream, llama3.2-vision). This is imperfect but sufficient.
External API rate limits: All external providers have rate limits. Return clear error messages when a 429 is received so the frontend can display them.
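One way to produce those messages, sketched as a hypothetical `describe_http_error` that takes the provider's display name, the status code, and the raw `Retry-After` header value if the provider sent one:

```rust
/// Turn a non-success HTTP status from a provider into a message the
/// frontend can show as-is.
pub fn describe_http_error(provider: &str, status: u16, retry_after: Option<&str>) -> String {
    match status {
        429 => {
            let hint = match retry_after.map(str::trim) {
                Some(value) => match value.parse::<u64>() {
                    Ok(secs) => format!("retry in {secs} seconds"),
                    // Retry-After may also be an HTTP date; pass it through unchanged.
                    Err(_) => format!("retry after {value}"),
                },
                None => "wait a moment and try again".to_string(),
            };
            format!("{provider} rate limit reached (HTTP 429); {hint}.")
        }
        401 | 403 => format!("{provider} rejected the API key (HTTP {status})."),
        500..=599 => format!("{provider} server error (HTTP {status}); try again later."),
        _ => format!("{provider} request failed (HTTP {status})."),
    }
}
```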
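reqwest TLS on Windows: reqwest 0.12's default `default-tls` feature uses the platform TLS stack (SChannel on Windows), so the dependency line above works there as-is. If you disable default features (for example to switch to `rustls-tls`), enable a TLS feature explicitly, or HTTPS requests to the external providers will fail.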
`ToolDefinition` and `ToolCall` types are defined here for the trait, but the actual tool registry and execution loop come in later phases.