skills/vision/SKILL.md
---
id: vision
name: Vision
description: Describe or analyze an image, or generate an image and send to chat. Actions: describe (image/url/path/webcam), generate (prompt → image sent to chat). See SKILL.md.
---

# Vision
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add bishwashere/cowcode skills/vision
Read or analyze an image using a vision-capable LLM, or generate an image from a text prompt and send it to the chat. Use when the user sends an image, when you have an image path (e.g. from a browse screenshot), when the user wants to see through the camera, or when the user asks to create/draw/generate an image ("Draw a sunset", "Generate a logo for …", "Make an image of …").
Built-in chaining: Screenshot → auto-describe → act. After a browse screenshot, pass the returned file path to vision to describe the page; you can then click, fill, or scroll in a follow-up step. The user does not need to say "describe this then click"; you chain screenshot → vision → browse actions as needed.
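The chaining described above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the `run_skill(skill, arguments)` callable signature, the browse `screenshot` action name, and the `path` and `text` result keys are all assumptions made for this example.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: run_skill(skill_name, arguments) -> result dict.
RunSkill = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def screenshot_then_describe(run_skill: RunSkill, question: str) -> str:
    """Chain browse screenshot -> vision describe and return the description."""
    # Step 1: take a screenshot. The browse argument names and the "path"
    # result key are assumptions for this sketch.
    shot = run_skill("browse", {"action": "screenshot"})

    # Step 2: hand the returned file path to vision.
    seen = run_skill(
        "vision",
        {"action": "describe", "path": shot["path"], "prompt": question},
    )

    # Step 3 (not shown): pick a click/fill/scroll action based on the text.
    return seen["text"]
```

Passing `run_skill` in as a parameter keeps the sketch independent of how the agent actually dispatches skills.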
Live camera: Vision can use the webcam as input, not just files. Set arguments.image to "webcam" (or arguments.source to "webcam") to capture one frame from the default camera. Use for prompts like "Show me what you see", "What's in the room?", "Describe what's in front of the camera."
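A webcam request might be assembled like this. The sketch only builds the payload; the `skill`/`arguments` envelope mirrors the fields named in this document, while the helper name `webcam_describe_call` and the default prompt are hypothetical.

```python
from typing import Any, Dict


def webcam_describe_call(
    prompt: str = "Describe what's in front of the camera.",
) -> Dict[str, Any]:
    """Build a run_skill payload that captures one webcam frame and describes it."""
    return {
        "skill": "vision",
        "arguments": {
            "action": "describe",
            # "webcam" selects the default camera instead of a file or URL.
            "image": "webcam",
            "prompt": prompt,
        },
    }
```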
Generate image and send to chat: Set arguments.action to "generate" and arguments.prompt to a description of the image to create. The image is generated (OpenAI DALL·E), saved, and sent to the chat as a photo with an optional caption. Use for "Draw …", "Generate an image of …", "Create a picture of …". Optional: arguments.size (e.g. 1024x1024), arguments.sendToChat (default true; set to false to only save the image and not send it to the chat).
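As an illustration, a generate request could be built as below. The helper name `generate_call` is hypothetical, and the sketch assumes that omitting `size` and `sendToChat` leaves the skill's defaults in effect.

```python
from typing import Any, Dict, Optional


def generate_call(
    prompt: str,
    size: Optional[str] = None,
    send_to_chat: bool = True,
) -> Dict[str, Any]:
    """Build a run_skill payload for image generation."""
    if not prompt.strip():
        raise ValueError("generate needs a non-empty prompt")

    arguments: Dict[str, Any] = {"action": "generate", "prompt": prompt}
    if size is not None:
        # e.g. "1024x1024"; left out so the skill can apply its default.
        arguments["size"] = size
    if not send_to_chat:
        # Only set when overriding the default (true): save without sending.
        arguments["sendToChat"] = False
    return {"skill": "vision", "arguments": arguments}
```

For example, `generate_call("Draw a sunset")` yields a payload with only `action` and `prompt` set, while `generate_call("A logo", size="1024x1024", send_to_chat=False)` adds both optional fields.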
Call run_skill with skill: "vision". Set command or arguments.action to describe (default) or generate. Arguments:

- arguments.image (describe): the image source: a file path (e.g. a browse screenshot under ~/.pasture/browse-screenshots/, or a user upload under uploads), a URL, or "webcam".
- arguments.size (generate): image size (1024x1024, 1024x1792, 1792x1024). Default: 1024x1024.
- arguments.sendToChat (generate): when true (default), the generated image is sent to the chat as a photo. Set to false to only save the image and return the path in the tool result.

After a browse screenshot, the image is saved under ~/.pasture/browse-screenshots/. Use vision with that path to describe or analyze the page; then chain with click/fill/scroll as needed. No need for the user to say "describe this then click."

For describe, you must provide an image source (or the agent will use the last image from chat history when available): arguments.image, arguments.url, or arguments.path (file path from "Image file: ..." in the message), or arguments.source: "webcam". For generate, you must provide arguments.prompt.
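The required-argument rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the function name `check_vision_arguments` and the `has_chat_image` flag (standing in for "an image is available in chat history") are hypothetical and not part of the skill.

```python
from typing import Any, Dict, List


def check_vision_arguments(
    arguments: Dict[str, Any],
    has_chat_image: bool = False,
) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems: List[str] = []
    action = arguments.get("action", "describe")  # describe is the default

    if action == "describe":
        has_source = (
            any(arguments.get(key) for key in ("image", "url", "path"))
            or arguments.get("source") == "webcam"
        )
        # Without an explicit source, the last chat image may still be used.
        if not has_source and not has_chat_image:
            problems.append('describe needs image, url, path, or source: "webcam"')
    elif action == "generate":
        prompt = arguments.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append("generate needs prompt")
    else:
        problems.append(f"unknown action: {action}")

    return problems
```

Returning a list rather than raising lets a caller report every problem at once before invoking run_skill.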
vision_describe
  description: Describe or analyze an image. Provide image (path, url, or "webcam"), optional prompt.
  parameters:
    image: string
    url: string
    path: string
    prompt: string
    systemPrompt: string
vision_generate
  description: Generate an image from a text prompt and send to chat (DALL·E).
  parameters:
    prompt: string
    size: string
Configuration:

- Vision fallback: set skills.vision.fallback with provider, model, and apiKey (an env var name, e.g. LLM_1_API_KEY). This follows the same style as llm.models and the versions chosen in setup.
- Image generation: provide an OpenAI API key (env var name, e.g. LLM_1_API_KEY), or use OpenAI as the vision fallback; the same key is then used for image generation. Optional: skills.vision.imageGeneration.size, skills.vision.imageGeneration.model (default dall-e-3).
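Because apiKey holds the name of an environment variable rather than the key itself, reading the configuration involves one lookup step. The sketch below illustrates that indirection; the nested-dictionary layout, the placeholder model name, and both helper names are assumptions made for this example, not the project's actual config loader.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical config shape: dotted keys such as skills.vision.fallback are
# assumed to map onto nested objects. The fallback model is a placeholder.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "skills": {
        "vision": {
            "fallback": {
                "provider": "openai",
                "model": "gpt-4o",
                "apiKey": "LLM_1_API_KEY",  # env var NAME, not the secret
            },
            "imageGeneration": {"size": "1024x1024", "model": "dall-e-3"},
        }
    }
}


def resolve_fallback_key(
    config: Mapping[str, Any], env: Mapping[str, str]
) -> Optional[str]:
    """Look up the API key value via the env var name stored in apiKey."""
    vision = config.get("skills", {}).get("vision", {})
    env_name = vision.get("fallback", {}).get("apiKey")
    return env.get(env_name) if env_name else None


def image_generation_model(config: Mapping[str, Any]) -> str:
    """Return the configured image model, defaulting to dall-e-3."""
    vision = config.get("skills", {}).get("vision", {})
    return vision.get("imageGeneration", {}).get("model", "dall-e-3")
```

In real use, `env` would typically be `os.environ`; taking it as a parameter keeps the lookup easy to test without touching the process environment.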