plugins/summarizer/skills/image-summarization/SKILL.md
Describe images and screenshots by viewing their content with the multimodal Read tool and documenting only visible elements. Activates on phrases such as "what's in this image", "describe this screenshot", "summarize this diagram", "image summary", "what does this screenshot show", "explain this diagram", and "break down this chart". Handles UI screenshots, architecture diagrams, charts, photos, code screenshots, and terminal output. Never infers content from filenames.
npx skillsauth add jamie-bitflight/claude_skills image-summarization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Methodology for summarizing visual content including screenshots, diagrams, charts, photos, code screenshots, and terminal output.
The model MUST use the Read tool to view images. Claude Code is multimodal and can directly read image files.
Supported formats:
Special handling for SVG:
Example:
Read("/path/to/diagram.svg") # Visual view
Read("/path/to/diagram.svg") # Text view for extracting labels
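For the text view, label extraction can be sketched in a few lines of Python. This is only an illustration, not part of the skill: the helper name `extract_svg_labels` is hypothetical, and it assumes labels are stored in standard SVG `<text>` elements (optionally with nested `<tspan>` children) rather than converted to paths.

```python
import xml.etree.ElementTree as ET

# Standard SVG namespace; files exported without it would need a different lookup.
SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_svg_labels(svg_source: str) -> list[str]:
    """Return the text of every <text> element, in document order."""
    root = ET.fromstring(svg_source)
    labels = []
    for element in root.iter(f"{SVG_NS}text"):
        # itertext() also collects text nested inside <tspan> children;
        # split/join collapses runs of whitespace into single spaces.
        label = " ".join("".join(element.itertext()).split())
        if label:
            labels.append(label)
    return labels


# Hypothetical sample diagram used only to exercise the helper.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect width="80" height="30"/>
  <text x="5" y="20">API <tspan>Gateway</tspan></text>
  <text x="5" y="60">Database</text>
</svg>"""

print(extract_svg_labels(sample))
```

Labels recovered this way still need to be checked against the visual view, since text present in the markup may be hidden, clipped, or off-canvas in the rendered image.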
The summarization approach varies by image type. The model MUST identify the image type and apply the corresponding strategy.
The model MUST describe:
Example output structure:
UI screenshot showing [application name if visible] with three-column layout.
Left sidebar contains navigation menu with items: [list items].
Main content area displays [describe primary content].
Top banner shows [error/success/info message if present].
The model MUST describe:
Example output structure:
Architecture diagram with 5 components:
- [Component A] connects to [Component B] via [labeled arrow]
- Data flows from [source] through [intermediate] to [destination]
- [Component X] is grouped with [Component Y] within [boundary label]
The model MUST describe:
Example output structure:
Line graph titled [title] showing [Y-axis label] over [X-axis label].
Trend: [describe pattern - e.g., "steady increase from 10 to 50 over time period"].
Notable points: peak at [X value] with [Y value], lowest point at [X value].
The model MUST describe:
Do NOT:
The model MUST:
Example output structure:
Code screenshot showing [language] file [filename if visible].
Visible code defines [function/class name] that [describe from visible code].
Lines [X-Y] are visible; code appears truncated at [top/bottom/both].
The model MUST:
Example output structure:
Terminal screenshot showing command: [extract exact command]
Output: [extract visible output text]
[If truncated] Output appears truncated; [X] lines visible.
The model MUST describe only what IS visible in the image.
Prohibited behaviors:
Required behaviors:
The model MUST produce a structured summary following the format defined in Structured Summary.
Image-specific frontmatter:
---
source_type: image
source_path: "/absolute/path/to/image.png"
summarized_at: "2026-02-06T10:30:00Z"
method: abstractive
word_count_source: null
word_count_summary: <integer>
compression_ratio: null
confidence: high | medium | low
confidence_notes: "Image clearly visible with high resolution" | "Some labels obscured" | "Low resolution, text partially unreadable"
---
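As a rough sketch of how the frontmatter fields above might be assembled, the Python below fills in the fixed image values (`source_type: image`, `method: abstractive`, and `null` for `word_count_source` and `compression_ratio`) and computes the rest. The function name `build_image_frontmatter` is hypothetical, and counting words by whitespace splitting is an assumption; the skill does not specify how `word_count_summary` is calculated.

```python
from datetime import datetime, timezone

ALLOWED_CONFIDENCE = ("high", "medium", "low")


def build_image_frontmatter(source_path, summary_text, confidence,
                            confidence_notes, now=None):
    """Render the image-specific frontmatter block as a string."""
    if confidence not in ALLOWED_CONFIDENCE:
        raise ValueError(f"confidence must be one of {ALLOWED_CONFIDENCE}")
    # `now` must be timezone-aware; it is normalized to UTC for the "Z" suffix.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: a "word" is any whitespace-separated token.
    word_count = len(summary_text.split())
    lines = [
        "---",
        "source_type: image",
        f'source_path: "{source_path}"',
        f'summarized_at: "{timestamp}"',
        "method: abstractive",
        "word_count_source: null",   # images have no source word count
        f"word_count_summary: {word_count}",
        "compression_ratio: null",   # undefined without a source word count
        f"confidence: {confidence}",
        # Note: no escaping is done here; notes containing double quotes
        # would need escaping to stay valid YAML.
        f'confidence_notes: "{confidence_notes}"',
        "---",
    ]
    return "\n".join(lines)


print(build_image_frontmatter(
    "/absolute/path/to/image.png",
    "Login page with two fields",
    "high",
    "Image clearly visible with high resolution",
    now=datetime(2026, 2, 6, 10, 30, tzinfo=timezone.utc),
))
```

Rejecting unknown confidence values up front keeps the field restricted to the three levels the frontmatter allows.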
Required sections:
High confidence:
Medium confidence:
Low confidence:
../summarizer/templates/{format_id}.md (default: structured). The template defines the schema, required sections, and fidelity constraints for the selected format.
All image summaries MUST comply with the shared fidelity rules defined in Fidelity Rules.
Key rules for images:
---
source_type: image
source_path: "/home/user/screenshot.png"
summarized_at: "2026-02-06T10:45:00Z"
method: abstractive
word_count_source: null
word_count_summary: 85
compression_ratio: null
confidence: high
confidence_notes: "High resolution screenshot with all UI elements clearly visible"
---
Summary: Login page for application "ExampleApp" showing username and password fields, "Remember me" checkbox, and "Sign In" button. Blue banner at top displays "Welcome back!" message.
What Was Found:
What Was NOT Found:
Uncertain: N/A
Sources:
---
source_type: image
source_path: "/home/user/architecture.svg"
summarized_at: "2026-02-06T11:00:00Z"
method: abstractive
word_count_source: null
word_count_summary: 120
compression_ratio: null
confidence: medium
confidence_notes: "Most components labeled clearly, one connection label partially obscured"
---
Summary: Three-tier architecture diagram showing web frontend, API gateway, and database layer with message queue for async processing.
What Was Found:
What Was NOT Found:
Uncertain:
Sources: