Image Vision Analysis

Overview

Analyze images using state-of-the-art LLM vision models. Use the provided scripts for most tasks; custom code is only needed for advanced scenarios.

Workflow Decision Tree

First time using this skill?

→ Read setup.md for one-time environment and API key setup

Simple image analysis (most common)

→ Use "Quick Start" canned scripts below

Batch processing or multi-turn conversations

→ Read patterns.md for advanced patterns

Something failing?

→ Check setup.md for troubleshooting

Quick Start (Use Wrapper Scripts)

ALWAYS use the wrapper scripts - they handle venv setup automatically:

# Simple analysis (auto-creates venv on first use)
./vision-analyze.sh <provider> <image_path> <prompt>

# Robust analysis (auto-fallback if provider times out)
./vision-analyze-robust.sh <image_path> <prompt> [timeout_seconds]

The wrapper scripts automatically:

Create venv if it doesn't exist
Install required SDKs
Use venv Python (no manual activation needed)
Handle errors gracefully

Example usage:

# Analyze a UI screenshot (Anthropic Claude)
./vision-analyze.sh anthropic screenshot.png "Describe any UI bugs or issues you see"

# Extract text (Google Gemini - fastest)
./vision-analyze.sh gemini document.jpg "Extract all text from this image"

# Robust analysis with auto-fallback (tries Gemini → Anthropic → OpenAI)
./vision-analyze-robust.sh photo.png "Describe this image in detail"

# With custom timeout (default is 60 seconds)
./vision-analyze-robust.sh large-image.png "Analyze this" 120

Advanced: Direct Script Usage (Not Recommended)

If you need to call the Python scripts directly, you MUST use the venv Python:

# ❌ WRONG - uses system Python, will fail
python examples/anthropic-vision.py image.png "prompt"

# ✅ CORRECT - uses venv Python
./.venv/bin/python examples/anthropic-vision.py image.png "prompt"

For agents: Always use the wrapper scripts to avoid setup issues.

Provider Comparison

| Provider | Model | Best For | Speed | Cost |
|----------|-------|----------|-------|------|
| Anthropic | claude-sonnet-4-5 | Latest, balanced quality/speed | Fast | $$ |
| Anthropic | claude-3-opus | Highest quality (older) | Slow | $$$ |
| Anthropic | claude-3-haiku | Fastest, simple tasks | Very Fast | $ |
| OpenAI | gpt-5 | Latest flagship model | Fast | $$$ |
| OpenAI | gpt-4.1 | High-volume production | Fast | $$ |
| Gemini | gemini-2.5-flash | Latest, excellent balance | Very Fast | $ |
| Gemini | gemini-2.5-pro | Large images, best quality | Medium | $$ |
| Azure | (deployment-based) | Enterprise, compliance | Varies | Varies |

Supported Image Formats

JPEG/JPG - Most common
PNG - With transparency
GIF - Static or animated (only the first frame is analyzed)
WEBP - Modern format

Max sizes:

Anthropic: 5MB per image
OpenAI: 20MB (auto-resizes)
Gemini: Varies by model (gemini-2.5-pro handles very large images)
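
A pre-flight check can catch an unsupported format or an oversized file before an API call is spent. The following is a minimal sketch, not part of the skill: `check_image`, `SUPPORTED_SUFFIXES`, and `MAX_BYTES` are hypothetical names, the limits are copied from the list above (assuming 1 MB = 1024 * 1024 bytes), and Gemini is left out because its limit varies by model.

```python
from pathlib import Path

# Formats listed under "Supported Image Formats" above.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

# Per-image limits from the "Max sizes" list above (assumption: 1 MB = 1024 * 1024 bytes).
# Gemini is omitted because its limit varies by model.
MAX_BYTES = {
    "anthropic": 5 * 1024 * 1024,
    "openai": 20 * 1024 * 1024,
}


def check_image(path: str, provider: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks sendable."""
    file = Path(path)
    if not file.is_file():
        return [f"{path}: file not found"]

    problems = []
    if file.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"{path}: unsupported format '{file.suffix}'")

    limit = MAX_BYTES.get(provider)  # None means no known fixed limit
    size = file.stat().st_size
    if limit is not None and size > limit:
        problems.append(f"{path}: {size} bytes exceeds the {provider} limit of {limit}")
    return problems
```

The wrapper scripts already bound the payload size automatically (see "Timeout Considerations"), so a check like this matters mainly for custom scripts that call a provider directly.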

Common Use Cases

# UI/UX Analysis - High-level layout and spacing
./vision-analyze.sh anthropic app-screenshot.png \
  "Analyze this UI for accessibility issues and suggest improvements"

# Bug Identification (use robust for auto-fallback)
./vision-analyze-robust.sh error-state.png \
  "What's wrong with this interface? Describe any visual bugs."

# Content Moderation
./vision-analyze.sh openai user-upload.jpg \
  "Does this image contain inappropriate content? Yes or no, and explain."

# Document Understanding (Gemini is fastest)
./vision-analyze.sh gemini invoice.png \
  "Extract the total amount, date, and vendor name from this invoice"

# Design Review - Layout, color, hierarchy (not typography details)
./vision-analyze-robust.sh mockup.png \
  "Provide design feedback on this mockup. Consider layout, color hierarchy, and spacing."

⚠️ Known Limitations for Web UI Analysis

Typography and Font Detection

Vision models struggle with precise typography at typical screenshot resolutions:

❌ Unreliable for:

Distinguishing serif vs sans-serif fonts at small sizes (<16px)
Identifying specific font families (Inter vs Roboto vs Arial)
Detecting subtle weight differences (400 vs 500)
Precise alignment measurements (<5px differences)

✅ Reliable for:

High-level layout issues (spacing, hierarchy, colors)
Large size differences (14px vs 24px heading sizes)
Missing elements or obviously broken UI states
Color contrast and accessibility problems

Best Practice: Multi-Modal Investigation

For Web UI bugs, use this hierarchy:

# 1. Vision for TRIAGE (identify area of concern)
./vision-analyze-robust.sh screenshot.png "Are there any visual inconsistencies in the navigation?"

# 2. Browser inspection for FACTS (if typography/font suspected)
# Use Playwright or DevTools to query computed CSS:
# const styles = await page.evaluate(() => ({
#   fontFamily: getComputedStyle(element).fontFamily
# }));

# 3. Code investigation for ROOT CAUSE
# grep -r ".suspicious-class" src/

# 4. Vision for VERIFICATION (after fix applied)
./vision-analyze-robust.sh fixed.png "Is the navigation font now consistent?"
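
Step 2 can also be scripted from Python. This is a minimal sketch, assuming the third-party `playwright` package and a browser are installed; `collect_font_families` and `font_outliers` are hypothetical helper names, not part of the skill.

```python
from collections import Counter


def collect_font_families(url: str, selector: str) -> dict[str, str]:
    """Return {element text: computed font-family} for every element matching selector.

    Assumption: the third-party `playwright` package and a Chromium build are installed.
    """
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # Runs in the browser: one [text, fontFamily] pair per matched element.
        rows = page.locator(selector).evaluate_all(
            "els => els.map(el => [el.textContent.trim(),"
            " getComputedStyle(el).fontFamily])"
        )
        browser.close()
    return {text: family for text, family in rows}


def font_outliers(families: dict[str, str]) -> dict[str, str]:
    """Return the elements whose font-family differs from the most common one."""
    if not families:
        return {}
    most_common, _count = Counter(families.values()).most_common(1)[0]
    return {text: fam for text, fam in families.items() if fam != most_common}
```

A call such as `font_outliers(collect_font_families("http://localhost:3000", "nav a"))` (the URL and selector are placeholders) returns the navigation items whose computed font differs from the rest. This gives a fact to check the vision triage against.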

When to Stop Using Vision

If vision gives contradictory results across 2+ attempts on similar screenshots:

Stop asking vision for more detailed analysis
Switch to browser DevTools inspection (query computed styles)
Use vision only for final verification after fix is applied

This indicates the issue is too subtle for vision models to detect reliably.

Prompt Patterns for Web UI

Font/Typography (with caveats):

# Be explicit about what to look for
./vision-analyze.sh anthropic ui.png \
  "Look at the navigation text. Do any items have decorative 'feet' at letter ends (serif font) 
  while others have clean straight edges (sans-serif)? Point out any font style differences."
  
# Note: Small fonts may be unreliable - verify with browser inspection

Alignment (relative observations):

# Ask for noticeable differences, not pixel precision
./vision-analyze.sh anthropic ui.png \
  "Is the bullet (•) noticeably misaligned with the text baseline? 
  Describe its vertical position relative to the text."

Layout and Spacing:

# Vision is GOOD at this
./vision-analyze.sh anthropic ui.png \
  "Compare the spacing between navigation sections. Is it consistent?"

Output Format

All scripts output to stdout as plain text. The LLM's analysis is printed directly:

$ ./.venv/bin/python examples/anthropic-vision.py screenshot.png "What's in this image?"

This image shows a web application dashboard with a navigation bar at the top,
a sidebar on the left with menu items, and a main content area displaying...

For structured output, modify your prompt:

./.venv/bin/python examples/openai-vision.py data.png \
  "Extract data as JSON with keys: title, date, amount"

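Because the scripts print plain text, a reply to a "respond as JSON" prompt may still arrive wrapped in extra prose or a code fence. The sketch below shows one way to recover the object. `extract_json` is a hypothetical helper, and the approach (take everything from the first `{` to the last `}`) is a simple assumption that will not handle replies containing several separate objects.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the JSON object embedded in a model reply.

    Raises ValueError if no parseable object is present.
    """
    # Take everything from the first "{" to the last "}" so that surrounding
    # prose or code-fence markers are ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output was not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("model output was not a JSON object")
    return parsed
```

Treat a `ValueError` here like any other failed analysis: report it rather than guessing the missing fields.
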
When to Write Custom Scripts

Use the canned scripts for:

✅ Single image + single prompt analysis
✅ Quick one-off tasks
✅ Simple Q&A about images

Write custom scripts when you need:

❌ Batch processing (analyze 100 images)
❌ Multi-turn conversations (follow-up questions on same image)
❌ Custom output formatting (generate markdown reports)
❌ Image preprocessing (resize, crop, filter)
❌ Custom provider fallback logic (an order other than the robust script's Gemini → Anthropic → OpenAI)

→ See patterns.md for custom script examples
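
As an illustration of the batch case, the sketch below loops over images and keeps successes and failures apart. It is not taken from patterns.md: `analyze_with_wrapper` and `analyze_batch` are hypothetical names, and the wrapper is assumed to be invoked from the skill directory with the same arguments shown in "Quick Start".

```python
import subprocess
from pathlib import Path
from typing import Callable

# Wrapper name as used in this document; assumed to be run from the skill directory.
WRAPPER = "./vision-analyze-robust.sh"


def analyze_with_wrapper(image: Path, prompt: str) -> tuple[int, str]:
    """Run the robust wrapper once; return (exit code, stdout on success or stderr on failure)."""
    result = subprocess.run([WRAPPER, str(image), prompt], capture_output=True, text=True)
    text = result.stdout if result.returncode == 0 else result.stderr
    return result.returncode, text


def analyze_batch(
    images: list[Path],
    prompt: str,
    analyze: Callable[[Path, str], tuple[int, str]] = analyze_with_wrapper,
) -> tuple[dict[str, str], dict[str, str]]:
    """Analyze each image; return (results, failures), both keyed by file name."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for image in images:
        code, text = analyze(image, prompt)
        if code == 0:
            results[image.name] = text
        else:
            # Record the failure instead of inventing an analysis.
            failures[image.name] = f"exit {code}: {text.strip()}"
    return results, failures
```

Passing the analyzer as a parameter keeps the loop testable without network access, and the separate `failures` mapping makes it hard to silently skip an image that was never actually analyzed.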

Anti-Patterns

| ❌ Don't | ✅ Do |
|----------|-------|
| Write custom script for simple analysis | Use canned scripts |
| Use low-quality compressed images | Use clear, high-res images |
| Ask vague questions | Be specific in prompts |
| Forget to set API keys | Set keys in environment variables |
| Mix up provider-specific model names | Check provider comparison table |

Quick Reference

| Task | Command |
|------|---------|
| Analyze (single provider) | ./vision-analyze.sh anthropic img.png "prompt" |
| Analyze (auto-fallback) | ./vision-analyze-robust.sh img.png "prompt" |
| Extract text (OCR) | ./vision-analyze.sh gemini img.png "Extract all text" |
| Health check | ./health-check.sh |
| Compare images | See patterns.md for custom script |
| Batch process | See patterns.md for custom script |

⚠️ CRITICAL INSTRUCTIONS FOR AGENTS

READ THIS BEFORE USING THIS SKILL:

1. Always Use the Wrapper Scripts

# For AI agents (recommended) - auto-fallback on timeout
~/.amplifier/skills/image-vision/vision-analyze-robust.sh <image_path> <prompt>

# Single provider (faster if you know which to use)
~/.amplifier/skills/image-vision/vision-analyze.sh <provider> <image_path> <prompt>

Examples:

# Robust analysis (tries multiple providers if timeout)
~/.amplifier/skills/image-vision/vision-analyze-robust.sh screenshot.png "Analyze this UI"

# Specific provider
~/.amplifier/skills/image-vision/vision-analyze.sh anthropic screenshot.png "Describe this"

2. ALWAYS Check Exit Code Before Using Output

# Correct usage pattern
OUTPUT=$(~/.amplifier/skills/image-vision/vision-analyze-robust.sh image.png "Analyze this" 2>&1)
EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
    echo "Vision analysis succeeded"
    # Now you can use $OUTPUT
else
    echo "ERROR: Vision analysis failed (exit code: $EXIT_CODE)"
    echo "Error details: $OUTPUT"
    # STOP HERE - do NOT proceed
    exit 1
fi

Exit codes are classified so callers get an honest failure signal instead of a catch-all (this distinction matters: a missing key and a slow provider need different fixes):

| Exit | Meaning | What to do |
|------|---------|-----------|
| 0 | Success | Use the output |
| 1 | Usage error, or all configured providers failed (mixed/other errors) | Read stderr for the underlying error |
| 3 | No vision provider configured (no API key present) | Set GOOGLE_API_KEY / ANTHROPIC_API_KEY / OPENAI_API_KEY (or AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT) |
| 4 | provider_timeout: provider(s) present but every attempt timed out within the bounded timeout | Increase the timeout (3rd arg), use a smaller image, or a faster provider |

vision-analyze-robust.sh fails fast and clearly on exit 3/4 rather than hanging — a missing or slow provider surfaces immediately, not minutes later.
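
When the wrapper is called from Python instead of a shell, the same exit-code discipline applies. The following is a sketch under stated assumptions: `classify_exit` and `run_vision` are hypothetical helpers, and every label except `provider_timeout` is a name made up here to mirror the exit-code meanings listed in this section.

```python
import subprocess
from pathlib import Path

# Script location as given in this document.
ROBUST_SCRIPT = Path.home() / ".amplifier/skills/image-vision/vision-analyze-robust.sh"

# Exit codes documented in this section. Labels other than "provider_timeout"
# are hypothetical names chosen for this sketch.
EXIT_MEANINGS = {
    0: "success",
    1: "usage_error_or_all_providers_failed",
    3: "no_provider_configured",
    4: "provider_timeout",
}


def classify_exit(code: int) -> str:
    """Map a wrapper exit code to a short label; undocumented codes are flagged as unknown."""
    return EXIT_MEANINGS.get(code, f"unknown_exit_{code}")


def run_vision(image_path: str, prompt: str, timeout_seconds: int = 60) -> tuple[str, str]:
    """Run the robust wrapper; return (label, text).

    text is the analysis on success and the captured stderr otherwise.
    """
    result = subprocess.run(
        [str(ROBUST_SCRIPT), image_path, prompt, str(timeout_seconds)],
        capture_output=True,
        text=True,
    )
    label = classify_exit(result.returncode)
    if result.returncode == 0:
        return label, result.stdout
    return label, result.stderr
```

A caller should use the returned text as an analysis only when the label is `success`. Capturing stdout and stderr separately means anything the wrapper writes to stderr does not get mixed into the analysis text.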

3. NEVER Fabricate Visual Observations

If vision analysis fails, you MUST:

✅ DO:

Report failure explicitly to user
Provide error details from stderr
Ask user how to proceed (retry? different provider? skip visual analysis?)
Wait for user direction before continuing

❌ NEVER:

Write analysis documents without successfully seeing images
Fabricate visual observations based on context/guesswork
Guess pixel measurements or UI element details
Pretend you analyzed screenshots you didn't actually see
Continue with tasks that require visual inspection if vision failed

Example of CORRECT failure handling:

Agent: I attempted to analyze screenshot-1.png using the image-vision skill, and every provider failed:
- Gemini: ✗ timed out (60s)
- Anthropic: ✗ timed out (60s)
- OpenAI: ✗ failed (API error)

I have NOT successfully analyzed the screenshot. I cannot provide visual design
feedback without actually seeing the image.

Options:
1. Retry with different settings
2. Investigate why all providers are failing
3. Defer visual analysis until the issue is resolved

I will NOT write design analysis documents based on guesswork or context alone.

4. Timeout Considerations

Vision API calls typically take 3-60 seconds:

Gemini Flash: 3-10s (fastest)
Anthropic Claude: 5-15s
OpenAI GPT-4: 8-20s

The wrapper scripts handle timeouts with:

60-second default timeout (configurable, always bounded)
Auto-fallback to the next provider (robust script)
Retry logic on transient failures
Automatic screenshot downscaling: before it is sent, every image is capped in width (2000px) and bounded in encoded payload size, and the payload bound is fail-closed (see examples/image_utils.py). This makes the "resize to 2000px max" guidance automatic, so an uncapped full-page screenshot can't produce an oversized request that is prone to being interrupted. Downscaling is conservative: downscale-only (never upscales), aspect ratio preserved, LANCZOS resampling, EXIF-aware. It caps width rather than the longest edge so that a tall full-page capture is not squashed; width is the dimension that text legibility depends on. If a pathological image still exceeds the payload bound at the width floor, it is re-encoded to JPEG, and if it still cannot fit, the call fails clearly rather than sending an oversized payload.

Capture hygiene still matters. For text-critical verification (small numbers, labels, precise alignment), prefer viewport-sized captures over aggressive full-page captures, and corroborate with browser/DOM facts — see "Known Limitations for Web UI Analysis" above. The cap removes a hang risk; it does not make vision reliable for sub-pixel typography.
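
The width-cap arithmetic described above can be sketched as follows. This is an illustration only and does not reproduce examples/image_utils.py: `capped_size` and `downscale_file` are hypothetical names, the payload-size bound and JPEG re-encoding step are omitted, and `downscale_file` assumes Pillow 9.1 or newer is installed.

```python
def capped_size(width: int, height: int, max_width: int = 2000) -> tuple[int, int]:
    """Return the (width, height) after capping width and preserving aspect ratio."""
    if width <= max_width:
        return width, height  # downscale-only: never upscale
    scale = max_width / width
    # Only width is capped, so a tall full-page capture keeps its proportions.
    return max_width, max(1, round(height * scale))


def downscale_file(src: str, dst: str, max_width: int = 2000) -> None:
    """Write a width-capped copy of src to dst (assumption: Pillow >= 9.1 is installed)."""
    from PIL import Image, ImageOps  # imported lazily so capped_size has no dependencies

    with Image.open(src) as img:
        img = ImageOps.exif_transpose(img)  # apply EXIF orientation before measuring
        target = capped_size(img.width, img.height, max_width)
        if target != img.size:
            img = img.resize(target, Image.Resampling.LANCZOS)
        img.save(dst)
```

For example, a 4000x12000 full-page capture comes out at 2000x6000, while a 1280x720 viewport capture is left untouched.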

If still hitting timeouts:

Use smaller images (capping is automatic, but viewport captures help most)
Simplify prompts
Use faster models (Gemini Flash)

Environment Setup Reminder

For interactive use:

Create venv: cd image-vision && uv venv
Install SDKs: uv pip install anthropic openai google-genai pillow
Set API keys: Export ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY

For agents:

Just use the wrapper scripts - they auto-setup on first use
Verify health: ./health-check.sh

→ See setup.md for complete instructions

