Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

markus41/vision-multimodal

Name: vision-multimodal
Author: markus41

.claude/skills/vision-multimodal/SKILL.md

npx skillsauth add markus41/claude vision-multimodal

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Vision & Multimodal Skill

Leverage Claude's vision capabilities for image analysis, document processing, and multimodal understanding.

When to Use This Skill

Image analysis and description
Document/PDF processing
Screenshot analysis
OCR-like text extraction
Visual comparison
Chart and diagram interpretation

Supported Formats

| Format | Status | Best For | |--------|--------|----------| | JPEG | ✓ | Photos, natural scenes | | PNG | ✓ | Screenshots, UI, text | | GIF | ✓ | Animated (first frame) | | WebP | ✓ | Modern, compressed | | PDF | ✓ | Documents (via Files API) |

Image Size Guidelines

Minimum: 200 pixels (smaller = reduced accuracy)
Optimal: 1000x1000 pixels
Maximum: 8000x8000 pixels
Token cost: ~(width × height) / 1000
Tip: Resize to 1568px max dimension for 30-50% token savings

Core Patterns

Pattern 1: Single Image Analysis

import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "Describe this image in detail."
            }
        ]
    }]
)

Pattern 2: Image from URL

import httpx

# Fetch and encode from URL
image_url = "https://example.com/image.jpg"
response = httpx.get(image_url)
image_data = base64.standard_b64encode(response.content).decode("utf-8")

# Then use same pattern as above

Pattern 3: Multiple Images

# Compare multiple images (up to 100 per request)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image1}},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image2}},
        {"type": "text", "text": "Compare these two images and list the differences."}
    ]
}]

Pattern 4: Few-Shot with Images

# Teach by example
messages = [
    # Example 1
    {"role": "user", "content": [
        {"type": "image", "source": {...}},
        {"type": "text", "text": "Classify this image."}
    ]},
    {"role": "assistant", "content": "Category: Landscape\nElements: Mountains, lake, trees"},

    # Example 2
    {"role": "user", "content": [
        {"type": "image", "source": {...}},
        {"type": "text", "text": "Classify this image."}
    ]},
    {"role": "assistant", "content": "Category: Portrait\nElements: Person, indoor, professional"},

    # Target image
    {"role": "user", "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": target_image}},
        {"type": "text", "text": "Classify this image."}
    ]}
]

Pattern 5: PDF Processing

# Using Files API (beta)
with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {"type": "text", "text": "Summarize this document."}
        ]
    }]
)

Prompt Engineering for Vision

Strategy 1: Role Assignment

prompt = """You have perfect vision and exceptional attention to detail,
making you an expert at analyzing technical diagrams.

Analyze this architecture diagram and identify:
1. All components
2. Data flow between components
3. Potential bottlenecks"""

Strategy 2: Step-by-Step Thinking

prompt = """Before answering, analyze the image systematically:

<thinking>
1. What is the overall subject?
2. What are the key elements?
3. How do elements relate to each other?
4. What details stand out?
</thinking>

Then provide your answer based on this analysis."""

Strategy 3: Structured Output

prompt = """Extract information from this receipt and return as JSON:

{
    "vendor": "",
    "date": "",
    "items": [{"name": "", "price": 0}],
    "total": 0
}"""

Image Optimization

from PIL import Image
import io

def optimize_for_claude(image_path, max_dimension=1568):
    """Resize image to reduce token usage by 30-50%"""
    with Image.open(image_path) as img:
        # Calculate new dimensions
        ratio = min(max_dimension / img.width, max_dimension / img.height)
        if ratio < 1:
            new_size = (int(img.width * ratio), int(img.height * ratio))
            img = img.resize(new_size, Image.LANCZOS)

        # Convert to bytes
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG", quality=85)
        return base64.standard_b64encode(buffer.getvalue()).decode("utf-8")

Common Use Cases

Text Extraction (OCR-like)

prompt = """Extract all text from this image.
Preserve the original formatting and structure as much as possible.
If text is unclear, indicate with [unclear]."""

Table Extraction

prompt = """Extract the table data from this image.
Return as a markdown table with proper headers and alignment."""

Chart Analysis

prompt = """Analyze this chart:
1. What type of chart is this?
2. What are the axes/labels?
3. What are the key data points?
4. What trends or patterns are visible?"""

Best Practices

DO:

Use high-quality images (≥1000px)
Resize large images to save tokens
Provide context about what to look for
Use few-shot examples for consistent output

DON'T:

Send images smaller than 200px
Expect perfect OCR for handwriting
Send very large images (>8000px)
Ignore token costs for multiple images

Limitations

Cannot identify specific individuals
May struggle with very small text
Animated GIFs: only first frame analyzed
Some specialized symbols may be misread

markus41/vision-multimodal

.claude/skills/vision-multimodal/SKILL.md

Vision and multimodal capabilities for Claude including image analysis, PDF processing, and document understanding. Activate for image input, base64 encoding, multiple images, and visual analysis.

9 stars

documentation

Updated Apr 7, 2026

$ install --global

skillsauth

npx skillsauth add markus41/claude vision-multimodal

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 8:40 PM1.9s1 file scanned

SKILL.md

name:: vision-multimodal
description:: Vision and multimodal capabilities for Claude including image analysis, PDF processing, and document understanding. Activate for image input, base64 encoding, multiple images, and visual analysis.

Vision & Multimodal Skill

Leverage Claude's vision capabilities for image analysis, document processing, and multimodal understanding.

When to Use This Skill

Image analysis and description
Document/PDF processing
Screenshot analysis
OCR-like text extraction
Visual comparison
Chart and diagram interpretation

Supported Formats

Image Size Guidelines

Minimum: 200 pixels (smaller = reduced accuracy)
Optimal: 1000x1000 pixels
Maximum: 8000x8000 pixels
Token cost: ~(width × height) / 1000
Tip: Resize to 1568px max dimension for 30-50% token savings

Core Patterns

Pattern 1: Single Image Analysis

import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "Describe this image in detail."
            }
        ]
    }]
)

Pattern 2: Image from URL

import httpx

# Fetch and encode from URL
image_url = "https://example.com/image.jpg"
response = httpx.get(image_url)
image_data = base64.standard_b64encode(response.content).decode("utf-8")

# Then use same pattern as above

Pattern 3: Multiple Images

# Compare multiple images (up to 100 per request)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image1}},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image2}},
        {"type": "text", "text": "Compare these two images and list the differences."}
    ]
}]

Pattern 4: Few-Shot with Images

# Teach by example
messages = [
    # Example 1
    {"role": "user", "content": [
        {"type": "image", "source": {...}},
        {"type": "text", "text": "Classify this image."}
    ]},
    {"role": "assistant", "content": "Category: Landscape\nElements: Mountains, lake, trees"},

    # Example 2
    {"role": "user", "content": [
        {"type": "image", "source": {...}},
        {"type": "text", "text": "Classify this image."}
    ]},
    {"role": "assistant", "content": "Category: Portrait\nElements: Person, indoor, professional"},

    # Target image
    {"role": "user", "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": target_image}},
        {"type": "text", "text": "Classify this image."}
    ]}
]

Pattern 5: PDF Processing

# Using Files API (beta)
with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {"type": "text", "text": "Summarize this document."}
        ]
    }]
)

Prompt Engineering for Vision

Strategy 1: Role Assignment

prompt = """You have perfect vision and exceptional attention to detail,
making you an expert at analyzing technical diagrams.

Analyze this architecture diagram and identify:
1. All components
2. Data flow between components
3. Potential bottlenecks"""

Strategy 2: Step-by-Step Thinking

prompt = """Before answering, analyze the image systematically:

<thinking>
1. What is the overall subject?
2. What are the key elements?
3. How do elements relate to each other?
4. What details stand out?
</thinking>

Then provide your answer based on this analysis."""

Strategy 3: Structured Output

prompt = """Extract information from this receipt and return as JSON:

{
    "vendor": "",
    "date": "",
    "items": [{"name": "", "price": 0}],
    "total": 0
}"""

Image Optimization

from PIL import Image
import io

def optimize_for_claude(image_path, max_dimension=1568):
    """Resize image to reduce token usage by 30-50%"""
    with Image.open(image_path) as img:
        # Calculate new dimensions
        ratio = min(max_dimension / img.width, max_dimension / img.height)
        if ratio < 1:
            new_size = (int(img.width * ratio), int(img.height * ratio))
            img = img.resize(new_size, Image.LANCZOS)

        # Convert to bytes
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG", quality=85)
        return base64.standard_b64encode(buffer.getvalue()).decode("utf-8")

Common Use Cases

Text Extraction (OCR-like)

prompt = """Extract all text from this image.
Preserve the original formatting and structure as much as possible.
If text is unclear, indicate with [unclear]."""

Table Extraction

prompt = """Extract the table data from this image.
Return as a markdown table with proper headers and alignment."""

Chart Analysis

prompt = """Analyze this chart:
1. What type of chart is this?
2. What are the axes/labels?
3. What are the key data points?
4. What trends or patterns are visible?"""

Best Practices

DO:

Use high-quality images (≥1000px)
Resize large images to save tokens
Provide context about what to look for
Use few-shot examples for consistent output

DON'T:

Send images smaller than 200px
Expect perfect OCR for handwriting
Send very large images (>8000px)
Ignore token costs for multiple images

Limitations

Cannot identify specific individuals
May struggle with very small text
Animated GIFs: only first frame analyzed
Some specialized symbols may be misread

Related Skills

markus41/plugins/microsoft-agents-expert/skills/teams-agents

tools

VerifiedTrustedCommunity

Build Teams-native agents with the Teams SDK (formerly Teams AI Library v2) — App class, activity routing, adaptive cards, streaming, AI-generated labels, feedback, message extensions, Teams-as-MCP-server, and the bring-your-own-AI pattern with Agent Framework.

18SKILL.mdUpdated Jul 12, 2026

markus41/plugins/microsoft-agents-expert/skills/teams-agents

markus41/plugins/microsoft-agents-expert/skills/microsoft-foundry

tools

VerifiedTrustedCommunity

Run agents on Microsoft Foundry (formerly Azure AI Foundry) Agent Service — prompt agents vs hosted agents, threads/runs and the Responses API, built-in tools (Bing grounding, code interpreter, file search, MCP, OpenAPI, A2A), connected agents, Entra agent identity, SDKs, and observability/evaluations.

18SKILL.mdUpdated Jul 12, 2026

markus41/plugins/microsoft-agents-expert/skills/microsoft-foundry

markus41/plugins/microsoft-agents-expert/skills/m365-agents-sdk

tools

VerifiedTrustedCommunity

Build and host custom engine agents with the Microsoft 365 Agents SDK — AgentApplication, the Activity protocol, channel reach via Azure Bot Service, hosting Agent Framework or Semantic Kernel engines, and the Agents Toolkit/Playground workflow. Successor to the Bot Framework SDK.

18SKILL.mdUpdated Jul 12, 2026

markus41/plugins/microsoft-agents-expert/skills/m365-agents-sdk

markus41/plugins/microsoft-agents-expert/skills/copilot-studio

tools

VerifiedTrustedCommunity

Design, govern, and extend Microsoft Copilot Studio agents — topics, generative orchestration, knowledge, tools and MCP, agent flows, autonomous triggers, publishing channels, Copilot Credits pricing, and solution-based ALM on Power Platform.

18SKILL.mdUpdated Jul 12, 2026

markus41/plugins/microsoft-agents-expert/skills/copilot-studio

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/markus41/claude.git

# Copy into Claude Code skills folder (global)
cp -r claude/.claude/skills/vision-multimodal ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

markus41/claude

9 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

markus41/vision-multimodal

$ install --global

Security Scan Results

SKILL.md

Vision & Multimodal Skill

When to Use This Skill

Supported Formats

Image Size Guidelines

Core Patterns

Pattern 1: Single Image Analysis

Pattern 2: Image from URL

Pattern 3: Multiple Images

Pattern 4: Few-Shot with Images

Pattern 5: PDF Processing

Prompt Engineering for Vision

Strategy 1: Role Assignment

Strategy 2: Step-by-Step Thinking

Strategy 3: Structured Output

Image Optimization

Common Use Cases

Text Extraction (OCR-like)

Table Extraction

Chart Analysis

Best Practices

DO:

DON'T:

Limitations

See Also

Related Skills

markus41/plugins/microsoft-agents-expert/skills/teams-agents

markus41/plugins/microsoft-agents-expert/skills/microsoft-foundry

markus41/plugins/microsoft-agents-expert/skills/m365-agents-sdk

markus41/plugins/microsoft-agents-expert/skills/copilot-studio

markus41/vision-multimodal

$ install --global

Security Scan Results

SKILL.md

Vision & Multimodal Skill

When to Use This Skill

Supported Formats

Image Size Guidelines

Core Patterns

Pattern 1: Single Image Analysis

Pattern 2: Image from URL

Pattern 3: Multiple Images

Pattern 4: Few-Shot with Images

Pattern 5: PDF Processing

Prompt Engineering for Vision

Strategy 1: Role Assignment

Strategy 2: Step-by-Step Thinking

Strategy 3: Structured Output

Image Optimization

Common Use Cases

Text Extraction (OCR-like)

Table Extraction

Chart Analysis

Best Practices

DO:

DON'T:

Limitations

See Also

Related Skills

markus41/plugins/microsoft-agents-expert/skills/teams-agents

markus41/plugins/microsoft-agents-expert/skills/microsoft-foundry

markus41/plugins/microsoft-agents-expert/skills/m365-agents-sdk

markus41/plugins/microsoft-agents-expert/skills/copilot-studio