plugins/give-claude-eyes/skills/qwen-vision/SKILL.md
Use when the user asks to "analyze video", "watch this video", "what happens in this video", "describe this clip", "review this footage", "classify these videos", "compare videos", "analyze this image", "what's in this screenshot", or when the user provides a video/image file path and expects visual understanding. Also trigger on: "qwen", "video bridge", "multimodal analysis", "motion analysis", "video reference", "video breakdown", "batch classify", or any task requiring understanding of video content that Claude cannot do natively.
Claude cannot natively understand video. This skill bridges that gap by calling Qwen Omni — a natively multimodal model that processes video with temporal attention (it sees motion, not just individual frames).
The bridge also handles images, which is useful when you want Qwen's analysis of screenshots, diagrams, or photos.
A Python script at ${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py sends media files to the Qwen API and returns the analysis as text. Call it via Bash.
The user must have:
- DASHSCOPE_API_KEY environment variable set (get one at https://dashscope.console.aliyun.com/ or https://modelstudio.console.alibabacloud.com/)
- dashscope package installed

If the user hasn't set up yet, suggest running /qwen-setup first.
python3 "${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py" "/path/to/video.mp4" "Describe what happens in this video"
| Flag | Default | Description |
|------|---------|-------------|
| (positional 1) | required | Path to video or image file |
| (positional 2) | generic prompt | Analysis prompt |
| --fps | 2.0 | Frames per second to sample from video. Lower = cheaper, higher = more detail |
| --model | qwen-omni-plus-latest | Qwen model to use |
| --json | off | Output as JSON (for parsing) |
| --context | none | Path to JSON file with previous conversation (multi-turn) |
| --save-context | none | Save conversation context for follow-up questions |
| --system-prompt | none | Custom system prompt for Qwen |
| --prompt-file | none | Read prompt from a file instead of argument |
- Video: .mp4, .mov, .avi, .mkv, .webm, .flv, .wmv
- Image: .png, .jpg, .jpeg, .gif, .webp, .bmp, .tiff
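Before calling the bridge, it can help to check whether a file has one of the supported extensions. The following is a minimal sketch, not part of the bridge itself: `media_kind` is a hypothetical helper name, and matching extensions case-insensitively is an assumption about how the bridge behaves.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the supported formats listed above.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".flv", ".wmv"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff"}


def media_kind(path: str) -> Optional[str]:
    """Return "video", "image", or None for an unsupported extension."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None
```

A caller could skip files for which `media_kind` returns None instead of sending them to the API.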
python3 "${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py" "/path/to/video.mp4" "Describe the character's body movement, poses, and transitions" --fps 2
Parse the text response and use it in your answer to the user.
When the user has multiple videos to analyze, write a Python script that loops through files and calls the bridge for each one. Use --json flag for machine-readable output. See references/batch-pattern.md for a template.
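As an illustration of the loop described above, here is a hedged sketch; it is not the template from references/batch-pattern.md. The helper names `build_command` and `classify_batch` are hypothetical, and the code assumes that CLAUDE_PLUGIN_ROOT is available in the environment and that stdout contains a single JSON document when --json is passed. The JSON schema is not documented here, so the sketch stores whatever the bridge returns without inspecting fields.

```python
import json
import os
import subprocess
import sys
from pathlib import Path

# Assumption: CLAUDE_PLUGIN_ROOT is exported, as in the commands above.
BRIDGE = (
    Path(os.environ.get("CLAUDE_PLUGIN_ROOT", "."))
    / "skills/qwen-vision/scripts/qwen_bridge.py"
)


def build_command(media_path, prompt, fps=1.0):
    """Assemble a bridge invocation using only flags from the table above."""
    return [
        sys.executable, str(BRIDGE), str(media_path), prompt,
        "--fps", str(fps), "--json",
    ]


def classify_batch(paths, prompt, fps=1.0, run=subprocess.run):
    """Call the bridge once per file and collect results keyed by path."""
    results = {}
    for path in paths:
        proc = run(build_command(path, prompt, fps), capture_output=True, text=True)
        if proc.returncode != 0:
            # Keep going so one failing file does not abort the batch.
            results[str(path)] = {"error": proc.stderr.strip()}
            continue
        try:
            # Assumption: with --json, stdout holds one JSON document.
            results[str(path)] = json.loads(proc.stdout)
        except json.JSONDecodeError:
            results[str(path)] = {"error": "unparseable output", "raw": proc.stdout}
    return results
```

The `run` parameter exists only so the loop can be exercised without calling the API; in normal use it stays at its default of `subprocess.run`.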
# First question
python3 "${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py" video.mp4 "General analysis" --save-context /tmp/ctx.json
# Follow-up
python3 "${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py" video.mp4 "Tell me more about the lighting" --context /tmp/ctx.json
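The two-step pattern above can be wrapped in a small helper that picks the right flag depending on whether a context file already exists. This is a sketch under assumptions: `conversation_command` is a hypothetical name, and the code mirrors the two commands shown rather than combining --context and --save-context in one call, since nothing here states whether the bridge supports that.

```python
import os
import sys
from pathlib import Path

# Assumption: CLAUDE_PLUGIN_ROOT is exported, as in the commands above.
BRIDGE = (
    Path(os.environ.get("CLAUDE_PLUGIN_ROOT", "."))
    / "skills/qwen-vision/scripts/qwen_bridge.py"
)


def conversation_command(media_path, prompt, context_path):
    """Build a bridge command for the first turn or a follow-up turn."""
    cmd = [sys.executable, str(BRIDGE), str(media_path), prompt]
    if Path(context_path).exists():
        # Follow-up: load the previously saved conversation.
        cmd += ["--context", str(context_path)]
    else:
        # First question: ask the bridge to write the context file.
        cmd += ["--save-context", str(context_path)]
    return cmd
```

The returned list can be passed to `subprocess.run`; the first call creates the context file and later calls reuse it.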
Use the same script and pass an image path instead of a video path:
python3 "${CLAUDE_PLUGIN_ROOT}/skills/qwen-vision/scripts/qwen_bridge.py" "/path/to/screenshot.png" "What UI elements are visible in this screenshot?"
- Use --fps 1 for long videos or when fine detail isn't needed.
- Use --fps 0.5 for very long videos (minutes+).
- Start with --fps 1 and increase only if results are too vague.

- If DASHSCOPE_API_KEY is not set, the script exits with a clear error message. Guide the user to set it up.
- If dashscope is not installed, suggest pip install dashscope.
- If a video is too large to process, try lowering --fps or trimming the video first.

This is important context for the user: Qwen processes video frames with temporal attention, so it understands motion, direction, rhythm, and transitions between frames. Claude analyzing individual screenshots cannot do this. When the user needs to understand what happens in a video (not just what a single frame looks like), this bridge is the right tool.
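The two setup failures described above (a missing DASHSCOPE_API_KEY or a missing dashscope package) can also be detected before the bridge runs. The following is a minimal sketch with a hypothetical `preflight` helper; it uses only the standard library and does not reproduce the bridge's own checks.

```python
import importlib.util
import os


def preflight(env=None):
    """Return a list of setup problems; an empty list means ready to call the bridge."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set; run /qwen-setup or export the key.")
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec("dashscope") is None:
        problems.append("dashscope is not installed; try: pip install dashscope")
    return problems
```

If the list is non-empty, the messages can be relayed to the user before any API call is attempted.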
- references/batch-pattern.md: template for batch video classification
- references/prompt-tips.md: effective prompts for different analysis types