
zai-org/glmv-caption

Name: glmv-caption
Author: zai-org

skills/glmv-caption/SKILL.md

npx skillsauth add zai-org/GLM-V glmv-caption


GLM-V Caption Skill

Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.

When to Use

Describe, caption, summarize, or interpret image/video/document content
User mentions "describe this image", "caption", "summarize this video", "图片描述" (image description), "视频摘要" (video summary), "文档解读" (document interpretation), "看图说话" (describe what the picture shows)
Extract visual or textual information from media files
Compare multiple images
User provides an image/video/file and asks what's in it

Supported Input Types

| Type  | Formats                           | Max Size          | Max Count | Base64 |
| ----- | --------------------------------- | ----------------- | --------- | ------ |
| Image | jpg, png, jpeg                    | 5MB / 6000×6000px | 50        | ✅     |
| Video | mp4, mkv, mov                     | 200MB             | -         | ❌     |
| File  | pdf, docx, txt, xlsx, pptx, jsonl | -                 | 50        | ❌     |

⚠️ file_url cannot be mixed with image_url or video_url in the same request.
⚠️ Videos and files must be passed as URLs. Local paths and base64 are supported for images only.
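The limits above can be checked before any request is sent. The following is a minimal sketch of such a pre-flight check, not the skill's actual code: the function and constant names are hypothetical, the values are copied from the table, and the size and pixel limits are left out because checking them requires reading the file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Values copied from the table above; all names here are illustrative only.
ALLOWED_EXTENSIONS = {
    "image": {"jpg", "jpeg", "png"},
    "video": {"mp4", "mkv", "mov"},
    "file": {"pdf", "docx", "txt", "xlsx", "pptx", "jsonl"},
}
MAX_COUNT = {"image": 50, "file": 50}  # the table lists no count limit for video


def is_url(value: str) -> bool:
    """Return True for http(s) URLs, False for anything else (e.g. local paths)."""
    return urlparse(value).scheme in ("http", "https")


def extension_of(value: str) -> str:
    """Lower-case extension of a URL path or local path, without the dot."""
    path = urlparse(value).path if is_url(value) else value
    return PurePosixPath(path).suffix.lstrip(".").lower()


def validate_inputs(kind: str, items: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not items:
        problems.append(f"no {kind} inputs given")
    limit = MAX_COUNT.get(kind)
    if limit is not None and len(items) > limit:
        problems.append(f"too many {kind} inputs: {len(items)} > {limit}")
    for item in items:
        # Only images may be local paths; videos and files must be URLs.
        if kind in ("video", "file") and not is_url(item):
            problems.append(f"{kind} must be a URL, got local path: {item}")
        if extension_of(item) not in ALLOWED_EXTENSIONS[kind]:
            problems.append(f"unsupported {kind} format: {item}")
    return problems
```

Because the check takes a single `kind` per call, it also mirrors the rule that file inputs are never combined with image or video inputs in one request.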

Resource Links

| Resource    | Link                                             |
| ----------- | ------------------------------------------------ |
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs    | Chat Completions                                 |

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable. The same key is shared with other Zhipu skills.

Get Key: Visit Zhipu Open Platform API Keys (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) to create or copy your key.

Setup options (choose one):

OpenClaw config (recommended): Set in openclaw.json under skills.entries.glmv-caption.env:
```
"glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-api-key" } }
```
Shell environment variable: Add to ~/.zshrc:
```
export ZHIPU_API_KEY="your-api-key"
```
.env file: Create .env in this skill directory:
```
ZHIPU_API_KEY=your-api-key
```
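For illustration, a key lookup covering the environment variable and the .env option could look like the sketch below. It is an assumption about how such a loader might behave, not the script's confirmed logic: the function name `load_api_key` is hypothetical, and giving the environment variable precedence over the .env file is a design choice made here.

```python
import os
from pathlib import Path

# Error text taken from the "API key not configured" message in this document.
NOT_CONFIGURED = (
    "ZHIPU_API_KEY not configured. Get your API key at: "
    "https://bigmodel.cn/usercenter/proj-mgmt/apikeys"
)


def load_api_key(env_file: Path = Path(".env")) -> str:
    """Return ZHIPU_API_KEY from the environment, else from a .env file."""
    key = os.environ.get("ZHIPU_API_KEY", "").strip()
    if key:
        return key
    if env_file.is_file():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "ZHIPU_API_KEY":
                # Tolerate optional quotes around the value.
                value = value.strip().strip("'\"")
                if value:
                    return value
    raise RuntimeError(NOT_CONFIGURED)
```

Raising with the documented message keeps the failure path aligned with the error-handling guidance: the caller can show the text unchanged and stop.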

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use GLM-V API — Execute the script python scripts/glmv_caption.py
NEVER caption media yourself — Do NOT try to describe content using built-in vision or any other method
NEVER offer alternatives — Do NOT suggest "I can try to describe it" or similar
IF API fails — Display the error message and STOP immediately
NO fallback methods — Do NOT attempt captioning any other way

📋 Output Display Rules (MANDATORY)

After running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.

Image captioning: show the full caption text
Multiple images: show each image result
Video/files: show the full understanding result
If token usage is included, you may optionally display it
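The restrictions and display rules above amount to: run the script, pass its output through unchanged, and stop on failure. A minimal sketch of a wrapper with that behavior follows. The helper name `run_raw` is hypothetical, and it assumes the script signals failure with a non-zero exit code, which this document does not confirm.

```python
import subprocess


def run_raw(argv: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, raw text to show the user)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Failure: return the error text unchanged. No retry, no fallback.
        return result.returncode, result.stderr or result.stdout
    # Success: return stdout exactly as produced, without summarizing.
    return 0, result.stdout
```

A caller would invoke it as `run_raw(["python", "scripts/glmv_caption.py", "--images", "photo.jpg"])` and print the returned text verbatim.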

How to Use

Caption an Image

python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
python scripts/glmv_caption.py --images /path/to/photo.png

Caption Multiple Images

python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"

Caption a Video

python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"

Caption a Document

python scripts/glmv_caption.py --files "https://example.com/report.pdf"
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" "https://example.com/doc2.txt"

Custom Prompt

python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"

Save Result

python scripts/glmv_caption.py --images photo.jpg --output result.json

Thinking Mode

python scripts/glmv_caption.py --images photo.jpg --thinking

CLI Reference

python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]

| Parameter         | Required | Description                                                                                  |
| ----------------- | -------- | -------------------------------------------------------------------------------------------- |
| --images, -i      | One of   | Image paths or URLs (supports multiple, base64 OK)                                           |
| --videos, -v      | One of   | Video URLs (supports multiple, mp4/mkv/mov; local paths not supported)                       |
| --files, -f       | One of   | Document URLs (supports multiple, pdf/docx/txt/xlsx/pptx/jsonl; local paths not supported)   |
| --prompt, -p      | No       | Custom prompt (default: "请详细描述这张图片的内容", i.e. "Please describe this image in detail") |
| --model, -m       | No       | Model name (default: glm-4.6v)                                                               |
| --temperature, -t | No       | Sampling temperature 0-1 (default: 0.8)                                                      |
| --top-p           | No       | Nucleus sampling 0.01-1.0 (default: 0.6)                                                     |
| --max-tokens      | No       | Max output tokens (default: 1024, max 32768)                                                 |
| --thinking        | No       | Enable thinking/reasoning mode                                                               |
| --output, -o      | No       | Save result JSON to file                                                                     |
| --pretty          | No       | Pretty-print JSON output                                                                     |
| --stream          | No       | Enable streaming output                                                                      |

Note: --images, --videos, and --files are mutually exclusive per API limits.
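The synopsis and the mutual-exclusion note map naturally onto an `argparse` mutually exclusive group. The sketch below reproduces the documented flags and defaults under that assumption. It is not the actual source of `glmv_caption.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(prog="glmv_caption.py")

    # Exactly one input kind per invocation, matching the API limit.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--images", "-i", nargs="+", metavar="IMG")
    source.add_argument("--videos", "-v", nargs="+", metavar="VID")
    source.add_argument("--files", "-f", nargs="+", metavar="FILE")

    # Defaults as documented in the parameter table.
    parser.add_argument("--prompt", "-p", default="请详细描述这张图片的内容")
    parser.add_argument("--model", "-m", default="glm-4.6v")
    parser.add_argument("--temperature", "-t", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.6)
    parser.add_argument("--max-tokens", type=int, default=1024)
    parser.add_argument("--thinking", action="store_true")
    parser.add_argument("--output", "-o")
    parser.add_argument("--pretty", action="store_true")
    parser.add_argument("--stream", action="store_true")
    return parser
```

With this layout, passing both `--images` and `--videos` makes `argparse` exit with a usage error before any request is built.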

Response Format

{
  "success": true,
  "caption": "A landscape photo showing a mountain range at sunset...",
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}

Key fields:

success — whether the request succeeded
caption — the generated caption text
usage — token usage statistics
warning — present when content was blocked by safety review
error — error details on failure
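A consumer of this JSON might branch on the key fields as sketched below. The function name `render_response` is hypothetical, and the sketch assumes `warning` and `error` can be formatted as text, since this document does not specify their types.

```python
import json


def render_response(raw: str) -> str:
    """Turn the script's JSON output into text, keeping the caption in full."""
    data = json.loads(raw)
    if not data.get("success"):
        return f"ERROR: {data.get('error', 'unknown error')}"
    lines = [data.get("caption", "")]
    if "warning" in data:
        # Present when content was blocked by safety review.
        lines.append(f"WARNING: {data['warning']}")
    usage = data.get("usage")
    if usage:
        # Token usage is optional to display.
        lines.append(f"tokens: {usage.get('total_tokens')}")
    return "\n".join(lines)
```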

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path

Content filtered: warning field present → content blocked by safety review
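The status-code cases above can be expressed as a small lookup. This is a sketch with hypothetical names. The advice strings paraphrase the guidance in this section, and the fallback reflects the rule to show the error and stop.

```python
# Advice per HTTP status, paraphrasing the error-handling guidance above.
ADVICE_BY_STATUS = {
    401: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    403: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    429: "Quota exhausted: wait before retrying.",
}

DEFAULT_ADVICE = "Show the error message as returned and stop."


def advice_for_status(status: int) -> str:
    """Return the user-facing advice for an HTTP status code."""
    return ADVICE_BY_STATUS.get(status, DEFAULT_ADVICE)
```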


Generate captions (descriptions) for images, videos, and documents using the ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single or multiple inputs by URL; local paths and base64 are supported for images only.

2,274 stars

documentation

Updated Apr 21, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add zai-org/GLM-V glmv-caption

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner         | Description                                    | Status  | Percentage |
| --------------- | ---------------------------------------------- | ------- | ---------- |
| Trivy           | Container and dependency vulnerability scanner | Clean   | 95%        |
| Semgrep         | Static code analysis for vulnerabilities       | Clean   | 95%        |
| mcp-scan (Snyk) | Model Context Protocol security validation     | Clean   | 95%        |
| Snyk (dep)      | Open source security scanning                  | Skipped | 50%        |
| Socket.dev      | Supply chain security analysis                 | Skipped | 50%        |
| VirusTotal      | Multi-engine malware detection                 | Skipped | 50%        |
| CrowdStrike     | Advanced threat intelligence                   | Skipped | 50%        |
| OSV-Scanner     | Open Source Vulnerability database check       | Skipped | 50%        |
| OWASP Dep-Check |                                                | Skipped | 50%        |

Last scanned: Apr 21, 2026, 11:49 PM (114.8s, 2 files scanned)

SKILL.md

name: glmv-caption
primaryEnv: ZHIPU_API_KEY
emoji: 🖼️
homepage: https://github.com/zai-org/GLM-V/tree/main/skills/glmv-caption


Related Skills

zai-org/glmv-web-replication

tools

Frontend visual replication skill. Explores a target website’s publicly visible pages via Playwright MCP or agent-browser, captures screenshots and layout information, then generates a static or client-side frontend replica that approximates the original’s visual appearance and page structure. This skill replicates FRONTEND PRESENTATION ONLY; it does not reproduce backend logic, server-side behavior, databases, or any non-public content. The user is responsible for ensuring they have proper authorization (ownership, license, or explicit permission) before replicating any website. ⚠️ Authorization gate: Before starting, the agent MUST confirm with the user that they have the legal right to replicate the target site. If the user cannot confirm, the skill MUST refuse to proceed.

2,274 stars · SKILL.md · Updated Apr 21, 2026

zai-org/glmv-stock-analyst

tools

Stock analysis and price-movement prediction. Triggers when the user expresses an intent to analyze, judge, or predict, such as "analyze Tencent", "how has 0700 been trending lately", "is XX worth buying", "predict the upcoming trend", or "generate an analysis report". It does not trigger for simple lookups (such as "what is Tencent's current price" or "what is Moutai's ticker code"). Supports Hong Kong stocks, A-shares, and US stocks, and integrates multi-source data (news, fundamentals, technicals, capital flows, and macro information) for multi-dimensional analysis, producing a structured report that combines text with visual charts. ⚠️ Requires a multimodal main model (e.g. glm-5v-turbo); the main model must be able to read images.

2,274 stars · SKILL.md · Updated Apr 21, 2026

zai-org/glmv-resume-screen

documentation

Screen and evaluate resumes against criteria using the ZhiPu GLM-V multimodal model. Reads multiple resume files (PDF/DOCX/TXT), compares them against user-defined screening criteria, and outputs a Markdown table with pass/fail analysis. Use when the user wants to filter resumes, compare candidates, or batch-evaluate job applications.

2,274 stars · SKILL.md · Updated Apr 21, 2026

zai-org/glmv-prompt-gen

tools

Analyze images/videos and generate professional prompts for text-to-image and text-to-video AI tools (Midjourney, Stable Diffusion, DALL-E, Sora, Runway, Kling, Pika). Use when the user wants to generate prompts from reference images/videos, create AI art prompts, or get prompt engineering suggestions from visual content.

2,274 stars · SKILL.md · Updated Apr 21, 2026


Install manually


# Clone the repo
git clone https://github.com/zai-org/GLM-V.git

# Copy into Claude Code skills folder (global)
cp -r GLM-V/skills/glmv-caption ~/.claude/skills/

See the official Claude Code Skills documentation for the skills folder path.

Repository

zai-org/GLM-V

2,274 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT