skills/glmv-pdf-to-web/SKILL.md
Convert a PDF (research paper, technical report, or project document) into a beautiful single-page academic/project website with a structured outline JSON. Trigger this skill when the user wants to make a paper page, project homepage, or academic website from a PDF — in Chinese or English.
npx skillsauth add zai-org/GLM-V glmv-pdf-to-web
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.
Scripts are in: {SKILL_DIR}/scripts/
Python packages (install once):
pip install pymupdf pillow
System tools: curl (pre-installed on macOS/Linux).
Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.
All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:
web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json          ← structured web plan (WebPlan schema)
    ├── crops/                ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html            ← the website
- <pdf_stem> = PDF filename without extension
- <timestamp> = format YYYYMMDD_HHMMSS
- Crops are named crops/<name>_crop.png
$ARGUMENTS is the path to the PDF file (local) or an HTTP/HTTPS URL.
import os, datetime
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
mkdir -p "<out_dir>/crops"
If the input is a URL, download it first:
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
Then convert (pass either the downloaded path or the original local path):
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
Outputs JSON to stdout:
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
Parse and store the full page → path map.
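As a minimal sketch of that parsing step, assuming the stdout has exactly the JSON shape shown above: the helper name `parse_page_map` and the sample paths are illustrative and not part of the skill's scripts.

```python
import json


def parse_page_map(stdout_text: str) -> dict[int, str]:
    """Turn the JSON list printed by pdf_to_images.py into {page_number: image_path}."""
    entries = json.loads(stdout_text)
    return {int(entry["page"]): entry["path"] for entry in entries}


# Illustrative stdout in the documented shape (the paths are made up).
sample_stdout = (
    '[{"page": 1, "path": "/abs/path/page_001.png"},'
    ' {"page": 2, "path": "/abs/path/page_002.png"}]'
)
page_map = parse_page_map(sample_stdout)
```

Keeping the map keyed by page number makes it easy to look up the source path later when filling the `url` field of each required image.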
View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.
While reading, note:
Do NOT plan sections yet — read everything first.
Plan the website sections. Standard structure for academic papers (adapt as needed):
| section_id | Purpose |
|---|---|
| hero | Title, authors, venue badge, link buttons |
| abstract | Full abstract text |
| contributions | 3–5 key contribution cards |
| method | Architecture figure + method explanation |
| results | Quantitative table + qualitative figures |
| conclusion | Brief conclusion |
| citation | BibTeX block |
For each section that needs an image, identify:
Save as <out_dir>/outline.json using exactly this schema:
{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}
Field notes:
- lang: "Chinese" or "English", matching the PDF language
- required_images: empty array [] if the section needs no images
- url: the local file path of the source page (from the Phase 1 path field)
Write outline.json using the Write tool to <out_dir>/outline.json.
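Before moving on, the saved outline can be sanity-checked against the fields shown in the schema. The sketch below is one possible check, not part of the skill's scripts; `validate_outline` is a hypothetical helper, and it only tests for the presence of the documented fields and the two allowed `lang` values.

```python
REQUIRED_TOP = ("project_title", "lang", "authors", "sections_plan")
REQUIRED_SECTION = ("section_index", "section_id", "title", "content", "required_images")
REQUIRED_IMAGE = ("url", "visual_description", "usage_reason")


def validate_outline(outline: dict) -> list[str]:
    """Return a list of problems; an empty list means all documented fields are present."""
    problems = []
    for key in REQUIRED_TOP:
        if key not in outline:
            problems.append(f"missing top-level field: {key}")
    if outline.get("lang") not in ("Chinese", "English"):
        problems.append('lang must be "Chinese" or "English"')
    for i, section in enumerate(outline.get("sections_plan", [])):
        for key in REQUIRED_SECTION:
            if key not in section:
                problems.append(f"section {i}: missing field {key}")
        images = section.get("required_images", [])
        if not isinstance(images, list):
            problems.append(f"section {i}: required_images must be a list")
            continue
        for j, image in enumerate(images):
            for key in REQUIRED_IMAGE:
                if key not in image:
                    problems.append(f"section {i}, image {j}: missing field {key}")
    return problems


# Illustrative outline with one image-free section.
example_outline = {
    "project_title": "Paper Title",
    "lang": "English",
    "authors": ["Author One", "Author Two"],
    "sections_plan": [
        {
            "section_index": 1,
            "section_id": "abstract",
            "title": "Abstract",
            "content": "Full abstract text",
            "required_images": [],
        }
    ],
}
outline_problems = validate_outline(example_outline)
```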
IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
description: "Grounding crop page N"
prompt: |
You are a visual grounding and cropping assistant. Your task is to precisely
locate specified visual elements in a page image and crop them out.
## Grounding method
Use visual grounding to locate each target:
1. Read the source image using the Read tool to view it
2. Identify the target element described below
3. Determine its bounding box as normalized coordinates in the 0–999 range:
- 0 = left/top edge of the image
- 999 = right/bottom edge of the image
- These are thousandths, NOT pixels, NOT percentages (0–100)
- Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
- Example: [0, 0, 500, 500] = top-left quarter of the image
4. Be precise: tightly bound the target element with a small margin (~10–20 units)
around it. Do NOT crop too wide or too narrow.
## Source image
<page_image_path>
## Crops needed
For each crop below, first do grounding (locate the element), then crop:
1. Name: "<descriptive_name>"
Target: "<visual_description from outline.json>"
Context: "<usage_reason from outline.json>"
## Crop command
After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
```bash
python <SKILL_DIR>/scripts/crop.py \
--path "<page_image_path>" \
--box X1 Y1 X2 Y2 \
--name "<crop_name>" \
--out-dir "<out_dir>/crops"
```
## Verification
After each crop, READ the output image to visually verify the correct region
was captured. If the crop missed the target or is too wide/narrow, adjust the
coordinates and re-run crop.py.
## Output
Report the final results as a list:
- crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and crop details with actual values from your context.
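To make the 0–999 convention in the prompt concrete, the sketch below maps a normalized box to pixel coordinates. It is for understanding only: all actual cropping still goes through crop.py, whose internal conversion is not shown here. The function name `box_to_pixels` and the rounding rule (each unit treated as one thousandth of the image dimension) are assumptions.

```python
def box_to_pixels(box, width, height):
    """Map a normalized [x1, y1, x2, y2] box on the 0-999 scale to pixel coordinates.

    Assumption: each unit is one thousandth of the image width or height.
    """
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999):
        raise ValueError(f"box out of range or not top-left/bottom-right ordered: {box}")
    return (
        round(x1 * width / 1000),
        round(y1 * height / 1000),
        round(x2 * width / 1000),
        round(y2 * height / 1000),
    )


# The prompt's example box, the top-left quarter, on a hypothetical 1200x1600 page image.
quarter = box_to_pixels([0, 0, 500, 500], 1200, 1600)
```

The range check also catches the two mistakes the prompt warns about: a box given in percentages tends to look implausibly small, while a box given in pixels usually exceeds 999 and is rejected.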
The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}
Collect results from all subagents and build the mapping: section_id → [crop filename, ...] to reference in HTML.
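One way to build that mapping is sketched below. It assumes each subagent result has been parsed into a dict with the `crop_name` and `file` fields from the report format, and that the main agent kept a `crop_plan` dict recording which section each crop name belongs to; `crop_plan` and `build_section_crops` are hypothetical names.

```python
from collections import defaultdict


def build_section_crops(crop_plan, results):
    """Group crop filenames by the section that requested them.

    crop_plan: {crop_name: section_id}, recorded when the crops were requested.
    results:   list of {"crop_name": ..., "file": ...} dicts reported by subagents.
    """
    section_crops = defaultdict(list)
    for result in results:
        section_id = crop_plan[result["crop_name"]]
        section_crops[section_id].append(result["file"])
    return dict(section_crops)


# Illustrative data using the crop names from the output layout.
crop_plan = {"fig_arch": "method", "table_results": "results"}
results = [
    {"crop_name": "fig_arch", "file": "fig_arch_crop.png"},
    {"crop_name": "table_results", "file": "table_results_crop.png"},
]
section_crops = build_section_crops(crop_plan, results)
```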
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | max-width: 400–500px, centered |
| 0.7 – 1.3 (square-ish) | max-width: 600–700px |
| 1.3 – 2.0 (wide) | Full-width, max-width: 100% |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
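The table can be read as a simple lookup on the `aspect` value computed above. The sketch below is one way to express it, checking the very wide case first so that each ratio gets exactly one recommendation; `layout_for_aspect` is a hypothetical helper, not part of the skill's scripts.

```python
def layout_for_aspect(aspect: float) -> str:
    """Return the layout recommendation from the table for a width/height ratio."""
    if aspect > 2.0:
        return "full-width with horizontal scroll fallback"
    if aspect > 1.3:
        return "full-width, max-width: 100%"
    if aspect >= 0.7:
        return "max-width: 600–700px"
    return "max-width: 400–500px, centered"


# Illustrative ratios: a tall figure, a roughly square figure, and a wide results table.
recommendations = {a: layout_for_aspect(a) for a in (0.5, 1.0, 2.6)}
```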
Step A: Write HTML to /tmp/website.html
<img src="..."> must use relative paths: crops/<name>_crop.png
Step B: Save with generate_web.py:
python {SKILL_DIR}/scripts/generate_web.py \
--html-file /tmp/website.html \
--title "<paper title>" \
--out-dir "<out_dir>/"
A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.
Page layout:
- 900px, centered, comfortable side padding
Typography:
Visual style:
Section guidelines:
hero:
- [📄 Paper] [💻 Code] [🗄️ Dataset] buttons, greyed out if no URL
abstract:
contributions:
method:
- <figure><img><figcaption> + prose explanation
results:
- <table>: use actual numbers from the PDF, with the best numbers bolded
conclusion:
citation:
- <pre><code> BibTeX block reconstructed from PDF metadata
- navigator.clipboard (vanilla JS)
Images:
- <img> use relative paths: crops/<name>_crop.png
- loading="lazy" and descriptive alt
- <figure> with <figcaption>
Animations (subtle only):
- IntersectionObserver + CSS transitions
Checklist:
- <pdf_stem>_<timestamp>/outline.json saved with valid WebPlan schema
- crops/ (local only)
- crops/<name>_crop.png
- generate_web.py called and confirmed success
Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.
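Some of the checklist items above can be spot-checked mechanically once the page is saved. The sketch below is a hypothetical helper (`check_output` is not part of the skill's scripts): it assumes the output layout described earlier and only verifies that outline.json parses, that index.html exists, and that every `<img>` source follows the crops/<name>_crop.png pattern and points at an existing file.

```python
import json
import re
import tempfile
from pathlib import Path


def check_output(out_dir: Path) -> list[str]:
    """Return a list of problems found in a generated output directory."""
    problems = []
    outline = out_dir / "outline.json"
    if not outline.is_file():
        problems.append("outline.json is missing")
    else:
        try:
            json.loads(outline.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("outline.json is not valid JSON")
    index = out_dir / "index.html"
    if not index.is_file():
        problems.append("index.html is missing")
        return problems
    html = index.read_text(encoding="utf-8")
    for src in re.findall(r'<img[^>]*\bsrc="([^"]*)"', html):
        if not re.fullmatch(r"crops/[^/]+_crop\.png", src):
            problems.append(f"unexpected image path: {src}")
        elif not (out_dir / src).is_file():
            problems.append(f"missing crop file: {src}")
    return problems


# Self-contained demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "crops").mkdir()
    (out / "crops" / "fig_arch_crop.png").write_bytes(b"")
    (out / "outline.json").write_text('{"project_title": "Demo"}', encoding="utf-8")
    (out / "index.html").write_text(
        '<img src="crops/fig_arch_crop.png" alt="Architecture">', encoding="utf-8"
    )
    demo_problems = check_output(out)
```

This does not replace viewing the page; it only catches broken paths and a malformed outline early.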