skills/xiaohongshu-scraper/SKILL.md
Use when a user provides Xiaohongshu/XHS/xhslink URLs, asks to fetch 小红书 note or video content, likes, saves/collections, comments, publish metadata, or wants to fill a spreadsheet/Base from 小红书 links.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add csfuwwc/md-skills xiaohongshu-scraper
Use this skill to scrape Xiaohongshu (小红书) note links and backfill the results into a table. Prefer reliable, low-frequency access patterns and preserve user intent: do not invent engagement numbers when the public page hides them.
Use the dedicated Xiaohongshu browser profile and CDP endpoint when a logged-in visible browser is explicitly needed:
- CDP endpoint: http://127.0.0.1:9223
- Profile directory: $HOME/Library/Application Support/Google/SocialScraperProfiles/Xiaohongshu
- Launcher: ~/.agents/social-browser-profiles/launch-social-chrome.sh xhs

Do not use the default Chrome profile for Xiaohongshu scraping. Do not share this port/profile with Douyin or Weibo jobs.
The Base entrypoint uses a local platform lock at /tmp/social-scraper-locks/xiaohongshu.lock; do not run two Xiaohongshu jobs at the same time from different sessions.
node ~/.agents/skills/xiaohongshu-scraper/scripts/scrape-xhs.mjs --json --delay-ms 6000 < urls.txt
node ~/.agents/skills/xiaohongshu-scraper/scripts/process-lark-xhs.mjs \
--base-token <base_token> \
--table-id <table_id> \
--view-id <view_id> \
--batch-size 10
If the result shows only 小红书, 404, or 当前内容无法展示 ("this content cannot currently be displayed"), retry with a real Chrome profile only when the user agrees:
node ~/.agents/skills/xiaohongshu-scraper/scripts/process-lark-xhs.mjs \
--base-token <base_token> \
--table-id <table_id> \
--view-id <view_id> \
--browser-confirm \
--chrome-user-data-dir "$HOME/Library/Application Support/Google/SocialScraperProfiles/Xiaohongshu" \
--batch-size 3
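The fallback decision above can be sketched as a small check. This is an illustrative sketch, not the logic inside process-lark-xhs.mjs: the `page` shape (`title`, `text`) and the function names are assumptions, and treating a bare 小红书 title as "no note content" is a guess about what that marker means.

```javascript
// Markers taken from the text above. How the real scraper exposes them is an assumption.
const BLOCKED_MARKERS = ["404", "当前内容无法展示"];

// `page` is a hypothetical shape: { title, text }.
function needsBrowserFallback(page) {
  const title = (page.title ?? "").trim();
  const text = page.text ?? "";
  // Assumption: a title that is only the site name means no note content was served.
  if (title === "小红书") return true;
  return BLOCKED_MARKERS.some((m) => title.includes(m) || text.includes(m));
}

// The browser retry runs only with explicit user agreement.
function chooseNextStep(page, userAgreed) {
  if (!needsBrowserFallback(page)) return "write-result";
  return userAgreed ? "retry-with-browser" : "ask-user";
}
```

Keeping the consent check separate from the detection makes it harder to trigger a logged-in browser session by accident.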
- Process only rows whose 抓取状态 is blank, 准备抓取, or 保持抓取.
- Write back 正文, 点赞数, 收藏数, 抓取时间, and a confirmed terminal 抓取状态. Use lark-base for Base commands.

Do not wrap scrape-xhs.mjs --browser in a shell loop for Base work. It is a lower-level scraper; use process-lark-xhs.mjs so status filtering, failure fuses, and Base writes stay consistent.
Use 抓取状态 as the control plane:
- blank: treated as 准备抓取.
- 准备抓取 (ready to scrape): new or newly appended row, eligible for first scraping. Fetch 正文 (body), 点赞数 (likes), 收藏数 (saves), and 抓取时间 (scrape time).
- 保持抓取 (keep scraping): first scrape succeeded and data has been written; refresh only 点赞数, 收藏数, and 抓取时间. Do not fetch or overwrite 正文 by default.
- 抓取异常 (scrape error): confirmed inaccessible, app-only, login-wall, 404, repeated no-note-state, or platform warning risk. Do not retry automatically.
- 停止抓取 (stop scraping): manual or rule-based stop state for rows that no longer need engagement tracking.

Default runs scrape rows whose 抓取状态 is blank, 准备抓取, or 保持抓取. When a 准备抓取 row succeeds, write the scraped body and engagement fields, then set 抓取状态 to 保持抓取. When a 保持抓取 row succeeds, update only the engagement fields and 抓取时间; keep 正文 and 抓取状态 unchanged. When a row is confirmed failed after the allowed fallback path, set 抓取状态 to 抓取异常, write 抓取时间, and preserve existing useful values.
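The status rules above amount to a small state machine. The sketch below is one way to express it; the function names and the `outcome` values ("success", "confirmed-failure") are hypothetical and are not taken from the skill's scripts.

```javascript
// Fields to fetch per eligible status, as described above.
const FIELDS_BY_STATUS = new Map([
  ["准备抓取", ["正文", "点赞数", "收藏数", "抓取时间"]],
  ["保持抓取", ["点赞数", "收藏数", "抓取时间"]],
]);

// Blank (or missing) status is treated as 准备抓取.
function normalizeStatus(status) {
  const s = (status ?? "").trim();
  return s === "" ? "准备抓取" : s;
}

// Decide whether a default run touches the row, and which fields it fetches.
function planRow(status) {
  const current = normalizeStatus(status);
  const fields = FIELDS_BY_STATUS.get(current);
  return fields
    ? { scrape: true, current, fields }
    : { scrape: false, current, fields: [] }; // 抓取异常 and 停止抓取 are skipped
}

// Status to write after an attempt. `outcome` values are hypothetical labels.
function nextStatus(status, outcome) {
  const current = normalizeStatus(status);
  if (outcome === "confirmed-failure") return "抓取异常";
  if (outcome === "success" && current === "准备抓取") return "保持抓取";
  return current; // 保持抓取 stays unchanged on a successful refresh
}
```

For example, `planRow("保持抓取").fields` omits 正文, which matches the rule that an engagement refresh never overwrites the body.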
Use 评论状态 as a separate operations queue. Keep it blank by default. The scraper may set 准备评论 only when a configured engagement threshold is reached or the user explicitly asks to queue rows for manual commenting; it must not publish Xiaohongshu comments automatically.
- 准备评论: selected by threshold or manual review and waiting for human commenting on Xiaohongshu.
- 取消评论: manually decided not to comment.
- 评论成功: human comment has been posted.
- 评论失败: a comment attempt failed and is not currently being retried.

For now, Xiaohongshu commenting is all manual. The skill can help surface candidates by writing 评论状态=准备评论, but comment publishing and final status changes are handled by the user. 准备评论 on Xiaohongshu means "ready for a human to comment", not "auto-comment now".
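A minimal sketch of the queueing rule follows. The source only says "a configured engagement threshold", so the `likeThreshold` option, the use of 点赞数 as the metric, and the function name are assumptions. Leaving any non-blank 评论状态 untouched is also an assumption, chosen because final status changes belong to the user.

```javascript
// Returns the 评论状态 value to keep or write for a row. It never publishes anything.
function nextCommentStatus(row, { likeThreshold = null, userRequested = false } = {}) {
  const current = (row["评论状态"] ?? "").trim();
  // Assumption: a non-blank value reflects a human decision and is left as-is.
  if (current !== "") return current;

  const likes = row["点赞数"];
  const overThreshold =
    likeThreshold !== null && typeof likes === "number" && likes >= likeThreshold;

  // Blank stays blank unless a threshold or an explicit request selects the row.
  return userRequested || overThreshold ? "准备评论" : "";
}
```

With no threshold configured and no explicit request, every blank row stays blank, which matches the "keep it blank by default" rule.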
When the row is marked 视频 or the note state contains video streams:
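1. Download the video, using the video-download skill if the stream URL is not directly exposed.
2. Extract one frame every two seconds with ffmpeg:
   ffmpeg -hide_banner -loglevel error -i video.mp4 -vf "fps=1/2,scale=720:-1" frames/frame_%03d.jpg
3. Describe representative frames with mmx vision describe (the prompt asks for the frame's subtitles, main subject, and anything useful for understanding the video note's body):
   mmx vision describe --region global --image frames/frame_001.jpg --prompt "请提取这帧里的字幕、画面主体和对理解小红书视频正文有用的信息。"
4. Append the result to 正文. Clearly mark inferred video content as 【视频内容解析】.

MiniMax requires local authentication (MINIMAX_API_KEY or mmx auth login). If it is unavailable, fall back to manual frame review and say so. Video rows are higher risk and slower; process fewer per batch.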
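Marking inferred video content can be sketched as a pure string helper. The function name and the idea of passing one description per frame are assumptions; only the 【视频内容解析】 marker comes from the text above.

```javascript
// Append frame descriptions to the body under a clearly marked section.
// `frameNotes` is a hypothetical array with one description string per frame.
function appendVideoAnalysis(body, frameNotes) {
  const notes = frameNotes.map((n) => n.trim()).filter((n) => n !== "");
  if (notes.length === 0) return body; // no evidence, so no section is added

  const section = ["【视频内容解析】", ...notes].join("\n");
  return body ? `${body}\n\n${section}` : section;
}
```

Returning the body unchanged when there are no usable notes keeps the marker from appearing without any frame evidence behind it.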
- 正文: title + newline + desc when both exist. Write this only for 准备抓取 rows unless the user explicitly requests body reprocessing.
- Append a 【视频内容解析】 section when frame/video evidence is available.
- 点赞数 / 收藏数: refresh for both 准备抓取 and 保持抓取 rows. Leave unavailable values unchanged/null unless the user explicitly asks to treat hidden values as 0.
- 抓取时间: use the user's current local datetime rounded to minutes, e.g. YYYY-MM-DD HH:mm:00, when the Base field supports time.
- 抓取状态: write 保持抓取 only after a successful 准备抓取 first scrape. Keep 保持抓取 unchanged during engagement refresh. Write 抓取异常 only after an access failure is confirmed and no automatic retry should happen. Never write 停止抓取 automatically unless a separate stop-tracking rule is explicitly enabled.
- 评论状态: blank by default; write 准备评论 only when an engagement threshold or explicit user instruction selects the row for commenting. Xiaohongshu 准备评论 means a human should comment.

- Mark confirmed access failures as 抓取异常.
- Use port 9223 and the Xiaohongshu profile only.
- Read references/strategy.md before changing crawl behavior, retry policy, browser profile usage, or Lark Base write semantics.
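The field rules above can be illustrated with three small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the input shapes are guesses about what the scraper returns, and "rounded to minutes" is read here as rounding to the nearest minute rather than truncating.

```javascript
// 正文: title + newline + desc when both exist; otherwise whichever one is present.
function buildBody(title, desc) {
  return [title, desc]
    .map((s) => (s ?? "").trim())
    .filter((s) => s !== "")
    .join("\n");
}

// 抓取时间: local datetime rounded to the nearest minute, as YYYY-MM-DD HH:mm:00.
function formatScrapeTime(date) {
  const d = new Date(date.getTime());
  // setSeconds(60, 0) rolls over to the next minute; setSeconds(0, 0) drops the seconds.
  d.setSeconds(d.getSeconds() >= 30 ? 60 : 0, 0);
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
  return `${day} ${pad(d.getHours())}:${pad(d.getMinutes())}:00`;
}

// 点赞数 / 收藏数: only include values that were actually scraped, so hidden
// counts leave the existing cell untouched instead of being written as 0.
function engagementUpdate(scraped) {
  const update = {};
  for (const field of ["点赞数", "收藏数"]) {
    if (typeof scraped[field] === "number") update[field] = scraped[field];
  }
  return update;
}
```

Building the update object from only the scraped numbers keeps the "do not invent engagement numbers" rule in one place, whatever the Base write call looks like.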