
Navigate github.com in mobile Chrome to retrieve repository information (files, README, contributors, languages, releases, packages) and switch to desktop site mode for richer metadata (forks, stars). Use for in-browser research, not git CLI or GitHub API operations.
Extension implementation phase: generate runtime extension code in a worktree.
Complete calendar-and-clock workflows on mobile (find events, read start times, create alarms or reminders relative to meetings). Use when the user asks to schedule, set an alarm before/after a meeting, find appointment times, or coordinate Calendar with Clock.
Comprehensive step-by-step workflow for navigating the Booking.com mobile site to search for destinations, select dates/occupancy, browse results, and extract detailed hotel information including pricing and facilities.
Participate in an OpenJiuWen agent team as an external member. Use when this agent has been spawned into a team (the OPENJIUWEN_TEAM_JOIN environment variable is set) and needs to read its inbox, send messages to teammates, and claim / work / complete tasks via the `team-member` CLI.
Runtime extension verification spec: verify that the tools, rails, and skills in a harness package can actually be hot-loaded and run.
Generates a multimodal Skill markdown file from a source URL. Use this Skill when a user wants to turn a web tutorial, product support article, or software guide into a reusable agent Skill with concise steps and embedded screenshots.
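Extension design: turn capability gaps into an ExtensionDesign structure.
Planning spec: converge assessment results into a structured task plan.
Verification spec: define the verification levels and pass criteria that the implementation phase must meet.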
Full skill evaluation pipeline — the single entry point for comprehensive skill assessment. Orchestrates multiple evaluators in sequence, aggregates their scores, and produces a final weighted verdict. Use when you want to fully evaluate a skill end-to-end: design quality, functional correctness, and (when available) safety. Triggers on: "evaluate this skill", "run full eval on skill", "assess skill quality", "is this skill ready to ship". Do NOT use when you only want design review (use skill_llm_judge directly) or only want to run functional tests (use skill_tester directly).
Audit Agent Skill design quality through static analysis of SKILL.md — no prompt execution, no code running. Scores 7 design dimensions (100pts): knowledge ratio, expert knowledge craft, specification compliance, progressive disclosure, pattern + freedom fit, predicted usability, and output specification. Use when reviewing a skill's DESIGN before or after functional testing. Outputs structured design_score.json for eval pipeline consumption. Do NOT use when you want to run the skill on real prompts or measure actual output quality — use skill_tester for that.
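Communication spec: govern how commit messages, PRs, journal entries, and help requests are worded.
Main operating manual for the implementation phase: guide the agent through code changes and local verification, leaving commits to a separate commit phase.
Pipeline selection spec: choose the most suitable pipeline based on the task and the facts.
Autonomous commit workflow based on the commit skill. Use in the implement phase to plan the commit scope before committing and to perform the git commit via bash.
Assessment methodology: run a codebase health assessment or a runtime extension capability-gap assessment, depending on the assessment mode given in the query.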
Evaluate Agent Skill safety through static analysis of SKILL.md instructions, with no prompt execution and no code running. Gates the eval pipeline at score_pct ≥ 0.80: skills that fail do not ship regardless of design or functional scores. Scores 5 safety dimensions (100pts): harmful instruction potential, scope containment, data handling safety, injection resilience, and guardrail presence. Use when auditing whether a skill's instructions could cause an agent to produce harmful outputs, exceed appropriate scope, mishandle sensitive data, or be hijacked via injection. Do NOT use for design quality review (use skill-llm-judge) or functional testing (use skill-bench).
Guide to skill creation. Use this skill when the user requests to make, create, or write a new skill.
Generate and run synthetic test cases against any skill by executing real prompts through the skill and grading actual outputs — functional and behavioral testing only. Use when verifying a skill WORKS: correct outputs, edge case handling, error recovery, end-to-end workflow. Triggers on: "test this skill", "generate test cases", "run evals", "validate the skill", "run tests on skill", or when given a skill path to verify. Outputs pass_rate and structured assertions.json for eval pipeline consumption. Do NOT use for reviewing skill design quality, knowledge delta, anti-patterns, or SKILL.md structure — use skill_llm_judge for that.