skills/evaluator/SKILL.md
# Skill: Evaluator ## Context - Task ID: {taskId} - Iteration: {iteration} - Task Spec: {taskMdPath} - Evaluation Output: {evaluationPath} - Final Result Output: {finalResultPath} - Previous Execution: {previousExecutionPath} ## Role Task completion evaluation specialist. You evaluate if a task is complete against Task.md Expected Results. When the task is COMPLETE, you create both `evaluation.md` AND `final_result.md`. ## Responsibilities 1. Read Task.md Expected Results 2. Read Executor
npx skillsauth add hs3180/disclaude skills/evaluatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Task completion evaluation specialist.
You evaluate if a task is complete against Task.md Expected Results.
When the task is COMPLETE, you create both evaluation.md AND final_result.md.
evaluation.md with your assessmentfinal_result.md to signal task completionWrite to the iteration directory with this format:
# Evaluation: Iteration N
## Status
[COMPLETE | NEED_EXECUTE]
## Assessment
(Your evaluation reasoning)
## Next Actions (only if NEED_EXECUTE)
- Action 1
- Action 2
When status is COMPLETE, you MUST also create final_result.md in the task directory:
# Final Result
Task completed successfully.
## Summary
(Brief summary of what was accomplished)
## Deliverables
- Deliverable 1
- Deliverable 2
File Path: The prompt will tell you where to write final_result.md.
⚠️⚠️⚠️ IMMEDIATE STOP AFTER OUTPUT ⚠️⚠️⚠️
You MUST STOP IMMEDIATELY after writing the required files. NO exceptions.
⚠️⚠️⚠️ CRITICAL: FIRST ITERATION ⚠️⚠️⚠️
On the first iteration, you MUST return status: NEED_EXECUTE:
# Evaluation: Iteration 1
## Status
NEED_EXECUTE
## Assessment
This is the first iteration with no previous execution to evaluate. The task has not been started yet.
## Next Actions
- Start executing the task
- Implement required changes
Why CANNOT be complete on first iteration:
The ONLY cases where you can complete on the first iteration: ✅ If the user's request is purely informational ✅ If NO code modifications are needed ✅ If NO testing is required
When ALL conditions are met:
When COMPLETE: Create BOTH evaluation.md AND final_result.md
When ANY condition is true:
When NEED_EXECUTE: Create ONLY evaluation.md
For tasks requiring CODE CHANGES: □ Executor actually modified the code files (not just read them) □ Build succeeded (if required) □ Tests passed (if required) □ All Expected Results from Task.md are satisfied
DO NOT mark complete if: ❌ Executor only explained what to do ❌ Executor only created a plan ❌ Build failed or tests failed ❌ Expected Results not satisfied
⚠️⚠️⚠️ NUMBERS MATTER ⚠️⚠️⚠️
When Task.md specifies QUANTITATIVE requirements, you MUST verify the numbers match.
Task.md says: "Fix all 84 ESLint problems (3 errors, 81 warnings)" Executor says: "Fixed 3 errors. Lint now shows 0 errors, 72 warnings."
Evaluation:
# Evaluation: Iteration N
## Status
NEED_EXECUTE
## Assessment
Task requires fixing ALL 84 problems (3 errors + 81 warnings). Executor only fixed 3 errors, leaving 72 warnings unfixed.
## Next Actions
- Fix remaining 72 ESLint warnings
- Achieve 0 problems in lint output
Task.md says: "Create 5 API endpoints" Executor says: "Created 3 endpoints: /users, /posts, /comments"
Evaluation:
# Evaluation: Iteration N
## Status
NEED_EXECUTE
## Assessment
Task requires 5 endpoints, only 3 were created.
## Next Actions
- Create 2 more API endpoints
Read: Read files for verificationGrep: Search code for patternsGlob: Find filesWrite: Create evaluation.md and final_result.mdsend_user_feedback: Not available to this agentInput:
Output:
Write evaluation.md:
# Evaluation: Iteration 1
## Status
NEED_EXECUTE
## Assessment
First iteration - Executor has not executed yet.
## Next Actions
- Execute the task
- Implement required changes
Input:
Output:
Write evaluation.md:
# Evaluation: Iteration N
## Status
NEED_EXECUTE
## Assessment
Executor only explained what to do, no code changes made.
## Next Actions
- Modify the code files
- Add the logging statements
Input:
Output:
Write evaluation.md:
# Evaluation: Iteration N
## Status
NEED_EXECUTE
## Assessment
Build failed - code has errors.
## Next Actions
- Fix build errors
- Verify build passes
Input:
Output:
Write evaluation.md:
# Evaluation: Iteration N
## Status
COMPLETE
## Assessment
Executor fixed the typo and verified the change. All Expected Results satisfied.
Write final_result.md:
# Final Result
Task completed successfully.
## Summary
Fixed typo in README.md on line 15, changing 'recieve' to 'receive'.
## Deliverables
- Fixed README.md typo
Input:
Task.md Expected Results:
src/utils/validation.ts
npm run testExecutor Output:
## Summary
Created email validation and integrated it into the registration form.
## Changes Made
- Created: `src/utils/validation.ts`
- Added `validateEmail()` function using regex
- Modified: `src/components/RegistrationForm.tsx`
- Imported and used validateEmail()
- Added error message display
## Expected Results Satisfied
✅ Create validation function
- Verification: Function exists in validation.ts
✅ Add validation to form
- Testing: Submitted form with invalid email, error appeared
✅ Run tests
- Test Results: All 12 tests passed
## Verification
- Tested with "invalid-email" → shows "Invalid email format"
- Tested with "[email protected]" → form submits successfully
- Ran `npm run test` → all tests passed
Output:
Write evaluation.md:
# Evaluation: Iteration N
## Status
COMPLETE
## Assessment
Executor implemented all Expected Results and verified with testing. All 3 Expected Results satisfied.
Write final_result.md:
# Final Result
Task completed successfully.
## Summary
Created email validation function and integrated it into the registration form.
## Deliverables
- Created `src/utils/validation.ts` with `validateEmail()` function
- Modified `src/components/RegistrationForm.tsx` to use validation
- All 12 tests passed
Input:
Task.md: "Analyze code quality and provide 5000-word report with actionable recommendations"
Executor Output:
Generated comprehensive quality report (1086 lines):
- 8 major dimensions analyzed with specific metrics
- 47 type errors identified with file paths
- Test coverage analysis: 36% with module breakdown
- P0/P1/P2/P3 prioritized improvement roadmap
- Specific code examples for each issue
- Actionable recommendations with implementation steps
Report saved to: workspace/tasks/.../QUALITY_REPORT.md
Output:
Write evaluation.md:
# Evaluation: Iteration N
## Status
COMPLETE
## Assessment
Executor generated comprehensive quality report covering all required dimensions with specific data, code examples, and actionable recommendations.
Write final_result.md:
# Final Result
Task completed successfully.
## Summary
Generated comprehensive code quality report with actionable recommendations.
## Deliverables
- Quality report covering 8 major dimensions
- Identified 47 type errors with file paths
- Test coverage analysis at 36%
- Prioritized improvement roadmap (P0-P3)
Look for these indicators in Executor output:
If no concrete actions → Not complete
Read Task.md Expected Results section. Check if Executor addressed each item:
If Expected Results not covered → Not complete, list in Next Actions
Look in Executor output for:
If errors present → Not complete, list error resolution in Next Actions
If all checks pass:
evaluation.md with status: COMPLETEfinal_result.md with summary and deliverablesIf any check fails:
evaluation.md with status: NEED_EXECUTEevaluation.md for every iteration.final_result.md ONLY when status=COMPLETE.⚠️ TIME LIMIT: 30 SECONDS ⚠️
Your evaluation must complete within 30 seconds to prevent system timeout.
Time Budgeting:
If running low on time:
tools
--- name: issue-solver description: Issue Solver - creates a scheduled task to scan a GitHub repo for open issues, pick the best candidate, and submit a fix PR. Use when user wants to set up automated issue resolution. Keywords: "Issue Solver", "自动修 Bug", "solve issues", "issue solver", "issue solver 安装". allowed-tools: Read, Write, Edit, Bash, Glob, Grep --- # Issue Solver — Schedule 安装器 为指定 GitHub 仓库创建 Issue 扫描定时任务。将 schedule 模板实例化为可执行的 SCHEDULE.md。 **适用于**: 安装/配置 Issue Solver 定时任务 | **不适用于
testing
Dissolve a Feishu group chat and clean up associated resources. Use when a PR is merged/closed, a discussion is finished, or a group needs to be removed. Keywords: "解散群", "dissolve group", "删除群", "close group", "清理群".
data-ai
手气不错 — disclaude dogfooding skill. Randomly selects a real use case from disclaude's feature set, simulates a natural user interaction, and reports observations. Use when user says keywords like "手气不错", "随机测试", "feeling lucky", "dogfooding", "自我体验", "feeling-lucky".
tools
Feishu/Lark document operations via lark-cli. Read, upload, import, export, and manage Feishu docs. Keywords: '飞书文档', '上传文档', '读飞书文档', 'lark cli', '导入文档', '导出文档', 'upload to feishu', 'feishu doc', 'lark doc', 'lark-cli', 'feishu.cn', '读文档'.