plugins/handsonai/skills/test/SKILL.md
Guide structured testing of AI workflow artifacts, evaluate output quality, identify which building blocks need adjustment, and determine readiness for deployment. Use when the user has built workflow artifacts and needs to test them. This is Step 5 (Test) of the AI Workflow Framework.
`npx skillsauth add jamesgray-ai/handsonai-plugins test`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Structured testing and evaluation of AI workflow artifacts. Walk the user through running their workflow against real scenarios, scoring output quality, diagnosing issues back to specific building blocks, and deciding whether the workflow is ready for deployment.
Read the Design Spec (including Evaluation Criteria) to understand what was built, expected behavior, and how to evaluate. Identify the test scenarios and scoring dimensions defined during Design.
One representative input, manual check: does the workflow run end-to-end and produce something reasonable? This is a sanity check before systematic evaluation — catch showstoppers early.
Execute each test scenario from the Evaluation Criteria (defined in Design). For each scenario:
Guide the user through scoring with plain-language prompts:
Adapt the dimension names and prompts to match whatever eval dimensions were defined in the spec.
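As a minimal sketch of how per-scenario scores could be recorded, the following Python keeps one record per test scenario and averages each dimension across scenarios. The dimension names (`accuracy`, `completeness`, `tone`) and the 1-5 scale are assumptions for illustration, not values taken from the spec; substitute whatever the Design Spec defines.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical dimension names; replace with the dimensions from the Design Spec.
DIMENSIONS = ("accuracy", "completeness", "tone")


@dataclass
class ScenarioScore:
    """Scores for one test scenario, keyed by eval dimension."""
    scenario: str
    scores: dict = field(default_factory=dict)  # dimension name -> score

    def add(self, dimension: str, value: int) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        # Assumption: scores use a 1-5 scale.
        if not 1 <= value <= 5:
            raise ValueError("scores are assumed to be on a 1-5 scale")
        self.scores[dimension] = value

    def average(self) -> float:
        """Mean of the dimensions scored so far for this scenario."""
        return mean(self.scores.values())


def dimension_averages(results: list) -> dict:
    """Average each dimension across all scenarios that scored it."""
    return {
        dim: mean(r.scores[dim] for r in results if dim in r.scores)
        for dim in DIMENSIONS
        if any(dim in r.scores for r in results)
    }
```

Dimensions that no scenario scored are left out of the averages rather than reported as zero, so a missing score is not mistaken for a failing one.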
Test individual skills and prompts in isolation — not just end-to-end. For each skill or prompt in the workflow:
Record the eval scores as the reference point for future regression testing in Step 7 (Improve). This baseline captures:
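One way to store a baseline and compare later runs against it is sketched below. The JSON layout, the function names, and the `tolerance` parameter are hypothetical; the only idea taken from the text above is that the recorded scores become the reference point for regression testing.

```python
import json
from pathlib import Path


def save_baseline(path: str, scores: dict) -> None:
    """Write dimension -> score pairs to a JSON file (hypothetical layout)."""
    Path(path).write_text(json.dumps(scores, indent=2, sort_keys=True))


def load_baseline(path: str) -> dict:
    """Read a baseline previously written by save_baseline."""
    return json.loads(Path(path).read_text())


def find_regressions(baseline: dict, current: dict, tolerance: float = 0.0) -> dict:
    """Return dimensions whose current score fell below the baseline.

    Each entry maps a dimension to (baseline_score, current_score).
    Dimensions missing from the current run are ignored here.
    """
    return {
        dim: (baseline[dim], current[dim])
        for dim in baseline
        if dim in current and current[dim] < baseline[dim] - tolerance
    }
```

A small `tolerance` can absorb scoring noise between runs, so that only meaningful drops are flagged during Step 7 (Improve).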
For each problem identified in the eval, map it to which building block to adjust:
| Symptom | Building Block to Adjust |
|---------|--------------------------|
| Generic output | Add more Context (examples, style guides, reference materials) |
| Steps skipped or misunderstood | Refine the Prompt (more explicit instructions) |
| Missing expertise | Build a Skill for that step (codify domain knowledge) |
| Unpredictable decisions | Convert to Agent (let AI plan its approach) |
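The table above can be treated as a simple lookup. The sketch below encodes its four rows and groups a list of observed symptoms by the building block to adjust. The function names and the fallback message for unlisted symptoms are assumptions for illustration.

```python
from collections import defaultdict

# Rows copied from the symptom table; keys are lowercased for matching.
DIAGNOSIS = {
    "generic output": "Add more Context (examples, style guides, reference materials)",
    "steps skipped or misunderstood": "Refine the Prompt (more explicit instructions)",
    "missing expertise": "Build a Skill for that step (codify domain knowledge)",
    "unpredictable decisions": "Convert to Agent (let AI plan its approach)",
}

# Hypothetical fallback for symptoms the table does not cover.
UNMAPPED = "No mapping in the table; review manually"


def diagnose(symptom: str) -> str:
    """Look up the building block to adjust for one symptom."""
    return DIAGNOSIS.get(symptom.strip().lower(), UNMAPPED)


def group_by_block(symptoms: list) -> dict:
    """Group observed symptoms by the adjustment they point to."""
    grouped = defaultdict(list)
    for symptom in symptoms:
        grouped[diagnose(symptom)].append(symptom)
    return dict(grouped)
```

Grouping makes it easier to see when several problems trace back to the same building block and can be fixed with one change.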
Based on eval scores across all scenarios:
- run skill (Step 6)
- build skill (Step 4), then re-test

Write results to `outputs/[workflow-name]-test-results.md`.
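A rough sketch of the readiness decision and the results path follows. The pass threshold of 4.0 and the rule that every scenario must meet it are assumptions, not criteria stated in this document; only the two outcomes and the output file pattern come from the text above.

```python
from pathlib import Path

READY = "run skill (Step 6)"
NOT_READY = "build skill (Step 4), then re-test"


def next_step(scenario_averages: dict, pass_threshold: float = 4.0) -> str:
    """Pick the next step from per-scenario average scores.

    Assumption: the workflow is ready only if every scenario's average
    meets the threshold; otherwise it goes back to Step 4.
    """
    if not scenario_averages:
        raise ValueError("no scenarios were scored")
    if all(avg >= pass_threshold for avg in scenario_averages.values()):
        return READY
    return NOT_READY


def results_path(workflow_name: str) -> Path:
    """Build the results file path: outputs/[workflow-name]-test-results.md."""
    return Path("outputs") / f"{workflow_name}-test-results.md"
```

For example, `next_step({"happy path": 4.5, "edge case": 3.0})` returns the Step 4 outcome, because one scenario falls below the assumed threshold.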
Include an eval scorecard with this format: