.claude/skills/test/SKILL.md
Guide structured testing of AI workflow artifacts, evaluate output quality, identify which building blocks need adjustment, and determine readiness for deployment. Use when the user has built workflow artifacts and needs to test them. This is Step 5 (Test) of the Business-First AI Framework.
Install this skill globally with one command: `npx skillsauth add jamesgray-ai/handsonai test`. Works with Claude Code, Cursor, and Windsurf.
Structured testing and evaluation of AI workflow artifacts. Walk the user through running their workflow against real scenarios, scoring output quality, diagnosing issues back to specific building blocks, and deciding whether the workflow is ready for deployment.
Read the Building Block Spec (including Evaluation Criteria) to understand what was built, expected behavior, and how to evaluate. Identify the test scenarios and scoring dimensions defined during Design.
Run one representative input and check it manually: does the workflow run end-to-end and produce something reasonable? This is a sanity check before systematic evaluation, meant to catch showstoppers early.
Execute each test scenario from the Evaluation Criteria (defined in Design). For each scenario:
Guide the user through scoring with plain-language prompts:
Adapt the dimension names and prompts to match whatever eval dimensions were defined in the spec.
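As a rough illustration of how per-scenario scores might be rolled up, the sketch below averages each dimension across scenarios and picks out the weakest one. The dimension names (`accuracy`, `completeness`, `tone`), the 1-5 scale, and the scenario labels are all assumptions made for this example; the real dimensions come from the Evaluation Criteria in the spec.

```python
from statistics import mean

# Hypothetical dimension names and a 1-5 scale; the real ones come from
# the Evaluation Criteria in the Building Block Spec.
DIMENSIONS = ("accuracy", "completeness", "tone")


def average_by_dimension(scores_by_scenario):
    """Average each dimension's score across all test scenarios."""
    averages = {}
    for dim in DIMENSIONS:
        averages[dim] = mean(s[dim] for s in scores_by_scenario.values())
    return averages


# Illustrative scores only, not real results.
scores = {
    "scenario-1": {"accuracy": 4, "completeness": 5, "tone": 3},
    "scenario-2": {"accuracy": 2, "completeness": 4, "tone": 3},
}
averages = average_by_dimension(scores)

# The lowest-scoring dimension is a reasonable place to start diagnosis.
weakest = min(averages, key=averages.get)
```

Keeping the dimensions in one tuple makes it easy to swap in whatever names the spec actually defines without touching the aggregation logic.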
Test individual skills and prompts in isolation — not just end-to-end. For each skill or prompt in the workflow:
Record the eval scores as the reference point for future regression testing in Step 7 (Improve). This baseline captures:
For each problem identified in the eval, map it to which building block to adjust:
| Symptom | Building Block to Adjust |
|---------|--------------------------|
| Generic output | Add more Context (examples, style guides, reference materials) |
| Steps skipped or misunderstood | Refine the Prompt (more explicit instructions) |
| Missing expertise | Build a Skill for that step (codify domain knowledge) |
| Unpredictable decisions | Convert to Agent (let AI plan its approach) |
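The symptom table above can also be treated as a simple lookup when logging issues during a test run. This is only a sketch: the short symptom keys and the `diagnose` helper are hypothetical names introduced here, while the adjustment text is copied from the table.

```python
# Symptom -> suggested adjustment, copied from the table above.
# The short keys are hypothetical labels chosen for this sketch.
ADJUSTMENTS = {
    "generic_output": "Add more Context (examples, style guides, reference materials)",
    "steps_skipped": "Refine the Prompt (more explicit instructions)",
    "missing_expertise": "Build a Skill for that step (codify domain knowledge)",
    "unpredictable_decisions": "Convert to Agent (let AI plan its approach)",
}


def diagnose(symptoms):
    """Pair each observed symptom with its adjustment; flag unknown ones."""
    return [(s, ADJUSTMENTS.get(s, "No mapping; review manually")) for s in symptoms]


plan = diagnose(["generic_output", "steps_skipped", "slow_response"])
```

Symptoms that fall outside the table are flagged for manual review in this sketch; nothing is guessed for them.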
Based on eval scores across all scenarios:
Write results to outputs/[workflow-name]-test-results.md.
Include an eval scorecard with this format: