plugins/handsonai/skills/improve/SKILL.md
Evaluate a running AI workflow for quality, relevance, and evolution opportunities. Use when the user wants to review how a deployed workflow is performing, check if it needs tuning, or assess whether it should graduate to a more capable orchestration mechanism. This is Step 7 (Improve) of the AI Workflow Framework.
npx skillsauth add jamesgray-ai/handsonai-plugins improve

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Evaluate and evolve running AI workflows. Review how a deployed workflow is performing against its original baseline, identify degradation or growth signals, and recommend whether to tune, redesign, or evolve the orchestration mechanism.
Read the workflow's manifest (outputs/[workflow-name]/workflow.yaml) and load the artifacts it registers: the Design Spec, Run Guide, original Test Results (the baseline), and the run log (runs.md) if one exists. If no manifest exists but legacy flat files (outputs/[name]-*.md) do, use those paths.
Confirm the artifacts belong to the same workflow — check that the workflow field in the Test Results frontmatter matches the manifest before treating its scores as this workflow's baseline. Parse the baseline scores from the Test Results frontmatter (scores and averages) — that's the regression reference.
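The ownership check and baseline extraction described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the manifest and the Test Results frontmatter have already been parsed into dictionaries, and the manifest key `name` is a hypothetical stand-in for wherever the workflow name is stored. Only `workflow`, `scores`, and `averages` come from the text above.

```python
from typing import Any, Dict


def load_baseline(manifest: Dict[str, Any], frontmatter: Dict[str, Any]) -> Dict[str, Any]:
    """Return the baseline scores, refusing Test Results from another workflow."""
    expected = manifest.get("name")       # hypothetical manifest key
    actual = frontmatter.get("workflow")  # field named in the skill text
    if expected is None or actual != expected:
        raise ValueError(
            f"Test Results are for {actual!r}, but the manifest is for {expected!r}"
        )
    # Only these two blocks are treated as the regression reference.
    return {"scores": frontmatter["scores"], "averages": frontmatter["averages"]}
```

Raising on a mismatch, rather than warning, keeps another workflow's scores from silently becoming the baseline.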
Understand what was built, how it was designed to work, and what quality bar was established.
Start from the run log if one exists — it's evidence, not recollection. Summarize what it shows (run frequency, recurring edits, failures, drift) and confirm the summary with the user rather than asking them to remember.
Then interview the user for what the log can't show:
Identify signals of degradation or opportunity:
| Signal | What It Means |
|--------|---------------|
| Increasing manual edits | Context may need updating (stale examples, changed standards) |
| New decision types appearing | May need additional skills or agent capabilities |
| Steps being skipped | Workflow coverage gap: missing steps need to be added |
| Output quality inconsistent | Prompt or context needs tuning |
| User adding steps manually | Workflow scope has grown beyond original design |
Should the orchestration mechanism evolve?
Only recommend graduation when there's a concrete capability gap, not just because "it could be more sophisticated."
Re-run the eval suite from Step 5 (Test):
- Compare the new results against the scores block parsed from the original Test Results frontmatter, and present a delta table (scenario × dimension, baseline → current, flagging any drop ≥1 point)
- Check the environment field: if an integration was simulated then and is live now (or vice versa), say so; a score change caused by an integration being fixed is not the workflow getting better or worse

For workflows used by teams (not just individuals), assess:
Skip this step for individual/personal workflows.
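Returning to the eval re-run: the baseline-to-current delta table with its "drop ≥1 point" flag can be sketched like this. The nested shape assumed for the scores block (scenario → dimension → numeric score) is an assumption for illustration, as are the function and field names; the actual frontmatter layout may differ.

```python
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]  # assumed shape: scenario -> dimension -> score


def score_deltas(baseline: Scores, current: Scores, threshold: float = 1.0) -> List[dict]:
    """Build one row per scenario x dimension, flagging drops of `threshold` or more."""
    rows = []
    for scenario, dimensions in baseline.items():
        for dimension, old in dimensions.items():
            new = current.get(scenario, {}).get(dimension)
            if new is None:
                continue  # not re-scored this cycle; report that separately
            rows.append({
                "scenario": scenario,
                "dimension": dimension,
                "baseline": old,
                "current": new,
                "delta": new - old,
                "regression": (old - new) >= threshold,
            })
    return rows
```

A flagged row is a prompt to investigate, not a verdict: the environment check above still applies before attributing the change to the workflow itself.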
Produce one of the following:
- Tune: route to the build skill (Step 4) and test skill (Step 5)
- Redesign: route to the design skill (Step 3)
- Evolve: route to the design skill (Step 3) with an explicit graduation recommendation

Write results to outputs/[workflow-name]/improvement-plan.md. If a plan already exists from a previous review cycle, rename it with a date suffix first. Then update the workflow manifest: set current_step: 7, last_updated, add improvement_plan under artifacts, and record next_review: YYYY-MM-DD (agree the date with the user; monthly is a good default for high-frequency workflows, quarterly for occasional ones).
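The archive-then-update sequence above can be sketched as below. Several details are assumptions made for illustration: the date-suffix format (`improvement-plan-YYYY-MM-DD.md`), the value stored under `improvement_plan`, and the idea that the manifest is handled as an already-parsed dictionary. Reading and writing workflow.yaml itself is left out.

```python
from datetime import date
from pathlib import Path
from typing import Any, Dict, Optional


def archive_existing_plan(workflow_dir: Path, today: date) -> Optional[Path]:
    """Rename a previous improvement plan with a date suffix, if one exists."""
    plan = workflow_dir / "improvement-plan.md"
    if not plan.exists():
        return None
    # Suffix format is an assumption; the skill only says "a date suffix".
    archived = plan.with_name(f"improvement-plan-{today.isoformat()}.md")
    plan.rename(archived)
    return archived


def record_review(manifest: Dict[str, Any], today: date, next_review: date) -> Dict[str, Any]:
    """Return a copy of the manifest with the Step 7 fields set."""
    updated = dict(manifest)
    updated["current_step"] = 7
    updated["last_updated"] = today.isoformat()
    artifacts = dict(updated.get("artifacts", {}))
    artifacts["improvement_plan"] = "improvement-plan.md"  # assumed path value
    updated["artifacts"] = artifacts
    updated["next_review"] = next_review.isoformat()  # date agreed with the user
    return updated
```

The next review date is passed in rather than computed, since the text asks for it to be agreed with the user.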
Include: