Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

jamesgray-ai/improve

Name: improve
Author: jamesgray-ai

plugins/handsonai/skills/improve/SKILL.md

npx skillsauth add jamesgray-ai/handsonai-plugins improve


Improve Workflow

Evaluate and evolve running AI workflows. Review how a deployed workflow is performing against its original baseline, identify degradation or growth signals, and recommend whether to tune, redesign, or evolve the orchestration mechanism.

Workflow

1. Load workflow context

Read the workflow's manifest (outputs/[workflow-name]/workflow.yaml) and load the artifacts it registers: the Design Spec, Run Guide, original Test Results (the baseline), and the run log (runs.md) if one exists. If no manifest exists but legacy flat files (outputs/[name]-*.md) do, use those paths.

Confirm the artifacts belong to the same workflow — check that the workflow field in the Test Results frontmatter matches the manifest before treating its scores as this workflow's baseline. Parse the baseline scores from the Test Results frontmatter (scores and averages) — that's the regression reference.

Understand what was built, how it was designed to work, and what quality bar was established.
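The loading and ownership check described above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function names are hypothetical, the frontmatter is assumed to be a leading `---` block with unquoted top-level `key: value` lines, and nested blocks such as the scores would need a real YAML parser.

```python
from pathlib import Path


def resolve_artifact_source(outputs_dir: Path, workflow_name: str) -> tuple[str, list[Path]]:
    """Prefer the manifest; fall back to legacy flat files if no manifest exists."""
    manifest = outputs_dir / workflow_name / "workflow.yaml"
    if manifest.is_file():
        return "manifest", [manifest]
    legacy = sorted(outputs_dir.glob(f"{workflow_name}-*.md"))
    if legacy:
        return "legacy", legacy
    return "missing", []


def read_frontmatter_fields(markdown_text: str) -> dict[str, str]:
    """Collect top-level `key: value` scalars from a leading `---` block.

    Indented (nested) lines are skipped; values are assumed to be unquoted.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # nested entry or not a key-value line
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = value.strip()
    return fields


def baseline_belongs_to(workflow_name: str, test_results_text: str) -> bool:
    """True only when the Test Results frontmatter names this workflow."""
    return read_frontmatter_fields(test_results_text).get("workflow") == workflow_name
```

If `baseline_belongs_to` returns False, the scores in that file should not be treated as this workflow's baseline.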

2. Current state assessment

Start from the run log if one exists — it's evidence, not recollection. Summarize what it shows (run frequency, recurring edits, failures, drift) and confirm the summary with the user rather than asking them to remember.

Then interview the user for what the log can't show:

How often are you running this workflow? (skip if the run log answers this)
How much manual editing does the output typically need?
Have your requirements or business context changed?
Are there new steps or decisions that have emerged since deployment?
What's working well that you want to preserve?

3. Quality evaluation

Identify signals of degradation or opportunity:

| Signal | What It Means |
|--------|---------------|
| Increasing manual edits | Context may need updating (stale examples, changed standards) |
| New decision types appearing | May need additional skills or agent capabilities |
| Steps being skipped | Workflow coverage gap: missing steps need to be added |
| Output quality inconsistent | Prompt or context needs tuning |
| User adding steps manually | Workflow scope has grown beyond original design |

4. Graduation assessment

Should the orchestration mechanism evolve?

Prompt → Skill-Powered Prompt — if repeatable sub-routines have emerged that deserve codification
Skill-Powered Prompt → Agent — if AI needs to make sequencing decisions rather than follow a fixed order
Single Agent → Multi-Agent — if complexity has grown to require specialized sub-agents

Only recommend graduation when there's a concrete capability gap, not just because "it could be more sophisticated."

5. Regression evaluation

Re-run the eval suite from Step 5 (Test):

Run the same test scenarios (E1, E2, …) from the original baseline, scoring the same dimensions the same way Test does (AI-graded against Acceptance Criteria and Golden Examples first, user confirms)
Diff against the baseline mechanically: compare the new per-scenario scores against the scores block parsed from the original Test Results frontmatter, and present a delta table (scenario × dimension, baseline → current, flagging any drop ≥1 point)
Compare like-for-like: check the baseline's environment field — if an integration was simulated then and is live now (or vice versa), say so; a score change caused by an integration being fixed is not the workflow getting better or worse
Identify areas of degradation or improvement
Determine if the eval criteria themselves need updating (requirements may have shifted)
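The mechanical diff described above could look roughly like this. It is a sketch under stated assumptions: scores are held as a scenario → dimension → number mapping, the environment field holds a plain string such as "simulated" or "live", and all names (`Delta`, `diff_scores`, `environment_note`, `render_table`) are hypothetical rather than part of the skill.

```python
from dataclasses import dataclass
from typing import Optional

Scores = dict[str, dict[str, float]]  # scenario -> dimension -> score


@dataclass
class Delta:
    scenario: str
    dimension: str
    baseline: float
    current: Optional[float]  # None when this cell was not re-scored

    @property
    def change(self) -> Optional[float]:
        return None if self.current is None else self.current - self.baseline

    def is_regression(self, threshold: float = 1.0) -> bool:
        """A drop of `threshold` points or more counts as a regression."""
        return self.change is not None and self.change <= -threshold


def diff_scores(baseline: Scores, current: Scores) -> list[Delta]:
    """One row per baseline scenario and dimension, in sorted order."""
    return [
        Delta(scenario, dimension, score, current.get(scenario, {}).get(dimension))
        for scenario, dims in sorted(baseline.items())
        for dimension, score in sorted(dims.items())
    ]


def environment_note(baseline_env: str, current_env: str) -> Optional[str]:
    """Return a like-for-like warning when the environments differ."""
    if baseline_env == current_env:
        return None
    return (
        f"Baseline ran in '{baseline_env}' but this run is '{current_env}'; "
        "score changes may reflect the environment, not the workflow."
    )


def render_table(deltas: list[Delta]) -> str:
    """Render the delta table as markdown, flagging regressions."""
    lines = [
        "| Scenario | Dimension | Baseline | Current | Delta | Flag |",
        "|---|---|---|---|---|---|",
    ]
    for d in deltas:
        current = "n/a" if d.current is None else f"{d.current:g}"
        change = "n/a" if d.change is None else f"{d.change:+g}"
        flag = "DROP" if d.is_regression() else ""
        lines.append(
            f"| {d.scenario} | {d.dimension} | {d.baseline:g} | {current} | {change} | {flag} |"
        )
    return "\n".join(lines)
```

Only scenarios present in the baseline are iterated, so newly added scenarios do not appear as regressions; they would need to be reported separately.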

6. Operationalization review (organizational workflows)

For workflows used by teams (not just individuals), assess:

Adoption — Is the team actually using it? What's the usage frequency?
Training — Do new team members know how to use it?
Governance — Are outputs being reviewed appropriately? Are there quality controls?

Skip this step for individual/personal workflows.

7. Recommendation

Produce one of the following:

No changes needed — workflow is performing at or above baseline, requirements haven't shifted
Tune — specific building blocks to adjust (identify which ones and what to change) → loop back to the build skill (Step 4) and test skill (Step 5)
Redesign — requirements have changed enough that the workflow structure needs rethinking → loop back to the design skill (Step 3)
Evolve — graduate to a more capable orchestration mechanism → loop back to the design skill (Step 3) with an explicit graduation recommendation
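As a compact summary, the four outcomes and the framework steps they loop back to can be written as a lookup table. The enum and mapping names here are hypothetical; only the step numbers (3 = Design, 4 = Build, 5 = Test) come from the list above.

```python
from enum import Enum


class Recommendation(Enum):
    NO_CHANGES = "No changes needed"
    TUNE = "Tune"
    REDESIGN = "Redesign"
    EVOLVE = "Evolve"


# Steps of the AI Workflow Framework to return to for each outcome.
LOOP_BACK_STEPS: dict[Recommendation, tuple[int, ...]] = {
    Recommendation.NO_CHANGES: (),
    Recommendation.TUNE: (4, 5),     # build skill, then test skill
    Recommendation.REDESIGN: (3,),   # design skill
    Recommendation.EVOLVE: (3,),     # design skill, with a graduation recommendation
}


def loop_back_steps(recommendation: Recommendation) -> tuple[int, ...]:
    """Return the step numbers to revisit, or an empty tuple when nothing changes."""
    return LOOP_BACK_STEPS[recommendation]
```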

Output

Write results to outputs/[workflow-name]/improvement-plan.md. If a plan already exists from a previous review cycle, rename it with a date suffix first. Then update the workflow manifest: set current_step: 7, last_updated, add improvement_plan under artifacts, and record next_review: YYYY-MM-DD (agree the date with the user — monthly is a good default for high-frequency workflows, quarterly for occasional ones).
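A minimal sketch of the archive-then-update sequence, with several assumptions labeled: the date suffix format (`improvement-plan-YYYY-MM-DD.md`), the manifest being held as an already-parsed dictionary whose `artifacts` entry is a mapping, and the value stored under `improvement_plan` are all illustrative choices, not something the skill specifies.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def archive_existing_plan(workflow_dir: Path, today: date) -> Optional[Path]:
    """Rename a previous improvement-plan.md with a date suffix, if one exists."""
    plan = workflow_dir / "improvement-plan.md"
    if not plan.exists():
        return None
    # Assumed suffix format; a same-day archive with this name would collide.
    archived = plan.with_name(f"improvement-plan-{today.isoformat()}.md")
    plan.rename(archived)
    return archived


def updated_manifest(manifest: dict, today: date, next_review: date) -> dict:
    """Return a copy of the manifest with the Step 7 fields set."""
    result = dict(manifest)
    result["current_step"] = 7
    result["last_updated"] = today.isoformat()
    result["next_review"] = next_review.isoformat()  # date agreed with the user
    artifacts = dict(result.get("artifacts", {}))
    artifacts["improvement_plan"] = "improvement-plan.md"  # assumed value
    result["artifacts"] = artifacts
    return result
```

Archiving before writing keeps the earlier review cycle available for comparison, and returning a copy of the manifest leaves the caller free to inspect the changes before saving them.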

Include:

Current performance summary — how the workflow is being used and performing
Regression scores — comparison table of baseline vs. current scores
Issues identified — specific problems with diagnosed root causes
Recommendation — No changes / Tune / Redesign / Evolve, with rationale
Action items — concrete next steps if changes are recommended

Guidelines

Don't prompt for information the user can't answer. If they don't track usage metrics, work with qualitative signals instead.
Focus on concrete signals, not abstract evaluation. "Your context file references Q3 goals but it's Q1" beats "your context may be stale."
This step is typically invoked weeks or months after initial deployment, in a separate conversation from the original build.
Not every workflow needs improvement. If it's working, say so and move on.


Evaluate a running AI workflow for quality, relevance, and evolution opportunities. Use when the user wants to review how a deployed workflow is performing, check if it needs tuning, or assess whether it should graduate to a more capable orchestration mechanism. This is Step 7 (Improve) of the AI Workflow Framework.

7 stars

development

Updated Jun 11, 2026


npx skillsauth add jamesgray-ai/handsonai-plugins improve

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage shown |
|--------|---------|-------------|------------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Jun 11, 2026, 8:36 AM. Scan time: 143.1s. 1 file scanned.

SKILL.md

name: improve
description: >
user-invocable: true


Related Skills

jamesgray-ai/writing-workflow-sops

documentation

Verified, Trusted, Community

Write Standard Operating Procedure documentation for workflows and save as markdown files. Selects full or lightweight SOP template based on autonomy level (deterministic vs. guided/autonomous), then adapts for workflow type (Manual, Augmented, Automated). Use when the user asks to write an SOP, document a workflow, create procedure documentation, or capture how a workflow is executed. Triggers on "write an SOP", "document this workflow", "create operating instructions", "how is this workflow executed".

7 stars, SKILL.md, updated May 5, 2026

jamesgray-ai/test

development

Verified, Trusted, Community

Guide structured testing of AI workflow artifacts, evaluate output quality, identify which building blocks need adjustment, and determine readiness for deployment. Use when the user has built workflow artifacts and needs to test them. This is Step 5 (Test) of the AI Workflow Framework.

7 stars, SKILL.md, updated May 5, 2026

jamesgray-ai/run

development

Verified, Trusted, Community

This skill should be used when the user has built and tested workflow artifacts and wants a Run Guide for deploying and operating their AI workflow. It generates a plain-language guide with setup steps, deployment patterns, and sharing instructions — tailored to the user's platform and build path. This is Step 6 (Run) of the AI Workflow Framework.

7 stars, SKILL.md, updated May 5, 2026

jamesgray-ai/design

development

Verified, Trusted, Community

This skill should be used when the user has a Workflow Requirements document and wants to design an AI workflow. It gathers architecture decisions, assesses workflow autonomy level, chooses an orchestration mechanism and involvement mode, classifies steps, maps building blocks, identifies skill candidates, configures agents, and produces a Design Spec for approval. Supports both step-decomposed and goal-driven Workflow Requirements. This is Step 3 (Design) of the AI Workflow Framework.

7 stars, SKILL.md, updated May 5, 2026


Install manually


# Clone the repo
git clone https://github.com/jamesgray-ai/handsonai-plugins.git

# Copy into Claude Code skills folder (global)
cp -r handsonai-plugins/plugins/handsonai/skills/improve ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

jamesgray-ai/handsonai-plugins

7 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT