skill-candidates/qwen-training-checkpoint-eval/SKILL.md
Evaluate saved Qwen training checkpoints against batch-aligned training samples and trained-adapter eval lanes. Use when inspecting checkpoint.eval.json, validating a saved batch adapter on the Radeon eval lane, or comparing baseline student behavior against a newly trained checkpoint before promotion.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add grtninja/skill-arbiter qwen-training-checkpoint-eval
Use this skill for logical checkpoint testing during or after staggered student training.
- checkpoint.eval.json and trainer.report.json for the batch.
- batch_sources.json or batch_sources.txt.

Batch artifact review:
type <batch-dir>\checkpoint.eval.json
type <batch-dir>\trainer.report.json
type <batch-dir>\batch_sources.json
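The same review can be scripted when several batches need checking. The sketch below is a minimal illustration, not part of the skill's tooling: `load_batch_artifacts` is a hypothetical helper name, and it assumes that checkpoint.eval.json and trainer.report.json are both required, that batch_sources.json is preferred when both source files exist, and that batch_sources.txt lists one source per line.

```python
import json
from pathlib import Path


def load_batch_artifacts(batch_dir):
    """Load the batch artifacts into a dict keyed by file name (hypothetical helper)."""
    batch = Path(batch_dir)
    artifacts = {}

    # Assumption: both JSON reports must exist before a review can proceed.
    for name in ("checkpoint.eval.json", "trainer.report.json"):
        path = batch / name
        if not path.is_file():
            raise FileNotFoundError(f"missing batch artifact: {path}")
        artifacts[name] = json.loads(path.read_text(encoding="utf-8"))

    # Batch sources may be JSON or plain text.
    # Assumption: prefer the JSON form when both files are present.
    sources_json = batch / "batch_sources.json"
    sources_txt = batch / "batch_sources.txt"
    if sources_json.is_file():
        artifacts[sources_json.name] = json.loads(
            sources_json.read_text(encoding="utf-8")
        )
    elif sources_txt.is_file():
        # Assumption: the text form holds one source per line; blanks are skipped.
        lines = sources_txt.read_text(encoding="utf-8").splitlines()
        artifacts[sources_txt.name] = [ln.strip() for ln in lines if ln.strip()]
    else:
        raise FileNotFoundError(f"no batch_sources.json or batch_sources.txt in {batch}")

    return artifacts
```

Calling `load_batch_artifacts("<batch-dir>")` with a real batch directory returns the parsed reports, or raises `FileNotFoundError` naming the first missing artifact. The contents of each file are returned as parsed, since their schema is not described here.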
Eval lane launch:
powershell -ExecutionPolicy Bypass -File `
<training-workbench-root>\tools\start_qwen35_4b_radeon_eval.ps1 `
-Config <eval-config-pointing-at-batch-adapter>
Eval lane health:
curl <loopback-eval-lane>/health
curl <loopback-eval-lane>/v1/models
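For repeatable checks, the health probe can be wrapped in a small script. This is a sketch under stated assumptions: it assumes /health returns a JSON object with a top-level boolean `adapter_loaded` field (the field this skill inspects), and `fetch_json` and `adapter_loaded` are hypothetical helper names, not part of the eval lane.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def adapter_loaded(health):
    """Return True only when the payload reports adapter_loaded as boolean true.

    Assumption: the health payload is a JSON object with a top-level
    'adapter_loaded' field. Strings such as "true" are rejected on purpose,
    so a malformed payload never passes as a mounted adapter.
    """
    return isinstance(health, dict) and health.get("adapter_loaded") is True
```

With the lane running, `adapter_loaded(fetch_json("<loopback-eval-lane>/health"))` gives a single pass/fail value. Keeping the payload check separate from the network call makes it easy to test against a saved response.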
At minimum, require:
- checkpoint.eval.json exists
- adult_context and penny_affinity matches are present for the sampled records
- /health shows adapter_loaded = true when a saved adapter is mounted on the eval lane

Use this skill for checkpoint validation and trained-adapter smoke testing.
Do not use this skill for:
references/checkpoint-contract.md