.claude/skills/eval-running/SKILL.md
Run evaluation suites against the Loa framework
Run evaluation suites against the Loa framework to detect regressions and benchmark skill quality.
# Run framework correctness suite
/eval --suite framework
# Run regression suite
/eval --suite regression
# Run a single task
/eval --task constraint-proc-001-enforced
# Run all tasks for a skill
/eval --skill implementing-tasks
# Update baselines
/eval --suite framework --update-baseline --reason "Post-refactor re-baseline"
The `/eval` command invokes `evals/harness/run-eval.sh` with appropriate flags.

When invoked, translate the user's request into `run-eval.sh` arguments:
# Default (no arguments): run the framework suite
./evals/harness/run-eval.sh --suite framework --trusted
# With suite specified
./evals/harness/run-eval.sh --suite <suite> --trusted
# With task specified
./evals/harness/run-eval.sh --task <task-id> --trusted
# With skill filter
./evals/harness/run-eval.sh --skill <skill-name> --trusted
# Update baseline
./evals/harness/run-eval.sh --suite <suite> --update-baseline --reason "<reason>" --trusted
# JSON output for programmatic use
./evals/harness/run-eval.sh --suite <suite> --json --trusted
Note: the `--trusted` flag is always added for local execution. In CI, the container sandbox provides isolation instead.
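The mapping above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `build_eval_args` is a hypothetical name that is not part of the Loa harness, the framework suite is assumed to be the default because the first example uses it, and the helper's own non-zero status on bad input is unrelated to the harness exit codes. It prints one argument per line, so a `--reason` value containing spaces stays intact (a value containing newlines would not).

```shell
#!/bin/sh
# Hypothetical helper (not part of the Loa harness): turn /eval-style options
# into the run-eval.sh argument list, printed one argument per line.
# The function body runs in a subshell, so its variables do not leak.
build_eval_args() (
  nl='
'
  out=''
  have_target=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --suite|--task|--skill|--reason)
        # These options take a value; fail if it is missing.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          exit 1
        fi
        # --reason does not select what to run; the other three do.
        if [ "$1" != "--reason" ]; then have_target=1; fi
        out="${out}$1${nl}$2${nl}"
        shift 2
        ;;
      --update-baseline|--json)
        out="${out}$1${nl}"
        shift
        ;;
      *)
        echo "unknown option: $1" >&2
        exit 1
        ;;
    esac
  done
  # Assumed default, taken from the first example: the framework suite.
  if [ "$have_target" -eq 0 ]; then
    out="--suite${nl}framework${nl}${out}"
  fi
  # --trusted is always appended for local execution.
  printf '%s--trusted\n' "$out"
)

# Usage (commented out so the snippet does not invoke the harness):
#   build_eval_args --task constraint-proc-001-enforced --json
```

Printing the arguments instead of running the script keeps the translation step easy to inspect before anything is executed.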
| Code | Meaning |
|------|---------|
| 0 | All pass, no regressions |
| 1 | Regressions detected |
| 2 | Infrastructure error |
| 3 | Configuration error |
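A caller can branch on these exit codes after the harness returns. The sketch below is a minimal example, assuming the four codes listed above are the complete set; `describe_eval_exit` is a hypothetical helper name, not something the harness provides.

```shell
#!/bin/sh
# Hypothetical helper: map a run-eval.sh exit code to its documented meaning.
describe_eval_exit() {
  case "$1" in
    0) echo "All pass, no regressions" ;;
    1) echo "Regressions detected" ;;
    2) echo "Infrastructure error" ;;
    3) echo "Configuration error" ;;
    # Anything else is outside the documented set.
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage (commented out so the snippet does not invoke the harness):
#   ./evals/harness/run-eval.sh --suite regression --trusted
#   status=$?
#   echo "eval finished: $(describe_eval_exit "$status")"
```

Capturing `$?` immediately after the harness call matters, since any later command overwrites it.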