skills/subject-outlier-review/exports/claude/SKILL.md
Review per-subject performance to identify likely outliers, distinguish bad data from difficult but valid cases, and document whether subject exclusion is justified before any filtered rerun.
npx skillsauth add balandongiv/agent-skillbook subject-outlier-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when weak overall model performance may be driven by one or more subjects. The goal is to produce an auditable subject-level review, not to silently remove a subject because the metrics look inconvenient.
Use the actual predictions, metrics, and subject-performance artifacts from the run. Do not infer outliers from aggregate scores alone.
Identify the weakest and strongest subjects explicitly before deciding what kind of issue each subject represents.
A low score does not automatically mean the subject is invalid. Check whether the subject is sparse, imbalanced, noisy, mislabeled, or simply difficult but still valid.
If a subject is excluded, the decision must be justified, rerun through the pipeline explicitly, and reported alongside the full-cohort result.
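One way to keep the filtered result from being reported on its own is to build a single record that always carries both results plus the justification. The sketch below is a minimal illustration, not part of the skill: the function name `exclusion_report`, the shape of the returned dictionary, and the assumption that each metrics input is a flat mapping of metric name to number are all hypothetical.

```python
def exclusion_report(full_metrics, filtered_metrics, excluded_subject_ids, justification):
    """Pair full-cohort and filtered metrics so neither is reported alone.

    Hypothetical helper: assumes both metrics arguments are flat dicts
    mapping a metric name to a number.
    """
    if not excluded_subject_ids:
        raise ValueError("no subjects excluded; report the full cohort only")
    if not justification.strip():
        raise ValueError("exclusion requires a written justification")

    # Only compare metrics that exist in both runs.
    shared = sorted(set(full_metrics) & set(filtered_metrics))
    return {
        "excluded_subject_ids": sorted(excluded_subject_ids),
        "justification": justification.strip(),
        "metrics": {
            name: {
                "full_cohort": full_metrics[name],
                "filtered": filtered_metrics[name],
                "delta": filtered_metrics[name] - full_metrics[name],
            }
            for name in shared
        },
    }
```

Refusing to build the record without a justification makes the silent-removal case fail loudly instead of producing a clean-looking filtered score.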
Read planning/Project_Execution_Flowchart.md first.
Prefer the concrete run outputs:
- 05_models/predictions.parquet
- 05_models/metrics.json
- 05_models/subject_performance.parquet
- manifest.json

If subject_performance.parquet is missing, build the subject-level comparison from predictions.parquet.
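The fallback above can be sketched as a per-subject confusion-matrix pass over the predictions. This is a minimal sketch under stated assumptions: the task is binary with labels 0 and 1, and each prediction row can be reduced to a `(subject_id, y_true, y_pred)` tuple. Those column names and the function name `subject_metrics` are hypothetical; the actual schema of predictions.parquet is not specified here.

```python
import math
from collections import defaultdict


def subject_metrics(rows):
    """Compute per-subject binary metrics from (subject_id, y_true, y_pred) rows.

    Hypothetical helper: assumes binary labels encoded as 0 and 1.
    """
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for subject_id, y_true, y_pred in rows:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[subject_id][key] += 1

    table = []
    for subject_id, c in counts.items():
        tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
        n_pos, n_neg = tp + fn, tn + fp
        # MCC is undefined when any marginal is zero; report None, not 0.
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else None
        # Balanced accuracy needs both classes present for this subject.
        if n_pos and n_neg:
            bal_acc = 0.5 * (tp / n_pos + tn / n_neg)
        else:
            bal_acc = None
        table.append({
            "subject_id": subject_id,
            "n": n_pos + n_neg,
            "positive_rate": n_pos / (n_pos + n_neg),
            "mcc": mcc,
            "balanced_accuracy": bal_acc,
        })

    # Weakest first; subjects with undefined MCC sort to the front for review.
    table.sort(key=lambda r: (r["mcc"] is not None, r["mcc"] or 0.0))
    return table
```

The `n` and `positive_rate` columns are included so that sparse or single-class subjects are visible next to their scores. Reporting an undefined metric as `None` rather than 0 is a deliberate choice in this sketch: it keeps "no evidence" distinct from "chance-level performance". Rows could be drawn from the parquet file with, for example, `pandas.read_parquet(...)` followed by `itertuples(index=False)` over the three relevant columns, whatever they are actually called in the run.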
Rank subjects by:
- mcc
- balanced_accuracy

Check whether weak subjects show:
State one of:
If evidence is missing, say exactly what evidence is missing.
Use analysis.evaluation.exclude_subject_ids for the filtered rerun and keep both:
If the subject-outlier review changes expected artifacts or reporting steps, update planning/Project_Execution_Flowchart.md.