Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

balandongiv/subject-outlier-review

Name: subject-outlier-review
Author: balandongiv

skills/subject-outlier-review/exports/claude/SKILL.md

npx skillsauth add balandongiv/agent-skillbook subject-outlier-review

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Subject Outlier Review

Use this skill when weak overall model performance may be driven by one or more subjects. The goal is to produce an auditable subject-level review, not to silently remove a subject because the metrics look inconvenient.

Core principles

1. Start from concrete run artifacts

Use the actual predictions, metrics, and subject-performance artifacts from the run. Do not infer outliers from aggregate scores alone.

2. Rank subjects before diagnosing them

Identify the weakest and strongest subjects explicitly before deciding what kind of issue each subject represents.

3. Distinguish bad data from hard data

A low score does not automatically mean the subject is invalid. Check whether the subject is sparse, imbalanced, noisy, mislabeled, or simply difficult but still valid.

4. Keep exclusion explicit and auditable

If a subject is excluded, the decision must be justified, rerun through the pipeline explicitly, and reported alongside the full-cohort result.

Step-by-step process

Step 1: Read the repository workflow contract

Read planning/Project_Execution_Flowchart.md first.

Step 2: Locate the run artifacts

Prefer the concrete run outputs:

05_models/predictions.parquet
05_models/metrics.json
05_models/subject_performance.parquet
the relevant manifest.json

Step 3: Derive the per-subject table if needed

If subject_performance.parquet is missing, build the subject-level comparison from predictions.parquet.

Step 4: Rank the subjects

Rank subjects by:

mcc
balanced_accuracy
support count or row count
any other available classification metrics that help explain the gap

Step 5: Diagnose the weakest subjects

Check whether weak subjects show:

low support only
strong class imbalance
uniformly poor metrics
likely preprocessing gaps
likely annotation or channel-quality issues
difficulty that still looks valid rather than corrupted

Step 6: Decide the recommendation

State one of:

exclusion is justified
exclusion is not justified
exclusion is still unproven

If evidence is missing, say exactly what evidence is missing.

Step 7: If exclusion is justified, preserve both result sets

Use analysis.evaluation.exclude_subject_ids for the filtered rerun and keep both:

the full-cohort result
the filtered-cohort result

Step 8: Update repository docs if the workflow changed

If the subject-outlier review changes expected artifacts or reporting steps, update planning/Project_Execution_Flowchart.md.

Rules

Always read planning/Project_Execution_Flowchart.md first.
Always start from concrete run artifacts.
Always show a subject-by-subject comparison table.
Always call out the weakest and strongest subjects explicitly.
Always explain whether the weak subject looks invalid, weak but valid, or still uncertain.
Always keep subject exclusion explicit and auditable.
Never drop a subject silently.
Never recommend exclusion from a single low metric alone.
Never overwrite the full-cohort result with the filtered-cohort result.

Common mistakes to avoid

Jumping from low MCC to exclusion: Low performance alone is not enough to justify dropping a subject.
Ignoring support count: A weak subject with very few rows may need a different interpretation from a weak subject with strong support.
No comparison table: Outlier claims are hard to evaluate without the full subject ranking.
Conflating noisy with invalid: Some subjects are simply hard cases and should remain in the final analysis.
Reporting only filtered metrics: That hides the effect of the exclusion and makes the decision unauditable.

balandongiv/subject-outlier-review

skills/subject-outlier-review/exports/claude/SKILL.md

Review per-subject performance to identify likely outliers, distinguish bad data from difficult but valid cases, and document whether subject exclusion is justified before any filtered rerun.

documentation

Updated May 22, 2026

$ install --global

skillsauth

npx skillsauth add balandongiv/agent-skillbook subject-outlier-review

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 22, 2026, 3:20 AM200.2s1 file scanned

SKILL.md

name:: subject-outlier-review
description:: Review per-subject performance to identify likely outliers, distinguish bad data from difficult but valid cases, and document whether subject exclusion is justified before any filtered rerun.
disable-model-invocation:: false
user-invocable:: true
allowed-tools:: []

Subject Outlier Review

Core principles

1. Start from concrete run artifacts

Use the actual predictions, metrics, and subject-performance artifacts from the run. Do not infer outliers from aggregate scores alone.

2. Rank subjects before diagnosing them

Identify the weakest and strongest subjects explicitly before deciding what kind of issue each subject represents.

3. Distinguish bad data from hard data

A low score does not automatically mean the subject is invalid. Check whether the subject is sparse, imbalanced, noisy, mislabeled, or simply difficult but still valid.

4. Keep exclusion explicit and auditable

If a subject is excluded, the decision must be justified, rerun through the pipeline explicitly, and reported alongside the full-cohort result.

Step-by-step process

Step 1: Read the repository workflow contract

Read planning/Project_Execution_Flowchart.md first.

Step 2: Locate the run artifacts

Prefer the concrete run outputs:

05_models/predictions.parquet
05_models/metrics.json
05_models/subject_performance.parquet
the relevant manifest.json

Step 3: Derive the per-subject table if needed

If subject_performance.parquet is missing, build the subject-level comparison from predictions.parquet.

Step 4: Rank the subjects

Rank subjects by:

mcc
balanced_accuracy
support count or row count
any other available classification metrics that help explain the gap

Step 5: Diagnose the weakest subjects

Check whether weak subjects show:

low support only
strong class imbalance
uniformly poor metrics
likely preprocessing gaps
likely annotation or channel-quality issues
difficulty that still looks valid rather than corrupted

Step 6: Decide the recommendation

State one of:

exclusion is justified
exclusion is not justified
exclusion is still unproven

If evidence is missing, say exactly what evidence is missing.

Step 7: If exclusion is justified, preserve both result sets

Use analysis.evaluation.exclude_subject_ids for the filtered rerun and keep both:

the full-cohort result
the filtered-cohort result

Step 8: Update repository docs if the workflow changed

If the subject-outlier review changes expected artifacts or reporting steps, update planning/Project_Execution_Flowchart.md.

Rules

Always read planning/Project_Execution_Flowchart.md first.
Always start from concrete run artifacts.
Always show a subject-by-subject comparison table.
Always call out the weakest and strongest subjects explicitly.
Always explain whether the weak subject looks invalid, weak but valid, or still uncertain.
Always keep subject exclusion explicit and auditable.
Never drop a subject silently.
Never recommend exclusion from a single low metric alone.
Never overwrite the full-cohort result with the filtered-cohort result.

Common mistakes to avoid

Jumping from low MCC to exclusion: Low performance alone is not enough to justify dropping a subject.
Ignoring support count: A weak subject with very few rows may need a different interpretation from a weak subject with strong support.
No comparison table: Outlier claims are hard to evaluate without the full subject ranking.
Conflating noisy with invalid: Some subjects are simply hard cases and should remain in the final analysis.
Reporting only filtered metrics: That hides the effect of the exclusion and makes the decision unauditable.

Related Skills

balandongiv/template-skill-name

tools

VerifiedTrustedCommunity

One-sentence description of what this skill does and when to use it.

SKILL.mdUpdated May 22, 2026

balandongiv/template-skill-name

balandongiv/template-skill-name

tools

VerifiedTrustedCommunity

One-sentence description of what this skill does and when to use it.

SKILL.mdUpdated May 22, 2026

balandongiv/template-skill-name