skills/real-data-validation-promotion/exports/openai/SKILL.md
Validate data or ML pipelines on the smallest real dataset scope first, then promote to staged batches and full sweeps with the same code path, editable local dependencies, artifact checks, and honest residual-risk reporting.
npx skillsauth add balandongiv/agent-skillbook real-data-validation-promotion
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when a user cares about whether a pipeline, feature extractor, labeling stage, experiment loop, or reporting path truly works on the real dataset. The point is to validate the real behavior, not only the abstract logic. Synthetic data is still valuable for unit tests, but it does not prove that real folder layouts, annotations, caches, editable dependencies, or naming contracts behave correctly.
If the user wants to know whether a stage or full pipeline works, start from a real dataset scope whenever one is available and safe to use. Keep synthetic data for unit tests, isolated regression tests, and situations where no real sample can be run.
Pick the smallest real scope that still exercises the true code path. Usually that means:
Do not jump to the full sweep until the smallest real scope is clean.
The smoke run should use the same code path, dependency path, config family, artifact naming, and cache logic as the broader run. Narrow the scope, not the logic.
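A minimal sketch of "narrow the scope, not the logic", assuming a pipeline that processes one subject at a time. The names `RunScope`, `select_subjects`, and `run_pipeline` are hypothetical and only illustrate the idea: the smoke run and the full run call the same entrypoint and differ only in which subjects are selected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RunScope:
    """Hypothetical scope selector: it chooses subjects and nothing else."""
    name: str
    subject_ids: Optional[Tuple[str, ...]] = None  # None means "all subjects"


def select_subjects(all_subjects: Iterable[str], scope: RunScope) -> List[str]:
    """Return the subjects covered by the scope, preserving input order."""
    if scope.subject_ids is None:
        return list(all_subjects)
    wanted = set(scope.subject_ids)
    return [s for s in all_subjects if s in wanted]


def run_pipeline(
    all_subjects: Iterable[str],
    scope: RunScope,
    process_one: Callable[[str], Any],
) -> Dict[str, Any]:
    """Single entrypoint shared by smoke, staged, and full runs."""
    return {s: process_one(s) for s in select_subjects(all_subjects, scope)}


# Illustrative scopes; the subject ID is made up.
SMOKE = RunScope("smoke", ("S01",))
FULL = RunScope("full")
```

Because the scope object carries no flags that change behavior, there is no separate "debug branch" that could pass while the production branch fails.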
If the runtime uses a package installed in editable mode from a local repo, that package is part of the code under test. If the failure originates there, debug and patch it there, then rerun the real-data smoke scope before promoting.
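One way to find out whether a dependency is an editable local install is to read the `direct_url.json` metadata that installers following PEP 610 record. This is a sketch under that assumption: older or non-standard installs may not write the file, in which case the helper reports nothing and a manual check is still needed. The function names are hypothetical.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_direct_url(direct_url_text: Optional[str]) -> bool:
    """True if PEP 610 direct_url.json content marks an editable install."""
    if not direct_url_text:
        return False
    info = json.loads(direct_url_text)
    return bool(info.get("dir_info", {}).get("editable", False))


def editable_source(dist_name: str) -> Optional[str]:
    """Return the recorded source URL for an editable install, else None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the metadata file is absent.
    text = dist.read_text("direct_url.json")
    if not is_editable_direct_url(text):
        return None
    return json.loads(text).get("url")
```

If `editable_source` returns a local path for a package, that repository belongs to the code under test and its state should be recorded alongside the validation run.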
A passing run should be justified by real outputs:
If a stage benefits from both human-readable summary EDA and a large plot gallery, keep them separate. Summary artifacts should stay fast to open and easy to inspect; heavy galleries can exist alongside them without making the main validation artifact unwieldy.
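Justifying a pass by real outputs can start with something as simple as confirming that the expected files exist and are not empty. The sketch below assumes the run writes its artifacts into one output directory; the function name and the artifact names in the usage note are hypothetical. It checks presence and size only, not content.

```python
from pathlib import Path
from typing import Dict, Iterable, Union


def check_artifacts(
    output_dir: Union[str, Path],
    expected_names: Iterable[str],
    min_bytes: int = 1,
) -> Dict[str, str]:
    """Classify each expected artifact as 'ok', 'empty', or 'missing'."""
    results: Dict[str, str] = {}
    for name in expected_names:
        path = Path(output_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif path.stat().st_size < min_bytes:
            results[name] = "empty"
        else:
            results[name] = "ok"
    return results
```

A call such as `check_artifacts("outputs/smoke", ["features.csv", "summary.json"])` returns one status per file, so a run that exits with code 0 but writes nothing is still flagged.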
A pipeline can be runnable and still not be publication-ready or fully cleared. Record what passed, what was skipped, what failed, and what remains scientifically or operationally unresolved.
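The four categories above can be kept apart with a small record type. This is an illustrative sketch, not a prescribed format: `ValidationReport` and its status labels are hypothetical, and the point is only that a run with skipped or unresolved items is never reported as fully cleared.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationReport:
    """Hypothetical record of what a validation run did and did not establish."""
    passed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    unresolved: List[str] = field(default_factory=list)

    def status(self) -> str:
        """Summarize honestly: failures first, then open items, then cleared."""
        if self.failed:
            return "failed"
        if self.skipped or self.unresolved:
            return "runnable, not fully cleared"
        return "cleared"
```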
Identify:
Record the exact entrypoint, key parameters, config files, and editable local dependencies used by the validation run. If a debug helper exists, make sure it enters the same production functions in the same order.
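Recording the run can be as light as writing a small manifest next to the outputs. The sketch below assumes config files are ordinary files on disk and hashes them so that a later run can be compared against the same configuration. The function names and manifest keys are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Any, Dict, Iterable, Mapping, Union


def file_sha256(path: Union[str, Path]) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(
    entrypoint: str,
    params: Mapping[str, Any],
    config_files: Iterable[Union[str, Path]],
    editable_deps: Mapping[str, str],
) -> Dict[str, Any]:
    """Collect what is needed to say exactly which run was validated."""
    return {
        "entrypoint": entrypoint,
        "params": dict(params),
        "configs": {str(p): file_sha256(p) for p in config_files},
        "editable_deps": dict(editable_deps),  # package name -> local source
        "python": sys.version.split()[0],
    }
```

The returned dictionary contains only JSON-serializable values as long as `params` does, so it can be dumped with `json.dump` and stored with the artifacts.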
Prefer:
jobs=1 when step-through inspection is helpful
Do not stop at exit code. Check:
If the issue is in:
Then rerun the same smallest real-data smoke scope.
Only after the smoke scope is clean, promote to:
Keep the code path stable across promotions.
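Promotion can be expressed as an ordered list of scopes fed to one callable, stopping at the first scope that is not clean. This is a sketch with hypothetical names: `promote` is not part of any library, and `run_stage` stands in for whatever function runs the pipeline and decides whether the result is clean.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def promote(
    stages: Sequence[Tuple[str, Any]],
    run_stage: Callable[[Any], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order with the same callable.

    Returns the names of clean stages and the name of the first
    stage that was not clean (None if every stage was clean).
    """
    completed: List[str] = []
    for name, scope in stages:
        if not run_stage(scope):
            return completed, name  # later stages are never started
        completed.append(name)
    return completed, None
```

Since every stage goes through the same `run_stage`, a fix made after a failed stage is exercised again from the smallest scope rather than only at the scale where it broke.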
Close with a concise but honest summary: