skills/hyperparameter-search-strategy/exports/openai/SKILL.md
Choose efficient hyperparameter search strategies for finding optimal parameter sets or parameter pairs, favoring random search, Bayesian optimization, successive halving, evolutionary methods, or population-based training over brute-force grids. Use when an experiment, detector, or training pipeline must tune parameters under compute and evaluation-cost constraints.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add balandongiv/agent-skillbook hyperparameter-search-strategy
Use this skill when the core problem is not "run the experiment" but "decide how to search for a better parameter setting." The main responsibility of this skill is to choose a search strategy that matches the shape of the search space and the cost of evaluation. Do not default to brute-force grid search unless the space is genuinely tiny and exhaustive enumeration is clearly cheaper and simpler than anything smarter.
Treat search strategy as an engineering decision with tradeoffs. You must define the objective, the budget, and the evaluation protocol before recommending a method. A good answer explains why the chosen search method fits the detector or training loop, what compute budget it assumes, what evidence will count as improvement, and how the rationale will be recorded in experiment metadata.
Identify the metric to optimize, the evaluation cost per trial, the acceptable wall-clock budget, and any resource limits. A search method is only appropriate relative to these constraints.
Brute-force grid search wastes trials in large or mixed spaces because it evaluates many unimportant combinations. Use it only when the total number of combinations is truly small and exhaustive search is obviously the cheapest correct option.
Choose the method based on the shape of the search space, the cost of each evaluation, and the available budget.
Always record the chosen strategy, search space, budget, seeds, stopping rule, and rationale in experiment metadata or the run note.
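As a minimal sketch of such a record, the snippet below stores the six items named above in a JSON-serializable dictionary and checks that none is missing. The field names, values, and the `missing_fields` helper are illustrative assumptions, not a schema this skill prescribes.

```python
import json

# Hypothetical record; every field name and value here is illustrative.
search_record = {
    "strategy": "random_search",
    "search_space": {
        "learning_rate": {"type": "log_uniform", "low": 1e-5, "high": 1e-1},
        "batch_size": {"type": "choice", "values": [16, 32, 64]},
    },
    "budget": {"max_trials": 40, "max_wall_clock_hours": 6},
    "seeds": [0, 1, 2],
    "stopping_rule": "stop after max_trials or when the wall-clock budget is spent",
    "rationale": "Mixed space; only a few parameters are expected to matter.",
}

REQUIRED_FIELDS = (
    "strategy", "search_space", "budget", "seeds", "stopping_rule", "rationale",
)

def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]

# Serialize so the record can be stored next to the run outputs.
serialized = json.dumps(search_record, indent=2, sort_keys=True)
```

Calling `missing_fields` before a search starts is one way to catch an incomplete run note early.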
Use validation data, cross-validation, or another tuning-safe protocol during the search. Keep the final test or held-out evaluation out of the tuning loop.
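One simple way to keep the test set out of the tuning loop is to fix the split once, before any search starts. The sketch below assumes a plain index-based split; the function name and the fractions are hypothetical, and cross-validation would replace the single validation fold.

```python
import random

def split_indices(n_items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle item indices once and cut them into train, validation, and test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split stable
    n_test = int(n_items * test_fraction)
    n_val = int(n_items * val_fraction)
    test = indices[:n_test]                 # touched only for the final evaluation
    val = indices[n_test:n_test + n_val]    # used to score every trial
    train = indices[n_test + n_val:]        # used to fit every trial
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
```

Every trial in the search would then fit on `train_idx` and be scored on `val_idx`, with `test_idx` read only after the search has finished.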
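Prefer random search when the space is large, mixed, or only a few parameters matter strongly. For the same budget, random search usually beats a coarse grid because every trial draws a fresh value for each parameter, so the few parameters that matter are probed at many more distinct values than a grid would allow.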
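A minimal random-search sketch over a mixed space might look like the following. The parameter names, ranges, and `toy_evaluate` are assumptions made for illustration; a real search would call the pipeline's own validation metric.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a hypothetical mixed search space."""
    return {
        # Log-uniform draw: learning rates are usually searched on a log scale.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "dropout": rng.uniform(0.0, 0.5),
        "optimizer": rng.choice(["sgd", "adam"]),
    }

def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations; return the best (score, config)."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_score, best_config = -math.inf, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

def toy_evaluate(config):
    """Stand-in for a validation metric; peaks near learning_rate = 1e-3."""
    return -abs(math.log10(config["learning_rate"]) + 3) - config["dropout"]

best_score, best_config = random_search(toy_evaluate, n_trials=30, seed=0)
```

With 30 trials this probes 30 distinct learning rates, whereas a 30-point grid over three parameters would probe only a handful of values per parameter.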
Prefer Bayesian optimization when:
Bayesian optimization is usually a strong choice when you expect relatively few trials and want each next trial to use information from earlier ones.
Prefer successive halving or Hyperband when:
This is often the best choice for model-training workflows where poor candidates can be stopped after a few epochs or batches.
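The early-stopping idea can be sketched as a bare successive-halving loop. This is a simplified illustration, not Hyperband: it assumes `evaluate(config, budget)` can score a candidate at a given budget such as a number of epochs, and it re-scores survivors from scratch each round, whereas a real training loop would usually resume from a checkpoint. The candidate values and `toy_evaluate` are hypothetical.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly drop the worst candidates while growing the per-trial budget.

    evaluate(config, budget) must return a score where higher is better.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top 1/eta fraction, but always at least one candidate.
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta  # survivors earn a larger budget in the next round
    return survivors[0]

def toy_evaluate(config, budget):
    """Stand-in metric: improves with budget and peaks near lr = 1e-3."""
    return (1 - 0.5 ** budget) - abs(math.log10(config["lr"]) + 3)

exponents = (-5, -4.5, -4, -3.5, -3.2, -2.5, -2, -1)
candidates = [{"lr": 10.0 ** e} for e in exponents]
winner = successive_halving(candidates, toy_evaluate)
```

With eight candidates and `eta=2`, the rounds evaluate 8, 4, and 2 candidates at budgets 1, 2, and 4, so most of the compute goes to the candidates that looked best early.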
Prefer evolutionary algorithms when:
Evolutionary methods are especially useful when candidate representations are awkward for grid or Bayesian approaches.
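As an illustration of an awkward representation, the sketch below evolves a variable-length list of layer widths, which does not map cleanly onto a fixed grid. The widths, the mutation operators, the keep-the-best-half selection rule, and `toy_evaluate` are all assumptions chosen to keep the example short.

```python
import random

LAYER_CHOICES = [16, 32, 64, 128]  # hypothetical layer widths

def mutate(layers, rng):
    """Return a copy of the layer list with one random edit applied."""
    child = list(layers)
    action = rng.choice(["add", "remove", "change"])
    if action == "add" and len(child) < 4:
        child.insert(rng.randrange(len(child) + 1), rng.choice(LAYER_CHOICES))
    elif action == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        # Fallback: change the width of one existing layer.
        child[rng.randrange(len(child))] = rng.choice(LAYER_CHOICES)
    return child

def evolve(evaluate, generations=20, population_size=8, seed=0):
    """Keep the best half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.choice(LAYER_CHOICES)] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        parents = population[: population_size // 2]
        n_children = population_size - len(parents)
        children = [mutate(rng.choice(parents), rng) for _ in range(n_children)]
        population = parents + children
    return max(population, key=evaluate)

def toy_evaluate(layers):
    """Stand-in score: prefers three layers with a total width near 160."""
    return -abs(len(layers) - 3) - abs(sum(layers) - 160) / 100

best_layers = evolve(toy_evaluate)
```

Because the parents survive unchanged, the best score in the population never gets worse from one generation to the next.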
Use population-based training only when the model or detector supports iterative training and can adapt hyperparameters during the run. Do not recommend it for one-shot evaluators or static scoring pipelines.
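The requirement for iterative training is easier to see in a toy version of the loop. In the sketch below each worker carries both a model state and a learning rate; after every round the weaker half copies the state of the stronger half (exploit) and perturbs the copied learning rate (explore). The quadratic objective, the perturbation factors, and all names are assumptions for illustration only.

```python
import random

def train_step(state, lr):
    """Toy training step: gradient descent on f(x) = x**2 (gradient is 2x)."""
    return state - lr * 2 * state

def loss(worker):
    return worker["state"] ** 2

def pbt(population_size=6, rounds=10, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Each worker carries its own model state and its own hyperparameter.
    workers = [
        {"state": 10.0, "lr": 10 ** rng.uniform(-3, -1)}
        for _ in range(population_size)
    ]
    for _round in range(rounds):
        for worker in workers:
            for _step in range(steps_per_round):
                worker["state"] = train_step(worker["state"], worker["lr"])
        workers.sort(key=loss)  # lower loss first
        half = population_size // 2
        for loser, winner in zip(workers[half:], workers[:half]):
            loser["state"] = winner["state"]                     # exploit
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore
    return min(workers, key=loss)

best_worker = pbt()
```

A one-shot evaluator has no `state` to copy between workers, which is why the exploit step, and therefore the method, does not apply to it.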
Exhaustive enumeration is acceptable when the space is small, discrete, and cheap enough that smarter optimization adds complexity without savings. State the combination count and explain why enumeration is cheaper.
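Stating the combination count is a one-line calculation. The sketch below multiplies the number of values per parameter and compares the total cost with a time budget; the grids, the per-trial cost, and the helper names are hypothetical.

```python
import math

def grid_size(grid):
    """Number of combinations in a full grid over the given value lists."""
    return math.prod(len(values) for values in grid.values())

def enumeration_fits(grid, seconds_per_trial, budget_seconds):
    """True when evaluating every combination stays within the time budget."""
    return grid_size(grid) * seconds_per_trial <= budget_seconds

small_grid = {"threshold": [0.3, 0.5, 0.7], "window": [5, 10]}
large_grid = {name: list(range(10)) for name in ("a", "b", "c", "d")}
```

At an assumed 30 seconds per trial and a one-hour budget, `small_grid` (6 combinations, 180 seconds) fits, while `large_grid` (10,000 combinations) does not.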
Write down the objective, the budget, and the evaluation protocol.
Determine whether the space is:
Select the method that best matches the space and budget.
Explain the choice plainly.
Hold constant what should stay constant across trials. Specify seeds, evaluation splits, resource caps, and any repeated runs needed to reduce noise. Compare candidates under the same rules.
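One way to compare candidates under the same rules is to score each one on the same fixed seeds and summarize the repeats. The sketch below assumes a noisy metric simulated by `noisy_evaluate`; the seeds, the noise level, and the `quality` field are illustrative stand-ins for a real evaluation.

```python
import random
import statistics

SEEDS = (0, 1, 2, 3, 4)  # the same seeds are reused for every candidate

def noisy_evaluate(config, seed):
    """Toy stand-in for a noisy validation metric."""
    rng = random.Random(seed)
    return config["quality"] + rng.gauss(0, 0.05)

def summarize(config, seeds=SEEDS):
    """Score one candidate on every seed and report mean, spread, and count."""
    scores = [noisy_evaluate(config, seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "n": len(scores),
    }

candidate_a = {"quality": 0.80}
candidate_b = {"quality": 0.82}
summary_a = summarize(candidate_a)
summary_b = summarize(candidate_b)
```

Reporting the spread alongside the mean makes it visible when a difference between two candidates is smaller than the run-to-run noise.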
For every search, record at minimum the chosen strategy, search space, budget, seeds, stopping rule, and rationale.
Do not call a configuration "optimal" until it survives a sensible confirmation step.
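What counts as a sensible confirmation step depends on the project. As one possible sketch, the check below re-scores the selected configuration and a baseline on fresh repeats and accepts the candidate only if its mean gain exceeds the standard error of the difference. The threshold rule and the function name are assumptions, not a rule stated by this skill.

```python
import statistics

def confirm(candidate_scores, baseline_scores, min_gain=0.0):
    """Return True when the candidate's mean gain over the baseline exceeds
    min_gain plus the standard error of the difference (an assumed threshold)."""
    gain = statistics.mean(candidate_scores) - statistics.mean(baseline_scores)
    standard_error = (
        statistics.variance(candidate_scores) / len(candidate_scores)
        + statistics.variance(baseline_scores) / len(baseline_scores)
    ) ** 0.5
    return gain > min_gain + standard_error

# Hypothetical confirmation scores from fresh seeds.
clear_win = confirm([0.84, 0.85, 0.86], [0.80, 0.81, 0.79])
within_noise = confirm([0.80, 0.90, 0.70], [0.79, 0.81, 0.80])
```

In this toy data the first comparison passes and the second does not, because the second candidate's scores vary far more than its average gain.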