skills/nlp/simulated-annealing-query-optimization/SKILL.md
Uses simulated annealing to search for a high-scoring subset of candidate terms for a boolean search query, maximizing a retrieval metric such as AP@K.
npx skillsauth add wenmin-wu/ds-skills nlp-simulated-annealing-query-optimization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Boolean search queries (used in patent search, legal discovery, and academic retrieval) require choosing which terms to include from a candidate set. The search space is combinatorial: N candidate terms give 2^N possible subsets, so exhaustive evaluation is impractical for all but small N. Simulated annealing explores this space by flipping the inclusion bit of one random term at a time, always accepting improvements and accepting worse solutions with a probability that shrinks as the temperature falls. The energy function scores each candidate query against a full-text index (for example Whoosh or Elasticsearch), typically as the negative of AP@K or recall@K, since the annealer minimizes energy.
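The acceptance rule described above can be sketched in plain Python. This is a minimal illustration of the standard Metropolis criterion with an exponential cooling schedule, not the internals of any particular library; the function names `temperature`, `accept`, and `flip_one_bit` are hypothetical, and the exponential schedule is an assumption (one common choice among several).

```python
import math
import random
from typing import List


def temperature(step: int, steps: int, t_max: float = 1.0, t_min: float = 0.001) -> float:
    """Exponential cooling: t_max at step 0, t_min at step == steps (assumed schedule)."""
    return t_max * math.exp(-math.log(t_max / t_min) * step / steps)


def accept(delta_e: float, temp: float, rng: random.Random) -> bool:
    """Metropolis rule: always accept improvements, sometimes accept regressions."""
    if delta_e <= 0:
        return True
    # A worse solution is accepted with probability exp(-delta_e / temp),
    # which shrinks as the temperature falls.
    return rng.random() < math.exp(-delta_e / temp)


def flip_one_bit(state: List[int], rng: random.Random) -> List[int]:
    """Return a copy of the 0/1 inclusion vector with one random bit flipped."""
    new_state = list(state)
    idx = rng.randrange(len(new_state))
    new_state[idx] = 1 - new_state[idx]
    return new_state
```

With an AP-based energy in [-1, 0], a move that loses 0.1 AP is accepted with probability exp(-0.1), roughly 0.90, at a temperature of 1.0, and essentially never at 0.001. This is why the search wanders freely early on and becomes nearly greedy by the end.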
import numpy as np
from simanneal import Annealer


class QueryOptimizer(Annealer):
    def __init__(self, candidates, searcher, targets, **kwargs):
        self.candidates = candidates  # list of search terms
        self.searcher = searcher      # full-text index searcher
        self.targets = targets        # ground-truth document IDs
        # Start from a random subset: each term is included with probability 0.5.
        initial_state = np.random.binomial(1, 0.5, len(candidates))
        super().__init__(initial_state, **kwargs)

    def move(self):
        """Flip one random term's inclusion bit."""
        idx = np.random.randint(len(self.state))
        self.state[idx] = 1 - self.state[idx]

    def energy(self):
        """Negative AP@50; the annealer minimizes energy, so lower is better."""
        active = [t for t, use in zip(self.candidates, self.state) if use]
        if not active:
            return 0.0  # empty query: AP is in [0, 1], so 0.0 is the worst energy
        query = " OR ".join(active)
        # Assumes the searcher accepts a query string and returns ranked results;
        # some backends need the string parsed into a query object first.
        results = self.searcher.search(query, limit=50)
        # average_precision is a helper that is not defined in this snippet.
        return -average_precision(results, self.targets, k=50)


optimizer = QueryOptimizer(candidates, searcher, targets)
optimizer.steps = 2000   # number of moves to try
optimizer.Tmax = 1.0     # starting temperature
optimizer.Tmin = 0.001   # final temperature
best_state, best_energy = optimizer.anneal()
best_terms = [t for t, u in zip(candidates, best_state) if u]
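The `energy` method calls `average_precision`, which the snippet does not define. A minimal sketch follows, under two assumptions that are not in the original: `results` is a ranked sequence of document IDs (a real searcher may return hit objects that must be mapped to IDs first), and the score is normalized by `min(k, number of relevant documents)`, which is one common AP@K convention among several.

```python
from typing import Hashable, Iterable, Sequence


def average_precision(
    results: Sequence[Hashable],
    targets: Iterable[Hashable],
    k: int = 50,
) -> float:
    """AP@k for a ranked list of document IDs against the relevant IDs."""
    relevant = set(targets)
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, doc_id in enumerate(results[:k], start=1):
        # Count each relevant document once, at its first (best) rank.
        if doc_id in relevant and doc_id not in seen:
            seen.add(doc_id)
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumed normalization: the best achievable number of hits within k.
    return precision_sum / min(k, len(relevant))
```

For example, the ranking `["a", "x", "b"]` against the relevant set `{"a", "b"}` scores (1/1 + 2/3) / 2, about 0.83. Whichever normalization is chosen, it should stay fixed for the whole run so that energies from different steps remain comparable.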