skills/llm/adaptive-time-budget-inference/SKILL.md
Dynamically reduce max_tokens and batch size as wall-clock time approaches a cutoff to ensure all inputs get processed
npx skillsauth add wenmin-wu/ds-skills llm-adaptive-time-budget-inference

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
In time-limited settings (competitions, APIs with timeouts), LLM inference must trade quality (more tokens, more samples) against the risk of running out of time before every input is processed. Monitor wall-clock time and progressively reduce max_tokens and batch size as the deadline approaches. Precompute a cutoff schedule so that each check is a simple comparison rather than a fresh calculation.
```python
import time

MAX_TOKENS = 16384
MAX_BATCH = 16
TIME_LIMIT = 4 * 60 * 60 + 45 * 60  # 4h45m

# Placeholder fallback returned once the deadline has passed; the original
# snippet leaves this undefined, so pick a value that suits the task.
default_answer = None

start_time = time.time()
cutoff_time = start_time + TIME_LIMIT


def get_params():
    """Return (max_tokens, batch_size) for the current elapsed fraction."""
    elapsed_frac = (time.time() - start_time) / TIME_LIMIT
    if elapsed_frac > 0.9:
        return MAX_TOKENS // 3, MAX_BATCH // 3  # emergency mode
    elif elapsed_frac > 0.7:
        return 2 * MAX_TOKENS // 3, 2 * MAX_BATCH // 3  # reduced
    return MAX_TOKENS, MAX_BATCH  # full budget


def predict(question):
    if time.time() > cutoff_time:
        return default_answer  # past the deadline: skip generation entirely
    max_tokens, batch_size = get_params()
    # ... generate with adjusted params ...
```
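The overview mentions precomputing a cutoff schedule, while the snippet above recomputes the elapsed fraction on every call. One way that precomputation might look is sketched below, reusing the same 70% and 90% thresholds. The names `TIERS`, `build_schedule`, and `params_at` are hypothetical and not part of the original skill; the thresholds are converted to absolute timestamps once, so each later check is a lookup against a sorted list.

```python
import time
from bisect import bisect_left

MAX_TOKENS = 16384
MAX_BATCH = 16
TIME_LIMIT = 4 * 60 * 60 + 45 * 60  # 4h45m, same budget as the snippet above

# Hypothetical tier table: (elapsed fraction where the tier starts,
# max_tokens, batch_size). Values mirror the thresholds used above.
TIERS = [
    (0.0, MAX_TOKENS, MAX_BATCH),                    # full budget
    (0.7, 2 * MAX_TOKENS // 3, 2 * MAX_BATCH // 3),  # reduced
    (0.9, MAX_TOKENS // 3, MAX_BATCH // 3),          # emergency mode
]


def build_schedule(start, limit=TIME_LIMIT, tiers=TIERS):
    """Convert fractional thresholds into absolute timestamps, once."""
    cutoffs = [start + frac * limit for frac, _, _ in tiers]
    params = [(tokens, batch) for _, tokens, batch in tiers]
    return cutoffs, params


def params_at(now, cutoffs, params):
    """Return (max_tokens, batch_size) for the last tier strictly before `now`."""
    # bisect_left keeps the strict ">" comparison used in get_params().
    idx = bisect_left(cutoffs, now) - 1
    return params[max(idx, 0)]


if __name__ == "__main__":
    cutoffs, params = build_schedule(time.time())
    print(params_at(time.time(), cutoffs, params))
```

Under these assumptions, `build_schedule` would be called once at startup and `params_at(time.time(), cutoffs, params)` before each generation call. Adding a tier then means adding a row to `TIERS` rather than another `elif` branch.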