skills/tabular/time-varying-reward-shaping/SKILL.md
Shape RL rewards with time-decaying asset weights and time-increasing resource weights so the agent transitions from expansion to accumulation as the game progresses
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add wenmin-wu/ds-skills tabular-time-varying-reward-shaping`
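In resource-management games, the optimal strategy shifts over time: early in the game, building fleets and structures (assets) pays off, while late in the game, hoarding resources does. Time-varying reward shaping encodes this by linearly interpolating the asset and resource weights as a function of turn number. The asset weight falls from 1 to 0 and the resource weight rises from 0 to 1 until a transition turn, after which both stay fixed. A signed terminal bonus is added on top of this. It grows with the number of turns remaining, so shorter games produce a larger win bonus (and a larger loss penalty).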
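Written out as formulas, the arrays and functions in the code below compute the following. Here $T$ = `max_steps`, $\tau$ = `transition_point`, and $t$ is the 0-based turn. The symbols $K_t$ (kore), $N^{\text{ship}}_t$, $N^{\text{yard}}_t$, $c_{\text{ship}} = 10$, $c_{\text{yard}} = 50$ and $s$ ($+1$ for a win, $-1$ otherwise) are notation introduced here, not names from the original:

$$
w_{\text{assets}}(t) =
\begin{cases}
1 - \dfrac{t}{\tau - 1}, & 0 \le t < \tau \\
0, & \tau \le t < T
\end{cases}
\qquad
w_{\text{res}}(t) = 1 - w_{\text{assets}}(t)
$$

$$
V_t = w_{\text{res}}(t)\,K_t + w_{\text{assets}}(t)\,\bigl(c_{\text{ship}} N^{\text{ship}}_t + c_{\text{yard}} N^{\text{yard}}_t\bigr)
$$

$$
r_t = V_t - V_{t-1} + \mathbb{1}[\text{done}]\; s\,\bigl(100 + 5\,(T - t)\bigr)
$$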
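```python
import numpy as np

max_steps = 400         # episode length in turns
transition_point = 300  # turn by which the weights finish shifting

# Asset weight decays linearly 1 -> 0 over the first transition_point turns, then stays 0.
w_assets = np.concatenate([
    np.linspace(1.0, 0.0, transition_point),
    np.zeros(max_steps - transition_point),
])
# Resource weight rises linearly 0 -> 1 over the same window, then stays 1.
w_resources = np.concatenate([
    np.linspace(0.0, 1.0, transition_point),
    np.ones(max_steps - transition_point),
])

def board_value(player, turn, ship_cost=10, yard_cost=50):
    # player must expose kore, total_ships and total_shipyards.
    # Clamp the index: turn - 1 == -1 on the first step would otherwise
    # wrap around to the last weight, and turn == max_steps would overflow.
    t = min(max(turn, 0), max_steps - 1)
    val_resources = w_resources[t] * player.kore
    val_ships = w_assets[t] * ship_cost * player.total_ships
    val_yards = w_assets[t] * yard_cost * player.total_shipyards
    return val_resources + val_ships + val_yards

def reward(board, prev_board, turn, done, won):
    # Shaping term: change in time-weighted board value since the previous turn.
    r = board_value(board.current_player, turn) - board_value(prev_board.current_player, turn - 1)
    if done:
        # Terminal bonus: +/-(100 + 5 * remaining turns), so faster wins pay more.
        bonus = (1 if won else -1) * (100 + 5 * (max_steps - turn))
        r += bonus
    return r
```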
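Below is a dependency-free sketch, assuming the same constants as above. It replaces the precomputed NumPy arrays with a closed-form asset weight, clamped to the valid turn range, and uses it to isolate the effect of the weight drift alone. The names `asset_weight` and `drift_reward` are hypothetical, and the `SimpleNamespace` player is a stand-in exposing only the three attributes `board_value` reads. It is not a real game API object.

```python
from types import SimpleNamespace

MAX_STEPS = 400         # same values as max_steps / transition_point above
TRANSITION_POINT = 300

def asset_weight(turn: int) -> float:
    """Closed form of np.linspace(1, 0, TRANSITION_POINT) followed by zeros.

    The turn is clamped to [0, MAX_STEPS - 1], so turn - 1 on the first
    step never wraps around to the end of the schedule.
    """
    t = min(max(turn, 0), MAX_STEPS - 1)
    if t >= TRANSITION_POINT:
        return 0.0
    return 1.0 - t / (TRANSITION_POINT - 1)

def board_value(player, turn: int, ship_cost: float = 10, yard_cost: float = 50) -> float:
    """Time-weighted value; the resource weight is the complement of the asset weight."""
    w_assets = asset_weight(turn)
    w_resources = 1.0 - w_assets
    assets = ship_cost * player.total_ships + yard_cost * player.total_shipyards
    return w_resources * player.kore + w_assets * assets

def drift_reward(player, turn: int) -> float:
    """Shaping term when the player's state is identical at turn - 1 and turn."""
    return board_value(player, turn) - board_value(player, turn - 1)

# Hypothetical idle player: 200 kore, asset value 10*40 + 50*2 = 500.
idle = SimpleNamespace(kore=200.0, total_ships=40, total_shipyards=2)

for turn in (1, 150, 299, 300, 350):
    print(turn, round(drift_reward(idle, turn), 4))
```

For this idle player, the drift term is about -1.0033 (exactly -300/299) per turn through turn 299 and 0.0 from turn 300 on. Value held in assets rather than kore therefore loses weight steadily during the transition window.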
A terminal bonus that includes a remaining_turns * constant term encourages decisive play over stalling.
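To make the terminal bonus concrete, here is a small sketch with the constants from `reward` (base 100, 5 per remaining turn, `max_steps = 400`). The bonus has been factored into a helper for illustration only. `terminal_bonus` is a hypothetical name.

```python
MAX_STEPS = 400  # same as max_steps above

def terminal_bonus(turn: int, won: bool, base: float = 100.0, per_turn: float = 5.0) -> float:
    """Signed end-of-game bonus: +/-(base + per_turn * remaining turns)."""
    remaining = MAX_STEPS - turn
    return (1 if won else -1) * (base + per_turn * remaining)

for turn in (100, 250, 399):
    print(turn, terminal_bonus(turn, won=True), terminal_bonus(turn, won=False))
# 100 1600.0 -1600.0
# 250 850.0 -850.0
# 399 105.0 -105.0
```

A win at turn 100 is worth about 15 times a win at turn 399. Because the loss penalty also shrinks as the game lengthens, an agent that expects to lose is mildly rewarded for prolonging the game. This may be worth keeping in mind when tuning the constants.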