skills/tabular/ppo-gym-wrapper-kaggle-env/SKILL.md
Wrap a Kaggle competitive game environment as an OpenAI Gym env with a continuous action space for training PPO agents via stable-baselines3
npx skillsauth add wenmin-wu/ds-skills tabular-ppo-gym-wrapper-kaggle-env
Kaggle game AI competitions (Halite, Kore, Lux) run on the kaggle_environments package, which does not conform to the Gym API. Wrapping a game in a gym.Env subclass with defined observation and action spaces makes it trainable with stable-baselines3 PPO (or SAC, A2C). The wrapper handles state encoding, action translation, opponent management, and episode termination.
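Opponent management, one of the wrapper duties listed above, can be as simple as drawing a fresh opponent and a random seat on every reset. The sketch below is a minimal illustration under stated assumptions: sample_agents and do_nothing_agent are hypothetical helpers, and it relies on kaggle_environments accepting either a built-in agent name or a callable agent(obs, config) in the list passed to env.train(), with None marking the learning agent's slot.

```python
import random


def do_nothing_agent(obs, config):
    # Hypothetical opponent: an empty dict issues no commands this turn
    return {}


def sample_agents(opponent_pool, rng=None, randomize_seat=True):
    """Build the agent list for env.train(); None marks the learning agent.

    Draws one opponent from opponent_pool and, optionally, a random seat so
    the policy does not only ever play as player 0.
    Returns (agents, learner_seat).
    """
    if rng is None:
        rng = random.Random()
    opponent = rng.choice(opponent_pool)
    seat = rng.randrange(2) if randomize_seat else 0
    agents = [opponent, opponent]
    agents[seat] = None  # the trainer will act for this slot
    return agents, seat


agents, seat = sample_agents(["random", do_nothing_agent], random.Random(0))
```

Inside reset(), the result would replace the fixed list, e.g. agents, self.seat = sample_agents(pool) followed by self.trainer = self.env.train(agents). If the learner can sit in seat 1, the observation encoding has to read the board from that player's perspective.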
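```python
# Classic gym reset()/step() API, as used by stable-baselines3 1.x
# (stable-baselines3 >= 2.0 expects gymnasium instead)
import gym
from gym import spaces
import numpy as np
from kaggle_environments import make
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor


class KaggleGymEnv(gym.Env):
    def __init__(self, opponent="random"):
        super().__init__()
        self.env = make("kore_fleets", debug=True)
        self.opponent = opponent
        # Flattened 21x21 board with 4 feature planes, plus 3 scalar features
        self.observation_space = spaces.Box(-1, 1, shape=(21 * 21 * 4 + 3,), dtype=np.float32)
        # Continuous action vector, translated into game commands by _decode()
        self.action_space = spaces.Box(-1, 1, shape=(3,), dtype=np.float32)

    def reset(self):
        # None marks the learning agent's slot; the opponent fills the other one
        self.trainer = self.env.train([None, self.opponent])
        obs = self.trainer.reset()
        return self._encode(obs)

    def step(self, action):
        game_action = self._decode(action)
        obs, reward, done, info = self.trainer.step(game_action)
        # The game can report reward=None; PPO needs a float
        reward = float(reward) if reward is not None else 0.0
        return self._encode(obs), reward, done, info

    def _encode(self, obs):
        # Hypothetical placeholder: replace with a real board-to-vector encoding
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def _decode(self, action):
        # Hypothetical placeholder: an empty dict issues no commands this turn
        return {}


env = Monitor(KaggleGymEnv())
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```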
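The wrapper calls _encode() without a concrete encoding. One way to fill it in is to stack per-cell feature planes, append a few scalar features, and squash everything into the [-1, 1] range declared by observation_space. This is a minimal sketch under assumptions: encode_observation is a hypothetical helper, the input layout (four 21x21 numeric grids such as kore, shipyards, own fleets, enemy fleets, plus three scalars such as step and kore balances) is assumed, extracting those grids from the raw Kore observation is left out, and the scale constant is arbitrary.

```python
import numpy as np

BOARD = 21      # board side length, matching the 21*21*4+3 shape above
N_PLANES = 4    # assumed number of per-cell feature planes
N_SCALARS = 3   # assumed number of global scalar features


def encode_observation(planes, scalars, scale=500.0):
    """Flatten hypothetical feature planes + scalars into a float32 vector in [-1, 1].

    planes:  array-like of shape (N_PLANES, BOARD, BOARD)
    scalars: array-like of shape (N_SCALARS,)
    scale:   assumed normalization constant applied before tanh
    """
    planes = np.asarray(planes, dtype=np.float32)
    scalars = np.asarray(scalars, dtype=np.float32)
    if planes.shape != (N_PLANES, BOARD, BOARD):
        raise ValueError(f"expected planes of shape {(N_PLANES, BOARD, BOARD)}, got {planes.shape}")
    if scalars.shape != (N_SCALARS,):
        raise ValueError(f"expected {N_SCALARS} scalars, got shape {scalars.shape}")
    vec = np.concatenate([planes.ravel(), scalars]) / scale
    # tanh squashes every value into [-1, 1], matching observation_space
    return np.tanh(vec).astype(np.float32)


obs = encode_observation(np.zeros((4, 21, 21)), [0.0, 500.0, 500.0])
print(obs.shape, obs.dtype)  # (1767,) float32
```

Using tanh rather than a hard clip keeps large counts distinguishable up to saturation; a clip or per-feature min/max scaling would be equally valid choices.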
1. Subclass gym.Env and define the observation and action spaces.
2. In reset(), create a trainer via env.train([None, opponent]); None marks the learning agent's slot.
3. In step(), decode the continuous action into a game action and call trainer.step().
4. Wrap the env in Monitor for logging, then train with PPO or a similar algorithm.