skills/tabular/autoencoder-timeseries-embedding/SKILL.md
Train a PyTorch autoencoder on time-series summary statistics to produce dense encoded features for downstream GBDT models
npx skillsauth add wenmin-wu/ds-skills tabular-autoencoder-timeseries-embedding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When per-subject time-series data is too large or variable-length to feed directly into tabular models, compress the .describe() statistics through an autoencoder. The encoder's bottleneck produces a fixed-size dense embedding that captures non-linear relationships between summary stats. Works well when raw stats are high-dimensional (50-200 columns) and correlated.
```python
import torch
import torch.nn as nn
import pandas as pd
from sklearn.preprocessing import StandardScaler


class AutoEncoder(nn.Module):
    def __init__(self, input_dim, enc_dim):
        super().__init__()
        # Encoder maps input_dim -> enc_dim (the bottleneck embedding)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, enc_dim * 3), nn.ReLU(),
            nn.Linear(enc_dim * 3, enc_dim * 2), nn.ReLU(),
            nn.Linear(enc_dim * 2, enc_dim), nn.ReLU())
        # Linear output layer: StandardScaler targets are unbounded and often
        # negative, which a final Sigmoid (range 0..1) could not reconstruct
        self.decoder = nn.Sequential(
            nn.Linear(enc_dim, input_dim * 2), nn.ReLU(),
            nn.Linear(input_dim * 2, input_dim * 3), nn.ReLU(),
            nn.Linear(input_dim * 3, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))


def encode_features(df_stats, enc_dim=50, epochs=50, bs=32):
    # Missing stats become 0 before scaling to zero mean and unit variance
    scaler = StandardScaler()
    X = torch.FloatTensor(scaler.fit_transform(df_stats.fillna(0)))
    ae = AutoEncoder(X.shape[1], enc_dim)
    opt = torch.optim.Adam(ae.parameters())
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for i in range(0, len(X), bs):
            batch = X[i:i+bs]
            # Reconstruction loss: the input is its own target
            loss = loss_fn(ae(batch), batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Only the encoder is needed to produce the embedding
    with torch.no_grad():
        enc = ae.encoder(X).numpy()
    # Keep the input index so rows can be joined back to their subjects
    return pd.DataFrame(enc, columns=[f'enc_{i}' for i in range(enc_dim)],
                        index=df_stats.index)
```
1. df.describe() on each subject's time-series → flatten into one row per subject
2. StandardScaler to normalize
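The first step, one flattened row of `describe()` statistics per subject, can be sketched with a grouped `describe()`. This is a minimal illustration under assumptions: the helper `subject_stats`, the column names `subject`, `hr`, and `temp`, and the toy data are all hypothetical, and the source may instead read one file per subject.

```python
import pandas as pd


def subject_stats(df, subject_col, signal_cols):
    """Return one row of flattened describe() statistics per subject."""
    # Grouped describe() yields MultiIndex columns of (signal, statistic)
    stats = df.groupby(subject_col)[signal_cols].describe()
    # Flatten to single-level names such as "hr_mean" or "temp_50%"
    stats.columns = [f"{col}_{stat}" for col, stat in stats.columns]
    return stats


# Toy long-format data: subjects have different series lengths
ts = pd.DataFrame({
    "subject": ["a", "a", "a", "b", "b"],
    "hr": [1.0, 2.0, 3.0, 10.0, 20.0],
    "temp": [36.5, 36.6, 36.7, 37.0, 37.2],
})

df_stats = subject_stats(ts, "subject", ["hr", "temp"])
```

Each signal contributes eight columns (count, mean, std, min, the three quartiles, max), so the width of `df_stats` is fixed no matter how long each subject's series is. The result has the shape that `encode_features` expects as `df_stats`.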