skills/cv/chunked-csv-image-generator/SKILL.md
Memory-efficient Keras generator that streams sharded CSV files in chunks, renders strokes to images on the fly, and yields batches for training on datasets too large to fit in memory.
npx skillsauth add wenmin-wu/ds-skills cv-chunked-csv-image-generator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When training on millions of images stored as raw data in CSV files (stroke coordinates, pixel arrays), loading everything into memory is infeasible. A streaming generator reads sharded CSVs in chunks, converts each chunk to images on the fly, and yields (X, y) batches. Permuting the shard order on every pass gives coarse, shard-level shuffling without random access to the full dataset; rows within a shard are still read in file order.
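How the shard files are produced is not described here. As one possible approach, the sketch below splits a DataFrame round-robin into gzip-compressed CSVs. The helper name `write_shards`, the round-robin assignment, and the `out_dir` parameter are assumptions for illustration; only the `train_shard_{k}.csv.gz` naming and the `drawing` and `label` columns are taken from the generator code.

```python
import os

import pandas as pd


def write_shards(df, n_shards, out_dir="."):
    """Hypothetical helper: split df round-robin into n_shards gzip CSV files.

    Files are named train_shard_{k}.csv.gz to match what the generator reads.
    Returns the list of written paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for k in range(n_shards):
        # Rows k, k + n_shards, k + 2 * n_shards, ... go to shard k.
        shard = df.iloc[k::n_shards]
        path = os.path.join(out_dir, f"train_shard_{k}.csv.gz")
        # pandas infers gzip compression from the .gz extension.
        shard.to_csv(path, index=False)
        paths.append(path)
    return paths
```

Round-robin assignment mixes classes across shards when the input is sorted by label, which matters because the generator shuffles only the shard order, not the rows inside a shard.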
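import json

import numpy as np
import pandas as pd


def image_generator(shard_ids, batch_size, size=64, n_classes=340):
    # render_strokes(strokes, size) is not defined in this snippet. Judging by
    # the division below, it should return a (size, size) array in [0, 255].
    while True:  # loop forever; Keras ends each epoch after steps_per_epoch batches
        # Visit the shards in a fresh random order on every pass.
        for k in np.random.permutation(shard_ids):
            filename = f"train_shard_{k}.csv.gz"
            # Each chunk holds at most batch_size rows, so one chunk is one batch.
            for chunk in pd.read_csv(filename, chunksize=batch_size):
                chunk["drawing"] = chunk["drawing"].apply(json.loads)
                X = np.zeros((len(chunk), size, size, 1), dtype=np.float32)
                for i, strokes in enumerate(chunk["drawing"]):
                    X[i, :, :, 0] = render_strokes(strokes, size)
                X = X / 255.0  # scale pixel values to [0, 1]
                # One-hot encode the integer class labels.
                y = np.eye(n_classes, dtype=np.float32)[chunk["label"].values]
                yield X, y


# model is assumed to be an already compiled Keras model.
train_gen = image_generator(range(100), batch_size=256)
model.fit(train_gen, steps_per_epoch=800, epochs=50)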
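The generator calls `render_strokes`, which is not shown. The following is a minimal NumPy-only sketch of what such a function could look like, not the author's implementation. It assumes each stroke is a pair `[xs, ys]` of coordinate lists in the range 0 to 255 (the `source_size` parameter is an assumption) and draws one-pixel-wide lines with value 255 on a black canvas, so that the generator's division by 255 maps pixels to [0, 1].

```python
import numpy as np


def render_strokes(strokes, size=64, source_size=256):
    """Sketch: rasterize strokes into a (size, size) float32 image in {0, 255}.

    Assumes strokes is a list of [xs, ys] pairs with coordinates in
    [0, source_size). Lines are one pixel wide and not anti-aliased.
    """
    img = np.zeros((size, size), dtype=np.float32)
    scale = (size - 1) / (source_size - 1)
    for stroke in strokes:
        xs = np.asarray(stroke[0], dtype=np.float32) * scale
        ys = np.asarray(stroke[1], dtype=np.float32) * scale
        if len(xs) == 1:
            # A single point is drawn as a zero-length segment.
            xs, ys = np.repeat(xs, 2), np.repeat(ys, 2)
        for x0, y0, x1, y1 in zip(xs[:-1], ys[:-1], xs[1:], ys[1:]):
            # Sample enough points that consecutive samples touch adjacent pixels.
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 2
            px = np.rint(np.linspace(x0, x1, n)).astype(int)
            py = np.rint(np.linspace(y0, y1, n)).astype(int)
            # Guard against coordinates outside the assumed range.
            px = np.clip(px, 0, size - 1)
            py = np.clip(py, 0, size - 1)
            img[py, px] = 255.0
    return img
```

In practice a drawing library such as OpenCV or Pillow would likely render faster and support line thickness; the pure NumPy version is used here only to keep the sketch dependency-free.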
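- Each chunk holds at most batch_size rows, so one chunk becomes one batch.
- The generator never terminates, so steps_per_epoch must be passed to model.fit.
- Wrap the generator with tf.data.Dataset.from_generator().prefetch(2) for GPU pipeline overlap.
- Pass workers=4, use_multiprocessing=True if rendering is CPU-bound (in Keras versions whose fit() accepts these arguments).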
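Because chunks never span two shard files, the last chunk of each shard can be shorter than batch_size, and the number of batches in one full pass is the sum of per-shard ceilings rather than total rows divided by batch_size. The sketch below computes that count; the function name `steps_per_pass` and the shard sizes are hypothetical.

```python
import math


def steps_per_pass(rows_per_shard, batch_size):
    """Number of batches the generator yields in one pass over all shards.

    Each shard contributes ceil(rows / batch_size) chunks because a chunk
    never crosses a shard boundary.
    """
    return sum(math.ceil(n / batch_size) for n in rows_per_shard)


# Hypothetical shard sizes, for illustration only.
rows_per_shard = [1000, 1000, 900]
print(steps_per_pass(rows_per_shard, batch_size=256))  # prints 12
```

Passing this value as steps_per_epoch makes one Keras epoch correspond to one pass over the data. A smaller value, as in the example call above, simply makes each Keras epoch cover part of a pass, which is valid because the generator loops indefinitely.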