wenmin-wu

535 verified skills, 14,723 total stars

timeseries-quantile-ratio-scaling

Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF

tools31

llm-sliding-window-permutation-search

Local search that slides a window of size p across a word sequence, brute-forcing all permutations within each window to minimize an objective like LLM perplexity

testing31
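
A minimal sketch of the sliding-window search described above, assuming a single left-to-right pass and a caller-supplied objective where lower is better. The names `sliding_window_search` and `toy_score` are hypothetical; the toy objective only stands in for an LLM perplexity call so the example runs without a model.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def sliding_window_search(words: Sequence[str],
                          score: Callable[[List[str]], float],
                          p: int = 3) -> List[str]:
    """Slide a window of size p over the sequence and keep any
    within-window permutation that lowers the score."""
    best = list(words)
    best_score = score(best)
    for start in range(len(best) - p + 1):
        window = best[start:start + p]
        # Brute-force all p! orderings of the current window.
        for perm in permutations(window):
            candidate = best[:start] + list(perm) + best[start + p:]
            s = score(candidate)
            if s < best_score:
                best, best_score = candidate, s
    return best


def toy_score(seq: List[str]) -> int:
    """Stand-in objective (assumption): adjacent pairs out of alphabetical order."""
    return sum(a > b for a, b in zip(seq, seq[1:]))


result = sliding_window_search(["b", "a", "d", "c"], toy_score, p=2)
```

Each window costs p! objective evaluations, so p is usually kept small when the objective is a real perplexity computation.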

llm-perplexity-prompt-ranking

Rank candidate prompts by computing LLM perplexity of the full conversation conditioned on each prompt, selecting the lowest-perplexity candidate as the best match

data-ai31

llm-greedy-word-reinsert-search

Greedy local search that removes one element from a fixed position and re-inserts it at every possible index, keeping the best improvement per round

data-ai31

llm-batched-perplexity-scoring

Batch-compute perplexity for multiple texts using a causal LM with proper padding, shifted labels, and pad-token masking for efficient GPU utilization

testing31

cv-slice-padding-augmented-duplicates

Pad 3D volumes with fewer slices than required by duplicating existing slices with slight brightness variation via convertScaleAbs

tools31

cv-batch-all-contrastive-loss

All-vs-all contrastive loss comparing every pair in a batch (N^2 pairs) with margin and compactification regularizer

data-ai31

nlp-anchor-grouped-validation

Split validation by unique anchor/query entities so no anchor appears in both train and val, preventing data leakage in pairwise matching tasks

data-ai31

tabular-distribution-matching-postprocess

Reshapes model predictions to match the known label distribution from training data using rank-based mapping.

data-ai31

tabular-frequency-encoding

Adds each feature's value-count frequency as a new column, enabling tree models to split on how common or rare a value is.

data-ai31
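
As a dependency-free illustration of frequency encoding, the sketch below works on a list of dict rows rather than a DataFrame. The function name `frequency_encode` and the `_freq` column suffix are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence


def frequency_encode(rows: List[Dict], columns: Sequence[str]) -> List[Dict]:
    """Return copies of the rows with a `<col>_freq` count added per column."""
    # Count how often each value occurs in each requested column.
    counts = {col: Counter(row[col] for row in rows) for col in columns}
    return [
        {**row, **{f"{col}_freq": counts[col][row[col]] for col in columns}}
        for row in rows
    ]


rows = [{"city": "A"}, {"city": "B"}, {"city": "A"}]
encoded = frequency_encode(rows, ["city"])
```

With pandas, the same idea is commonly written as `df[col].map(df[col].value_counts())`.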

tabular-group-kfold-leak-prevention

Uses GroupKFold to prevent data leakage when multiple rows share a common entity (e.g., same user, question, or document).

documentation31

tabular-group-shuffle-split

Splits train/validation using GroupShuffleSplit so that related samples (forks, families, sessions) never span both sets.

data-ai31

tabular-lgbm-dart-boosting

Use LightGBM DART boosting (dropout on trees) with aggressive feature and bagging fractions to reduce overfitting on high-dimensional tabular data

data-ai31

tabular-log-odds-fold-averaging

Average model predictions across CV folds in log-odds space rather than probability space for better-calibrated ensemble outputs

data-ai31
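
The averaging step can be sketched as follows, assuming each fold produces one probability per sample. The helper names and the clipping constant `eps` are illustrative choices, not taken from the skill itself.

```python
import math
from typing import List


def logit(p: float, eps: float = 1e-7) -> float:
    """Log-odds of p, clipped away from 0 and 1 to keep the log finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def average_in_log_odds(fold_probs: List[List[float]]) -> List[float]:
    """fold_probs[k][i] is fold k's probability for sample i."""
    n_folds = len(fold_probs)
    return [
        sigmoid(sum(logit(p) for p in sample) / n_folds)
        for sample in zip(*fold_probs)
    ]


blended = average_in_log_odds([[0.99, 0.9], [0.5, 0.1]])
```

Compared with a plain arithmetic mean, averaging in log-odds space gives more pull to confident folds: 0.99 and 0.5 blend to about 0.91 rather than 0.745.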

tabular-null-importance-feature-selection

Scores features by comparing actual importances against a null distribution from shuffled targets, removing features that cannot beat random noise.

data-ai31

tabular-pairwise-te-logit-stacking

Generates all C(n,2) pairwise feature combinations, target-encodes each pair with cuML TargetEncoder, then applies logit polynomial expansion (z, z^2, z^3) for stacking with cuML LogisticRegression.

development31

tabular-pseudo-labeling

Augments training data with high-confidence test predictions as pseudo labels, retrains the model, and keeps the result only if OOF AUC improves. A semi-supervised technique for tabular competitions.

testing31

tabular-polynomial-interaction-features

Generates polynomial powers and interaction terms from selected numeric features to capture nonlinear relationships with the target.

tools31

tabular-row-aggregate-features

Engineers row-wise statistical features (sum, mean, std, skew, kurtosis, median, min, max) across all numeric columns per sample.

content-media31

tabular-row-wise-target-normalization

Normalizes each sample's multi-output target vector to zero mean and unit variance, removing per-sample scale differences before training.

data-ai31

tabular-spatial-distance-aggregation

Compute min/max/mean/std of Euclidean distances from all entities to a key point, then aggregate per group for spatial feature engineering

tools31

tabular-synthetic-sample-detection

Detects synthetic/fake test samples by checking whether each row has at least one unique value across all features — real samples do, synthetic ones don't.

testing31

timeseries-prediction-smoothing-lpf

Applies rolling mean or Butterworth low-pass filter to model predictions for temporal consistency and noise reduction.

data-ai31

tabular-weather-ordinal-encoding

Map free-text categorical descriptions to ordinal numeric scores via keyword matching — captures ordered severity in a single dense feature

development31

tabular-weighted-recall-multi-objective-metric

Evaluate recommendation quality with recall@K per action type, combined via business-importance weights

testing31

timeseries-cnn-transformer-multimodal-fusion

Process multiple sensor modalities through separate CNN branches then fuse via a transformer with CLS token for classification

tools31

timeseries-dilated-conv-residual-gru

Combines dilated 1D convolutions for multi-scale receptive fields with residual bidirectional GRU layers for sequence classification.

tools31

timeseries-event-peak-detection

Detects discrete events (state transitions) from continuous predictions using local maxima with minimum-interval constraints.

data-ai31

timeseries-k-mode-gaussian-nll-loss

Negative log-likelihood loss over K isotropic-Gaussian trajectory modes with per-mode confidences and logsumexp stability

tools31

timeseries-mean-residual-decomposition

Decompose multi-output prediction into a global mean (1D model) plus per-channel residuals (2D model) with quadrature uncertainty

data-ai31

timeseries-mixup-sequence-augmentation

Apply MixUp augmentation to padded time-series batches with Beta-distributed lambda and soft label mixing

tools31

timeseries-se-residual-1d-cnn

1D ResNet block with Squeeze-and-Excitation channel attention for temporal sensor feature extraction

testing31

tabular-dtype-preset-csv-load

Predefines minimal unsigned integer dtypes before CSV loading to cut DataFrame memory usage by 2-4x without any data loss.

data-ai31

timeseries-scaled-pinball-loss

Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data

data-ai31
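
A small sketch of the metric for a single series and a single quantile level `u`, assuming the scale is the mean absolute successive difference over the whole training series passed in. The function name is hypothetical, and a constant training series would make the scale zero, which this sketch does not guard against.

```python
from typing import Sequence


def scaled_pinball_loss(y_true: Sequence[float],
                        y_quantile: Sequence[float],
                        train: Sequence[float],
                        u: float) -> float:
    """Mean pinball loss at quantile level u, divided by the mean
    absolute one-step difference of the training series."""
    scale = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    pinball = sum(
        u * (y - q) if y >= q else (1.0 - u) * (q - y)
        for y, q in zip(y_true, y_quantile)
    ) / len(y_true)
    return pinball / scale


spl = scaled_pinball_loss([4, 2], [3, 3], train=[1, 3, 2, 4], u=0.9)
```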

timeseries-ratio-target-for-smape

Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE

testing31

timeseries-retroactive-outlier-rescaling

Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies

data-ai31

tabular-simulated-annealing-multi-operator

Simulated annealing with diverse move operators (translate, rotate, swap, Levy flight, squeeze) and adaptive reheating on stagnation for combinatorial optimization

tools31

timeseries-density-to-count-roundtrip

Convert a density metric back to integer counts using known population, round to nearest integer, then recompute density to exploit the discrete nature of the target

tools31
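
The round trip can be sketched as below, assuming density is defined as count per `per` people (the default of 100 is an assumption; use whatever scale the dataset defines). The function name `snap_density` is hypothetical.

```python
def snap_density(density: float, population: float, per: float = 100.0) -> float:
    """Snap a predicted density to the nearest value reachable
    from an integer count at the given population."""
    count = round(density * population / per)  # nearest integer count
    return count * per / population            # density implied by that count


snapped = snap_density(1.2345, population=1000)
```

Note that Python's `round` uses round-half-to-even, which only matters when the implied count lands exactly on a .5 boundary.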

tabular-rmsle-keras-metric

Custom Keras RMSLE metric using K.log with K.clip to safely evaluate price and count regression during training

tools31

cv-cnn-rnn-video-classification

Extract per-frame CNN features then classify the temporal sequence with stacked GRU layers and a boolean mask for variable-length video inputs

content-media31

cv-deterministic-hash-partitioning

Partition a large dataset into N balanced shards using integer key modulo arithmetic for reproducible, class-interleaved splits across CSV files

testing31

cv-clip-interrogator-captioning

Generate descriptive text prompts from images by combining BLIP captioning with CLIP cosine similarity against curated label banks for medium, movement, and flavor attributes

tools31

cv-embedding-knn-regression

GPU-accelerated k-NN regression on CLIP image embeddings using cosine distance and inverse-distance-power weighting to predict target embedding vectors

tools31

tabular-opponent-avoidance-scoring

Score candidate movement directions by average distance to nearby opponents and pick the safest path for ball-carrying agents in game AI

data-ai31

tabular-featureunion-field-dispatch

Use sklearn FeatureUnion with closure-based preprocessors to apply different vectorizers to different DataFrame columns in a single fit_transform call

data-ai31

tabular-brand-name-recovery-from-title

Recover missing categorical values by matching words in a related text field against a known vocabulary built from the full dataset

development31

tabular-ball-landing-prediction

Predict where a projectile will land using kinematic equations with estimated gravity to intercept aerial passes in game AI simulations

data-ai31

nlp-lda-topic-modeling

Latent Dirichlet Allocation on CountVectorizer bag-of-words to discover latent topics with per-document topic distributions for feature engineering or EDA

documentation31

cv-sentence-transformer-target-encoding

Encode text prompts into fixed-length dense vectors using SentenceTransformer for cosine-similarity evaluation in image-to-text retrieval tasks

development31

cv-pairwise-tracking-feature-merge

Double left-join on tracking data to create pairwise features (positions, velocities, distance) for both entities in an interaction pair

testing31

cv-percentile-contrast-stretch

Normalize high-dynamic-range satellite or medical imagery to [0,1] using per-channel percentile clipping to suppress outliers while preserving relative contrast

tools31

cv-mask-to-polygon-contour-hierarchy

Convert binary segmentation masks to Shapely MultiPolygons using cv2 contour hierarchy to correctly handle interior holes, with Douglas-Peucker simplification

development31

cv-3d-center-crop-zero-pad

Select center slices from a 3D volume and zero-pad along depth when the scan has fewer slices than the target count

tools31

timeseries-tof-spatial-region-pooling

Aggregate high-dimensional spatial sensor grids into hierarchical region statistics at multiple granularities

tools31

timeseries-sensor-modality-dropout

Randomly zero out entire sensor modalities during training with a learned gate to handle missing modalities at inference

data-ai31

timeseries-poisson-lgbm-bias-corrected-ensemble

Train a single Poisson LightGBM count forecaster, then ensemble its predictions with multiple multiplicative scaling factors (alpha ≈ 1.02-1.03) to undo the systematic downward bias of Poisson regression on intermittent retail data

data-ai31

timeseries-quaternion-angular-velocity

Derive angular velocity from consecutive quaternion frames via relative rotation and rotvec conversion

testing31

timeseries-neighbor-average-nan-interpolation

Fill gaps in a daily exogenous series (oil prices, sensor feeds) by merging against a full calendar to expose NaNs, then replacing each NaN with the midpoint of its nearest valid left and right neighbors, walking outward past consecutive NaN runs

testing31

timeseries-multi-scale-rolling-features

Computes rolling mean/max/std at multiple window sizes plus total variation (abs first differences) for multi-resolution temporal context.

tools31

timeseries-multi-gap-lag-diff-features

Generate shift/diff features at multiple lag sizes (1,2,3,5,10,20,50,100) over cursor/time/state series, then aggregate statistics per session

tools31

timeseries-locale-scoped-holiday-flag

Build a single per-row "day off" boolean from a holidays table with National/Regional/Local locale hierarchy and Work Day overrides that flip make-up working weekends back to working days

development31

timeseries-inverse-variance-channel-weighting

Weight multi-channel signals by inverse per-channel variance with percentile clipping, emphasizing low-noise channels in aggregation

tools31

timeseries-gradient-transit-phase-detection

Detect event ingress/egress boundaries by finding steepest gradient on each side of the signal minimum in a smoothed time series

tools31

timeseries-gradient-event-boundary-detection

Detect event start/end boundaries in time series by finding extrema of the first derivative (steepest gradient points)

tools31

timeseries-event-anchored-frame-sync

Align low-Hz sensor data to high-fps video by anchoring a named event (e.g. ball_snap) to a known frame index and converting time offsets via fps

data-ai31

timeseries-detector-calibration-pipeline

Multi-step detector calibration pipeline — ADC inversion, hot/dead pixel masking, nonlinearity correction, dark subtraction, flat-field normalization

devops31

timeseries-correlated-double-sampling

Subtract paired reference frames from signal frames to cancel readout noise and common-mode bias

data-ai31

tabular-yearly-partitioned-groupby

Split a multi-year table into per-year partitions, run the same groupby aggregation on each, then concat and gc — a pure-pandas map-reduce that survives 100M+ rows on a 16GB kernel

tools31

tabular-xgb-gpu-batch-iterator

Use XGBoost DeviceQuantileDMatrix with a custom batch iterator to train on large datasets without exhausting GPU memory

development31

tabular-vote-ensemble-outer-join

Ensemble ranked recommendation lists by outer-joining exploded candidates and re-ranking by weighted vote sum

tools31

tabular-typed-panel-aggregation

Aggregate panel/sequential data with type-appropriate statistics — numeric (mean/std/min/max/last) and categorical (count/last/nunique) — then concat into flat features

development31

tabular-weighted-gini-top-recall-metric

Custom ranking metric combining normalized weighted Gini coefficient with top-K% capture rate for imbalanced classification with class-weighted evaluation

testing31

tabular-two-level-hierarchical-aggregation

Aggregates deeply nested relational tables through two groupby levels (child → intermediate → parent) to build features from multi-hop relationships.

development31

tabular-temporal-session-aggregation

Builds user-level features by accumulating statistics across sequential event sessions before each assessment point.

development31

tabular-tabular-to-image-cnn

Reshapes tabular features into 2D pseudo-images via random feature permutation, enabling CNN-based feature interaction learning.

content-media31

tabular-svd-target-reconstruction

Compresses high-dimensional targets with TruncatedSVD, trains on the reduced space, then reconstructs full predictions via the components matrix.

data-ai31

tabular-sparse-dense-hstack-lgbm

Train LightGBM directly on a scipy.sparse.hstack of TF-IDF text vectors and dense tabular columns, passing feature_name and categorical_feature so native categorical handling survives the sparse block

development31

tabular-regression-to-cdf-smoothing

Convert a scalar regression prediction into a smoothed CDF over discrete bins using a linear ramp instead of a hard step

tools31
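
One way to read the linear ramp is sketched here, assuming the ramp is centered on the prediction and rises from 0 to 1 over `2 * half_width` bin units. The function name and the `half_width` parameter are assumptions for illustration.

```python
from typing import List, Sequence


def ramp_cdf(pred: float, bins: Sequence[float], half_width: float) -> List[float]:
    """CDF value per bin: 0 below pred - half_width, 1 above
    pred + half_width, and a straight line in between."""
    return [
        min(1.0, max(0.0, (b - pred + half_width) / (2.0 * half_width)))
        for b in bins
    ]


cdf = ramp_cdf(pred=5.0, bins=range(11), half_width=2.0)
```

A hard step is the limiting case as `half_width` approaches zero; a wider ramp spreads the probability mass and softens the penalty when the point prediction is off by a bin or two.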

tabular-rdkit-molecular-descriptors

Computes all numeric RDKit molecular descriptors from SMILES strings, filtering out NaN, constant, and infinite values to produce a clean feature matrix.

tools31

tabular-play-direction-normalization

Mirror spatial coordinates and angles so all plays face the same direction — removes left/right asymmetry from sports and spatial data

data-ai31

cv-black-slice-replacement

Detect all-black DICOM/MRI slices (mean==0) and replace them by randomly sampling a non-black slice from the same series

data-ai31

cv-brovey-pansharpening

Fuse low-resolution multispectral bands with a high-resolution panchromatic band using the Brovey transform to produce sharp multi-band imagery

content-media31

tabular-personnel-count-parsing

Parse structured text fields like '1 RB, 2 TE, 2 WR' into separate numeric columns per category

development31
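
For the example string in the description, a regex-based sketch could look like this. It assumes every entry has the form "count, whitespace, alphabetic label"; the function name is hypothetical.

```python
import re
from typing import Dict


def parse_personnel(text: str) -> Dict[str, int]:
    """Turn '1 RB, 2 TE, 2 WR' into {'RB': 1, 'TE': 2, 'WR': 2}."""
    return {pos: int(n) for n, pos in re.findall(r"(\d+)\s+([A-Za-z]+)", text)}


counts = parse_personnel("1 RB, 2 TE, 2 WR")
```

Each key can then become its own numeric column, with missing positions filled as 0.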

cv-centroid-distance-anomaly-score

Score each face in a video as anomalous by computing L2 distance from the embedding centroid of all faces, then convert to probability via logistic function

data-ai31

cv-smm-spatial-observation-encoding

Encode game state as a Super Mini Map (SMM) with separate binary channels for players, ball, and ownership, bit-packed for efficient transfer in RL training

development31

tabular-per-feature-bias-correction

Post-processing correction for multi-output regression — scale each output by its train-derived mean ratio to fix systematic per-feature bias

data-ai31

tabular-per-partition-variance-filtering

Apply VarianceThreshold within each data partition on combined train+test to select informative features per subgroup

development31

cv-structured-gt-serialization-tokens

Serialize structured ground truth (chart data, table rows, form fields) as special-token-delimited sequences for generative vision-language model training

testing31

cv-tensor-core-aligned-padding

Pad batch sequence lengths to multiples of 8 for efficient tensor core utilization on GPUs, with -100 masking for label padding

testing31

cv-temporal-frame-jitter-augmentation

Add random temporal offset to the center frame during training to augment temporal diversity in video-based models

data-ai31

tabular-outlier-aware-two-stage-blending

When a regression target has a discrete outlier spike (e.g. ~1% of rows pinned at -33.22 in Kaggle's Elo Merchant Category Recommendation data), train one regressor on the non-outlier subset and a separate binary classifier for the outlier flag, then splice the predictions: overwrite the regressor's output with the outlier value for the K rows the classifier is most confident about, where K is calibrated on validation

data-ai31

tabular-next-click-time-delta

Computes seconds until the next event within a group using diff().shift(-1) on sorted timestamps, capturing user behavior velocity.

tools31

llm-sentence-truncation-fallback

Truncate LLM output to exactly N sentences and fall back to a known-good baseline string when output is empty or too short

development31

llm-spiral-patrol-exploration

Agents patrol in expanding spiral patterns using rotating direction sequences with increasing radius for systematic grid exploration

data-ai31

tabular-hierarchical-collision-cascade

Three-level polygon overlap test — AABB early exit, then point-in-polygon ray casting, then segment intersection — for fast non-convex collision detection

development31

tabular-multi-source-candidate-fusion

Fuse recommendation candidates from user history, multiple co-visitation matrices, and global popularity in a priority-ordered cascade

tools31

tabular-multi-input-embedding-nn

Keras multi-input model with separate embedding layers for categoricals, GRU for text sequences, and dense layers for numerics, all concatenated into a shared regression trunk

development31

tabular-leak-free-loop-features

Iterates through rows chronologically to accumulate user statistics, fetching current state before updating to prevent future data leakage.

data-ai31

timeseries-kaggle-api-streaming-inference

Predict day-by-day via Kaggle's iter_test API while maintaining a rolling history buffer for computing lag features online

development31

timeseries-multi-lag-target-features

Generate lag features for multiple targets over N days by shifting evaluation dates and self-joining per entity, creating a wide feature matrix of past target values

tools31

tabular-implicit-als-collaborative-filtering

Alternating Least Squares matrix factorization on sparse user-item interaction matrices for implicit feedback recommendations.

testing31

tabular-cyclical-feature-encoding

Encodes cyclical features (hour, month, day-of-week) using sine/cosine transforms to preserve circular distance.

tools31
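
A minimal sketch of the sine/cosine transform, assuming values are measured on a cycle of known length (`period=24` for hours, 12 for months, 7 for day-of-week). The function name is illustrative.

```python
import math
from typing import Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a cyclical value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


hour_23 = cyclical_encode(23, 24)
hour_0 = cyclical_encode(0, 24)
hour_1 = cyclical_encode(1, 24)
```

In the encoded space, hour 23 is as close to hour 0 as hour 1 is, which a raw integer column cannot express.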

nlp-pearson-correlation-metric

Use Pearson correlation coefficient as evaluation metric for semantic similarity regression tasks, selecting best checkpoint by correlation rather than loss

testing31

nlp-two-stage-retrieve-rerank

Two-stage pipeline where an unsupervised bi-encoder retrieves KNN candidates and a supervised cross-encoder reranks them with sigmoid thresholding

development31

nlp-taxonomy-context-enrichment

Enrich model input by mapping categorical codes to human-readable taxonomy descriptions and concatenating them as context for transformer models

development31

nlp-dropout-disabled-inference

Explicitly zero all dropout probabilities in transformer config at load time for fully deterministic inference

testing31

cv-learned-distance-metric

Trainable nonlinear distance metric that transforms (v1-v2) and (v1-v2)^2 through a linear layer before computing squared norm

data-ai31

cv-greedy-mask-overlap-resolution

Resolve overlapping instance masks by greedily assigning contested pixels to higher-confidence predictions using a running occupancy map

testing31

cv-cumulative-sum-channel

Add a cumulative sum channel along the vertical axis to capture directional structural trends in grayscale images for segmentation

content-media31

tabular-threaded-parquet-describe-features

Parallel-load per-subject parquet time-series files with ThreadPoolExecutor and flatten describe() statistics into tabular feature vectors

data-ai31

tabular-time-varying-reward-shaping

Shape RL rewards with time-decaying asset weights and time-increasing resource weights so the agent transitions from expansion to accumulation as the game progresses

development31

tabular-toroidal-manhattan-distance

Compute shortest Manhattan distance on a toroidal (wrapping) grid by comparing normal vs wrap-around routes in each axis

testing31
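
The per-axis comparison can be sketched as follows, assuming integer `(x, y)` cells on a grid that wraps in both directions. The function and parameter names are illustrative.

```python
from typing import Tuple


def toroidal_manhattan(a: Tuple[int, int], b: Tuple[int, int],
                       width: int, height: int) -> int:
    """Shortest Manhattan distance on a wrapping grid: on each axis,
    take the smaller of the direct route and the wrap-around route."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


d = toroidal_manhattan((0, 0), (20, 20), width=21, height=21)
```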

tabular-transductive-train-test-transform

Fit unsupervised transforms (scaler, PCA, variance filter) on combined train+test data for more stable statistics, especially on small datasets

testing31

tabular-transitive-match-closure

Post-processes entity match predictions to enforce symmetry (A→B implies B→A) and transitivity (A→B, B→C implies A→C) via graph closure.

tools31
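
One way to compute such a closure is with a union-find over the predicted pairs; this is a sketch of that approach, not necessarily the one the skill uses. The function name `close_matches` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def close_matches(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set]:
    """Map every entity to all other entities in its connected component,
    which makes the match relation symmetric and transitive."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return {x: groups[find(x)] - {x} for x in list(parent)}


closed = close_matches([("A", "B"), ("B", "C"), ("D", "E")])
```

Full closure can chain weak matches into very large groups, so it is often applied only to pairs above a confidence threshold.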

cv-tumor-volume-ratio-features

Extract volumetric features from 3D segmentation masks including scan/tumor pixel ratios, tumor percentage, and tumor centroid coordinates

tools31

llm-turn-based-prompt-accumulation

Rebuild multi-turn conversation context by interleaving user/assistant turns with chat template tokens into a single prompt each call

development31

timeseries-tweedie-objective-zero-inflated

Use LightGBM's tweedie objective with tweedie_variance_power between 1.05 and 1.2 for zero-inflated count forecasting (retail SKUs, intermittent demand, click events); it handles the "many zeros plus a heavy right tail" distribution that breaks both regression (RMSE) and classification (BCE) objectives

tools31

cv-two-stage-bce-then-lovasz

Train segmentation with BCE loss first for stable convergence, then fine-tune with Lovasz-hinge on raw logits for IoU-optimal predictions

data-ai31

timeseries-unknown-class-residual-probability

Estimate out-of-distribution class probability as the product of (1 - p_i) across all known classes, scaled by a calibrated prior

tools31
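
The formula in the description can be written directly, assuming the known-class probabilities are independent per-class scores (for example one-vs-rest outputs) and that `prior` is a scalar calibrated elsewhere. The function name is illustrative.

```python
import math
from typing import Iterable


def unknown_class_probability(known_probs: Iterable[float], prior: float = 1.0) -> float:
    """prior * product over known classes of (1 - p_i)."""
    return prior * math.prod(1.0 - p for p in known_probs)


p_unknown = unknown_class_probability([0.9, 0.2], prior=0.5)
```

The estimate is high only when every known class is unlikely; a single confident known class drives it toward zero.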

cv-video-frame-sampling-pipeline

Efficiently sample N evenly-spaced frames from a video using OpenCV grab/retrieve pattern with optional resize for batch face detection or classification

testing31

timeseries-wavelet-denoising

Denoise an erratic 1D series with discrete wavelet decomposition + universal soft thresholding (sigma estimated from MAD of the detail coefficients) to extract the underlying trend/seasonality without lagging the signal — a far better trend extractor than rolling means for spiky retail or sensor data

testing31

cv-weighted-embedding-blending

Ensemble predictions from heterogeneous vision-language models by blending their output embeddings with fixed scalar weights in embedding space

data-ai31

tabular-weighted-position-decay-ensemble

Ensembles multiple ranked recommendation lists by scoring items as model_weight / position_rank, then re-ranking.

data-ai31
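
The scoring rule can be sketched as follows, assuming 1-based positions and one scalar weight per model. The function name and the `top_k` default are illustrative.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def blend_rankings(ranked_lists: Sequence[Sequence[Hashable]],
                   weights: Sequence[float],
                   top_k: int = 12) -> List[Hashable]:
    """Score each item as the sum of model_weight / position_rank over
    the lists it appears in, then return the top_k items by score."""
    scores = defaultdict(float)
    for items, weight in zip(ranked_lists, weights):
        for rank, item in enumerate(items, start=1):
            scores[item] += weight / rank
    return sorted(scores, key=lambda item: -scores[item])[:top_k]


blended = blend_rankings([["a", "b", "c"], ["b", "c", "a"]], weights=[1.0, 2.0])
```

Here "b" wins because the higher-weighted second model ranks it first (0.5 + 2.0 = 2.5), ahead of "a" (1.0 + 0.67) and "c" (0.33 + 1.0).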

cv-coco-annotation-conversion

Convert per-instance RLE or polygon annotations to COCO JSON format for seamless use with Detectron2 and MMDetection

development31

llm-code-execution-weighted-vote

Execute Python code blocks from LLM math responses in a sandbox, then double-weight code-derived answers in majority voting

development31

tabular-collinear-feature-removal

Removes redundant features by iterating pairwise Pearson correlations and dropping one member of each pair exceeding a threshold.

data-ai31

tabular-game-state-grid-encoding

Encode a 2D game board into a normalized multi-channel feature tensor with log-scaled resources, signed unit counts, and directional features for RL agents

development31

cv-per-modality-ensemble-averaging

Train separate models per imaging modality (FLAIR/T1w/T1wCE/T2w) and average their predictions for final ensemble

data-ai31

tabular-iterative-pseudo-label-refinement

Multi-round pseudo labeling with progressively confident test predictions merged into training plus OOF-based train label correction

testing31

cv-3d-encoder-2d-decoder-segmentation

3D ResNet encoder extracts volumetric features, pools depth dimension, then feeds into a 2D UNet/FPN decoder for segmentation

development31

llm-4bit-nf4-double-quantization

Load large LLMs with 4-bit NF4 quantization and optional double quantization via BitsAndBytes to cut weight memory roughly 4x relative to 16-bit loading with little loss in inference quality

testing31

timeseries-activity-threshold-lastval-fallback

Override model predictions with last known value for low-activity or low-density entities where learned trends are unreliable

data-ai31

llm-actor-critic-game-agent

Shared-backbone neural network with actor (policy) and critic (value) heads for grid-based game agent RL training

data-ai31

llm-adaptive-time-budget-inference

Dynamically reduce max_tokens and batch size as wall-clock time approaches a cutoff to ensure all inputs get processed

data-ai31

llm-answer-accumulation-filter

Parse sequential yes/no answers to build inclusion/exclusion sets, then apply compound boolean filters to narrow a candidate list

development31

tabular-autoencoder-timeseries-embedding

Train a PyTorch autoencoder on time-series summary statistics to produce dense encoded features for downstream GBDT models

development31

timeseries-availability-masked-regression-loss

Multiply per-timestep regression loss by a 0/1 availability mask so missing future steps contribute zero gradient

data-ai31

cv-bbox-interpolation-temporal-tracking

Interpolate missing bounding boxes across video frames using bidirectional pandas interpolation to maintain smooth tracking through occlusions

data-ai31

nlp-best-prob-fallback-matching

When no candidate passes the threshold for a query, fall back to the single highest-scoring match to guarantee at least one prediction per query

databases31

llm-binary-answer-clamping

Force-clamp free-form LLM output to binary yes/no with keyword matching and a fallback default for constrained environments

data-ai31
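
A minimal sketch of the clamping step, assuming the first whole-word "yes" or "no" in the output decides the answer and anything else falls back to a default. The function name and the default value are assumptions.

```python
import re


def clamp_yes_no(raw: str, default: str = "no") -> str:
    """Reduce free-form model output to exactly 'yes' or 'no'."""
    # Word boundaries keep 'no' from matching inside words like 'know'.
    match = re.search(r"\b(yes|no)\b", raw.lower())
    return match.group(1) if match else default


answer = clamp_yes_no("Yes, it is a country.")
```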

llm-binary-search-entity-narrowing

Hierarchical binary search over entity space by asking category, region, then first-letter questions to narrow candidates before guessing

development31

timeseries-bootstrapped-residual-prediction-intervals

Generate prediction intervals by repeatedly sampling from model residuals, adding to point forecasts, and taking quantiles across synthetic futures

data-ai31

llm-boxed-answer-extraction

Extract final numeric answers from LaTeX \boxed{} notation in LLM math reasoning output, scanning matches in reverse for robustness

data-ai31
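
The reverse scan can be sketched with a regular expression, assuming the boxed content has no nested braces and that thousands separators should be dropped before parsing. The function name is hypothetical.

```python
import re
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[float]:
    """Return the last \\boxed{...} value that parses as a number, else None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    # Scan from the end: the final answer usually comes last, and
    # non-numeric boxes are skipped instead of failing the extraction.
    for content in reversed(matches):
        cleaned = content.replace(",", "").strip()
        try:
            return float(cleaned)
        except ValueError:
            continue
    return None


value = extract_boxed_answer(r"First \boxed{12}, then \boxed{1,234} and \boxed{x+1}")
```

Nested content such as `\boxed{\frac{1}{2}}` is not handled by this pattern and would need a brace-matching parser.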

timeseries-burst-rle-detection

Detect P-bursts (fast-typing runs) and R-bursts (consecutive revisions) via polars run-length encoding over boolean event conditions

tools31

llm-cargo-threshold-state-machine

Per-agent COLLECT/DEPOSIT state machine driven by cargo thresholds with greedy neighbor selection for resource collection games

development31

cv-center-z-slice-selection

Select a fixed number of Z-slices centered around the volume midpoint for memory-efficient 2.5D input from 3D CT/MRI stacks

testing31

tabular-chained-target-prediction

Predict correlated targets sequentially, using earlier target predictions as input features for subsequent targets to exploit inter-target dependencies

testing31

cv-chunked-csv-image-generator

Memory-efficient Keras generator that streams sharded CSV files in chunks, renders strokes to images on-the-fly, and yields batches for training on datasets too large for memory

testing31

cv-class-density-threshold-patch-sampling

Sample training patches from large images using per-class area-fraction thresholds to ensure each patch contains meaningful object coverage

data-ai31

cv-classifier-free-guidance-diffusion

Manual stable diffusion inference loop with classifier-free guidance that interpolates between unconditional and conditional noise predictions for controllable image generation

content-media31

timeseries-class-weighted-multiclass-logloss

Custom multiclass log-loss that weights per-class contributions by class frequency and domain importance, usable as both training loss and eval metric

data-ai31

tabular-column-shuffle-augmentation

Augments imbalanced tabular data by independently shuffling each feature column within a class, creating synthetic samples that preserve per-column marginal distributions.

data-ai31

tabular-confidence-probability-clipping

Hard-clips predicted probabilities to 0 or 1 when they exceed high-confidence thresholds, reducing log loss on near-certain predictions.

tools31

tabular-confidence-weighted-rate-encoding

Encodes categorical groups by their target rate scaled by a log-confidence factor, smoothing unreliable rates from low-frequency groups toward zero.

development31

tabular-content-difficulty-features

Precomputes item/content difficulty as historical mean accuracy, merged as a static feature for user-item prediction tasks.

content-media31

cv-conv1d-lstm-stroke-classifier

Stack 1D convolutions for local feature extraction before bidirectional LSTMs to classify variable-length stroke sequences into hundreds of doodle categories

development31

tabular-co-purchase-item-pairing

Recommends items frequently purchased together with a customer's recent items using pre-computed pair dictionaries.

data-ai31

cv-coverage-stratified-split

Stratify train/validation split by binned mask coverage percentage to ensure balanced foreground representation in segmentation tasks

data-ai31

tabular-cross-dataset-user-aggregation

Build user-level behavioral features (avg listing duration, relisting frequency, total items) by joining auxiliary activity tables that share user_id but not item_id with train/test

development31

tabular-crps-cdf-loss

Model cumulative distribution via softmax output layer and CRPS loss — for probabilistic regression over discrete bins

data-ai31

cv-isotropic-resize-with-padding

Resize images preserving aspect ratio then zero-pad to a square to avoid distortion artifacts in face crops or object detection inputs

content-media31

cv-detectron2-custom-data-mapper

Custom Detectron2 data mapper with photometric augmentations that properly transforms images, bounding boxes, and instance masks in sync

data-ai31

llm-inverse-task-prompt-template

Structured prompt template for recovering the instruction that transformed one text into another, with labeled original/rewritten fields and explicit task framing

testing31

tabular-declarative-groupby-aggregation

Config-driven feature factory that generates groupby aggregation features from a declarative spec list, supporting count, mean, var, nunique, cumcount, and custom lambdas.

tools31

nlp-domain-special-token-embedding

Add domain-specific categorical values as new special tokens, resize embeddings, and prepend them to input so the model learns domain-aware representations

development31

llm-dual-role-agent-dispatch

Single agent entry point that dispatches between multiple roles (ask/answer/guess) based on turn type, combining heuristic and LLM-based strategies

data-ai31

cv-dual-view-reshape-forward

Reshape dual-view stacked channels into doubled batch dimension for shared backbone, then concatenate with tabular features for classification

tools31

timeseries-mc-dropout-uncertainty

Estimate prediction uncertainty via Monte Carlo Dropout — run inference N times with dropout active and compute mean/std

data-ai31

llm-exhaustive-permutation-early-stopping

Enumerate all factorial permutations in batches with LLM scoring, tracking the running best and early-stopping when score crosses a known optimality threshold

data-ai31

timeseries-expanding-window-stacking

Walk-forward stacking ensemble that trains base models on expanding windows and a meta-learner on their out-of-fold predictions across time

data-ai31

timeseries-flux-snr-weighted-features

Engineer SNR-derived features from irregular time series — flux ratio squared, error-weighted mean flux, and normalized amplitude/range features

tools31

cv-frame-differencing-temporal-encoding

Encode motion and velocity by computing per-channel pixel differences between consecutive frames instead of stacking raw frames for RL visual observations

development31

timeseries-gaussian-log-likelihood-metric

Evaluate probabilistic forecasts using normalized Gaussian log-likelihood relative to naive and oracle baselines, scoring both mean accuracy and uncertainty calibration

data-ai31

cv-generative-output-numeric-cleaning

Clean noisy numeric strings from generative model output by removing invalid characters, fixing malformed floats, and handling multiple decimal points

testing31

tabular-gmm-feature-augmentation

Fits a Gaussian Mixture Model on the joint feature-target space and samples synthetic data pairs to augment small tabular datasets.

data-ai31

tabular-group-mean-log-mae-metric

Custom evaluation metric that computes log of per-group MAE then averages, penalizing uniformly bad groups.

data-ai31
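
A minimal sketch of the group-mean log-MAE metric described above, assuming plain Python lists for targets, predictions, and group labels. The function name and the `floor` guard against `log(0)` are illustrative assumptions, not part of the listed skill.

```python
import math
from collections import defaultdict

def group_mean_log_mae(y_true, y_pred, groups, floor=1e-9):
    """Average, over groups, of log(MAE within the group).

    floor is a hypothetical lower bound that keeps log() finite when a
    group is predicted perfectly.
    """
    abs_err = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        abs_err[g].append(abs(t - p))
    # One log-MAE per group, then an unweighted mean across groups.
    log_maes = [math.log(max(sum(e) / len(e), floor)) for e in abs_err.values()]
    return sum(log_maes) / len(log_maes)
```

Because each group contributes equally after the log, a small group with a large error cannot be hidden by a large group with a small one.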

tabular-gnn-on-knn-graph

Constructs a customer similarity graph via KNN on mixed features, then trains a GraphSAGE GNN for node classification. Captures relational patterns that tree and linear models miss, adding ensemble diversity.

data-ai31

tabular-haversine-knn-candidate-generation

Generates geographically proximate candidate pairs for entity matching using KNN with haversine distance, optionally partitioned by country.

data-ai31

tabular-hierarchical-rule-engine

Two-level group-then-pattern dispatch for game AI agents where groups filter by game state and ordered patterns within a group fire the first matching action

data-ai31

timeseries-hierarchy-level-confidence-coefficients

Assign different uncertainty spread coefficients per aggregation level in hierarchical forecasts, reflecting that higher aggregation yields narrower intervals

testing31

timeseries-imu-gravity-removal

Remove gravity component from raw accelerometer data using quaternion rotation to yield linear acceleration

data-ai31

tabular-inner-kfold-target-encoding

Computes leak-free target encoding statistics (mean, std, min, max) using nested inner KFold within each outer CV fold, preventing target leakage that occurs with naive groupby-based encoding.

data-ai31

llm-low-meaning-input-substitution

Replace the actual input text with a generic low-meaning passage to prevent the LLM from fixating on content specifics, forcing it to focus on stylistic and structural transformation cues

development31

timeseries-keras-multi-quantile-loss

Single neural network outputting all quantiles simultaneously via pinball loss over a quantile vector for joint probabilistic forecasting

data-ai31

timeseries-keystroke-pause-bucket-features

Bucket inter-keystroke latencies into pause-duration ranges (0.5-1s, 1-1.5s, 1.5-2s, 2-3s, >3s) and count per session as hesitation features

testing31
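
The pause-bucket idea can be sketched as follows. The bucket edges come from the description; the feature names, the choice of half-open intervals `(lo, hi]`, and the assumption that timestamps are key-press times in seconds are illustrative.

```python
from collections import Counter

# Bucket edges in seconds, taken from the description; labels are illustrative.
PAUSE_BUCKETS = [
    ("pause_0.5_1s", 0.5, 1.0),
    ("pause_1_1.5s", 1.0, 1.5),
    ("pause_1.5_2s", 1.5, 2.0),
    ("pause_2_3s", 2.0, 3.0),
    ("pause_gt_3s", 3.0, float("inf")),
]

def pause_bucket_counts(down_times_s):
    """Count inter-keystroke gaps per pause bucket for one session.

    down_times_s: key-press timestamps in seconds, sorted ascending.
    Gaps of 0.5 s or less are treated as normal typing and ignored.
    """
    counts = Counter({name: 0 for name, _, _ in PAUSE_BUCKETS})
    for prev, cur in zip(down_times_s, down_times_s[1:]):
        gap = cur - prev
        for name, lo, hi in PAUSE_BUCKETS:
            if lo < gap <= hi:
                counts[name] += 1
                break
    return dict(counts)
```

Calling the function once per session yields one row of five count features.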

timeseries-learnable-fir-filter

Initialize a depthwise Conv1d with FIR filter coefficients as a trainable high-pass/low-pass filter for sensor signal preprocessing

testing31

tabular-multi-seed-fold-averaging

Trains multiple models per CV fold with different random seeds for augmentation, then averages their predictions to reduce variance from stochastic data generation.

data-ai31

tabular-l1-coefficient-interaction-map

Extract and visualize per-subgroup feature coefficient signs from L1-regularized models as an interaction heatmap for EDA

testing31

cv-lap-hard-negative-mining

Use linear assignment problem (LAP/lapjv) on a score matrix to select globally optimal hard-negative pairs for metric learning

data-ai31

tabular-last-diff-lag-features

Compute first-order difference between last and second-to-last rows per entity in panel data to capture recent trend direction and magnitude

data-ai31
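
A small sketch of the last-minus-previous difference per entity, assuming rows arrive as dicts with hypothetical `entity`, `time`, and `value` keys. A real pipeline would more likely use a dataframe groupby; this version only illustrates the logic.

```python
def last_diff_features(rows, value_key="value"):
    """Return {entity: last_value - second_to_last_value}.

    rows: iterable of dicts with 'entity', 'time', and a numeric value field.
    Entities with a single row get None because no difference is defined.
    """
    by_entity = {}
    # Sort so that each entity's values are appended in time order.
    for row in sorted(rows, key=lambda r: (r["entity"], r["time"])):
        by_entity.setdefault(row["entity"], []).append(row[value_key])
    return {
        e: (v[-1] - v[-2] if len(v) >= 2 else None)
        for e, v in by_entity.items()
    }
```

The sign of the result gives the recent trend direction and its absolute value gives the magnitude.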

nlp-learned-attention-pooling

Replace mean pooling with a trainable attention network (Linear-Tanh-Linear-Softmax) that learns token importance weights over transformer hidden states

data-ai31

nlp-length-sorted-batching

Sort texts by length before batching with dynamic padding to minimize wasted padding tokens and speed up transformer inference

tools31
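
Length-sorted batching might look like the sketch below. It uses character length as a stand-in for token count, and `predict_batch` is a hypothetical callable that returns one output per text; both are assumptions made for illustration.

```python
def length_sorted_batches(texts, batch_size):
    """Yield (indices, batch) pairs with texts grouped by similar length."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]

def run_in_length_order(texts, batch_size, predict_batch):
    """Apply predict_batch to each length-sorted batch, then restore input order."""
    outputs = [None] * len(texts)
    for idx, batch in length_sorted_batches(texts, batch_size):
        for i, out in zip(idx, predict_batch(batch)):
            outputs[i] = out
    return outputs
```

Keeping the original indices alongside each batch is what lets the outputs be written back in input order after inference.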

tabular-logit-transform-stacking

Applies logit transformation to base model probabilities before fitting a logistic regression meta-learner, enabling principled linear combination in log-odds space.

development31

tabular-logspace-recency-reranking

Rerank session candidates using log-spaced recency weights multiplied by interaction-type multipliers

tools31

tabular-majority-vote-submission-blend

Blend multiple submission CSVs by row-wise majority voting on discrete predictions to produce a more robust final output

tools31

cv-map-iou-precision-sweep

Compute mean Average Precision by sweeping IoU thresholds from 0.5 to 0.95 on RLE-encoded instance masks using pycocotools

tools31

tabular-multiclass-to-binary-collapse

Trains on a finer-grained multiclass target (subtypes), then collapses non-baseline classes into a single positive class for binary submission.

data-ai31

cv-metadata-injection-bottleneck

Inject scalar metadata (depth, position, clinical features) into U-Net bottleneck via RepeatVector and Reshape for metadata-aware segmentation

tools31

tabular-morgan-fingerprint-features

Converts molecular SMILES strings to fixed-length Morgan fingerprint bit vectors using RDKit for use as tabular ML features.

data-ai31

timeseries-multiband-color-index-features

Compute log-ratio features between adjacent frequency bands as color indices to characterize spectral shape from multi-band time series

tools31

timeseries-multiband-lomb-scargle-period

Estimate periodicity from irregularly sampled multi-band time series using the multiband Lomb-Scargle periodogram, then phase-fold observations

testing31

timeseries-multiband-tsfresh-fft

Apply tsfresh per-passband feature extraction with FFT coefficients to capture multi-band periodicity from irregular time series

testing31

timeseries-multimodal-trajectory-head

Single linear head that jointly predicts K candidate trajectories and K softmax confidences, sliced and reshaped for multimodal regression

tools31

cv-spatial-rect-train-val-split

Split large-image segmentation data into train/val by spatial rectangle regions with border buffer exclusion to prevent patch leakage

data-ai31

tabular-multi-output-auxiliary-targets

Neural network with multiple output heads for main target plus auxiliary targets, improving representation learning via shared layers.

data-ai31

nlp-multi-retriever-union-ensemble

Run multiple independent retrieve-rerank pipelines and union-merge their predicted IDs per query via explode-groupby-unique

devops31

tabular-nelder-mead-threshold-optimization

Use scipy Nelder-Mead simplex to optimize regression-to-ordinal thresholds maximizing quadratic weighted kappa on OOF predictions

testing31

tabular-popularity-fallback-recommendation

Fills unfilled recommendation slots with globally popular recent items to handle cold-start users and short lists.

tools31

tabular-ppo-gym-wrapper-kaggle-env

Wrap a Kaggle competitive game environment as an OpenAI Gym env with continuous action space for training PPO agents via stable-baselines3

data-ai31

tabular-predicted-class-mass-reweighting

Post-hoc rescales ensemble probabilities by the inverse of each class's estimated total mass across the test set, correcting for class imbalance in predictions.

testing31
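
One possible reading of the class-mass reweighting step, sketched with nested lists: divide each class probability by that class's summed probability over the test set, then renormalize each row. The function name is illustrative, and the sketch assumes every class has positive total mass.

```python
def reweight_by_class_mass(probs):
    """Rescale per-row class probabilities by inverse total class mass.

    probs: list of rows, each a list of class probabilities.
    """
    n_classes = len(probs[0])
    # Estimated total mass of each class across the whole test set.
    mass = [sum(row[c] for row in probs) for c in range(n_classes)]
    reweighted = []
    for row in probs:
        scaled = [p / m for p, m in zip(row, mass)]
        z = sum(scaled)
        reweighted.append([s / z for s in scaled])  # rows sum to 1 again
    return reweighted
```

Classes that the ensemble predicts heavily overall are pushed down, and under-predicted classes are pushed up.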

tabular-ngram-composite-features

Creates bi-gram and tri-gram composite categorical features by concatenating top categorical columns, then target-encodes the composites. Captures interaction effects that tree models may miss.

development31

cv-numeric-categorical-auto-detection

Auto-detect whether a generated data series is numeric or categorical by measuring the fraction of digit characters in the concatenated values

development31

tabular-oof-meta-features

Generates out-of-fold predictions from auxiliary models and uses them as input features for the final model.

data-ai31

cv-open-set-distance-cutoff

Assign an unknown/novel class when all nearest-neighbor distances exceed a tuned cutoff threshold for open-set recognition

tools31
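
A minimal sketch of the open-set cutoff rule, assuming Euclidean distance over small embedding lists. The gallery layout, the `unknown` label, and the function name are illustrative; the cutoff itself would be tuned on validation data.

```python
import math

def classify_open_set(query, gallery, cutoff, unknown="unknown"):
    """Return the nearest gallery label, or `unknown` if it is too far away.

    gallery: list of (embedding, label) pairs.
    If the nearest neighbor is beyond the cutoff, all neighbors are.
    """
    best_label, best_dist = unknown, math.inf
    for emb, label in gallery:
        d = math.dist(query, emb)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= cutoff else unknown
```

Checking only the minimum distance is sufficient, because it exceeds the cutoff exactly when every distance does.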

cv-per-class-score-threshold

Apply class-specific confidence thresholds by inferring the dominant class per image and indexing into a per-class threshold array

testing31

tabular-optuna-lgbm-tuning

Uses Optuna with TPE sampler for Bayesian hyperparameter optimization of LightGBM, searching key params like num_leaves, depth, and learning rate.

tools31

tabular-outlier-rate-label-encoding

Encode a categorical column by replacing each category with the per-category outlier rate (mean of a binary outlier flag), out-of-fold to avoid leakage — a target-aware encoding tuned to long-tail / sentinel-target problems where a binary classifier signal is more useful than the raw regression mean

development31

cv-pairwise-distance-proximity-filter

Compute Euclidean distance between entity pairs from tracking data and filter out pairs beyond a threshold to reduce inference candidates

tools31

tabular-pearson-correlation-loss

Uses negative row-wise Pearson correlation as a differentiable loss function for multi-output regression, directly optimizing the competition metric.

tools31

tabular-per-fold-threshold-voting-ensemble

Binarizes each CV fold's predictions using its own optimized threshold, then majority-votes across folds instead of averaging raw probabilities.

data-ai31
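
The per-fold threshold vote can be sketched as below. Each fold's probabilities are binarized with that fold's own threshold before voting; sending ties to the negative class is an assumption of this sketch.

```python
def threshold_vote(fold_probs, fold_thresholds):
    """Majority vote over per-fold binarized predictions.

    fold_probs: one list of probabilities per fold, all the same length.
    fold_thresholds: one optimized threshold per fold.
    """
    n_folds = len(fold_probs)
    n_samples = len(fold_probs[0])
    votes = [
        sum(fold_probs[f][i] >= fold_thresholds[f] for f in range(n_folds))
        for i in range(n_samples)
    ]
    # Positive only with a strict majority; ties go to 0 (an assumption).
    return [int(2 * v > n_folds) for v in votes]
```

Unlike probability averaging, this keeps folds with differently calibrated outputs from dominating the blend.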

timeseries-periodogram-seasonality-detection

Use scipy periodogram to identify dominant seasonal frequencies in a time series before selecting Fourier feature orders or ARIMA seasonal parameters

testing31

tabular-per-target-nan-mask-training

Trains independent models per target by masking NaN labels, enabling multi-output regression on datasets where each target has different coverage.

data-ai31

tabular-per-type-model-training

Trains separate models for each discrete category (e.g., molecule type, product class) to capture type-specific patterns.

development31

tabular-phase-based-strategy-cycling

Divide a game into repeating phases (attack, mine, spawn) with turn-modular gating so the agent cycles between aggressive and economic behavior

tools31

tabular-type-weighted-covisitation-matrix

Build item co-visitation matrix from session pairs within a time window, weighting by interaction type (click/cart/order) via GPU self-join

tools31

cv-prediction-map-stitching-averaging

Stitch overlapping tile predictions into a full-resolution output by accumulating probabilities and dividing by per-pixel overlap counts

data-ai31

tabular-rectangular-flight-plan-encoding

Encode closed rectangular patrol routes as compact direction-distance strings for fleet pathfinding on toroidal game grids

development31

tabular-recursive-feature-elimination

Uses RFE with a tree estimator to iteratively remove least important features, selecting an optimal compact feature set.

tools31

tabular-prior-rebalancing-oversampling

Rebalances training data by oversampling the majority class to match a known test-set class prior, reducing prediction miscalibration.

testing31

cv-progressive-dropout-unet

Apply lower dropout in shallow/final U-Net layers and higher dropout in deep layers to preserve spatial detail while regularizing abstract features

data-ai31

llm-prompt-variant-ensemble

Generate multiple LLM responses using diverse system prompt variants to increase reasoning diversity for self-consistency voting

data-ai31

tabular-rank-averaging-ensemble

Ensembles multiple model predictions by converting to ranks, averaging, and normalizing back to [0,1].

data-ai31
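
A dependency-free sketch of rank averaging, assuming each model's predictions are a list in the same row order. Ties receive the mean of their rank positions, and min-max scaling is used for the final normalization; both choices are assumptions rather than details stated in the listing.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def rank_average(model_preds):
    """Average per-model ranks and rescale the result to [0, 1]."""
    n = len(model_preds[0])
    per_model = [average_ranks(p) for p in model_preds]
    mean_rank = [sum(r[i] for r in per_model) / len(per_model) for i in range(n)]
    lo, hi = min(mean_rank), max(mean_rank)
    if hi == lo:
        return [0.5] * n  # all rows tied; no ordering information
    return [(r - lo) / (hi - lo) for r in mean_rank]
```

Working in rank space makes the blend insensitive to each model's output scale, which suits rank-based metrics such as AUC.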

tabular-rank-calibrated-blending

Blends predictions from multiple models by converting to ranks, weighting, and calibrating back to probabilities via rank-group means from a reference model. Ensures monotonic calibrated output.

data-ai31

tabular-recency-weighted-candidate-generation

Generates recommendation candidates by ranking a customer's purchase history by frequency and recency within a recent window.

tools31

timeseries-rolling-refit-arima-forecast

Walk-forward validation for ARIMA by refitting on history at each step, forecasting one step ahead, then appending the true observation — produces an honest one-step error distribution that mirrors nightly-retrained production forecasters

data-ai31

timeseries-recursive-multistep-forecasting

Forecast a multi-step horizon by predicting one day ahead, writing the prediction back into the panel as the new "actual", recomputing all lag and rolling features that depend on it, then predicting the next day — turns a one-step LightGBM regressor into a 28-day forecaster without changing the model

data-ai31

tabular-regression-to-ordinal-thresholding

Converts regression predictions to ordinal classes by optimizing bin thresholds to maximize Quadratic Weighted Kappa.

tools31

tabular-regularized-qda-classifier

Use QuadraticDiscriminantAnalysis with regularization for binary classification on data with Gaussian cluster structure

data-ai31

tabular-relative-deviation-features

Computes differences and ratios between group-level aggregates and raw values to capture how each sample deviates from its group.

development31

tabular-ridge-xgb-stacking

Two-stage stacking where Ridge regression on OHE+scaled features produces OOF predictions fed as an extra feature to XGBoost, letting the tree model correct non-linear residuals on top of captured linear patterns.

data-ai31

cv-siamese-pairwise-comparison-head

Siamese network head that compares two embeddings via element-wise multiply, add, abs-diff, and squared-diff features for verification tasks

data-ai31

timeseries-sigma-clip-outlier-masking

Detect and mask outlier data points using iterative sigma-clipping on reference frames or calibration data

tools31
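
Iterative sigma-clipping might be sketched like this. Using the median as the center, the population standard deviation as the spread, and the default `n_sigma` and `max_iter` values are all assumptions chosen for illustration.

```python
import statistics

def sigma_clip_mask(values, n_sigma=3.0, max_iter=5):
    """Return a list of booleans, True where a point is flagged as an outlier."""
    keep = [True] * len(values)
    for _ in range(max_iter):
        kept = [v for v, k in zip(values, keep) if k]
        if len(kept) < 2:
            break
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)
        if spread == 0:
            break
        # Re-test only points that are still kept; clipped points stay clipped.
        new_keep = [
            k and abs(v - center) <= n_sigma * spread
            for v, k in zip(values, keep)
        ]
        if new_keep == keep:
            break  # converged: no new outliers this round
        keep = new_keep
    return [not k for k in keep]
```

Recomputing the statistics after each round matters because a strong outlier inflates the spread and can shield weaker ones on the first pass.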

cv-sigmoid-normalized-rmse

Sigmoid-transformed normalized RMSE that maps error from [0,inf) to a bounded (0,1] similarity score using R2-score ratio

tools31

cv-rotation-tta-segmentation

Test-time augmentation via 4 rotation angles (0/90/180/270), applying inverse rotation to each prediction before averaging

testing31

timeseries-scaled-nan-sentinel

Encode missing sensor data with a per-modality sentinel value that survives standardization and remains detectable after scaling

development31

timeseries-transit-depth-polynomial-optimization

Estimate event depth by optimizing a scalar scaling factor on the in-event segment that minimizes polynomial baseline residual across the full signal

tools31

tabular-season-phase-labeling

Map calendar dates to categorical season phases (offseason, preseason, regular, postseason) using np.select with boundary date conditions

development31

llm-self-consistency-majority-vote

Aggregate multiple LLM reasoning attempts via majority voting with random jitter tiebreaking and validity filtering

data-ai31
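
A short sketch of majority voting with validity filtering and jittered tie-breaking. The `is_valid` predicate, the `rng` parameter, and the jitter scale of 0.5 are illustrative assumptions; the jitter stays below 1 so it can only decide exact ties.

```python
import random
from collections import Counter

def majority_vote(answers, is_valid=lambda a: a is not None, rng=None):
    """Return the most frequent valid answer, or None if none are valid.

    Ties are broken by adding a random jitter in [0, 0.5) to each count,
    so a real difference in vote counts is never overturned.
    """
    rng = rng or random.Random()
    counts = Counter(a for a in answers if is_valid(a))
    if not counts:
        return None
    return max(counts, key=lambda a: counts[a] + rng.random() * 0.5)
```

Passing a seeded `random.Random` makes the tie-breaking reproducible across runs.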

timeseries-sensor-phase-visualization

EDA visualization of multi-modal sensor data with axvspan shading for labeled behavioral phases and contiguous span detection

data-ai31

tabular-smiles-randomization-augmentation

Augments molecular datasets by generating multiple randomized SMILES strings for the same molecule, exploiting SMILES non-uniqueness to multiply training samples.

data-ai31

timeseries-snap-event-interaction-features

Build per-state SNAP / event-flag interaction features by multiplying the binary flag with the sales and revenue columns segmented by state, capturing the demand uplift on government-benefit days that affects only specific geographies and product categories

development31

tabular-tfidf-svd-dense-text-features

Compress TF-IDF sparse text vectors into a handful of dense TruncatedSVD components so GBDTs can consume free-text fields as plain tabular columns

data-ai31

tabular-tfidf-weighted-category-counts

Convert per-group categorical event counts into TF-IDF-style features using log(1+tf/total) * log(N/df)

development31
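
The weighting formula above, `log(1+tf/total) * log(N/df)`, can be sketched directly. Here `tf` is a category's count within a group, `total` is the group's event count, `N` is the number of groups, and `df` is the number of groups containing the category; the input layout of `(group_id, category)` pairs is an assumption.

```python
import math
from collections import Counter, defaultdict

def tfidf_category_features(events):
    """Return {group_id: {category: weight}} for (group_id, category) pairs.

    weight = log(1 + tf / total) * log(N / df)
    """
    tf = defaultdict(Counter)
    for group, cat in events:
        tf[group][cat] += 1
    n_groups = len(tf)
    # df: in how many groups does each category appear at least once?
    df = Counter(cat for counts in tf.values() for cat in counts)
    features = {}
    for group, counts in tf.items():
        total = sum(counts.values())
        features[group] = {
            cat: math.log(1 + c / total) * math.log(n_groups / df[cat])
            for cat, c in counts.items()
        }
    return features
```

A category that occurs in every group gets weight zero, since `log(N/df)` vanishes when `df` equals `N`.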

cv-spectral-index-segmentation

Compute normalized spectral band ratios (NDWI, CCCI, NDVI) from multispectral imagery and threshold for binary segmentation of water, vegetation, or other targets

testing31

tabular-squeeze-compact-local-search

Three-stage packing refinement — uniform squeeze toward centroid, greedy compaction per object, then multi-directional local search — to tighten solutions after metaheuristic optimization

tools31

timeseries-stateful-chunk-inference

Processes long sequences in fixed-size chunks while carrying RNN hidden state across chunks for memory-efficient inference.

testing31

llm-stopword-priority-sorting

Initialize text ordering by placing stopwords first then content words, producing low-perplexity starting points for combinatorial search over word permutations

testing31

timeseries-store-profile-hierarchical-clustering

Re-cluster retail stores by scale-normalized weekday/dayoff mean+std profiles using Ward agglomerative clustering, replacing vendor-supplied "type/cluster" labels that correlate with store size instead of demand shape

testing31

tabular-streaming-prediction-api

Online inference pattern that processes test batches sequentially, updating feature dictionaries incrementally for time-series prediction APIs.

development31

cv-stroke-normalize-simplify-pipeline

Normalize raw stroke coordinates to 0-255 range, resample at uniform arc-length spacing, then apply Ramer-Douglas-Peucker simplification

testing31

cv-stroke-temporal-color-rendering

Render stroke sequences to grayscale images with temporal intensity encoding where earlier strokes are brighter and later strokes fade to encode drawing order

development31

tabular-strtree-spatial-index-collision

Use Shapely STRtree spatial index for O(n log n) polygon overlap detection instead of brute-force O(n^2) pairwise checks

development31

llm-swarm-tactic-diversity

Assign each new agent a different directional rotation pattern from a set of permutations to ensure swarm coverage diversity across the map

data-ai31

tabular-successive-groupby-aggregates

Build hierarchical features for transaction panels by aggregating twice — first groupby (entity, sub-key) to get a per-(entity, sub-key) summary, then groupby (entity) on those summaries to compute mean/min/max/std across the sub-keys, capturing the *distribution* of per-customer behavior rather than a single flat mean

development31

tabular-tabnet-sklearn-wrapper

Wrap PyTorch TabNet in a scikit-learn BaseEstimator with built-in imputation and early stopping for use in VotingRegressor ensembles

testing31

tabular-tabpfn-small-dataset-ensemble

Ensembles TabPFN (a prior-fitted Bayesian transformer for small tabular data) with XGBoost, averaging probabilities for stronger predictions on datasets under 1000 rows.

data-ai31

timeseries-temporal-frame-binning

Reduce temporal resolution by averaging consecutive frame blocks to improve SNR and compress high-cadence data

data-ai31
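
Temporal binning can be sketched as block averaging over consecutive frames. The sketch assumes each frame is a flattened list of pixel values and drops a trailing partial block; padding or keeping the remainder would be equally valid choices.

```python
def bin_frames(frames, bin_size):
    """Average each consecutive block of bin_size frames.

    frames: list of equal-length lists, one flattened frame per time step.
    A trailing partial block is dropped (an assumption of this sketch).
    """
    n_bins = len(frames) // bin_size
    binned = []
    for b in range(n_bins):
        block = frames[b * bin_size:(b + 1) * bin_size]
        # zip(*block) walks the same pixel across all frames in the block.
        binned.append([sum(px) / bin_size for px in zip(*block)])
    return binned
```

For uncorrelated noise, averaging `bin_size` frames reduces the noise standard deviation by roughly the square root of `bin_size`, which is where the SNR gain comes from.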

tabular-convex-hull-bbox-rotation

Minimize axis-aligned bounding box side length by finding the optimal rotation angle over convex hull vertices using bounded scalar optimization

tools31

cv-frame-prediction-averaging

Average per-frame sigmoid predictions across sampled video frames to produce a stable video-level classification probability

data-ai31

cv-lateralized-label-flip-tta-disable

Disable horizontal-flip augmentation (both train-time and TTA) when label columns encode left/right anatomy — flipping silently corrupts the targets because "Left ICA" must map to "Right ICA" after a flip, not stay as "Left ICA"

development24

cv-levenshtein-distance-metric

Evaluates image-to-sequence models using mean Levenshtein edit distance between predicted and ground-truth strings.

Adoption

wenmin-wu

timeseries-quantile-ratio-scaling

llm-sliding-window-permutation-search

llm-perplexity-prompt-ranking

llm-greedy-word-reinsert-search

llm-batched-perplexity-scoring

cv-slice-padding-augmented-duplicates

cv-batch-all-contrastive-loss

nlp-anchor-grouped-validation

tabular-distribution-matching-postprocess

tabular-frequency-encoding

tabular-group-kfold-leak-prevention

tabular-group-shuffle-split

tabular-lgbm-dart-boosting

tabular-log-odds-fold-averaging

tabular-null-importance-feature-selection

tabular-pairwise-te-logit-stacking

tabular-pseudo-labeling

tabular-polynomial-interaction-features

tabular-row-aggregate-features

tabular-row-wise-target-normalization

tabular-spatial-distance-aggregation

tabular-synthetic-sample-detection

timeseries-prediction-smoothing-lpf

tabular-weather-ordinal-encoding

tabular-weighted-recall-multi-objective-metric

timeseries-cnn-transformer-multimodal-fusion

timeseries-dilated-conv-residual-gru

timeseries-event-peak-detection

timeseries-k-mode-gaussian-nll-loss

timeseries-mean-residual-decomposition

timeseries-mixup-sequence-augmentation

timeseries-se-residual-1d-cnn

tabular-dtype-preset-csv-load

timeseries-scaled-pinball-loss

timeseries-ratio-target-for-smape

timeseries-retroactive-outlier-rescaling

tabular-simulated-annealing-multi-operator

timeseries-density-to-count-roundtrip

tabular-rmsle-keras-metric

cv-cnn-rnn-video-classification

cv-deterministic-hash-partitioning

cv-clip-interrogator-captioning

cv-embedding-knn-regression

tabular-opponent-avoidance-scoring

tabular-featureunion-field-dispatch

tabular-brand-name-recovery-from-title

tabular-ball-landing-prediction

nlp-lda-topic-modeling

cv-sentence-transformer-target-encoding

cv-pairwise-tracking-feature-merge

cv-percentile-contrast-stretch

cv-mask-to-polygon-contour-hierarchy

cv-3d-center-crop-zero-pad

timeseries-tof-spatial-region-pooling

timeseries-sensor-modality-dropout

timeseries-poisson-lgbm-bias-corrected-ensemble

timeseries-quaternion-angular-velocity

timeseries-neighbor-average-nan-interpolation

timeseries-multi-scale-rolling-features

timeseries-multi-gap-lag-diff-features

timeseries-locale-scoped-holiday-flag

timeseries-inverse-variance-channel-weighting

timeseries-gradient-transit-phase-detection

timeseries-gradient-event-boundary-detection

timeseries-event-anchored-frame-sync

timeseries-detector-calibration-pipeline

timeseries-correlated-double-sampling

tabular-yearly-partitioned-groupby

tabular-xgb-gpu-batch-iterator

tabular-vote-ensemble-outer-join

tabular-typed-panel-aggregation

tabular-weighted-gini-top-recall-metric

tabular-two-level-hierarchical-aggregation

tabular-temporal-session-aggregation

tabular-tabular-to-image-cnn

tabular-svd-target-reconstruction

tabular-sparse-dense-hstack-lgbm

tabular-regression-to-cdf-smoothing