skills/cv/tensorrt-fp16-compilation/SKILL.md
Compiles a PyTorch model to a TensorRT FP16 engine via torch_tensorrt for 2-5x inference speedup, saved as reusable TorchScript.
npx skillsauth add wenmin-wu/ds-skills cv-tensorrt-fp16-compilation
Kaggle inference kernels have strict time limits (2–9 hours for large datasets). TensorRT compiles a PyTorch model into an optimized GPU engine with FP16 precision, typically achieving a 2–5x speedup over vanilla PyTorch. The compiled model is saved as TorchScript and loaded for inference without recompilation. This works with any model that can run a traced forward pass, such as CNNs and vision transformers.
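The 2–5x figure depends on the model, GPU and batch size, so it is worth measuring on your own setup before relying on it. A minimal stdlib-only sketch follows; `benchmark` and `speedup` are hypothetical helpers, not part of torch_tensorrt. CUDA launches kernels asynchronously, so the `sync` argument should be something like `torch.cuda.synchronize`; otherwise the timings mostly measure launch overhead.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock seconds per call of `fn`.

    `sync` should block until queued GPU work has finished
    (e.g. torch.cuda.synchronize); it defaults to a no-op.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # absorb one-time costs (lazy init, caching)
        fn()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the GPU so the timing covers the real work
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def speedup(baseline_fn: Callable[[], object],
            optimized_fn: Callable[[], object], **kwargs) -> float:
    """Baseline time divided by optimized time (> 1 means optimized is faster)."""
    return benchmark(baseline_fn, **kwargs) / benchmark(optimized_fn, **kwargs)
```

For example, inside `torch.no_grad()` one might call `speedup(lambda: model(x), lambda: trt_model(x), sync=torch.cuda.synchronize)`, with `x` a half-precision CUDA tensor of the compiled input shape. Because `model` is already converted with `.half()`, this compares PyTorch FP16 against the TensorRT engine rather than against FP32.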
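```python
import torch
import torch_tensorrt  # also registers the TensorRT runtime ops that torch.jit.load needs

# Load trained model (MyModel is your own model class)
model = MyModel()
model.load_state_dict(torch.load('best_model.pth'))
model.eval().cuda().half()

# Merge BatchNorm for better optimization (if applicable);
# merge_bn() is a custom method on the model class, not a PyTorch API
if hasattr(model, 'merge_bn'):
    model.merge_bn()

# Compile to TensorRT FP16
batch_size = 8
trt_model = torch_tensorrt.compile(
    model,
    ir="ts",  # TorchScript frontend, so the result can be saved with torch.jit.save
    inputs=[
        torch_tensorrt.Input(
            [batch_size, 1, 1024, 512],  # static input shape; first dim is the batch
            dtype=torch.half,
        )
    ],
    enabled_precisions={torch.half},
    workspace_size=1 << 32,  # 4 GiB workspace
    require_full_compilation=True,
)

# Save as TorchScript for reuse
torch.jit.save(trt_model, 'model.trt_fp16.ts')

# Load and run (no recompilation needed). In a fresh process, import
# torch_tensorrt before loading. `batch` must match the compiled shape.
trt_model = torch.jit.load('model.trt_fp16.ts')
with torch.no_grad():
    output = trt_model(batch.cuda().half())
```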
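The `merge_bn()` method belongs to the author's model class and is not shown. Assuming it folds each eval-mode BatchNorm into the layer before it (the usual meaning of merging BN for inference), the per-output-channel arithmetic is `scale_c = gamma_c / sqrt(var_c + eps)`, `W'_c = scale_c * W_c`, `b'_c = scale_c * (b_c - mean_c) + beta_c`. The sketch below shows this on plain Python lists for a layer whose output channel `c` is the dot product of `W[c]` with the input; `fold_bn`, `linear` and `batchnorm_eval` are hypothetical names for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fold_bn(
    weight: Sequence[Sequence[float]],  # weight[c] holds the weights of output channel c
    bias: Sequence[float],
    gamma: Sequence[float],
    beta: Sequence[float],
    running_mean: Sequence[float],
    running_var: Sequence[float],
    eps: float = 1e-5,
) -> Tuple[List[List[float]], List[float]]:
    """Fold an eval-mode BatchNorm into the preceding layer's weight and bias."""
    new_w, new_b = [], []
    for c in range(len(weight)):
        scale = gamma[c] / math.sqrt(running_var[c] + eps)
        new_w.append([w * scale for w in weight[c]])
        new_b.append((bias[c] - running_mean[c]) * scale + beta[c])
    return new_w, new_b


def linear(weight, bias, x):
    """y[c] = dot(weight[c], x) + bias[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm_eval(y, gamma, beta, running_mean, running_var, eps=1e-5):
    """Eval-mode BatchNorm: normalizes with running statistics, not batch statistics."""
    return [
        g * (v - m) / math.sqrt(var + eps) + bt
        for v, g, bt, m, var in zip(y, gamma, beta, running_mean, running_var)
    ]
```

For any input `x`, `linear(*fold_bn(W, b, gamma, beta, mean, var), x)` should match `batchnorm_eval(linear(W, b, x), gamma, beta, mean, var)` up to floating-point error, which is why the folded model can drop the BatchNorm layers entirely. In PyTorch itself, `torch.nn.utils.fusion.fuse_conv_bn_eval(conv, bn)` performs this fold for a convolution/BatchNorm pair.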
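- Compile with `torch_tensorrt.compile()`, specifying the input shape and FP16 precision.
- Save the compiled model as TorchScript (`.ts`).
- The 4 GB workspace (`1 << 32`) is safe for most GPUs; reduce it for smaller GPUs.
- If compilation fails on unsupported ops, set `require_full_compilation=False` so those ops fall back to PyTorch.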
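Because the engine is built for the fixed input shape given to `torch_tensorrt.Input` (`[8, 1, 1024, 512]` in the code above), a final partial batch typically has to be padded up to `batch_size`, with the outputs for the padding discarded. A framework-agnostic sketch, assuming padding by repeating the last real item (zeros would work as well); `fixed_size_batches` and `run_fixed_batch_model` are hypothetical helpers.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fixed_size_batches(items: Sequence[T], batch_size: int) -> Iterator[Tuple[List[T], int]]:
    """Yield (batch, n_valid) pairs where every batch has exactly batch_size items.

    A short final batch is padded by repeating its last real item; n_valid
    says how many leading outputs belong to real inputs.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        n_valid = len(batch)
        batch.extend([batch[-1]] * (batch_size - n_valid))  # pad to the compiled size
        yield batch, n_valid


def run_fixed_batch_model(model: Callable, items: Sequence[T], batch_size: int) -> list:
    """Call `model` only on full-size batches and drop outputs for padded items."""
    outputs = []
    for batch, n_valid in fixed_size_batches(items, batch_size):
        outputs.extend(model(batch)[:n_valid])
    return outputs
```

With the TensorRT model, `model` could be something like `lambda b: trt_model(torch.stack(b).cuda().half())`, where `items` is a list of `[1, 1024, 512]` tensors; slicing the output tensor with `[:n_valid]` drops the padded rows. Alternatively, `torch_tensorrt.Input` accepts `min_shape`, `opt_shape` and `max_shape` to build an engine with a dynamic batch dimension.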