plugins/kaggle-master/skills/kaggle-environment/SKILL.md
Kaggle runtime environment, paths, accelerators, and reproducibility. PROACTIVELY activate for: (1) `/kaggle/input` path errors, (2) `/kaggle/working` output placement, (3) local vs Kaggle notebook behavior, (4) GPU/TPU/accelerator selection, (5) internet enablement, (6) package/version pinning, (7) memory cleanup and timeout issues, (8) DEBUG flags for fast runs, (9) Kaggle Secrets usage guidance, (10) quota-consuming runtime settings. Provides: path conventions, runtime checklist, accelerator IDs, reproducibility patterns, and resource safeguards.
npx skillsauth add JosiahSiegel/claude-plugin-marketplace kaggle-environment
Use this skill for Kaggle hosted runtime behavior: filesystem paths, accelerators, internet access, dependency drift, output locations, reproducibility, and resource limits. Distinguish local development assumptions from Kaggle execution assumptions.
| Path | Use |
|---|---|
| /kaggle/input | Read-only mounted datasets, competition data, models, and notebook sources |
| /kaggle/working | Writable working directory; save submissions and artifacts here |
| Local project folder | Source and metadata folder used by CLI push/pull |
Never write final outputs only under /kaggle/input; it is read-only. For submissions or artifacts, write to /kaggle/working/<file> and verify the file appears in notebook outputs.
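A minimal shell sketch of that rule follows. It assumes a hypothetical helper, `output_path`, and a placeholder `submission.csv` with made-up rows. The helper refuses any directory under `/kaggle/input`, defaults to `/kaggle/working`, and the usage lines check that the written file is non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<dir>/<name>" for a writable output location.
# Defaults to /kaggle/working and refuses the read-only /kaggle/input mount.
output_path() {
  local name="$1" dir="${2:-/kaggle/working}"
  case "$dir" in
    /kaggle/input|/kaggle/input/*)
      echo "refusing to write under read-only $dir" >&2
      return 1
      ;;
  esac
  mkdir -p "$dir" || return 1
  printf '%s/%s\n' "${dir%/}" "$name"
}

# Usage on Kaggle (placeholder rows, not real predictions):
# sub="$(output_path submission.csv)" && printf 'id,target\n1,0.5\n' > "$sub"
# test -s "$sub" && echo "wrote $sub"
```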
kaggle kernels push -p <folder> --accelerator <ID> can request accelerators. Valid IDs include NvidiaTeslaP100, NvidiaTeslaT4, NvidiaTeslaT4Highmem, NvidiaTeslaA100, NvidiaL4, NvidiaL4X1, NvidiaH100, NvidiaRtxPro6000, TpuV38, Tpu1VmV38, TpuV5E8, and TpuV6E8. Choose the smallest appropriate accelerator first, and warn that high-end GPUs/TPUs plus long timeouts can consume quotas.
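The following sketch checks an ID against the list above before anything is pushed. It only prints the push command and does not run it. The IDs are copied from the list and may drift, so re-check them against current Kaggle CLI help. The folder `./my-kernel` and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Accelerator IDs copied from the list above; verify against current CLI help before relying on them.
is_valid_accelerator() {
  case "$1" in
    NvidiaTeslaP100|NvidiaTeslaT4|NvidiaTeslaT4Highmem|NvidiaTeslaA100) return 0 ;;
    NvidiaL4|NvidiaL4X1|NvidiaH100|NvidiaRtxPro6000) return 0 ;;
    TpuV38|Tpu1VmV38|TpuV5E8|TpuV6E8) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical dry run: print the push command instead of executing it,
# so a high-end accelerator is never requested by accident.
push_cmd() {
  local folder="$1" accel="$2"
  is_valid_accelerator "$accel" || { echo "unknown accelerator: $accel" >&2; return 1; }
  printf 'kaggle kernels push -p %s --accelerator %s\n' "$folder" "$accel"
}

# push_cmd ./my-kernel NvidiaTeslaT4   # review the printed command, then run it yourself
```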
Use a DEBUG flag for fast runs on small samples with short epochs, and keep outputs under /kaggle/working.
enable_internet defaults to false in kernel metadata. Enable it only when the workflow and competition rules allow it. For sensitive values, advise Kaggle Secrets through the Kaggle UI. Do not embed tokens in notebooks, metadata, command history, or uploaded artifacts. Do not claim public API support for secrets administration.
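This sketch shows one way to wire a DEBUG switch. It defaults to a quick smoke run and changes only the data size and epoch count, not the modeling code. The values 1000 rows, 1 epoch, and 20 epochs are placeholders, and `train.py` with its flags is a hypothetical entry point.

```bash
#!/usr/bin/env bash
# DEBUG=1 (the default here) requests a fast smoke run; set DEBUG=0 for the full run.
DEBUG="${DEBUG:-1}"
if [ "$DEBUG" = "1" ]; then
  SAMPLE_ROWS=1000   # small sample for quick iteration
  EPOCHS=1           # single short epoch
else
  SAMPLE_ROWS=0      # hypothetical convention: 0 means "use every row"
  EPOCHS=20          # placeholder full-run epoch count
fi
echo "DEBUG=$DEBUG SAMPLE_ROWS=$SAMPLE_ROWS EPOCHS=$EPOCHS"

# Hypothetical training entry point; the same code path runs in both modes:
# python train.py --sample-rows "$SAMPLE_ROWS" --epochs "$EPOCHS"
```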
Do not assume UserSecretsClient().get_secret() works in Kaggle CLI-pushed committed or batch runs. It can return HTTP 400 even when the secret exists and is attached in the Kaggle UI. kernel-metadata.json has no supported secrets, environment, or environment-variable field; do not invent one.
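The sketch below writes a kernel-metadata.json that keeps internet off and contains no secrets or environment field. The field names follow the template that `kaggle kernels init -p <folder>` generates, as far as I recall; compare against a freshly generated template before relying on them. The slug, title, code file, and `./my-kernel` folder are placeholders. The title is chosen to match the slug because a mismatch can draw a CLI warning.

```bash
#!/usr/bin/env bash
# Placeholder folder, slug, and file names; replace with your own values.
KERNEL_DIR="${KERNEL_DIR:-./my-kernel}"
mkdir -p "$KERNEL_DIR"

# Internet stays off; there is deliberately no secrets or environment-variable field.
cat > "$KERNEL_DIR/kernel-metadata.json" <<'EOF'
{
  "id": "your-username/your-kernel-slug",
  "title": "Your Kernel Slug",
  "code_file": "notebook.ipynb",
  "language": "python",
  "kernel_type": "notebook",
  "is_private": true,
  "enable_gpu": false,
  "enable_tpu": false,
  "enable_internet": false,
  "dataset_sources": [],
  "competition_sources": [],
  "kernel_sources": []
}
EOF
echo "wrote $KERNEL_DIR/kernel-metadata.json"
```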
When a workflow can avoid secrets, prefer a zero-auth design. For temporary public tunnels from Kaggle, Cloudflare TryCloudflare quick tunnels avoid ngrok tokens:
```bash
cloudflared tunnel --url http://127.0.0.1:11434
```
Validate services from inside Kaggle against localhost, not the public tunnel hostname. Cloudflare tunnel hostnames may not resolve from Kaggle's network:
```bash
curl -fsS http://127.0.0.1:11434
```
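A service started in a notebook cell may need time before it answers, so a readiness check usually retries. This is a minimal sketch. It assumes a hypothetical `wait_for_local_service` helper, a 5-second per-request cap, and a 2-second pause between attempts. It only probes 127.0.0.1 and never the public tunnel hostname.

```bash
#!/usr/bin/env bash
# Poll a local service until it answers; probe 127.0.0.1, not the public tunnel host.
wait_for_local_service() {
  local url="${1:-http://127.0.0.1:11434}" tries="${2:-30}" i
  for i in $(seq 1 "$tries"); do
    if curl -fsS --max-time 5 "$url" > /dev/null; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    sleep 2
  done
  echo "not reachable after $tries attempt(s): $url" >&2
  return 1
}

# wait_for_local_service http://127.0.0.1:11434 60
```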
Write discovered public URLs to /kaggle/working/<name>.txt so external tooling can retrieve them with kaggle kernels output even when kaggle kernels logs is blank during keepalive cells.
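The following sketch captures the quick-tunnel URL from cloudflared's log and persists it. It assumes that the log contains a URL of the form `https://<words>.trycloudflare.com`; the regex is an assumption about that format. The log path `/tmp/cloudflared.log`, the file name `tunnel_url.txt`, and both functions are hypothetical.

```bash
#!/usr/bin/env bash
# Pull the first TryCloudflare URL out of a cloudflared log (assumed URL shape).
extract_tunnel_url() {
  grep -Eo 'https://[a-z0-9-]+\.trycloudflare\.com' "$1" 2>/dev/null | head -n 1
}

# Poll the log until a URL appears (or give up), then write it to a file.
save_tunnel_url() {
  local log_file="$1" out_file="$2" tries="${3:-30}" url=""
  for _ in $(seq 1 "$tries"); do
    url="$(extract_tunnel_url "$log_file")"
    [ -n "$url" ] && break
    sleep 2
  done
  [ -n "$url" ] || { echo "no tunnel URL found in $log_file" >&2; return 1; }
  printf '%s\n' "$url" > "$out_file"
  echo "saved $url to $out_file"
}

# Usage on Kaggle (requires cloudflared to be installed):
# cloudflared tunnel --url http://127.0.0.1:11434 > /tmp/cloudflared.log 2>&1 &
# save_tunnel_url /tmp/cloudflared.log /kaggle/working/tunnel_url.txt
```

Because the URL lands in /kaggle/working, it is retrievable through kaggle kernels output even while a keepalive cell keeps the logs quiet.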
Kaggle base images may include third-party PPA sources that intermittently fail DNS resolution and break apt-get update. Before installing packages, consider disabling PPA source files and using retries with short timeouts:
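```bash
# apt reads only files directly inside sources.list.d, so moving them into a subfolder disables them.
sudo mkdir -p /etc/apt/sources.list.d/disabled
# -maxdepth 1 stops find from revisiting files it has already moved into disabled/.
sudo find /etc/apt/sources.list.d -maxdepth 1 -type f -name "*.list" -print -exec sudo mv {} /etc/apt/sources.list.d/disabled/ \;
sudo apt-get update \
  -o Acquire::Retries=5 \
  -o Acquire::http::Timeout=20 \
  -o Acquire::https::Timeout=20
```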
Keep package installation minimal because notebook startup networking can be intermittent.
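Package installs can also be wrapped in a short retry loop. This is a minimal sketch that assumes a hypothetical `retry` helper with a fixed 3-second pause. The commented install line uses `zstd` purely as a placeholder package.

```bash
#!/usr/bin/env bash
# Run a command up to N times, pausing briefly between failed attempts.
retry() {
  local attempts="$1" n=1
  shift
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "failed after $n attempt(s): $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 3
  done
}

# Placeholder package; install only what the notebook actually needs.
# retry 3 sudo apt-get install -y --no-install-recommends zstd
```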
When code works locally but fails on Kaggle, check mounted source names, case sensitivity, missing package versions, write paths, internet availability, accelerator availability, and memory. Add path-detection wrappers only when they keep execution deterministic; avoid hidden environment-specific branches that alter modeling logic.
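The following sketch shows a path-detection wrapper that changes only where inputs are read from and fails fast with a directory listing when a file is missing, for example because of a source-name or case mismatch. The local fallback `./input`, the `LOCAL_INPUT_ROOT` variable, `my-competition`, and `train.csv` are all hypothetical.

```bash
#!/usr/bin/env bash
# Resolve the input root: the Kaggle mount if present, else a local folder.
if [ -d /kaggle/input ]; then
  INPUT_ROOT=/kaggle/input
else
  INPUT_ROOT="${LOCAL_INPUT_ROOT:-./input}"
fi

# Fail fast and show what is actually there (helps spot name or case mismatches).
require_file() {
  local path="$1" dir
  dir="$(dirname "$path")"
  if [ -f "$path" ]; then
    return 0
  fi
  echo "missing: $path" >&2
  echo "available entries in $dir:" >&2
  ls -1 "$dir" >&2 2>/dev/null || echo "(directory $dir not found)" >&2
  return 1
}

# require_file "$INPUT_ROOT/my-competition/train.csv"
```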
Do not claim public API support for Docker image selection, quota management, scheduler administration, collaborator administration, or cell-level editing. Provide UI-based caveats when those tasks are outside public CLI/API scope.