plugins/kaggle-master/skills/datasets-models-sources/SKILL.md
Kaggle datasets, models, sources, and kagglehub workflows. PROACTIVELY activate for: (1) downloading datasets with kagglehub, (2) uploading datasets, (3) downloading or uploading Kaggle models, (4) competition_download, (5) notebook_output_download, (6) choosing Kaggle CLI vs kagglehub, (7) attaching dataset_sources, competition_sources, kernel_sources, or model_sources, (8) model artifact transfer, (9) source dependency cleanup, (10) kagglehub limitations for notebooks. Provides: dataset/model transfer patterns, source attachment guidance, tool selection, and limitation checks.
Use this skill for Kaggle datasets, models, source attachments, and kagglehub workflows. Separate transfer/download tasks from notebook lifecycle tasks: kagglehub is useful for data and model artifacts, but Kaggle CLI remains the notebook push/pull/status/delete tool.
| Need | Prefer |
|---|---|
| Create/edit/push/delete notebook | Kaggle CLI or Python API |
| Check notebook status/logs/files | Kaggle CLI or Python API |
| Download notebook outputs | Kaggle CLI or kagglehub.notebook_output_download |
| Download/load/upload datasets | kagglehub dataset functions or Kaggle CLI |
| Download/upload models | kagglehub model functions |
| Download competition data | kagglehub competition_download or Kaggle CLI |
| Manage metadata/source arrays | kernel-metadata.json plus Kaggle CLI push |
kagglehub supports dataset_download, dataset_load, dataset_upload, model_download, model_upload, competition_download, and notebook_output_download. It does not create, edit, push, delete, or administer Kaggle notebooks, and it does not manage kernel-metadata.json or notebook run status. Route those operations to notebook-lifecycle and kernel-metadata.
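The download functions listed above share one shape: each takes a resource handle and returns a local path. A minimal sketch of a dispatcher over them follows, assuming kagglehub is installed and credentials are configured. The `fetch` helper, the `DOWNLOADERS` table, and the kind labels are hypothetical names introduced for illustration and are not part of kagglehub.

```python
# Hypothetical mapping from a resource kind to the kagglehub download
# function named in the text. The kind labels are illustrative only.
DOWNLOADERS = {
    "dataset": "dataset_download",
    "model": "model_download",
    "competition": "competition_download",
    "notebook_output": "notebook_output_download",
}


def fetch(kind: str, handle: str) -> str:
    """Download one Kaggle resource and return the local path kagglehub reports."""
    if kind not in DOWNLOADERS:
        # Notebook push/pull/status/delete are deliberately absent:
        # those operations belong to the Kaggle CLI, not kagglehub.
        raise ValueError(f"kagglehub has no download function for {kind!r}")
    import kagglehub  # third-party; imported lazily so validation works offline

    return getattr(kagglehub, DOWNLOADERS[kind])(handle)
```

For example, `fetch("dataset", "owner/dataset-slug")` would download a dataset, where `owner/dataset-slug` is a placeholder handle. A request such as `fetch("notebook_push", ...)` fails fast instead of implying that kagglehub can manage notebooks.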
Use dataset_sources, competition_sources, kernel_sources, and model_sources in kernel-metadata.json for Kaggle-hosted notebook dependencies. Prefer explicit Kaggle resource identifiers and remove unused sources to reduce ambiguity, mount clutter, and rule-compliance risk.
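One way to keep the four source arrays explicit and free of stale entries is to regenerate them from the sources a notebook actually uses. The sketch below assumes the metadata is handled as a plain dictionary before being written to kernel-metadata.json. The `set_sources` helper and every identifier in the example are hypothetical placeholders.

```python
import json

SOURCE_KEYS = (
    "dataset_sources",
    "competition_sources",
    "kernel_sources",
    "model_sources",
)


def set_sources(metadata: dict, **sources: list) -> dict:
    """Return a copy of metadata whose source arrays hold only what is passed in."""
    unknown = set(sources) - set(SOURCE_KEYS)
    if unknown:
        raise ValueError(f"unknown source arrays: {sorted(unknown)}")
    updated = dict(metadata)
    for key in SOURCE_KEYS:
        # dict.fromkeys drops duplicates while keeping first-seen order;
        # arrays not mentioned by the caller are reset to empty.
        updated[key] = list(dict.fromkeys(sources.get(key, [])))
    return updated


# All identifiers below are placeholders, not real Kaggle resources.
example = set_sources(
    {"id": "owner/notebook-slug", "code_file": "notebook.ipynb"},
    dataset_sources=["owner/dataset-slug", "owner/dataset-slug"],
    competition_sources=["competition-slug"],
)
print(json.dumps(example, indent=2))
```

Resetting unmentioned arrays to empty is a deliberate choice here: a source that is no longer listed by the caller is dropped rather than silently carried forward. The resulting dictionary can be written to kernel-metadata.json and pushed with the Kaggle CLI.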
Read attached sources from input paths under /kaggle/input and write to output paths under /kaggle/working. For competition notebooks, attach competition data through metadata and write submissions to /kaggle/working. For reusable training outputs, download notebook outputs or upload model artifacts with kagglehub after checking license and rule constraints. For local experimentation, use kagglehub downloads to populate the local cache while keeping notebook code path-compatible with /kaggle/input.
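Path compatibility between Kaggle-hosted and local runs can be sketched as a pair of resolvers that prefer the Kaggle mount and fall back to a local location. This assumes an attached source appears at `/kaggle/input/<slug>`. The helper names, the `fetch_local` callback, and the `working` fallback directory are hypothetical.

```python
from pathlib import Path
from typing import Callable, Union

KAGGLE_INPUT = Path("/kaggle/input")
KAGGLE_WORKING = Path("/kaggle/working")
PathLike = Union[str, Path]


def resolve_input_dir(
    slug: str,
    fetch_local: Callable[[], PathLike],
    input_root: PathLike = KAGGLE_INPUT,
) -> Path:
    """Use the mounted source when present, otherwise a locally fetched copy."""
    mounted = Path(input_root) / slug
    if mounted.is_dir():
        return mounted
    # Only called off-Kaggle, so no download happens inside a hosted notebook.
    return Path(fetch_local())


def resolve_output_dir(
    working_root: PathLike = KAGGLE_WORKING,
    local_fallback: PathLike = "working",
) -> Path:
    """Use /kaggle/working when it exists, otherwise create a local directory."""
    root = Path(working_root)
    if root.is_dir():
        return root
    out = Path(local_fallback)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Locally, the callback could wrap a kagglehub download, for instance `resolve_input_dir("dataset-slug", lambda: kagglehub.dataset_download("owner/dataset-slug"))`, where both identifiers are placeholders. The rest of the notebook then reads from the returned directory without caring which environment it runs in.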
Require confirmation before overwriting uploaded dataset/model versions or making private assets public. Do not claim kagglehub can push notebooks, change kernel metadata, inspect run logs, or delete kernels. Use Kaggle Secrets guidance for sensitive credentials rather than packaging them into datasets, models, or notebooks.
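The confirmation requirement can be sketched as a thin guard around the upload call. This is one possible shape, assuming an interactive session. `guarded_dataset_upload` and its `ask` and `upload` parameters are hypothetical names. The only kagglehub call assumed is `dataset_upload(handle, local_dataset_dir, version_notes=...)`.

```python
from typing import Callable, Optional


def guarded_dataset_upload(
    handle: str,
    local_dir: str,
    version_notes: str = "",
    ask: Callable[[str], str] = input,
    upload: Optional[Callable[..., object]] = None,
) -> bool:
    """Upload a dataset version only after the user retypes the handle."""
    answer = ask(
        f"Upload {local_dir!r} as a new version of {handle!r}? "
        "Type the handle to confirm: "
    )
    if answer.strip() != handle:
        # Anything other than an exact match is treated as a refusal.
        return False
    if upload is None:
        import kagglehub  # third-party; imported only when an upload will happen

        upload = kagglehub.dataset_upload
    upload(handle, local_dir, version_notes=version_notes)
    return True
```

Requiring the handle to be retyped, rather than accepting a bare "y", makes it harder to confirm an upload to the wrong dataset by reflex. The injectable `ask` and `upload` parameters keep the guard testable without network access.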