brycewang-stanford/kaggle-api-guide

Name: kaggle-api-guide
Author: brycewang-stanford

skills/43-wentorai-research-plugins/skills/tools/code-exec/kaggle-api-guide/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research kaggle-api-guide

Kaggle API Guide

Overview

Kaggle is the world's largest data science and machine learning community, hosting thousands of datasets, competitions, and computational notebooks. The Kaggle API provides programmatic access to these resources, enabling researchers to download datasets, submit competition entries, manage kernels (notebooks), and explore the Kaggle ecosystem from the command line or scripts.

For academic researchers, Kaggle is a valuable resource for accessing curated, well-documented datasets across diverse domains including healthcare, natural language processing, computer vision, economics, and social sciences. Many published research papers use Kaggle datasets as benchmarks, and the platform's competition infrastructure provides standardized evaluation frameworks for comparing methods.

The Kaggle API is available as a Python CLI tool and library. It requires a free Kaggle account and API token for authentication. The API supports dataset search and download, competition data retrieval, kernel management, and model access.

Authentication

A free Kaggle API token is required. Generate one from your Kaggle account settings at https://www.kaggle.com/settings.

Download the kaggle.json credentials file and place it in the standard location:

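# The kaggle.json file should be at ~/.kaggle/kaggle.json
# It contains your username and key from your Kaggle account settings
mkdir -p ~/.kaggle
# Move your downloaded kaggle.json into place (adjust the source path if your
# browser saved it somewhere other than ~/Downloads)
mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json
# Restrict the file so only your user can read it
chmod 600 ~/.kaggle/kaggle.json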

Alternatively, use environment variables:

# Replace the placeholders with the username and key from your kaggle.json
export KAGGLE_USERNAME="your-username"
export KAGGLE_KEY="your-api-key"
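
Before running any command, it can help to confirm that credentials are actually discoverable. The sketch below is illustrative only: `kaggle_credential_source` is a hypothetical helper, not part of the kaggle package, and checking the environment variables before the file is an assumption about lookup order rather than documented behavior.

```python
import json
import os
import stat
from pathlib import Path


def kaggle_credential_source(env=None, config_dir=None):
    """Return 'env', 'file', or None depending on where credentials are found."""
    env = os.environ if env is None else env
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "env"

    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token_path = config_dir / "kaggle.json"
    if not token_path.is_file():
        return None

    # kaggle.json is a JSON object holding a username and a key.
    data = json.loads(token_path.read_text())
    if not (data.get("username") and data.get("key")):
        return None

    # Warn when group or other users can access the file (chmod 600 was skipped).
    mode = stat.S_IMODE(token_path.stat().st_mode)
    if os.name == "posix" and mode & 0o077:
        print(f"Warning: {token_path} has mode {oct(mode)}; run chmod 600 on it")
    return "file"
```

Calling `kaggle_credential_source()` with no arguments inspects the real environment and `~/.kaggle`; the optional parameters exist so the check can be exercised against a temporary directory.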

Install the CLI tool:

pip install kaggle

Core Endpoints

Search Datasets

Find datasets by keyword, file type, or license.

# Search for datasets
kaggle datasets list -s "climate change" --sort-by votes

# Search with specific criteria (--max-size is in bytes; 1000000 is about 1 MB)
kaggle datasets list -s "medical imaging" --file-type csv --max-size 1000000
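
When searches are scripted, building the argument list in one place avoids unit mistakes with `--max-size`. This is a minimal sketch: `build_dataset_search_cmd` is a hypothetical helper, it only uses the flags shown above, and it assumes `--max-size` is interpreted in bytes as the CLI help text states.

```python
def build_dataset_search_cmd(query, sort_by="votes", file_type=None, max_size_mb=None):
    """Build the argument list for `kaggle datasets list` without running it."""
    cmd = ["kaggle", "datasets", "list", "-s", query, "--sort-by", sort_by]
    if file_type is not None:
        cmd += ["--file-type", file_type]
    if max_size_mb is not None:
        # Convert megabytes to bytes, the unit assumed for --max-size.
        cmd += ["--max-size", str(int(max_size_mb * 1_000_000))]
    cmd.append("--csv")  # machine-readable output
    return cmd


# Equivalent to the second command above, plus --sort-by and --csv.
print(build_dataset_search_cmd("medical imaging", file_type="csv", max_size_mb=1))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with multi-word queries.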

Download a Dataset

# Download and unzip a dataset
kaggle datasets download -d "heptapod/titanic" --unzip -p ./data/titanic/

# Download a specific file from a dataset
kaggle datasets download -d "yelp-dataset/yelp-dataset" -f "yelp_academic_dataset_review.json" -p ./data/

List Competitions and Download Data

# List active competitions
kaggle competitions list

# Download competition data (must accept rules on kaggle.com first)
kaggle competitions download -c "house-prices-advanced-regression-techniques" -p ./data/house-prices/

Submit to a Competition

# Submit predictions
kaggle competitions submit -c "house-prices-advanced-regression-techniques" \
  -f ./submission.csv -m "Random forest baseline v1"

# Check submission status
kaggle competitions submissions -c "house-prices-advanced-regression-techniques"
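
A submission is only scored if the file matches the competition's expected layout. The sketch below writes a two-column CSV before it is passed to `kaggle competitions submit`. The column names `Id` and `SalePrice` follow the sample submission of the house-prices competition used above; other competitions use different columns, and `write_submission` is a hypothetical helper.

```python
import csv


def write_submission(path, predictions, id_column="Id", target_column="SalePrice"):
    """Write {row_id: prediction} pairs as a two-column CSV; return the row count."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, target_column])
        # Sort by id so the file is deterministic across runs.
        for row_id, value in sorted(predictions.items()):
            writer.writerow([row_id, value])
    return len(predictions)
```

For example, `write_submission("./submission.csv", {1461: 169000.0, 1462: 187000.5})` produces a header row followed by one row per id.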

Manage Notebooks (Kernels)

# Search for notebooks
kaggle kernels list -s "transformer nlp" --sort-by voteCount

# Pull a notebook to local
kaggle kernels pull "username/notebook-name" -p ./notebooks/

# Push a notebook to Kaggle
kaggle kernels push -p ./my-notebook/
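
`kaggle kernels push` reads a `kernel-metadata.json` file from the folder it is given. The sketch below generates one. The field names follow the kernel metadata format in the Kaggle API documentation as I understand it, the boolean-like fields are written as strings as in that documentation's example, and `write_kernel_metadata` with its arguments is hypothetical. Check the result against the output of `kaggle kernels init` for your installed version.

```python
import json
from pathlib import Path


def write_kernel_metadata(folder, username, slug, title, code_file, dataset_sources=()):
    """Write kernel-metadata.json into `folder` and return its path."""
    metadata = {
        "id": f"{username}/{slug}",
        "title": title,
        "code_file": code_file,          # notebook or script inside `folder`
        "language": "python",
        "kernel_type": "notebook",
        "is_private": "true",
        "enable_gpu": "false",
        "enable_internet": "false",
        "dataset_sources": list(dataset_sources),
        "competition_sources": [],
        "kernel_sources": [],
    }
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    out = folder / "kernel-metadata.json"
    out.write_text(json.dumps(metadata, indent=2))
    return out
```

After calling it with the notebook's folder, `kaggle kernels push -p` on that folder picks up the generated file.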

Python Example: Automated Dataset Discovery and Download

import csv
import io
import os
import subprocess

def search_kaggle_datasets(query, sort_by="votes", max_results=10):
    """Search Kaggle datasets and return a list of dicts, one per result."""
    cmd = [
        "kaggle", "datasets", "list",
        "-s", query,
        "--sort-by", sort_by,
        "--max-size", "50000000",  # bytes (about 50 MB)
        "--csv"
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Error: {result.stderr}")
        return []

    # Parse with the csv module: dataset titles can contain commas,
    # which a plain split(",") would break on.
    rows = list(csv.DictReader(io.StringIO(result.stdout)))
    return rows[:max_results]

def download_dataset(dataset_ref, output_dir="./data"):
    """Download a Kaggle dataset by reference; return True on success."""
    os.makedirs(output_dir, exist_ok=True)
    cmd = [
        "kaggle", "datasets", "download",
        "-d", dataset_ref,
        "--unzip",
        "-p", output_dir
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print(f"Downloaded {dataset_ref} to {output_dir}")
        return True
    print(f"Error: {result.stderr}")
    return False

# Search for NLP benchmark datasets
datasets = search_kaggle_datasets("nlp text classification benchmark")
for ds in datasets[:5]:
    # Keys come from the CSV header row and may differ between CLI versions
    print(f"  {ds.get('ref', 'N/A')}")
    print(f"    Size: {ds.get('size', 'N/A')}")
    print(f"    Votes: {ds.get('voteCount', 'N/A')}")
    print()

Python Example: Using the Kaggle Python API Directly

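import json
import os

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # uses ~/.kaggle/kaggle.json or KAGGLE_USERNAME / KAGGLE_KEY

# Search datasets
datasets = api.dataset_list(search="genomics", sort_by="updated")
for ds in datasets[:5]:
    # Attribute names on result objects vary between kaggle package versions,
    # so read the size defensively.
    print(f"{ds.ref}: {ds.title} ({getattr(ds, 'size', 'size n/a')})")

# Get dataset metadata: dataset_metadata() saves dataset-metadata.json into `path`
meta_dir = "./data/chest-xrays/"
os.makedirs(meta_dir, exist_ok=True)
api.dataset_metadata("nih-chest-xrays/data", path=meta_dir)
with open(os.path.join(meta_dir, "dataset-metadata.json")) as f:
    metadata = json.load(f)
# The JSON layout depends on the package version; inspect it before relying on keys
print(json.dumps(metadata, indent=2)[:500])

# Download dataset files
api.dataset_download_files(
    "nih-chest-xrays/sample",
    path="./data/chest-xrays/",
    unzip=True
)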

Common Research Patterns

Benchmark Dataset Access: Download well-established datasets used in published research for reproducibility studies. Kaggle hosts canonical versions of many benchmark datasets referenced in ML papers.

Competition as Evaluation Framework: Use Kaggle competitions as standardized evaluation environments with leaderboards and held-out test sets. Submit predictions from novel methods to compare against state-of-the-art approaches.

Data Exploration Notebooks: Search for and pull community notebooks that explore datasets relevant to your research. These often contain valuable preprocessing code, exploratory analysis, and baseline models.

Collaborative Research Datasets: Upload processed research datasets to Kaggle for sharing with collaborators and the broader community, enabling others to reproduce and extend your work.

Cross-Domain Transfer: Search across Kaggle's diverse dataset collection to find datasets from adjacent domains that could be useful for transfer learning or cross-domain validation studies.
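
For the reproducibility-oriented patterns above, it helps to record exactly which files an analysis used. The sketch below builds a manifest of SHA-256 digests for a downloaded dataset folder. `build_manifest` and the manifest layout are hypothetical choices for illustration, not a Kaggle feature.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir, dataset_ref):
    """Map each file under data_dir to its SHA-256 digest and size in bytes."""
    data_dir = Path(data_dir)
    files = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in 1 MiB chunks so large files are not loaded into memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        files[path.relative_to(data_dir).as_posix()] = {
            "sha256": digest.hexdigest(),
            "bytes": path.stat().st_size,
        }
    return {"dataset": dataset_ref, "files": files}
```

Saving the returned dictionary as JSON next to the analysis code lets collaborators verify that their download matches the files the results were computed from.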

Rate Limits and Best Practices

API rate limits: Kaggle imposes daily limits on API calls. Treat the quota as roughly several hundred requests per day for a typical free account, cache results where possible, and back off when calls start failing.
Download limits: Large datasets take significant time and disk space; check the reported size before downloading.
Competition rules: Accept the competition rules on the Kaggle website before downloading competition data via the API; otherwise the download request is rejected.
Kernel push format: When pushing notebooks, include a kernel-metadata.json file in the pushed folder specifying the kernel type, language, and dataset sources.
Authentication security: Never commit kaggle.json to version control; use the KAGGLE_USERNAME and KAGGLE_KEY environment variables in CI/CD pipelines.
Dataset versioning: Kaggle datasets are versioned; record the version number you used so that results can be reproduced.
Large files: For datasets over 10 GB, consider using the Kaggle CLI rather than the Python API for more reliable downloads.
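
One way to stay within the rate limits is to retry failed CLI calls with exponential backoff instead of looping immediately. This is a minimal sketch under stated assumptions: `run_with_backoff` is a hypothetical helper, and it retries on any non-zero exit code, so it will also retry errors that cannot succeed (for example bad credentials). Inspect `stderr` if you need finer control.

```python
import subprocess
import time


def run_with_backoff(cmd, max_attempts=4, base_delay=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd, waiting base_delay * 2**n seconds between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < max_attempts:
            # Delays grow as 2s, 4s, 8s, ... with the default base_delay.
            sleep(base_delay * 2 ** (attempt - 1))
    return result  # last failed result; caller should check returncode
```

The `runner` and `sleep` parameters default to the real functions and exist only so the retry logic can be exercised without calling Kaggle.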

References

Kaggle API Documentation: https://www.kaggle.com/docs/api
Kaggle API GitHub Repository: https://github.com/Kaggle/kaggle-api
Kaggle Datasets: https://www.kaggle.com/datasets
Kaggle Competitions: https://www.kaggle.com/competitions
Kaggle Notebooks: https://www.kaggle.com/code

Download datasets, manage competitions and notebooks via Kaggle API

1,661 stars

development

Updated Jun 4, 2026

SKILL.md

name: kaggle-api-guide
description: Download datasets, manage competitions and notebooks via Kaggle API
emoji: 📈
category: tools
subcategory: code-exec
keywords: ["kaggle", "datasets", "competitions", "notebooks", "data-science", "machine-learning"]
source: https://www.kaggle.com/docs/api
