gemini/skills/bigquery/SKILL.md
Use bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. Modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.
npx skillsauth add lanej/dotfiles bigqueryInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are a BigQuery specialist using the bigquery CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified Rust-based CLI with query execution, template management, and server modes.
The bigquery CLI provides:
# Check if authenticated and verify required scopes
bigquery auth check
# Will show:
# - Authentication status
# - Active account
# - BigQuery scopes availability
# Authenticate with gcloud including all required BigQuery scopes
bigquery auth login
# This will:
# 1. Run gcloud auth login
# 2. Ensure all necessary BigQuery scopes are granted
# 3. Verify authentication succeeded
Best Practice: Always run bigquery auth check first to verify authentication before operations.
# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"
# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"
# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"
# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"
Cost Awareness: The query command automatically:
--yes is used)# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"
# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"
# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"
# Returns:
# - Bytes that would be processed
# - Estimated cost
# - No actual data
Use dry-run to:
# List datasets in current project (text format, default)
bigquery datasets list my-project
# JSON output
bigquery datasets list my-project --format json
# Example output shows:
# - Dataset ID
# - Location
# - Creation time
# - Labels (if any)
Note: Dataset reference format is project.dataset or just project to list all datasets.
# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset
# JSON output
bigquery tables list my-project.my-dataset --format json
# Limit results
bigquery tables list my-project.my-dataset --limit 20
# Maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100
# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table
# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json
# Output includes:
# - Column names and types
# - Nullability (NULLABLE, REQUIRED, REPEATED)
# - Mode information
# - Table metadata
Best for <1000 rows. Uses streaming insert API for immediate availability.
From JSONL File:
# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "[email protected]", "age": 35}
EOF
# Insert from JSONL file
bigquery tables insert my-project.dataset.users \
--data users.jsonl --format json
From JSONL Stream (stdin):
# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "[email protected]"}' | \
bigquery tables insert my-project.dataset.users --data - --format json
# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie", "email": "[email protected]", "age": 35}
EOF
# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -
# Stream from compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -
# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
bigquery tables insert my-project.dataset.processed --data -
JSONL Format Requirements:
{"field1":"value1","field2":"value2"}\nFrom CSV File:
# Create sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,"Charlie Brown, Jr.",[email protected],35
EOF
# Insert from CSV file
bigquery tables insert my-project.dataset.users \
--data users.csv --format csv
From CSV Stream (stdin):
# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,Charlie Brown,[email protected],35
EOF
# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv
# Stream from compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data -
# Stream from curl/API response
curl -s https://api.example.com/export.csv | \
bigquery tables insert my-project.dataset.api_data --data - --format csv
# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
bigquery tables insert my-project.dataset.cleaned --data - --format csv
CSV Format Requirements:
# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
--json '{"id": "1", "name": "Alice", "email": "[email protected]"}'
# Insert inline JSON array
bigquery tables insert my-project.dataset.users \
--json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'
# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
--data users.csv --format csv --dry-run
# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
--data users.csv --format csv --skip-invalid
# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users \
--data users.csv --format csv --ignore-unknown
# Combine options for production pipelines
cat production_data.jsonl | \
bigquery tables insert my-project.dataset.production \
--data - --format json \
--skip-invalid \
--ignore-unknown
Insert Options:
--json <JSON>: Inline JSON data (object or array)--data <PATH>: Path to data file, or - for stdin--format <FORMAT>: Data format (json or csv, default: json)--dry-run: Validate without inserting--skip-invalid: Skip invalid rows instead of failing--ignore-unknown: Ignore unknown fields in data--yes: Skip confirmation promptsBest for >10MB files or >1000 rows. Uses BigQuery load jobs.
⚠️ IMPORTANT: Local file loading requires GCS staging bucket configuration.
tables insert for datasets <1000 rows insteadbigquery tables load gs://...# Load from Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
gs://my-bucket/data.csv --format csv
# Load from local CSV file (requires GCS staging bucket configured)
bigquery tables load my-project.dataset.users data.csv --format csv
# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
--format csv --autodetect
# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users data.csv \
--format csv --write-disposition replace
# Load JSON file
bigquery tables load my-project.dataset.events events.json \
--format json
# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
--format parquet
# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
--format csv --dry-run
# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
--format csv --max-bad-records 100
# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
--format csv --ignore-unknown
# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
--format csv --write-disposition replace --yes
Load Job Features:
append (default) or replace (truncate first)--max-bad-recordsWhen to Use:
When NOT to Use:
tables insert instead (no GCS required)tables insert or upload to GCS firstLoad Options:
--format <FORMAT>: csv, json, avro, parquet, orc (default: csv)--write-disposition <DISPOSITION>: append or replace (default: append)--autodetect: Auto-detect schema from source files--dry-run: Validate without loading--max-bad-records <N>: Maximum bad records before failing--ignore-unknown: Ignore unknown fields--yes: Skip confirmation promptsExport table data to Cloud Storage in various formats:
# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
gs://my-bucket/exports/users.csv --format csv
# Extract as JSON
bigquery tables extract my-project.dataset.events \
gs://my-bucket/exports/events-*.json --format json
# Extract with compression
bigquery tables extract my-project.dataset.large_table \
gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip
# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
gs://my-bucket/exports/events-*.avro --format avro --compression snappy
# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
gs://my-bucket/exports/analytics.parquet --format parquet
# CSV with custom delimiter and header
bigquery tables extract my-project.dataset.data \
gs://my-bucket/data.csv \
--format csv \
--field-delimiter "|" \
--print-header
# Dry-run to validate configuration
bigquery tables extract my-project.dataset.users \
gs://my-bucket/users.csv --format csv --dry-run
# Skip confirmation prompt
bigquery tables extract my-project.dataset.large \
gs://my-bucket/export.csv --format csv --yes
Supported Formats: CSV, JSON (newline-delimited), Avro, Parquet Compression: none, gzip, snappy (Avro/Parquet only)
External tables reference data in Cloud Storage without copying it to BigQuery.
# Create CSV external table
bigquery tables create-external my-project.dataset.external_table \
--source-uri gs://bucket/data.csv \
--format csv \
--schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"
# Create with auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
--source-uri gs://bucket/data.csv \
--format csv \
--autodetect
# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
--source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
--format csv \
--autodetect
# Multiple source URIs (multiple flags)
bigquery tables create-external my-project.dataset.external_table \
--source-uri gs://bucket/file1.csv \
--source-uri gs://bucket/file2.csv \
--format csv \
--autodetect
# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
--source-uri gs://bucket/data.csv \
--format csv \
--schema "id:INTEGER,name:STRING" \
--field-delimiter "," \
--skip-leading-rows 1
# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
--source-uri gs://bucket/data.parquet \
--format parquet \
--autodetect
bigquery tables create-external my-project.dataset.json_table \
--source-uri gs://bucket/data.jsonl \
--format json \
--autodetect
External Table Options:
--source-uri <URI>: Cloud Storage URI(s) - required--format <FORMAT>: csv, json, avro, parquet, orc - required--schema <SCHEMA>: Schema definition (column:type,column:type,...)--autodetect: Auto-detect schema from source files--field-delimiter <DELIMITER>: CSV field delimiter (default: ,)--skip-leading-rows <N>: CSV header rows to skip# Update source URIs
bigquery tables update-external my-project.dataset.external_table \
--source-uri gs://bucket/new-data.csv
# Update schema
bigquery tables update-external my-project.dataset.external_table \
--schema "id:INTEGER,name:STRING,email:STRING"
# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
--field-delimiter "|" \
--skip-leading-rows 2
# Update multiple properties
bigquery tables update-external my-project.dataset.external_table \
--source-uri gs://bucket/new-data.csv \
--schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
--skip-leading-rows 1
Named query templates allow you to save frequently-used queries with parameter placeholders.
# List all available templates (text format)
bigquery templates list
# JSON output
bigquery templates list --format json
# Shows:
# - Template name
# - Description
# - Parameters
# - Query preview
# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"
# JSON output
bigquery templates search "analytics" --format json
# Validate template for parameter consistency
bigquery templates validate my-template
# Checks:
# - Parameter definitions match query placeholders
# - Required parameters are defined
# - Parameter types are valid
# Run template with default parameters
bigquery templates run my-template
# Override parameters
bigquery templates run daily-report \
--param date=2025-01-15 \
--param region=US
# Multiple parameters
bigquery templates run customer-analysis \
--param customer_id=CUST123 \
--param start_date=2025-01-01 \
--param end_date=2025-01-31
# JSON output
bigquery templates run my-template --format json
# Skip cost confirmation
bigquery templates run expensive-query --yes
Template Run Options:
--param <KEY=VALUE>: Parameter override (can be used multiple times)--format <FORMAT>: Output format (json or text, default: json)--yes: Skip cost confirmation prompt# 1. Search for templates
bigquery templates search "revenue"
# 2. Validate template before running
bigquery templates validate monthly-revenue
# 3. Run with parameters
bigquery templates run monthly-revenue \
--param month=2025-01 \
--param min_amount=1000
# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
--param month=2025-01 \
--yes \
--format json > output.json
Use templates for:
The BigQuery MCP server provides semantic search and natural language query capabilities via Model Context Protocol.
STDIO Mode (for local clients):
# Start MCP server in stdio mode
bigquery mcp stdio
# Server will:
# - Accept MCP protocol messages on stdin
# - Send responses on stdout
# - Expose BigQuery tools to MCP clients
HTTP Mode (for network clients):
# Start HTTP MCP server on default port 8080
bigquery mcp http
# Specify custom port
bigquery mcp http --port 3000
# Server provides:
# - HTTP endpoint for MCP protocol
# - JSON-RPC over HTTP
# - Remote access to BigQuery tools
The MCP server exposes these tools through the Model Context Protocol:
Configure in Claude Code or other MCP-enabled applications:
STDIO Mode (.claude/mcp.json or similar):
{
"mcpServers": {
"bigquery": {
"command": "bigquery",
"args": ["mcp", "stdio"],
"env": {
"GOOGLE_CLOUD_PROJECT": "my-project"
}
}
}
}
HTTP Mode:
{
"mcpServers": {
"bigquery": {
"url": "http://localhost:8080",
"transport": "http"
}
}
}
When using BigQuery MCP through clients:
Semantic Search:
"Find all tables containing customer purchase data from the last 30 days"
→ MCP translates to appropriate SQL query
Schema Discovery:
"What columns are in the analytics.events table?"
→ MCP returns schema information
Natural Language Queries:
"Show me total revenue by region for Q1 2025"
→ MCP generates and executes SQL
Template Execution:
"Run the monthly revenue template for January 2025"
→ MCP executes template with parameters
The BigQuery LSP provides SQL language features in text editors.
# Start LSP server
bigquery lsp
# Server provides:
# - Language Server Protocol communication
# - SQL syntax validation
# - Schema-aware completions
# - Query formatting
# - Hover documentation
Neovim:
-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
pattern = { "sql", "bq", "bigquery" },
callback = function()
vim.lsp.start({
name = "bigquery-lsp",
cmd = { "bigquery", "lsp" },
root_dir = vim.fn.getcwd(),
})
end,
})
VS Code (in settings.json or language server config):
{
"bigquery-lsp": {
"command": "bigquery",
"args": ["lsp"],
"filetypes": ["sql", "bq", "bigquery"]
}
}
Helix (in languages.toml):
[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]
[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]
# 1. Verify authentication
bigquery auth check
# 2. List available datasets
bigquery datasets list my-project
# 3. List tables in dataset
bigquery tables list my-project.analytics
# 4. Check table schema
bigquery tables describe my-project.analytics.events
# 5. Preview data (text format for readability)
bigquery query --format text \
"SELECT * FROM my-project.analytics.events LIMIT 10"
# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"
# 7. Check data distribution
bigquery query --format text "
SELECT
DATE(timestamp) as date,
COUNT(*) as events
FROM my-project.analytics.events
GROUP BY date
ORDER BY date DESC
LIMIT 30
"
# 1. Dry run to estimate cost
bigquery dry-run "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"
# 2. If cost is acceptable, run query
bigquery query "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"
# 3. For automation, skip confirmation
bigquery query --yes "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
" > results.json
# 1. Search for relevant templates
bigquery templates search "daily"
# 2. Validate template
bigquery templates validate daily-metrics
# 3. Run template with parameters
bigquery templates run daily-metrics \
--param date=$(date +%Y-%m-%d) \
--param region=US \
--format json > daily-report.json
# 4. Schedule in cron or CI/CD
# 0 1 * * * bigquery templates run daily-metrics --param date=$(date +%Y-%m-%d) --yes
# 1. Create external table pointing to GCS
bigquery tables create-external my-project.staging.raw_logs \
--source-uri gs://logs-bucket/2025-01-*.json \
--format json \
--autodetect
# 2. Query external table
bigquery query "
SELECT
timestamp,
user_id,
action
FROM my-project.staging.raw_logs
WHERE action = 'purchase'
LIMIT 100
"
# 3. Update external table when new files arrive
bigquery tables update-external my-project.staging.raw_logs \
--source-uri gs://logs-bucket/2025-02-*.json
# 1. Load initial data
bigquery tables load my-project.dataset.events \
gs://bucket/events-2025-01-01.csv \
--format csv \
--write-disposition replace
# 2. Append incremental data
bigquery tables load my-project.dataset.events \
gs://bucket/events-2025-01-02.csv \
--format csv \
--write-disposition append
# 3. Verify data loaded
bigquery query "
SELECT
DATE(timestamp) as date,
COUNT(*) as count
FROM my-project.dataset.events
GROUP BY date
ORDER BY date
"
# 1. Insert single event (inline JSON)
bigquery tables insert my-project.dataset.events \
--json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'
# 2. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json
# 3. Insert batch from JSONL file
bigquery tables insert my-project.dataset.events \
--data events.jsonl --format json
# 4. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' | \
bigquery tables insert my-project.dataset.events \
--data - --format json \
--skip-invalid \
--ignore-unknown
bigquery dry-run to estimate costs--format text for human-readable tables--format json for machine processing--yes flag for automationbigquery dry-runbigquery auth check before operationsbigquery auth login if check failsbigquery templates validate before executioninsert for <1000 rows (streaming insert API, immediate availability)load for >10MB files or >1000 rows (load jobs with GCS upload)--data - to pipe data from applications or transformations--dry-run flag to test configurations--max-bad-records for messy datareplace for full refresh, append for incremental# Set default project
export GOOGLE_CLOUD_PROJECT=my-project
# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc
User Credentials (interactive):
bigquery auth login
# Opens browser for Google authentication
Service Account (automation):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
Application Default Credentials (gcloud):
gcloud auth application-default login
bigquery auth check
Solution: Check authentication and scopes
# Check current auth status
bigquery auth check
# Re-authenticate if needed
bigquery auth login
# Verify gcloud is set to correct project
gcloud config get-value project
# Set project if needed
gcloud config set project my-project
Solution: Use fully qualified table names
# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"
# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"
# Or use backticks for reserved words
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"
Solution: Check cost with dry-run and optimize
# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"
# Optimize with partition filters
bigquery dry-run "
SELECT * FROM large_table
WHERE _PARTITIONDATE = '2025-01-15'
"
Solution: Search for templates and verify name
# List all templates
bigquery templates list
# Search for template
bigquery templates search "keyword"
# Use exact template name
bigquery templates run exact-template-name
Cause: bigquery tables load with a local file requires GCS staging bucket configuration.
Solutions:
Preferred for small datasets (<1000 rows): Use tables insert instead (no GCS required)
bigquery tables insert my-project.dataset.table \
--data /tmp/data.jsonl \
--format json
For larger datasets: Upload to GCS first, then load
gsutil cp /tmp/large-file.jsonl gs://my-bucket/
bigquery tables load my-project.dataset.table \
gs://my-bucket/large-file.jsonl \
--format json
Last resort: Configure GCS staging bucket in BigQuery CLI config (requires additional setup)
Solution: Check schema format for external tables
# Schema format: column:type,column:type,...
bigquery tables create-external my-project.dataset.table \
--source-uri gs://bucket/file.csv \
--format csv \
--schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"
# Or use autodetect
bigquery tables create-external my-project.dataset.table \
--source-uri gs://bucket/file.csv \
--format csv \
--autodetect
Solution: Check server mode and connectivity
# For stdio mode, ensure client is using stdio transport
bigquery mcp stdio
# For HTTP mode, check port and firewall
bigquery mcp http --port 8080
# Test HTTP endpoint
curl http://localhost:8080
Solution: Verify LSP configuration and binary path
# Check bigquery is in PATH
which bigquery
# Test LSP manually
bigquery lsp
# Verify editor configuration points to correct command
# Neovim: check cmd = { "bigquery", "lsp" }
# VS Code: check "command": "bigquery", "args": ["lsp"]
# Authentication
bigquery auth check # Check auth status
bigquery auth login # Login with gcloud
# Queries
bigquery query "SELECT ..." # Execute query
bigquery query --yes "SELECT ..." # Skip confirmation
bigquery query --format text "SELECT ..." # Table output
bigquery dry-run "SELECT ..." # Estimate cost
# Datasets
bigquery datasets list PROJECT # List datasets
# Tables
bigquery tables list PROJECT.DATASET # List tables
bigquery tables describe PROJECT.DATASET.TABLE # Show schema
bigquery tables insert TABLE --json '{"id": 1}' # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data - # Stream insert
bigquery tables load TABLE file.csv # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv # Extract to GCS
bigquery tables create-external TABLE --source-uri ... # External table
bigquery tables update-external TABLE --source-uri ... # Update external
# Templates
bigquery templates list # List templates
bigquery templates search "keyword" # Search templates
bigquery templates validate TEMPLATE # Validate template
bigquery templates run TEMPLATE --param key=value # Run template
# MCP Server
bigquery mcp stdio # Start MCP (stdio mode)
bigquery mcp http # Start MCP (HTTP mode)
bigquery mcp http --port 3000 # Custom port
# LSP Server
bigquery lsp # Start LSP server
#!/bin/bash
# daily-etl.sh
# Authenticate with service account
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1
# Run daily ETL template
bigquery templates run daily-etl \
--param date=$(date +%Y-%m-%d) \
--yes \
--format json > /tmp/etl-result.json
# Check result
if [ $? -eq 0 ]; then
echo "ETL completed successfully"
else
echo "ETL failed"
exit 1
fi
#!/bin/bash
# check-data-quality.sh
# Run data quality template
RESULT=$(bigquery templates run data-quality-check \
--param table=my-project.dataset.table \
--yes \
--format json)
# Parse result and check quality metrics
INVALID_ROWS=$(echo $RESULT | jq '.invalid_rows')
if [ "$INVALID_ROWS" -gt 100 ]; then
echo "Data quality check failed: $INVALID_ROWS invalid rows"
exit 1
else
echo "Data quality check passed"
fi
#!/bin/bash
# generate-report.sh
# Generate weekly report
bigquery templates run weekly-revenue-report \
--param week_start=$(date -d "last monday" +%Y-%m-%d) \
--param week_end=$(date -d "next sunday" +%Y-%m-%d) \
--yes \
--format json > /reports/weekly-$(date +%Y-%m-%d).json
# Upload to GCS
gsutil cp /reports/weekly-*.json gs://reports-bucket/
Primary commands:
bigquery auth {check,login} - Authentication managementbigquery query - Execute SQL with cost awarenessbigquery dry-run - Estimate query costsbigquery datasets list - List datasetsbigquery tables {list,describe,insert,load,extract,create-external,update-external} - Table operationsbigquery templates {list,search,validate,run} - Named templatesbigquery mcp {stdio,http} - MCP server modesbigquery lsp - LSP serverKey features:
--data -)Best practices:
auth checkdry-run to estimate costs before expensive queries--yes flag for automation and CI/CDinsert for <1000 rows, load for larger datasets--data - for data transformationsMCP Integration:
devops
DORA engineering metrics project at ~/src/dora. Load when: querying DORA BigQuery views (deployment frequency, lead time, change failure rate, alerts, review time) from any project; joining against DORA.unified_identity or DORA_clean.* views from any project; running the data pipeline (just refresh, just download-*, just upload-*); making OpenTofu infrastructure changes to DORA tables or views; working with team attribution, team identity, or engineer roster data.
development
Data pipeline architecture patterns and best practices, including medallion/three-layer architecture (Raw/Staging/Enriched or Bronze/Silver/Gold), YAML-based schema management, and ETL workflow patterns. Use when designing or implementing data pipelines, working with data warehouse layers, or managing table schemas in YAML.
data-ai
Delegate research and context-gathering tasks to a sub-agent to protect the primary context window. Use when the user asks to "research X", "look into X", "find out about X", "gather context on X", or any investigative framing where answering requires 2+ searches or multiple sources. Also use proactively before starting substantive work when prior context is unknown. Never run research inline — always delegate.
documentation
--- name: qmd-math description: Math notation conventions for Quarto/EPQ documents rendered via lualatex. Use when: writing or adding a formula, equation, or mathematical expression to a .qmd file; asked about display math, inline math, or LaTeX notation in a QMD/Quarto context; defining a where-clause or variable definitions for an equation; converting prose variable descriptions into structured math notation; fixing math that renders badly in a PDF; using \lvert, \begin{aligned}, \tfrac, \text