BigQuery CLI Skill

You are a BigQuery specialist using the bigquery CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified Rust-based CLI with query execution, template management, and server modes.

Core Capabilities

The bigquery CLI provides:

Authentication: Check status and login with gcloud
Query Execution: Run SQL queries with cost awareness and confirmation prompts
Dry Run: Estimate query costs without execution
Dataset Operations: List datasets in a project
Table Operations: List, describe, insert, load, and manage external tables
Template System: Named query templates with parameter substitution
MCP Server: Semantic search via stdio or HTTP modes
LSP Server: SQL language server for editor integration

Authentication

Check Authentication Status

# Check if authenticated and verify required scopes
bigquery auth check

# Will show:
# - Authentication status
# - Active account
# - BigQuery scopes availability

Login with gcloud

# Authenticate with gcloud including all required BigQuery scopes
bigquery auth login

# This will:
# 1. Run gcloud auth login
# 2. Ensure all necessary BigQuery scopes are granted
# 3. Verify authentication succeeded

Best Practice: Always run bigquery auth check first to verify authentication before operations.

Query Operations

Running Queries

# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"

# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"

# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"

# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"

Cost Awareness: The query command automatically:

Estimates query cost before execution
Displays bytes to be processed
Prompts for confirmation (unless --yes is used)
Prevents accidental expensive queries

Query Output Formats

# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"

# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"

Dry Run (Cost Estimation)

# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"

# Returns:
# - Bytes that would be processed
# - Estimated cost
# - No actual data

Use dry-run to:

Estimate costs before running expensive queries
Validate query syntax
Check partition pruning effectiveness
Test queries in CI/CD pipelines

Dataset Operations

Listing Datasets

# List datasets in current project (text format, default)
bigquery datasets list my-project

# JSON output
bigquery datasets list my-project --format json

# Example output shows:
# - Dataset ID
# - Location
# - Creation time
# - Labels (if any)

Note: Dataset reference format is project.dataset or just project to list all datasets.

Table Operations

Listing Tables

# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset

# JSON output
bigquery tables list my-project.my-dataset --format json

# Limit results
bigquery tables list my-project.my-dataset --limit 20

# Maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100

Describing Table Schema

# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table

# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json

# Output includes:
# - Column names and types
# - Nullability (NULLABLE, REQUIRED, REPEATED)
# - Mode information
# - Table metadata

Inserting Rows (Small Datasets)

Best for <1000 rows. Uses streaming insert API for immediate availability.

JSONL (Newline-Delimited JSON) Format

From JSONL File:

# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "[email protected]", "age": 35}
EOF

# Insert from JSONL file
bigquery tables insert my-project.dataset.users \
  --data users.jsonl --format json

From JSONL Stream (stdin):

# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "[email protected]"}' | \
  bigquery tables insert my-project.dataset.users --data - --format json

# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie", "email": "[email protected]", "age": 35}
EOF

# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -

# Stream from compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -

# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
  bigquery tables insert my-project.dataset.processed --data -

JSONL Format Requirements:

Each line is a separate JSON object
Empty lines are automatically skipped
No commas between objects
Ideal for streaming and large datasets
Format: {"field1":"value1","field2":"value2"}\n

CSV Format

From CSV File:

# Create sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,"Charlie Brown, Jr.",[email protected],35
EOF

# Insert from CSV file
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv

From CSV Stream (stdin):

# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,Charlie Brown,[email protected],35
EOF

# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv

# Stream from compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data -

# Stream from curl/API response
curl -s https://api.example.com/export.csv | \
  bigquery tables insert my-project.dataset.api_data --data - --format csv

# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
  bigquery tables insert my-project.dataset.cleaned --data - --format csv

CSV Format Requirements:

First row must contain column headers matching BigQuery table schema
Values are inserted as strings (BigQuery will coerce types)
Supports quoted fields, escaped quotes, and newlines (RFC 4180 compliant)
Headers are case-sensitive and must match table column names

Additional Insert Options

# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
  --json '{"id": "1", "name": "Alice", "email": "[email protected]"}'

# Insert inline JSON array
bigquery tables insert my-project.dataset.users \
  --json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'

# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --dry-run

# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --skip-invalid

# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --ignore-unknown

# Combine options for production pipelines
cat production_data.jsonl | \
  bigquery tables insert my-project.dataset.production \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown

Insert Options:

--json <JSON>: Inline JSON data (object or array)
--data <PATH>: Path to data file, or - for stdin
--format <FORMAT>: Data format (json or csv, default: json)
--dry-run: Validate without inserting
--skip-invalid: Skip invalid rows instead of failing
--ignore-unknown: Ignore unknown fields in data
--yes: Skip confirmation prompts

Loading Data (Large Datasets)

Best for >10MB files or >1000 rows. Uses BigQuery load jobs.

⚠️ IMPORTANT: Local file loading requires GCS staging bucket configuration.

If you get "The specified bucket does not exist" error, use tables insert for datasets <1000 rows instead
For larger datasets, upload to GCS first, then use bigquery tables load gs://...

# Load from Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
  gs://my-bucket/data.csv --format csv

# Load from local CSV file (requires GCS staging bucket configured)
bigquery tables load my-project.dataset.users data.csv --format csv

# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
  --format csv --autodetect

# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace

# Load JSON file
bigquery tables load my-project.dataset.events events.json \
  --format json

# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
  --format parquet

# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --dry-run

# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --max-bad-records 100

# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
  --format csv --ignore-unknown

# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace --yes

Load Job Features:

GCS Staging Bucket Required: Local file loading needs GCS bucket configuration
Real-time progress tracking with exponential backoff
Automatic cleanup of temporary files after completion
Write modes: append (default) or replace (truncate first)
Safety confirmations for destructive operations
Configurable error tolerance with --max-bad-records

When to Use:

Large datasets (>1000 rows or >10MB)
Data already in Cloud Storage
Bulk data migrations

When NOT to Use:

Small datasets (<1000 rows) → Use tables insert instead (no GCS required)
Don't have GCS staging bucket configured → Use tables insert or upload to GCS first

Load Options:

--format <FORMAT>: csv, json, avro, parquet, orc (default: csv)
--write-disposition <DISPOSITION>: append or replace (default: append)
--autodetect: Auto-detect schema from source files
--dry-run: Validate without loading
--max-bad-records <N>: Maximum bad records before failing
--ignore-unknown: Ignore unknown fields
--yes: Skip confirmation prompts

Extracting Data

Export table data to Cloud Storage in various formats:

# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users.csv --format csv

# Extract as JSON
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.json --format json

# Extract with compression
bigquery tables extract my-project.dataset.large_table \
  gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip

# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.avro --format avro --compression snappy

# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
  gs://my-bucket/exports/analytics.parquet --format parquet

# CSV with custom delimiter and header
bigquery tables extract my-project.dataset.data \
  gs://my-bucket/data.csv \
  --format csv \
  --field-delimiter "|" \
  --print-header

# Dry-run to validate configuration
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/users.csv --format csv --dry-run

# Skip confirmation prompt
bigquery tables extract my-project.dataset.large \
  gs://my-bucket/export.csv --format csv --yes

Supported Formats: CSV, JSON (newline-delimited), Avro, Parquet Compression: none, gzip, snappy (Avro/Parquet only)

External Tables

External tables reference data in Cloud Storage without copying it to BigQuery.

Creating External Tables

# Create CSV external table
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Create with auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --autodetect

# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
  --format csv \
  --autodetect

# Multiple source URIs (multiple flags)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/file1.csv \
  --source-uri gs://bucket/file2.csv \
  --format csv \
  --autodetect

# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING" \
  --field-delimiter "," \
  --skip-leading-rows 1

# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
  --source-uri gs://bucket/data.parquet \
  --format parquet \
  --autodetect

bigquery tables create-external my-project.dataset.json_table \
  --source-uri gs://bucket/data.jsonl \
  --format json \
  --autodetect

External Table Options:

--source-uri <URI>: Cloud Storage URI(s) - required
--format <FORMAT>: csv, json, avro, parquet, orc - required
--schema <SCHEMA>: Schema definition (column:type,column:type,...)
--autodetect: Auto-detect schema from source files
--field-delimiter <DELIMITER>: CSV field delimiter (default: ,)
--skip-leading-rows <N>: CSV header rows to skip

Updating External Tables

# Update source URIs
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv

# Update schema
bigquery tables update-external my-project.dataset.external_table \
  --schema "id:INTEGER,name:STRING,email:STRING"

# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
  --field-delimiter "|" \
  --skip-leading-rows 2

# Update multiple properties
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv \
  --schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
  --skip-leading-rows 1

Template System

Named query templates allow you to save frequently-used queries with parameter placeholders.

Listing Templates

# List all available templates (text format)
bigquery templates list

# JSON output
bigquery templates list --format json

# Shows:
# - Template name
# - Description
# - Parameters
# - Query preview

Searching Templates

# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"

# JSON output
bigquery templates search "analytics" --format json

Validating Templates

# Validate template for parameter consistency
bigquery templates validate my-template

# Checks:
# - Parameter definitions match query placeholders
# - Required parameters are defined
# - Parameter types are valid

Running Templates

# Run template with default parameters
bigquery templates run my-template

# Override parameters
bigquery templates run daily-report \
  --param date=2025-01-15 \
  --param region=US

# Multiple parameters
bigquery templates run customer-analysis \
  --param customer_id=CUST123 \
  --param start_date=2025-01-01 \
  --param end_date=2025-01-31

# JSON output
bigquery templates run my-template --format json

# Skip cost confirmation
bigquery templates run expensive-query --yes

Template Run Options:

--param <KEY=VALUE>: Parameter override (can be used multiple times)
--format <FORMAT>: Output format (json or text, default: json)
--yes: Skip cost confirmation prompt

Template Workflow Example

# 1. Search for templates
bigquery templates search "revenue"

# 2. Validate template before running
bigquery templates validate monthly-revenue

# 3. Run with parameters
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --param min_amount=1000

# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --yes \
  --format json > output.json

Use templates for:

Standardized reporting queries
Common analytics patterns
Scheduled data pipelines
Team query sharing
Reducing query errors

MCP Server Integration

The BigQuery MCP server provides semantic search and natural language query capabilities via Model Context Protocol.

Starting MCP Server

STDIO Mode (for local clients):

# Start MCP server in stdio mode
bigquery mcp stdio

# Server will:
# - Accept MCP protocol messages on stdin
# - Send responses on stdout
# - Expose BigQuery tools to MCP clients

HTTP Mode (for network clients):

# Start HTTP MCP server on default port 8080
bigquery mcp http

# Specify custom port
bigquery mcp http --port 3000

# Server provides:
# - HTTP endpoint for MCP protocol
# - JSON-RPC over HTTP
# - Remote access to BigQuery tools

MCP Server Capabilities

The MCP server exposes these tools through the Model Context Protocol:

semantic_search: Search tables using natural language
execute_query: Run SQL queries with automatic formatting
get_schema: Retrieve table schemas
list_tables: List available tables
list_datasets: List available datasets
explain_query: Get query execution plan
optimize_query: Suggest query optimizations
run_template: Execute named templates with parameters

MCP Configuration

Configure in Claude Code or other MCP-enabled applications:

STDIO Mode (.claude/mcp.json or similar):

{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery",
      "args": ["mcp", "stdio"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "my-project"
      }
    }
  }
}

HTTP Mode:

{
  "mcpServers": {
    "bigquery": {
      "url": "http://localhost:8080",
      "transport": "http"
    }
  }
}

MCP Usage Patterns

When using BigQuery MCP through clients:

Semantic Search:

"Find all tables containing customer purchase data from the last 30 days"
→ MCP translates to appropriate SQL query

Schema Discovery:

"What columns are in the analytics.events table?"
→ MCP returns schema information

Natural Language Queries:

"Show me total revenue by region for Q1 2025"
→ MCP generates and executes SQL

Template Execution:

"Run the monthly revenue template for January 2025"
→ MCP executes template with parameters

LSP Integration

The BigQuery LSP provides SQL language features in text editors.

Starting LSP Server

# Start LSP server
bigquery lsp

# Server provides:
# - Language Server Protocol communication
# - SQL syntax validation
# - Schema-aware completions
# - Query formatting
# - Hover documentation

LSP Features

SQL syntax highlighting: Proper tokenization and highlighting
Schema completion: Table and column suggestions based on project schema
Query validation: Real-time syntax and semantic checks
Hover documentation: Table and column info on hover
Go to definition: Navigate to table definitions
Query formatting: Auto-format SQL queries
Diagnostics: Show errors and warnings inline

Editor Configuration

Neovim:

-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})

VS Code (in settings.json or language server config):

{
  "bigquery-lsp": {
    "command": "bigquery",
    "args": ["lsp"],
    "filetypes": ["sql", "bq", "bigquery"]
  }
}

Helix (in languages.toml):

[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]

Common Workflows

Workflow 1: Exploratory Data Analysis

# 1. Verify authentication
bigquery auth check

# 2. List available datasets
bigquery datasets list my-project

# 3. List tables in dataset
bigquery tables list my-project.analytics

# 4. Check table schema
bigquery tables describe my-project.analytics.events

# 5. Preview data (text format for readability)
bigquery query --format text \
  "SELECT * FROM my-project.analytics.events LIMIT 10"

# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"

# 7. Check data distribution
bigquery query --format text "
  SELECT
    DATE(timestamp) as date,
    COUNT(*) as events
  FROM my-project.analytics.events
  GROUP BY date
  ORDER BY date DESC
  LIMIT 30
"

Workflow 2: Cost-Aware Query Development

# 1. Dry run to estimate cost
bigquery dry-run "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 2. If cost is acceptable, run query
bigquery query "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 3. For automation, skip confirmation
bigquery query --yes "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
" > results.json

Workflow 3: Template-Based Reporting

# 1. Search for relevant templates
bigquery templates search "daily"

# 2. Validate template
bigquery templates validate daily-metrics

# 3. Run template with parameters
bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json

# 4. Schedule in cron or CI/CD
# 0 1 * * * bigquery templates run daily-metrics --param date=$(date +%Y-%m-%d) --yes

Workflow 4: External Data Analysis

# 1. Create external table pointing to GCS
bigquery tables create-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-01-*.json \
  --format json \
  --autodetect

# 2. Query external table
bigquery query "
  SELECT
    timestamp,
    user_id,
    action
  FROM my-project.staging.raw_logs
  WHERE action = 'purchase'
  LIMIT 100
"

# 3. Update external table when new files arrive
bigquery tables update-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-02-*.json

Workflow 5: Data Loading Pipeline

# 1. Load initial data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace

# 2. Append incremental data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append

# 3. Verify data loaded
bigquery query "
  SELECT
    DATE(timestamp) as date,
    COUNT(*) as count
  FROM my-project.dataset.events
  GROUP BY date
  ORDER BY date
"

Workflow 6: Real-Time Data Insertion

# 1. Insert single event (inline JSON)
bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'

# 2. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json

# 3. Insert batch from JSONL file
bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json

# 4. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' | \
  bigquery tables insert my-project.dataset.events \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown

Best Practices

Query Development

Always dry-run first: Use bigquery dry-run to estimate costs
Use templates: Create templates for repeated queries
Validate before running: Check syntax and cost before execution
Use text format for exploration: --format text for human-readable tables
Use JSON for automation: --format json for machine processing
Skip confirmations in scripts: Use --yes flag for automation

Cost Management

Dry run expensive queries: Always estimate with bigquery dry-run
Monitor bytes processed: Check query cost estimates before running
Use partition pruning: Filter on partitioned columns in WHERE clauses
Limit result sets: Use LIMIT for exploratory queries
Use templates: Standardize queries to avoid mistakes
Leverage external tables: Avoid copying data when querying directly from GCS

Authentication

Check auth first: Run bigquery auth check before operations
Use service accounts: For automation and CI/CD
Verify scopes: Ensure all required BigQuery scopes are granted
Re-authenticate when needed: bigquery auth login if check fails

Template Management

Use descriptive names: Make templates easy to find
Document parameters: Include parameter descriptions in templates
Validate before use: Run bigquery templates validate before execution
Search before creating: Check if similar template exists
Version control templates: Store template definitions in git

Data Loading

Choose the right method:
- Use insert for <1000 rows (streaming insert API, immediate availability)
- Use load for >10MB files or >1000 rows (load jobs with GCS upload)
Use JSONL for streaming: Newline-delimited JSON is ideal for streaming pipelines
Stream from stdin: Use --data - to pipe data from applications or transformations
Validate before loading: Use --dry-run flag to test configurations
Handle bad records: Set --max-bad-records for messy data
Choose write disposition: replace for full refresh, append for incremental
Use external tables: For data that changes frequently in GCS (no data copying)
Use appropriate formats: CSV for simple data, JSON/JSONL for complex, Parquet/Avro for large datasets

MCP Server

Use stdio for local: Prefer stdio mode for local MCP clients
Use HTTP for remote: Use HTTP mode for networked deployments
Secure HTTP endpoints: Put HTTP server behind authentication/firewall
Monitor server logs: Check for errors and performance issues
Set appropriate port: Choose non-conflicting port for HTTP mode

LSP Integration

Configure per-project: Set up LSP for SQL files in your editor
Use schema completion: Leverage auto-complete for table/column names
Check diagnostics: Fix errors and warnings shown inline
Format queries: Use LSP formatting for consistent style

Configuration

Environment Variables

# Set default project
export GOOGLE_CLOUD_PROJECT=my-project

# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc

Authentication Methods

User Credentials (interactive):

bigquery auth login
# Opens browser for Google authentication

Service Account (automation):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check

Application Default Credentials (gcloud):

gcloud auth application-default login
bigquery auth check

Troubleshooting

Issue: "Not authenticated" or "Permission denied"

Solution: Check authentication and scopes

# Check current auth status
bigquery auth check

# Re-authenticate if needed
bigquery auth login

# Verify gcloud is set to correct project
gcloud config get-value project

# Set project if needed
gcloud config set project my-project

Issue: "Table not found"

Solution: Use fully qualified table names

# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"

# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"

# Or use backticks for reserved words
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"

Issue: "Query too expensive"

Solution: Check cost with dry-run and optimize

# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"

# Optimize with partition filters
bigquery dry-run "
  SELECT * FROM large_table
  WHERE _PARTITIONDATE = '2025-01-15'
"

Issue: "Template not found"

Solution: Search for templates and verify name

# List all templates
bigquery templates list

# Search for template
bigquery templates search "keyword"

# Use exact template name
bigquery templates run exact-template-name

Issue: "The specified bucket does not exist"

Cause: bigquery tables load with a local file requires GCS staging bucket configuration.

Solutions:

Preferred for small datasets (<1000 rows): Use tables insert instead (no GCS required)

bigquery tables insert my-project.dataset.table \
  --data /tmp/data.jsonl \
  --format json

For larger datasets: Upload to GCS first, then load

gsutil cp /tmp/large-file.jsonl gs://my-bucket/
bigquery tables load my-project.dataset.table \
  gs://my-bucket/large-file.jsonl \
  --format json

Last resort: Configure GCS staging bucket in BigQuery CLI config (requires additional setup)

Issue: "Invalid schema"

Solution: Check schema format for external tables

# Schema format: column:type,column:type,...
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Or use autodetect
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect

Issue: "MCP server not responding"

Solution: Check server mode and connectivity

# For stdio mode, ensure client is using stdio transport
bigquery mcp stdio

# For HTTP mode, check port and firewall
bigquery mcp http --port 8080

# Test HTTP endpoint
curl http://localhost:8080

Issue: "LSP not starting in editor"

Solution: Verify LSP configuration and binary path

# Check bigquery is in PATH
which bigquery

# Test LSP manually
bigquery lsp

# Verify editor configuration points to correct command
# Neovim: check cmd = { "bigquery", "lsp" }
# VS Code: check "command": "bigquery", "args": ["lsp"]

Quick Reference

# Authentication
bigquery auth check                          # Check auth status
bigquery auth login                          # Login with gcloud

# Queries
bigquery query "SELECT ..."                  # Execute query
bigquery query --yes "SELECT ..."            # Skip confirmation
bigquery query --format text "SELECT ..."    # Table output
bigquery dry-run "SELECT ..."                # Estimate cost

# Datasets
bigquery datasets list PROJECT               # List datasets

# Tables
bigquery tables list PROJECT.DATASET                          # List tables
bigquery tables describe PROJECT.DATASET.TABLE               # Show schema
bigquery tables insert TABLE --json '{"id": 1}'              # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data -       # Stream insert
bigquery tables load TABLE file.csv                          # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv              # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv         # Extract to GCS
bigquery tables create-external TABLE --source-uri ...       # External table
bigquery tables update-external TABLE --source-uri ...       # Update external

# Templates
bigquery templates list                              # List templates
bigquery templates search "keyword"                  # Search templates
bigquery templates validate TEMPLATE                 # Validate template
bigquery templates run TEMPLATE --param key=value    # Run template

# MCP Server
bigquery mcp stdio                   # Start MCP (stdio mode)
bigquery mcp http                    # Start MCP (HTTP mode)
bigquery mcp http --port 3000        # Custom port

# LSP Server
bigquery lsp                         # Start LSP server

Integration Examples

CI/CD Pipeline

#!/bin/bash
# daily-etl.sh

# Authenticate with service account
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1

# Run daily ETL template
bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json

# Check result
if [ $? -eq 0 ]; then
  echo "ETL completed successfully"
else
  echo "ETL failed"
  exit 1
fi

Data Quality Checks

#!/bin/bash
# check-data-quality.sh

# Run data quality template
RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)

# Parse result and check quality metrics
INVALID_ROWS=$(echo $RESULT | jq '.invalid_rows')

if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "Data quality check failed: $INVALID_ROWS invalid rows"
  exit 1
else
  echo "Data quality check passed"
fi

Scheduled Reporting

#!/bin/bash
# generate-report.sh

# Generate weekly report
bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json

# Upload to GCS
gsutil cp /reports/weekly-*.json gs://reports-bucket/

Summary

Primary commands:

bigquery auth {check,login} - Authentication management
bigquery query - Execute SQL with cost awareness
bigquery dry-run - Estimate query costs
bigquery datasets list - List datasets
bigquery tables {list,describe,insert,load,extract,create-external,update-external} - Table operations
bigquery templates {list,search,validate,run} - Named templates
bigquery mcp {stdio,http} - MCP server modes
bigquery lsp - LSP server

Key features:

Cost-aware query execution with confirmation prompts
Named query templates with parameter substitution
Streaming insert API for real-time data (<1000 rows)
Bulk load jobs for large datasets (>10MB or >1000 rows)
JSONL streaming support with stdin (--data -)
Data extraction to Cloud Storage (CSV, JSON, Avro, Parquet)
External table support for GCS data
MCP server with stdio and HTTP modes
LSP integration for editor support

Best practices:

Always check authentication first with auth check
Use dry-run to estimate costs before expensive queries
Create templates for frequently-used queries
Use --yes flag for automation and CI/CD
Use insert for <1000 rows, load for larger datasets
Use JSONL format for streaming pipelines
Stream from stdin with --data - for data transformations
Use external tables to avoid data duplication
Configure MCP for natural language query capabilities
Set up LSP in your editor for SQL development

MCP Integration:

Semantic search across datasets
Natural language to SQL translation
Schema discovery and exploration
Template execution via MCP tools
Available in both stdio and HTTP modes

BigQuery CLI Skill

Core Capabilities

The bigquery CLI provides:

Authentication: Check status and login with gcloud
Query Execution: Run SQL queries with cost awareness and confirmation prompts
Dry Run: Estimate query costs without execution
Dataset Operations: List datasets in a project
Table Operations: List, describe, insert, load, and manage external tables
Template System: Named query templates with parameter substitution
MCP Server: Semantic search via stdio or HTTP modes
LSP Server: SQL language server for editor integration

Authentication

Check Authentication Status

# Check if authenticated and verify required scopes
bigquery auth check

# Will show:
# - Authentication status
# - Active account
# - BigQuery scopes availability

Login with gcloud

# Authenticate with gcloud including all required BigQuery scopes
bigquery auth login

# This will:
# 1. Run gcloud auth login
# 2. Ensure all necessary BigQuery scopes are granted
# 3. Verify authentication succeeded

Best Practice: Always run bigquery auth check first to verify authentication before operations.

Query Operations

Running Queries

# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"

# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"

# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"

# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"

Cost Awareness: The query command automatically:

Estimates query cost before execution
Displays bytes to be processed
Prompts for confirmation (unless --yes is used)
Prevents accidental expensive queries

Query Output Formats

# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"

# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"

Dry Run (Cost Estimation)

# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"

# Returns:
# - Bytes that would be processed
# - Estimated cost
# - No actual data

Use dry-run to:

Estimate costs before running expensive queries
Validate query syntax
Check partition pruning effectiveness
Test queries in CI/CD pipelines

Dataset Operations

Listing Datasets

# List datasets in current project (text format, default)
bigquery datasets list my-project

# JSON output
bigquery datasets list my-project --format json

# Example output shows:
# - Dataset ID
# - Location
# - Creation time
# - Labels (if any)

Note: Dataset reference format is project.dataset or just project to list all datasets.

Table Operations

Listing Tables

# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset

# JSON output
bigquery tables list my-project.my-dataset --format json

# Limit results
bigquery tables list my-project.my-dataset --limit 20

# Maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100

Describing Table Schema

# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table

# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json

# Output includes:
# - Column names and types
# - Nullability (NULLABLE, REQUIRED, REPEATED)
# - Mode information
# - Table metadata

Inserting Rows (Small Datasets)

Best for <1000 rows. Uses streaming insert API for immediate availability.

JSONL (Newline-Delimited JSON) Format

From JSONL File:

# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "[email protected]", "age": 35}
EOF

# Insert from JSONL file
bigquery tables insert my-project.dataset.users \
  --data users.jsonl --format json

From JSONL Stream (stdin):

# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "[email protected]"}' | \
  bigquery tables insert my-project.dataset.users --data - --format json

# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "[email protected]", "age": 30}
{"id": "2", "name": "Bob", "email": "[email protected]", "age": 25}
{"id": "3", "name": "Charlie", "email": "[email protected]", "age": 35}
EOF

# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -

# Stream from compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -

# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
  bigquery tables insert my-project.dataset.processed --data -

JSONL Format Requirements:

Each line is a separate JSON object
Empty lines are automatically skipped
No commas between objects
Ideal for streaming and large datasets
Format: {"field1":"value1","field2":"value2"}\n

CSV Format

From CSV File:

# Create sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,"Charlie Brown, Jr.",[email protected],35
EOF

# Insert from CSV file
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv

From CSV Stream (stdin):

# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,[email protected],30
2,Bob Smith,[email protected],25
3,Charlie Brown,[email protected],35
EOF

# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv

# Stream from compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data -

# Stream from curl/API response
curl -s https://api.example.com/export.csv | \
  bigquery tables insert my-project.dataset.api_data --data - --format csv

# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
  bigquery tables insert my-project.dataset.cleaned --data - --format csv

CSV Format Requirements:

First row must contain column headers matching BigQuery table schema
Values are inserted as strings (BigQuery will coerce types)
Supports quoted fields, escaped quotes, and newlines (RFC 4180 compliant)
Headers are case-sensitive and must match table column names

Additional Insert Options

# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
  --json '{"id": "1", "name": "Alice", "email": "[email protected]"}'

# Insert inline JSON array
bigquery tables insert my-project.dataset.users \
  --json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'

# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --dry-run

# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --skip-invalid

# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --ignore-unknown

# Combine options for production pipelines
cat production_data.jsonl | \
  bigquery tables insert my-project.dataset.production \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown

Insert Options:

--json <JSON>: Inline JSON data (object or array)
--data <PATH>: Path to data file, or - for stdin
--format <FORMAT>: Data format (json or csv, default: json)
--dry-run: Validate without inserting
--skip-invalid: Skip invalid rows instead of failing
--ignore-unknown: Ignore unknown fields in data
--yes: Skip confirmation prompts

Loading Data (Large Datasets)

Best for >10MB files or >1000 rows. Uses BigQuery load jobs.

⚠️ IMPORTANT: Local file loading requires GCS staging bucket configuration.

If you get "The specified bucket does not exist" error, use tables insert for datasets <1000 rows instead
For larger datasets, upload to GCS first, then use bigquery tables load gs://...

# Load from Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
  gs://my-bucket/data.csv --format csv

# Load from local CSV file (requires GCS staging bucket configured)
bigquery tables load my-project.dataset.users data.csv --format csv

# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
  --format csv --autodetect

# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace

# Load JSON file
bigquery tables load my-project.dataset.events events.json \
  --format json

# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
  --format parquet

# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --dry-run

# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --max-bad-records 100

# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
  --format csv --ignore-unknown

# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace --yes

Load Job Features:

GCS Staging Bucket Required: Local file loading needs GCS bucket configuration
Real-time progress tracking with exponential backoff
Automatic cleanup of temporary files after completion
Write modes: append (default) or replace (truncate first)
Safety confirmations for destructive operations
Configurable error tolerance with --max-bad-records

When to Use:

Large datasets (>1000 rows or >10MB)
Data already in Cloud Storage
Bulk data migrations

When NOT to Use:

Small datasets (<1000 rows) → Use tables insert instead (no GCS required)
Don't have GCS staging bucket configured → Use tables insert or upload to GCS first

Load Options:

--format <FORMAT>: csv, json, avro, parquet, orc (default: csv)
--write-disposition <DISPOSITION>: append or replace (default: append)
--autodetect: Auto-detect schema from source files
--dry-run: Validate without loading
--max-bad-records <N>: Maximum bad records before failing
--ignore-unknown: Ignore unknown fields
--yes: Skip confirmation prompts

Extracting Data

Export table data to Cloud Storage in various formats:

# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users.csv --format csv

# Extract as JSON
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.json --format json

# Extract with compression
bigquery tables extract my-project.dataset.large_table \
  gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip

# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.avro --format avro --compression snappy

# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
  gs://my-bucket/exports/analytics.parquet --format parquet

# CSV with custom delimiter and header
bigquery tables extract my-project.dataset.data \
  gs://my-bucket/data.csv \
  --format csv \
  --field-delimiter "|" \
  --print-header

# Dry-run to validate configuration
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/users.csv --format csv --dry-run

# Skip confirmation prompt
bigquery tables extract my-project.dataset.large \
  gs://my-bucket/export.csv --format csv --yes

Supported Formats: CSV, JSON (newline-delimited), Avro, Parquet Compression: none, gzip, snappy (Avro/Parquet only)

External Tables

External tables reference data in Cloud Storage without copying it to BigQuery.

Creating External Tables

# Create CSV external table
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Create with auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --autodetect

# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
  --format csv \
  --autodetect

# Multiple source URIs (multiple flags)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/file1.csv \
  --source-uri gs://bucket/file2.csv \
  --format csv \
  --autodetect

# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING" \
  --field-delimiter "," \
  --skip-leading-rows 1

# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
  --source-uri gs://bucket/data.parquet \
  --format parquet \
  --autodetect

bigquery tables create-external my-project.dataset.json_table \
  --source-uri gs://bucket/data.jsonl \
  --format json \
  --autodetect

External Table Options:

--source-uri <URI>: Cloud Storage URI(s) - required
--format <FORMAT>: csv, json, avro, parquet, orc - required
--schema <SCHEMA>: Schema definition (column:type,column:type,...)
--autodetect: Auto-detect schema from source files
--field-delimiter <DELIMITER>: CSV field delimiter (default: ,)
--skip-leading-rows <N>: CSV header rows to skip

Updating External Tables

# Update source URIs
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv

# Update schema
bigquery tables update-external my-project.dataset.external_table \
  --schema "id:INTEGER,name:STRING,email:STRING"

# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
  --field-delimiter "|" \
  --skip-leading-rows 2

# Update multiple properties
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv \
  --schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
  --skip-leading-rows 1

Template System

Named query templates allow you to save frequently-used queries with parameter placeholders.

Listing Templates

# List all available templates (text format)
bigquery templates list

# JSON output
bigquery templates list --format json

# Shows:
# - Template name
# - Description
# - Parameters
# - Query preview

Searching Templates

# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"

# JSON output
bigquery templates search "analytics" --format json

Validating Templates

# Validate template for parameter consistency
bigquery templates validate my-template

# Checks:
# - Parameter definitions match query placeholders
# - Required parameters are defined
# - Parameter types are valid

Running Templates

# Run template with default parameters
bigquery templates run my-template

# Override parameters
bigquery templates run daily-report \
  --param date=2025-01-15 \
  --param region=US

# Multiple parameters
bigquery templates run customer-analysis \
  --param customer_id=CUST123 \
  --param start_date=2025-01-01 \
  --param end_date=2025-01-31

# JSON output
bigquery templates run my-template --format json

# Skip cost confirmation
bigquery templates run expensive-query --yes

Template Run Options:

--param <KEY=VALUE>: Parameter override (can be used multiple times)
--format <FORMAT>: Output format (json or text, default: json)
--yes: Skip cost confirmation prompt

Template Workflow Example

# 1. Search for templates
bigquery templates search "revenue"

# 2. Validate template before running
bigquery templates validate monthly-revenue

# 3. Run with parameters
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --param min_amount=1000

# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --yes \
  --format json > output.json

Use templates for:

Standardized reporting queries
Common analytics patterns
Scheduled data pipelines
Team query sharing
Reducing query errors

MCP Server Integration

The BigQuery MCP server provides semantic search and natural language query capabilities via Model Context Protocol.

Starting MCP Server

STDIO Mode (for local clients):

# Start MCP server in stdio mode
bigquery mcp stdio

# Server will:
# - Accept MCP protocol messages on stdin
# - Send responses on stdout
# - Expose BigQuery tools to MCP clients

HTTP Mode (for network clients):

# Start HTTP MCP server on default port 8080
bigquery mcp http

# Specify custom port
bigquery mcp http --port 3000

# Server provides:
# - HTTP endpoint for MCP protocol
# - JSON-RPC over HTTP
# - Remote access to BigQuery tools

MCP Server Capabilities

The MCP server exposes these tools through the Model Context Protocol:

semantic_search: Search tables using natural language
execute_query: Run SQL queries with automatic formatting
get_schema: Retrieve table schemas
list_tables: List available tables
list_datasets: List available datasets
explain_query: Get query execution plan
optimize_query: Suggest query optimizations
run_template: Execute named templates with parameters

MCP Configuration

Configure in Claude Code or other MCP-enabled applications:

STDIO Mode (.claude/mcp.json or similar):

{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery",
      "args": ["mcp", "stdio"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "my-project"
      }
    }
  }
}

HTTP Mode:

{
  "mcpServers": {
    "bigquery": {
      "url": "http://localhost:8080",
      "transport": "http"
    }
  }
}

MCP Usage Patterns

When using BigQuery MCP through clients:

Semantic Search:

"Find all tables containing customer purchase data from the last 30 days"
→ MCP translates to appropriate SQL query

Schema Discovery:

"What columns are in the analytics.events table?"
→ MCP returns schema information

Natural Language Queries:

"Show me total revenue by region for Q1 2025"
→ MCP generates and executes SQL

Template Execution:

"Run the monthly revenue template for January 2025"
→ MCP executes template with parameters

LSP Integration

The BigQuery LSP provides SQL language features in text editors.

Starting LSP Server

# Start LSP server
bigquery lsp

# Server provides:
# - Language Server Protocol communication
# - SQL syntax validation
# - Schema-aware completions
# - Query formatting
# - Hover documentation

LSP Features

SQL syntax highlighting: Proper tokenization and highlighting
Schema completion: Table and column suggestions based on project schema
Query validation: Real-time syntax and semantic checks
Hover documentation: Table and column info on hover
Go to definition: Navigate to table definitions
Query formatting: Auto-format SQL queries
Diagnostics: Show errors and warnings inline

Editor Configuration

Neovim:

-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})

VS Code (in settings.json or language server config):

{
  "bigquery-lsp": {
    "command": "bigquery",
    "args": ["lsp"],
    "filetypes": ["sql", "bq", "bigquery"]
  }
}

Helix (in languages.toml):

[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]

Common Workflows

Workflow 1: Exploratory Data Analysis

# 1. Verify authentication
bigquery auth check

# 2. List available datasets
bigquery datasets list my-project

# 3. List tables in dataset
bigquery tables list my-project.analytics

# 4. Check table schema
bigquery tables describe my-project.analytics.events

# 5. Preview data (text format for readability)
bigquery query --format text \
  "SELECT * FROM my-project.analytics.events LIMIT 10"

# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"

# 7. Check data distribution
bigquery query --format text "
  SELECT
    DATE(timestamp) as date,
    COUNT(*) as events
  FROM my-project.analytics.events
  GROUP BY date
  ORDER BY date DESC
  LIMIT 30
"

Workflow 2: Cost-Aware Query Development

# 1. Dry run to estimate cost
bigquery dry-run "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 2. If cost is acceptable, run query
bigquery query "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 3. For automation, skip confirmation
bigquery query --yes "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
" > results.json

Workflow 3: Template-Based Reporting

# 1. Search for relevant templates
bigquery templates search "daily"

# 2. Validate template
bigquery templates validate daily-metrics

# 3. Run template with parameters
bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json

# 4. Schedule in cron or CI/CD
# 0 1 * * * bigquery templates run daily-metrics --param date=$(date +%Y-%m-%d) --yes

Workflow 4: External Data Analysis

# 1. Create external table pointing to GCS
bigquery tables create-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-01-*.json \
  --format json \
  --autodetect

# 2. Query external table
bigquery query "
  SELECT
    timestamp,
    user_id,
    action
  FROM my-project.staging.raw_logs
  WHERE action = 'purchase'
  LIMIT 100
"

# 3. Update external table when new files arrive
bigquery tables update-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-02-*.json

Workflow 5: Data Loading Pipeline

# 1. Load initial data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace

# 2. Append incremental data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append

# 3. Verify data loaded
bigquery query "
  SELECT
    DATE(timestamp) as date,
    COUNT(*) as count
  FROM my-project.dataset.events
  GROUP BY date
  ORDER BY date
"

Workflow 6: Real-Time Data Insertion

# 1. Insert single event (inline JSON)
bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'

# 2. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json

# 3. Insert batch from JSONL file
bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json

# 4. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' | \
  bigquery tables insert my-project.dataset.events \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown

Best Practices

Query Development

Always dry-run first: Use bigquery dry-run to estimate costs
Use templates: Create templates for repeated queries
Validate before running: Check syntax and cost before execution
Use text format for exploration: --format text for human-readable tables
Use JSON for automation: --format json for machine processing
Skip confirmations in scripts: Use --yes flag for automation

Cost Management

Dry run expensive queries: Always estimate with bigquery dry-run
Monitor bytes processed: Check query cost estimates before running
Use partition pruning: Filter on partitioned columns in WHERE clauses
Limit result sets: Use LIMIT for exploratory queries
Use templates: Standardize queries to avoid mistakes
Leverage external tables: Avoid copying data when querying directly from GCS

Authentication

Check auth first: Run bigquery auth check before operations
Use service accounts: For automation and CI/CD
Verify scopes: Ensure all required BigQuery scopes are granted
Re-authenticate when needed: bigquery auth login if check fails

Template Management

Use descriptive names: Make templates easy to find
Document parameters: Include parameter descriptions in templates
Validate before use: Run bigquery templates validate before execution
Search before creating: Check if similar template exists
Version control templates: Store template definitions in git

Data Loading

Choose the right method:
- Use insert for <1000 rows (streaming insert API, immediate availability)
- Use load for >10MB files or >1000 rows (load jobs with GCS upload)
Use JSONL for streaming: Newline-delimited JSON is ideal for streaming pipelines
Stream from stdin: Use --data - to pipe data from applications or transformations
Validate before loading: Use --dry-run flag to test configurations
Handle bad records: Set --max-bad-records for messy data
Choose write disposition: replace for full refresh, append for incremental
Use external tables: For data that changes frequently in GCS (no data copying)
Use appropriate formats: CSV for simple data, JSON/JSONL for complex, Parquet/Avro for large datasets

MCP Server

Use stdio for local: Prefer stdio mode for local MCP clients
Use HTTP for remote: Use HTTP mode for networked deployments
Secure HTTP endpoints: Put HTTP server behind authentication/firewall
Monitor server logs: Check for errors and performance issues
Set appropriate port: Choose non-conflicting port for HTTP mode

LSP Integration

Configure per-project: Set up LSP for SQL files in your editor
Use schema completion: Leverage auto-complete for table/column names
Check diagnostics: Fix errors and warnings shown inline
Format queries: Use LSP formatting for consistent style

Configuration

Environment Variables

# Set default project
export GOOGLE_CLOUD_PROJECT=my-project

# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc

Authentication Methods

User Credentials (interactive):

bigquery auth login
# Opens browser for Google authentication

Service Account (automation):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check

Application Default Credentials (gcloud):

gcloud auth application-default login
bigquery auth check

Troubleshooting

Issue: "Not authenticated" or "Permission denied"

Solution: Check authentication and scopes

# Check current auth status
bigquery auth check

# Re-authenticate if needed
bigquery auth login

# Verify gcloud is set to correct project
gcloud config get-value project

# Set project if needed
gcloud config set project my-project

Issue: "Table not found"

Solution: Use fully qualified table names

# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"

# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"

# Or use backticks for reserved words
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"

Issue: "Query too expensive"

Solution: Check cost with dry-run and optimize

# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"

# Optimize with partition filters
bigquery dry-run "
  SELECT * FROM large_table
  WHERE _PARTITIONDATE = '2025-01-15'
"

Issue: "Template not found"

Solution: Search for templates and verify name

# List all templates
bigquery templates list

# Search for template
bigquery templates search "keyword"

# Use exact template name
bigquery templates run exact-template-name

Issue: "The specified bucket does not exist"

Cause: bigquery tables load with a local file requires GCS staging bucket configuration.

Solutions:

Preferred for small datasets (<1000 rows): Use tables insert instead (no GCS required)

bigquery tables insert my-project.dataset.table \
  --data /tmp/data.jsonl \
  --format json

For larger datasets: Upload to GCS first, then load

gsutil cp /tmp/large-file.jsonl gs://my-bucket/
bigquery tables load my-project.dataset.table \
  gs://my-bucket/large-file.jsonl \
  --format json

Last resort: Configure GCS staging bucket in BigQuery CLI config (requires additional setup)

Issue: "Invalid schema"

Solution: Check schema format for external tables

# Schema format: column:type,column:type,...
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Or use autodetect
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect

Issue: "MCP server not responding"

Solution: Check server mode and connectivity

# For stdio mode, ensure client is using stdio transport
bigquery mcp stdio

# For HTTP mode, check port and firewall
bigquery mcp http --port 8080

# Test HTTP endpoint
curl http://localhost:8080

Issue: "LSP not starting in editor"

Solution: Verify LSP configuration and binary path

# Check bigquery is in PATH
which bigquery

# Test LSP manually
bigquery lsp

# Verify editor configuration points to correct command
# Neovim: check cmd = { "bigquery", "lsp" }
# VS Code: check "command": "bigquery", "args": ["lsp"]

Quick Reference

# Authentication
bigquery auth check                          # Check auth status
bigquery auth login                          # Login with gcloud

# Queries
bigquery query "SELECT ..."                  # Execute query
bigquery query --yes "SELECT ..."            # Skip confirmation
bigquery query --format text "SELECT ..."    # Table output
bigquery dry-run "SELECT ..."                # Estimate cost

# Datasets
bigquery datasets list PROJECT               # List datasets

# Tables
bigquery tables list PROJECT.DATASET                          # List tables
bigquery tables describe PROJECT.DATASET.TABLE               # Show schema
bigquery tables insert TABLE --json '{"id": 1}'              # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data -       # Stream insert
bigquery tables load TABLE file.csv                          # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv              # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv         # Extract to GCS
bigquery tables create-external TABLE --source-uri ...       # External table
bigquery tables update-external TABLE --source-uri ...       # Update external

# Templates
bigquery templates list                              # List templates
bigquery templates search "keyword"                  # Search templates
bigquery templates validate TEMPLATE                 # Validate template
bigquery templates run TEMPLATE --param key=value    # Run template

# MCP Server
bigquery mcp stdio                   # Start MCP (stdio mode)
bigquery mcp http                    # Start MCP (HTTP mode)
bigquery mcp http --port 3000        # Custom port

# LSP Server
bigquery lsp                         # Start LSP server

Integration Examples

CI/CD Pipeline

#!/bin/bash
# daily-etl.sh

# Authenticate with service account
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1

# Run daily ETL template
bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json

# Check result
if [ $? -eq 0 ]; then
  echo "ETL completed successfully"
else
  echo "ETL failed"
  exit 1
fi

Data Quality Checks

#!/bin/bash
# check-data-quality.sh

# Run data quality template
RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)

# Parse result and check quality metrics
INVALID_ROWS=$(echo $RESULT | jq '.invalid_rows')

if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "Data quality check failed: $INVALID_ROWS invalid rows"
  exit 1
else
  echo "Data quality check passed"
fi

Scheduled Reporting

#!/bin/bash
# generate-report.sh

# Generate weekly report
bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json

# Upload to GCS
gsutil cp /reports/weekly-*.json gs://reports-bucket/

Summary

Primary commands:

bigquery auth {check,login} - Authentication management
bigquery query - Execute SQL with cost awareness
bigquery dry-run - Estimate query costs
bigquery datasets list - List datasets
bigquery tables {list,describe,insert,load,extract,create-external,update-external} - Table operations
bigquery templates {list,search,validate,run} - Named templates
bigquery mcp {stdio,http} - MCP server modes
bigquery lsp - LSP server

Key features:

Cost-aware query execution with confirmation prompts
Named query templates with parameter substitution
Streaming insert API for real-time data (<1000 rows)
Bulk load jobs for large datasets (>10MB or >1000 rows)
JSONL streaming support with stdin (--data -)
Data extraction to Cloud Storage (CSV, JSON, Avro, Parquet)
External table support for GCS data
MCP server with stdio and HTTP modes
LSP integration for editor support

Best practices:

Always check authentication first with auth check
Use dry-run to estimate costs before expensive queries
Create templates for frequently-used queries
Use --yes flag for automation and CI/CD
Use insert for <1000 rows, load for larger datasets
Use JSONL format for streaming pipelines
Stream from stdin with --data - for data transformations
Use external tables to avoid data duplication
Configure MCP for natural language query capabilities
Set up LSP in your editor for SQL development

MCP Integration:

Semantic search across datasets
Natural language to SQL translation
Schema discovery and exploration
Template execution via MCP tools
Available in both stdio and HTTP modes

Adoption

lanej/bigquery

$ install --global

Security Scan Results

SKILL.md

BigQuery CLI Skill

Core Capabilities

Authentication

Check Authentication Status

Login with gcloud

Query Operations

Running Queries

Query Output Formats

Dry Run (Cost Estimation)

Dataset Operations

Listing Datasets

Table Operations

Listing Tables

Describing Table Schema

Inserting Rows (Small Datasets)

JSONL (Newline-Delimited JSON) Format

CSV Format

Additional Insert Options

Loading Data (Large Datasets)

Extracting Data

External Tables

Creating External Tables

Updating External Tables

Template System

Listing Templates

Searching Templates

Validating Templates

Running Templates

Template Workflow Example

MCP Server Integration

Starting MCP Server

MCP Server Capabilities

MCP Configuration

MCP Usage Patterns

LSP Integration

Starting LSP Server

LSP Features

Editor Configuration

Common Workflows

Workflow 1: Exploratory Data Analysis

Workflow 2: Cost-Aware Query Development

Workflow 3: Template-Based Reporting

Workflow 4: External Data Analysis

Workflow 5: Data Loading Pipeline

Workflow 6: Real-Time Data Insertion

Best Practices

Query Development

Cost Management

Authentication

Template Management

Data Loading

MCP Server

LSP Integration

Configuration

Environment Variables

Authentication Methods

Troubleshooting

Issue: "Not authenticated" or "Permission denied"

Issue: "Table not found"

Issue: "Query too expensive"

Issue: "Template not found"

Issue: "The specified bucket does not exist"

Issue: "Invalid schema"

Issue: "MCP server not responding"

Issue: "LSP not starting in editor"

Quick Reference

Integration Examples

CI/CD Pipeline

Data Quality Checks

Scheduled Reporting

Summary

Related Skills

lanej/dora

lanej/data-pipeline

lanej/research