skills/bigquery/SKILL.md
Run ad-hoc SQL against Mozilla telemetry tables using the bq CLI. Use when the task needs raw BigQuery access — dry-run cost checks, custom SQL, pulling DAU/MAU/Windows-distribution/Glean-metric data into a script. DO NOT USE FOR saved/shared queries with visualizations (use redash) or finding which probe to query (use mozdata:probe-discovery).
npx skillsauth add jwmossmoz/agent-skills bigqueryInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Query Mozilla telemetry data directly using the bq CLI.
gcloud and bq CLI installed (brew install google-cloud-sdk)gcloud auth login with a Mozilla accountbigquery.jobs.create on# Check current account
gcloud config get-value account
# Re-authenticate if needed
gcloud auth login
# List available projects
gcloud projects list --format="table(projectId,name)"
# Set billing project (mozdata is the standard choice)
gcloud config set project mozdata
If queries fail with "Access Denied", the billing project likely lacks permissions. Try --project_id=mozdata.
# Basic query
bq query --project_id=mozdata --use_legacy_sql=false --format=pretty "SELECT ..."
# Dry run (check cost before executing)
bq query --project_id=mozdata --use_legacy_sql=false --dry_run "SELECT ..."
Always use --project_id=mozdata and --use_legacy_sql=false.
Choose the right table — this is the most important optimization:
| Query Type | Table | Why |
|------------|-------|-----|
| Windows version distribution | telemetry.windows_10_aggregate | Pre-aggregated, instant |
| DAU/MAU by standard dimensions | firefox_desktop_derived.active_users_aggregates_v3 | Pre-computed, 100x faster |
| DAU with custom dimensions | firefox_desktop.baseline_clients_daily | One row per client per day |
| MAU/WAU/retention | firefox_desktop.baseline_clients_last_seen | Bit patterns, scan 1 day not 28 |
| Event analysis | firefox_desktop.events_stream | Pre-unnested, clustered |
| Mobile search | search.mobile_search_clients_daily_v2 | Pre-aggregated |
| Specific Glean metric | firefox_desktop.metrics | Raw metrics ping |
All tables are in the moz-fx-data-shared-prod project. Fully qualify as `moz-fx-data-shared-prod.{dataset}.{table}`.
submission_date or DATE(submission_timestamp)sample_id = 0 for development (1% sample) — remove for productionclient_id, not actual humansevents_stream for events — never raw events_v1 (requires UNNEST)baseline_clients_last_seen for MAU — bit patterns, scan 1 day not 28bigquery.jobs.create. Switch with gcloud config set project mozdata or pass --project_id=mozdata per-query.bq defaults to legacy SQL. Always pass --use_legacy_sql=false — leaving it off silently changes the dialect mid-script.events_v1 directly; use events_stream. Never join across products by client_id — each product has its own namespace.baseline_clients_last_seen bit patterns (1-day scan), not 28 days of baseline_clients_daily aggregates.client_id is not a person. Say "clients" in user-facing text, not "users".references/tables.md — Detailed table schemas and common query patternsreferences/os-versions.md — Windows, macOS, and Linux version distribution queries with build number, Darwin, and kernel version mappingsdevelopment
Download Azure Cost Management exports and query local Parquet/CSV in DuckDB. Use when refreshing local Azure cost caches or writing DuckDB SQL over exports. DO NOT USE FOR live Cost Management API diagnosis; use azure-cost-analysis.
data-ai
Use when creating performance self-reviews from local notes, prior reviews, review prompts, and verified evidence. Helps draft H1/H2, annual, and promotion self evaluations, example answers, and rich review-form paste output. Do not use for routine status or 1:1 summaries; use one-on-one.
tools
Prepare one-on-one/status bullets from ~/moz_artifacts using qmd and copy a topic-organized HTML/RTF list with embedded links to the macOS clipboard. Use when summarizing recent Mozilla work for a manager, 1:1, or status update. DO NOT USE FOR generating raw daily logs; use daily-log.
development
Use when tracing Taskcluster Azure VM startup from worker-manager request through in-VM boot scripts to generic-worker `workerReady` with tc-logview, paperctl, Splunk Web, and Yardstick Prometheus. Applies to Windows worker provisioning latency. DO NOT USE FOR task failure triage (use worker-image-investigation).