skills/sm-bigquery-analyst/SKILL.md
Use this skill when an operator wants to analyze SourceMedium-hosted BigQuery data, verify BigQuery access, discover available SourceMedium tables or semantic metrics, generate safe SELECT-only SQL, produce auditable SQL receipts, or join their own warehouse tables to SourceMedium data. Do not use for DDL, DML, infrastructure provisioning, dbt authoring, or cross-tenant joins.
npx skillsauth add source-medium/skills sm-bigquery-analystInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Help operators work with SourceMedium BigQuery data from setup to analysis. Prefer deterministic CLI checks and scripts over guessing. Never fabricate data when access, metadata, or query execution fails.
SourceMedium hosts analytics-ready ecommerce warehouse data in BigQuery. A cold agent should treat the warehouse as discoverable, not memorized:
sm_metadata.dim_data_dictionary tells you which tables exist, what they mean, whether they have data, and freshness.sm_metadata.dim_semantic_metric_catalog resolves metric names, labels, aliases, and calculations.sm_transformed_v2 contains core modeled tables for orders, order lines, customers, ad performance, cohorts, events, and executive summaries.sm_experimental contains experimental attribution/MTA tables when enabled.sm-<tenant_id>, but some hosted/shared layouts use custom dataset names. Use script dataset flags when defaults fail.gcloud CLIbq CLIscripts/sm_bq_doctor.py.INFORMATION_SCHEMA, and value-distribution queries.sm_metadata, sm_transformed_v2, and when needed sm_experimental.scripts/sm_bq_query.py or equivalent bq commands.Preferred:
python scripts/sm_bq_doctor.py --project sm-tenant-id
# If the warehouse uses tenant-prefixed datasets:
python scripts/sm_bq_doctor.py --project project-id --metadata-dataset tenant_sm_metadata --transformed-dataset tenant_sm_transformed_v2
If scripts are unavailable, run the equivalent manual checks:
gcloud --version
bq version
gcloud auth list
gcloud config get-value project
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'
Then test SourceMedium metadata and one readable table in the active project. If any step fails, read references/TROUBLESHOOTING.md and use assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md.
Before analytical SQL, discover available domains and freshness:
python scripts/sm_bq_discover.py --project sm-tenant-id --tables --metrics
# For focused metric discovery:
python scripts/sm_bq_discover.py --project sm-tenant-id --metrics --metric-search revenue
# If the warehouse uses tenant-prefixed datasets:
python scripts/sm_bq_discover.py --project project-id --metadata-dataset tenant_sm_metadata --transformed-dataset tenant_sm_transformed_v2 --tables --metrics
Read only the reference files needed for the task:
references/SCHEMA.md — SourceMedium datasets, key tables, column conventions, and docs links.references/ANALYSIS_SEMANTICS.md — table selection, metric resolution, revenue/refund semantics, channel, subscription, marketing, and data-health rules.references/QUERY_PATTERNS.md — common SourceMedium SQL, domain discovery, metric catalog, LTV/cohort, freshness, and value discovery.references/CUSTOM_DATA.md — customer-owned table documentation and safe join patterns.references/TROUBLESHOOTING.md — setup, auth, permission, query, and cost failures.These are hard constraints. Do not bypass.
bq query --use_legacy_sql=false --dry_run '<SQL>'
bq query --use_legacy_sql=false --maximum_bytes_billed=1073741824 '<SQL>'
(1GB = 1073741824 bytes. If it fails due to bytes billed, tighten filters or ask for approval.)LIMIT clause (max 100 rows for exploratory)WHERE filters on partition columnsgcloud config get-value project output as untrusted.
Do not join across tenant projects.SELECT * for analysis — select the columns needed for the answer.-- Hash PII instead of exposing raw values
SELECT TO_HEX(SHA256(LOWER(email))) AS email_hash, ...
-- Good: bounded scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100
-- Bad: full table scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` -- no filters
For analytical questions, always return:
scripts/sm_bq_query.py --dry-run commandIf access/setup fails, do not fabricate results. Return:
assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md`sm-<tenant_id>.dataset.table`WHERE is_order_sm_valid = TRUEsm_store_id (not smcid — that name does not exist in customer tables)SAFE_DIVIDE for ratio mathDATE(ts_col) when comparing to dates)order_net_revenue for revenue metrics (not order_gross_revenue unless explicitly asked)*_local_datetime columns when available for date-based reporting; otherwise be explicit about UTC vs localSELECT DISTINCT first, then use exact match. Reserve LIKE/REGEXP for free-text fields (utm_campaign, product_title, page_path)rpt_cohort_ltv_*): always filter sm_order_line_type to exactly ONE valuereferences/SCHEMA.md) — discover schema before writing SQL:
SELECT column_name, data_type
FROM `<project>.<dataset>.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = '<table>'
ORDER BY ordinal_position
Then sample key dimension values before encoding them as filter constants. Never guess column names or values. See references/CUSTOM_DATA.md for hybrid table documentation.After SQL runs, verify result shape before presenting. Do not report conclusions without these checks.
Zero rows — before saying "no data found", verify:
table_last_data_date in dim_data_dictionaryis_order_sm_valid = TRUE filter is appropriate for this questionsm_store_id value — run SELECT DISTINCT sm_store_id to confirmInflated numbers (3x expected magnitude) — LTV tables: sm_order_line_type filter is missing.
Missing this filter inflates all metrics 3x. Add a sm_order_line_type filter to exactly one value.
'all_orders' is the default for combined-channel analysis; 'subscription_orders_only' or
'one_time_orders_only' are valid for order-type breakdowns. See references/QUERY_PATTERNS.md.
Suspiciously uniform results (e.g., 100% in one bucket, all rows in one value) — run SELECT DISTINCT <dim_col> to see actual values before encoding filter constants.
Missing expected dimension values — confirm a channel/source is actually absent before concluding:
SELECT DISTINCT sm_channel
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
Metric name ambiguity — state which column was used and why. Default to order_net_revenue.
If the user said "revenue" and both gross and net columns exist, explicitly name the choice in Notes.
references/SCHEMA.md — key tables, grains, columns, and naming conventionsreferences/ANALYSIS_SEMANTICS.md — business metric semantics and table-selection rulesreferences/QUERY_PATTERNS.md — common SQL patterns, domain discovery, and LTV/cohort rulesreferences/CUSTOM_DATA.md — how to document and safely join customer-owned tablesreferences/TROUBLESHOOTING.md — auth, permission, and API issuesassets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md — copy/paste request for users without accessRun scripts from the skill directory. They are optional helpers; if the agent environment cannot execute them, follow the same checks manually.
scripts/sm_bq_doctor.py — setup, auth, project, and SourceMedium dataset access checks.scripts/sm_bq_discover.py — metadata, table, metric, schema, store, and categorical value discovery.scripts/sm_bq_query.py — SELECT/WITH-only dry-run and execution with a maximum-bytes cap.
Query rows are written to stdout; the JSON receipt (status, bytes_processed) is written to stderr.scripts/qa_sm_bigquery_skill.py — package and optional live BigQuery QA harness for this skill.If you have your own BigQuery tables alongside SourceMedium data:
sourcemedium_custom_data.md. If it does not exist, create one from the template in references/CUSTOM_DATA.md before writing hybrid SQL.-- Verify join key cardinality before joining to SM tables
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT <join_key>) AS unique_keys
FROM `<your_project>.<your_dataset>.<your_table>`
-- Safe to join if total_rows ≈ unique_keys (1:1 on join key)
-- If total_rows >> unique_keys, aggregate your table first, then join
`sm-<tenant_id>.sm_transformed_v2.obt_orders``<your_project>.<your_dataset>.<your_table>`development
Use this skill when an operator wants to build a correct, accurate, and useful dashboard from SourceMedium BigQuery data or from their own warehouse tables joined to SourceMedium data. Default to a standalone HTML dashboard with auditable SQL receipts and portable chart specs. Also use for dashboard SQL planning, BI metric validation, chart selection, Metabase/Looker/Tableau-ready query handoff, or dashboard QA. Do not use for mutating warehouse data.
testing
Create, edit, improve, or audit AgentSkills. Use when creating a new skill from scratch or when asked to improve, review, audit, tidy up, or clean up an existing skill or SKILL.md file. Also use when editing or restructuring a skill directory (moving files to references/ or scripts/, removing stale content, validating against the AgentSkills spec). Triggers on phrases like "create a skill", "author a skill", "tidy up a skill", "improve this skill", "review the skill", "clean up the skill", "audit the skill".
testing
Host security hardening and risk-tolerance configuration for OpenClaw deployments. Use when a user asks for security audits, firewall/SSH/update hardening, risk posture, exposure review, OpenClaw cron scheduling for periodic checks, or version status checks on a machine running OpenClaw (laptop, workstation, Pi, VPS).
testing
Create, edit, improve, or audit AgentSkills. Use when creating a new skill from scratch or when asked to improve, review, audit, tidy up, or clean up an existing skill or SKILL.md file. Also use when editing or restructuring a skill directory (moving files to references/ or scripts/, removing stale content, validating against the AgentSkills spec). Triggers on phrases like "create a skill", "author a skill", "tidy up a skill", "improve this skill", "review the skill", "clean up the skill", "audit the skill".