templates/analytics/.builder/skills/dbt/SKILL.md
dbt (data build tool) project structure, SQL patterns, and best practices for the analytics warehouse. Use this skill when working with dbt models, testing SQL queries, or creating new analytical tables.
npx skillsauth add BuilderIO/agent-native dbtInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
dbt/ (in workspace root, NOT in code/)dbt/models/ organized by layer (staging, intermediate, mart, analytics)dbt/dbt_project.ymldbt/profiles.yml⚠️ CRITICAL: The dbt directory has restricted write access. When creating or modifying dbt models:
code/.builder/dbt-models/ firstcode/scripts/dbt/models/ directory| Schema | Purpose | Examples |
| ---------------------- | ------------------------------------------ | ------------------------------------------------------ |
| dbt_staging_bigquery | Raw staged events from BigQuery | first_pageviews, all_pageviews, signups |
| dbt_staging | Raw staged data from other sources | hubspot_companies, hubspot_contacts |
| dbt_intermediate | Joins, transforms, denormalization | hubspot_form_submissions, deal_first_contact |
| dbt_mapping | Join tables, ID mappings | hs_deals_to_contact_id, user_id_to_org_id |
| dbt_mart | Dimensional models (fact/dim tables) | dim_hs_deals, dim_hs_contacts, dim_subscriptions |
| dbt_analytics | Reporting views, aggregates | deals_by_motion, revenue_funnel, active_users |
| dbt_dev | Development/testing (EXCLUDE from queries) | Auto-filtered by BigQuery lib |
{{
config(
schema="dbt_analytics", -- Target schema
materialized="table", -- or "view", "incremental"
tags=["daily", "analytics", "hubspot"], -- For orchestration/docs
)
}}
table - Full refresh daily, good for < 10M rowsview - No storage, always fresh, good for simple transformsincremental - Append-only, for large event tables⚠️ Common bug source: Column names differ between spec and actual tables
| Spec Column | Actual Column | Table |
| --------------------- | -------------------------- | ----------------- |
| first_pageview_date | created_date (TIMESTAMP) | first_pageviews |
| channel | first_touch_channel | all_pageviews |
| referrer | c_referrer | all_pageviews |
| user_create_date | user_create_d | product_signups |
| deal_stage | stage_name | dim_hs_deals |
| deal_amount | amount | dim_hs_deals |
Always verify column names by querying INFORMATION_SCHEMA.COLUMNS or reading the source dbt model.
❌ WRONG (DISTINCT + ORDER BY non-argument):
ARRAY_AGG(DISTINCT form_name IGNORE NULLS ORDER BY form_fill_date LIMIT 1)
✅ CORRECT (remove DISTINCT or order by same column):
-- Option 1: Remove DISTINCT (ORDER BY creates uniqueness)
ARRAY_AGG(form_name IGNORE NULLS ORDER BY form_fill_date LIMIT 1)[SAFE_OFFSET(0)]
-- Option 2: Order by the aggregated column
ARRAY_AGG(DISTINCT form_name ORDER BY form_name LIMIT 1)[SAFE_OFFSET(0)]
BigQuery dbt models store booleans as strings in some tables. Always cast:
-- dim_hs_deals.is_closed_won is STRING 'true'/'false', not BOOL
CASE WHEN CAST(is_closed_won AS STRING) = 'true' THEN 1 ELSE 0 END
-- Amounts may be STRING, cast to numeric
SUM(CAST(amount AS FLOAT64))
Always use case-insensitive email matching:
LOWER(qf.email) = LOWER(c.email)
Use QUALIFY for window function filtering (cleaner than subquery):
SELECT *
FROM table
QUALIFY ROW_NUMBER() OVER (PARTITION BY deal_id ORDER BY created_date) = 1
When joining on potentially NULL columns (like visitor IDs):
LEFT JOIN forms f
ON (
LOWER(f.email) = LOWER(c.email)
OR (f.b_visitor_id IS NOT NULL AND f.b_visitor_id = c.b_visitor_id)
)
FROM {{ ref("dim_hs_deals") }} d
LEFT JOIN {{ ref("hs_deals_to_contact_id") }} dc
ON d.deal_id = dc.deal_id
LEFT JOIN {{ ref("dim_hs_contacts") }} c
ON dc.contact_id = c.contact_id
LEFT JOIN {{ ref("hubspot_form_submissions") }} f
ON LOWER(f.email) = LOWER(c.email)
AND f.form_fill_date < d.createdate
Key points:
hs_deals_to_contact_id unnests the associatedcontactids JSON arrayQUALIFY to dedupeb_visitor_idform_fill_date < deal.createdate) for attributionFROM {{ ref("first_pageviews") }} fp
LEFT JOIN {{ ref("signups") }} s
ON fp.visitor_id = s.visitor_id
LEFT JOIN {{ ref("dim_subscriptions") }} sub
ON s.root_organization_id = sub.root_id
-- Use product_signups for user data
FROM {{ ref("dim_hs_contacts") }} c
LEFT JOIN {{ ref("product_signups") }} ps
ON LOWER(ps.email) = LOWER(c.email)
OR (ps.user_id IS NOT NULL AND ps.user_id = c.builder_user_id)
LEFT JOIN {{ ref("dim_root_organizations") }} ro
ON ps.user_id = ro.user_id -- or use appropriate join key
WHERE ps.user_create_d IS NOT NULL
Important: Use dbt_analytics.product_signups for signup data - it has the most complete user coverage. Match on both email and user_id for best results.
Always test SQL before creating dbt model:
code/scripts/test-<feature>.sqlFROM `your-project-id.dbt_mart.dim_hs_deals`
code/scripts/test-<feature>.ts:
import { runQuery } from "../server/lib/bigquery";
import { readFileSync } from "fs";
const sql = readFileSync("scripts/test-<feature>.sql", "utf-8");
const result = await runQuery(sql);
console.log(result.rows);
pnpm action test-<feature>{{ ref("table") }})code/.builder/dbt-models/<model_name>.sqlTo detect if a contact had a product signup before deal creation, use dbt_analytics.product_signups:
-- Join to product_signups (match by email OR user_id)
-- AND signup was BEFORE deal creation
LEFT JOIN {{ ref("product_signups") }} ps
ON (
LOWER(ps.email) = LOWER(c.email)
OR (ps.user_id IS NOT NULL AND ps.user_id = c.builder_user_id)
)
AND ps.user_create_d < d.createdate
Key columns in product_signups:
user_id - Builder user IDemail - User emailuser_create_d (TIMESTAMP) - Signup/user creation dateCritical: Match on both email AND user_id with OR logic for complete coverage. user_create_d is already TIMESTAMP, no conversion needed.
Do NOT use:
dbt_staging_bigquery.signups - incomplete coveragedim_hs_contacts.sign_up_time_stamp - DATE type, requires conversion and has gapsWhen attributing form submissions to deals/contacts:
Qualifying form categories (based on actual data analysis):
WHERE (
-- Sales-related forms
LOWER(form_name) LIKE '%sales%'
OR LOWER(conversion_details) LIKE '%sales%'
-- Demo forms
OR LOWER(form_name) LIKE '%demo%'
OR LOWER(conversion_details) LIKE '%demo%'
-- Specific high-intent forms
OR form_name = '[Marketing] | Component Indexing Request'
OR conversion_details = 'Unlock Ent Trial'
)
Common form names (March 2026 data):
[Marketing] Sales Demo Form | 7.20.23 - 5,803 submissions[Marketing] Sales Demo Form - Unlock Enterprise Features - 3,971 submissionsDemo Library Form - 1,866 submissions[Marketing] | Component Indexing Request - 51 submissionsAdd to dbt/models/analytics/_models.yml:
- name: deals_inbound_outbound_motion
description: >
Classifies Enterprise deals as Inbound or Outbound based on whether
any associated contact filled a qualifying form before deal creation.
columns:
- name: deal_id
description: Unique deal identifier
- name: deal_motion
description: "Inbound or Outbound classification"
- name: qualifying_form_count
description: "Number of distinct qualifying forms filled by associated contacts"
- name: first_qualifying_form_name
description: "Name of earliest qualifying form"
dim_hs_deals ~3,400 rows, hubspot_form_submissions ~20K rowsserver/lib/bigquery.tsBased on this analysis, here are additional AI instructions that would be helpful:
associatedcontactids is non-obvioustools
Public booking flow — the state machine, animations, and URL/app-state sync.
tools
Trigger-based automations — reminders, follow-ups, webhooks — across the booking lifecycle.
tools
Team event types, round-robin assignment, collective bookings, host weights, and no-show calibration.
development
The pure `computeAvailableSlots` function — inputs, outputs, invariants, and debugging guide.