skills/sidemantic-modeler/SKILL.md
Build, validate, and manage semantic models using Sidemantic. Use when asked to create a semantic layer, define metrics/dimensions, model a database schema, generate models from SQL queries, import from Cube/dbt/LookML, or set up analytics definitions. Prioritizes CLI-first workflows, with YAML and optional Python API usage for advanced automation.
npx skillsauth add sidequery/sidemantic sidemantic-modelerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Build semantic layers that map physical database tables to business-friendly dimensions and metrics. Sidemantic generates SQL from these definitions, handling joins, aggregations, granularity, and dialect differences automatically.
uv add sidemantic duckdb
mkdir -p models
cat > models/orders.yml <<'YAML'
models:
- name: orders
table: orders
primary_key: order_id
dimensions:
- name: status
type: categorical
metrics:
- name: revenue
agg: sum
sql: order_amount
- name: order_count
agg: count
YAML
uv run sidemantic validate models/ --verbose
uv run sidemantic info models/
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders ORDER BY revenue DESC LIMIT 5"
Assumes an orders table already exists in data.duckdb with status and order_amount columns.
models:
- name: orders
table: orders
primary_key: order_id
dimensions:
- name: status
type: categorical
- name: order_date
type: time
sql: created_at
granularity: day
metrics:
- name: revenue
agg: sum
sql: order_amount
- name: order_count
agg: count
Load and query:
from sidemantic import SemanticLayer
layer = SemanticLayer.from_yaml("models.yml", connection="duckdb:///data.duckdb")
result = layer.sql("SELECT revenue, status FROM orders")
from sidemantic import Model, Dimension, Metric, SemanticLayer
layer = SemanticLayer(connection="duckdb:///data.duckdb")
Model(
name="orders",
table="orders",
primary_key="order_id",
dimensions=[
Dimension(name="status", type="categorical"),
Dimension(name="order_date", type="time", sql="created_at", granularity="day"),
],
metrics=[
Metric(name="revenue", agg="sum", sql="order_amount"),
Metric(name="order_count", agg="count"),
],
)
result = layer.sql("SELECT revenue, status FROM orders")
The fastest path when existing queries are available. The Migrator reverse-engineers semantic models by analyzing SQL: it extracts tables, columns, aggregations, joins, time dimensions, derived metrics, and window functions automatically.
# Generate model YAML + rewritten queries from raw SQL
sidemantic migrator --queries queries/ --generate-models output/
# Check coverage: how well do existing models handle these queries?
sidemantic migrator models/ --queries queries/ --verbose
from sidemantic import SemanticLayer
from sidemantic.core.migrator import Migrator
# Connect to your database (optional but improves inference via information_schema)
layer = SemanticLayer(connection="duckdb:///data.duckdb", auto_register=False)
migrator = Migrator(layer, connection=layer.conn)
# Feed it SQL queries (strings, not files)
queries = [
"SELECT status, SUM(amount) AS revenue, COUNT(*) AS orders FROM orders GROUP BY status",
"SELECT DATE_TRUNC('month', created_at), SUM(amount) FROM orders GROUP BY 1",
"SELECT c.region, SUM(o.amount) / COUNT(DISTINCT c.id) AS rev_per_customer "
"FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY 1",
]
report = migrator.analyze_queries(queries)
models = migrator.generate_models(report) # YAML-ready model dicts
graph_metrics = migrator.generate_graph_metrics(report, models) # cross-model metrics
rewritten = migrator.generate_rewritten_queries(report) # semantic SQL
# Write to disk
migrator.write_model_files(models, "output/models/")
migrator.write_rewritten_queries(rewritten, "output/rewritten_queries/")
# Print coverage report
migrator.print_report(report, verbose=True)
| Pattern in SQL | What it generates |
|----------------|-------------------|
| SUM(amount) / COUNT(*) / AVG(price) | Metric with matching agg |
| COUNT(DISTINCT user_id) | Metric with agg: count_distinct |
| SUM(amount) AS revenue | Metric named revenue (preserves aliases) |
| GROUP BY status | Dimension type: categorical |
| DATE_TRUNC('month', created_at) | Dimension type: time, granularity extracted from SQL (here: month) |
| JOIN customers ON o.customer_id = c.id | Relationship many_to_one, foreign_key: customer_id |
| SUM(a) / NULLIF(COUNT(b), 0) | Derived metric with formula |
| SUM(x) OVER (ORDER BY date ROWS ...) | Cumulative metric with window |
| SUM(x) OVER (PARTITION BY DATE_TRUNC(...)) | Cumulative metric with grain_to_date |
| Cross-model expressions | Graph-level derived metrics |
migrator.analyze_queries(queries) to generate a first passFor the full Migrator API (all methods, outputs, edge cases), load references/generation.md.
Follow these steps when building a semantic model from a database schema.
Inspect tables, columns, data types, and foreign key relationships. Identify which tables hold transactional/event data (fact tables) and which hold descriptive attributes (dimension tables).
For each table, create a Model with:
name: a short, snake_case identifiertable: schema-qualified table name (e.g., public.orders)primary_key: the table's primary key column (default: id)Use sql instead of table for derived/virtual tables built from a SQL expression.
Add dimensions for columns used in GROUP BY or WHERE clauses. Choose the correct type:
| Type | When to use | Example |
|------|-------------|---------|
| categorical | Strings, enums, IDs for grouping | status, region |
| time | Dates/timestamps (enables granularity) | created_at, order_date |
| boolean | Computed true/false from SQL expression | sql: "amount > 100" |
| numeric | Numbers used for grouping, not aggregation | quantity_bucket |
Time dimensions require granularity (one of: second, minute, hour, day, week, month, quarter, year). Queries use double-underscore syntax: orders.order_date__month.
Use sql when the dimension maps to a different column name or a computed expression. If omitted, defaults to a column matching name.
Set parent on dimensions to create drill-down hierarchies (e.g., country > state > city).
Add metrics for columns that should be aggregated.
Simple aggregations (model-level):
| agg | SQL generated | Notes |
|-----|---------------|-------|
| sum | SUM(col) | Revenue, quantities |
| count | COUNT(*) | Row counts (no sql needed) |
| count_distinct | COUNT(DISTINCT col) | Unique values |
| avg | AVG(col) | Averages |
| min / max | MIN(col) / MAX(col) | Extremes |
| median | MEDIAN(col) | Median |
Model-level simple metrics currently validate against: sum, count, count_distinct, avg, min, max, median.
Use filters on a metric to create filtered aggregations (e.g., filters: ["status = 'completed'"]). These become CASE WHEN expressions, not WHERE clauses.
Complex metrics (usually graph-level, in top-level metrics: section):
| type | Purpose | Required fields |
|------|---------|-----------------|
| ratio | Division of two measures | numerator, denominator |
| derived | Arbitrary SQL formula | sql (references other metrics) |
| cumulative | Rolling/running totals | sql, optional window or grain_to_date |
| time_comparison | Period-over-period | base_metric, comparison_type (yoy/mom/wow/dod/qoq) |
| conversion | Funnel analysis | entity, base_event, conversion_event |
Graph-level metrics sit in the top-level metrics: section (outside models:). They reference model-level measures using model.metric syntax.
Connect models with relationships so Sidemantic can auto-generate JOINs.
| Type | Direction | Example |
|------|-----------|---------|
| many_to_one | This model has FK to other | orders -> customers |
| one_to_one | Unique FK | user -> user_profile |
| one_to_many | Other model has FK to this | customer -> orders |
| many_to_many | Through junction table | students <-> courses |
Declare relationships on the model that owns the foreign key. For many_to_one, foreign_key defaults to {related_model}_id.
For many_to_many, specify through (junction model), through_foreign_key, and related_foreign_key.
# Validate definitions (checks for errors and warnings)
sidemantic validate models/ --verbose
# Quick summary of what's defined
sidemantic info models/
# Validate and inspect without writing code
uv run sidemantic validate models/ --verbose
uv run sidemantic info models/
# Execute semantic SQL through CLI
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders WHERE status = 'completed'"
Python API (optional):
# Structured query API
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status", "orders.order_date__month"],
filters=["orders.status = 'completed'"],
order_by=["orders.revenue DESC"],
limit=10,
)
# SQL interface (auto-rewrites through semantic layer)
result = layer.sql("SELECT revenue, status FROM orders WHERE status = 'completed'")
# Compile to SQL without executing
sql = layer.compile(metrics=["orders.revenue"], dimensions=["customers.region"])
Reusable named WHERE filters applied at query time. Unlike metric filters, segments affect all metrics in the query.
models:
- name: orders
table: orders
segments:
- name: completed_orders
sql: "status = 'completed'"
- name: us_only
sql: "{model}.region = 'US'"
Segments are model-scoped and used as model.segment references at query time:
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders WHERE completed_orders"
Python API (optional):
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
segments=["orders.completed_orders"],
)
CLI-first:
uv run sidemantic info path/to/models/
uv run sidemantic validate path/to/models/ --verbose
Python API (optional):
from sidemantic import SemanticLayer, load_from_directory
layer = SemanticLayer(connection="duckdb:///data.duckdb")
load_from_directory(layer, "path/to/models/")
Auto-detects: Cube (.yml with cubes:), dbt MetricFlow (.yml with semantic_models:), LookML (.lkml), Malloy (.malloy), Rill, Hex, Snowflake Cortex, and more.
For detailed field mappings from each format, load references/migration.md.
When SemanticLayer() is created with auto_register=True (the default), it sets itself as the "current layer." Any Model() or Metric() constructed while a layer is active auto-registers with it. This is why the Quick Start examples don't call layer.add_model().
If you create Models before creating a SemanticLayer, they won't be registered. Either create the layer first, or use layer.add_model(model) explicitly.
SQL expressions in models support Jinja2 templating:
models:
- name: orders
sql: "SELECT * FROM orders WHERE region = '{{ region }}'"
Pass values at query time:
result = layer.query(metrics=["orders.revenue"], parameters={"region": "US"})
All commands are run as sidemantic <command>. Use --config path/to/sidemantic.yaml to load a config file with connection and model path settings.
| Command | Purpose |
|---------|---------|
| validate [DIR] --verbose | Validate definitions, show errors and warnings |
| info [DIR] | Summary of models, dimensions, metrics, relationships |
| query [DIR] -c CONNECTION SQL | Execute SQL through the semantic layer (--format table/json/csv, --limit N) |
| migrator [DIR] --queries PATH | Coverage analysis: check how well models handle SQL queries |
| migrator --queries PATH --generate-models OUT | Bootstrap: generate model YAML from SQL queries |
| preagg recommend [DIR] | Recommend pre-aggregation tables from query patterns |
| preagg apply [DIR] | Apply pre-aggregation recommendations |
| serve [DIR] -c CONNECTION | Start PostgreSQL wire-protocol server |
| mcp-serve [DIR] -c CONNECTION | Start MCP server for AI tool integration |
| workbench [DIR] -c CONNECTION | Interactive TUI with SQL editor and charting |
| lsp | Start LSP server for Sidemantic SQL files |
duckdb:///:memory: # In-memory DuckDB
duckdb:///path/to/db.duckdb # File-based DuckDB
duckdb://md:database_name # MotherDuck
postgres://user:pass@host:port/dbname # PostgreSQL
bigquery://project_id/dataset_id # BigQuery
snowflake://user:pass@account/database/schema # Snowflake
clickhouse://user:pass@host:port/database # ClickHouse
databricks://token@server-hostname/http-path # Databricks
spark://host:port/database # Spark SQL
adbc://driver/uri # ADBC
Load these when you need deeper detail:
references/yaml-schema.md: Field-level YAML schema with every field, type, default, and constraintreferences/patterns.md: Complete YAML templates for e-commerce, SaaS, marketing, IoT, and star schema patternsreferences/validation.md: All validation rules, error messages, and fixesreferences/migration.md: Field-by-field mappings from Cube, dbt, LookML, and other formatsreferences/generation.md: Migrator API, schema introspection, auto-model generation, pre-aggregation recommendationsgranularity on time dimensions. Every type: time dimension needs granularity: day (or similar).agg. Metrics that are not complex types need an agg field (sum, count, avg, etc.) or a full SQL expression like sql: "SUM(amount)".SELECT revenue FROM orders), but cross-model queries should use explicit model.field.type: string or type: number for dimensions. The valid types are categorical, time, boolean, numeric.agg. Graph-level metrics (ratio, derived, etc.) go in the top-level metrics: section.numerator + denominator. derived needs sql. time_comparison needs base_metric. conversion needs entity, base_event, conversion_event.customers defaults FK to customers_id, not customer_id. Always set foreign_key explicitly.:, #, {, or >.development
Build interactive analytics webapps, demos, dashboards, or embedded app surfaces from Sidemantic semantic models using copyable component primitives and deterministic query inspection. Use when asked to create a web UI around Sidemantic models, generate a metric explorer, copy reusable analytics components into a project, connect a frontend to Sidemantic query APIs, build a Pyodide/DuckDB-WASM or Sidemantic Rust WASM/DuckDB-WASM demo, expose Sidemantic through an app server, or adapt the Sidemantic widget/UI patterns into a product webapp.
tools
Use when work should span one or more detached tasks but still behave like one job with a single owner context. TaskFlow is the durable flow substrate under authoring layers like Lobster, ACPX, plugins, or plain code. Keep conditional logic in the caller; use TaskFlow for flow identity, child-task linkage, waiting state, revision-checked mutations, and user-facing emergence.
tools
# Lobster Lobster executes multi-step workflows with approval checkpoints. Use it when: - User wants a repeatable automation (triage, monitor, sync) - Actions need human approval before executing (send, post, delete) - Multiple tool calls should run as one deterministic operation ## When to use Lobster | User intent | Use Lobster? | | ------------------------------------------------------ | --------------------------
tools
# Lobster Lobster executes multi-step workflows with approval checkpoints. Use it when: - User wants a repeatable automation (triage, monitor, sync) - Actions need human approval before executing (send, post, delete) - Multiple tool calls should run as one deterministic operation ## When to use Lobster | User intent | Use Lobster? | | ------------------------------------------------------ | --------------------------