/SKILL.md
Guide users through creating and managing Databricks AI/BI Genie spaces (Genie rooms) using the Genie API and serialized_space JSON configuration. Covers the full lifecycle: gathering requirements, profiling Unity Catalog tables, configuring data sources with column_configs, defining sample questions, writing SQL expressions (sql_snippets - measures, filters, dimensions), example SQL queries, text instructions, join specs (join_specs), SQL functions, and benchmarks. Manages existing spaces by auditing, diagnosing issues, and optimizing configuration via the PATCH API. Use when the user mentions Genie space, Genie room, serialized_space, knowledge store, sql_snippets, join_specs, column_configs, example_question_sqls, text_instructions, sql_functions, Genie API, prompt matching, entity matching, format assistance, or wants to create, build, set up, configure, update, audit, troubleshoot, or optimize a Genie space.
npx skillsauth add sean-zhang-dbx/prompt-to-genie prompt-to-genieInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Guide users through creating new Databricks AI/BI Genie spaces or managing existing ones — auditing configuration, diagnosing issues, and optimizing for accuracy.
Determine intent first: If the user wants to create a new space, follow the Create a New Space workflow below. If they have an existing space they want to review, fix, or improve, follow the Diagnose and Optimize an Existing Space workflow in references/diagnose_optimize_space.md.
This is a conversational workflow, not a script. Each step involves asking the user questions and waiting for their response before proceeding. Never generate configuration or create the space until the user has explicitly reviewed and approved the plan.
The most common mistake is rushing to build the space after getting table names. Instead, gather business context thoroughly — it is far easier to get the initial room right than to debug it afterward.
Copy this checklist and track progress:
See references/diagnose_optimize_space.md for the full workflow: retrieve config, audit, diagnose issues, recommend optimizations, apply updates, and benchmark.
A well-defined Genie space should answer questions for a specific topic and audience, not general questions across various domains. Start by understanding the user's needs clearly.
Ask the user about:
Example prompt:
"What kind of questions do you want users to be able to ask in this Genie space? For example: sales analytics, customer insights, inventory tracking? Try to keep it focused on one topic — a narrowly scoped space gives more accurate answers. Also, what would you like to name this space?"
Key principle: Curating a Genie space is an iterative process. Plan to start small and refine based on real user feedback rather than aiming for perfection on the first pass.
STOP. Do not proceed to Step 2 until the user has answered the questions above — including a title and description for the space. If their answers are vague (e.g., "just sales stuff"), ask follow-up questions to get specifics — which metrics matter most, what filters users will apply, what time granularity they need. If they haven't provided a title, ask for one now. The more context you gather now, the better the space will be.
Determine which Unity Catalog tables to include. Keep the dataset focused — include only the tables necessary to answer the questions from Step 1.
Example prompt:
"Which Unity Catalog tables should this Genie space have access to? Please provide the full path (catalog.schema.table)."
exclude: true on any column that the user has not explicitly approved for hiding.catalog.schema.table_name
Tip: If the user is unsure, help them explore their catalog:
SHOW TABLES IN catalog.schema;
Before adding tables to the space, verify the user has access:
DESCRIBE TABLE catalog.schema.table_name;
If successful, the table is accessible and can be included in the Genie space.
Review column names and descriptions to assess annotation quality:
DESCRIBE TABLE EXTENDED catalog.schema.table_name;
If column descriptions are missing or unclear, suggest the user add them in Unity Catalog first — this significantly improves Genie's response accuracy.
Reference script: See scripts/discover_resources.py (Part 2) for a comprehensive audit that checks table comments, column descriptions, column counts, foreign keys, and generates a Genie-readiness quality score with specific recommendations.
Column-level configuration via API: Set per-column metadata directly in the serialized_space using column_configs on each table. Important: prompt matching (format assistance + entity matching) is only auto-enabled when tables are added via the UI. When creating spaces via the API, prompt matching is OFF by default. You must explicitly include column_configs entries with enable_format_assistance: true and enable_entity_matching: true for every string/category column that users will filter on. Columns not listed in column_configs will not have prompt matching enabled. Entity matching requires format assistance — turning off format assistance automatically disables entity matching. To hide columns, set exclude: true — but only after confirming with the user which columns to exclude. See references/schema.md → "Prompt matching overview" for limits and "Field Reference → data_sources" for all fields.
If foreign key references are not defined in Unity Catalog, Genie may not know how to join tables correctly. Recommend users:
serialized_space via the API (see format below)example_question_sqls — effective fallback that also teaches Genie query patternsAfter creating the space via the API, recommend that users build out the knowledge store in the Genie space UI. A knowledge store is a collection of curated semantic definitions scoped to the space:
These enhancements don't require write access to the underlying Unity Catalog tables — they're scoped to the Genie space only.
Before generating sample questions, SQL expressions, or example SQL queries, always inspect the actual data in the tables. Do not assume column names or values based on table names alone.
-- Check what columns actually exist
DESCRIBE TABLE catalog.schema.table_name;
-- Check distinct values for key filter/category columns
SELECT DISTINCT column_name FROM catalog.schema.table_name LIMIT 20;
-- Check date ranges
SELECT MIN(date_col), MAX(date_col) FROM catalog.schema.table_name;
This prevents common errors:
Always ask the user about domain-specific conventions like fiscal calendar definitions, internal abbreviations, and product naming conventions before writing SQL.
STOP — Business Logic Checkpoint. Before writing any SQL or generating configuration, pause and ask the user:
"Before I start building the space, I want to make sure I capture your business logic correctly. Here's what I see in the data: [summarize tables, key columns, sample values, date ranges]. A few questions:
- Are there any specific business rules, metric definitions, or calculations I should know about? (e.g., how is 'revenue' calculated? what counts as an 'active' customer?)
- Any terminology or abbreviations your team uses that differ from the column names? (e.g., 'AMER' means 'Americas', fiscal year starts April 1st)
- Are there columns or values that should be excluded or treated specially?
- Any common questions your team asks that require complex logic or multi-table joins?"
Do not proceed until the user confirms or provides this context. This is the most impactful checkpoint — missing business logic here leads to incorrect SQL expressions, wrong filter values, and inaccurate answers that are frustrating to debug after the space is created.
Create 3-5 starter questions that demonstrate the space's capabilities:
Good examples:
Avoid:
Instructions help Genie accurately interpret business questions and generate correct SQL. Prioritize SQL-based instructions over text instructions — they are more precise and easier for Genie to apply consistently.
Use SQL expressions to define frequently used business terms as reusable definitions. These are the most efficient way to teach Genie your business logic. SQL expressions are stored in instructions.sql_snippets in the configuration.
Three types of SQL expressions:
sql_snippets.measures): KPIs and aggregation metrics
{"id": "...", "alias": "total_revenue", "sql": ["SUM(orders.quantity * orders.unit_price)"]}
sql_snippets.filters): Common filtering conditions (boolean expression — do not include the WHERE keyword)
{"id": "...", "display_name": "high value", "sql": ["orders.amount > 1000"]}
sql_snippets.expressions): Attributes for grouping and analysis
{"id": "...", "alias": "order_year", "sql": ["YEAR(orders.order_date)"]}
Important: The
sqlfield insql_snippetsis a string array (string[]), the same format asexample_question_sqls[].sql. Wrap the SQL fragment in an array (e.g.,["SUM(orders.amount)"]). The API rejects plain strings. All column references must be table-qualified (table_name.column_name) — the Genie UI rejects bare column names.
Good candidates for SQL expressions:
Ask the user:
"What key metrics, filters, or grouping dimensions do your users frequently reference? For example: 'total revenue' (measure), 'high-value order' (filter), 'fiscal quarter' (dimension). I'll define these as SQL expressions so Genie handles them accurately."
If the user isn't sure, infer SQL expressions from the table metadata — look at column names and types to suggest common measures (SUM, AVG on numeric columns), filters (status/flag columns), and dimensions (date parts, category columns).
Important: Always include SQL expressions in the instructions.sql_snippets section of the config. Do not just describe them — they must be in the JSON to take effect.
Use complete example SQL queries for hard-to-interpret, multi-part, or complex questions. These show Genie how to handle intricate query patterns and multi-step logic. Queries can be static or parameterized.
Good candidates for example SQL queries:
Use one question per SQL entry. Each example SQL query should map to exactly one natural language question. If you want to cover multiple phrasings of the same question, create separate entries — each with its own question string and the same SQL.
Critical formatting rule for sql: Each SQL clause should be a separate string element in the array with \n at the end. Never concatenate SQL clauses into one string.
{
"question": ["What are total sales by product category?"],
"sql": [
"SELECT\n",
" p.category,\n",
" SUM(o.quantity * o.unit_price) as total_sales\n",
"FROM catalog.schema.orders o\n",
"JOIN catalog.schema.products p ON o.product_id = p.product_id\n",
"GROUP BY p.category\n",
"ORDER BY total_sales DESC"
]
}
Add parameters using :parameter_name syntax. Parameterized queries become trusted assets (labeled "Trusted" in responses). Use for recurring questions where users specify different filter values (e.g., by region, by quarter). Parameter types: String (default), Date, Date and Time, Numeric. Use static queries for questions that don't vary or to teach Genie general patterns.
Reserve text instructions for context that doesn't fit SQL definitions. Keep them concise and specific — too many instructions can reduce effectiveness.
Good text instructions:
Avoid vague instructions. Instead of "Ask clarification questions when asked about sales," write:
"When users ask about sales metrics without specifying product name or sales channel, ask: 'To proceed with sales analysis, please specify your product name and sales channel.'"
Important: Ensure consistency across all instruction types. For example, if text instructions specify rounding decimals to two digits, example SQL queries must also round to two digits.
You can instruct Genie to ask clarification questions when user prompts are ambiguous. Structure these instructions with:
Example:
"When users ask about sales performance breakdown but don't include time range, sales channel, or which KPIs in their prompt, you must ask a clarification question first. For example: 'Please specify the time range and sales channel you are looking for.'"
Add clarification instructions at the end of your text instructions to help Genie prioritize this behavior.
You can customize how Genie generates natural language summaries alongside query results. Add a dedicated section at the end of text instructions with the heading "Instructions you must follow when providing summaries".
Example:
Instructions you must follow when providing summaries:
- Cite the table and column names used in your analysis
- Use bullet points to structure multi-part summaries
- Include the date range covered in the results
Note: Only text instructions affect summary generation. SQL expressions and example SQL queries do not influence summaries.
Register Unity Catalog SQL functions (UDFs) as trusted assets for logic too complex for a single query. Genie calls these functions with user-supplied parameters, and responses are labeled Trusted. Use when the same function can serve multiple spaces or you want to encapsulate business logic that shouldn't be modified.
Tips for writing UDFs:
COMMENT 'List of regions. Values: ["AF", "EU", "NA"]')DEFAULT NULL for optional parameters — check for NULL in the WHERE clause: WHERE (isnull(min_date) OR created_date >= min_date)Permissions: Users need EXECUTE on the function and CAN USE on the containing catalog/schema.
A Genie space supports up to 100 instructions total, counted as:
Keep this budget in mind when adding instructions — prioritize quality over quantity.
Every new space must include benchmarks in its initial configuration. Benchmarks are organized into two categories:
Core benchmarks (high expected accuracy):
example_question_sqls entry. Do not rewrite or adapt it — reuse it verbatim so the ground truth matches the pattern Genie learned.Stretch benchmarks (lower expected accuracy):
Benchmark questions must be unambiguous. If a question could reasonably be answered by multiple different SQL queries, make it more specific. Include the exact metric, grouping, count, and scope so the ground truth SQL is the only reasonable interpretation. Bad: "Show me the most lethal cancers" (how many? what metric?). Good: "What are the top 5 cancer types ranked by average mortality rate?"
Ground truth SQL must be minimal. Only include columns and clauses directly implied by the question. Do not add helpful extras — if the question asks about mortality rate, do not include survival rate or death counts in the SELECT. Extra columns cause benchmark failures because Genie may return different "helpful" columns or none at all.
Target: 10-20 total benchmark questions.
At creation time: validate all benchmark SQL by executing it (same as example SQL). Only verify the SQL runs without errors — do not run benchmark accuracy evaluations during creation. Include benchmarks in the serialized_space JSON under the benchmarks key — see references/schema.md for the schema.
If the user doesn't know their warehouse ID or workspace URL, help them discover available resources.
Reference script: See scripts/discover_resources.py for the complete code. Part 1 lists all eligible SQL warehouses — pro and serverless — (name, ID, type, state, size) and prints the workspace URL. Part 2 audits table metadata quality for Genie-readiness.
Important: Genie spaces require a pro or serverless SQL warehouse (serverless recommended for performance).
STOP — Present the Plan for Review. Before generating any JSON, present a summary of everything you plan to include in the space. Format it clearly so the user can review and approve:
"Here's what I plan to include in your Genie space. Please review and let me know if anything needs to change:
- Title: [space name]
- Description: [one-sentence summary]
- Warehouse: [ID]
- Tables: [list tables]
- Sample questions: [list 3-5 questions]
- SQL expressions: [list measures, filters, dimensions with their definitions]
- Example SQL queries: [list question + brief description of each]
- Text instructions: [summarize key rules]
- Join specs: [list table relationships]
- Hidden columns: [list columns to exclude, or "none"]
- Benchmarks: [count] Core (original + rephrased example SQL questions, reusing exact ground truth SQL) + [count] Stretch (new questions testing generalization)"
Only proceed to generate the configuration after the user confirms. This is your last checkpoint before building — any corrections here are easy, but corrections after creation require the diagnose and optimize workflow.
Build the serialized_space JSON using the schema and examples in references/schema.md. Include only sections relevant to the user's space.
Critical formatting rules (these cause API rejection if wrong):
version: Required. Use 2 for new spacessecrets.token_hex(16)id fields must be sorted alphabetically by id. Tables sorted by identifier. column_configs sorted by column_name.sql fields are string arrays — each SQL clause is a separate element with \n: ["SELECT\n", " col\n", "FROM table"]sql_snippets require table-qualified column references (table_name.column) — bare column names are rejected by the UIWHERE keyword — only the boolean conditionjoin_specs.sql requires two elements: (1) backtick-quoted join condition, (2) "--rt=FROM_RELATIONSHIP_TYPE_...--" annotationtext_instructions.content elements must end with \n — the API concatenates without separatorsbenchmarks section is required — include at least one benchmark per example SQL query with 2-3 alternate phrasings each. Benchmark IDs must be unique across both sample_questions and benchmarks.questions.exclude: true on any column unless the user explicitly approved it in the plan review. If no columns were approved for exclusion, do not exclude any. This is a hard rule — do not infer which columns to hide based on column names like _id, etl_, etc.metric_views if none)Parameter | Description |
-----------|-------------|
serialized_space | JSON string from Step 5 |
warehouse_id | Pro or serverless SQL warehouse ID (required) |
parent_path | Workspace folder path (e.g., /Users/username/genie) |
title | Display name for the space |
description | Brief description of the space's purpose |
POST https://<workspace-url>/api/2.0/genie/spaces
{ "serialized_space": "<JSON string>", "warehouse_id": "<serverless-warehouse-id>", "parent_path": "/Users/<username>/genie_spaces", "title": "Sales Analytics", "description": "Ask questions about sales performance and trends" }
Reference script: Run scripts/validate_config.py on the generated config before calling the API. It checks:
WHERE keyword in filters, snippet table references not in data_sources:parameter syntaxThe validator cross-references table names in sql_snippets against data_sources.tables — if a snippet references a table that isn't in the space (e.g., typo orderz.amount instead of orders.amount), it flags an error. This catches the most common snippet mistakes without needing to execute queries.
Before calling the API, execute every example SQL query to verify it runs successfully. Do not create the space with untested SQL.
For each example_question_sqls entry in the configuration:
sql array into a single string: query = "".join(sql_parts)spark.sql(query).show()Only proceed to create the space after all queries pass. If any query fails, work with the user to fix the SQL first.
Run this in a Databricks notebook cell (adapt values to match the user's space):
import json
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# serialized_space JSON from Step 5
serialized_space = { ... } # The full JSON built in Step 5
response = w.api_client.do(
method="POST",
path="/api/2.0/genie/spaces",
body={
"title": "Sales Analytics",
"description": "Ask questions about sales performance and trends",
"warehouse_id": "abc123def456", # From scripts/discover_resources.py
"parent_path": "/Users/username/genie", # Workspace folder for the space
"serialized_space": json.dumps(serialized_space),
},
)
space_id = response["space_id"]
host = w.config.host.rstrip("/")
print(f"Space created! Open it here:\n{host}/genie/rooms/{space_id}")
For the full template with column configs and all sections, see scripts/create_space.py.
After creating the space, display a clickable link: https://{w.config.host}/genie/rooms/{space_id}
Important post-creation step: Prompt matching (format assistance + entity matching) is not auto-enabled when creating via the API. After the space is created, remind the user:
"Your space is live! One important step: prompt matching (which helps Genie match user terms like 'California' to actual values like 'CA') is only auto-enabled when tables are added via the UI. Since we created this space via the API, please open the space, go to Configure > Data, and verify that Format assistance and Entity matching are enabled for your key filter columns (under each column's Advanced settings). The
column_configsI included cover [list columns], but any other string/category columns may need to be enabled manually."
Notes: All scripts use WorkspaceClient() which auto-authenticates in notebook context. The creating user's compute credentials are embedded into the space. Rate limits: 20 questions/minute (UI), 5 questions/minute (API free tier).
After creating the space, the curator should be the first user. Testing and iterating is essential — a Genie space gets better over time with real-world feedback.
Your space ships with Core and Stretch benchmarks from Step 4g. After creation, run them from the Benchmarks tab:
Interpreting results:
| Rating | Condition | |--------|-----------| | Good | Generated SQL or result set matches ground truth (including different sort order or numeric values matching to 4 significant digits) | | Bad | Empty result set, error, extra columns, or different single-cell result | | Manual review | Genie couldn't assess, or no SQL answer was provided |
For the full evaluation workflow, see Use benchmarks in a Genie space.
Once you're satisfied with self-testing, recruit a business user:
See references/example_conversation.md for a full multi-turn example demonstrating the pause-heavy, conversational pattern.
See references/validation_checklist.md — run through this before creating the space.
For error handling, troubleshooting, the diagnose/optimize workflow, and links to official Databricks documentation, see references/diagnose_optimize_space.md.
development
Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.
development
Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.
development
Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.
development
End-to-end Parallels smoke, upgrade, and rerun workflow for OpenClaw across macOS, Windows, and Linux guests. Use when Codex needs to run, rerun, debug, or interpret VM-based install, onboarding, gateway smoke tests, latest-release-to-main upgrade checks, fresh snapshot retests, or optional Discord roundtrip verification under Parallels.