databricks-genie-spaces-best-practices/SKILL.md
--- name: databricks-genie-spaces-best-practices description: Design, configure, curate, govern, monitor, and integrate Databricks AI/BI Genie Spaces — the natural-language-to-SQL surface over Unity Catalog. Covers space scoping, general instructions, parameterized example SQL, SQL functions, trusted assets, JOIN configuration, knowledge store, certified queries, benchmarks, monitoring tab, feedback loops, the Genie Conversation API, governance via Unity Catalog (row filters, column masks, embed
Genie Spaces let business users ask questions in natural language and get SQL-backed answers over Unity Catalog data. The skill operates against a compound AI architecture in which the LLM is one component among many: answer quality is dominated by what you curate (schema metadata, example SQL, SQL functions, instructions, joins) rather than by the model itself.
This file is the table of contents. Load referenced files only when the task at hand needs them.
| Reference | When to load |
|---|---|
| references/SETUP.md | Creating a new space, prerequisites, entitlements, choosing tables and warehouses, sharing, cloning. |
| references/CURATION.md | Writing General Instructions, parameterized Example SQL, SQL Functions, Trusted Assets, joins, synonyms, descriptions, knowledge store. |
| references/MONITORING.md | Monitoring tab, weekly digest, "Is this correct?" feedback, benchmarks, Quality Review (Beta), audit logs. |
| references/API.md | Conversation API — auth (OAuth U2M / M2M), endpoints, polling, embedding Genie into apps, rate limits. |
| references/GOVERNANCE.md | Unity Catalog row filters and column masks, embedded warehouse credentials, PII, permission levels, audit. |
| references/ANTIPATTERNS.md | Reviewing an underperforming space — concrete failure modes and the curation fixes that resolve them. |
| references/EXAMPLES.md | Worked examples for two domains (Sales pipeline, IoT telemetry): instructions, parameterized SQL, SQL functions, benchmarks. |
| references/REFERENCES.md | Authoritative Databricks docs and O'Reilly book references with chapter pointers. |
Templates (drop-in starting points; copy and adapt):
- assets/space-config.template.json: full configuration scaffold with every supported field.
- assets/general-instructions.template.md: the four sections every General Instructions block should have.
- assets/example-sql.template.sql: parameterized Example SQL with the headers Genie expects.
- assets/benchmark.template.csv: benchmark question/expected-SQL/expected-result triplet.

Use Genie when:
Do NOT use Genie when:
(Sources: Databricks docs — What is a Genie Space; Uttamchandani, The Self-Service Data Roadmap, O'Reilly, ch. on self-service success criteria.)
These hold across every production Genie space. Detailed treatment in the referenced files.
- references/CURATION.md.
- references/MONITORING.md.

Detailed semantics in references/SETUP.md and references/CURATION.md. Top-level summary so you can plan before opening the UI:
| Area | Field | Purpose |
|---|---|---|
| Identity | Title, Description (Markdown), Thumbnail, Tags (Public Preview) | Discovery and audience expectations. |
| Compute | SQL Warehouse (Pro or Serverless — Serverless recommended) | Executes generated SQL; credentials are embedded. |
| Data | Up to 30 tables/views/metric views from Unity Catalog | Scope of the space. |
| Schema metadata | Table description, column comments, synonyms (space-scoped) | Disambiguates terms ("revenue" → gross_revenue_usd). |
| Relationships | Primary keys, foreign keys, JOIN configuration | Created automatically from PK/FK; review before sharing. |
| Welcome | Common Questions (sample prompts shown to users) | Onboarding and scope signalling. |
| Curation | General Instructions (sectioned Markdown) | Clarification, summary style, scope bounds, out-of-scope refusals. |
| Curation | Example SQL Queries (parameterized) | Canonical answers for known question shapes. |
| Curation | SQL Functions (Unity Catalog UDFs) | Reusable metric definitions; eligible for Trusted designation. |
| Curation | Knowledge Store (preview) | Glossaries, business definitions, prompt-matching aids. |
| Quality | Benchmarks (text → expected SQL → expected result) | Scored evaluation harness; auto-suggested in 2026. |
| Sharing | Permission levels: CAN MANAGE / CAN EDIT / CAN RUN / CAN VIEW | Curator vs consumer separation. |
| Operations | Monitoring tab, Weekly digest, Quality Review (Beta) | Feedback signal for the next iteration. |
| Integration | Conversation API (/api/2.0/genie/spaces/...) | Embed Genie in custom apps; OAuth U2M or M2M. |
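The Integration row can be made concrete with a minimal polling sketch. It is not a definitive client: the `start-conversation` path, the message-polling path, the `conversation_id` / `message_id` response fields, and the status names are assumptions based on the public Conversation API documentation as commonly described, so verify them against references/API.md before relying on them. `http_json` and `ask_genie` are hypothetical helper names, and the token is assumed to come from an OAuth U2M or M2M flow handled elsewhere.

```python
import json
import time
import urllib.request

# Assumed terminal statuses; confirm the full list in references/API.md.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def http_json(method, url, token, body=None):
    """Send one JSON request with a bearer token and return the decoded response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ask_genie(host, token, space_id, question, *, call=http_json,
              sleep=time.sleep, interval=2.0, max_polls=60):
    """Start a conversation, then poll the message until it reaches a terminal status."""
    base = f"{host}/api/2.0/genie/spaces/{space_id}"
    # Assumed endpoint: starts a conversation and returns conversation/message ids.
    started = call("POST", f"{base}/start-conversation", token, {"content": question})
    message_url = (
        f"{base}/conversations/{started['conversation_id']}"
        f"/messages/{started['message_id']}"
    )
    for _ in range(max_polls):
        message = call("GET", message_url, token)
        if message.get("status") in TERMINAL_STATUSES:
            return message
        sleep(interval)  # fixed interval; a real client may prefer backoff
    raise TimeoutError(f"Genie message did not finish after {max_polls} polls")
```

The `call` and `sleep` parameters are injectable so a wrapper service can substitute its own HTTP client, and so the polling logic can be exercised without a workspace.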
Hard limits worth knowing up front:
Use this when standing up a new space. Copy the checklist into your response and tick as you go.
- [ ] Define one audience, one domain, one north-star question shape (write it down)
- [ ] Identify ≤ 30 tables/views; pre-join into views or metric views if you'd exceed
- [ ] Verify Unity Catalog: PK/FK declared, table+column descriptions populated, row filters/column masks attached
- [ ] Pick a Serverless SQL warehouse; verify CAN USE for the space owner
- [ ] Create the space; set Title, Markdown Description, Thumbnail, Tags
- [ ] Add 5–10 Common Questions covering the dominant question shapes
- [ ] Write General Instructions in four sections: Scope, Clarification, Metric definitions, Summary style
- [ ] Add 8–15 parameterized Example SQL queries for the dominant shapes; mark each as a candidate for Trusted
- [ ] Promote stable metrics to SQL Functions in Unity Catalog; reference them from Example SQL
- [ ] Review auto-generated JOINs; remove or rewrite any that are wrong
- [ ] Build a benchmark of ≥ 20 questions with expected SQL + expected result; let Genie auto-suggest more
- [ ] Share with a *small* pilot group (CAN RUN); collect thumbs-up/down + "Fix it" feedback for one week
- [ ] Process Monitoring tab feedback into new Example SQL / SQL Functions / instruction edits
- [ ] Re-run benchmarks; only roll out to all account users after the pass rate stabilizes
If any box is "no" when the space ships, expect quality complaints proportional to the number of unchecked boxes.
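The countable items in the checklist above can be checked mechanically before a rollout. The following is a sketch under stated assumptions: `SpaceDraft` is a hypothetical in-memory summary of a space, not the schema of assets/space-config.template.json, and the definition of a "stable" pass rate (the last few runs within a small tolerance of each other) is an illustrative choice, not something the checklist prescribes. The numeric thresholds are taken from the checklist itself.

```python
from dataclasses import dataclass

# The four General Instructions sections named in the checklist.
REQUIRED_SECTIONS = ("Scope", "Clarification", "Metric definitions", "Summary style")


@dataclass
class SpaceDraft:
    """Hypothetical summary of a space under construction."""
    tables: list[str]
    common_questions: list[str]
    instruction_sections: list[str]
    example_sql: list[str]
    benchmarks: list[str]


def unchecked_boxes(draft):
    """Return one message per countable checklist item the draft does not meet."""
    problems = []
    if len(draft.tables) > 30:
        problems.append(f"{len(draft.tables)} tables; limit is 30, pre-join into views")
    if len(draft.common_questions) < 5:
        problems.append("fewer than 5 Common Questions")
    missing = [s for s in REQUIRED_SECTIONS if s not in draft.instruction_sections]
    if missing:
        problems.append("General Instructions missing sections: " + ", ".join(missing))
    if len(draft.example_sql) < 8:
        problems.append("fewer than 8 parameterized Example SQL queries")
    if len(draft.benchmarks) < 20:
        problems.append("fewer than 20 benchmark questions")
    return problems


def pass_rate_is_stable(history, window=3, tolerance=0.02):
    """Assumed rule: the last `window` pass rates differ by at most `tolerance`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tolerance
```

A rollout gate could then require `unchecked_boxes(draft) == []` and `pass_rate_is_stable(history)` before widening access beyond the pilot group. Items that need human judgment, such as reviewing auto-generated JOINs, stay manual.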
(See references/ANTIPATTERNS.md for the full list.)

- WHERE clause or in a SQL Function.
- WHERE region = 'EMEA' works until someone renames it to 'Europe'. Use SQL Functions or stable IDs.

Before declaring a space ready for general access:
references/ANTIPATTERNS.md and audit the space against each item; fix or accept-with-reason.

- aws-genai-lens / aws-well-architected: when Genie is part of a broader AWS GenAI architecture (governance, cost, security pillars).
- domain-driven-design: bounded contexts map cleanly onto one-space-per-domain.
- event-driven-design: when Genie answers depend on streamed CDC sources hydrating Delta tables.
- api-design-principles: designing the wrapper API around the Genie Conversation API for embedded use cases.