skills/data-platforms/nimble-databricks-data-products/SKILL.md
Builds Databricks data products from live web data, end to end: discovers the right Nimble web-data agents, scrapes into Delta tables, and produces an AI/BI dashboard and/or a deployed Databricks App — a table → dashboard → app workflow, for production data products or quick demos. Use whenever a request pairs live or scraped web data WITH a Databricks destination — e.g. "scrape Amazon/Walmart prices into a Delta table and build a dashboard", "load Zillow/Instagram/Maps/search results into Databricks and build a dashboard or app", "showcase Nimble + Databricks to a prospect". Prefer it over nimble-web-expert or competitor-intel when the data lands in Databricks. Do NOT use for one-off web fetches or CSV exports with no Databricks destination — use nimble-web-expert instead. Do NOT use for competitor or company research briefings — use competitor-intel or company-deep-dive instead. Do NOT use for generic Databricks work with no Nimble/web-data angle — use the official databricks-* skills instead.
Turn a natural-language brief like "pricing analysis on dog products from walmart and amazon" into working Databricks data products:
discover agents → ingest live web data into Delta → build dashboard and/or app → deliver links.
Equally at home for a quick demo or a real, reusable data product.
You are the orchestrator. Databricks mechanics are delegated to the official databricks-*
skills (see references/databricks-skills.md); this skill owns the Nimble glue and the gaps
(agent discovery, ingestion-from-agents, the AI/BI dashboard JSON, branding).
- `nimble_agent_list()` and `nimble agent get`: never hardcode from memory (Amazon search takes `keyword`, not `query`).
- `;`-separated statements in one call are a parse error.
- `cd` don't persist; set them inline. See references/preflight.md.
- references/branding.md.

Track these as todos so nothing is skipped.
Lean on the databricks-core skill for the generic checks.
1. `databricks current-user me` → confirm auth; capture the username (for the default schema).
2. `databricks warehouses list`. Prefer one already `RUNNING`; if none, offer to start one.
3. `nimble_integration.tools.{nimble_search, nimble_extract, nimble_agent_run, nimble_agent_list}`.
   Quick check: `databricks functions list nimble_integration tools`.
   If missing → STOP and walk the user through references/install-nimble-integration.md
   (Nimble cookbook). Do not try to auto-install.
4. `catalog.schema` (default `users.<username>`). Verify writability: some shared catalogs deny `CREATE TABLE`.
   Present the recommendation and let the user confirm or override before writing.

Details + exact commands: references/preflight.md.
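
The warehouse rule above (prefer one already RUNNING, otherwise offer to start one) can be sketched as a small helper. This is a minimal sketch, not part of the skill: it assumes the `databricks warehouses list` output has already been parsed into a list of dicts with `id`, `name`, and `state` keys, and `pick_warehouse` is a hypothetical name.

```python
from typing import Optional


def pick_warehouse(warehouses: list[dict]) -> tuple[Optional[dict], bool]:
    """Return (warehouse, needs_start).

    Assumes each dict carries "id", "name" and "state" keys. That shape is
    an assumption about the parsed CLI output, not a contract of this skill.
    """
    # Prefer a warehouse that is already RUNNING: no start-up wait.
    for wh in warehouses:
        if wh.get("state") == "RUNNING":
            return wh, False
    # Otherwise fall back to the first one and flag that the user should
    # be offered the option to start it.
    if warehouses:
        return warehouses[0], True
    # No warehouse at all: nothing to recommend.
    return None, False


candidates = [
    {"id": "a1", "name": "shared", "state": "STOPPED"},
    {"id": "b2", "name": "demo", "state": "RUNNING"},
]
chosen, needs_start = pick_warehouse(candidates)
```

When `needs_start` comes back true, the orchestrator would ask the user before starting anything rather than starting the warehouse on its own.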
Parse the brief into: domain/entity · search terms · sources · analysis goal. Then ask (batch into one AskUserQuestion call):
Keep the brief's intent (the "analysis goal") — it picks the Phase 4 template and the headline.
See references/nimble-agents.md.
1. `nimble_agent_list()` (run via SQL or the `nimble` CLI), filter by the source/domain keywords.
2. `nimble agent get <name>` → read `input_properties` (required params, exact names) and `output_schema` (typed fields).
3. `source` column + a normalized core (`product_name`, `price`, `currency`, `rating`, `review_count`, `brand`, `url`, …), keeping only fields the chosen agents actually emit. Multi-source comparison hinges on the shared columns.

See references/nimble-agents.md for the full SQL. Drive ingestion from a control table, not
per-keyword files: it's reproducible and expandable (add a row, re-run).
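
To illustrate the control-table idea, here is a minimal sketch that expands sources × search terms into seed rows with the columns this section names (`source`, `agent`, `keyword`, `params_json`, `localization`, `enabled`). The agent names and localization flags follow the examples in this document; `AGENTS` and `build_control_rows` are hypothetical names, and the Walmart parameter name is an assumption that should be confirmed with `nimble agent get`.

```python
import json
from itertools import product

# Per-source agent settings. Agent names and localization flags follow the
# examples in this document. The Walmart parameter name is an ASSUMPTION:
# confirm it from the agent's input_properties before relying on it.
AGENTS = {
    "amazon": {"agent": "amazon_serp", "param": "keyword", "localization": True},
    "walmart": {"agent": "walmart_serp", "param": "keyword", "localization": False},
}


def build_control_rows(sources, terms):
    """Expand (source x term) into one control-table row each."""
    rows = []
    for source, term in product(sources, terms):
        cfg = AGENTS[source]
        rows.append({
            "source": source,
            "agent": cfg["agent"],
            "keyword": term,
            # params_json is keyed by the agent's own parameter name.
            "params_json": json.dumps({cfg["param"]: term}),
            "localization": cfg["localization"],
            "enabled": True,
        })
    return rows


rows = build_control_rows(["amazon", "walmart"], ["dog food", "dog toys"])
```

Adding a source or a search term then means adding rows and re-running, which is the reproducibility argument made above.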
0. Probe ONE call per source first (fail fast). Before fanning out, run a single
   `nimble_agent_run` per source and check: status, the real field names, the localization flag, and
   whether a price casts cleanly. This catches the Walmart-class surprises (localization,
   currency-string prices, `product_price` vs `price`) in ~40s instead of after a wasted full round.
   Highest-leverage step; see nimble-agents.md §2.5.
1. Create the control table `<schema>.<table>_queries` (`source`, `agent`, `keyword`,
   `params_json`, `localization`, `enabled`) and seed one row per (source × term). `params_json` uses each
   agent's real param name (from `input_properties`); set `localization` per agent (e.g.
   `amazon_serp` true, `walmart_serp` false).
2. Create the Delta table (`source` column + normalized core + raw `VARIANT`).
3. Call `nimble_agent_run(q.agent, q.params_json, q.localization)` via a
   correlated `LATERAL` join over the control table, with a `/*+ REPARTITION(N) */` hint (N ≈ enabled
   rows, kept modest, since high parallelism can trip API rate limits) so the agent calls run in parallel.
   It's one long statement → run it async with `bash scripts/ingest.sh <WH> ingest.sql`.
4. See nimble-agents.md §6 for the diagnostic order.

Choose a template from the matched agents' `vertical`/`entity_type`:

| Vertical | Dashboard/app shape |
|----------|---------------------|
| Ecommerce (SERP/PDP/CLP) | KPIs; listings & avg price by source/keyword; sponsored share; price-vs-rating scatter; product table with Open links; multi-source → comparison bars + best-effort item-level price gap |
| Social | volume/engagement by account/post; top-content table; like/follower distributions |
| Real Estate | price & price/sqft; listings by location; beds/baths breakdowns |
| Maps / Local | avg rating; review counts; places table |
| LLM / AEO | source/answer presence; share-of-voice; citation table |
| fallback | KPIs + 2 categorical bars + the raw table (works off any output_schema) |
Comparison depth (hybrid): always build the aggregate/category comparison; additionally try best-effort item-level matching across sources (normalize brand + key tokens). If confident matches exist, add a "same-product price gap" view; otherwise keep the aggregate comparison and note that item-level matching wasn't confident.
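
The best-effort item-level matching described above could be sketched as follows. This is a sketch under stated assumptions, not the skill's actual matcher: rows are dicts with `brand`, `product_name`, and `price` keys, similarity is plain token Jaccard overlap, the 0.6 threshold is an arbitrary illustrative value, and all function names are hypothetical.

```python
import re


def key_tokens(name: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, keep tokens of 2+ characters."""
    return {t for t in re.findall(r"[a-z0-9]+", name.lower()) if len(t) > 1}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_items(left, right, threshold=0.6):
    """Pair each left row with its best same-brand right row, if confident.

    Returns (left_row, right_row, price_gap) tuples; price_gap is
    left price minus right price.
    """
    matches = []
    for a_row in left:
        best, best_score = None, 0.0
        for b_row in right:
            # Normalize brand before comparing; different brands never match.
            if a_row["brand"].strip().lower() != b_row["brand"].strip().lower():
                continue
            score = jaccard(key_tokens(a_row["product_name"]),
                            key_tokens(b_row["product_name"]))
            if score > best_score:
                best, best_score = b_row, score
        # Keep only matches at or above the confidence threshold.
        if best is not None and best_score >= threshold:
            gap = round(a_row["price"] - best["price"], 2)
            matches.append((a_row, best, gap))
    return matches


amazon = [{"brand": "Purina", "product_name": "Purina ONE Dry Dog Food, 8 lb", "price": 16.48}]
walmart = [{"brand": "purina", "product_name": "Purina ONE Dry Dog Food 8 lb Bag", "price": 15.98}]
matches = match_items(amazon, walmart)
```

If `match_items` returns nothing, the aggregate comparison stands on its own and the deliverable notes that item-level matching wasn't confident.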
- `scripts/build_dashboard.py` (compact spec → valid `serialized_dashboard`,
  create + publish). It bakes in every Lakeview gotcha. Read references/dashboard-cookbook.md for
  the spec format and recipes.
- references/app-cookbook.md (delegates scaffold/deploy to `databricks-apps`;
  adds the Nimble-specific SQL, branding, and the numeric-string / light-mode gotchas).
- references/branding.md (always applied).
- `RUNNING`; collect URLs.
- `competitor-intel` / `company-deep-dive` for
  business signals on the brands surfaced, or `nimble-web-expert` for a one-off deeper pull.

Reference files:

- references/databricks-skills.md: which official `databricks-*` skill to use per phase.
- references/install-nimble-integration.md: setup when the integration gate fails.
- references/preflight.md: auth, warehouse, writable-schema discovery (exact commands).
- references/nimble-agents.md: discovery, schema mapping, ingestion SQL + gotchas.
- references/dashboard-cookbook.md: Lakeview JSON recipes + every gotcha (authoritative).
- references/app-cookbook.md: AppKit demo app glue + gotchas.
- references/branding.md: "Powered by Nimble", logo, colors.
- scripts/ingest.sh: async statement fan-out + poll.
- scripts/build_dashboard.py: compact spec → create + publish a dashboard.
- assets/nimble-logo.png: the Nimble mark for app branding.