skills/hotdata-search/SKILL.md
Use this skill when the user wants full-text search, BM25 keyword search, vector similarity search, semantic search, embeddings, or retrieval indexes in Hotdata. Activate for "hotdata search", "BM25", "full-text", "vector search", "semantic search", "similarity", "embedding", "embedding provider", "create an index" (bm25 or vector), "list indexes" for search, or SQL using bm25_search or vector_distance. Do not load for general SQL analytics (aggregations, GROUP BY) or geospatial work — use hotdata-analytics or hotdata-geospatial instead. Requires the core hotdata skill for auth and workspace basics.
Retrieval workloads in Hotdata: BM25 full-text, vector similarity, and the indexes and embedding providers that power them.
Prerequisites: Authenticate and select a workspace (see the `hotdata` skill). Use fully qualified table names: `<connection>.<schema>.<table>`.
Related skills: hotdata-analytics (OLAP SQL, query history, materialized chains), hotdata-geospatial (PostGIS-style functions).
BM25 and vector search both run server-side. `--type` and `--column` are optional when the table has exactly one search index, in which case they are inferred automatically. Specify both when multiple indexes exist.
```shell
# BM25 (requires a BM25 index on the column)
hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] \
  [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]

# Vector (requires a vector index; server auto-embeds the query text)
hotdata search "<query>" --table <connection.schema.table> [--type vector] [--column <source_text_column>] \
  [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
```
| Type | Behavior |
|------|----------|
| `bm25` | Server generates `bm25_search(table, col, 'text')`. Results sort by score (descending). |
| `vector` | Pass a plain-text query; name the source text column (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). |
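
As a rough illustration of how the optional flags combine, the sketch below assembles a `hotdata search` command line and prints it instead of running it. `build_search_cmd` is a hypothetical helper, not part of the CLI, and the table `demo.public.articles` and column `title` are made-up placeholders.

```shell
# Sketch only: prints a `hotdata search` command line; nothing is executed.
# build_search_cmd is a hypothetical helper, not part of the hotdata CLI.
build_search_cmd() (
  query=${1:?query text is required}
  table=${2:?fully qualified table name is required}
  type=${3:-}     # optional: bm25 | vector
  column=${4:-}   # optional: indexed column (source text column for vector)
  limit=${5:-}    # optional: maximum number of rows

  cmd="hotdata search \"$query\" --table $table"
  if [ -n "$type" ]; then cmd="$cmd --type $type"; fi
  if [ -n "$column" ]; then cmd="$cmd --column $column"; fi
  if [ -n "$limit" ]; then cmd="$cmd --limit $limit"; fi
  printf '%s\n' "$cmd"
)

# Exactly one search index on the table: leave type and column to inference.
build_search_cmd "solar panel efficiency" demo.public.articles

# Several search indexes on the table: name both the type and the column.
build_search_cmd "solar panel efficiency" demo.public.articles vector title 5
```

The query is plain text in both cases; for vector search the server embeds it, so no client-side embedding step appears here.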
- When `--type` or `--column` is omitted, the CLI fetches the table's indexes and selects the only BM25/vector index. If multiple exist, you must specify both flags.
- To search with a raw vector rather than query text, write the SQL yourself and run it with `hotdata query` (e.g. `cosine_distance(col, [<vec>])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported.
- Search requires an index on the column (`indexes create --type bm25` or `--type vector`). See `references/INDEXES.md`.
- The default `--limit` is 10.

Indexes attach to a connection table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). Scopes are mutually exclusive for create/delete.
```shell
# List: workspace scan on connection tables (filter with -c / --schema / --table)
hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [--workspace-id <ws>] [--output table|json|yaml]
hotdata indexes list --dataset-id <dataset_id> [--workspace-id <ws>] [--output table|json|yaml]

# Managed database (catalog alias; uses the active database when the catalog matches)
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
  --column <col> --type bm25|vector \
  [--name <name>] [--metric l2|cosine|dot] [--async] \
  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]

# Connection table (raw connection ID)
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
  --column <col> --type bm25|vector [--name <name>] ...
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>

# Dataset
hotdata indexes create --dataset-id <dataset_id> --column <col> --type bm25|vector [--name <name>] ...
hotdata indexes delete --dataset-id <dataset_id> --name <name>
```
- `--type` is required on create: `bm25` (one text column) or `vector` (exactly one column; often embeddings or auto-embedded text).
- `sorted` indexes (range/equality for OLAP filters) are documented in `hotdata-analytics`; this skill focuses on retrieval types.
- With `--async`, poll the build with `hotdata jobs <job_id>` (see the `hotdata` skill, Jobs).
- `--type vector` on a text column generates embeddings server-side. `--embedding-provider-id` is optional; the default output column is `{column}_embedding` (override with `--output-column`).

Full workflow (gather workload → compare existing → create → verify): `references/INDEXES.md`.
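
The scope and naming rules for index creation can be sketched as a small dry-run helper. This is a minimal illustration under stated assumptions: `build_index_create_cmd` and `default_output_column` are hypothetical helpers, the IDs `conn_123` and `ds_456` are invented, and the helper only prints the command it would run. It takes the scope as a single argument, so a connection table and a dataset cannot be passed together.

```shell
# Sketch only: prints a `hotdata indexes create` command line; nothing is executed.
# Both functions are hypothetical helpers, not part of the hotdata CLI.
build_index_create_cmd() (
  scope=${1:-}
  if [ $# -gt 0 ]; then shift; fi

  case $scope in
    connection)
      if [ $# -ne 5 ]; then
        echo "usage: build_index_create_cmd connection <id> <schema> <table> <column> <type>" >&2
        exit 2
      fi
      target="--connection-id $1 --schema $2 --table $3"
      shift 3
      ;;
    dataset)
      if [ $# -ne 3 ]; then
        echo "usage: build_index_create_cmd dataset <dataset_id> <column> <type>" >&2
        exit 2
      fi
      target="--dataset-id $1"
      shift
      ;;
    *)
      echo "error: scope must be 'connection' or 'dataset'" >&2
      exit 2
      ;;
  esac

  column=$1
  type=$2
  case $type in
    bm25|vector) ;;
    *) echo "error: this helper only builds bm25 or vector indexes" >&2; exit 2 ;;
  esac

  printf 'hotdata indexes create %s --column %s --type %s\n' "$target" "$column" "$type"
)

# Default output column for a vector index on a text column: {column}_embedding.
default_output_column() {
  printf '%s_embedding\n' "$1"
}

build_index_create_cmd connection conn_123 public articles body bm25
build_index_create_cmd dataset ds_456 title vector
default_output_column title
```

Optional flags such as `--name`, `--metric`, and `--async` are left out to keep the sketch short; they would be appended to the printed command in the same way.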
```shell
hotdata embedding-providers list [--workspace-id <workspace_id>] [--output table|json|yaml]
hotdata embedding-providers get <id> [--workspace-id <workspace_id>] [--output table|json|yaml]
hotdata embedding-providers create --name <name> --provider-type service|local \
  [--config '<json>'] [--provider-api-key <key> | --secret-name <name>] [--workspace-id <workspace_id>]
hotdata embedding-providers update <id> [--name <name>] [--config '<json>'] \
  [--provider-api-key <key> | --secret-name <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
hotdata embedding-providers delete <id> [--workspace-id <workspace_id>]
```
- Some providers (e.g. `sys_emb_openai`) are pre-configured; use `list` for IDs to pass to `--embedding-provider-id`.
- `--provider-api-key` is the embedding service key (not Hotdata `--api-key`). `--secret-name` references an existing secret.

Typical workflow:

1. `hotdata tables list --connection-id <id>`: confirm column types.
2. `hotdata indexes list`: avoid duplicate indexes.
3. `hotdata indexes create --catalog <alias> --table <table> --column <col> --type bm25|vector` (add `--async` if large).
4. `hotdata search "..." --table <catalog.table>`: `--type` and `--column` are inferred when there is one search index.
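
The workflow above (check column types, check existing indexes, create, search) can be strung together as a dry-run script. This is a sketch under stated assumptions: `run` and `search_setup` are hypothetical helpers, every ID and name in the example call is invented, and `--schema` is included on `indexes create` to follow the create syntax shown earlier in this section. With `DRY_RUN=1` (the default here) each command is printed rather than executed.

```shell
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not executed.
# run and search_setup are hypothetical helpers, not part of the hotdata CLI.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    # The printed form is for reading only; arguments are not re-quoted.
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

search_setup() (
  connection_id=$1   # connection that owns the table
  catalog=$2         # catalog alias of the managed database
  schema=$3
  table=$4
  column=$5
  type=$6            # bm25 | vector
  qualified_table=$7 # table name as passed to `hotdata search --table`
  query=$8

  run hotdata tables list --connection-id "$connection_id"   # 1. confirm column types
  run hotdata indexes list                                   # 2. avoid duplicate indexes
  run hotdata indexes create --catalog "$catalog" --schema "$schema" \
    --table "$table" --column "$column" --type "$type"       # 3. create the index
  run hotdata search "$query" --table "$qualified_table"     # 4. type/column inferred
)

search_setup conn_123 demo public articles body bm25 demo.public.articles "solar"
```

Setting `DRY_RUN=0` would execute the commands for real, which requires an authenticated CLI and a selected workspace.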