dataagent/.claude/skills/dataagent-nl2sql/SKILL.md
Use this built-in skill for Chinese OpenDataWorks intelligent-query and NL2SQL work: platform metadata, workflow, lineage, datasource routing, generic table discovery, read-only SQL generation/execution, metric explanation, and chart-oriented answers across MySQL and Doris. Use it whenever the user asks for 数据问答、取数、统计、对比、趋势、占比、明细、诊断、血缘排查、指标口径、术语解释 or SQL 示例 on OpenDataWorks or managed tables. Prefer MCP-first via portal-mcp when `mcp__portal__portal_*` tools are visible, and fall back to the built-in CLI/scripts only when MCP is unavailable. Prefer platform terms and generic data-platform rules; do not assume tenant-specific business terminology, business environment defaults, or hidden business knowledge.
npx skillsauth add mikevan666/opendataworks dataagent-nl2sqlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Convert Chinese natural-language data questions into read-only SQL, execute against MySQL or Doris, and return structured results with optional chart specs. Prefer portal-mcp tools when the runtime exposes them; otherwise fall back to the built-in Python scripts.
Covered scenarios: 数据问答、取数、统计、对比、趋势、占比、明细、诊断、血缘排查、指标口径、术语解释、SQL 示例
Out of scope: general chat, write/update/delete SQL, cross-source federated joins, and questions that depend on tenant-specific business terms or default filters not present in this built-in skill or returned by real metadata tools.
df/di, ds, schema prefixes, and single-source routing.reference/00-skill-map.md first, then progressively load references as needed. Never bulk-read all assets upfront.run_sql.py without first confirming the target database, engine, metrics, dimensions, and required time range.<schema>.<table> in SQL. Never omit the schema, and never use engine names (mysql/doris) as schema.pip install, uv add, which python, python --version, or any environment probing commands.mcp__portal__portal_* first when these tools are visible. Only when MCP tools are unavailable may you fall back to Python scripts and ${DATAAGENT_SKILL_ROOT}/bin/odw-cli.sql_execution or chart_spec. Do not re-execute equivalent SQL or continue reading assets once the answer is grounded.--database after the user or metadata has already narrowed the scope.run_sql.py now hard-blocks first-pass data_lineage SQL unless DATAAGENT_ALLOW_LINEAGE_SQL_FALLBACK=1 is explicitly set for a clearly scoped supplemental query.| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Bulk-reading assets/*.json at the start | Wastes tokens and ignores progressive disclosure | Follow the fixed reading order; only drill into assets when references are insufficient |
| Using mysql or doris as SQL schema | Engine type is not a database name | Use metadata-returned db_name as the schema prefix |
| Running run_sql.py without database confirmation | Generates blind-guess SQL | Route through inspect_metadata.py → resolve_datasource.py first for managed tables |
| Querying Doris di table without time range | Full-table scan on incremental data | Always require explicit ds BETWEEN ... AND ... for di tables |
| Querying Doris df table across full history | Unnecessary data scan on snapshot tables | Default to latest ds partition unless the user explicitly requests historical range |
| Inventing tenant-specific business defaults | Built-in skill does not own tenant business knowledge | Limit yourself to current skill docs, metadata, and explicit user input; otherwise ask |
| Retrying with different interpreters on script error | Probes the environment instead of fixing the input | Diagnose from the actual error message; adjust parameters or ask the user |
| Generating a chart when data is unsuitable | Forces visual output on 1-row or text results | Only produce chart_spec when the data structure genuinely fits a chart |
| Re-executing equivalent SQL after getting results | Wastes resources and confuses the answer | Stop after the first correct result |
Process any question in this order; stop as soon as sufficient context is gathered:
reference/00-skill-map.md to classify the question type and execution pathreference/10-query-playbooks.md to match the concrete playbookreference/11-datasource-routing.mdreference/20-term-index.md, reference/21-metric-index.md, reference/22-sql-example-index.mdreference/30-tool-recipes.md, reference/40-runtime-metadata.md, reference/50-tool-output-contract.mdassets/* or execute scripts/*Assign one primary type: 统计 | 对比 | 趋势 | 占比 | 明细 | 诊断 | 术语解释 | SQL 示例
If a question spans multiple types, identify the primary goal first, then supplement.
Ask before guessing when any of these apply:
数据层级, 发布状态, 任务依赖, Doris 只读账号)df table but unclear whether to query latest snapshot or historical rangedi table but no time range providedFollow this priority order:
reference/20-*, 21-*, 22-*mcp__portal__portal_* tools are visible:
mcp__portal__portal_search_tablesmcp__portal__portal_get_lineagemcp__portal__portal_resolve_datasourcemcp__portal__portal_export_metadatamcp__portal__portal_get_table_ddlmcp__portal__portal_query_readonlyget_lineage.pyget_table_ddl.pyrun_sql.py read-only query pathinspect_metadata.py firstresolve_datasource.pyrun_sql.pybuild_chart_spec.py with explicit --chart-type bar|line|piebuild_chart_spec.py --chart-type tableformat_answer.pyDo not execute run_sql.py without confirmed database, metrics, and dimensions.
All scripts execute via: "$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/<name>.py" ...
Allowed scripts only: inspect_metadata.py, resolve_datasource.py, get_lineage.py, get_table_ddl.py, run_sql.py, build_chart_spec.py, format_answer.py, query_opendataworks_metadata.py, build_reference_digest.py
Preferred MCP tools when available:
mcp__portal__portal_search_tablesmcp__portal__portal_get_lineagemcp__portal__portal_resolve_datasourcemcp__portal__portal_export_metadatamcp__portal__portal_get_table_ddlmcp__portal__portal_query_readonlyCommand templates:
# Metadata inspection
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/inspect_metadata.py" --keyword <keyword> [--table <table>] [--database <db>]
# Datasource resolution
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/resolve_datasource.py" --database <db_name> [--engine mysql|doris]
# Lineage snapshot
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/get_lineage.py" --table <table_name> [--db-name <db_name>] [--depth <n>]
# Live table DDL
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/get_table_ddl.py" --database <db_name> --table <table_name>
# SQL execution
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/run_sql.py" --database <db_name> --engine <mysql|doris> --sql "<SQL>"
# Lineage-only supplemental SQL after snapshot is still insufficient
DATAAGENT_ALLOW_LINEAGE_SQL_FALLBACK=1 "$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/run_sql.py" --database opendataworks --engine mysql --sql "<supplemental lineage SQL>"
# Chart generation
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/build_chart_spec.py" --chart-type <bar|line|pie|table> --input '<sql_execution_json>'
# Answer formatting
"$DATAAGENT_PYTHON_BIN" "${DATAAGENT_SKILL_ROOT}/scripts/format_answer.py" --input '<sql_execution_json>'
Prohibitions:
/app/scripts/..., no bare scripts/<name>.py)odw-cli directly unless you are already inside the documented Python-script fallback path; it is not the primary agent interfacepip install, which python, etc.)reference/* before executing a script; do not interleave reading and executingassets/*.jsonportal-mcp or odw-cli -> backend /api/v1/ai/*portal_get_lineage or get_lineage.py before writing custom SQL; only use run_sql.py when the lineage snapshot still lacks required fieldsdata_lineage SQL after the guard fires; switch to portal_get_lineage or get_lineage.py, and only use DATAAGENT_ALLOW_LINEAGE_SQL_FALLBACK=1 for a clearly scoped supplemental querydata_table, data_field, data_lineage, data_task, data_workflow, workflow_*, doris_*) always use the backend read-only query path with database=opendataworks and engine=mysql| Chart Type | When to Use | Explicit Flag |
|---|---|---|
| Table | Default fallback; always safe | --chart-type table (only when the user explicitly requests a standalone table) |
| Bar | Category comparison, TopN, ranking | --chart-type bar |
| Line | Time-series trends | --chart-type line |
| Pie | Proportional analysis with 2–8 categories | --chart-type pie |
Do not generate chart_spec when data is unsuitable for visualization. Retain sql_execution only. Always pass explicit --chart-type; never let the script auto-guess.
reference/00-skill-map.md — question type to execution path mappingreference/10-query-playbooks.md — concrete playbooks per question typereference/11-datasource-routing.md — MySQL vs. Doris routing rulesreference/20-term-index.md — platform glossary and generic data-platform rulesreference/21-metric-index.md — metric formulas and constraintsreference/22-sql-example-index.md — SQL templates by scenarioreference/30-tool-recipes.md — detailed script usage recipesreference/40-runtime-metadata.md — core table schema and runtime detailsreference/50-tool-output-contract.md — output format contractsscripts/inspect_metadata.py — locate managed tablesscripts/resolve_datasource.py — resolve engine and datasourcescripts/get_lineage.py — fetch lineage snapshot through the backend metadata pathscripts/get_table_ddl.py — fetch live table DDL through the backend metadata pathscripts/run_sql.py — execute read-only SQL through the backend query pathscripts/build_chart_spec.py — generate chart spec from SQL resultsscripts/format_answer.py — summarize results for the final answerscripts/query_opendataworks_metadata.py — export platform metadatascripts/build_reference_digest.py — regenerate reference index files from assetsassets/term_explanations.json, assets/business_concepts.json, assets/semantic_mappings.jsonassets/metrics.json, assets/business_rules.json, assets/constraints.jsonassets/sql_examples.json, assets/few_shots.jsonassets/chart-template/*.jsontools
Use this built-in skill to execute readonly SQL through the OpenDataWorks agent API and shared odw-cli. Guardrails such as readonly checks, datasource scope, default limit, and timeout are enforced by CLI and API.
data-ai
Use this built-in skill for OpenDataWorks platform metadata, lineage, datasource resolution, and DDL lookup. This skill does not execute SQL for data analysis.
testing
A minimal smoke skill used to verify that opendataagent can load bundled skills and surface them during chat.
development
Use this built-in skill for lightweight Python-based data exploration, profiling, aggregation, and file-based analysis. It does not fetch OpenDataWorks metadata on its own.