skills/sql-runner/SKILL.md
> ⚠️ **PROTOTYPE - 开发中 / DO NOT USE IN PRODUCTION** > > 当前 `run_sql.py` 中的 MCP 集成为 **mock 实现**,不会真正调用 Dataphin API。 > 实际使用时,Claude Code 会直接通过 MCP 协议调用 `sh_dp_mcp` 工具,无需运行此脚本。 > > **正确用法**:直接在 Claude Code 对话中要求执行 SQL,Claude 会调用 MCP 工具。 --- name: sql-runner description: Execute optimized SQL files using Dataphin MCP tools. Automatically submits queries, polls for results, and saves output to CSV with timestamped filenames. Use this skill when users mention "run SQL", "execute query", "submit to D
npx skillsauth add OliverOuyang/shuhe-work-skills skills/sql-runnerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
⚠️ PROTOTYPE - 开发中 / DO NOT USE IN PRODUCTION
当前
run_sql.py中的 MCP 集成为 mock 实现,不会真正调用 Dataphin API。 实际使用时,Claude Code 会直接通过 MCP 协议调用sh_dp_mcp工具,无需运行此脚本。正确用法:直接在 Claude Code 对话中要求执行 SQL,Claude 会调用 MCP 工具。
A skill for executing SQL queries through Dataphin's MCP interface with automatic polling and result management.
Use this skill when:
Read SQL File
Submit Query to Dataphin
sh_dp_mcp:submit_dp_query tool to submit SQLPoll for Results
sh_dp_mcp:get_dp_query_status to check query statusSave Results
{query_name}_{YYYYMMDD_HHMMSS}.csvdata/ subfolder in SQL file's directoryRead the SQL file and extract:
Expected SQL file structure:
/*******************************************************************************
* Query Name: Budget Core Statistics
* Purpose: Monthly budget tracking by channel and user segment
* ...
******************************************************************************/
SELECT
...
FROM
...
Extract "Budget Core Statistics" as the query name for filename generation.
If no header comment exists, use the SQL filename (without extension) as fallback.
Call MCP tool to submit query:
result = submit_dp_query(sql=sql_content)
task_id = result['taskId']
Handle errors:
apply_dp_table_permissionPoll query status with 5-second intervals:
while elapsed_time < 300: # 5 minute timeout
status = get_dp_query_status(taskId=task_id)
if status['queryStatus'] == 'SUCCESS':
break
elif status['queryStatus'] in ['FAILED', 'CANCELLED']:
raise QueryError(status['message'])
sleep(5)
elapsed_time += 5
Display progress to user:
When query succeeds:
datetime.now().strftime('%Y%m%d_%H%M%S'){sanitized_name}_{timestamp}.csvdata/ directory exists in SQL file's parent directoryExample output path:
Input: C:\Users\Oliver\Desktop\project\queries\budget_stats.sql
Output: C:\Users\Oliver\Desktop\project\queries\data\Budget_Core_Statistics_20260331_143022.csv
MCP Tool Integration: The script must properly invoke MCP tools, not just mock them. The actual implementation should use the MCP protocol to call sh_dp_mcp tools.
Error Handling:
kill_dp_query tool before exitingFile Naming:
.csv extension (not .xlsx)CSV Format:
Progress Display:
Very Long Queries: If query runs longer than 5 minutes, increase timeout or implement cancellation logic.
Large Result Sets: Dataphin limits to 10,000 rows. If result is truncated, warn user and suggest using data export tools.
Multiple SQL Statements: If file contains multiple queries separated by semicolons, execute them sequentially and save separate CSV files.
Variables in SQL: The SQL may contain ${bizdate} placeholders. These should be preserved as-is when submitting to Dataphin (the platform will substitute them).
Empty Results: If query returns zero rows, still create the CSV with just the header row and notify user.
Based on boundary testing, this skill has the following limitations:
Maximum Result Size: Dataphin enforces a hard limit of 10,000 rows per query. If your result set exceeds this, the data will be truncated. Use filtering or data export tools for larger datasets.
Timeout Enforcement: Queries that run longer than 5 minutes (300 seconds) will be automatically killed. Very complex queries may need optimization or a custom timeout configuration.
File Path Constraints:
Query Name Sanitization:
< > : " / \ | ? * are replaced with underscoreStatus Update Frequency: Status is only printed when it changes, not every 5 seconds. This reduces console spam but means status updates may not appear if the query status remains constant.
Network Resilience: Single connection failures during polling are not retried. Network interruptions lasting >5 seconds between polls may cause query abandonment.
CSV Size: While result rows are limited to 10,000, very wide tables (1000+ columns) may create large CSV files (>100MB).
This skill extracts data from Dataphin queries and formats it as CSV with these specifications:
Data Structure:
[[header1, header2, ...], [row1_val1, row1_val2, ...], ...]CSV Encoding & Format:
Filename Generation:
{sanitized_query_name}_{timestamp}.csv
Timestamp format: YYYYMMDD_HHMMSS (e.g., 20260331_143022)
Example: Budget_Core_Statistics_20260331_143022.csv
Directory Structure:
Input SQL: C:\Projects\queries\budget_stats.sql
Output CSV: C:\Projects\queries\data\Budget_Core_Statistics_20260331_143022.csv
└─ Created in "data/" subdirectory relative to SQL file location
Empty Result Handling:
This skill implements robust error handling across multiple scenarios:
Permission Errors:
apply_dp_table_permission to request table access"Syntax Errors:
Timeout Handling:
kill_dp_query MCP tool called before exitingConnection Errors During Submission:
Status Polling Failures:
kill_dp_query may be neededFile System Errors:
Before Running Queries:
Check Query Performance: Very complex queries with many JOINs or aggregations may timeout. Test locally or with LIMIT 100 first.
Verify Table Permissions: If you're unsure about access, check permissions before running. Permission errors stop execution immediately.
Monitor Result Size: Be aware of how many rows your query returns. 10,000-row limit is enforced by Dataphin.
Handle Chinese Characters: Query names and file paths with Chinese characters work correctly, but ensure your terminal supports UTF-8 output.
Plan for Long Queries: Queries running longer than 5 minutes will timeout. Consider optimizing SQL or increasing timeout if needed.
During Query Execution:
Progress Monitoring: Status updates appear when query status changes (RUNNING → SUCCESS or FAILED), not every 5 seconds. Be patient.
Don't Interrupt: The skill handles timeouts automatically. Let it complete rather than manually stopping.
Network Stability: Maintain stable network connection during polling. Disconnections >5 seconds may cause issues.
After Query Completion:
Verify Output: Check that the CSV file was created in data/ subdirectory with correct naming and data.
Excel Compatibility: CSV is UTF-8 with BOM, so it opens directly in Excel. No conversion needed.
Timestamp Uniqueness: Each run creates a new file with current timestamp. Old results are never overwritten (unless you manually delete them).
Large Files: If result set is large (>50MB), Excel may be slow opening it. Consider using dedicated CSV viewers for analysis.
A successful execution should:
data/ directory if neededtools
SQL 分段验证、自我修复、结果导出与智能分析。流程:解析SQL → Dataphin MCP 验证元数据 → 自动修复 → 分段执行验证 → 导出 CSV → 智能分析(漏斗解读、异常识别、预判用户问题)。适用场景:"跑一下这个SQL"、"验证这个查询"、"帮我执行并导出"、"分析一下结果"等。
testing
Security-first vetting for OpenClaw skills. Use before installing any skill from ClawHub, GitHub, or other sources. Checks for red flags, permission scope, and suspicious patterns.
development
A universal self-improving agent that learns from ALL skill experiences. Uses multi-memory architecture (semantic + episodic + working) to continuously evolve the codebase. Auto-triggers on skill completion/error with hooks-based self-correction.
data-ai
Standardize Jupyter notebooks (.ipynb) for interactive data analysis workflows. Enforces a mandatory cell manifest (M1-M8 + archetype chapters) with tags ([CONFIG]/[SETUP]/[FUNC]/[RUN]/[VIZ]/[EXPORT]), structured markdown sections, and output prefixes ([OK]/[WARN]/[SKIP]). Use when the user wants to standardize, clean up, or create a notebook from scratch. Two archetypes: problem-driven (question-answer analysis) and monitoring (dimension-based periodic reporting).