skills/sql-optimizer/SKILL.md
Optimize and clean SQL queries with structured comments, format improvements, and syntax validation. Use this skill whenever users mention "optimize SQL", "clean SQL", "add comments to SQL", "format SQL", "refactor SQL", or want to improve SQL readability. Specifically designed for Hive/MaxCompute SQL with support for variables like ${bizdate}, partitioning, and window functions.
npx skillsauth add OliverOuyang/shuhe-work-skills sql-optimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A skill for optimizing and cleaning Hive/MaxCompute SQL queries with enterprise-grade formatting and documentation standards.
Use this skill when:
Remove Redundancy
Add Structured Documentation
Improve Formatting
Validate Syntax
Read the SQL file and identify:
Organize the optimized SQL with this structure:
/*******************************************************************************
* Query Name: [Descriptive name based on what it does]
* Purpose: [What business question this answers]
* Granularity: [Row-level grain, e.g., "date + channel + platform + asset_type"]
* Data Range: [Time window, e.g., "2026-03-01 onwards (configurable)"]
* Update Frequency: [Daily, weekly, etc.]
******************************************************************************/
SELECT
-- ==================== Dimension Fields ====================
[dimensions with inline comments]
-- ==================== Metric Fields: [Category Name] ====================
[metrics grouped by category with inline comments]
-- ==================== Derived Fields ====================
[calculated fields]
FROM
[main table with description]
-- [Describe the join]
LEFT/INNER JOIN [joined table]
ON [join conditions]
WHERE
-- [Explain the filter logic]
[filter conditions]
GROUP BY
-- [Explain grouping logic]
[grouping fields]
ORDER BY
[ordering]
;
/*******************************************************************************
* Glossary and Notes:
*
* [Business Term Definitions]:
* - T0: [definition]
* - Conversion funnel: [stages]
*
* [Technical Notes]:
* - [Important caveats, assumptions, or edge cases]
******************************************************************************/
SELECT Clause:
CASE Statements:
CASE
WHEN condition1 THEN 'result1'
WHEN condition2 THEN 'result2'
ELSE 'default'
END AS field_name
Aggregations:
SUM(IF(condition, field, 0)) AS metric_name, -- Inline comment explaining the metric
Joins:
WHERE/GROUP BY:
Run the validation script to check:
Execute:
python scripts/validate_sql.py <sql_file_path>
If validation fails, fix the errors and re-validate before proceeding.
Preserve Business Logic: Never change the query results. All optimizations are cosmetic or structural only.
Handle Comments Intelligently:
Chinese vs English:
Hive/MaxCompute Specifics:
${bizdate}, ${pt}, ${yyyymmdd}date_sub, substr, to_date)Error Handling:
Multiple Queries in One File: Optimize each query separately, add divider comments between them.
Extremely Long Queries: Break into sections with clear headers, consider suggesting CTE refactoring.
Performance Optimization: This skill focuses on readability. If user asks about performance (e.g., "make it faster"), acknowledge that's a different scope and focus only on formatting unless they explicitly want performance tuning too.
Before:
select
--substr(a.date_key,1,7) as date_month,
a.date_key,
b.first_login_platform_api_app_mp as platform,
case
when b.marketing_channel_group_name ='精准营销' then '精准营销'
when b.marketing_channel_group_name in ('抖音','抖音二组') then '抖音'
else '其他' end as 渠道类别,
sum(a.first_login_user_count) as log_num,
sum(a.login_t0_apply_finish_user_count) as t0_ato_num-- T0申完户
from
(select * from dwt.table1 where ds = '${bizdate}') a
LEFT JOIN dwt.table2 b on a.uid=b.uid and b.ds = '${bizdate}'
where to_date(a.date_key)>='2026-03-01'
group BY a.date_key, b.first_login_platform_api_app_mp, [case statement]
After:
/*******************************************************************************
* Query Name: Channel Conversion Metrics
* Purpose: Daily conversion funnel metrics by channel and platform
* Granularity: date + platform + channel_category
* Data Range: 2026-03-01 onwards
* Update Frequency: Daily
******************************************************************************/
SELECT
-- ==================== Dimensions ====================
a.date_key, -- Date
b.first_login_platform_api_app_mp AS platform, -- First login platform
-- Channel category aggregation
CASE
WHEN b.marketing_channel_group_name = '精准营销' THEN '精准营销'
WHEN b.marketing_channel_group_name IN ('抖音', '抖音二组') THEN '抖音'
ELSE '其他'
END AS 渠道类别,
-- ==================== Metrics: Conversion Funnel ====================
SUM(a.first_login_user_count) AS log_num, -- First login users
SUM(a.login_t0_apply_finish_user_count) AS t0_ato_num -- T0 application complete users
FROM
-- User daily conversion index fact table
(
SELECT *
FROM dwt.table1
WHERE ds = '${bizdate}'
) a
-- Left join user comprehensive info dimension table
LEFT JOIN dwt.table2 b
ON a.uid = b.uid
AND b.ds = '${bizdate}'
WHERE
-- Data range filter: from 2026-03-01
TO_DATE(a.date_key) >= '2026-03-01'
GROUP BY
a.date_key,
b.first_login_platform_api_app_mp,
CASE
WHEN b.marketing_channel_group_name = '精准营销' THEN '精准营销'
WHEN b.marketing_channel_group_name IN ('抖音', '抖音二组') THEN '抖音'
ELSE '其他'
END
;
/*******************************************************************************
* Glossary:
* - T0: Same day as first login
* - log: Login event
* - ato: Application To (complete)
******************************************************************************/
An optimized SQL should:
tools
SQL 分段验证、自我修复、结果导出与智能分析。流程:解析SQL → Dataphin MCP 验证元数据 → 自动修复 → 分段执行验证 → 导出 CSV → 智能分析(漏斗解读、异常识别、预判用户问题)。适用场景:"跑一下这个SQL"、"验证这个查询"、"帮我执行并导出"、"分析一下结果"等。
testing
Security-first vetting for OpenClaw skills. Use before installing any skill from ClawHub, GitHub, or other sources. Checks for red flags, permission scope, and suspicious patterns.
development
A universal self-improving agent that learns from ALL skill experiences. Uses multi-memory architecture (semantic + episodic + working) to continuously evolve the codebase. Auto-triggers on skill completion/error with hooks-based self-correction.
data-ai
Standardize Jupyter notebooks (.ipynb) for interactive data analysis workflows. Enforces a mandatory cell manifest (M1-M8 + archetype chapters) with tags ([CONFIG]/[SETUP]/[FUNC]/[RUN]/[VIZ]/[EXPORT]), structured markdown sections, and output prefixes ([OK]/[WARN]/[SKIP]). Use when the user wants to standardize, clean up, or create a notebook from scratch. Two archetypes: problem-driven (question-answer analysis) and monitoring (dimension-based periodic reporting).