examples/chat_excel/skills/data-analysis/SKILL.md
当任务以表格数据为中心(CSV/Excel/TSV/JSON/Parquet)并需要分析、清洗、聚合、可视化、导出时,必须使用本技能。该技能强调人类数据分析师式流程:先多轮探查数据,再分段写小块 Python 代码逐步求解。禁止跳过探查直接编码,禁止一次写大段复杂代码。
npx skillsauth add opencmit/alphora data-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
你是一名“资深人类数据分析师 + 谨慎工程师”。
你的首要目标不是“快写代码”,而是:
inspect_file.py。profile_data.py。print() 关键中间结果(shape、列名、统计值、样例)。tight_layout()、dpi>=150、保存后 plt.close()。/mnt/workspace/ 下。monthly_revenue_trend.png、cleaned_orders.csv,禁止 output1.csv。| 路径 | 用途 | 权限 |
|------|------|------|
| /mnt/workspace/ | 用户输入文件 + 你的输出文件 | 读写 |
| /mnt/skills/data-analysis/ | 技能脚本与参考资料 | 只读 |
所有生成物(图表/CSV/Excel/报告)必须保存到 /mnt/workspace/。
inspect_file.py —— 多模式探查器(首选)python /mnt/skills/data-analysis/scripts/inspect_file.py <file>
支持模式:
--purpose preview:快速预览--purpose structure:列结构/非空率/示例值--purpose stats:统计、缺失、重复、类别分布--purpose search --keyword <kw>:跨列关键词搜索--purpose locate --keyword <kw1,kw2,...>:返回“字段候选 + 命中位置”的紧凑定位清单--purpose range --start-row N --end-row M:行段查看关键参数:
--sheet <name|index|__all__>(Excel)--columns a,b,c(列筛选)--rows N(显示行数)--encoding ENC(强制编码)--max-lines N(--rows 别名,兼容 old file_viewer)--sheet-name <name>(--sheet 别名,兼容 old file_viewer)--start_row N / --end_row M(--start-row/--end-row 别名,兼容 old file_viewer)purpose:preview | structure | search | range | statskeyword:提供后自动切换到 searchmax_lines:限制返回行数(等价 --max-lines)columns:逗号分隔列名start_row/end_row:提供后自动切换到 range;end_row=-10 表示最后 10 行sheet_name:Excel 工作表名;__all__ 只列目录;search + 无 sheet_name 时执行全局跨 Sheet 搜索# All Sheets / # Inspecting Sheet / # Search / # Found / # WarningIdx + 列定位 的 CSV 形式Sheet + RowRef + ColRef + Header + Value + RowPreview--sheet __all__ 只用于拿“轻量索引目录”,绝不用于展开数据明细。--rows 5 或 --rows 8)。示例:
# 第 1 步:只拿轻量目录(不会展开所有 sheet 明细)
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet __all__
# 第 2 步:指定 sheet,小样本预览
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose preview --rows 5
# 第 3 步:只看关键列 + 关键行段
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose range --columns 日期,区域,销售额 --start-row 1 --end-row 80
# 搜索某关键词定位业务记录
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose search --keyword 退款
# 快速锁定主表候选字段、主键候选与命中位置
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose locate --keyword 销售额,订单号,区域 --rows 8
profile_data.py —— 深度剖析器(复杂任务必用)python /mnt/skills/data-analysis/scripts/profile_data.py /mnt/workspace/data.csv
python /mnt/skills/data-analysis/scripts/profile_data.py /mnt/workspace/data.csv --output /mnt/workspace/profile.json
用于:
visualize.py —— 快速制图器(标准图)python /mnt/skills/data-analysis/scripts/visualize.py \
--type bar \
--data /mnt/workspace/data.csv \
--x month --y revenue \
--output /mnt/workspace/monthly_revenue.png
支持类型:bar barh line pie scatter hist box heatmap
references/PATTERNS.md:pandas/matplotlib 常见模式。至少覆盖这些角度中的 2-4 个(避免冗余):
推荐探查序列(模板):
# B1: 全局摸底
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/<file> --sheet __all__
# B2: 结构确认
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/<file> --purpose structure --sheet <sheet_name> --rows 8
# B3: 质量检查(复杂任务)
python /mnt/skills/data-analysis/scripts/profile_data.py /mnt/workspace/<file>
# B4: 关键字定位(可选)
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/<file> --purpose search --keyword <业务关键词>
# B5: 大表快速定位(推荐)
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/<file> --purpose locate --sheet <sheet_name> --keyword <关键字段1,关键字段2> --rows 8
把任务拆成 3-8 个子步骤,每步一个小代码块:
每一步都应:
import pandas as pd
# Step Cx: <单一目标>
df = pd.read_csv('/mnt/workspace/data.csv')
df.columns = df.columns.str.strip()
# ... 本步处理逻辑(只做一件事) ...
print("shape:", df.shape)
print(df.head(3).to_string(index=False))
场景:用户说“请分析销售表现,找出表现最好的区域,并给图表和可复用结果文件”。
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet __all__
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose preview --rows 5
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose structure --rows 8
python /mnt/skills/data-analysis/scripts/inspect_file.py /mnt/workspace/sales.xlsx --sheet 明细 --purpose locate --keyword 销售额,订单号,区域 --rows 8
目标:确认“哪张表才是分析主表”,并锁定关键字段(日期、区域、销售额、订单号)。
import pandas as pd
df = pd.read_excel('/mnt/workspace/sales.xlsx', sheet_name='明细', engine='openpyxl')
df.columns = df.columns.str.strip()
print("shape:", df.shape)
print("columns:", df.columns.tolist())
import pandas as pd
df = pd.read_excel('/mnt/workspace/sales.xlsx', sheet_name='明细', engine='openpyxl')
df.columns = df.columns.str.strip()
df['销售额'] = pd.to_numeric(df['销售额'], errors='coerce')
df['日期'] = pd.to_datetime(df['日期'], format='mixed', errors='coerce')
print("销售额缺失率:", round(df['销售额'].isna().mean() * 100, 2), "%")
print("日期缺失率:", round(df['日期'].isna().mean() * 100, 2), "%")
print("重复订单数:", df['订单号'].duplicated().sum())
import pandas as pd
df = pd.read_excel('/mnt/workspace/sales.xlsx', sheet_name='明细', engine='openpyxl')
df.columns = df.columns.str.strip()
df['销售额'] = pd.to_numeric(df['销售额'], errors='coerce')
result = df.groupby('区域', dropna=False)['销售额'].sum().reset_index().sort_values('销售额', ascending=False)
print(result.head(10).to_string(index=False))
import pandas as pd
df = pd.read_excel('/mnt/workspace/sales.xlsx', sheet_name='明细', engine='openpyxl')
df.columns = df.columns.str.strip()
df['销售额'] = pd.to_numeric(df['销售额'], errors='coerce')
by_region = df.groupby('区域', dropna=False)['销售额'].sum().sum()
total = df['销售额'].sum()
print("分组汇总合计:", by_region)
print("原始总计:", total)
print("是否一致:", abs(by_region - total) < 1e-6)
/mnt/workspace/region_sales_summary.csv(utf-8-sig)/mnt/workspace/region_sales_bar.png/mnt/workspace/cleaned_sales_detail.csvimport matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Source Han Sans CN', 'WenQuanYi Micro Hei', 'SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
额外要求:
xlabel、ylabelsavefig(..., dpi=150, bbox_inches='tight')plt.close()1,234,567.89)15.3%)12,345)utf-8 -> gbk -> gb18030 -> latin1df.columns = df.columns.str.strip() 再重新核对pd.to_numeric(..., errors='coerce') / pd.to_datetime(..., format='mixed')inspect_file.py --rows 50 局部探查,再分批处理pip install pandas openpyxl matplotlib -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
inspect_file.pyprofile_data.py/mnt/workspace/content-media
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
development
Use this skill for any task that requires in-depth research, investigation, or comprehensive report generation on a topic. This includes: producing industry analysis, market research, technology surveys, competitive intelligence, trend reports, or any deliverable that synthesizes information from multiple sources into a structured long-form document; answering complex questions that require gathering evidence from the web, analyzing data, and presenting findings with charts and citations; any request where the user explicitly asks for a 'report', 'research', 'survey', 'white paper', or 'deep dive'. Trigger especially when the task cannot be answered from memory alone and requires active information gathering. Do NOT trigger for simple factual Q&A, code-only tasks, single-file data analysis (use data-analysis skill), or creative writing without research backing.
development
Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.
development
Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.