skills/ck991357/python-sandbox/SKILL.md
在沙盒环境中执行Python代码,用于数据分析、可视化和生成Excel、Word、PDF等文件。支持数据清洗、统计分析、机器学习、图表生成、文档自动化等复杂工作流。
npx skillsauth add aiskillstore/marketplace python-sandboxInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Python沙盒是一个多功能的代码执行环境,支持:
| 功能领域 | 主要用途 | 关键库 | |---------|---------|-------| | 数据分析 | 数据清洗、转换、聚合 | Pandas, Polars | | 高性能计算 | 内存SQL、表达式加速 | DuckDB, Numexpr, Bottleneck | | 可视化 | 图表生成与自动捕获 | Matplotlib, Seaborn | | 文档自动化 | Excel/Word/PDF/PPT生成 | python-docx, reportlab, openpyxl | | 机器学习 | 模型训练与评估 | scikit-learn, LightGBM | | 符号数学 | 公式证明、方程求解 | SymPy | | 科学计算 | 优化、积分、信号处理 | SciPy | | 流程图生成 | 架构图、流程图 | Graphviz, NetworkX | | 文本分析 | HTML解析、数据提取 | BeautifulSoup4, lxml | | 性能优化 | 机械硬盘优化、异步IO | aiofiles, joblib |
/data 目录)用途: 数据分析、处理、持久化存储
支持格式: .csv, .xlsx, .xls, .parquet, .json, .txt, .feather
访问方式: 绝对路径 /data/文件名
import pandas as pd
df = pd.read_csv('/data/sales.csv') # ✅ 正确
用途: 图片识别、PDF内容提取
支持格式: .png, .jpg, .jpeg, .pdf, .txt(小文件)
特点: 文件内容直接嵌入对话,不在 /data 目录
# ❌ 错误:无法从/data读取上传的图片
# img = Image.open('/data/uploaded_image.png') # 会失败
import matplotlib.pyplot as plt
plt.plot([1,2,3], [4,5,6])
plt.title('示例图表')
plt.show() # 🎯 关键:自动捕获,无需手动处理
# 支持以下图表库的自动捕获:
# - Matplotlib (使用 plt.show() 触发)
# - Graphviz (创建 Digraph 对象自动捕获)
# - NetworkX (通过 Matplotlib 渲染)
import base64
import json
# 生成文件内容后...
file_data = base64.b64encode(content).decode('utf-8')
# 🎯 后端实际支持的输出类型:
output = {
"type": "excel", # 或 "word", "pdf", "ppt"
"title": "销售报告.xlsx",
"data_base64": file_data # 注意:只有image类型用"image_base64"
}
# 对于图片输出,后端自动生成:
# {
# "type": "image",
# "title": "图表标题",
# "image_base64": "base64字符串"
# }
print(json.dumps(output)) # 🎯 必须用JSON格式打印
print("分析结果:")
print(f"总计: {total}")
print(df.describe()) # Pandas DataFrame自动美化显示
"image" - 图表、流程图(自动捕获)"excel" - Excel文件"word" - Word文档"pdf" - PDF文件"ppt" - PowerPoint演示文稿/data 目录对应会话工作区# 第一步:处理数据并保存
import pandas as pd
df = pd.read_excel('/data/原始数据.xlsx')
processed = df.groupby('部门')['销售额'].sum()
processed.to_csv('/data/部门汇总.csv') # ✅ 保存中间结果
print("已保存部门汇总数据")
# 第二步:读取中间结果继续分析
df_summary = pd.read_csv('/data/部门汇总.csv')
print(f"读取到 {len(df_summary)} 个部门的汇总数据")
/data 目录为空| 任务类型 | 参考文件 | 核心库 |
|---------|---------|-------|
| 创建图表 | matplotlib_cookbook.md | matplotlib, seaborn |
| 数据处理 | pandas_cheatsheet.md | pandas, duckdb |
| 生成报告 | report_generator_workflow.md | python-docx, reportlab |
| 机器学习 | ml_workflow.md | scikit-learn, lightgbm |
| 符号数学 | sympy_cookbook.md | sympy |
| 科学计算 | scipy_cookbook.md | scipy |
| 文本解析 | text_analysis_cookbook.md | beautifulsoup4, lxml |
# 1. 定义符号
import sympy as sp
x, y = sp.symbols('x y')
# 2. 构建表达式
lhs = (x + y)**2
rhs = x**2 + 2*x*y + y**2
# 3. 验证恒等
difference = sp.simplify(lhs - rhs)
print(f"差值: {difference}")
print(f"是否恒等: {difference == 0}")
# Extract
df = pd.read_csv('/data/raw.csv')
# Transform
df_clean = (df
.dropna()
.drop_duplicates()
.assign(profit = lambda d: d['revenue'] - d['cost']))
# Load
df_clean.to_csv('/data/cleaned.csv', index=False)
print(df_clean.describe())
from graphviz import Digraph
# 创建流程图
dot = Digraph(comment='工作流程', format='png')
dot.node('A', '数据采集')
dot.node('B', '数据清洗')
dot.node('C', '数据分析')
dot.node('D', '报告生成')
dot.edges(['AB', 'BC', 'CD'])
dot.attr(rankdir='LR') # 从左到右布局
# 🎯 自动捕获:Graphviz图表会被后端自动捕获并输出为图片
内存限制: 6GB (mem_limit: "6g")
预留内存: 4GB (mem_reservation: "4g")
Swap限制: 禁用 (memswap_limit: "0") # 🔥 避免机械硬盘swap死机
CPU限制: 75%配额 (cpu_quota: 75_000, cpu_period: 100_000)
超时时间: 90秒
文件系统: 只读根目录,/data可写,/tmp为tmpfs
网络: 完全禁用 (network_disabled: true)
chunks = []
for chunk in pd.read_csv('/data/large.csv', chunksize=50000):
processed = process_chunk(chunk) # 自定义处理函数
chunks.append(processed)
final_df = pd.concat(chunks, ignore_index=True)
# 转换CSV为Feather格式 (提速10-100倍)
import pyarrow.feather as feather
df = pd.read_csv('/data/slow.csv')
feather.write_feather(df, '/data/fast.feather') # 保存
# 后续读取极快
df_fast = feather.read_feather('/data/fast.feather')
import duckdb
# 直接查询CSV,不加载到内存
result = duckdb.sql("""
SELECT department,
AVG(salary) as avg_salary,
COUNT(*) as count
FROM read_csv_auto('/data/employees.csv')
WHERE hire_date > '2024-01-01'
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 10
""").df() # 最后转为DataFrame
print(result)
import numexpr as ne
# 传统方式(慢)
df['result'] = df['A'] * 2 + df['B'] ** 2 - df['C'] / 3
# Numexpr方式(快3-5倍)
df['result'] = ne.evaluate(
"A * 2 + B ** 2 - C / 3",
local_dict={k: df[k].values for k in ['A', 'B', 'C']}
)
import aiofiles
import asyncio
async def process_large_file():
# 异步读取,不阻塞主线程(机械硬盘特别受益)
async with aiofiles.open('/data/large_file.csv', 'r') as f:
content = await f.read()
# 处理数据...
# 异步写入
async with aiofiles.open('/data/processed.csv', 'w') as f:
await f.write(processed_content)
# 在异步环境中调用
await process_large_file()
from joblib import Memory
import time
# 创建内存缓存(可配置到磁盘)
cachedir = '/data/cache'
memory = Memory(cachedir, verbose=0)
@memory.cache
def expensive_computation(x, y):
"""计算结果会被缓存到磁盘"""
time.sleep(2) # 模拟耗时计算
return x * y + x**2
# 第一次计算慢,后续从磁盘读取快
result1 = expensive_computation(10, 20) # 慢
result2 = expensive_computation(10, 20) # 快(从缓存)
import duckdb
# ❌ 耗内存的Pandas操作
# df = pd.read_csv('/data/large.csv')
# result = df.groupby('category').agg({'value': ['mean', 'sum', 'count']})
# ✅ 内存友好的DuckDB操作
result = duckdb.sql("""
SELECT category,
AVG(value) as mean_value,
SUM(value) as sum_value,
COUNT(value) as count_value
FROM read_csv('/data/large.csv')
GROUP BY category
""").df()
import pandas as pd # 数据分析 (v2.2.2)
import numpy as np # 数值计算 (v1.26.4)
import duckdb # 内存SQL (v0.10.2)
import numexpr as ne # 表达式加速 (v2.10.0)
import bottleneck as bn # 滚动统计加速 (v1.3.8)
import pyarrow.feather as feather # Feather格式支持 (v14.0.2)
import polars as pl # 高性能DataFrame (v0.20.3)
from sklearn.ensemble import RandomForestClassifier # scikit-learn v1.5.0
import lightgbm as lgb # 梯度提升树 (v4.3.0)
import category_encoders as ce # 分类编码 (v2.6.3)
from skopt import BayesSearchCV # 贝叶斯优化 (v0.9.0)
import statsmodels.api as sm # 统计模型 (v0.14.1)
import matplotlib.pyplot as plt # 基础绘图 (v3.8.4)
import seaborn as sns # 统计可视化 (v0.13.2)
import graphviz # 流程图 (自动布局) - 系统安装
import networkx as nx # 网络图
from docx import Document # Word文档 (v1.1.2)
from reportlab.lib.pagesizes import letter # PDF生成 (v4.0.7)
from pptx import Presentation # PPT演示文稿 (v0.6.23)
import openpyxl # Excel操作 (v3.1.2)
import sympy as sp # 符号数学 (v1.12)
import scipy # 科学计算 (v1.14.1)
import scipy.optimize as opt # 优化算法
from bs4 import BeautifulSoup # HTML解析 (v4.12.3)
import lxml # 高性能解析器 (v5.2.2)
from tabulate import tabulate # 格式化表格 (v0.9.0)
from tqdm import tqdm # 进度条显示 (v4.66.4)
from joblib import Memory # 磁盘缓存和并行 (v1.3.2)
import aiofiles # 异步文件操作 (v24.1.0)
# 以下库已在后端安装,但用户代码通常不需要直接使用
# fastapi, uvicorn, docker, pydot 等
plt.show(),系统自动捕获/data 目录,媒体文件在上下文中# 以下操作会被阻止:
exec("危险代码") # ❌ 动态执行(后端限制exec_globals)
__import__('os').system('rm') # ❌ 系统命令(网络禁用)
open('/etc/passwd') # ❌ 访问系统文件(根目录只读)
class MyClass: # ❌ 类定义(sandbox限制)
pass
# 访问网络资源 # ❌ 网络完全禁用
del large_df# ❌ 错误做法
df = pd.read_csv('/data/huge.csv') # 可能崩溃
# ✅ 正确做法
# 方案A: 分块处理
for chunk in pd.read_csv('/data/huge.csv', chunksize=50000):
process(chunk)
# 方案B: 使用DuckDB内存外查询
result = duckdb.sql("SELECT * FROM read_csv_auto('/data/huge.csv') LIMIT 10000").df()
# 方案C: 转换为Feather格式
import pyarrow.feather as feather
df = pd.read_csv('/data/huge.csv')
feather.write_feather(df, '/data/huge.feather') # 保存为高效格式
df_fast = feather.read_feather('/data/huge.feather') # 快速读取
# ❌ 慢速Pandas操作
df['result'] = df['A'] * 2 + df['B'] ** 2 - df['C'] / 3
# ✅ 使用Numexpr加速
df['result'] = ne.evaluate("A * 2 + B ** 2 - C / 3",
{k: df[k].values for k in ['A', 'B', 'C']})
# ✅ 使用Bottleneck加速滚动统计
import bottleneck as bn
df['rolling_mean'] = bn.move_mean(df['value'], window=20)
# ❌ 缺少show()
plt.plot(x, y)
plt.title('图表')
# ✅ 必须调用show()
plt.plot(x, y)
plt.title('图表')
plt.show() # 🎯 关键!
# ✅ Graphviz图表自动捕获(无需额外调用)
dot = Digraph()
dot.node('A', '开始')
# 创建对象即自动捕获
# ❌ 同步IO阻塞
with open('/data/large.txt', 'r') as f:
content = f.read() # 阻塞主线程
# ✅ 异步IO (机械硬盘特别有效)
import aiofiles
import asyncio
async def read_file_async():
async with aiofiles.open('/data/large.txt', 'r') as f:
return await f.read()
# 监控内存使用
watch -n 2 "free -h | grep -E 'Mem|Swap'"
# 监控磁盘IO(机械硬盘关键指标)
iostat -x 2
# 监控Docker容器
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
import pandas as pd
import matplotlib.pyplot as plt
# 1. 读取数据
df = pd.read_csv('/data/data.csv')
# 2. 快速分析
print(f"数据形状: {df.shape}")
print(df.describe())
# 3. 简单可视化
df.groupby('category')['value'].mean().plot(kind='bar')
plt.title('各分类平均值')
plt.tight_layout()
plt.show() # 🎯 自动捕获图表
# 参考: report_generator_workflow.md
# 包含数据读取、分析、图表、文档生成全流程
import pandas as pd
import matplotlib.pyplot as plt
from docx import Document
import base64, json
# 1. 数据读取与分析
df = pd.read_excel('/data/sales_data.xlsx')
summary = df.groupby('region')['sales'].sum()
# 2. 创建图表
summary.plot(kind='bar')
plt.title('各地区销售总额')
plt.tight_layout()
plt.show() # 🎯 自动捕获
# 3. 生成Word报告
doc = Document()
doc.add_heading('销售分析报告', 0)
doc.add_paragraph(f"总销售额: ${df['sales'].sum():,.2f}")
doc.add_paragraph(f"平均销售额: ${df['sales'].mean():,.2f}")
# 4. 保存并输出
doc.save('/data/report.docx')
with open('/data/report.docx', 'rb') as f:
file_data = base64.b64encode(f.read()).decode('utf-8')
# 🎯 后端实际支持的输出格式
output = {
"type": "word",
"title": "销售分析报告.docx",
"data_base64": file_data
}
print(json.dumps(output))
# 参考: ml_workflow.md
# 包含数据预处理、特征工程、模型训练、评估
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# 1. 加载数据
df = pd.read_csv('/data/iris.csv')
# 2. 特征与标签
X = df.drop('species', axis=1)
y = df['species']
# 3. 划分数据集
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# 4. 训练模型
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# 5. 评估
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
from graphviz import Digraph
# 创建工作流程图
workflow = Digraph('工作流程', format='png')
workflow.attr(rankdir='LR', size='8,5')
# 添加节点
workflow.node('start', '开始', shape='ellipse')
workflow.node('collect', '数据采集', shape='box')
workflow.node('clean', '数据清洗', shape='box')
workflow.node('analyze', '数据分析', shape='box')
workflow.node('report', '报告生成', shape='box')
workflow.node('end', '结束', shape='ellipse')
# 添加边
workflow.edges([
'startcollect', 'collectclean',
'cleananalyze', 'analyzereport', 'reportend'
])
# 🎯 自动捕获:Graphviz对象创建后自动渲染为图片
# 无需调用render()或show()
记住: 这个沙盒环境已经预配置了所有库和优化,你只需要专注于业务逻辑!系统会自动处理技术细节,让你像在本地环境一样顺畅工作。
development
Apple Human Interface Guidelines for content display components. Use this skill when the user asks about charts component, collection view, image view, web view, color well, image well, activity view, lockup, data visualization, content display, displaying images, rendering web content, color pickers, or presenting collections of items in Apple apps. Also use when the user says how should I display charts, what's the best way to show images, should I use a web view, how do I build a grid of items, what component shows media, or how do I present a share sheet. Cross-references: hig-foundations for color/typography/accessibility, hig-patterns for data visualization patterns, hig-components-layout for structural containers, hig-platforms for platform-specific component behavior.
tools
Automate HelpDesk tasks via Rube MCP (Composio): list tickets, manage views, use canned responses, and configure custom fields. Always search tools first for current schemas.
testing
Expert Haskell engineer specializing in advanced type systems, pure functional design, and high-reliability software. Use PROACTIVELY for type-level programming, concurrency, and architecture guidance.
tools
GraphQL gives clients exactly the data they need - no more, no less. One endpoint, typed schema, introspection. But the flexibility that makes it powerful also makes it dangerous. Without proper controls, clients can craft queries that bring down your server. This skill covers schema design, resolvers, DataLoader for N+1 prevention, federation for microservices, and client integration with Apollo/urql. Key insight: GraphQL is a contract. The schema is the API documentation. Design it carefully.