skills/data-analysis/SKILL.md
Analyze datasets to extract insights, identify patterns, and generate reports. Use when exploring data, creating visualizations, or performing statistical analysis. Handles CSV, JSON, SQL queries, and Python pandas operations.
npx skillsauth add jyjeanne/ai-setup-forge data-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Python (Pandas):
import pandas as pd
import numpy as np
# Load CSV
df = pd.read_csv('data.csv')
# Basic info
print(df.info())
print(df.describe())
print(df.head(10))
# Check missing values
print(df.isnull().sum())
# Data types
print(df.dtypes)
SQL:
-- Inspect table schema
DESCRIBE table_name;
-- Sample data
SELECT * FROM table_name LIMIT 10;
-- Basic stats
SELECT
COUNT(*) as total_rows,
COUNT(DISTINCT column_name) as unique_values,
MIN(numeric_column) as min_val,
MAX(numeric_column) as max_val,
AVG(numeric_column) as avg_val
FROM table_name;
# Handle missing values
df['column'].fillna(df['column'].mean(), inplace=True)
df.dropna(subset=['required_column'], inplace=True)
# Remove duplicates
df.drop_duplicates(inplace=True)
# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')
# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) & (df['value'] <= Q3 + 1.5*IQR)]
# Descriptive statistics
print(df['numeric_column'].describe())
# Grouped analysis
grouped = df.groupby('category').agg({
'value': ['mean', 'sum', 'count'],
'other': 'nunique'
})
print(grouped)
# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)
# Pivot table
pivot = pd.pivot_table(df,
values='sales',
index='region',
columns='month',
aggfunc='sum'
)
import matplotlib.pyplot as plt
import seaborn as sns
# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')
# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')
# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')
# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')
# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')
# Trend analysis
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100
# Segment analysis
segments = df.groupby('segment').agg({
'revenue': 'sum',
'customers': 'nunique',
'orders': 'count'
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']
# Data Analysis Report
## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD
## 2. Key findings
- Insight 1
- Insight 2
- Insight 3
## 3. Statistical summary
| Metric | Value |
|------|-----|
| Mean | X.XX |
| Median | X.XX |
| Std dev | X.XX |
## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
development
Generate breadboard circuit mockups and visual diagrams using HTML5 Canvas drawing techniques. Use when asked to create circuit layouts, visualize electronic component placements, draw breadboard diagrams, mockup 6502 builds, generate retro computer schematics, or design vintage electronics projects. Supports 555 timers, W65C02S microprocessors, 28C256 EEPROMs, W65C22 VIA chips, 7400-series logic gates, LEDs, resistors, capacitors, switches, buttons, crystals, and wires.
development
Apply lean thinking to UX: hypothesis-driven design, collaborative sketching, and rapid experiments instead of heavy deliverables. Use when the user mentions "Lean UX", "design hypothesis", "UX experiment", "collaborative design", or "outcome over output". Covers hypothesis statements, MVPs for UX, and cross-functional collaboration. For Build-Measure-Learn, see lean-startup. For usability audits, see ux-heuristics.
development
Design MVPs, validated learning experiments, and pivot-or-persevere decisions using Build-Measure-Learn. Use when the user mentions "MVP scope", "validated learning", "pivot or persevere", "vanity metrics", or "test assumptions". Covers innovation accounting and actionable metrics. For 5-day prototype testing, see design-sprint. For customer motivation analysis, see jobs-to-be-done.
tools
Instrument, trace, evaluate, and monitor LLM applications and AI agents with LangSmith. Use when setting up observability for LLM pipelines, running offline or online evaluations, managing prompts in the Prompt Hub, creating datasets for regression testing, or deploying agent servers. Triggers on: langsmith, langchain tracing, llm tracing, llm observability, llm evaluation, trace llm calls, @traceable, wrap_openai, langsmith evaluate, langsmith dataset, langsmith feedback, langsmith prompt hub, langsmith project, llm monitoring, llm debugging, llm quality, openevals, langsmith cli, langsmith experiment, annotate llm, llm judge.