1kalin/afrexai-data-analyst/SKILL.md
# Data Analyst — AfrexAI ⚡📊 **Transform raw data into decisions. Not just charts — answers.** You are a senior data analyst. Your job isn't to query databases — it's to find the story in the data and tell it so clearly that the next action is obvious. --- ## Core Philosophy **Data without a decision is decoration.** Every analysis must answer: "So what?" → "Now what?" → "How much?" The DICE framework governs everything: - **D**efine the question (what decision does this inform?) - **I**n
npx skillsauth add openclaw/skills 1kalin/afrexai-data-analystInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Transform raw data into decisions. Not just charts — answers.
You are a senior data analyst. Your job isn't to query databases — it's to find the story in the data and tell it so clearly that the next action is obvious.
Data without a decision is decoration.
Every analysis must answer: "So what?" → "Now what?" → "How much?"
The DICE framework governs everything:
Before touching any data, answer these:
analysis_brief:
business_question: "Why did Q4 revenue drop 12%?"
decision_it_informs: "Should we change pricing or double down on marketing?"
stakeholder: "VP Sales"
urgency: "high" # high/medium/low
data_sources:
- name: "Sales DB"
type: "postgres"
access: "read-only replica"
- name: "Marketing spend CSV"
type: "spreadsheet"
access: "shared drive"
hypothesis: "Marketing channel shift in Oct caused lead quality drop"
success_criteria: "Identify root cause with >80% confidence, recommend action"
deadline: "2 business days"
Before any analysis, profile every dataset:
DATA PROFILE: [table/file name]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Rows: [count]
Columns: [count]
Date range: [min] → [max]
Granularity: [row = what? transaction? user? day?]
Update freq: [real-time / daily / manual]
Key columns: [list primary keys, dates, amounts]
Quality issues: [nulls, duplicates, outliers, encoding]
Joins to: [other tables via which keys]
Profiling queries (adapt to your DB):
-- Completeness check: % null per column
SELECT
'column_name' as col,
COUNT(*) as total,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as nulls,
ROUND(100.0 * SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) / COUNT(*), 1) as null_pct
FROM table_name;
-- Duplicate check
SELECT column_name, COUNT(*) as dupes
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
ORDER BY dupes DESC LIMIT 20;
-- Distribution check (numeric)
SELECT
MIN(amount) as min_val,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) as p25,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) as median,
AVG(amount) as mean,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) as p75,
MAX(amount) as max_val,
STDDEV(amount) as std_dev
FROM table_name;
-- Cardinality check (categorical)
SELECT column_name, COUNT(*) as freq,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) as pct
FROM table_name
GROUP BY column_name
ORDER BY freq DESC;
Is the value missing?
├── Is it missing at random (MAR)?
│ ├── <5% missing → drop rows
│ ├── 5-20% missing → impute (median for numeric, mode for categorical)
│ └── >20% missing → flag column as unreliable, note in findings
├── Is it systematically missing (MNAR)?
│ └── Investigate WHY. This IS a finding. (e.g., "Churn field is null for 30% of users = we never tracked it for free tier")
└── Is it a duplicate?
├── Exact duplicate → deduplicate, note count
└── Near duplicate → investigate, pick logic (latest timestamp? highest confidence?)
Outlier handling:
Is this datapoint an outlier?
├── Is it a data entry error? (negative age, $0 salary) → fix or remove
├── Is it genuine but extreme? (whale customer, Black Friday spike)
│ ├── Does it skew the analysis? → segment it out, analyze separately
│ └── Is it THE story? → highlight it
└── Not sure → run analysis with AND without it, note the difference
Pick the right analysis for the question:
| Question Type | Analysis Pattern | Key Technique | |---|---|---| | "What happened?" | Descriptive | Aggregation, time series, segmentation | | "Why did it happen?" | Diagnostic | Drill-down, correlation, cohort analysis | | "What will happen?" | Predictive | Trends, regression, moving averages | | "What should we do?" | Prescriptive | Scenario modeling, A/B test design | | "Is this real or noise?" | Statistical | Significance tests, confidence intervals | | "Who are our best/worst?" | Segmentation | RFM, clustering, percentile ranking |
-- Time series with period-over-period comparison
SELECT
date_trunc('week', created_at) as period,
COUNT(*) as metric,
LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)) as prev_period,
ROUND(100.0 * (COUNT(*) - LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)))
/ NULLIF(LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)), 0), 1) as growth_pct
FROM events
WHERE created_at >= current_date - interval '90 days'
GROUP BY 1
ORDER BY 1;
When something changed, split the data 5 ways to find the cause:
The split that shows the biggest divergence is your likely root cause.
-- Retention cohort matrix
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(created_at)) as cohort_month
FROM orders
GROUP BY user_id
),
activity AS (
SELECT
c.cohort_month,
DATE_TRUNC('month', o.created_at) as activity_month,
COUNT(DISTINCT o.user_id) as active_users
FROM orders o
JOIN cohorts c ON o.user_id = c.user_id
GROUP BY 1, 2
),
cohort_sizes AS (
SELECT cohort_month, COUNT(DISTINCT user_id) as cohort_size
FROM cohorts GROUP BY 1
)
SELECT
a.cohort_month,
cs.cohort_size,
EXTRACT(MONTH FROM AGE(a.activity_month, a.cohort_month)) as months_since,
a.active_users,
ROUND(100.0 * a.active_users / cs.cohort_size, 1) as retention_pct
FROM activity a
JOIN cohort_sizes cs ON a.cohort_month = cs.cohort_month
ORDER BY 1, 3;
-- Score customers by Recency, Frequency, Monetary value
WITH rfm AS (
SELECT
customer_id,
CURRENT_DATE - MAX(order_date)::date as recency_days,
COUNT(*) as frequency,
SUM(amount) as monetary
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY customer_id
),
scored AS (
SELECT *,
NTILE(5) OVER (ORDER BY recency_days DESC) as r_score, -- lower recency = better
NTILE(5) OVER (ORDER BY frequency) as f_score,
NTILE(5) OVER (ORDER BY monetary) as m_score
FROM rfm
)
SELECT *,
CASE
WHEN r_score >= 4 AND f_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
ELSE 'Needs Attention'
END as segment
FROM scored;
-- Conversion funnel with drop-off rates
WITH funnel AS (
SELECT
COUNT(DISTINCT CASE WHEN event = 'visit' THEN user_id END) as visits,
COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END) as signups,
COUNT(DISTINCT CASE WHEN event = 'activation' THEN user_id END) as activations,
COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) as purchases
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
visits, signups, activations, purchases,
ROUND(100.0 * signups / NULLIF(visits, 0), 1) as visit_to_signup_pct,
ROUND(100.0 * activations / NULLIF(signups, 0), 1) as signup_to_activation_pct,
ROUND(100.0 * purchases / NULLIF(activations, 0), 1) as activation_to_purchase_pct,
ROUND(100.0 * purchases / NULLIF(visits, 0), 1) as overall_conversion_pct
FROM funnel;
Every finding must follow this structure:
INSIGHT: [one-sentence finding]
EVIDENCE: [specific numbers with context]
SO WHAT: [why this matters to the business]
NOW WHAT: [recommended action]
CONFIDENCE: [high/medium/low + why]
Example:
INSIGHT: SMB segment revenue dropped 18% in Q4, while Enterprise grew 5%.
EVIDENCE: SMB revenue was $1.2M in Q3 vs $984K in Q4. 73% of the drop came from
churned accounts that joined via the Google Ads campaign in Q2.
SO WHAT: Our Google Ads campaign attracted low-quality SMB leads with high churn risk.
The CAC for these accounts was $340 but LTV was only $280 — we lost money.
NOW WHAT: Pause Google Ads for SMB. Shift budget to LinkedIn (SMB LTV: $890, CAC: $220).
Tighten qualification criteria for ad-sourced leads.
CONFIDENCE: High — based on 847 churned accounts with clear acquisition source data.
| Data Type | Best Chart | When to Use | Avoid | |---|---|---|---| | Trend over time | Line chart | Continuous data, 5+ periods | Pie chart, bar | | Comparison | Horizontal bar | Ranking, categories <15 | 3D charts | | Composition | Stacked bar / 100% bar | Parts of a whole over time | Pie (>5 slices) | | Distribution | Histogram / box plot | Understanding spread | Bar chart | | Correlation | Scatter plot | 2 numeric variables | Line chart | | Single KPI | Big number + sparkline | Executive dashboards | Tables | | Part of whole (static) | Pie/donut (≤5 slices) | One point in time | Pie (>5 slices) | | Geographic | Map / choropleth | Location-based data | Bar chart |
# [Analysis Title]
**Date:** [date] | **Author:** [name] | **Stakeholder:** [who asked]
## Executive Summary (3 sentences max)
[Key finding. Business impact. Recommended action.]
## Key Metrics
| Metric | Current | Previous | Change |
|--------|---------|----------|--------|
| [KPI] | [value] | [value] | [+/-%] |
## Findings
### Finding 1: [Insight headline]
[Evidence + visualization + interpretation]
### Finding 2: [Insight headline]
[Evidence + visualization + interpretation]
## Recommendations
1. **[Action]** — [Expected impact] — [Effort: low/medium/high]
2. **[Action]** — [Expected impact] — [Effort: low/medium/high]
## Methodology & Limitations
- Data source: [what, date range, granularity]
- Assumptions: [list any]
- Limitations: [what we couldn't measure, data gaps]
- Confidence: [high/medium/low]
## Appendix
[Detailed queries, full data tables, supplementary charts]
After delivering the analysis, track whether it led to action:
analysis_followup:
original_question: "Why did Q4 revenue drop?"
delivered: "2024-01-15"
recommendation: "Shift ad spend from Google to LinkedIn"
action_taken: "yes — budget reallocated Feb 1"
result: "SMB churn dropped 34% in Feb, CAC improved by $120"
lessons: "Ad channel quality matters more than volume"
Use this to self-evaluate before delivering:
| Dimension | Weight | Criteria | Score | |---|---|---|---| | Question Clarity | 15 | Is the business question specific and decision-linked? | /15 | | Data Quality | 15 | Was data profiled, cleaned, and limitations noted? | /15 | | Analytical Rigor | 25 | Right technique for the question? Statistical validity? Edge cases? | /25 | | Insight Quality | 25 | Does every finding follow Insight → Evidence → So What → Now What? | /25 | | Communication | 10 | Clear visualizations? Right format for the audience? Scannable? | /10 | | Actionability | 10 | Are recommendations specific, prioritized, and effort-rated? | /10 |
Scoring: 90+ = ship it. 70-89 = review one weak area. <70 = rework before delivering.
Before claiming a change is real:
Sample size per group: ≥30 (bare minimum), ≥385 for ±5% margin
Confidence level: 95% (p < 0.05) for business decisions
Effect size: Is the difference practically meaningful, not just statistically?
Quick z-test for proportions:
p1 = conversion_rate_A, p2 = conversion_rate_B
p_pooled = (successes_A + successes_B) / (n_A + n_B)
z = (p1 - p2) / sqrt(p_pooled * (1-p_pooled) * (1/n_A + 1/n_B))
|z| > 1.96 → significant at 95%
ab_test:
name: "New pricing page"
hypothesis: "Showing annual savings will increase annual plan signups by 15%"
primary_metric: "annual plan conversion rate"
secondary_metrics: ["revenue per visitor", "bounce rate"]
guardrail_metrics: ["total conversion rate", "support tickets"]
sample_size_per_variant: 3800 # for 15% MDE, 80% power, 95% confidence
expected_duration: "14 days at current traffic"
segments_to_check: ["new vs returning", "mobile vs desktop", "geo"]
decision_rules:
ship: "primary metric significant positive, no guardrail regression"
iterate: "directionally positive but not significant — extend 7 days"
kill: "negative or guardrail regression"
-- 7-day moving average to smooth daily noise
SELECT
date,
daily_value,
AVG(daily_value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7d,
AVG(daily_value) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28d
FROM daily_metrics;
SELECT
DATE_TRUNC('month', created_at) as month,
SUM(revenue) as revenue,
LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)) as revenue_yoy,
ROUND(100.0 * (SUM(revenue) - LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)))
/ NULLIF(LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)), 0), 1) as yoy_growth_pct
FROM orders
GROUP BY 1 ORDER BY 1;
When working with files (no database):
Morning (15 min):
□ Check key dashboards — any anomalies?
□ Review overnight data loads — anything break?
□ Scan stakeholder requests — prioritize
Analysis blocks (focused 2-hour chunks):
□ Pick one question from the backlog
□ Run the DICE framework start to finish
□ Deliver insight, not just data
End of day (10 min):
□ Update analysis log with today's findings
□ Note any data quality issues discovered
□ Queue tomorrow's priority question
This skill is tool-agnostic. It works with:
No dependencies. No scripts. Pure analytical methodology + reusable query patterns.
ANALYSIS: Website Conversion Rate Drop — January 2024
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EXECUTIVE SUMMARY
Conversion rate dropped from 3.2% to 2.1% in January. Root cause: a broken
checkout button on mobile Safari (iOS 17.2+) affecting 34% of mobile traffic.
Fix the bug → recover ~$47K/month in lost revenue.
KEY METRICS
Conversion rate: 2.1% (was 3.2%) — ↓34%
Mobile conversion: 0.8% (was 2.9%) — ↓72% ← THE STORY
Desktop conversion: 3.4% (was 3.5%) — ↓3% (normal variance)
FINDING
The 5-splits analysis immediately pointed to device type. Mobile conversion
cratered on Jan 4 — the same day iOS 17.2 rolled out widely. The checkout
button uses a CSS property unsupported in Safari 17.2+.
Affected sessions: 12,400 (Jan 4-31)
Estimated lost conversions: 12,400 × 2.1% lift = 260 orders
Estimated lost revenue: 260 × $181 avg order = $47,060
RECOMMENDATION
1. **Hotfix the CSS** — Engineering, 2-hour fix, deploy today [HIGH]
2. **Add Safari to CI/CD browser matrix** — Prevent recurrence [MEDIUM]
3. **Set up device-segment alerting** — Auto-flag >10% drops [LOW]
CONFIDENCE: High — reproduced the bug, confirmed with browser logs.
METHODOLOGY: 30-day comparison, segmented by device + browser + date.
Built by AfrexAI ⚡ — turning data into decisions.
tools
Use when the user wants to connect to, test, or use the McDonalds service at mcp.mcd.cn, including checking authentication, probing MCP endpoints, listing tools, or calling McDonalds MCP tools through a reusable local CLI.
development
Web scraping platform — Twitter/X data, Vinted marketplace, and general web scraping API
development
SlowMist AI Agent Security Review — comprehensive security framework for skills, repositories, URLs, on-chain addresses, and products (Claude Code version)
data-ai
去除中文文本中的 AI 写作痕迹,使其读起来自然。基于维基百科 AI 写作特征指南,检测 24 种 AI 模式。触发词:humanizer-cn、去除 AI 痕迹、去除 AI 写作痕迹、中文文本人性化。