
baidubce/famou-data-analysis

Name: famou-data-analysis
Author: baidubce

skills/famou-data-analysis/SKILL.md

npx skillsauth add baidubce/skills famou-data-analysis


Data Analysis Skill

Analysis Goals

The core objectives of a data analysis task are:

- Understand the data: Clarify what the data is, where it comes from, and what business meaning it carries, rather than just reading the file
- Assess data quality: Identify missing values, duplicates, anomalies, and formatting inconsistencies; evaluate data trustworthiness
- Discover patterns and insights: Use statistics and exploration to surface meaningful patterns, trends, anomalies, and correlations
- Design a processing pipeline: Based on the actual data issues, design a reproducible and well-reasoned cleaning and transformation plan
- Summarize findings: Deliver conclusions in clear, business-relevant language, not just a pile of statistics

Constraints

Always follow these constraints throughout the analysis:

- Do not modify raw data without permission: Always inform the user and explain the reason before performing any cleaning or transformation
- Do not over-assume data meaning: When column names are unclear, ask the user for clarification rather than guessing
- Do not ignore data quality issues: Anomalies, missing values, and inconsistencies must be explicitly flagged, never silently worked around
- Do not rigidly apply a fixed pipeline: Data formats vary widely; adapt the analysis approach to the actual situation
- Do not stray from the user's goal: Always keep the analysis depth and direction anchored to "what problem is the user trying to solve"

Best Practices

Understand first, then act

After receiving data, prioritize understanding the context: What is the business scenario? What is the user's analytical goal? This determines which columns matter, what counts as "anomalous", and how to handle missing values.

Explain results in business language

Don't just output statistics — explain what they mean. "The mean is 3200" is less useful than "The average order value is approximately $3,200, but the median is only $1,800, suggesting a small number of high-value orders are inflating the mean."
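The mean-versus-median comparison above can be checked in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the order values are made up for illustration, and the helper name `describe_center` and its 1.5 ratio threshold are assumptions, not part of the skill.

```python
from statistics import mean, median

# Hypothetical order values: mostly small orders plus two very large ones.
order_values = [1200, 1500, 1700, 1800, 1800, 1900, 2100, 9000, 11800]

avg = mean(order_values)
mid = median(order_values)


def describe_center(avg: float, mid: float, ratio_threshold: float = 1.5) -> str:
    """Turn mean and median into a sentence a business reader can act on.

    ratio_threshold is an arbitrary illustrative cut-off, not a standard value.
    """
    if mid > 0 and avg / mid >= ratio_threshold:
        return (
            f"The average order value is about ${avg:,.0f}, but the median is "
            f"only ${mid:,.0f}, so a few high-value orders are inflating the mean."
        )
    return (
        f"The average order value is about ${avg:,.0f}, "
        f"close to the median of ${mid:,.0f}."
    )


summary = describe_center(avg, mid)
print(summary)
```

Reporting both numbers keeps the explanation tied to evidence instead of a bare statistic.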

Triage data quality issues by severity

- Blocking issues (e.g., critical columns entirely empty, widespread primary key duplication): Stop and inform the user; wait for confirmation before proceeding
- Decision-required issues (e.g., high missing rates, ambiguous outliers): Present the tradeoffs of each handling option and let the user choose
- Minor issues (e.g., scattered format inconsistencies, leading/trailing whitespace): Handle directly, but document them in the report
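The three severity levels above could be assigned programmatically along these lines. This is only a sketch: the function name `triage_columns`, the 30% cut-off for a "high" missing rate, and the choice to treat a low missing rate as minor are all assumptions for illustration, and it covers missing values and stray whitespace only, not duplicates or outliers.

```python
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"
    DECISION_REQUIRED = "decision-required"
    MINOR = "minor"
    NONE = "none"


def is_missing(value) -> bool:
    """Treat None and blank or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def has_stray_whitespace(value) -> bool:
    """True for non-blank strings with leading or trailing whitespace."""
    return isinstance(value, str) and value.strip() != "" and value != value.strip()


def triage_columns(rows, critical_columns, high_missing_rate=0.30):
    """Map each column to (severity, missing_rate).

    rows is a list of dicts, one per record. The threshold is an assumed
    default; pick one that fits the dataset and confirm it with the user.
    """
    if not rows:
        return {}
    report = {}
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        rate = sum(is_missing(v) for v in values) / len(values)
        if rate == 1.0 and column in critical_columns:
            severity = Severity.BLOCKING            # stop and ask the user
        elif rate >= high_missing_rate:
            severity = Severity.DECISION_REQUIRED   # present options
        elif rate > 0 or any(has_stray_whitespace(v) for v in values):
            severity = Severity.MINOR               # handle, but document
        else:
            severity = Severity.NONE
        report[column] = (severity, rate)
    return report


# Hypothetical records for demonstration.
rows = [
    {"order_id": "A1", "amount": "", "city": " Boston", "notes": ""},
    {"order_id": "A2", "amount": "", "city": "Austin", "notes": "gift"},
    {"order_id": "A3", "amount": "", "city": "Denver", "notes": ""},
    {"order_id": "A4", "amount": "", "city": "Miami", "notes": ""},
]
report = triage_columns(rows, critical_columns={"order_id", "amount"})
for column, (severity, rate) in report.items():
    print(f"{column}: {severity.value} (missing {rate:.0%})")
```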

Make the processing pipeline explainable

Every operation should be answerable with "why did we do this?", for example:

- "Filled with the median because this column is right-skewed and the mean is heavily influenced by outliers"
- "Dropped this column because the missing rate is 78%, making it unusable"

Back every conclusion with data

Each insight should be accompanied by specific numbers as evidence; avoid subjective judgments unsupported by data.
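One way to keep every step explainable and backed by numbers is to record the reason at the moment the operation runs. The sketch below is illustrative only: `PipelineLog` and `fill_missing_with_median` are hypothetical names, the sample values are made up, and the reason text is derived from the data rather than hard-coded.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class PipelineLog:
    """Collects one entry per operation: what was done and why."""
    steps: list = field(default_factory=list)

    def record(self, operation: str, reason: str) -> None:
        self.steps.append({"operation": operation, "reason": reason})


def fill_missing_with_median(values, column, log):
    """Return a copy of values with None replaced by the median, and log why."""
    present = [v for v in values if v is not None]
    fill, avg = median(present), mean(present)
    missing = len(values) - len(present)
    if avg > fill:
        reason = (
            f"mean ({avg:.1f}) is above the median ({fill}), which points to "
            f"right skew; the median is less affected by outliers"
        )
    else:
        reason = f"median ({fill}) used as an outlier-resistant default; mean is {avg:.1f}"
    log.record(f"Filled {missing} missing values in '{column}' with the median", reason)
    return [fill if v is None else v for v in values]


log = PipelineLog()
amounts = [10, 12, None, 11, 13, 250, None]  # hypothetical column
filled = fill_missing_with_median(amounts, "Amount", log)
for step in log.steps:
    print(f"{step['operation']}: {step['reason']}")
```

The original list is left untouched, which fits the constraint of not modifying raw data without permission.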

Environment & Tools

- File paths: Uploaded files are at /mnt/user-data/uploads/; save output files to /mnt/user-data/outputs/
- Encoding issues with non-ASCII data: Try utf-8 first; fall back to cp1252, then latin-1 if needed (latin-1 accepts any byte sequence, so try it last)
- Chart font rendering: Set matplotlib.rcParams['font.sans-serif'] = ['DejaVu Sans'] for reliable rendering
- CSV output for Excel: Use encoding='utf-8-sig' so Excel detects UTF-8 and displays non-ASCII characters correctly
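The encoding fallback and the Excel-friendly output can be sketched with the standard library alone. The function names `read_text_with_fallback` and `write_csv_for_excel` are hypothetical, and the demo writes to a temporary directory instead of the /mnt/user-data/ paths so it can run anywhere.

```python
import csv
import tempfile
from pathlib import Path

# Order matters: latin-1 maps every possible byte, so it never raises
# and would hide any encoding listed after it.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_text_with_fallback(path, encodings=FALLBACK_ENCODINGS):
    """Return (text, encoding_used), trying each encoding in order."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with any of {encodings}")


def write_csv_for_excel(path, header, rows):
    """Write a CSV that starts with a UTF-8 byte order mark for Excel."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a legacy export saved as cp1252 rather than UTF-8.
    source = Path(tmp) / "legacy.csv"
    source.write_bytes("city\nMálaga\n".encode("cp1252"))
    decoded_text, used_encoding = read_text_with_fallback(source)

    output = Path(tmp) / "cleaned.csv"
    write_csv_for_excel(output, ["city"], [["Málaga"]])
    first_bytes = output.read_bytes()[:3]

print(used_encoding, first_bytes)
```

A successful fallback decode does not prove the guess was right, so it is worth reporting which encoding was used and spot-checking a few non-ASCII values with the user.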

Analysis Report Format

Regardless of the analysis path taken, the final summary report should include:

## Data Analysis Report

### Data Overview
(Dataset size, source, time range, description of key fields)

### Data Quality
(Issues found, severity level, actions taken or items pending confirmation)

### Key Findings
(3–5 most important insights, each supported by data)

### Processing Pipeline
(What transformations were applied to the data and why)

### Recommendations
(Suggestions for data improvement, directions for deeper analysis)

Examples

The following examples show how to apply the goals and constraints above when facing different types of data.

Example A: E-commerce Order Data Analysis

Scenario: The user uploads an order CSV with columns for Order ID, Order Time, Product Category, Amount, User City, and Payment Status, and wants to understand "overall sales performance."

Analysis approach:

- Confirm the time range first; this is the baseline context for any sales analysis
- Check the Amount column for negative values (refunds? data entry errors?) and zero values (require business explanation)
- The distribution of Payment Status determines the definition of "valid orders"
- Aggregate by time to spot trends; break down by category and city for composition analysis

Quality issue handling example:

Found 12 rows with negative values in the "Amount" column. In sales data, negatives typically represent refund records. Recommendation: filter them out when analyzing gross revenue; keep them when analyzing net revenue. Please confirm your analysis goal before we proceed.
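A check like the one behind this message might look as follows. It is a sketch under assumptions: the CSV excerpt is invented, the helper `flag_amounts` is a hypothetical name, and it only locates suspicious rows so they can be raised with the user; it does not filter or change anything.

```python
import csv
import io

# Hypothetical excerpt of an order file; real data would be read from disk.
sample_csv = """Order ID,Amount,Payment Status
1001,120.50,Paid
1002,-35.00,Refunded
1003,0,Paid
1004,89.99,Failed
1005,-12.00,Refunded
"""


def flag_amounts(csv_text, column="Amount"):
    """Return file line numbers of negative, zero, and unparsable amounts."""
    flags = {"negative": [], "zero": [], "unparsable": []}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            amount = float(row[column])
        except (TypeError, ValueError):
            flags["unparsable"].append(line_no)
            continue
        if amount < 0:
            flags["negative"].append(line_no)
        elif amount == 0:
            flags["zero"].append(line_no)
    return flags


flags = flag_amounts(sample_csv)
print(
    f"Found {len(flags['negative'])} rows with negative values and "
    f"{len(flags['zero'])} rows with zero values in the 'Amount' column."
)
```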

Key findings example:

- March revenue grew 34% month-over-month, primarily driven by the "Electronics" category (contributing 61% of the incremental growth)
- New York and Los Angeles together account for 47% of all orders, but their average order value is 15% below the average for mid-tier cities
- Payment failure rate is 8.3%, above the typical industry benchmark of 2–5%; recommend investigating the payment flow
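The arithmetic behind a finding such as "grew 34%, with one category contributing 61% of the increase" is short enough to show. In this sketch the revenue figures are invented so that they reproduce those two percentages, and `month_over_month` is a hypothetical helper name.

```python
# Hypothetical monthly revenue by category (made-up numbers for illustration).
revenue = {
    "February": {"Electronics": 40_000, "Apparel": 35_000, "Home": 25_000},
    "March": {"Electronics": 60_740, "Apparel": 43_000, "Home": 30_260},
}


def month_over_month(previous, current):
    """Return (growth_rate, share_of_increment_by_category).

    growth_rate  = (current total - previous total) / previous total
    contribution = category change / total change
    """
    prev_total = sum(previous.values())
    curr_total = sum(current.values())
    increment = curr_total - prev_total
    growth = increment / prev_total
    if increment == 0:
        return growth, {}  # no change, so contributions are undefined
    contributions = {
        category: (current[category] - previous.get(category, 0)) / increment
        for category in current
    }
    return growth, contributions


growth, contributions = month_over_month(revenue["February"], revenue["March"])
top = max(contributions, key=contributions.get)
print(
    f"Revenue grew {growth:.0%} month-over-month; {top} contributed "
    f"{contributions[top]:.0%} of the incremental growth."
)
```

Contributions are shares of the total change, so they sum to 1 when every category is included.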

Example B: Messy Survey Data

Scenario: The user uploads an Excel file with multiple sheets, inconsistent column names (some say "Q1", others say "Question 1"), merged cells, and scattered blank rows and columns.

First step: Don't rush into analysis — first surface the structural issues to the user:

This file contains 3 sheets, and there are several structural issues to clarify before we proceed:

1. Sheet1 and Sheet2 have different column names. Are these different batches of the same survey, or entirely different surveys?
2. Rows 5–8 are blank. Is it safe to remove them?
3. The "Q3_Other" column is 92% empty. Is this a low open-response rate, or a data export issue?

Once confirmed, I'll design a cleaning plan.

Constraint in action: Column meanings are not assumed, sheets are not merged arbitrarily — issues are surfaced first and the user is asked to confirm.
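Structural issues of this kind can be detected before asking the user about them. The sketch below assumes the workbook has already been loaded into plain Python lists (one list of rows per sheet, header first); loading the Excel file itself needs a third-party library and is out of scope here. All function names and the sample data are hypothetical.

```python
def blank_row_numbers(rows):
    """Return 1-based positions of rows whose cells are all empty."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if all(cell is None or str(cell).strip() == "" for cell in row)
    ]


def header_differences(sheets):
    """Map sheet name to header names that are not shared by every sheet."""
    headers = {
        name: {str(c).strip() for c in rows[0] if c is not None and str(c).strip()}
        for name, rows in sheets.items()
        if rows
    }
    if not headers:
        return {}
    common = set.intersection(*headers.values())
    return {
        name: sorted(columns - common)
        for name, columns in headers.items()
        if columns - common
    }


def structural_questions(sheets):
    """Build questions for the user instead of silently fixing the file."""
    questions = []
    for name, extra in header_differences(sheets).items():
        questions.append(
            f"{name} has columns not shared by every sheet: {', '.join(extra)}. "
            f"Is this the same survey or a different one?"
        )
    for name, rows in sheets.items():
        blanks = blank_row_numbers(rows)
        if blanks:
            questions.append(f"{name} has blank rows at {blanks}. Is it safe to remove them?")
    return questions


# Hypothetical two-sheet survey export.
sheets = {
    "Sheet1": [["Q1", "Q2", "Q3_Other"], ["5", "4", ""], ["", None, ""], ["3", "2", ""]],
    "Sheet2": [["Question 1", "Q2", "Q3_Other"], ["4", "4", "too slow"]],
}
questions = structural_questions(sheets)
for question in questions:
    print("-", question)
```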

Example C: User Requests "Clean the Data and Output a Processed File"

Processing log example (provided alongside the output file):

Data Processing Log
Source file:  sales_2024.csv  (8,412 rows × 15 columns)
Output file:  sales_2024_cleaned.csv  (8,323 rows × 13 columns)

Changes:
1. Removed duplicate rows: deleted 89 fully duplicate records
2. Dropped "Notes2" column: 96% missing, no useful information
3. Dropped "Internal Code" column: user confirmed it is not needed for analysis
4. Standardized Amount column: converted "¥1,200.00" format to numeric 1200.0 (203 rows affected)
5. Standardized Date column: unified to YYYY-MM-DD format (source had mixed MM/DD/YYYY and written-out date formats)
6. Missing value handling:
   - "City" column: 34 missing values → filled with "Unknown" (user confirmed)
   - "Amount" column: 86 missing values → left blank (user confirmed these are anomalous records that should not be imputed)
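Steps 4 and 5 of the log could be implemented roughly as follows. This is a sketch, not the skill's actual code: `parse_amount` and `normalize_date` are hypothetical names, and the list of accepted date formats is an assumption based on the formats the log mentions. Slash dates are read as MM/DD/YYYY, which should be confirmed with the user because DD/MM/YYYY data would be misread.

```python
import re
from datetime import datetime

# Assumed source formats: ISO, US-style slashes, and written-out English dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def parse_amount(text):
    """Convert a string like '¥1,200.00' to 1200.0; blank input stays None."""
    if text is None or text.strip() == "":
        return None  # leave missing values for the user to decide on
    cleaned = re.sub(r"[^\d.\-]", "", text)  # drop currency symbols and commas
    return float(cleaned)  # raises ValueError if nothing numeric remains


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if no format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


print(parse_amount("¥1,200.00"))
print(normalize_date("03/15/2024"))
print(normalize_date("March 15, 2024"))
```

Raising on unrecognized input, rather than guessing, keeps anomalies visible so they can be counted and reported in the processing log.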


A data analysis skill for understanding datasets, analyzing data, building data processing pipelines, and summarizing analytical results. Use this skill when the user mentions "analyze data", "data processing", "data exploration", "statistical analysis", "data cleaning", "data summarization", "create a data report", "understand this dataset", or "take a look at this CSV/Excel/dataset". Even if the user simply says "help me look at this data" or "analyze this", trigger this skill whenever the context involves a data file or dataset. Also invoke this skill if data analysis is required during Famou problem definition.

20 stars

development

Updated May 13, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add baidubce/skills famou-data-analysis

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: Container and dependency vulnerability scanner. Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): Open source security scanning. Skipped (50%)
- Socket.dev: Supply chain security analysis. Skipped (50%)
- VirusTotal: Multi-engine malware detection. Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: May 13, 2026, 2:01 AM (70.1s, 1 file scanned)

SKILL.md

name: famou-data-analysis
author: famou-group
version: 2.0


Related Skills

baidubce/baidu-cloud-bos

devops

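Verified, Trusted, Community

Baidu AI Cloud Object Storage (BOS) integration skill. Use this skill when the user needs to upload, download, delete, or copy BOS files; list files or buckets; get signed URLs; process images (brightness, contrast, blur, rotation, cropping, watermarks, etc.); or recursively sync a local directory with BOS.

21 · SKILL.md · Updated Jun 9, 2026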

baidubce/famou-result-visualization

development

Verified, Trusted, Community

Generate interactive visualization pages for feasible solutions produced by Famou evolutionary algorithms. Use this skill when the user mentions "Famou visualization", "visualize this solution", "show feasible solution results", "evolution results", "evolve visualization", or provides a Python-code solution (path planning, scheduling, knapsack, TSP, job scheduling, machine learning, etc.) that needs to be displayed visually. Even if the user just says "help me visualize this solution", "draw it out", or "show me the results", trigger this skill immediately whenever the context involves evolutionary algorithms or optimization problem solutions.

20 · SKILL.md · Updated Apr 25, 2026

baidubce/famou-experiment-manager

testing

Verified, Trusted, Community

Workflow skill for managing famou evolutionary experiment tasks, including public normal mode and public pro hybrid mode. Use this skill when the user mentions "submit experiment", "check experiment status", "delete experiment", "get experiment results", "account info", "quota", "credits", "famou experiment", "upload experiment", "config.yaml experiment", "hybrid mode", or needs to use famou-ctl to manage experiment tasks. Even if the user just says "submit" or "run experiment", trigger this skill whenever the context involves the famou platform.

20 · SKILL.md · Updated Apr 25, 2026

baidubce/famou-artifact-generator

testing

Verified, Trusted, Community

Interactive end-to-end Famou workflow for defining, implementing, and solving optimization tasks. The workflow typically proceeds in three stages: (1) understand the data and define the task, producing `problem.md`; (2) implement and validate `evaluator.py`, `init.py`, and `prompt.md` from the task definition; (3) run deep solving through Famou. Trigger this skill whenever the user wants to define, clarify, create, or fix a Famou task; prepare Famou experiment artifacts; write or update `problem.md`, `evaluator.py`, `init.py`, or `prompt.md`; run Famou; do deep solving; or solve an optimization, ML, or search problem with evolutionary methods. Even if the user simply says "help me make a Famou task", "help me solve this", or "run Famou", trigger this skill whenever the surrounding context indicates an optimization or search task. Also trigger when the user describes a combinatorial optimization, scheduling, routing, or ML problem without mentioning Famou — treat it as a potential Famou task.

20 · SKILL.md · Updated Apr 25, 2026


Install manually


# Clone the repo
git clone https://github.com/baidubce/skills.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r skills/skills/famou-data-analysis ~/.claude/skills/


Repository: baidubce/skills (20 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT