End-to-end data analysis AI agent with Streamlit UI
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research streamline-analyst-guide
Streamline Analyst is an end-to-end data analysis AI agent with a Streamlit web interface. Upload a dataset and describe your analysis goal in natural language; the agent then handles data cleaning, EDA, feature engineering, model training, evaluation, and report generation. The interactive UI lets you review each step and adjust its parameters before moving on.
git clone https://github.com/Wilson-ZheLin/Streamline-Analyst.git
cd Streamline-Analyst
pip install -r requirements.txt
streamlit run app.py
Upload Dataset (CSV, Excel, Parquet)
↓
Data Profiling
├── Column types and distributions
├── Missing value analysis
├── Correlation matrix
└── Outlier detection
↓
Data Cleaning (interactive)
├── Handle missing values
├── Remove/fix outliers
├── Type conversions
└── Feature encoding
↓
EDA (automated + custom)
├── Univariate analysis
├── Bivariate relationships
├── Statistical tests
└── Custom visualizations
↓
Modeling (if applicable)
├── Train/test split
├── Model selection + training
├── Hyperparameter tuning
└── Evaluation metrics
↓
Report Generation
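The profiling stage of the workflow above can be illustrated with a minimal sketch. This is not Streamline Analyst's own code: the function names (`infer_type`, `iqr_outliers`, `profile`) and the toy table are hypothetical, and the sketch uses only the Python standard library so it runs without extra dependencies. It covers three of the profiling items: column type detection, missing value counts, and IQR-based outlier detection.

```python
from statistics import quantiles


def infer_type(values):
    """Classify a column as 'numeric' or 'categorical' from its non-missing values."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two non-missing values, as required by statistics.quantiles.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def profile(table):
    """Profile a dict mapping column name -> list of values (None marks missing)."""
    report = {}
    for name, values in table.items():
        kind = infer_type(values)
        report[name] = {
            "type": kind,
            "missing": sum(v is None for v in values),
            "outliers": iqr_outliers(values) if kind == "numeric" else [],
        }
    return report


# Hypothetical toy data, for illustration only.
table = {
    "age": [23, 25, 27, 29, 31, None, 30, 120],
    "city": ["Oslo", "Lima", None, "Oslo", "Pune", "Lima", "Oslo", "Pune"],
}
report = profile(table)
for column, summary in report.items():
    print(column, summary)
```

In this toy table the value 120 in `age` is flagged because it lies far above the upper IQR fence, while `city` is treated as categorical and skipped for outlier detection.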
# Streamline Analyst provides:
# 1. Smart data profiling
# - Auto-detect column types (numeric, categorical, datetime)
# - Distribution analysis per column
# - Missing value patterns (MCAR, MAR, MNAR hints)
# - Correlation analysis with significance
# 2. Interactive cleaning
# - Imputation strategies (mean, median, mode, KNN, model)
# - Outlier handling (IQR, Z-score, isolation forest)
# - Encoding (one-hot, label, target, ordinal)
# - Scaling (standard, minmax, robust)
# 3. Automated EDA
# - Distribution plots (histogram, KDE, box, violin)
# - Relationship plots (scatter, pair, heatmap)
# - Time series decomposition
# - Statistical tests (t-test, ANOVA, chi-square, Mann-Whitney)
# 4. Model pipeline
# - Classification: logistic regression, RF, GBM, SVM, MLP
# - Regression: linear regression, RF, GBM, SVR, ElasticNet
# - Cross-validation with confidence intervals
# - Feature importance visualization
# - SHAP explanations
# 5. Report
# - HTML report with all plots and findings
# - Downloadable cleaned dataset
# - Model artifacts (pickle)
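To make the cleaning options listed above more concrete, here is a small sketch of three of them: median imputation, standard scaling, and one-hot encoding. It is an illustration under assumptions, not the tool's implementation. The helper names and the sample values are hypothetical, and the code deliberately sticks to the standard library.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def standard_scale(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical sample columns, for illustration only.
income = [40.0, None, 60.0, 50.0]
plan = ["a", "b", "a", "c"]

filled = impute_median(income)   # the gap is filled with the median of 40, 60, 50
scaled = standard_scale(filled)  # now centered on 0 with unit spread
encoded = one_hot(plan)          # {"a": [...], "b": [...], "c": [...]}

print(filled)
print(encoded)
```

Imputing before scaling matters here: scaling first would require deciding how missing entries contribute to the mean and standard deviation.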
### Example Prompts
- "Show me the distribution of all numeric columns"
- "Is there a significant difference in income between genders?"
- "Build a classifier to predict churn using all features"
- "What are the top 5 most important features for prediction?"
- "Clean the data: fill missing values and remove outliers"
- "Generate a summary report of this dataset"
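A prompt such as "Is there a significant difference in income between genders?" boils down to a two-sample comparison. The sketch below shows one way such a question could be answered by hand. It uses a permutation test, which is a swapped-in technique and not one of the tests listed in this guide (t-test, ANOVA, chi-square, Mann-Whitney). It was chosen only because it needs nothing beyond the standard library. The function name, the parameters, and the income figures are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical income samples (in thousands), for illustration only.
income_group_a = [52, 55, 58, 60, 61]
income_group_b = [38, 40, 41, 43, 45]

p_value = permutation_test(income_group_a, income_group_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed gap is unlikely under random group labels. It says nothing about why the gap exists, so it should be read alongside the profiling and EDA output.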