src/datapro/data/skills/data-manipulation/SKILL.md
High-performance data manipulation and transformation using Pandas, NumPy, and DuckDB. Use when Claude needs to: (1) Clean or transform structured data (CSV, Parquet, JSON), (2) Perform large-scale aggregations or analytics, (3) Optimize analysis for performance and memory, (4) Implement the 'Tidy Data' (Wide-to-Long) strategy for reporting.
Install this skill globally with one command: `npx skillsauth add pablodiegoo/data-pro-skill data-manipulation`. Works with Claude Code, Cursor, and Windsurf.
High-performance data manipulation and transformation suite using Pandas, NumPy, and DuckDB. This skill handles the "T-Layer" (Transformation) of the data pipeline, preparing raw data for statistical analysis or reporting.
Standardizes raw, cryptic variables into semantic labels using external mapping dictionaries.
`scripts/dict_mapper.py`

Provides extremely fast ingestion and fuzzy cleaning for messy local files or files > 1GB.
`scripts/duckdb_fuzzy_cleaner.py`, `scripts/quant_analyzer_duckdb.py`

Calculates and applies expansion weights for representative survey analysis.
`scripts/weighting.py`

Utilities for managing project directory structures and data discovery.
`scripts/data_directory_finder.py`

| Task | Recommended Tool | Pattern Reference |
| :--- | :--- | :--- |
| Join > 5M rows | DuckDB | Analysis Pattern |
| Wide-to-Long | Pandas melt | Tidy Pattern |
| Clean Outliers | NumPy/SciPy | Stats Pattern |
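
The Wide-to-Long row in the table can be illustrated with a minimal sketch. The column names and values below are hypothetical, invented for illustration; they do not come from the skill's scripts or its Tidy Pattern reference.

```python
import pandas as pd

# Hypothetical wide survey table: one row per respondent,
# one column per question (names are illustrative only).
wide = pd.DataFrame(
    {
        "respondent_id": [1, 2],
        "q1_satisfaction": [4, 5],
        "q2_recommend": [3, 5],
    }
)

# melt keeps the id column and stacks every other column into
# (question, score) pairs: one row per respondent-question observation.
tidy = wide.melt(
    id_vars="respondent_id",
    var_name="question",
    value_name="score",
)

print(tidy)
```

The long form has one observation per row, which tends to make grouped aggregations and report-style pivots simpler than operating on one column per question.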
[!IMPORTANT] This skill focuses on preparation. For statistical inference, multivariate modeling, or causal analysis, defer to the @data-analysis-suite.
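
As one example of a preparation step that stays within this skill's scope, the expansion-weight capability listed earlier could look roughly like the sketch below. It assumes simple post-stratification (weight = population size / sample size within each stratum); the actual logic of `scripts/weighting.py` is not shown in this document and may differ. The `region` strata, population totals, and scores are hypothetical.

```python
import pandas as pd

# Hypothetical sample: each respondent belongs to one stratum (region).
sample = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "score": [4, 2, 5, 3, 4],
    }
)

# Hypothetical known population totals per stratum.
population = {"north": 1000, "south": 600}

# Number of sampled respondents in each row's stratum.
sample_counts = sample["region"].map(sample["region"].value_counts())

# Assumed scheme: expansion weight = population size / sample size,
# so each respondent "stands for" that many people in the population.
sample["weight"] = sample["region"].map(population) / sample_counts

# Weighted mean of the score, representative of the population mix.
weighted_mean = (sample["score"] * sample["weight"]).sum() / sample["weight"].sum()

print(sample)
print(weighted_mean)
```

Under this scheme the weights within a stratum sum to that stratum's population total, which is a quick sanity check before handing the weighted data to downstream analysis.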