plugins/python-engineering/skills/python3-data/SKILL.md
Specialist skill for Python data engineering — pandas, polars, DuckDB, numpy, ETL pipelines, tabular data ingestion, and notebook-to-module extraction. Use when working with dataframes, data validation at ingress boundaries, merge/join operations, typed column contracts, or choosing between pandas vs polars vs DuckDB for a data task.
npx skillsauth add jamie-bitflight/claude_skills python3-data
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Load python3-core for standing defaults. Load python3-typing for boundary schemas. Load python3-testing for parser and edge-case tests.
- Pass an explicit dtype= to pd.read_csv() / pd.read_excel(); never rely on inference.
- Never let a pd.DataFrame cross a module boundary without a documented column contract.
- Set model_config = {"strict": True} on all Pydantic boundary models.
- Never use inplace=True: it returns None, is discouraged in modern pandas, and causes silent bugs.

| Trap | What to do instead |
|---|---|
| df["a"]["b"] = x (chained indexing) | df.loc["b", "a"] = x; chained assignment may write to a temporary copy and silently leave df unchanged |
| .apply(lambda) on large frames | Vectorized ops first; .apply() only when no vectorized path exists |
| pd.merge() without post-check | Assert no unexpected nulls or duplicate keys after merge |
| df.drop(..., inplace=True) | df = df.drop(...); inplace=True returns None and is discouraged in modern pandas |
| Bare pd.read_csv(path) | Always pass dtype= to prevent silent type inference errors |
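
The traps in the table above can be sketched in one short pandas example. This is a minimal illustration, not a prescribed implementation: the CSV contents, column names, and the choice of `validate="many_to_one"` are assumptions made up for the demonstration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for real CSV files.
ORDERS_CSV = "order_id,customer_id,amount\n1,007,19.99\n2,042,5.00\n3,007,12.50\n"
CUSTOMERS_CSV = "customer_id,name\n007,Ada\n042,Grace\n"

# Explicit dtypes keep customer_id as text; inference would parse "007" as the integer 7.
orders = pd.read_csv(
    io.StringIO(ORDERS_CSV),
    dtype={"order_id": "int64", "customer_id": "string", "amount": "float64"},
)
customers = pd.read_csv(
    io.StringIO(CUSTOMERS_CSV),
    dtype={"customer_id": "string", "name": "string"},
)

# validate= raises pandas.errors.MergeError if customer_id is duplicated on the right side.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Post-merge checks: every order found a customer and the row count is unchanged.
unmatched = merged[merged["_merge"] != "both"]
assert unmatched.empty, f"{len(unmatched)} orders have no matching customer"
assert len(merged) == len(orders)
assert merged["name"].notna().all()

# Non-mutating alternative to drop(..., inplace=True).
merged = merged.drop(columns="_merge")

# Single .loc assignment instead of chained indexing.
merged.loc[merged["amount"] < 10, "amount"] = 10.0
```

Passing `validate=` moves the duplicate-key check into the merge itself, so the explicit assertions only need to cover unmatched rows and unexpected nulls.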
| Task | Use | Not |
|---|---|---|
| Tabular < 1M rows | pandas | Polars (overhead not justified) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | tomllib (stdlib, binary mode "rb") | tomlkit |
| Read/write TOML preserving comments | tomlkit (text mode) | tomllib |
etl/
├── ingest.py # raw data loading (boundary)
├── validate.py # schema validation (boundary)
├── transform.py # business logic (typed core)
├── load.py # output writing (boundary)
└── types.py # shared typed models
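
The layout above can be sketched as four small functions, shown here in a single file for brevity. This is an illustrative sketch only: the `Reading` model, the sensor data, and the function names are hypothetical, and a frozen dataclass stands in for a strict Pydantic model to keep the example dependency-free.

```python
import csv
import io
from dataclasses import dataclass


# types.py: shared typed model (a stdlib stand-in for a strict Pydantic model).
@dataclass(frozen=True)
class Reading:
    sensor_id: str
    celsius: float


# ingest.py: raw loading at the boundary; everything is still untyped text.
def ingest(source: io.TextIOBase) -> list[dict[str, str]]:
    return list(csv.DictReader(source))


# validate.py: convert raw rows to typed models or fail loudly.
def validate(rows: list[dict[str, str]]) -> list[Reading]:
    readings = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        try:
            readings.append(Reading(row["sensor_id"], float(row["celsius"])))
        except (KeyError, ValueError) as exc:
            raise ValueError(f"invalid row at line {line_no}: {row!r}") from exc
    return readings


# transform.py: business logic that only ever sees typed values.
def transform(readings: list[Reading]) -> dict[str, float]:
    grouped: dict[str, list[float]] = {}
    for r in readings:
        grouped.setdefault(r.sensor_id, []).append(r.celsius)
    return {sensor: sum(vals) / len(vals) for sensor, vals in grouped.items()}


# load.py: output writing at the boundary.
def load(means: dict[str, float], sink: io.TextIOBase) -> None:
    writer = csv.writer(sink, lineterminator="\n")
    writer.writerow(["sensor_id", "mean_celsius"])
    for sensor, mean in sorted(means.items()):
        writer.writerow([sensor, f"{mean:.2f}"])


raw = io.StringIO("sensor_id,celsius\na,20.0\nb,18.5\na,22.0\n")
out = io.StringIO()
load(transform(validate(ingest(raw))), out)
```

Keeping parsing and type conversion in the boundary functions means `transform` never has to handle strings or missing keys.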