skills/spreadsheets/SKILL.md
Use when CSV, TSV, or Excel (.xlsx) is the primary input/output: inspect, clean, transform, dedupe, merge, validate, convert, recalc formulas, or create/fix spreadsheets. Do not trigger when tabular data is incidental.
npx skillsauth add paulrberg/dot-agents spreadsheetsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Opinionated tabular-data handling for macOS. TSV/CSV is the primary format; .xlsx is the exception. Stack: duckdb and qsv for wrangling, uv-run Python (stdlib csv + Decimal) for transforms, openpyxl for Excel mechanics, headless LibreOffice for formula recalculation. Run all Python through uv — never bare python or pip.
| Job | Use |
| ------------------------------------------------------------ | --------------------------------------------------- |
| First look at an unknown CSV/TSV | uv run scripts/peek.py <file> |
| Counts, stats, frequencies, dedupe, column select | qsv (infers the tab delimiter from .tsv) |
| Joins, group-bys, pivots, cross-file SQL, format conversion | duckdb -c "..." |
| Row-level transforms, precision-critical edits | uv run Python with a PEP 723 header |
| Anything .xlsx in or out | read references/xlsx.md first |
| Recalculating .xlsx formulas | uv run scripts/recalc.py <file.xlsx> |
| Row/column-aware diff of two tables | bunx daff old.tsv new.tsv |
| Interactive viewing (suggest to the user; never launch TUIs) | csvlens, vd, Numbers.app |
decimal.Decimal or DuckDB DECIMAL(38, 18). Read with all_varchar = true in DuckDB and plain stdlib csv in Python. If pandas is unavoidable, pass dtype=str.snake_case headers; ISO 8601 dates (YYYY-MM-DD; prb-finance timestamps use YYYY-MM-DD@HH:MM:SS); . decimal point; no thousands separators or currency symbols inside cells; - for null. Conventions already present in an existing file override every one of these.encoding="utf-8-sig".peek.py (column count unchanged, no new ragged rows) or the owning repo's checks. In prb-finance: just tsv-check, then just cli::write-changed to regenerate derived reports — never hand-edit generated .pool.tsv/.annual.tsv/.md artifacts.mv over the original.=, +, or @ with ' (a bare - null is exempt).uv run scripts/peek.py <file> [--rows N] prints a JSON report: encoding and BOM, newline style and trailing newline, delimiter and how it was detected, header with duplicates flagged, column/row counts, ragged and empty rows, - null usage, and sample rows. Run it before editing any delimited file; on a binary spreadsheet it exits with a pointer to the xlsx workflow.
Quick follow-ups with qsv:
qsv count txs.tsv # row count (excludes header)
qsv headers txs.tsv # numbered column names
qsv stats -E txs.tsv | qsv table # per-column types, ranges, cardinality
qsv frequency -s event txs.tsv # value distribution of one column
qsv select date_utc,amount txs.tsv
qsv dedup txs.tsv
qsv infers the input delimiter from the file extension, but stdout is always comma-separated. When the result must stay TSV, write it with -o out.tsv (the output extension sets the delimiter) — never shell redirection.
Canonical read — everything as strings, - mapped to NULL:
FROM read_csv('txs.tsv', delim = '\t', header = true, all_varchar = true, nullstr = '-');
-- Profile every column
SUMMARIZE SELECT * FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-');
-- Aggregate with exact decimals
SELECT event, SUM(amount::DECIMAL(38, 18)) AS total
FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-')
GROUP BY event
ORDER BY total DESC;
-- Write a TSV back out
COPY (SELECT ...) TO 'out.tsv' (FORMAT csv, DELIMITER '\t', HEADER true, NULLSTR '-');
DuckDB also reads and writes .xlsx (read_xlsx, COPY ... (FORMAT xlsx)) — see references/xlsx.md.
Stdlib csv keeps every cell a string — precision-safe by default. Script template:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = []
# ///
import csv
from decimal import Decimal
with open("in.tsv", encoding="utf-8-sig", newline="") as f:
rows = list(csv.DictReader(f, delimiter="\t"))
# transform here; use Decimal(row["amount"]) for arithmetic
with open("out.tsv", "w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
newline="" to every open() the csv module touches, and lineterminator="\n" for LF output.uv run --with <pkg> python -c "...".Read references/xlsx.md whenever a .xlsx/.xlsm is input or deliverable: openpyxl create/edit, DuckDB xlsx I/O, styling, conversion recipes, and the recalculation loop. The two absolutes:
=SUM(B2:B9)), not values precomputed in Python.uv run scripts/recalc.py <file.xlsx> and deliver only when it reports zero #REF!/#DIV/0!/#VALUE!/#NAME? errors.Recalculation needs LibreOffice: brew install --cask libreoffice. The script finds the app bundle on its own; soffice does not need to be on PATH.
testing
Use ONLY to check or update the project-scoped agent skills installed under .agents/skills so they match the current state of the repo. Do not trigger for creating, finding, or installing skills, or for README/AGENTS.md updates.
testing
Use only when explicitly asked to archive/prune/compact/roll over checked tasks from TODO.md into `.ai/todos/TODO_UNTIL_YYYY_MM_DD.md`, leaving unchecked tasks.
development
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
content-media
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.