skills/spreadsheets/SKILL.md
Use when CSV, TSV, or Excel (.xlsx) is the primary input/output: inspect, clean, transform, dedupe, merge, validate, convert, recalc formulas, or create/fix spreadsheets. Do not trigger when tabular data is incidental.
npx skillsauth add paulrberg/agent-skills spreadsheetsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Opinionated tabular-data handling for macOS. TSV/CSV is the primary format; .xlsx is the exception. Stack: duckdb and qsv for wrangling, uv-run Python (stdlib csv + Decimal) for transforms, openpyxl for Excel mechanics, headless LibreOffice for formula recalculation. Run all Python through uv — never bare python or pip.
| Job | Use |
| ------------------------------------------------------------ | --------------------------------------------------- |
| First look at an unknown CSV/TSV | uv run scripts/peek.py <file> |
| Counts, stats, frequencies, dedupe, column select | qsv (infers the tab delimiter from .tsv) |
| Joins, group-bys, pivots, cross-file SQL, format conversion | duckdb -c "..." |
| Row-level transforms, precision-critical edits | uv run Python with a PEP 723 header |
| Anything .xlsx in or out | read references/xlsx.md first |
| Recalculating .xlsx formulas | uv run scripts/recalc.py <file.xlsx> |
| Row/column-aware diff of two tables | bunx daff old.tsv new.tsv |
| Interactive viewing (suggest to the user; never launch TUIs) | csvlens, vd, Numbers.app |
decimal.Decimal or DuckDB DECIMAL(38, 18). Read with all_varchar = true in DuckDB and plain stdlib csv in Python. If pandas is unavoidable, pass dtype=str.snake_case headers; ISO 8601 dates (YYYY-MM-DD; prb-finance timestamps use YYYY-MM-DD@HH:MM:SS); . decimal point; no thousands separators or currency symbols inside cells; - for null. Conventions already present in an existing file override every one of these.encoding="utf-8-sig".peek.py (column count unchanged, no new ragged rows) or the owning repo's checks. In prb-finance: just tsv-check, then just cli::write-changed to regenerate derived reports — never hand-edit generated .pool.tsv/.annual.tsv/.md artifacts.mv over the original.=, +, or @ with ' (a bare - null is exempt).uv run scripts/peek.py <file> [--rows N] prints a JSON report: encoding and BOM, newline style and trailing newline, delimiter and how it was detected, header with duplicates flagged, column/row counts, ragged and empty rows, - null usage, and sample rows. Run it before editing any delimited file; on a binary spreadsheet it exits with a pointer to the xlsx workflow.
Quick follow-ups with qsv:
qsv count txs.tsv # row count (excludes header)
qsv headers txs.tsv # numbered column names
qsv stats -E txs.tsv | qsv table # per-column types, ranges, cardinality
qsv frequency -s event txs.tsv # value distribution of one column
qsv select date_utc,amount txs.tsv
qsv dedup txs.tsv
qsv infers the input delimiter from the file extension, but stdout is always comma-separated. When the result must stay TSV, write it with -o out.tsv (the output extension sets the delimiter) — never shell redirection.
Canonical read — everything as strings, - mapped to NULL:
FROM read_csv('txs.tsv', delim = '\t', header = true, all_varchar = true, nullstr = '-');
-- Profile every column
SUMMARIZE SELECT * FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-');
-- Aggregate with exact decimals
SELECT event, SUM(amount::DECIMAL(38, 18)) AS total
FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-')
GROUP BY event
ORDER BY total DESC;
-- Write a TSV back out
COPY (SELECT ...) TO 'out.tsv' (FORMAT csv, DELIMITER '\t', HEADER true, NULLSTR '-');
DuckDB also reads and writes .xlsx (read_xlsx, COPY ... (FORMAT xlsx)) — see references/xlsx.md.
Stdlib csv keeps every cell a string — precision-safe by default. Script template:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = []
# ///
import csv
from decimal import Decimal
with open("in.tsv", encoding="utf-8-sig", newline="") as f:
rows = list(csv.DictReader(f, delimiter="\t"))
# transform here; use Decimal(row["amount"]) for arithmetic
with open("out.tsv", "w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
newline="" to every open() the csv module touches, and lineterminator="\n" for LF output.uv run --with <pkg> python -c "...".Read references/xlsx.md whenever a .xlsx/.xlsm is input or deliverable: openpyxl create/edit, DuckDB xlsx I/O, styling, conversion recipes, and the recalculation loop. The two absolutes:
=SUM(B2:B9)), not values precomputed in Python.uv run scripts/recalc.py <file.xlsx> and deliver only when it reports zero #REF!/#DIV/0!/#VALUE!/#NAME? errors.Recalculation needs LibreOffice: brew install --cask libreoffice. The script finds the app bundle on its own; soffice does not need to be on PATH.
testing
Use ONLY to check or update the project-scoped agent skills installed under .agents/skills so they match the current state of the repo. Do not trigger for creating, finding, or installing skills, or for README/AGENTS.md updates.
development
This skill should be used when the user asks to consult Codex/GPT for planning or code review, or wants a second opinion on code. Trigger phrases include "ask Codex", "consult Codex", "second opinion on code", "consult the oracle". NOT for implementation tasks.
databases
This skill should be used when the user asks to create a monochrome technical diagram, schematic, or systems/architecture diagram in black-and-white line-art style. Trigger phrases include "create a diagram", "monochrome diagram", "systems diagram", "draw a schematic".
testing
Use only when explicitly asked to archive/prune/compact/roll over checked tasks from TODO.md into `.ai/todos/TODO_UNTIL_YYYY_MM_DD.md`, leaving unchecked tasks.