claude/skills/xsv/SKILL.md
Use xsv for fast CSV data processing with selection, filtering, statistics, joining, sorting, and indexing for high-performance data manipulation.
You are a CSV data manipulation specialist using xsv, a fast command-line CSV toolkit written in Rust. This skill provides comprehensive guidance for processing, analyzing, and transforming CSV data efficiently.
xsv is designed for high-performance CSV operations:
```bash
xsv table data.csv
xsv table data.csv | less -S
# Limit field width for readability
xsv table -c 20 data.csv
xsv count data.csv
xsv headers data.csv
xsv select 1,3,5 data.csv
xsv select Name,Email,Age data.csv
```
```bash
# Show all column names with indices
xsv headers data.csv
# Output format:
# 1 Name
# 2 Email
# 3 Age
```
Use case: Understand CSV structure before operations
```bash
# Count records (excluding header)
xsv count data.csv
# Count with an index (much faster): build data.csv.idx once,
# then xsv picks it up automatically
xsv index data.csv
xsv count data.csv
```
Performance: O(1) with an index, O(n) without
```bash
# By index (1-based)
xsv select 1,3,5 data.csv
# By name
xsv select Name,Email,Age data.csv
# Column ranges
xsv select 1-4 data.csv
xsv select Name-Age data.csv
# From column to end
xsv select 3- data.csv
# Exclude columns
xsv select '!1-2' data.csv
# Reorder and duplicate
xsv select 3,1,2,1 data.csv
# Disambiguate duplicate column names
xsv select 'Name[0],Name[1],Name[2]' data.csv
# Quote names with special characters
xsv select '"Date - Opening","Date - Closing"' data.csv
```
Common options:

- `-o, --output <file>`: Write to file
- `-n, --no-headers`: Treat first row as data
- `-d, --delimiter <char>`: Input delimiter (default: `,`)

```bash
# Basic search
xsv search "pattern" data.csv
# Case insensitive
xsv search -i "pattern" data.csv
# Search specific columns (the pattern is a regex, so escape literal dots)
xsv search -s Email 'gmail\.com' data.csv
xsv search -s 1,3,5 "pattern" data.csv
# Invert match (exclude matching rows)
xsv search -v "pattern" data.csv
# Save results
xsv search "active" data.csv -o active_users.csv
```
Use case: Filter CSV rows like grep. A row is kept if the regex matches anywhere in any searched field, so `active` also matches `inactive`; anchor the pattern (`'^active$'`) when you need an exact value.
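As a rough mental model of `xsv search`, here is a Python sketch (not xsv's code): the regex is tried against each selected field and the row is kept if any field matches. `search_rows` is a hypothetical helper; it supports `-s` by column name only, plus `-i` and `-v`, and shows why an unanchored pattern like `active` also catches `inactive`.

```python
import csv
import io
import re


def search_rows(text, pattern, columns=None, ignore_case=False, invert=False):
    """Keep a row if the regex matches any selected field (cf. xsv search)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Column selection by header name, like `-s Status` (names only here).
    idx = [header.index(c) for c in columns] if columns else range(len(header))
    out = [header]  # the header row is always passed through
    for row in reader:
        # re.search is unanchored: it matches anywhere inside the field.
        hit = any(regex.search(row[i]) for i in idx)
        if hit != invert:  # invert=True flips the decision, like -v
            out.append(row)
    return out


data = "Name,Status\nAnn,active\nBob,inactive\nCy,Active\n"
print(search_rows(data, "active", columns=["Status"]))
# [['Name', 'Status'], ['Ann', 'active'], ['Bob', 'inactive']]
```

Anchoring with `^active$` and adding `ignore_case=True` keeps `Ann` and `Cy` but drops `Bob`.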
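```bash
# First 10 rows
xsv slice -l 10 data.csv
# Rows 100 through 199 (0-based; the end index is exclusive)
xsv slice -s 100 -e 200 data.csv
# Start at row 50, take 20 rows
xsv slice -s 50 -l 20 data.csv
# Last 10 rows: compute the start from the total (faster with an index)
total=$(xsv count data.csv)
xsv slice -s $((total - 10)) data.csv
# Single row (0-based index)
xsv slice -i 42 data.csv
```
Performance: Much faster with an index, since xsv can jump to the start row instead of scanning up to it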
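The slice options are easiest to reason about as a 0-based, half-open range over data rows (the header is never counted). The hypothetical `slice_range` helper below is a Python sketch of that reading of `-s`, `-e`, `-l` and `-i`; it only computes row indices and does not read any file.

```python
def slice_range(total, start=None, end=None, length=None, index=None):
    """Return the 0-based data-row indices a slice would select.

    `total` is the number of data rows (header excluded); `end` is exclusive.
    """
    if index is not None:  # -i N: exactly one row, if it exists
        return range(index, min(index + 1, total))
    lo = start or 0
    if length is not None:  # -l N: N rows starting at `lo`
        hi = lo + length
    elif end is not None:  # -e N: stop before row N
        hi = end
    else:
        hi = total  # no end given: run to the last row
    return range(lo, min(hi, total))


print(len(slice_range(1000, start=100, end=200)))  # 100
print(list(slice_range(1000, index=42)))  # [42]
```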
```bash
# Basic stats (sum, min, max, min/max length, mean, stddev)
xsv stats data.csv
# All statistics (adds median, mode, cardinality; needs memory)
xsv stats --everything data.csv
# Specific columns
xsv stats -s Age,Salary data.csv
# Include median (holds the data in memory)
xsv stats --median data.csv
# Include mode
xsv stats --mode data.csv
# Include cardinality (number of distinct values)
xsv stats --cardinality data.csv
# Parallel processing (only takes effect when data.csv has an index)
xsv stats -j 4 data.csv
# Output as table
xsv stats data.csv | xsv table
```
Output fields: field, type, sum, min, max, min_length, max_length, mean, stddev, plus median, mode and cardinality when requested
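Why can the default statistics stream while `--median` and `--mode` need memory? A single-pass accumulator only has to keep a few running numbers. This Python sketch uses Welford's online algorithm for the mean and variance; it illustrates the technique rather than xsv's implementation, and it reports the population standard deviation (dividing by n), which may not be the variant xsv prints.

```python
import math


class OnlineStats:
    """Single-pass sum/min/max/mean/stddev with constant memory (Welford)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.lo = math.inf
        self.hi = -math.inf
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        self.total += x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    def stddev(self):
        # Population standard deviation; 0.0 when no values were seen.
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


s = OnlineStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    s.add(value)
print(s.total, s.lo, s.hi, s.mean, s.stddev())
```

A median has no such running summary: it needs every value (or a sorted copy), which is why xsv only computes it on request.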
```bash
# Top 10 values per column
xsv frequency data.csv
# Specific columns
xsv frequency -s Status,Category data.csv
# Top 20 values
xsv frequency -l 20 data.csv
# All values (no limit)
xsv frequency -l 0 data.csv
# Ascending order
xsv frequency --asc data.csv
# Exclude nulls
xsv frequency --no-nulls data.csv
# View as table
xsv frequency -s Status data.csv | xsv table
```
Output: field, value, count
Use case: Value distribution analysis
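The `field, value, count` output amounts to a counting table per column. A minimal Python sketch with `collections.Counter`, assuming one column selected by name; `frequency` is a hypothetical helper that mirrors `-l` (0 = no limit) and `--asc`, and the order of tied counts may differ from xsv's.

```python
import csv
import io
from collections import Counter


def frequency(text, column, limit=10, asc=False):
    """Return (field, value, count) tuples for one column."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter(row[column] for row in reader)
    # Most frequent first by default; asc=True flips the order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not asc)
    if limit > 0:  # limit=0 means "show every value"
        ordered = ordered[:limit]
    return [(column, value, n) for value, n in ordered]


data = "Status\nactive\nactive\npending\nactive\nclosed\npending\n"
print(frequency(data, "Status"))
# [('Status', 'active', 3), ('Status', 'pending', 2), ('Status', 'closed', 1)]
```

Memory grows with the number of distinct values, not the number of rows, which is why frequency is cheap on low-cardinality columns.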
```bash
# Sort by all columns, left to right (the first column is the primary key)
xsv sort data.csv
# Sort by specific columns
xsv sort -s Age data.csv
xsv sort -s LastName,FirstName data.csv
# Numeric sort
xsv sort -s Age -N data.csv
# Reverse order
xsv sort -s Age -R data.csv
# Numeric + reverse
xsv sort -s Salary -N -R data.csv
# Save sorted
xsv sort -s Name data.csv -o sorted.csv
```
Note: Requires reading entire file into memory
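The `-N` flag matters because, without it, values compare as text. This hypothetical `sort_rows` helper is a Python sketch (not xsv's code) that keeps the header first and shows the difference on an `Age` column: lexicographically, `"10"` and `"100"` sort before `"9"`.

```python
import csv
import io


def sort_rows(text, column, numeric=False, reverse=False):
    """Sort data rows by one column; the header stays on top."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    i = header.index(column)
    # Text comparison by default; numeric=True compares as numbers (like -N).
    key = (lambda r: float(r[i])) if numeric else (lambda r: r[i])
    return [header] + sorted(reader, key=key, reverse=reverse)


data = "Name,Age\nA,9\nB,10\nC,100\n"
print([r[0] for r in sort_rows(data, "Age")[1:]])  # ['B', 'C', 'A']
print([r[0] for r in sort_rows(data, "Age", numeric=True)[1:]])  # ['A', 'B', 'C']
```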
```bash
# Inner join (the default)
xsv join ID users.csv ID orders.csv
# Left outer join
xsv join --left ID users.csv ID orders.csv
# Right outer join
xsv join --right ID users.csv ID orders.csv
# Full outer join
xsv join --full ID users.csv ID orders.csv
# Case-insensitive join
xsv join --no-case Email users.csv Email contacts.csv
# Join on multiple columns
xsv join 'ID,Date' file1.csv 'ID,Date' file2.csv
# Also join rows whose key fields are empty (they are ignored by default)
xsv join --nulls ID file1.csv ID file2.csv
# Cross join (cartesian product)
xsv join --cross 1 file1.csv 1 file2.csv
# Save result
xsv join ID users.csv ID orders.csv -o joined.csv
```
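The join commands above can be understood as a hash join: index one file by its key column, then stream the other and look each key up. The Python sketch below is a generic illustration of an inner and a `--left` style join on a single key column; `join` is a hypothetical helper, and xsv's internals, output order and edge cases may differ. Here empty keys never match unless `nulls=True`, mimicking `--nulls`.

```python
import csv
import io
from collections import defaultdict


def join(left_text, left_key, right_text, right_key, how="inner", nulls=False):
    """Hash join on one key column; output rows are left + right columns."""
    lreader = csv.reader(io.StringIO(left_text))
    rreader = csv.reader(io.StringIO(right_text))
    lheader, rheader = next(lreader), next(rreader)
    li, ri = lheader.index(left_key), rheader.index(right_key)

    # Build phase: index the right-hand rows by key (skip empty keys).
    table = defaultdict(list)
    for row in rreader:
        if row[ri] or nulls:
            table[row[ri]].append(row)

    # Probe phase: stream the left-hand rows and look each key up.
    out = [lheader + rheader]
    for row in lreader:
        matches = table.get(row[li], []) if (row[li] or nulls) else []
        for match in matches:
            out.append(row + match)
        if not matches and how == "left":
            out.append(row + [""] * len(rheader))  # pad the missing side
    return out


users = "ID,Name\n1,Ann\n2,Bob\n3,Cy\n"
orders = "ID,Item\n1,pen\n1,ink\n3,cup\n"
for row in join(users, "ID", orders, "ID", how="left"):
    print(row)
```

Only the indexed side has to fit in memory; the other side streams through.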
Join types:
- `--left`: Left outer join
- `--right`: Right outer join
- `--full`: Full outer join
- `--cross`: Cartesian product (use with caution)

```bash
# Basic table
xsv table data.csv
# With pager
xsv table data.csv | less -S
# Minimum column width
xsv table -w 10 data.csv
# Padding between columns
xsv table -p 4 data.csv
# Limit field length
xsv table -c 20 data.csv
# Combine limits
xsv slice -l 50 data.csv | xsv table -c 30
```
Note: Requires buffering entire file into memory
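```bash
# Sample 100 rows
xsv sample 100 data.csv
# Sample 10% of a large file: count first, then request 10% of the total
xsv count data.csv # e.g., 1000000
xsv sample 100000 data.csv
# Save sample
xsv sample 1000 large.csv -o sample.csv
# Sample then analyze
xsv sample 10000 huge.csv | xsv stats --everything
```
Performance: With an index, samples smaller than 10% of the records use random access instead of parsing the whole file; otherwise every record is read, using memory proportional to the sample size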
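Sampling without an index has to read every record. A standard way to draw a uniform sample in one pass, with memory proportional to the sample size, is reservoir sampling (Algorithm R). This Python sketch shows the technique; treat it as an illustration, not as a description of xsv's exact code.

```python
import random


def reservoir_sample(rows, k, rng=None):
    """Uniformly sample k items from any iterable in a single pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row  # keep row i with probability k / (i + 1)
    return reservoir


print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

If the input has fewer than `k` rows, every row is returned.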
```bash
# Convert to TSV
xsv fmt -t '\t' data.csv -o data.tsv
# Convert to pipe-delimited
xsv fmt -t '|' data.csv
# Add CRLF line endings
xsv fmt --crlf data.csv -o windows.csv
# Quote all fields
xsv fmt --quote-always data.csv
# Custom quote character
xsv fmt --quote "'" data.csv
# Custom escape character (a single character; by default quotes are doubled)
xsv fmt --escape '\' data.csv
```
```bash
# Concatenate by rows (vertically)
xsv cat rows file1.csv file2.csv file3.csv
# Concatenate by columns (horizontally)
xsv cat columns file1.csv file2.csv
# Pad with empty values when the inputs have different numbers of rows
xsv cat columns -p file1.csv file2.csv
```
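```bash
# Split into files of 1000 rows each
xsv split -s 1000 output_dir data.csv
# Creates: output_dir/0.csv, output_dir/1000.csv, output_dir/2000.csv, etc.
# (each file is named after the 0-based index of its first row)
```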
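```bash
# Flatten first record
xsv slice -i 0 data.csv | xsv flatten
# Output format: one aligned "header  value" line per field
# (with several records, each is separated by a line containing '#')
# Name   John Doe
# Email  john@example.com
# Age    30
```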
```bash
# Ensure all rows have same number of fields
xsv fixlengths data.csv -o fixed.csv
# Pads short rows with empty fields up to the length of the longest row
# (use -l N to force an explicit length instead)
# Useful for malformed CSVs
```
```bash
# Create index
xsv index data.csv
# Creates: data.csv.idx
# Now operations are faster:
xsv count data.csv # O(1) instead of O(n)
xsv slice -i 1000 data.csv # Direct access
xsv sample 100 data.csv # Fast random access
```
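Conceptually, an index records where each record starts, so a count is just the number of entries and row N can be read by seeking to its offset. The Python sketch below builds such a list of byte offsets with a quote-aware scan; the helper names are hypothetical, and it neither reads nor writes xsv's actual `.idx` format.

```python
import csv
import io


def build_index(data: bytes):
    """Byte offset where each record starts (header included).

    Newlines inside quoted fields do not start a new record.
    """
    offsets = [0]
    in_quotes = False
    for pos, byte in enumerate(data):
        if byte == ord('"'):
            in_quotes = not in_quotes  # an escaped "" toggles twice
        elif byte == ord("\n") and not in_quotes and pos + 1 < len(data):
            offsets.append(pos + 1)
    return offsets


def record_count(offsets):
    # O(1) once the index exists: entries minus the header.
    return len(offsets) - 1


def read_row(data: bytes, offsets, i):
    """Parse data row i (0-based, header excluded) without scanning earlier rows."""
    start = offsets[i + 1]
    end = offsets[i + 2] if i + 2 < len(offsets) else len(data)
    return next(csv.reader(io.StringIO(data[start:end].decode("utf-8"))))


data = b'Name,Note\nAnn,"line one\nline two"\nBob,plain\n'
offsets = build_index(data)
print(record_count(offsets))  # 2
print(read_row(data, offsets, 1))  # ['Bob', 'plain']
```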
When to index:
```bash
# 1. Understand structure
xsv headers data.csv
# 2. Count records
xsv count data.csv
# 3. View sample
xsv slice -l 10 data.csv | xsv table
# 4. Get statistics
xsv stats data.csv | xsv table
# 5. Check value distributions
xsv frequency -s Status data.csv | xsv table
```
```bash
# 1. Select relevant columns
xsv select Name,Email,Age,Status data.csv |
# 2. Filter active users (anchored so "inactive" does not match)
xsv search -s Status '^active$' |
# 3. Filter by age (two-digit ages 30 to 99) and save the result
xsv search -s Age '^[3-9][0-9]$' -o active_users_30plus.csv
```
```bash
# 1. Create index for performance
xsv index large_data.csv
# 2. Sample for quick analysis
xsv sample 10000 large_data.csv |
# 3. Select columns of interest
xsv select Revenue,Region,Product |
# 4. Get statistics
xsv stats --everything |
# 5. View as table
xsv table
```
```bash
# 1. Join users with orders
xsv join UserID users.csv UserID orders.csv |
# 2. Select relevant columns
xsv select 'UserName,Email,OrderID,OrderDate,Amount' |
# 3. Sort by amount
xsv sort -s Amount -N -R |
# 4. Top 100 orders
xsv slice -l 100 |
# 5. Format and save
xsv table -o top_orders.txt
```
```bash
# 1. Fix row lengths
xsv fixlengths messy.csv |
# 2. Select valid columns
xsv select 1-10 |
# 3. Remove rows with empty email
xsv search -s Email '.+' |
# 4. Sort by all columns so that identical rows end up adjacent
xsv sort |
# 5. Drop exact duplicate rows and save (uniq compares whole lines,
#    so this assumes no quoted field contains a newline)
uniq > cleaned.csv
```
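The `uniq` step in this workflow compares whole text lines, so it removes only exact duplicate lines and cannot deduplicate by the `Email` column alone. If you need the first row per email, a CSV-aware pass does it; the hypothetical `dedupe_by` below is a Python sketch using only the standard `csv` module.

```python
import csv
import io


def dedupe_by(text, column):
    """Keep the first row for each distinct value of `column`.

    Fields are parsed, so quoted values containing newlines are handled.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    i = header.index(column)
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if row[i] not in seen:  # first occurrence wins
            seen.add(row[i])
            writer.writerow(row)
    return out.getvalue()


data = 'Email,Name\na@example.com,Ann\nb@example.com,"Bob\nJr"\na@example.com,Annie\n'
print(dedupe_by(data, "Email"))
```

Unlike `sort | uniq`, this keeps the original row order; memory grows with the number of distinct emails.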
```bash
# 1. Select and reorder columns
xsv select 'LastName,FirstName,Email,Phone' data.csv |
# 2. Convert to TSV and save
xsv fmt -t '\t' -o output.tsv
```
```bash
# 1. Create index first
xsv index huge_file.csv
# 2. Get quick count
xsv count huge_file.csv
# 3. Sample for analysis
xsv sample 50000 huge_file.csv |
# 4. Analyze sample
xsv stats --everything |
# 5. View results
xsv table
```
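```bash
# One-time cost, speeds up many operations
xsv index large.csv
```
Speeds up:

- `count` (O(1) instead of O(n))
- `slice` (direct access)
- `sample` (efficient random access)
- `stats -j` (parallel processing)

These stream the data and never hold the whole file in memory:

- `select`
- `search`
- `slice` (with an index it also skips straight to the start row)
- `headers`
- `count` (with an index it does not scan the file at all)

These hold the full file, or every distinct value, in memory:

- `sort`
- `table`
- `stats --median`, `stats --mode`, `stats --cardinality`, `stats --everything`
- `frequency` (memory grows with the number of distinct values per column)

Solution: Use `sample` or `slice` first:

```bash
xsv sample 100000 huge.csv | xsv stats --everything
```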
```bash
# Use multiple cores for stats
xsv stats -j 0 data.csv # 0 = use the number of detected CPUs
# Specific job count
xsv stats -j 4 data.csv
```
Requires: An index (`xsv index data.csv`); without one, stats runs sequentially regardless of `-j`
```bash
# Good: streaming pipeline
xsv select Name,Age data.csv | xsv search -s Age "^[3-9]" | xsv table
# Less efficient: multiple file reads
xsv select Name,Age data.csv -o temp1.csv
xsv search -s Age "^[3-9]" temp1.csv -o temp2.csv
xsv table temp2.csv
```
```bash
# Top 10 customers by revenue
xsv sort -s Revenue -N -R customers.csv | xsv slice -l 10 | xsv table
# Count by status
xsv frequency -s Status -l 0 data.csv | xsv table
# Average age by region (xsv has no group-by, so hand off to awk via TSV)
xsv select Region,Age data.csv | xsv fmt -t '\t' |
  awk -F'\t' 'NR > 1 { sum[$1] += $2; n[$1]++ } END { for (r in sum) print r, sum[r] / n[r] }'
# Search across multiple columns
xsv select Name,Email,Phone data.csv | xsv search "pattern"
# Find rows with missing email
xsv search -s Email -v '.+' data.csv
# Find duplicates (by email)
xsv select Email data.csv | xsv sort | uniq -d
# Find differences between two files
xsv select ID,Value file1.csv > temp1
xsv select ID,Value file2.csv > temp2
diff temp1 temp2
# Stats for specific column
xsv select Age data.csv | xsv stats | xsv table
# Multiple column stats
xsv select Age,Salary,Score data.csv | xsv stats --everything | xsv table
```
Most commands support these options:
```text
-h, --help               Display help
-o, --output <file>      Write to file instead of stdout
-n, --no-headers         First row is data, not headers
-d, --delimiter <char>   Input delimiter (default: ,)
```
```bash
# TSV (tab-separated)
xsv select 1,3 -d '\t' data.tsv
# Pipe-delimited
xsv select Name,Age -d '|' data.txt
# Semicolon-delimited
xsv select 1-5 -d ';' data.csv
# TSV to CSV
xsv fmt -d '\t' data.tsv -o data.csv
# CSV to TSV
xsv fmt -t '\t' data.csv -o data.tsv
# CSV to pipe-delimited
xsv fmt -t '|' data.csv -o data.txt
```
Issue: "CSV error: record has different length"
Solution: Use fixlengths
```bash
xsv fixlengths data.csv -o fixed.csv
```
Issue: "No such file or directory"
Solution: Check the file path; use an absolute path if needed
Issue: Out of memory with a large file
Solution: Run memory-bound commands (sort, table, stats --everything, frequency) on a sample or slice; an index makes sampling cheap
```bash
xsv index large.csv
xsv sample 10000 large.csv | xsv stats
```
Issue: Column name not found
Solution: Check headers first
```bash
xsv headers data.csv
```
```bash
# Convert CSV to JSON
xsv select Name,Age data.csv | \
  python3 -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'
# Add computed column (naive: awk -F, breaks on quoted fields containing commas)
xsv select Price,Quantity data.csv | \
  awk -F, 'NR==1{print $0",Total"} NR>1{print $0","$1*$2}'
# Deduplicate by column (xsv sort keeps the header on the first line)
xsv select Email data.csv | xsv sort | uniq
# Pre-filter before xsv (keep the header line, grep only the data rows)
{ head -n 1 data.csv; tail -n +2 data.csv | grep "pattern"; } | xsv table
```
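The `awk -F,` computed-column step above splits on every comma, so a quoted field such as `"Pen, blue"` shifts the columns. A CSV-aware alternative is sketched below in Python; `add_total` is a hypothetical helper, the `Price`/`Quantity` names follow the example above, and converting both to `float` is an assumption about the data.

```python
import csv
import io


def add_total(text, price="Price", qty="Quantity"):
    """Append a Total column (price * quantity) using a real CSV parser."""
    reader = csv.DictReader(io.StringIO(text))
    fields = list(reader.fieldnames) + ["Total"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Total"] = float(row[price]) * float(row[qty])
        writer.writerow(row)  # fields with commas are re-quoted on output
    return out.getvalue()


data = 'Item,Price,Quantity\n"Pen, blue",1.5,4\nInk,2,3\n'
print(add_total(data), end="")
```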
```bash
# View structure
xsv headers data.csv
xsv count data.csv
xsv slice -l 5 data.csv | xsv table
# Select columns
xsv select 1,3,5 data.csv
xsv select Name,Email data.csv
# Filter rows
xsv search "pattern" data.csv
xsv search -s Email "gmail" data.csv
# Statistics
xsv stats data.csv
xsv frequency -s Status data.csv
# Sort
xsv sort -s Age -N data.csv
# Join
xsv join ID file1.csv ID file2.csv
# Format
xsv table data.csv
xsv fmt -t '\t' data.csv
# Sample
xsv sample 1000 data.csv
# Index (for performance)
xsv index large.csv
```
While xlsx handles Excel files, xsv is specialized for CSV:
| Feature | xsv | xlsx |
|---------|-----|------|
| Format | CSV only | XLSX/Excel |
| Speed | Extremely fast | Fast |
| Memory | Streaming | Depends on operation |
| Formulas | No | Yes |
| Formatting | No | Yes |
| Multiple sheets | No | Yes |
| Statistics | Rich | Basic |
| Joining | Yes | No |
| Indexing | Yes | No |
When to use xsv:
When to use xlsx:
Primary tool: xsv for fast CSV processing
Most common commands:
- `xsv headers` - Understand structure
- `xsv select` - Choose columns
- `xsv search` - Filter rows
- `xsv stats` - Analyze data
- `xsv table` - View formatted
- `xsv join` - Combine files
- `xsv index` - Speed up operations

Key advantages:
Best practices: