skills/xlsx/SKILL.md
Create, edit, analyze, or convert Excel spreadsheets (.xlsx, .xlsm) where the workbook file is the primary deliverable. Use for formulas, formatting, financial models, multi-sheet workbooks, and tabular cleanup exported to Excel. Also applies to .csv/.tsv when the user wants spreadsheet output. Do NOT use for Word documents, HTML reports, standalone Python scripts, database pipelines, or Google Sheets API work.
npx skillsauth add K-Dense-AI/claude-scientific-skills xlsxInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Unless otherwise stated by the user or existing template
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
uv pip install openpyxl pandas
Optional — faster Excel reading across formats with pandas 2.2+:
uv pip install python-calamine
For untrusted workbook files, harden openpyxl against XML expansion attacks:
uv pip install defusedxml
See openpyxl security guidance.
LibreOffice required for formula recalculation: Assume LibreOffice is installed for recalculating formula values using scripts/recalc.py. The script configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by scripts/office/soffice.py).
System dependencies (not installed via uv):
| Tool | Purpose |
|------|---------|
| soffice (LibreOffice 7.x+) | Evaluates Excel formulas via scripts/recalc.py |
| gcc | Only when Unix domain sockets are blocked; compiles a one-time shim into ~/.cache/xlsx-skill/lo-shim/ |
| gtimeout (macOS, optional) | GNU coreutils timeout for recalc timeout support on Darwin |
Verify LibreOffice is available: soffice --version
For data analysis, visualization, and basic operations, use pandas which provides powerful data manipulation capabilities:
import pandas as pd
# Read Excel (.xlsx default engine: openpyxl)
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Optional: calamine engine (pandas 2.2+) — faster, supports .xlsx/.xls/.xlsb/.xlsm/.ods
# df = pd.read_excel('file.xlsx', engine='calamine')
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
scripts/recalc.py script
python skills/xlsx/scripts/recalc.py output.xlsx
status is errors_found, check error_summary for specific error types and locations#REF!: Invalid cell references#DIV/0!: Division by zero#VALUE!: Wrong data type in formula#NAME?: Unrecognized formula name# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
print(f"Sheet: {sheet_name}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided scripts/recalc.py script to recalculate formulas:
python skills/xlsx/scripts/recalc.py <excel_file> [timeout_seconds]
Example:
python skills/xlsx/scripts/recalc.py output.xlsx 30
The script:
Quick checks to ensure formulas work correctly:
pd.notna()/ in formulas (#DIV/0!)The script returns JSON with error details:
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)data_only=True and saved, formulas are replaced with values and permanently lostread_only=True for reading or write_only=True for writingpd.read_excel('file.xlsx', dtype={'id': str})pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])pd.read_excel('file.xlsx', parse_dates=['date_column'])IMPORTANT: When generating Python code for Excel operations:
For Excel files themselves:
testing
Run structured What-If scenario analysis with 4–6 branch possibility exploration (best, likely, worst, wild card, contrarian, second-order). Use when the user asks speculative what-if questions about uncertain futures, strategic forks, contingency planning, or stress-testing a decision before committing.
development
Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
development
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
development
Query the U.S. Treasury Fiscal Data REST API for federal financial data. No API key required. Use for national debt (Debt to the Penny), Daily Treasury Statements, Monthly Treasury Statements, Treasury securities auctions, interest rates, foreign exchange rates, savings bonds, or U.S. government revenue and spending statistics.