skills/pdf-statement-parser/SKILL.md
Parses a single financial statement PDF (checking, savings, credit card, brokerage, 401k, HSA, mortgage, tax form) and emits a normalized JSON record with institution, account mask, statement period, opening/closing balances, line-item transactions or holdings, and a confidence score. Use when extracting structured data from a bank, brokerage, retirement, or HSA PDF statement, when ingesting a drop of household finance documents, or when user mentions parsing a statement, extracting transactions from a PDF, or normalizing statement data.
npx skillsauth add lyndonkl/claude pdf-statement-parserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Bank, brokerage, retirement, HSA, and mortgage statements arrive as PDFs in many house-styles. This skill is the prompt-driven extraction methodology: identify the document type, locate the canonical fields by their labels (not position), pull every transaction or holding row, and emit a strict JSON record. All monetary values become integer cents, all dates become ISO 8601 (YYYY-MM-DD).
Use directly with Claude's PDF reading: Read the PDF, then run this skill's workflow on the visible content. No external library needed.
The caller provides:
pdf_path — absolute path to a single statement PDF.expected_type (optional) — one of checking_statement | savings_statement | credit_card_statement | brokerage_statement | 401k_statement | hsa_statement | mortgage_statement | tax_form | insurance_doc | unknown. If absent, the skill detects it.known_accounts (optional) — array of {id, institution, mask, type} for matching.- [ ] Step 1: Read the PDF (first pass: pages 1-2 for header)
- [ ] Step 2: Detect document type from header keywords
- [ ] Step 3: Extract account identity (institution + mask + type)
- [ ] Step 4: Extract statement period (start/end dates)
- [ ] Step 5: Extract balance markers (opening / closing / available)
- [ ] Step 6: Extract line items (transactions, holdings, or events)
- [ ] Step 7: Normalize to integer cents and ISO dates
- [ ] Step 8: Score confidence on each field; emit JSON
Use the Read tool with the PDF path. For PDFs > 10 pages, read in chunks: pages 1–5 (header + first transactions), then subsequent ranges as needed. Track total page count.
Look for these header signals (in order of authority):
| Signal | Document type |
|---|---|
| "Statement of Account" + "Available Balance" + checking-style transactions | checking_statement |
| "Savings Statement" / "Money Market Statement" | savings_statement |
| "Credit Card Statement" + "Payment Due" + "Minimum Payment" | credit_card_statement |
| "Account Summary" + "Holdings" + ticker symbols | brokerage_statement |
| "401(k) Statement" / "Retirement Plan" + "Vested Balance" | 401k_statement |
| "Health Savings Account" / "HSA" + "Contribution Limit" | hsa_statement |
| "Mortgage Statement" + "Principal" + "Escrow" | mortgage_statement |
| "W-2" / "1099" / "1098" / "5498" form numbers | tax_form |
| Policy declaration page, premium, deductible | insurance_doc |
If two signals tie, prefer the one in the document title or page header.
****1234, xxxx1234, Account ending in 1234, or the redacted form the institution uses. Always store as ****1234.household.json members downstream).Match against known_accounts by (institution, mask). If no match, mark account_match: "new" and propose an account_id like acc_<type>_<incrementing>.
Find labels: "Statement Period", "Cycle Date", "Closing Date", "Period Beginning … Period Ending". Output:
period_start — first day covered (ISO).period_end — last day covered, also the statement closing date (ISO).If only a closing date is present, set period_end = closing_date and period_start = null with confidence: 0.6 on the period.
For cash/credit accounts: opening_balance_cents, closing_balance_cents, available_balance_cents (if present). For credit cards, also statement_balance_cents, minimum_payment_cents, payment_due_date.
For brokerage / 401k / HSA: total_value_cents, cash_cents, invested_cents. Skip opening/closing transaction balances — these are holdings statements.
For mortgages: current_principal_cents, monthly_pi_cents, monthly_escrow_cents, next_payment_due, extra_principal_ytd_cents.
Three modes:
Cash/credit transactions. For each row in the transactions table, capture: date, post_date (if separate column), description_raw, amount. Sign convention: outflows negative, inflows positive. If statement uses two columns ("Withdrawals" and "Deposits"), set sign accordingly. Preserve description_raw exactly as printed — the bookkeeper will normalize the merchant name later.
Brokerage / 401k / HSA holdings. For each holding row: symbol, description, shares (decimal), price (per-share), value, cost_basis (if shown), asset_class (mapped from fund description: see playbook below).
Mortgage activity. Recent payments table: each entry with date, total_paid_cents, principal_cents, interest_cents, escrow_cents, extra_principal_cents.
Tax forms. Form-specific fields per playbook (W-2 boxes, 1099-DIV ordinary/qualified dividends, 1098 mortgage interest paid, 5498-SA HSA contributions).
$1,247.50 → 124750 cents. Negative outflow: -$87.42 → -8742.01/15/26 → 2026-01-15. Two-digit years assume current century unless context says otherwise.Per field, assign a confidence in [0.0, 1.0]:
1.0 — extracted directly from a labeled field.0.85–0.95 — extracted from positional context (column header inferred, value clear).0.6–0.8 — inferred from surrounding text or partially OCR'd.< 0.6 — flag for human review.The overall record gets confidence = min(per-field confidences for required fields).
Required: institution, mask, period_start, period_end, opening_balance_cents, closing_balance_cents, transactions[]. Reconcile: opening + Σ(transactions.amount) = closing within ±$1.
Required: institution, mask, period_start, period_end, opening_balance_cents, closing_balance_cents, statement_balance_cents, minimum_payment_cents, payment_due_date, transactions[]. Note: credit-card outflows from the cardholder's perspective are purchases (positive on the card balance) — but for the unified household ledger, treat purchases as negative (money leaving the household) and payments to the card as positive (or as internal transfers). Document the sign convention you used.
Required: institution, mask, account_type, period_end, total_value_cents, holdings[]. Optional: cash_cents, transactions[] (buys/sells/dividends). Asset-class mapping:
| Fund hint | asset_class |
|---|---|
| "S&P 500", "Total Stock Market", "Large Cap Blend", VOO/VTI/SPY | us_equity |
| "International", "Developed Markets", "Emerging Markets", VXUS/IXUS | intl_equity |
| "Total Bond", "Aggregate Bond", "Treasury", BND/AGG | us_bond |
| "Money Market", "Cash Reserve", SPAXX | cash |
| "REIT", "Real Estate" | reit |
| "Target Date 20XX" | target_date (note the year) |
If unmappable, leave asset_class: null with confidence 0.5.
Required: institution, owner, period_end, total_balance_cents, ytd_contribution_cents (if shown), employer_match_ytd_cents (if shown), holdings[]. Watch for vesting columns — record vested_cents separately if shown.
Required: institution, period_end, cash_cents, invested_cents, total_cents, ytd_contribution_cents, ytd_distribution_cents (if shown). Note coverage type if printed: self_only or family.
Required: institution, mask, period_end, current_principal_cents, monthly_pi_cents, monthly_escrow_cents, next_payment_due, recent_payments[]. Capture the rate APR if printed.
Required: form_type (W-2, 1099-DIV, 1099-INT, 1099-B, 1098, 5498-SA, 1095), tax_year, issuer, recipient, key_boxes (form-specific). Defer interpretation to the tax-compliance agent — this skill just captures the boxes.
Emit a single JSON object:
{
"document_type": "checking_statement",
"source_path": "inbox/2026-01-15-batch/checking-jan.pdf",
"institution": "Bank Name",
"account": {
"mask": "****1234",
"type": "checking",
"owners": ["Jane Doe", "John Doe"],
"match": "acc_chk_001",
"match_confidence": 0.95
},
"period": {
"start": "2025-12-15",
"end": "2026-01-14"
},
"balances": {
"opening_cents": 1124300,
"closing_cents": 1247500,
"available_cents": 1247500
},
"transactions": [
{
"date": "2025-12-16",
"post_date": "2025-12-17",
"description_raw": "TRADER JOE'S #123 PORTLAND OR",
"amount_cents": -8742,
"confidence": 0.97
}
],
"reconciliation": {
"expected_closing_cents": 1247500,
"computed_closing_cents": 1247500,
"diff_cents": 0,
"ok": true
},
"extracted_at": "2026-01-20T14:00:00Z",
"overall_confidence": 0.94,
"warnings": []
}
For brokerage/401k/HSA replace transactions with holdings and balance fields per playbook.
Reduce overall confidence and add a warnings[] entry whenever:
warning: "reconcile_failed: diff $X.XX" and overall_confidence: max(0.5, current * 0.7).warning: "missing_required: <field>" and overall_confidence ≤ 0.5.warning: "low_ocr_quality_page_<n>" and apply the per-field confidence penalty.warning: "unknown_asset_class_high".null and add a warning. Do not interpolate.description_raw byte-for-byte.transaction-categorizer).ok: true.Given a 4-page Chase checking statement for the cycle 2025-12-15 to 2026-01-14 with opening balance $11,243.00, three deposits totaling $4,832.00, twenty-seven debits totaling $3,600.00, and closing balance $12,475.00, the parser emits:
document_type: "checking_statement"institution: "Chase"account.mask: "****1234", account.type: "checking"period.start: "2025-12-15", period.end: "2026-01-14"balances.opening_cents: 1124300, balances.closing_cents: 1247500transactions: [...] — 30 entries, each with description_raw, amount_cents (signed), and per-field confidence ≥ 0.92reconciliation.ok: true, diff_cents: 0overall_confidence: 0.95, warnings: []testing
--- name: advisory-edit description: A strict advisory-only editing discipline for a writer who dictates ("speaks out") essays and wants help WITHOUT having their voice changed. The editor directs structure, flags grammar, and suggests strategic language — but never modifies the writer's text unless the writer explicitly says "apply" / "make that change" / "rewrite this." Produces a line-referenced, suggestion-only critique where every item is marked the writer's call. Four passes: structural, l
testing
Provides the house style for analyst-grade strategist writing — third-person register with sparing first-person, no em dashes, no "not X, not Y, not Z" negation cascades, numbered footnote citations rather than inline source parentheticals, specific opinion-signaling phrases, and topic-forward paragraph structure modeled on voice patterns observed in Damodaran's Musings on Markets and Thompson's Stratechery. Use when consolidating working notes into a finished long-form strategist or analyst report that must read as written by a senior human analyst rather than an AI assistant.
testing
Renders a markdown report to a PDF using pandoc with xelatex (11pt serif body, 1-inch margins, numbered footnotes, formal heading hierarchy). Requires a one-time install of pandoc and a LaTeX engine on the user's machine — basictex on macOS or texlive-xetex on Linux. Does not attempt automatic install. Fails loudly with the exact install commands if pandoc or xelatex is missing on the user's PATH. Use when producing a finished strategist or analyst report PDF from a polished markdown source.
testing
Produces step-by-step computational walkthroughs of vector and matrix operations as a sequence of numbered "frames", showing the explicit state at each step. The text-equivalent of a 3Blue1Brown animation — each frame shows what changed and why, so the learner can re-trace the operation by hand. Use when the learner needs to *see* a computation unfold (eigenvalue computation, attention with 3 tokens, gradient descent step, SVD on a 2×2, layer norm on a 3-vector, softmax of a small input), when an explanation has been given but the learner needs to ground it in a worked example, or when introducing an operation that's intimidating in symbol form but trivial in pencil-and-paper form.