skills/transaction-deduplicator/SKILL.md
Detects and removes duplicate transactions across overlapping bank, credit-card, and brokerage statement imports using a stable composite key (account_id, date ±1d, amount_cents, description_normalized). Emits a list of new transactions to commit, a list of suppressed duplicates with their reasons, and a list of suspicious near-duplicates that need human review. Use when ingesting financial statements that may overlap prior drops, merging multiple export sources for the same account, or when user mentions duplicate transactions, deduping a transaction file, or reconciling overlapping statements.
Statement drops often overlap. A January statement covers December 15 → January 14 and the December statement covers November 15 → December 15, so the same December 15 transaction appears in both. This skill identifies those duplicates without losing legitimate same-day same-amount same-merchant repeat charges (e.g., two coffees in one day).
The caller provides:
- incoming: array of newly extracted transactions, each {date, post_date, account_id, amount_cents, description_raw, source}.
- existing: array of transactions already in the store, with the same fields plus id.

A duplicate is identified by the tuple:
(account_id, amount_cents, description_normalized, |date_a - date_b| <= 1 day)
- description_normalized uses the same normalization as the categorizer (uppercase, strip vendor codes, strip geo, collapse spaces, drop dates).
- The 1-day window absorbs date vs post_date mismatches between two sources.
- amount_cents is signed. Matching on abs(amount_cents) would let a refund pair with the original purchase; with the signed amount, refunds (opposite sign) are never duplicates of purchases.
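The composite key can be sketched in Python as follows. The categorizer's actual normalization rules are not given here, so the regular expressions below are illustrative assumptions (vendor codes as `*`- or `#`-prefixed tokens, dates as MM/DD, geo as a trailing two-letter state code), and `normalize_description` and `dedupe_key` are hypothetical names.

```python
import re


def normalize_description(raw: str) -> str:
    """Hypothetical stand-in for the categorizer's normalization."""
    text = raw.upper()
    # Vendor codes: assumed to be tokens introduced by '*' or '#'.
    text = re.sub(r"[*#]\S+", " ", text)
    # Dates: assumed to look like MM/DD or MM/DD/YY(YY).
    text = re.sub(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b", " ", text)
    # Geo: assumed to be a trailing two-letter state code.
    text = re.sub(r"\s+[A-Z]{2}$", "", text.strip())
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def dedupe_key(tx: dict) -> tuple:
    """Exact-match part of the key; the date window is checked separately."""
    return (
        tx["account_id"],
        tx["amount_cents"],  # signed, so a refund never shares a key with its purchase
        normalize_description(tx["description_raw"]),
    )
```

Because the amount stays signed, a purchase of -4500 and its refund of +4500 land in different buckets even when every other field matches.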
Dedupe Progress:
- [ ] Step 1: Index existing transactions by (account_id, signed_amount, normalized_desc)
- [ ] Step 2: For each incoming, look up the index
- [ ] Step 3: Filter index hits by date proximity (≤ 1 day)
- [ ] Step 4: If no hit, mark as new
- [ ] Step 5: If exactly one hit, mark as duplicate of that id
- [ ] Step 6: If multiple hits, run the multi-instance same-day rule
- [ ] Step 7: Surface near-duplicates (different amount or desc) for review
Build existing_by_key[(account_id, amount_cents, description_normalized)] = [tx, …].
For each incoming transaction, compute its key tuple and look up the bucket.
For each candidate in the bucket, keep only those with |incoming.date − candidate.date| ≤ 1 day. Use min(date, post_date) on each side if post_date exists.
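The index, lookup, and date filter described above might look like this in Python. This is a minimal sketch that assumes dates are ISO strings (`YYYY-MM-DD`) and that each record already carries a precomputed `description_normalized` field; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def key_of(tx: dict) -> tuple:
    # Assumes description_normalized was computed upstream.
    return (tx["account_id"], tx["amount_cents"], tx["description_normalized"])


def effective_date(tx: dict) -> date:
    """min(date, post_date) when post_date exists, else date."""
    d = date.fromisoformat(tx["date"])
    if tx.get("post_date"):
        d = min(d, date.fromisoformat(tx["post_date"]))
    return d


def build_index(existing: list) -> dict:
    """Group existing transactions into buckets by their exact-match key."""
    index = defaultdict(list)
    for tx in existing:
        index[key_of(tx)].append(tx)
    return index


def candidates(incoming_tx: dict, index: dict) -> list:
    """Bucket hits whose effective date is within 1 day of the incoming one."""
    bucket = index.get(key_of(incoming_tx), [])
    d = effective_date(incoming_tx)
    return [c for c in bucket if abs((d - effective_date(c)).days) <= 1]
```

Bucketing by the exact-match fields first keeps the date comparison limited to a handful of candidates per incoming transaction.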
If no candidate survives the date filter, mark decision: "new". The bookkeeper will append it to transactions.json.
If exactly one candidate survives, mark decision: "duplicate" and link duplicate_of: <existing_id>. Do not import.
When the existing store already has N transactions with the identical key on the same day, and the incoming batch contains M transactions with the same key on that day:
- M ≤ N → all incoming are considered duplicates of existing ones (1:1 pairing in date order).
- M > N → the first N incoming are duplicates; the remaining M − N are new transactions (legitimate same-day repeat charges, e.g., two coffees, gas-station pre-auth + final).

This rule preserves real repeat charges while still suppressing overlap-import duplicates.
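The M-versus-N pairing can be sketched as below. It assumes both groups have already been narrowed to the same key and day and that `date` is an ISO string; `pair_same_day` is a hypothetical helper name.

```python
def pair_same_day(incoming_group: list, existing_group: list) -> list:
    """Pair M incoming with N existing transactions that share one key and day.

    The first min(M, N) incoming are duplicates, paired 1:1 in date order;
    any remaining incoming are new.
    """
    # ISO date strings sort chronologically; sorted() is stable for ties.
    incoming_sorted = sorted(incoming_group, key=lambda tx: tx["date"])
    existing_sorted = sorted(existing_group, key=lambda tx: tx["date"])
    decisions = []
    for i, _tx in enumerate(incoming_sorted):
        if i < len(existing_sorted):
            decisions.append({
                "decision": "duplicate",
                "duplicate_of": existing_sorted[i]["id"],
            })
        else:
            decisions.append({"decision": "new"})
    return decisions
```

With two coffees already stored and three in the incoming batch, the first two incoming are suppressed and the third is committed as new.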
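A near-duplicate shares everything except amount or description and is within 1 day. These commonly arise when, for example, a pending charge is later replaced by its finalized posting with an adjusted amount or a fuller merchant description.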
Emit these to review[] with both records side-by-side and a suggested action: keep_incoming_drop_existing | keep_existing_drop_incoming | keep_both | merge.
Compute a similarity score on near-misses as a weighted sum of the amount, description, and date similarities, each in the range 0 to 1:
near_dup_score = 0.4*amount + 0.4*description + 0.2*date.
Surface for review when 0.7 ≤ near_dup_score < 0.95. A score of 0.95 or above is treated as a duplicate; below 0.7 is treated as independent.
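A minimal Python sketch of the score and its thresholds follows. Only the weights and cut-offs come from the text. The three component similarities are not defined there, so the versions below are assumptions chosen for illustration: relative amount difference, `difflib` string ratio, and a linear date falloff. Treating a score of exactly 0.95 as a duplicate is this sketch's reading of the boundary.

```python
from datetime import date
from difflib import SequenceMatcher


def amount_similarity(a_cents: int, b_cents: int) -> float:
    """Assumed definition: 1 minus the relative difference; 0 for opposite signs."""
    if a_cents == b_cents:
        return 1.0
    if (a_cents < 0) != (b_cents < 0):
        return 0.0
    return 1.0 - abs(a_cents - b_cents) / max(abs(a_cents), abs(b_cents))


def description_similarity(a: str, b: str) -> float:
    """Assumed definition: difflib's similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def date_similarity(a: date, b: date) -> float:
    """Assumed definition: 1.0 same day, 0.5 one day apart, 0.0 beyond that."""
    return max(0.0, 1.0 - abs((a - b).days) / 2)


def near_dup_score(amount: float, description: float, date_sim: float) -> float:
    # Weights taken from the formula in the text.
    return 0.4 * amount + 0.4 * description + 0.2 * date_sim


def classify(score: float) -> str:
    if score >= 0.95:
        return "duplicate"
    if score >= 0.7:
        return "review"
    return "independent"
```

Since amount and description each carry 0.4 of the weight, a pair that matches exactly on one of them and on the date still needs a reasonably close match on the other to reach the review band.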
{
  "new": [
    { "id": "tx_20260115_017", "decision": "new" }
  ],
  "duplicates": [
    {
      "incoming_index": 4,
      "decision": "duplicate",
      "duplicate_of": "tx_20251220_003",
      "reason": "exact key match within 1 day window"
    }
  ],
  "review": [
    {
      "incoming_index": 12,
      "matched_existing_id": "tx_20260108_005",
      "near_dup_score": 0.86,
      "diff": {
        "amount_cents": [-4500, -4583],
        "description_raw": ["AMAZON PENDING", "AMZN MKTP US*AB12CD"]
      },
      "suggested_action": "keep_incoming_drop_existing",
      "rationale": "incoming is the finalized charge (post_date set, definite merchant code)"
    }
  ],
  "summary": {
    "incoming_total": 142,
    "new_count": 96,
    "duplicate_count": 44,
    "review_count": 2
  }
}
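The summary block can be derived from the three lists, as in the Python sketch below. It assumes every incoming transaction lands in exactly one of new, duplicates, or review, which the example counts suggest (96 + 44 + 2 = 142) but the text does not state outright; `summarize` is a hypothetical helper.

```python
def summarize(report: dict, incoming_total: int) -> dict:
    """Build the summary counts and check that they account for every incoming row."""
    summary = {
        "incoming_total": incoming_total,
        "new_count": len(report["new"]),
        "duplicate_count": len(report["duplicates"]),
        "review_count": len(report["review"]),
    }
    accounted = (
        summary["new_count"] + summary["duplicate_count"] + summary["review_count"]
    )
    # Assumption: each incoming transaction gets exactly one decision.
    if accounted != incoming_total:
        raise ValueError(
            f"{accounted} decisions recorded for {incoming_total} incoming transactions"
        )
    return summary
```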
- Show description_raw strings to the human; do not show the normalized form.
- … existing.
- Emit duplicate_of so the user can trace why a transaction did not appear in the new import.