skills/dedupe-against-corpus/SKILL.md
--- name: dedupe-against-corpus description: Checks a candidate seed against the existing substacker corpus (seeds, drafts, published) for exact duplicates (sha256 fingerprint) and near-duplicates (title Jaccard, first-200-word Jaccard, shared topic cluster). Exact match exits as SKIPPED. Near-match links via related_seeds rather than creating a duplicate. Use after topic tagging and density scoring, before writing the seed. Trigger keywords: dedupe, duplicate, already thought, near-match, relat
Related skills: Called by ingest-inbox-item step 4. Queried ad-hoc by search-corpus. Backlinks into matched seeds (the one place this skill writes outside new seeds).
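- Tier 1, exact sha256 fingerprint match: SKIPPED.
- Tier 2, normalized title Jaccard match: LINK candidate.
- Tier 3, first-200-word Jaccard match among seeds sharing ≥2 topic tags: LINK candidate.

Tier 2 and 3 candidates are unioned (up to 3 total LINK targets).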
Dedupe one candidate seed:
- [ ] Step 1: Grep all corpus/**/*.md frontmatter for fingerprint match
- [ ] Step 2: If exact match, return SKIPPED
- [ ] Step 3: Normalized title Jaccard against all existing seeds
- [ ] Step 4: For seeds sharing ≥2 topic tags, first-200-word Jaccard
- [ ] Step 5: Union tier-2 and tier-3 candidates, cap at 3
- [ ] Step 6: If any LINK candidates, return LINK with related_seeds list; else CREATE
- [ ] Step 7: For LINK, Edit matched seeds' related_seeds to add this candidate's id
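Steps 1 through 6 can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the seed dictionaries, the tokenizer, the two threshold values (0.5), and the rule of keeping the highest-scoring candidates when more than 3 qualify are all assumptions made for the example.

```python
import re

def normalize_tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def first_words(text, n=200):
    """Return the first n whitespace-separated words of text."""
    return " ".join(text.split()[:n])

def dedupe(candidate, corpus, title_threshold=0.5, body_threshold=0.5, max_links=3):
    """Classify a candidate seed as SKIPPED, LINK, or CREATE.

    Seeds are hypothetical dicts with keys: id, fingerprint, title, topics, body.
    The thresholds are illustrative placeholders, not values from the skill.
    """
    # Steps 1-2: an exact fingerprint match short-circuits everything else.
    for seed in corpus:
        if seed["fingerprint"] == candidate["fingerprint"]:
            return {"action": "SKIPPED"}

    cand_title = normalize_tokens(candidate["title"])
    cand_body = normalize_tokens(first_words(candidate["body"]))
    cand_topics = set(candidate["topics"])
    scored = {}  # seed id -> best similarity seen in tier 2 or tier 3

    for seed in corpus:
        # Step 3 (tier 2): normalized title Jaccard against every seed.
        title_sim = jaccard(cand_title, normalize_tokens(seed["title"]))
        if title_sim >= title_threshold:
            scored[seed["id"]] = max(scored.get(seed["id"], 0.0), title_sim)
        # Step 4 (tier 3): body check only for seeds sharing >= 2 topic tags.
        if len(cand_topics & set(seed["topics"])) >= 2:
            body_sim = jaccard(cand_body, normalize_tokens(first_words(seed["body"])))
            if body_sim >= body_threshold:
                scored[seed["id"]] = max(scored.get(seed["id"], 0.0), body_sim)

    # Step 5: union of both tiers, capped (assumed: keep the highest scores).
    links = sorted(scored, key=lambda seed_id: (-scored[seed_id], seed_id))[:max_links]

    # Step 6: LINK when anything matched, otherwise CREATE.
    if links:
        return {"action": "LINK", "related_seeds": links}
    return {"action": "CREATE"}
```

Step 7, the backlink write, is left out here because it touches files on disk and is subject to the write rules that follow.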
The matched seeds get their related_seeds field extended (append, never replace) ONLY IF:
manual_edits: false on the matched seed, OR

Do not touch any other field on the matched seed.
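The append-only rule could look something like the sketch below. It assumes the matched seed's frontmatter has already been parsed into a dictionary with a flat `related_seeds` key, and it models only the `manual_edits: false` condition. The helper name `append_related_seed` is hypothetical, and treating a missing `manual_edits` flag as "do not write" is a conservative assumption rather than documented behaviour.

```python
def append_related_seed(frontmatter, new_seed_id):
    """Return frontmatter with new_seed_id appended to related_seeds, if allowed.

    The input dict is never mutated. When the write is not permitted, or the id
    is already present, the original dict is returned unchanged.
    """
    # Only write when manual_edits is explicitly False (assumed conservative default).
    if frontmatter.get("manual_edits", True) is not False:
        return frontmatter

    existing = list(frontmatter.get("related_seeds", []))
    if new_seed_id in existing:
        return frontmatter  # already linked; appending again would duplicate

    updated = dict(frontmatter)  # shallow copy keeps every other field as-is
    updated["related_seeds"] = existing + [new_seed_id]  # append, never replace
    return updated
```

Returning a copy rather than mutating in place makes it easy to compare the before and after dictionaries and confirm that `related_seeds` is the only field that changed.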
Candidate: new seed about "dropout as ensemble" with topics [regularization, ensembling, dropout], first 200 words describing thinned-network averaging.
Existing corpus:
- 2026-03-11-l2-as-gaussian-prior: topics [regularization, bayesian]. Title Jaccard = 0.1 (different). Shared tags: 1. Skip content-similarity check.
- 2026-02-08-bagging-in-deep-nets: topics [regularization, ensembling]. Shared tags: 2. First-200-word Jaccard: 0.51 → LINK candidate.

Output: {action: LINK, related_seeds: [2026-02-08-bagging-in-deep-nets]}.
Side effect: corpus/seeds/2026-02-08-bagging-in-deep-nets.md gets its related_seeds appended with the new seed's id.
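The shared-tag gate in this example can be checked with a few lines of set arithmetic. The topic lists are taken from the example; the function name and the dictionary layout are illustrative assumptions.

```python
# Topic tags from the worked example.
candidate_topics = {"regularization", "ensembling", "dropout"}
corpus_topics = {
    "2026-03-11-l2-as-gaussian-prior": {"regularization", "bayesian"},
    "2026-02-08-bagging-in-deep-nets": {"regularization", "ensembling"},
}

def shared_tag_counts(candidate, corpus):
    """Count topic tags each existing seed shares with the candidate."""
    return {seed_id: len(candidate & topics) for seed_id, topics in corpus.items()}

counts = shared_tag_counts(candidate_topics, corpus_topics)

# Only seeds sharing at least 2 tags proceed to the first-200-word comparison.
eligible = [seed_id for seed_id, n in counts.items() if n >= 2]
print(counts)
print(eligible)
```

Only the bagging seed clears the two-tag gate, which is why it is the only seed that gets a first-200-word comparison.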
- links.related_seeds field.
- corpus/dead/ (dead ideas stay dead).
- status: published to status: seed (published posts are immutable except for typo fixes).
- SKIPPED, LINK, CREATE.