skills/tabular/transitive-match-closure/SKILL.md
Post-processes entity match predictions to enforce symmetry (A→B implies B→A) and transitivity (A→B, B→C implies A→C) via graph closure.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills tabular-transitive-match-closure
Binary entity-matching classifiers score each pair independently, so the predicted match sets can be inconsistent: A may match B without B matching A, or A may match B and B match C while A does not match C. Transitive match closure fixes this by first enforcing symmetry (bidirectional links), then propagating matches until every id matches all other ids in its connected component. Because closure only adds links, it cannot lower recall and typically raises it in entity deduplication and record linkage tasks; the trade-off is that a single false-positive link can merge two unrelated groups and reduce precision.
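To make the inconsistency concrete, here is a minimal sketch that lists the links a raw prediction set is missing. The helper name `find_violations` and the toy ids are hypothetical, chosen only for illustration; the sketch assumes predictions are stored as a dict mapping each id to a list of matched ids.

```python
def find_violations(id2matches):
    """Return (asymmetric, non_transitive) links missing from the predictions.

    Hypothetical helper for illustration; assumes a dict of id -> list of ids.
    """
    asymmetric = []      # (a, b): a lists b, but b does not list a
    non_transitive = []  # (a, c): a lists b and b lists c, but a does not list c
    for a, matches in id2matches.items():
        for b in matches:
            if a not in id2matches.get(b, []):
                asymmetric.append((a, b))
            for c in id2matches.get(b, []):
                if c != a and c not in matches:
                    non_transitive.append((a, c))
    return asymmetric, non_transitive


# Toy predictions: A -> B and B -> C, with no reverse links.
preds = {"A": ["B"], "B": ["C"], "C": []}
asymmetric, non_transitive = find_violations(preds)
print(asymmetric)      # [('A', 'B'), ('B', 'C')]
print(non_transitive)  # [('A', 'C')]
```

After closure, both lists should be empty for the same ids, which makes this a convenient sanity check on the post-processed output.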
def symmetric_closure(id2matches):
    """If A matches B, ensure B matches A."""
    for base, matches in list(id2matches.items()):
        for m in matches:
            if base not in id2matches.get(m, []):
                # Default to an empty list so base is added exactly once.
                id2matches.setdefault(m, []).append(base)
    return id2matches


def transitive_closure(id2matches):
    """If A matches B and B matches C, add C to A's matches."""
    changed = True
    while changed:  # repeat until no match list grows
        changed = False
        for base, matches in list(id2matches.items()):
            # Ignore a self-match so the size comparison below is fair.
            current = set(matches) - {base}
            expanded = set(current)
            for m in current:
                expanded.update(id2matches.get(m, []))
            expanded.discard(base)  # an id never lists itself
            if len(expanded) > len(current):
                id2matches[base] = list(expanded)
                changed = True
    return id2matches


# Apply to predictions.
# df is assumed to be a pandas DataFrame with an "id" column and a
# space-separated "matches" column.
id2matches = dict(zip(df["id"], df["matches"].str.split()))
id2matches = symmetric_closure(id2matches)
id2matches = transitive_closure(id2matches)
df["matches"] = df["id"].map(lambda x: " ".join(id2matches[x]))
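Symmetric plus transitive closure makes every id match all other ids in its connected component, so the same result can be computed in a single union-find (disjoint-set) pass instead of iterating to a fixed point. The sketch below is an alternative based on that observation, not the skill's own implementation. `closure_by_components` is a hypothetical name, and the code assumes ids are hashable and sortable (for example, strings).

```python
from collections import defaultdict


def closure_by_components(id2matches):
    """Symmetric + transitive closure via union-find (hypothetical helper).

    Returns a new dict mapping every id to the sorted list of all other ids
    in its connected component; the input dict is not modified.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Treat every predicted match as an undirected edge.
    for base, matches in id2matches.items():
        find(base)  # register ids that have no matches
        for m in matches:
            root_a, root_b = find(base), find(m)
            if root_a != root_b:
                parent[root_a] = root_b

    # Group ids by component root.
    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)

    return {
        node: sorted(members - {node})
        for members in components.values()
        for node in members
    }


preds = {"A": ["B"], "B": ["C"], "C": [], "D": []}
closed = closure_by_components(preds)
print(closed["A"])  # ['B', 'C']
print(closed["D"])  # []
```

Ids that appear only inside a match list, and never as a key, are also registered and receive their own entry in the result. On large prediction sets this approach avoids repeatedly rescanning every match list until nothing changes.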