.claude/skills/data-flow-audit/SKILL.md
Detect split data source anti-patterns and scattered business rule duplication where the same logic is reimplemented across multiple files and languages. Catches semantic duplication that syntactic tools like jscpd miss. Use at phase checkpoints or when investigating data consistency issues.
npx skillsauth add benjaminshoemaker/ai_coding_project_base data-flow-auditInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Detect cases where the same business concept is served through multiple API endpoints with independently maintained logic — the "split data source" anti-pattern — and where the same business rule (filter predicates, conditions, thresholds) is scattered across multiple files and languages.
When the same data reaches the UI through two different code paths, subtle inconsistencies emerge:
When the same business rule is reimplemented across SQL, API routes, shared libs, and Trigger.dev jobs:
Copy this checklist and track progress:
Data Flow Audit Progress:
- [ ] Step 0: Validate known rules from MEMORY.md
- [ ] Step 1: Map all API routes, jobs, and data sources
- [ ] Step 2: Detect overlapping SQL/table references
- [ ] Step 2.5: Detect scattered filter predicates (fingerprint matching)
- [ ] Step 3: Detect duplicated types and helpers across routes
- [ ] Step 4: Detect cross-layer formula and constant divergence
- [ ] Step 5: Map frontend consumers to endpoints
- [ ] Step 6: Detect test mock divergence
- [ ] Step 7: Score and report findings
Before running the full audit, check for documented business rules that already have tracked locations.
Search the project's MEMORY.md for patterns that indicate tracked business rules:
For each documented rule, extract: rule name, conditions/fingerprint, canonical location, documented locations, documented exceptions.
For each documented rule:
Use fingerprint-grep (see Step 2.5 methodology in DETECTION_PATTERNS.md) to search for the rule's column names across the full scan scope. Compare discovered set against documented set to find new, removed, or drifted locations.
KNOWN RULE VALIDATION
---------------------
Rule: {rule name} ({N} conditions)
Canonical: {canonical location}
Documented locations: {N} | Actual: {N} | Status: {ALIGNED | DRIFTED | LOCATIONS_CHANGED}
✓ {file1} — conditions match ({implementation style})
✗ {file3} — MISSING condition: {condition name}
+ {new_file} — UNDOCUMENTED location ({N}/{total} conditions found)
- {removed_file} — REMOVED (no longer contains rule conditions)
⊘ {exception_file} — documented skip ({reason})
Severity: CRITICAL if canonical drifted, HIGH if documented locations missing conditions, MEDIUM if location count changed but conditions consistent.
Build an inventory of every API route, Trigger.dev job, and shared module that touches data.
# Next.js API routes (pages router)
find src/pages/api -name "*.ts" -not -name "*.test.*"
# Next.js API routes (app router)
find src/app/api -name "route.ts"
# Trigger.dev jobs
find trigger/jobs -name "*.ts" -not -name "*.test.*" 2>/dev/null
# Shared lib modules with data fetching
grep -rl "\.from(\|\.rpc(" src/lib/ --include="*.ts" | grep -v test
Record: SQL tables queried (.from()), RPC functions called (.rpc()), shared service imports, response shape.
Group routes serving the same business domain. Heuristics: same URL prefix, same tables/RPCs, >50% response field overlap.
Check if list/detail endpoints, dashboard/overview endpoints, or Trigger.dev jobs compute overlapping data independently. BFF pairs are prime candidates for split data sources.
grep -rn "\.from(" src/pages/api/ src/lib/ trigger/jobs/ --include="*.ts" | grep -v test
Flag when two different route files query the same table with different filters, columns, or joins.
grep -rn "\.rpc(" src/pages/api/ src/lib/ trigger/jobs/ --include="*.ts" | grep -v test
Flag when two routes call the same RPC, or two RPCs compute overlapping metrics.
Read SQL definitions in supabase/migrations/ for any RPCs found. Flag when two SQL functions compute the same metric independently (HIGH) vs. one delegating to another (MEDIUM).
An aggregation cascade occurs when an overview endpoint computes SUM(metric) using its own formula while per-item endpoints compute that metric individually with a different formula. If the per-item formula changes, the aggregate won't match the sum of its parts.
Severity: CRITICAL if aggregate and per-item formulas produce inconsistent results. Severity: HIGH if formulas are consistent but independently maintained.
See DETECTION_PATTERNS.md for the full fingerprint matching methodology.
Read DETECTION_PATTERNS.md for the full detection procedure. The summary below is for quick reference only — always defer to the full document.
Summary: Extract filter predicates from all files in scope, build column-name fingerprints per file, cluster files sharing 3+ field names, validate candidates (eliminate type defs, SELECT lists, comments), and assess severity based on location count and language spread.
See DETECTION_PATTERNS.md for detailed detection commands.
Summary: Find type definitions with >60% field overlap across route files (HIGH). Find utility functions duplicated across routes (HIGH).
Check for routes that bypass shared src/lib/ modules by querying tables directly (HIGH — "library drift").
Check for SQL functions that inline conditions already encapsulated by shared SQL helpers.
See DETECTION_PATTERNS.md for detailed detection commands.
Summary: Find repeated magic numbers/strings across SQL and TS layers. Verify SQL ↔ JS formula consistency (CRITICAL if different values, HIGH if independently maintained). Detect cross-layer formula duplication: SQL + API route, SQL + frontend, API route + frontend, SQL + Trigger.dev job.
See DETECTION_PATTERNS.md for detailed commands.
Summary: Find all fetch() and useSWR/useQuery calls in components. Build consumer map (route → components). Flag same-page different-endpoints (CRITICAL), list-vs-detail divergence (HIGH), and missing callback patterns.
See DETECTION_PATTERNS.md for detailed methodology.
Summary: Collect test files for each route group. Compare mock data shapes for type divergence (field names), value representation (number vs string, snake_case vs camelCase), and endpoint divergence. Severity: MEDIUM — test mock divergence is a symptom, useful as an early-warning signal.
| Severity | Meaning | |----------|---------| | CRITICAL | Same metric, different values possible in UI; formula mismatch; 6+ scattered locations across 2+ languages | | HIGH | Independent implementations that will drift; library bypass; 3-5 scattered locations | | MEDIUM | Structural risk, currently consistent; test mock divergence; 2-3 same-language locations | | LOW | Minor duplication, low divergence risk; intentional bypass with justification |
DATA FLOW AUDIT REPORT
======================
Scanned: {timestamp}
API routes analyzed: {N} | Jobs: {N} | SQL functions: {N} | Shared libs: {N}
KNOWN RULE VALIDATION — {per-rule status from Step 0}
SCATTERED FILTER PREDICATES — {fingerprint clusters from Step 2.5}
SPLIT DATA SOURCE FINDINGS
---------------------------
Finding 1: {Business concept} ({severity})
Endpoints: {route1}, {route2}
Signals: BFF pair | Aggregation cascade | Cross-layer formula | Library bypass | Mock divergence
Risk: {what could diverge}
Recommendation: {extract service module | designate authority | use callback pattern}
ROUTE MAP — {full table from Step 1}
CONSUMER MAP — {full table from Step 5}
SUMMARY
Known rules: {N} aligned, {N} drifted
Scattered predicates: {N} clusters
Split sources: {N} critical, {N} high, {N} medium, {N} low
Status: PASSED | PASSED WITH NOTES | FAILED
| Result | Condition | |--------|-----------| | PASSED | No split data source patterns found and all known rules aligned | | PASSED WITH NOTES | Only MEDIUM/LOW findings and known rules have minor location changes | | FAILED | Any CRITICAL or HIGH finding, or known rule conditions have drifted |
When a split data source or scattered business rule is found:
src/lib/{domain}Service.ts with shared types, helpers, and computation| Situation | Action |
|-----------|--------|
| Project has no API routes | Report "No API route handlers found" and mark audit as NOT APPLICABLE |
| Route file is too complex to parse (>500 lines) | Flag the file as tech debt, parse what you can, note "Partial analysis" |
| No frontend consumers found for a route | Note as informational, check trigger/ or jobs/ for server-side consumers |
| Audit finds >5 split data source patterns | Present the top 3 by severity, summarize remaining, suggest /add-todo for tracking |
| MEMORY.md has no documented rules (Step 0) | Skip known rule validation, rely on Step 2.5 fingerprint matching |
After generating the audit report, verify:
fetch() callstesting
Audit project alignment with VISION.md, identify SDLC gaps, and generate feature proposals. Use when reviewing strategic direction or planning new features.
development
Run code-verification on a specific task. Use to verify a single task's acceptance criteria after implementation.
testing
Resolve Vercel preview deployment URL for the current git branch. Invoked by browser-verification when deployment.enabled is true, or directly to check deployment status. Use to check deployment status or when browser verification needs a URL.
tools
Discover and sync all toolkit-using projects with the latest skills. Use when skills are modified, after the post-commit hook reminds you, or to batch-sync multiple projects.