skills/backtestor-quality-control/SKILL.md
---
name: backtestor-quality-control
description: "Quality control audit for backtestors across sports projects — enforces walk-forward integrity, real Vegas odds, correct unit math, complete bet type coverage, data caching, and single canonical backtestor per sport. Fires when building, modifying, auditing, or running any backtestor."
weight: light
triggers:
- building or modifying any backtestor
- "audit the backtestor", "check the backtestor"
- running any backtest
- creating a new sp
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add nhouseholder/nicks-claude-code-superpowers skills/backtestor-quality-control`
Every backtestor across every sport must pass this audit. No exceptions. No shortcuts. No confusion.
AI agents have repeatedly:
This skill exists to make these failures impossible.
Each sport project has exactly ONE canonical backtestor file. All others must be archived or deleted.
Before running or modifying any backtestor:
- Archive every non-canonical file (`.ARCHIVED.py` with header)
- Identify the canonical file (`backtest.py`, `run_backtest.py`)
- Confirm the canonical file carries the header `# CANONICAL BACKTESTOR — do NOT create alternative versions`
- `backtest_v2.py`, `backtest_new.py`, `backtest_fixed.py` → consolidate immediately
- Backtestor in `/tmp/` or a worktree → wrong location, run from project root

For every prediction being evaluated, the model may ONLY use data available BEFORE that event.
For each event/game being scored:
✓ Stats computed using ONLY prior events (cutoff_date or before_event parameter)
✓ Rolling/expanding windows EXCLUDE the current game
✓ Season averages do NOT include the game being predicted
✓ Odds sourced are the odds that WERE available, not post-event data
✓ Injuries/lineups reflect what was KNOWN before the event
✓ No winner bias (using full-season stats that include the outcome)
- Search the code for `.mean()`, `.avg()`, `.rolling()`, and other aggregate functions
- Verify each one is bounded by a `cutoff_date` or `before_event` parameter

Using full-season averages (which include the game being predicted) inflates accuracy by 10-20% and makes the entire backtest worthless. This is not a minor issue; it invalidates the backtest completely.
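As a minimal sketch of the temporal filter described above: the example assumes each historical game is a dict with hypothetical `date` and `points` keys. Only the `cutoff_date` parameter name comes from the checklist; the function name and data layout are illustrative.

```python
from datetime import date


def prior_average(games, cutoff_date, key="points"):
    """Average `key` over games played strictly BEFORE cutoff_date."""
    prior = [g[key] for g in games if g["date"] < cutoff_date]
    if not prior:
        return None  # no history yet: skip rather than peek at the current game
    return sum(prior) / len(prior)


games = [
    {"date": date(2024, 1, 1), "points": 100},
    {"date": date(2024, 1, 8), "points": 110},
    {"date": date(2024, 1, 15), "points": 150},  # the game being predicted
]

# Walk-forward: only the two earlier games feed the feature.
leak_free = prior_average(games, cutoff_date=date(2024, 1, 15))

# Leaky: the "season average" includes the game being predicted.
leaky = sum(g["points"] for g in games) / len(games)
```

Here `leak_free` is 105.0 while `leaky` is 120.0. The gap between the two is exactly the information that would not have been available before the event.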
Every bet's payout must use REAL sportsbook odds. Never +1u for a win. Never assumed odds.
# Positive American odds (e.g., +150)
profit = stake * (odds / 100) # 1u at +150 = +1.50u
# Negative American odds (e.g., -200)
profit = stake * (100 / abs(odds)) # 1u at -200 = +0.50u
# Loss is ALWAYS -1u per bet (stake lost)
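The two formulas above can be folded into one helper. This is a sketch rather than the project's actual function: the name `bet_profit` and its signature are assumptions made for illustration.

```python
def bet_profit(won: bool, odds: int, stake: float = 1.0) -> float:
    """Profit in units for a single bet priced with American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if not won:
        return -stake  # a loss always costs the full stake
    if odds > 0:
        return stake * odds / 100  # e.g. +150 pays 1.50u per 1u staked
    return stake * 100 / abs(odds)  # e.g. -200 pays 0.50u per 1u staked
```

For a 1u stake, `bet_profit(True, 150)` gives 1.5, `bet_profit(True, -200)` gives 0.5, and any losing bet gives -1.0.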
- If odds are missing, mark the bet `__NO_ODDS__`; do NOT substitute +100 or any default

# Convert each leg to decimal odds
decimal = (odds / 100) + 1 # for positive
decimal = (100 / abs(odds)) + 1 # for negative
# Multiply all legs
parlay_decimal = leg1_decimal * leg2_decimal * ... * legN_decimal
# Profit on 1u
parlay_profit = parlay_decimal - 1
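The parlay steps above translate into runnable form roughly as follows. The function names are hypothetical, and the sketch assumes every leg already has real American odds.

```python
from math import prod


def american_to_decimal(odds):
    """Convert American odds to decimal odds."""
    if odds > 0:
        return odds / 100 + 1
    return 100 / abs(odds) + 1


def parlay_profit(leg_odds, stake=1.0):
    """Profit in units if ALL legs win; a losing parlay is simply -stake."""
    parlay_decimal = prod(american_to_decimal(o) for o in leg_odds)
    return stake * (parlay_decimal - 1)


# Two legs at +150 and -200: 2.5 * 1.5 = 3.75 decimal, so 2.75u profit on 1u.
profit = parlay_profit([150, -200])
```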
The backtestor must handle ALL bet types the algorithm generates predictions for. Missing a bet type = incomplete results.
| Bet Type | What It Is | Win Condition | Loss = |
|----------|-----------|---------------|--------|
| ML | Moneyline (who wins) | Predicted fighter wins | -1u |
| Method | ML + finish method | Fighter wins by predicted method (KO/TKO, SUB, DEC) | -1u |
| Round | ML + finish round | Fighter wins in predicted round | -1u |
| Combo | ML + method + round | Fighter wins by predicted method in predicted round | -1u |
| Parlay | Multi-fight combined bet | ALL legs win | -1u |
Key rules for UFC:
Define the equivalent bet types at project creation:
For each event in backtest:
✓ Every applicable bet type has a result (W/L/skip)
✓ Every W has real odds and correct payout
✓ Every L is exactly -1u
✓ Skipped bets are explicitly marked (not silently dropped)
✓ Parlay results reflect ALL legs correctly
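One way to automate the per-event checks above is a small validator. This is a sketch under assumptions: the record layout (`bet_type`, `result`, `odds`, `profit` keys) and the function name `audit_event` are hypothetical, and the payout-correctness and parlay-leg checks are left out for brevity.

```python
NO_ODDS = "__NO_ODDS__"
VALID_RESULTS = {"W", "L", "skip"}


def audit_event(bets, expected_types):
    """Return a list of human-readable failures for one event's bet records."""
    failures = []
    seen = {b["bet_type"] for b in bets}
    # A bet type with no record at all was silently dropped.
    for missing in sorted(set(expected_types) - seen):
        failures.append(f"{missing}: no result recorded")
    for b in bets:
        label = b["bet_type"]
        if b["result"] not in VALID_RESULTS:
            failures.append(f"{label}: unknown result {b['result']!r}")
        elif b["result"] == "W" and not isinstance(b.get("odds"), (int, float)):
            failures.append(f"{label}: win recorded without real odds")
        elif b["result"] == "L" and b.get("profit") != -1.0:
            failures.append(f"{label}: loss is not exactly -1u")
    return failures


bets = [
    {"bet_type": "ML", "result": "W", "odds": 150, "profit": 1.5},
    {"bet_type": "Method", "result": "L", "odds": 300, "profit": -0.5},
    {"bet_type": "Round", "result": "W", "odds": NO_ODDS, "profit": 1.0},
]
failures = audit_event(bets, ["ML", "Method", "Round", "Combo", "Parlay"])
```

In this example `failures` has four entries: `Combo` and `Parlay` were never recorded, the `Method` loss is not -1u, and the `Round` win has no real odds.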
All scraped data must be cached locally AND committed to GitHub. A full backtest should take seconds (reading cached data), not hours (re-scraping).
<sport>_odds_cache.json — Historical odds for all events/games
<sport>_stats_cache.json — Player/team statistics
<sport>_game_data_cache.json — Game results, scores, play-by-play
<sport>_injuries_cache.json — Historical injury data (if used)
<sport>_weather_cache.json — Weather data (if applicable)
def get_data(event_id, cache_file):
    # 1. Check cache first
    cache = load_cache(cache_file)
    if event_id in cache:
        return cache[event_id]  # Cache hit: no scraping needed
    # 2. Only scrape if genuinely new
    data = scrape_from_source(event_id)
    # 3. Save to cache immediately
    cache[event_id] = data
    save_cache(cache_file, cache)
    # 4. Return fresh data
    return data
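`get_data` relies on `load_cache` and `save_cache`, which are not defined here. A minimal sketch of the two helpers, assuming the cache is a single JSON object keyed by event ID, could look like this. The atomic-write step is a design choice added for illustration, not something the source specifies.

```python
import json
import os
import tempfile


def load_cache(cache_file):
    """Return the cached dict, or an empty dict if the file does not exist yet."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file, "r", encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache_file, cache):
    """Write the cache via a temp file so a crash cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(cache_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2, sort_keys=True)
    os.replace(tmp_path, cache_file)  # atomic rename over the old file
```

JSON object keys are always strings, so with this layout `event_id` should be a string; an integer ID would be written as `"123"` and then miss on the next lookup.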
The backtest window starts at the sport-specific minimum and GROWS as new events occur. It never shrinks.
| Sport | Minimum | Type |
|-------|---------|------|
| UFC | 71 events | Growing (auto-increment after each event) |
| NHL | 3 seasons | Rolling |
| NBA | 3 seasons | Rolling |
| MLB | 3 seasons | Rolling |
| CBB | 3 seasons | Rolling |
# After scoring a new event:
1. Score the event using the current model
2. Record results (W/L per bet type, odds, payouts)
3. Add event to the backtest dataset
4. Update the cache with new event data
5. Increment event count
6. Regenerate summary statistics
7. Commit updated registry + cache to git
If the current backtest covers 75 events and a re-run produces results for only 71 → ABORT. This is a data regression. Restore from backup.
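The abort rule above can be enforced with a small guard that runs before results are written. This is a sketch: the exception class, function name, and parameters are hypothetical, and restoring from backup is left to the caller.

```python
class DataRegressionError(RuntimeError):
    """Raised when a re-run covers fewer events than the previous run."""


def check_no_regression(previous_count, new_count, minimum):
    """Return new_count if the window is valid, otherwise abort the run."""
    if new_count < minimum:
        raise DataRegressionError(
            f"{new_count} events is below the sport minimum of {minimum}"
        )
    if new_count < previous_count:
        raise DataRegressionError(
            f"window shrank from {previous_count} to {new_count} events"
        )
    return new_count
```

With the numbers from the example, `check_no_regression(75, 71, 71)` raises because the window shrank, while `check_no_regression(75, 76, 71)` returns 76.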
The backtestor can be paired with a parameter optimizer, but with strict guardrails:
Run this COMPLETE checklist when auditing any backtestor. Every item must pass.
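- Non-canonical files carry the `.ARCHIVED` suffix and header
- No backtestor lives in `/tmp/`, worktrees, or non-project directories
- No `.mean()` or aggregate without temporal filter
- Bets with missing odds are explicitly marked (`__NO_ODDS__`)

Creating a new backtestor → enforce all standards from the start. Use this checklist as the spec.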
"Audit the backtestor" or "check the backtest" → run the full checklist. Report every failure.
Before/after any backtest run → verify file hygiene, check results against data invariants.
Multiple backtestor files found → archive old ones, consolidate into canonical version.