skills/backtest/SKILL.md
Run backtests for prediction models (UFC, sports betting). Ensures visible output via tee, compares against baseline accuracy, and commits improvements with structured messages. Enforces walk-forward integrity, overfitting awareness, and future predictive accuracy as the
npx skillsauth add nhouseholder/nicks-claude-code-superpowers backtestInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Standardized workflow for running and evaluating prediction model backtests.
Every backtest exists to answer ONE question: Will this algorithm make money on FUTURE bets?
Historical performance is a tool, not the goal. Before running any backtest, ask:
Check project memory for a Sports Backtesting Protocol. If one exists, follow it. If not, use these defaults: walk-forward validation, no future data leakage, compare against baseline accuracy, output results with | tee to both stdout and log file.
After every backtest, run this quick check:
Before running ANY backtest or algorithm experiment, answer these 3 questions on paper first:
Most parameter changes are arithmetic. Before running a full backtest to test "does X% penalty flip picks?":
If arithmetic predicts the null result → don't run the backtest. Report the math.
If math alone can't answer it, write a standalone script that:
If isolation can answer it → don't run the full pipeline.
State your ONE approach before executing. If you find yourself thinking "let me try a different way" mid-run:
Approach changes mid-backtest = wasted tokens. Three pivots = anti-pattern. Lock your approach first.
python backtest.py | tee backtest_results.log
CRITICAL: Always use | tee <logfile> for visible output. NEVER redirect stdout to /dev/null or suppress output for scripts that need monitoring.
If the backtest script is in a different location or has arguments:
python <script_path> [args] 2>&1 | tee backtest_results_$(date +%Y%m%d_%H%M%S).log
Accuracy: 67.3% → 69.1% (+1.8%)
ROI: +4.2% → +5.7% (+1.5%)
If metrics improved:
git add <changed_files>
git commit -m "backtest: vX.XX +Y.Y% accuracy (+Z.Z% delta)"
If metrics declined:
logs/ directory or project rootBefore committing, validate the improvement is real:
# Run on a holdout time window the algorithm hasn't seen
python backtest.py --start-date HOLDOUT_START --end-date HOLDOUT_END | tee holdout_results.log
These are enforced globally via CLAUDE.md. See those sections for full details:
tools
Unified context management and session continuity skill. Combines total-recall, strategic-compact, /ledger, and session continuity. Runs in background to preserve critical context across compaction and sessions.
tools
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
tools
Suggest /ultraplan for complex planning tasks on Claude Code CLI (2.1.91+ only). Research preview.
tools
UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 9 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient. Integrations: shadcn/ui MCP for component search and examples.