nhouseholder/backtest

Name: backtest
Author: nhouseholder

skills/backtest/SKILL.md

npx skillsauth add nhouseholder/nicks-claude-code-superpowers backtest

Backtest Skill

Standardized workflow for running and evaluating prediction model backtests.

The #1 Rule: Future Predictive Accuracy

Every backtest exists to answer ONE question: Will this algorithm make money on FUTURE bets?

Historical performance is a tool, not the goal. Before running any backtest, ask:

What is the hypothesis? Why should this change improve future predictions?
Is this generalizable across time periods, or fitting noise?
Would a domain expert agree this factor matters?

Check project memory for a Sports Backtesting Protocol. If one exists, follow it. If not, use these defaults: walk-forward validation, no future data leakage, comparison against a baseline accuracy, and output piped through | tee so results reach both stdout and a log file.
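The walk-forward default can be sketched as an expanding-window split. This is a minimal illustration, not the skill's own implementation: it assumes events are already sorted oldest-first, and the names walk_forward_splits, min_train, and test_size are hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_events: int, min_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs over events sorted oldest-first.

    Each test block starts exactly where its training block ends, so the
    model is never fitted on an event that happened after the ones it predicts.
    """
    start = min_train
    while start + test_size <= n_events:
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Hypothetical sizes: 10 events, at least 4 to train on, 2 per test block.
splits = list(walk_forward_splits(n_events=10, min_train=4, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

The training window grows with every step, which mirrors how the model would actually be refitted as new events arrive.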

Overfitting Guard

After every backtest, run this quick check:

Suspiciously good? If accuracy jumped 5+ percentage points from a minor tweak → likely overfit
Stable across windows? Test on 2+ non-overlapping time periods
Robust to perturbation? Slightly different coefficients should give similar results
Explainable? If you can't explain WHY it works in domain terms → don't trust it
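The first three checks can be turned into a small helper; the fourth (explainability) stays a human judgment. This is a sketch under assumptions: the function name, the reading of "5%+" as percentage points, and the 3-point spread tolerance are not from the skill.

```python
from typing import List


def overfit_warnings(
    baseline_acc: float,
    new_acc: float,
    window_accs: List[float],
    perturbed_accs: List[float],
    jump_limit: float = 5.0,    # assumed: "5%+" read as percentage points
    spread_limit: float = 3.0,  # assumed tolerance, not specified by the skill
) -> List[str]:
    """Return a warning for each overfitting check that fails."""
    warnings = []
    if new_acc - baseline_acc >= jump_limit:
        warnings.append("suspicious jump")
    if len(window_accs) < 2:
        warnings.append("need 2+ non-overlapping windows")
    elif max(window_accs) - min(window_accs) > spread_limit:
        warnings.append("unstable across windows")
    if perturbed_accs and max(abs(a - new_acc) for a in perturbed_accs) > spread_limit:
        warnings.append("sensitive to perturbation")
    return warnings


# Hypothetical results: a modest gain that holds up on two windows.
print(overfit_warnings(67.3, 69.1, [68.5, 69.4], [68.8, 69.3]))
```

An empty list means none of the automated checks fired; it does not replace the domain explanation.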

Pre-Compute Gate (MANDATORY before any backtest run)

Before running ANY backtest or algorithm experiment, answer these 3 questions on paper first:

1. Can math answer this without running anything?

Most parameter changes are arithmetic. Before running a full backtest to test "does X% penalty flip picks?":

Read the current threshold value
Read the average/min/max values being penalized
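Calculate whether the penalty pushes the value across the threshold. If avg_diff × (1 - penalty) > threshold (with penalty as a fraction, e.g. 0.10 for 10%), the penalized average still clears the threshold, so the answer is NO and you just saved 30 minutes.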

If arithmetic predicts the null result → don't run the backtest. Report the math.
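The arithmetic above fits in a few lines. The numbers below are hypothetical, and checking the minimum margin as well as the average is an added assumption: if even the smallest margin survives the penalty, no pick can flip.

```python
def penalty_can_flip(diff: float, penalty: float, threshold: float) -> bool:
    """True if a pick with margin `diff` falls to or below `threshold` after the penalty.

    `penalty` is a fraction (0.10 means a 10% penalty).
    """
    return diff * (1.0 - penalty) <= threshold


# Hypothetical values read from the config and the cached data.
avg_diff, min_diff, threshold, penalty = 0.20, 0.12, 0.05, 0.10

avg_flips = penalty_can_flip(avg_diff, penalty, threshold)
min_flips = penalty_can_flip(min_diff, penalty, threshold)
print(f"average pick flips: {avg_flips}, weakest pick flips: {min_flips}")
```

With these values neither the average nor the weakest pick crosses the threshold, so the null result can be reported without a run.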

2. Can a 10-line isolation script answer this?

If math alone can't answer it, write a standalone script that:

Reads the registry JSON directly (no algorithm import)
Applies the proposed change to cached data
Reports the result in <5 seconds

If isolation can answer it → don't run the full pipeline.
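An isolation script in this spirit might look like the sketch below. The registry schema is not given in the skill, so the "event" and "diff" fields, the inline JSON stand-in, and the function name are all hypothetical.

```python
import json
from typing import Dict, List

# Stand-in for the cached registry file; the real schema is an assumption.
REGISTRY_JSON = """
[
  {"event": "A", "diff": 0.20},
  {"event": "B", "diff": 0.06},
  {"event": "C", "diff": 0.12}
]
"""


def flipped_events(records: List[Dict], penalty: float, threshold: float) -> List[str]:
    """List events whose margin clears the threshold now but not after the penalty."""
    return [
        r["event"]
        for r in records
        if r["diff"] > threshold and r["diff"] * (1.0 - penalty) <= threshold
    ]


records = json.loads(REGISTRY_JSON)
flipped = flipped_events(records, penalty=0.25, threshold=0.05)
print(f"{len(flipped)} of {len(records)} picks flip: {flipped}")
```

In a real project the inline string would be replaced by loading the cached registry file with json.load; nothing here imports the algorithm itself.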

3. Have I locked my approach before running?

State your ONE approach before executing. If you find yourself thinking "let me try a different way" mid-run:

STOP the current run
Go back to the pre-compute gate
Pick ONE approach based on what you learned
Run it ONCE

Approach changes mid-backtest = wasted tokens. Three pivots = anti-pattern. Lock your approach first.

Workflow

1. Verify Database

Check that BacktestDB (or equivalent SQLite database) exists and is populated
Verify the backtest script exists and is runnable
Check for a baseline accuracy to compare against (previous results, config, or git history)

2. Run Backtest with Visible Output

python backtest.py 2>&1 | tee backtest_results.log

CRITICAL: Always use | tee <logfile> for visible output. NEVER redirect stdout to /dev/null or suppress output for scripts that need monitoring.

If the backtest script is in a different location or has arguments:

python <script_path> [args] 2>&1 | tee backtest_results_$(date +%Y%m%d_%H%M%S).log
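When the run is launched from Python rather than a shell, the same tee behaviour can be approximated with subprocess. This is a sketch, not part of the skill: run_with_tee and its arguments are hypothetical names, and it merges stderr into stdout the way 2>&1 does.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import List


def run_with_tee(cmd: List[str], log_dir: Path) -> Path:
    """Run cmd, echoing merged stdout/stderr to the console and a timestamped log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"backtest_results_{datetime.now():%Y%m%d_%H%M%S}.log"
    with log_path.open("w") as log:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:  # stream line by line so progress stays visible
            sys.stdout.write(line)
            log.write(line)
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return log_path
```

A typical call would be run_with_tee([sys.executable, "-u", "backtest.py"], Path("logs")); the -u flag keeps the child's output unbuffered so lines appear as they are produced.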

3. Compare Against Baseline

Parse results for key metrics: accuracy, ROI, precision, recall
Compare against the previous best (from git log, config, or results file)

Show delta clearly:

Accuracy: 67.3% → 69.1% (+1.8%)
ROI: +4.2% → +5.7% (+1.5%)
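A small formatter keeps the delta lines consistent from run to run. The function name and the signed flag are hypothetical; the output shape follows the two example lines above.

```python
def format_delta(name: str, old: float, new: float, signed: bool = False) -> str:
    """Format 'old → new (delta)' for a metric expressed in percent."""
    sign = "+" if signed else ""  # ROI is shown with an explicit sign, accuracy is not
    return f"{name}: {old:{sign}.1f}% → {new:{sign}.1f}% ({new - old:+.1f}%)"


print(format_delta("Accuracy", 67.3, 69.1))
print(format_delta("ROI", 4.2, 5.7, signed=True))
```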

4. Commit if Improved

If metrics improved:

git add <changed_files>
git commit -m "backtest: vX.XX +Y.Y% accuracy (+Z.Z% delta)"

If metrics declined:

Do NOT commit automatically
Report the regression and suggest reverting or investigating
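The commit gate can be expressed as a function that returns a message only when accuracy improved. This is one possible reading of the message template: it assumes Y.Y is the new accuracy and Z.Z the change against baseline, and the function name is hypothetical.

```python
from typing import Optional


def commit_message(version: str, baseline_acc: float, new_acc: float) -> Optional[str]:
    """Return a commit message if accuracy improved, otherwise None (do not commit)."""
    delta = new_acc - baseline_acc
    if delta <= 0:
        return None  # regression or no change: report it instead of committing
    return f"backtest: v{version} {new_acc:.1f}% accuracy ({delta:+.1f}% delta)"


msg = commit_message("1.23", 67.3, 69.1)  # hypothetical version and metrics
print(msg if msg else "No improvement: not committing.")
```

Returning None rather than raising keeps the decision with the caller, who can then report the regression and ask before doing anything else.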

5. Log Management

Keep backtest logs in a logs/ directory or project root
Never delete previous logs — they're the audit trail
Name logs with timestamps for traceability

6. Overfitting Validation (Sports Models)

Before committing, validate the improvement is real:

# Run on a holdout time window the algorithm hasn't seen
python backtest.py --start-date HOLDOUT_START --end-date HOLDOUT_END 2>&1 | tee holdout_results.log

If holdout performance is significantly worse than training window → likely overfit
If holdout performance is comparable → improvement is likely genuine
Log both training and holdout results in commit message
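The skill does not say how much worse counts as "significantly worse", so the sketch below makes the cutoff an explicit parameter. The 3-point default and the function names are assumptions.

```python
def holdout_verdict(train_acc: float, holdout_acc: float, max_gap: float = 3.0) -> str:
    """Classify an improvement by how far holdout accuracy trails the training window.

    max_gap is in percentage points and is an assumed tolerance.
    """
    gap = train_acc - holdout_acc
    return "likely overfit" if gap > max_gap else "likely genuine"


def holdout_summary(train_acc: float, holdout_acc: float) -> str:
    """One line recording both results, suitable for a commit message body."""
    verdict = holdout_verdict(train_acc, holdout_acc)
    return f"train {train_acc:.1f}% / holdout {holdout_acc:.1f}% ({verdict})"


print(holdout_summary(69.1, 68.4))  # hypothetical accuracies
```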

Mandatory Rules (defined in CLAUDE.md — not repeated here)

These rules are enforced globally via CLAUDE.md; see the corresponding sections there for full details:

Backtest Window Limits: UFC: 70 events and growing; NHL/MLB/NBA/CBB: 3 seasons
Walk-Forward Integrity — point-in-time stats only, no post-event data leakage
Data Caching — cache all scraped data locally, commit to GitHub, never re-scrape
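Point-in-time integrity comes down to filtering every statistic by the date of the event being predicted. The sketch below illustrates that idea only; the Result type and win_rate_as_of are hypothetical and not taken from CLAUDE.md.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    event_date: date
    won: bool


def win_rate_as_of(history: List[Result], as_of: date) -> Optional[float]:
    """Win rate using only results strictly before `as_of` (no post-event leakage)."""
    prior = [r for r in history if r.event_date < as_of]
    if not prior:
        return None  # no information was available at prediction time
    return sum(r.won for r in prior) / len(prior)


# Hypothetical history for one competitor.
history = [
    Result(date(2024, 1, 1), True),
    Result(date(2024, 6, 1), False),
    Result(date(2025, 1, 1), True),
]
print(win_rate_as_of(history, date(2025, 1, 1)))
```

The strict comparison matters: the result of the event being predicted must never contribute to its own features.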

Rules

Never suppress backtest output
Always show the comparison to baseline
Commit only improvements (or explicitly ask before committing regressions)
Break long sweeps into chunks that can be committed incrementally
Overfitting check — validate on holdout data before committing sports model changes
Future-first — every change must have a hypothesis for why it improves future accuracy

1 star

testing

Updated Apr 9, 2026

npx skillsauth add nhouseholder/nicks-claude-code-superpowers backtest

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 9, 2026, 2:40 AM (16.6s, 1 file scanned)

SKILL.md

name: backtest
description: Run backtests for prediction models (UFC, sports betting). Ensures visible output via tee, compares against baseline accuracy, and commits improvements with structured messages. Enforces walk-forward integrity, overfitting awareness, and future predictive accuracy as the #1 goal. Use when the user mentions backtesting, model evaluation, coefficient testing, or accuracy comparison.
weight: heavy

Install manually

# Clone the repo
git clone https://github.com/nhouseholder/nicks-claude-code-superpowers.git

# Copy into the Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r nicks-claude-code-superpowers/skills/backtest ~/.claude/skills/

Repository

nhouseholder/nicks-claude-code-superpowers

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT