000-docs/000a-planned-skills/prediction-markets/nixtla-forecast-validator/SKILL.md
Validates time series forecast quality metrics by comparing current performance against historical benchmarks. Detects degradation in MASE and sMAPE metrics. Activates when user mentions "validate forecast", "check forecast quality", or "assess forecast metrics".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-forecast-validator
Validates time series forecast quality metrics and detects performance degradation using statistical measures. Compares current forecast accuracy against historical benchmarks to identify significant deviations in MASE and sMAPE metrics.
This skill analyzes forecast quality by comparing current performance metrics against historical baselines. It detects significant increases in error metrics (MASE and sMAPE) that may indicate model degradation, data quality issues, or changing patterns in the time series. The skill generates comprehensive reports, alerts, and visualizations to help users identify and address forecast quality problems quickly.
Activates automatically when Claude detects forecast validation needs, or when explicitly requested with phrases like "validate forecast quality", "check model performance", or "assess forecast accuracy".
Tools: Read, Write, Bash, Glob, Grep
Environment: No API keys required (operates on CSV metrics files)
Python Packages:
pip install pandas matplotlib
Required CSV Format:
CSV files must contain columns: model, MASE, sMAPE
Ensure you have two CSV files containing forecast metrics: one with historical (baseline) metrics and one with current metrics.
Each CSV must have columns: model, MASE, sMAPE
Example format:
model,MASE,sMAPE
model_A,1.2,0.15
model_B,0.8,0.10
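The required format can be checked before running the validation. The following is a minimal sketch, not the skill's own code: `read_metrics` and `REQUIRED_COLUMNS` are hypothetical names, and it uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies.

```python
import csv
import io

# Column names required by the skill; matching is case-sensitive.
REQUIRED_COLUMNS = ("model", "MASE", "sMAPE")


def read_metrics(text):
    """Parse metrics CSV text into {model: {"MASE": float, "sMAPE": float}}.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return {
        row["model"]: {"MASE": float(row["MASE"]), "sMAPE": float(row["sMAPE"])}
        for row in reader
    }


sample = "model,MASE,sMAPE\nmodel_A,1.2,0.15\nmodel_B,0.8,0.10\n"
metrics = read_metrics(sample)
print(metrics["model_A"])  # {'MASE': 1.2, 'sMAPE': 0.15}
```

A header such as `model,mase,smape` would raise `ValueError` here, because the lowercase names do not match the required ones.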
Configure acceptable deviation thresholds for MASE and sMAPE metrics. Default thresholds are 0.2 (20% increase), but these can be adjusted based on business requirements and model characteristics.
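A threshold of 0.2 can be read as a limit on the relative increase of a metric over its historical value. The sketch below illustrates that reading. The function names are hypothetical, and the strict `>` comparison is an assumption, since the source does not say how a change exactly equal to the threshold is treated.

```python
def relative_increase(historical, current):
    """Return the fractional change of a metric relative to its baseline."""
    if historical <= 0:
        raise ValueError("historical value must be positive")
    return (current - historical) / historical


def exceeds_threshold(historical, current, threshold=0.2):
    """True when the metric grew by more than `threshold` (assumed strict)."""
    return relative_increase(historical, current) > threshold


print(exceeds_threshold(1.2, 1.8))   # True: MASE rose by 50%
print(exceeds_threshold(0.8, 0.85))  # False: MASE rose by 6.25%
```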
Recommended thresholds:
Run the validation script to compare current metrics against historical benchmarks:
python {baseDir}/scripts/validate_forecast.py \
--historical historical_metrics.csv \
--current current_metrics.csv \
--mase_threshold 0.2 \
--smape_threshold 0.2
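The four flags in the command above could be parsed with `argparse` along the following lines. This is only a sketch of the command-line interface, not the contents of `validate_forecast.py`: `build_parser` and `non_negative_float` are hypothetical names, and both the required file flags and the rejection of negative thresholds are assumptions.

```python
import argparse


def non_negative_float(value):
    """Convert a flag value to float and reject negatives (assumed rule)."""
    number = float(value)  # non-numeric input raises ValueError, which argparse reports
    if number < 0:
        raise argparse.ArgumentTypeError("threshold must be non-negative")
    return number


def build_parser():
    parser = argparse.ArgumentParser(description="Compare forecast metrics.")
    parser.add_argument("--historical", required=True, help="baseline metrics CSV")
    parser.add_argument("--current", required=True, help="current metrics CSV")
    # Defaults follow the documented 0.2 (20% increase) threshold.
    parser.add_argument("--mase_threshold", type=non_negative_float, default=0.2)
    parser.add_argument("--smape_threshold", type=non_negative_float, default=0.2)
    return parser


args = build_parser().parse_args(
    ["--historical", "historical_metrics.csv", "--current", "current_metrics.csv"]
)
print(args.mase_threshold, args.smape_threshold)  # 0.2 0.2
```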
The script performs:
Analyze the generated outputs to identify forecast quality issues:
validation_report.txt for summary of findings
alert.log for models requiring immediate attention
metrics_comparison.csv for detailed metric changes
metrics_visualization.png for visual comparison
If degradation is detected, investigate potential causes such as data quality changes, concept drift, or model staleness.
The validation process generates four output files: validation_report.txt, alert.log, metrics_comparison.csv, and metrics_visualization.png.
Common errors and solutions:
Missing required metrics column (MASE or sMAPE)
Solution: check that the CSV column names are exactly MASE and sMAPE (case-sensitive).
Invalid threshold value
Solution: check the values passed to --mase_threshold and --smape_threshold.
Historical data unavailable
File not found error
Solution: verify that the --historical and --current file paths are correct.
Empty DataFrame error
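The file-not-found and empty-input cases can be caught early with a few checks before any comparison runs. The sketch below is one hedged way to do so: `load_rows` is a hypothetical helper, the error messages are illustrative rather than the script's actual output, and the standard-library `csv` module stands in for pandas.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(path):
    """Read a metrics CSV, failing with a distinct error for each problem."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"metrics file not found: {path}")
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        # A header-only or empty file would produce an empty table downstream.
        raise ValueError(f"metrics file has no data rows: {path}")
    return rows


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "current_metrics.csv"
    good.write_text("model,MASE,sMAPE\nmodel_A,1.8,0.18\n")
    rows = load_rows(good)

    empty = Path(tmp) / "empty_metrics.csv"
    empty.write_text("model,MASE,sMAPE\n")
    try:
        load_rows(empty)
        empty_error = None
    except ValueError as exc:
        empty_error = str(exc)

print(rows[0]["model"])  # model_A
```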
Input (historical_metrics.csv):
model,MASE,sMAPE
model_A,1.2,0.15
Input (current_metrics.csv):
model,MASE,sMAPE
model_A,1.8,0.18
Command:
python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv
Output (validation_report.txt):
WARNING: Significant increase in MASE detected for model model_A.
Interpretation: Model A shows a 50% increase in MASE (from 1.2 to 1.8), exceeding the default 20% threshold. This indicates forecast quality degradation that requires investigation.
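The arithmetic behind this interpretation can be written out explicitly. The block below assumes the threshold applies to the relative increase over the historical value, which is how the 20% figure is described in this document.

```latex
\[
\frac{\mathrm{MASE}_{\text{current}} - \mathrm{MASE}_{\text{historical}}}
     {\mathrm{MASE}_{\text{historical}}}
= \frac{1.8 - 1.2}{1.2} = 0.50 > 0.20
\]
\[
\frac{\mathrm{sMAPE}_{\text{current}} - \mathrm{sMAPE}_{\text{historical}}}
     {\mathrm{sMAPE}_{\text{historical}}}
= \frac{0.18 - 0.15}{0.15} = 0.20
\]
```

The sMAPE change equals the threshold rather than exceeding it. The sample report mentions only MASE, which would be consistent with a strict comparison, although the source does not state this.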
Input (historical_metrics.csv):
model,MASE,sMAPE
model_B,0.8,0.10
Input (current_metrics.csv):
model,MASE,sMAPE
model_B,0.85,0.11
Command:
python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv
Output (validation_report.txt):
Forecast validation passed. No significant degradation detected.
Interpretation: Model B shows only a 6.25% increase in MASE (from 0.8 to 0.85) and a 10% increase in sMAPE (from 0.10 to 0.11), both below the 20% threshold. Performance is stable.
Command:
python scripts/validate_forecast.py \
--historical multi_model_historical.csv \
--current multi_model_current.csv \
--mase_threshold 0.3 \
--smape_threshold 0.25
Uses more lenient thresholds (30% for MASE, 25% for sMAPE) suitable for volatile forecasts or experimental models.
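A multi-model comparison with per-metric thresholds can be sketched as follows. The contents of the multi-model CSV files are not shown in this document, so the sketch reuses the model_A and model_B values from the earlier examples. `find_degraded` is a hypothetical helper, and both skipping models without a baseline and the strict `>` comparison are assumptions.

```python
# Lenient per-metric thresholds from the command above.
THRESHOLDS = {"MASE": 0.3, "sMAPE": 0.25}

# Values reused from the single-model examples in this document.
historical = {
    "model_A": {"MASE": 1.2, "sMAPE": 0.15},
    "model_B": {"MASE": 0.8, "sMAPE": 0.10},
}
current = {
    "model_A": {"MASE": 1.8, "sMAPE": 0.18},
    "model_B": {"MASE": 0.85, "sMAPE": 0.11},
}


def find_degraded(historical, current, thresholds):
    """Return (model, metric, relative_increase) for each breach."""
    alerts = []
    for model, now in current.items():
        before = historical.get(model)
        if before is None:
            continue  # assumption: models with no baseline are skipped
        for metric, limit in thresholds.items():
            change = (now[metric] - before[metric]) / before[metric]
            if change > limit:
                alerts.append((model, metric, round(change, 4)))
    return alerts


alerts = find_degraded(historical, current, THRESHOLDS)
print(alerts)  # [('model_A', 'MASE', 0.5)]
```

With these thresholds only the 50% MASE increase for model_A is flagged. Its 20% sMAPE increase stays under the 25% limit, and model_B stays under both.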
Script: {baseDir}/scripts/validate_forecast.py
Metrics: MASE (Mean Absolute Scaled Error), sMAPE (symmetric Mean Absolute Percentage Error)
Related skills: nixtla-timegpt-lab, nixtla-experiment-architect, nixtla-schema-mapper