005-plugins/nixtla-search-to-slack/skills/nixtla-model-benchmarker/SKILL.md
Generate benchmarking pipelines to compare forecasting models and summarize accuracy/speed trade-offs. Use when evaluating TimeGPT vs StatsForecast/MLForecast/NeuralForecast on a dataset. Trigger with "benchmark models", "compare TimeGPT vs StatsForecast", or "model selection".
npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-model-benchmarker
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate a runnable benchmark script that compares multiple forecasting approaches on the same train/test split and outputs ranked metrics plus a small set of plots.
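A minimal sketch of the shared train/test split, assuming the 80/20 chronological cut that the template's `load_data` comment describes. The function name `chronological_split` and its `train_fraction` parameter are hypothetical and are not taken from the template.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    values: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split an ordered series into train and test parts without shuffling.

    Hypothetical helper: every model is then fitted on the same `train`
    slice and scored on the same `test` slice.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(values) * train_fraction)
    # Earlier observations go to train, the most recent ones to test.
    return list(values[:cut]), list(values[cut:])


series = list(range(10))
train, test = chronological_split(series)
print(len(train), len(test))
```

Keeping the split chronological avoids leaking future observations into training, which a random shuffle would do for time series.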
pip install ... command.
{baseDir}/assets/templates/ for consistent benchmark structure.
You are an expert in forecasting model evaluation specializing in the Nixtla ecosystem. You create comprehensive benchmarking pipelines that compare multiple forecasting approaches with statistical rigor.
Help users answer: "Which Nixtla model should I use for my data?"
Compare across dimensions:
When users request a benchmark comparison, generate the complete benchmark script using the template at:
Template location: {baseDir}/assets/templates/benchmark_template.py
The template provides a complete NixtlaBenchmark class with methods:
class NixtlaBenchmark:
    def load_data(filepath) -> train, test          # Split data 80/20
    def benchmark_timegpt(train, horizon, freq)     # TimeGPT forecasting
    def benchmark_statsforecast(train, h, freq)     # Statistical models
    def benchmark_mlforecast(train, h, freq)        # ML models
    def benchmark_neuralforecast(train, h, freq)    # Neural networks
    def calculate_metrics(y_true, y_pred, model)    # MAE, RMSE, MAPE, SMAPE
    def run_full_benchmark(data_path, h, freq)      # Run all benchmarks
    def plot_comparison(results_df, save_path)      # Visualize results
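The four metrics named for `calculate_metrics` can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the template's implementation: the function name `forecast_metrics` is hypothetical, MAPE skips points where the actual value is zero, and SMAPE uses the mean of the absolute actual and forecast as its denominator (other SMAPE variants exist).

```python
import math
from typing import Dict, Sequence


def forecast_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, MAPE (%) and SMAPE (%) for one model's forecasts."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Assumption: MAPE is undefined where the actual is zero, so skip those points.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if t != 0]
    mape = (
        100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero)
        if nonzero
        else float("nan")
    )

    # Assumption: SMAPE denominator is (|actual| + |forecast|) / 2.
    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        smape_terms.append(0.0 if denom == 0 else abs(t - p) / denom)
    smape = 100.0 * sum(smape_terms) / n

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SMAPE": smape}


metrics = forecast_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(metrics)
```

MAE and RMSE are in the units of the series, while MAPE and SMAPE are percentages, so the latter two are easier to compare across series with different scales.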
When generating the benchmark script, customize these parameters:
# In main() function:
DATA_PATH = "data/timeseries.csv" # User's data file
HORIZON = 30 # Forecast horizon
FREQ = "D" # Time frequency (D/H/M/W)
TIMEGPT_API_KEY = None # Optional TimeGPT key
StatsForecast: Adjust season_length based on data frequency
from statsforecast.models import AutoARIMA, AutoETS, AutoTheta

models = [
    AutoARIMA(season_length=7),  # Weekly seasonality
    AutoETS(season_length=7),
    AutoTheta(season_length=7),
]
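One way to adjust `season_length` to the data frequency is a small lookup keyed by the `FREQ` codes listed above. The mapping below is an assumption based on common conventions (weekly cycle for daily data, daily cycle for hourly data, yearly cycle for monthly and weekly data); the right value depends on the dataset, and the helper name `season_length_for` is hypothetical.

```python
# Assumed defaults, not values taken from the template.
DEFAULT_SEASON_LENGTHS = {
    "D": 7,   # daily data, weekly cycle
    "H": 24,  # hourly data, daily cycle
    "M": 12,  # monthly data, yearly cycle
    "W": 52,  # weekly data, yearly cycle
}


def season_length_for(freq: str) -> int:
    """Return an assumed season length for a frequency code.

    Falls back to 1 (no seasonality) for codes not in the table.
    """
    return DEFAULT_SEASON_LENGTHS.get(freq.upper(), 1)


print(season_length_for("D"))
```

The returned value can then be passed as `season_length` to each statistical model instead of hard-coding 7.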
MLForecast: Configure lags based on temporal patterns
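import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExponentiallyWeightedMean, RollingMean
from sklearn.ensemble import RandomForestRegressor

mlf = MLForecast(
    models=[RandomForestRegressor(), lgb.LGBMRegressor()],
    freq=FREQ,  # Required by MLForecast; must match the data frequency
    lags=[7, 14, 21],  # Look-back periods
    lag_transforms={
        1: [RollingMean(window_size=7)],
        7: [ExponentiallyWeightedMean(alpha=0.3)],
    },
)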
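The `lags=[7, 14, 21]` setting above follows a simple pattern: multiples of the weekly season length. A hedged sketch of deriving such lags for other frequencies is shown here; the helper `seasonal_lags` and its `n_multiples` parameter are hypothetical, and multiples of the season length are only one reasonable starting point.

```python
from typing import List


def seasonal_lags(season_length: int, n_multiples: int = 3) -> List[int]:
    """Return lags at 1x, 2x, ... n_multiples x the season length."""
    if season_length < 1 or n_multiples < 1:
        raise ValueError("season_length and n_multiples must be positive")
    return [season_length * k for k in range(1, n_multiples + 1)]


print(seasonal_lags(7))
```

For hourly data with a 24-step daily cycle, the same helper would yield lags of 24, 48, and 72.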
NeuralForecast: Set input_size and max_steps for training
from neuralforecast.models import NBEATS, NHITS

models = [
    NHITS(h=horizon, input_size=horizon * 2, max_steps=100),
    NBEATS(h=horizon, input_size=horizon * 2, max_steps=100),
]
assets/templates/benchmark_template.py
The benchmark script generates:
- benchmark_results.csv - Metrics table sorted by RMSE
- benchmark_comparison.png - 4-panel visualization (MAE, RMSE, MAPE, execution time)
Activate when users say:
User has CSV data and wants to see which model performs best.
User needs to choose model for deployment.
User wants comprehensive evaluation.
The generated script requires:
pip install nixtla statsforecast mlforecast neuralforecast \
scikit-learn lightgbm pandas matplotlib seaborn
For NeuralForecast: PyTorch installation may be required (CPU or GPU version)
User: "I want to compare all Nixtla models on my sales data. It's daily data with 2 years of history."
Your response:
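1. Read the template at assets/templates/benchmark_template.py.
2. Customize it for the user's daily sales data and save it as benchmark_nixtla_sales.py.
3. Tell the user: "Run it with python benchmark_nixtla_sales.py. The script will train 9+ models and rank them by RMSE. Results are written to CSV and PNG files."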
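The ranking step mentioned in this response (sort by RMSE, then write a CSV) can be sketched with the standard library alone. The model names and numbers below are placeholders for illustration, not benchmark results, and the helper names `rank_by_rmse` and `to_csv` are hypothetical rather than part of the template.

```python
import csv
import io
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def rank_by_rmse(rows: List[Row]) -> List[Row]:
    """Return rows sorted so the lowest RMSE (best model) comes first."""
    return sorted(rows, key=lambda row: row["RMSE"])


def to_csv(rows: List[Row]) -> str:
    """Serialize ranked rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["model", "RMSE", "seconds"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values only; real numbers come from running the benchmark.
results = [
    {"model": "model_a", "RMSE": 12.5, "seconds": 3.0},
    {"model": "model_b", "RMSE": 9.75, "seconds": 40.0},
    {"model": "model_c", "RMSE": 11.0, "seconds": 0.5},
]
ranked = rank_by_rmse(results)
csv_text = to_csv(ranked)
print(csv_text)
```

Keeping the execution time next to the error metric makes the accuracy/speed trade-off visible in the same table.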