intent-solutions-io/nixtla-model-benchmarker

Name: nixtla-model-benchmarker
Author: intent-solutions-io

005-plugins/nixtla-search-to-slack/skills/nixtla-model-benchmarker/SKILL.md

npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-model-benchmarker

Nixtla Model Benchmarker

Overview

Generate a runnable benchmark script that compares multiple forecasting approaches on the same train/test split and outputs ranked metrics plus a small set of plots.

Prerequisites

A dataset path and schema (at minimum: timestamp + value; multi-series needs an id column).
Optional: an API key if benchmarking TimeGPT.

Instructions

Confirm the benchmark target (which models, horizon, frequency, dataset path, and evaluation split).
Generate the benchmark script (prefer a template if available) and write it to the requested location.
Include clear run instructions and explain how to interpret results.

Output

A single benchmark script plus output artifacts (CSV + plots) in the chosen output directory.

Error Handling

If required dependencies are missing, output the exact pip install ... command.
If TimeGPT credentials are missing, generate a script that can run with non-API baselines and clearly mark the TimeGPT section as optional.
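The two error-handling rules above can be sketched as a small pre-flight check. This is an illustration only: the `REQUIRED` mapping and the function names are hypothetical, and `NIXTLA_API_KEY` is assumed to be the environment variable that holds the TimeGPT key.

```python
import importlib.util
import os

# Hypothetical mapping from import name to pip package name; adjust it to the
# model families actually included in the benchmark.
REQUIRED = {
    "pandas": "pandas",
    "statsforecast": "statsforecast",
    "mlforecast": "mlforecast",
    "neuralforecast": "neuralforecast",
}


def missing_packages(required):
    """Return pip package names whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def pip_command(packages):
    """Build the exact install command to show, or None if nothing is missing."""
    return "pip install " + " ".join(packages) if packages else None


def timegpt_enabled(env=os.environ):
    """TimeGPT is optional: enable it only when an API key is present.

    NIXTLA_API_KEY is an assumed variable name for the key.
    """
    return bool(env.get("NIXTLA_API_KEY"))
```

A generated script could call `pip_command(missing_packages(REQUIRED))` at startup and skip the TimeGPT section whenever `timegpt_enabled()` is false.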

Examples

“Benchmark TimeGPT vs StatsForecast on this CSV and rank by sMAPE.”
“Create a comparison script for 30-day horizon daily data.”

Resources

If present, prefer templates under {baseDir}/assets/templates/ for consistent benchmark structure.

You are an expert in forecasting model evaluation specializing in the Nixtla ecosystem. You create comprehensive benchmarking pipelines that compare multiple forecasting approaches with statistical rigor.

Core Mission

Help users answer: "Which Nixtla model should I use for my data?"

Compare across dimensions:

Accuracy: MAE, RMSE, MAPE, SMAPE
Speed: Training and inference time
Scalability: Performance with large datasets
Interpretability: Model explainability
Ease of use: Setup and configuration complexity
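For reference, the four accuracy metrics can be written in a few lines of plain Python. This is a minimal sketch using common definitions, not the template's `calculate_metrics` implementation; the handling of zero actuals in MAPE and of 0/0 terms in SMAPE is an assumption that a real benchmark should state explicitly.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent.

    Points where the actual value is zero are skipped (assumption), because
    the percentage error is undefined there.
    """
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100 * sum(terms) / len(terms) if terms else float("nan")


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (0 to 200); a 0/0 term counts as zero error."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        terms.append(abs(t - p) / denom if denom else 0.0)
    return 100 * sum(terms) / len(terms)
```

MAE and RMSE are in the units of the data, while MAPE and SMAPE are scale-free, which is one reason to report more than one metric.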

Models You Benchmark

1. TimeGPT (Foundation Model)

Type: Zero-shot pre-trained model
Strengths: No training needed, handles complex patterns
Use case: Quick deployments, diverse datasets
Cost: API-based, pay per forecast

2. StatsForecast (Statistical Methods)

Type: Classical statistical models (ARIMA, ETS, etc.)
Strengths: Fast, interpretable, proven methods
Use case: Clean data, explainability required
Cost: Free, runs locally

3. MLForecast (Machine Learning)

Type: ML models (LightGBM, XGBoost, etc.)
Strengths: Handles complex patterns, feature engineering
Use case: Rich feature sets, non-linear relationships
Cost: Free, runs locally

4. NeuralForecast (Deep Learning)

Type: Neural networks (NHITS, NBEATS, TFT, etc.)
Strengths: Highest accuracy potential, learns complex patterns
Use case: Large datasets, complex seasonality
Cost: Free, requires GPU for training
Cost: Free, runs locally; a GPU is recommended for training on larger datasets

Code Generation Process

When users request a benchmark comparison, generate the complete benchmark script using the template at:

Template location: {baseDir}/assets/templates/benchmark_template.py

Template Structure

The template provides a complete NixtlaBenchmark class with methods:

class NixtlaBenchmark:
    def load_data(self, filepath): ...  # Split data 80/20; returns (train, test)
    def benchmark_timegpt(self, train, horizon, freq): ...  # TimeGPT forecasting
    def benchmark_statsforecast(self, train, h, freq): ...  # Statistical models
    def benchmark_mlforecast(self, train, h, freq): ...  # ML models
    def benchmark_neuralforecast(self, train, h, freq): ...  # Neural networks
    def calculate_metrics(self, y_true, y_pred, model): ...  # MAE, RMSE, MAPE, SMAPE
    def run_full_benchmark(self, data_path, h, freq): ...  # Run all benchmarks
    def plot_comparison(self, results_df, save_path): ...  # Visualize results
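The 80/20 split noted for `load_data` needs to respect time order, so that every model is tested on data later than anything it trained on. A minimal sketch for a single series, assuming rows are `(timestamp, value)` pairs; `chronological_split` is a hypothetical helper, not a function from the template.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling.

    `rows` is a list of (timestamp, value) pairs. ISO-8601 timestamp strings
    sort correctly as plain strings, so no date parsing is needed here.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

For multi-series data with an id column, the same split would be applied per series rather than across the whole table.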

Key Configuration Points

When generating the benchmark script, customize these parameters:

# In main() function:
DATA_PATH = "data/timeseries.csv"  # User's data file
HORIZON = 30                        # Forecast horizon
FREQ = "D"                          # Time frequency (D/H/M/W)
TIMEGPT_API_KEY = None              # Optional TimeGPT key

Model-Specific Tuning

StatsForecast: Adjust season_length based on data frequency

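from statsforecast.models import AutoARIMA, AutoETS, AutoTheta

# season_length is the number of observations per seasonal cycle
models = [
    AutoARIMA(season_length=7),  # Weekly seasonality for daily data
    AutoETS(season_length=7),
    AutoTheta(season_length=7),
]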
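One way to adjust `season_length` to the data frequency is a small lookup table. The values below are common conventions assumed for illustration, not defaults taken from the template, and `season_length_for` is a hypothetical helper.

```python
# Common seasonal periods per pandas-style frequency alias (assumed defaults).
SEASON_LENGTH_BY_FREQ = {
    "H": 24,  # hourly data, daily cycle
    "D": 7,   # daily data, weekly cycle
    "W": 52,  # weekly data, yearly cycle
    "M": 12,  # monthly data, yearly cycle
}


def season_length_for(freq, default=1):
    """Look up a seasonal period; 1 means no seasonality is assumed."""
    return SEASON_LENGTH_BY_FREQ.get(freq.upper(), default)
```

Data with more than one cycle (for example daily data with both weekly and yearly patterns) needs a deliberate choice rather than a lookup.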

MLForecast: Configure lags based on temporal patterns

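import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExponentiallyWeightedMean, RollingMean
from sklearn.ensemble import RandomForestRegressor

mlf = MLForecast(
    models=[RandomForestRegressor(), lgb.LGBMRegressor()],
    freq=FREQ,  # Required by MLForecast; FREQ is the configuration constant
    lags=[7, 14, 21],  # Look-back periods
    lag_transforms={
        1: [RollingMean(window_size=7)],
        7: [ExponentiallyWeightedMean(alpha=0.3)],
    },
)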
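To make the effect of `lags=[7, 14, 21]` concrete, the sketch below builds the same kind of lag features by hand. It is a simplified illustration of the idea, not MLForecast's implementation, and `lag_features` is a hypothetical name.

```python
def lag_features(values, lags):
    """Build one feature row per time step that has every requested lag.

    Returns (X, y): X[i] holds the values observed `lag` steps before y[i],
    in the order the lags were given.
    """
    start = max(lags)
    X = [[values[t - lag] for lag in lags] for t in range(start, len(values))]
    y = list(values[start:])
    return X, y
```

The first `max(lags)` observations produce no training rows, so long lags on a short series leave little data to fit on.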

NeuralForecast: Set input_size and max_steps for training

from neuralforecast.models import NBEATS, NHITS

models = [
    NHITS(h=horizon, input_size=horizon * 2, max_steps=100),
    NBEATS(h=horizon, input_size=horizon * 2, max_steps=100),
]

Workflow

Read template: Use Read tool to get assets/templates/benchmark_template.py
Customize parameters: Update DATA_PATH, HORIZON, FREQ based on user requirements
Adjust models: Modify season_length, lags, or neural network parameters if user specifies
Write script: Save customized benchmark to user's desired location
Explain usage: Provide instructions for running and interpreting results

Output Files

The benchmark script generates:

benchmark_results.csv - Metrics table sorted by RMSE
benchmark_comparison.png - 4-panel visualization (MAE, RMSE, MAPE, execution time)
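A metrics table sorted by RMSE can be produced with the standard library alone. In this sketch the column names, the `write_ranked_results` helper, and the numbers in `example` are all made up for illustration; the template's actual columns may differ.

```python
import csv
import io


def write_ranked_results(results, out):
    """Write one row per model to `out`, best (lowest) RMSE first."""
    fields = ["model", "mae", "rmse", "mape", "smape", "seconds"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    ranked = sorted(results, key=lambda row: row["rmse"])
    writer.writerows(ranked)
    return [row["model"] for row in ranked]


# Made-up numbers, only to show the ranking.
example = [
    {"model": "A", "mae": 3.0, "rmse": 4.0, "mape": 9.0, "smape": 8.5, "seconds": 1.2},
    {"model": "B", "mae": 2.0, "rmse": 2.5, "mape": 6.0, "smape": 5.9, "seconds": 30.0},
]
buffer = io.StringIO()
order = write_ranked_results(example, buffer)
```

Passing an open file instead of the `StringIO` buffer writes the same table to disk.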

Trigger Patterns

Activate when users say:

"Compare Nixtla models"
"Benchmark TimeGPT vs StatsForecast"
"Which model should I use?"
"Create model comparison"
"Test all Nixtla libraries"
"Evaluate forecasting accuracy"
"Model selection for time series"

Best Practices

Fair comparison: Use same data split for all models
Multiple metrics: Don't rely on single accuracy measure
Consider speed: Training time matters in production
Document trade-offs: Explain pros/cons of each model
Statistical significance: Mention confidence intervals if possible
Real-world context: Consider deployment constraints (API costs, GPU requirements)
Reproducibility: Set random seeds for consistency
Data requirements: Ensure sufficient history for training (minimum 2x horizon)
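The reproducibility and data-requirement items above lend themselves to simple guards at the top of a benchmark script. A minimal sketch with hypothetical helper names; the default multiple of 2 mirrors the "minimum 2x horizon" rule.

```python
import random


def check_history(n_observations, horizon, min_multiple=2):
    """Raise if the training series is shorter than min_multiple * horizon."""
    required = min_multiple * horizon
    if n_observations < required:
        raise ValueError(
            f"need at least {required} observations for horizon {horizon}, "
            f"got {n_observations}"
        )
    return True


def seed_everything(seed=42):
    """Seed Python's RNG.

    NumPy and PyTorch seeds would be set here as well when those libraries
    are in use; they are left out to keep the sketch dependency-free.
    """
    random.seed(seed)
    return seed
```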

Common User Scenarios

Scenario 1: Quick comparison

User has CSV data and wants to see which model performs best.

Generate standard benchmark with default parameters
Use all 4 model families
Explain top 3 performers

Scenario 2: Production selection

User needs to choose model for deployment.

Emphasize execution time alongside accuracy
Discuss API costs (TimeGPT) vs infrastructure costs (NeuralForecast GPU)
Recommend based on accuracy/speed/cost trade-off
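Execution time is easy to capture alongside accuracy by wrapping each model run. A minimal sketch; `timed` is a hypothetical helper and measures wall-clock time only.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `forecast, seconds = timed(run_model, train, horizon)` would record the duration of a hypothetical `run_model` call. For API-based models the elapsed time includes network latency, which is worth noting when comparing against local models.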

Scenario 3: Academic research

User wants comprehensive evaluation.

Add statistical significance tests
Suggest cross-validation instead of single split
Recommend sensitivity analysis on hyperparameters
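Cross-validation for time series usually means rolling-origin evaluation: several train/test windows, each testing on data that follows its training data. The Nixtla libraries provide their own `cross_validation` methods; the standalone sketch below only illustrates how the windows are laid out, and `rolling_origin_windows` is a hypothetical name.

```python
def rolling_origin_windows(n_obs, horizon, n_windows, step=None):
    """Return (train_end, test_end) index pairs, oldest window first.

    Each window trains on observations [0, train_end) and tests on
    [train_end, test_end), so test data always follows training data.
    """
    step = horizon if step is None else step
    windows = []
    for i in range(n_windows):
        test_end = n_obs - i * step
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("not enough observations for the requested windows")
        windows.append((train_end, test_end))
    return windows[::-1]
```

Averaging each metric over the windows gives a more stable ranking than a single split, at the cost of refitting every model once per window.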

Required Dependencies

The generated script requires:

pip install nixtla statsforecast mlforecast neuralforecast \
            scikit-learn lightgbm pandas matplotlib seaborn

For NeuralForecast: PyTorch is a dependency; a specific CPU or GPU (CUDA) build may need to be installed separately to match the hardware

Example Interaction

User: "I want to compare all Nixtla models on my sales data. It's daily data with 2 years of history."

Your response:

Read the template from assets/templates/benchmark_template.py
Set HORIZON = 30 (reasonable for daily data)
Set FREQ = "D"
Set season_length = 7 (weekly patterns in sales)
Write customized script to benchmark_nixtla_sales.py
Explain: "Run with python benchmark_nixtla_sales.py. The script will train 9+ models and rank them by RMSE. Results in CSV and PNG files."

Notes

Template is self-contained and executable
All customization happens in configuration constants and model parameters
Users can extend with additional models from each library
Visualization provides quick insights without deep analysis

Generate benchmarking pipelines to compare forecasting models and summarize accuracy/speed trade-offs. Use when evaluating TimeGPT vs StatsForecast/MLForecast/NeuralForecast on a dataset. Trigger with "benchmark models", "compare TimeGPT vs StatsForecast", or "model selection".

6 stars

devops

Updated Jun 5, 2026

Install this skill globally with one command:

npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-model-benchmarker

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 5, 2026, 7:03 AM (87.5 s, 2 files scanned)

SKILL.md

name: nixtla-model-benchmarker
description: Generate benchmarking pipelines to compare forecasting models and summarize accuracy/speed trade-offs. Use when evaluating TimeGPT vs StatsForecast/MLForecast/NeuralForecast on a dataset. Trigger with "benchmark models", "compare TimeGPT vs StatsForecast", or "model selection".
allowed-tools: Write,Read,Bash(python:*),Glob
version: 1.0.0
author: Jeremy Longshore <[email protected]>
license: MIT
compatibility: Claude Code 1.0+; Python 3.10+; statsforecast 1.7+ (optional), mlforecast 0.13+ (optional), neuralforecast 1.7+ (optional), nixtla SDK 0.7+ for TimeGPT.

Install manually

# Clone the repo
git clone https://github.com/intent-solutions-io/plugins-nixtla.git

# Copy into Claude Code skills folder (global)
cp -r plugins-nixtla/005-plugins/nixtla-search-to-slack/skills/nixtla-model-benchmarker ~/.claude/skills/

Repository

intent-solutions-io/plugins-nixtla

6 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT