Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

intent-solutions-io/nixtla-benchmark-reporter

Name: nixtla-benchmark-reporter
Author: intent-solutions-io

003-skills/.claude/skills/nixtla-benchmark-reporter/SKILL.md

npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-benchmark-reporter


Nixtla Benchmark Reporter

Purpose

Generate production-ready benchmark reports from forecasting accuracy metrics, enabling systematic model comparison and regression detection for Nixtla forecasting workflows.

Overview

This skill transforms raw forecast metrics (sMAPE, MASE, MAE, RMSE) into actionable insights. It:

Parses benchmark results CSV files from statsforecast/TimeGPT experiments
Calculates summary statistics (mean, median, std dev, percentiles)
Generates model comparison tables with winners highlighted
Creates regression detection reports comparing current vs. baseline results
Produces GitHub issue templates for performance degradations
Generates markdown reports with embedded charts and recommendations

Key Benefits:

Automates tedious manual benchmarking analysis (2-3 hours → 2 minutes)
Provides consistent reporting format across all forecasting experiments
Detects performance regressions automatically
Generates shareable, version-controlled markdown reports

Prerequisites

Benchmark results CSV files with metrics per series and model
CSV format: columns series_id, model, sMAPE, MASE (minimum)
Optional: Baseline results CSV for regression comparison
Python 3.8+ with pandas and numpy installed

Expected CSV Structure:

series_id,model,sMAPE,MASE,MAE,RMSE
D1,SeasonalNaive,15.23,1.05,12.5,18.3
D1,AutoETS,13.45,0.92,10.2,15.1
D1,AutoTheta,12.34,0.87,9.8,14.5
D2,SeasonalNaive,18.67,1.23,15.1,22.4
...

Instructions

Step 1: Parse Benchmark Results

The script automatically:

Reads benchmark CSV file(s)
Validates CSV structure (required columns present)
Extracts unique models and series
Groups metrics by model

Usage:

python {baseDir}/scripts/generate_benchmark_report.py \
    --results /path/to/benchmark_results.csv \
    --output /path/to/report.md
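The parsing steps listed above can be sketched as follows. This is a minimal illustration, not the contents of generate_benchmark_report.py: the function names `load_results` and `group_by_model` are hypothetical, and the sketch uses only the standard library so it runs without pandas. The sample rows are the ones shown under Expected CSV Structure.

```python
import csv
import io
from collections import defaultdict

# Minimum column set named in the Prerequisites section.
REQUIRED_COLUMNS = ("series_id", "model", "sMAPE", "MASE")

SAMPLE_CSV = """series_id,model,sMAPE,MASE,MAE,RMSE
D1,SeasonalNaive,15.23,1.05,12.5,18.3
D1,AutoETS,13.45,0.92,10.2,15.1
D1,AutoTheta,12.34,0.87,9.8,14.5
D2,SeasonalNaive,18.67,1.23,15.1,22.4
"""


def load_results(handle):
    """Read benchmark rows, check the header, and convert metric cells to float."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("Required columns missing: " + ", ".join(missing))
    rows = []
    for raw in reader:
        row = {"series_id": raw["series_id"], "model": raw["model"]}
        for name in header:
            if name not in ("series_id", "model"):
                row[name] = float(raw[name])
        rows.append(row)
    return rows


def group_by_model(rows, metric="sMAPE"):
    """Map each model name to its per-series values for one metric."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row[metric])
    return dict(grouped)


rows = load_results(io.StringIO(SAMPLE_CSV))
by_model = group_by_model(rows)
series_ids = sorted({row["series_id"] for row in rows})
```

Reading from a file handle rather than a path keeps the sketch testable with in-memory text; a real run would pass `open(path, newline="")` instead.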

Step 2: Calculate Summary Statistics

For each model, calculates:

Mean: Average metric across all series
Median: Middle value (less sensitive to outliers)
Std Dev: Measure of consistency
Min/Max: Best and worst performance
Percentiles: 25th, 50th, 75th, 95th percentiles
Win Rate: Percentage of series where model performed best
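A rough sketch of how these statistics could be computed with the standard library is shown below. The rows are hypothetical values chosen for illustration, and two details are assumptions because the source does not specify them: the sample standard deviation (`statistics.stdev`) and the "inclusive" percentile interpolation.

```python
import statistics
from collections import defaultdict

# Hypothetical (series_id, model, sMAPE) rows, for illustration only.
ROWS = [
    ("D1", "SeasonalNaive", 15.0), ("D1", "AutoETS", 13.0), ("D1", "AutoTheta", 12.0),
    ("D2", "SeasonalNaive", 19.0), ("D2", "AutoETS", 14.0), ("D2", "AutoTheta", 16.0),
    ("D3", "SeasonalNaive", 11.0), ("D3", "AutoETS", 10.0), ("D3", "AutoTheta", 9.0),
]


def summarize(values):
    """Summary statistics for one model's per-series metric values (needs >= 2 values)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample std dev (assumption)
        "min": min(values),
        "max": max(values),
        "p25": cuts[24], "p50": cuts[49], "p75": cuts[74], "p95": cuts[94],
    }


def win_counts(rows):
    """Count, per model, the series on which it had the lowest metric value."""
    best = {}
    for series_id, model, value in rows:
        if series_id not in best or value < best[series_id][1]:
            best[series_id] = (model, value)
    wins = defaultdict(int)
    for model, _ in best.values():
        wins[model] += 1
    return dict(wins)


by_model = defaultdict(list)
for _series_id, model, value in ROWS:
    by_model[model].append(value)

summary = {model: summarize(values) for model, values in by_model.items()}
wins = win_counts(ROWS)
```

Ties are resolved in favor of the first model encountered; a production script would need an explicit tie rule.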

Step 3: Generate Comparison Table

Creates markdown table comparing all models:

## Model Comparison (sMAPE)

| Model | Mean | Median | Std Dev | Min | Max | Wins |
|-------|------|--------|---------|-----|-----|------|
| AutoTheta | 12.3% | 11.8% | 4.2% | 5.1% | 28.9% | 32/50 (64%) |
| AutoETS | 13.5% | 12.9% | 5.1% | 6.2% | 31.2% | 18/50 (36%) |
| SeasonalNaive | 15.2% | 14.5% | 6.3% | 7.8% | 35.4% | 0/50 (0%) |
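One way to produce a table of this shape is sketched below. The function name `render_comparison_table` and the layout of the `stats` dictionary are assumptions made for illustration; the figures are copied from the table above.

```python
# Figures copied from the Model Comparison table above.
STATS = {
    "AutoETS": {"mean": 13.5, "median": 12.9, "std": 5.1, "min": 6.2, "max": 31.2, "wins": 18},
    "AutoTheta": {"mean": 12.3, "median": 11.8, "std": 4.2, "min": 5.1, "max": 28.9, "wins": 32},
    "SeasonalNaive": {"mean": 15.2, "median": 14.5, "std": 6.3, "min": 7.8, "max": 35.4, "wins": 0},
}


def render_comparison_table(stats, total_series, metric="sMAPE"):
    """Render a markdown table with one row per model, sorted by mean (lowest first)."""
    lines = [
        f"## Model Comparison ({metric})",
        "",
        "| Model | Mean | Median | Std Dev | Min | Max | Wins |",
        "|-------|------|--------|---------|-----|-----|------|",
    ]
    for model, s in sorted(stats.items(), key=lambda item: item[1]["mean"]):
        cells = [f"{s[key]:.1f}%" for key in ("mean", "median", "std", "min", "max")]
        share = 100 * s["wins"] / total_series
        wins = f"{s['wins']}/{total_series} ({share:.0f}%)"
        lines.append(f"| {model} | " + " | ".join(cells) + f" | {wins} |")
    return "\n".join(lines)


table = render_comparison_table(STATS, total_series=50)
```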

Step 4: Identify Winner and Recommendations

Determines overall best model based on:

Primary metric: Lowest mean sMAPE/MASE
Consistency: Lowest standard deviation
Win rate: Most series won

Generates recommendations:

Production baseline model selection
When to use alternatives (e.g., AutoETS for seasonal data)
Failure case analysis (series where all models struggle)
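The source lists three criteria but does not say how they are combined. The sketch below assumes a simple lexicographic rule (lowest mean first, then lowest standard deviation, then most wins); `rank_models` is a hypothetical helper, and the figures come from the Model Comparison table in Step 3.

```python
# Figures copied from the Model Comparison table in Step 3.
STATS = {
    "AutoETS": {"mean": 13.5, "std": 5.1, "wins": 18},
    "SeasonalNaive": {"mean": 15.2, "std": 6.3, "wins": 0},
    "AutoTheta": {"mean": 12.3, "std": 4.2, "wins": 32},
}


def rank_models(stats):
    """Order models best-first by (mean, std dev, -wins). The ordering rule is an assumption."""
    return sorted(
        stats,
        key=lambda model: (stats[model]["mean"], stats[model]["std"], -stats[model]["wins"]),
    )


ranking = rank_models(STATS)
winner = ranking[0]
```

A weighted score would be an equally plausible design; the lexicographic rule is used here only because it is easy to justify in a report.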

Step 5: Regression Detection (Optional)

If baseline results provided, compares current vs. baseline:

python {baseDir}/scripts/generate_benchmark_report.py \
    --results current_results.csv \
    --baseline baseline_results.csv \
    --output regression_report.md \
    --threshold 5.0  # Alert if sMAPE degrades >5%

Regression Report Includes:

Models with performance degradation
Severity of regression (relative % change versus baseline)
Affected series
GitHub issue template for regressions
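A minimal sketch of the comparison follows, assuming the threshold applies to the relative change in mean sMAPE, which matches the percentages printed in Example 2 of this document. `relative_change` and `detect_regressions` are hypothetical names; the AutoETS and AutoTheta values are taken from Example 2, and the SeasonalNaive value is invented for illustration.

```python
def relative_change(baseline, current):
    """Percent change of current relative to baseline (positive means worse for sMAPE)."""
    return (current - baseline) / baseline * 100.0


def detect_regressions(baseline, current, threshold):
    """Return {model: change %} for models whose mean sMAPE rose by more than threshold."""
    flagged = {}
    for model, base_value in baseline.items():
        if model not in current:
            continue  # model absent from the current run; nothing to compare
        change = relative_change(base_value, current[model])
        if change > threshold:
            flagged[model] = round(change, 1)
    return flagged


BASELINE = {"AutoETS": 13.5, "AutoTheta": 12.3, "SeasonalNaive": 15.2}
# SeasonalNaive is a hypothetical unchanged value; the other two come from Example 2.
CURRENT = {"AutoETS": 14.8, "AutoTheta": 12.7, "SeasonalNaive": 15.2}

strict = detect_regressions(BASELINE, CURRENT, threshold=3.0)
lenient = detect_regressions(BASELINE, CURRENT, threshold=5.0)
```

With a 3.0 threshold both AutoETS and AutoTheta are flagged; raising it to 5.0 leaves only AutoETS.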

Step 6: Customize Report Format

Supports multiple output formats:

Standard Report (default):

python {baseDir}/scripts/generate_benchmark_report.py --results metrics.csv

Executive Summary (1-page):

python {baseDir}/scripts/generate_benchmark_report.py \
    --results metrics.csv \
    --format executive \
    --output summary.md

GitHub Issue Template:

python {baseDir}/scripts/generate_benchmark_report.py \
    --results metrics.csv \
    --format github \
    --output .github/ISSUE_TEMPLATE/regression.md

Output

The script generates:

Standard Report (report.md):

Executive Summary (1-2 paragraphs)
Model Comparison Table (all metrics)
Statistical Analysis (means, std devs, percentiles)
Winner Declaration with justification
Per-Series Breakdown (optional)
Recommendations for production use
Failure Case Analysis (series with sMAPE > 30%)
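The failure-case filter could look roughly like this. It assumes a series counts as a failure case when even its best model exceeds the 30% sMAPE cutoff, which is one reading of "series where all models struggle"; the rows and the `failure_cases` helper are hypothetical.

```python
FAILURE_THRESHOLD = 30.0  # sMAPE percent, the cutoff quoted in the list above


def failure_cases(rows, threshold=FAILURE_THRESHOLD):
    """Return {series_id: best sMAPE} for series where the best model still exceeds threshold."""
    best = {}
    for series_id, _model, smape in rows:
        best[series_id] = min(smape, best.get(series_id, float("inf")))
    return {series_id: value for series_id, value in best.items() if value > threshold}


# Hypothetical (series_id, model, sMAPE) rows, for illustration only.
ROWS = [
    ("D1", "AutoTheta", 12.3), ("D1", "AutoETS", 13.4),
    ("D7", "AutoTheta", 31.2), ("D7", "AutoETS", 35.4),
]

cases = failure_cases(ROWS)
```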

Regression Report (if baseline provided):

Regression Summary (models degraded)
Severity Analysis (% change per model)
Affected Series List
GitHub Issue Template

GitHub Issue Template:

---
title: "Performance Regression Detected: {model_name}"
labels: ["regression", "performance"]
assignees: ["team-lead"]
---

## Regression Summary
Model: {model_name}
Metric: sMAPE degraded by {X}%
Baseline: {baseline_value}%
Current: {current_value}%

## Affected Series
- {series_1}: {baseline}% → {current}% ({delta}%)
- {series_2}: {baseline}% → {current}% ({delta}%)
...

## Acceptance Criteria
- [ ] Investigate root cause
- [ ] Restore performance to within 2% of baseline
- [ ] Add regression test to CI/CD
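Filling the template above can be sketched with plain f-strings. `render_issue` is a hypothetical helper, and treating both `{X}` and `{delta}` as relative percent changes is an assumption; the template itself does not say whether they are relative changes or percentage points.

```python
def render_issue(model_name, baseline, current, affected):
    """Build the issue text. `affected` is a list of (series_id, baseline, current) tuples."""
    change = (current - baseline) / baseline * 100  # relative change (assumption)
    lines = [
        "---",
        f'title: "Performance Regression Detected: {model_name}"',
        'labels: ["regression", "performance"]',
        'assignees: ["team-lead"]',
        "---",
        "",
        "## Regression Summary",
        f"Model: {model_name}",
        f"Metric: sMAPE degraded by {change:.1f}%",
        f"Baseline: {baseline:.1f}%",
        f"Current: {current:.1f}%",
        "",
        "## Affected Series",
    ]
    for series_id, before, after in affected:
        delta = (after - before) / before * 100
        lines.append(f"- {series_id}: {before:.1f}% → {after:.1f}% ({delta:+.1f}%)")
    lines += [
        "",
        "## Acceptance Criteria",
        "- [ ] Investigate root cause",
        "- [ ] Restore performance to within 2% of baseline",
        "- [ ] Add regression test to CI/CD",
    ]
    return "\n".join(lines)


# The per-series values are hypothetical; the model-level values come from Example 2.
issue = render_issue("AutoETS", 13.5, 14.8, [("D1", 13.4, 15.0)])
```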

Error Handling

Missing Metrics File:

Error: Benchmark results not found at /path/to/results.csv
Solution: Verify path and ensure CSV file exists

Invalid CSV Structure:

Error: Required columns missing: series_id, model, sMAPE
Solution: Ensure CSV has minimum required columns

Empty Results:

Warning: No metrics found in CSV file
Solution: Verify CSV has data rows (not just headers)

Regression Threshold Exceeded:

🚨 REGRESSION DETECTED: AutoTheta sMAPE degraded by 12.2%
  Baseline: 12.3%
  Current: 13.8%
  Threshold: 5.0%
Solution: Review recent model changes, check data quality

Examples

Example 1: Generate Standard Benchmark Report

python {baseDir}/scripts/generate_benchmark_report.py \
    --results nixtla_baseline_m4/results_M4_Daily_h14.csv \
    --output reports/m4_daily_baseline.md \
    --verbose

Output:

✓ Loaded 150 results (50 series × 3 models)
✓ Calculated summary statistics
✓ Identified winner: AutoTheta (mean sMAPE: 12.3%)
✓ Generated report: reports/m4_daily_baseline.md (1,245 words)

Example 2: Detect Regressions vs. Baseline

python {baseDir}/scripts/generate_benchmark_report.py \
    --results current_run/results.csv \
    --baseline baseline/v1.0_results.csv \
    --output regression_report.md \
    --threshold 3.0

Output:

⚠️  REGRESSION DETECTED in 2/3 models:
  - AutoETS: sMAPE 13.5% → 14.8% (+9.6%)
  - AutoTheta: sMAPE 12.3% → 12.7% (+3.3%)
✓ Generated regression report with GitHub issue template

Example 3: Generate Executive Summary

python {baseDir}/scripts/generate_benchmark_report.py \
    --results quarterly_benchmark.csv \
    --format executive \
    --output Q1_summary.md

Output:

# Q1 2025 Forecast Baseline Report

**Winner**: AutoTheta with 12.3% sMAPE (vs. 13.5% AutoETS, 15.2% SeasonalNaive)

**Key Findings**:
- AutoTheta won 64% of series (32/50)
- Most consistent performance (std dev 4.2%)
- Recommended for production baseline

**Action Items**:
- Deploy AutoTheta as default model
- Use AutoETS for highly seasonal data (criteria: seasonal_strength > 0.8)
- Investigate 3 failure cases (sMAPE > 30%)

Example 4: Custom Metric Focus

python {baseDir}/scripts/generate_benchmark_report.py \
    --results results.csv \
    --primary-metric MASE \
    --output mase_focused_report.md
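The flags used across these examples suggest a command-line interface along the following lines. This argparse sketch is an illustration only, not the script's actual parser: the flag names come from the examples in this document, while the defaults for `--output`, `--threshold`, and `--primary-metric`, and the name "standard" for the default format, are assumptions.

```python
import argparse


def build_parser():
    """Argument parser mirroring the flags shown in this document's examples."""
    parser = argparse.ArgumentParser(description="Generate a benchmark report (illustrative sketch).")
    parser.add_argument("--results", required=True, help="Path to the benchmark results CSV")
    parser.add_argument("--baseline", default=None, help="Optional baseline CSV for regression checks")
    parser.add_argument("--output", default="report.md", help="Output path (default is an assumption)")
    parser.add_argument("--threshold", type=float, default=5.0,
                        help="Regression alert threshold in percent (default is an assumption)")
    parser.add_argument("--format", choices=("standard", "executive", "github"), default="standard",
                        help="Report format")
    parser.add_argument("--primary-metric", default="sMAPE",
                        help="Metric used to rank models (default is an assumption)")
    parser.add_argument("--verbose", action="store_true", help="Print progress messages")
    return parser


# Arguments from Example 4.
args = build_parser().parse_args(
    ["--results", "results.csv", "--primary-metric", "MASE", "--output", "mase_focused_report.md"]
)
```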

Best Practices

Version Control Reports: Commit generated reports to track performance over time
Automate in CI/CD: Generate reports automatically on every benchmark run
Set Regression Thresholds: Use --threshold to catch regressions early (3-5% recommended)
Include Timestamps: Reports automatically include generation date/time
Document Assumptions: Reports include metadata about benchmark setup
Share with Stakeholders: Markdown reports render nicely on GitHub/GitLab
Archive Baselines: Keep historical baseline CSVs for regression comparison

Resources

Script: {baseDir}/scripts/generate_benchmark_report.py
Template: {baseDir}/assets/templates/report_template.md
Example Report: {baseDir}/references/EXAMPLE_REPORT.md
M4 Benchmark: https://github.com/Mcompetitions/M4-methods
Forecast Metrics: https://otexts.com/fpp3/accuracy.html


Generate comprehensive markdown benchmark reports from forecast accuracy metrics with model comparisons, statistical analysis, and regression detection. Use when analyzing baseline performance, comparing forecast models, or validating model quality. Trigger with 'generate benchmark report', 'analyze forecast metrics', or 'create performance summary'.

6 stars

testing

Updated Apr 5, 2026

$ install --global

skillsauth

npx skillsauth add intent-solutions-io/plugins-nixtla nixtla-benchmark-reporter

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | % |
|--------|---------|-------------|---|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 5, 2026, 11:13 PM (8.4s, 4 files scanned)

SKILL.md

name: nixtla-benchmark-reporter
description: Generate comprehensive markdown benchmark reports from forecast accuracy metrics with model comparisons, statistical analysis, and regression detection. Use when analyzing baseline performance, comparing forecast models, or validating model quality. Trigger with 'generate benchmark report', 'analyze forecast metrics', or 'create performance summary'.
allowed-tools: Read,Write,Glob,Bash(python:*)
version: 1.0.0
author: Jeremy Longshore <[email protected]>
license: MIT



Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


# Clone the repo
git clone https://github.com/intent-solutions-io/plugins-nixtla.git

# Copy into Claude Code skills folder (global)
cp -r plugins-nixtla/003-skills/.claude/skills/nixtla-benchmark-reporter ~/.claude/skills/

See the official Claude Code Skills docs for the skills path.

Repository

intent-solutions-io/plugins-nixtla

6 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT