skills/eventcast-hybrid-demand-forecasting/SKILL.md
Build hybrid demand forecasting systems that fuse LLM-extracted event knowledge with time-series models using a dual-tower architecture. Use when asked to: 'forecast demand for flash sales', 'build a demand prediction pipeline with event awareness', 'integrate promotional calendars into forecasting', 'predict sales spikes from campaigns or holidays', 'combine LLM reasoning with time-series forecasting', 'create an event-driven demand model'.
npx skillsauth add ndpvt-web/arxiv-claude-skills eventcast-hybrid-demand-forecasting
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to build modular demand forecasting systems that treat LLMs as event reasoning engines rather than numerical predictors. Based on the EventCast framework, the approach uses a dual-tower architecture: one tower encodes historical demand via an inverted transformer, while a second tower encodes structured event summaries produced by an LLM from unstructured business data (campaigns, holidays, seller incentives). The two towers fuse in a shared embedding space via additive alignment, producing forecasts that capture both baseline trends and event-driven demand shifts. This architecture achieved up to 57% MAE reduction over industrial baselines during promotional periods across 4 countries and 160 regions.
LLMs for reasoning, not prediction. EventCast's core insight is that LLMs are poor at direct numerical forecasting but excellent at interpreting unstructured business context. The framework constrains the LLM to a single task: read raw campaign descriptions, holiday schedules, and incentive rules, then produce a structured textual summary describing the expected demand impact. This summary captures cultural nuances (e.g., Ramadan timing varies by country), event interactions (e.g., a flash sale overlapping a public holiday), and intensity signals (e.g., free-shipping thresholds). The LLM never sees demand numbers.
Dual-tower fusion. The time-series tower uses an inverted transformer that treats each feature dimension's history as a separate token (transposing the input from [T x d] to [d x T]), enabling cross-variable attention. The event tower tokenizes the LLM's structured output using learnable embeddings (not the LLM's own embeddings) with positional encoding, then sums them into a semantic vector. Both towers project into a shared 1024-dimensional space and fuse via addition: h_aligned = h_hist + h_sem. This additive fusion is simple but effective because both towers are trained end-to-end to align their representations.
Dual prediction heads. The fused representation feeds into two separate heads: a trend head capturing baseline demand and an event head capturing event-driven deviations. The final forecast is a weighted combination: y = lambda * y_trend + (1 - lambda) * y_event, with lambda defaulting to 0.4, biasing toward event-driven corrections when event knowledge is present.
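The weighted combination of the two heads can be checked by hand with a minimal sketch. The function name `combine_heads` and the input numbers are illustrative assumptions, not values from the paper.

```python
def combine_heads(y_trend: float, y_event: float, lam: float = 0.4) -> float:
    """Weighted combination of the two heads: lam * trend + (1 - lam) * event."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * y_trend + (1.0 - lam) * y_event


# Illustrative head outputs only (hypothetical numbers).
y_trend, y_event = 1000.0, 1500.0

# Default lam = 0.4 puts 60% of the weight on the event head.
default_forecast = combine_heads(y_trend, y_event)

# A larger lam shifts weight toward the trend head and damps event effects.
trend_heavy_forecast = combine_heads(y_trend, y_event, lam=0.8)

print(default_forecast, trend_heavy_forecast)
```

Because lambda multiplies the trend head, raising it is the lever for damping event-driven corrections, and lowering it amplifies them.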
Inventory your data sources. Identify three categories: (a) historical demand time-series at your target granularity (daily/weekly, by SKU or region), (b) structured calendar data (holidays, campaigns with start/end dates), and (c) unstructured business text (campaign briefs, seller incentive descriptions, policy announcements). Map each source to a table or API.
Build the event knowledge base. Create a unified schema with columns: date, region, event_type (campaign/holiday/incentive), event_name, intensity (1-12 scale), scope (national/regional/category-level), raw_description (free text), and duration_days. Populate from your operational databases. Include a 13-month rolling window for training.
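One way to pin down the schema above is a small validated record type. This is a sketch under assumptions: the class name `EventRecord`, the allowed-value sets, and the example row are hypothetical, and a production system would more likely enforce the same constraints in the database.

```python
import datetime as dt
from dataclasses import dataclass

EVENT_TYPES = {"campaign", "holiday", "incentive"}
SCOPES = {"national", "regional", "category-level"}


@dataclass(frozen=True)
class EventRecord:
    """One row of the event knowledge base, using the columns listed above."""
    date: dt.date
    region: str
    event_type: str
    event_name: str
    intensity: int
    scope: str
    raw_description: str
    duration_days: int

    def __post_init__(self) -> None:
        # Reject rows that would silently corrupt downstream prompts.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")
        if not 1 <= self.intensity <= 12:
            raise ValueError("intensity must be on the 1-12 scale")
        if self.duration_days < 1:
            raise ValueError("duration_days must be at least 1")


# Hypothetical example row.
example = EventRecord(
    date=dt.date(2025, 3, 15),
    region="Jakarta",
    event_type="campaign",
    event_name="Ramadan Flash Sale",
    intensity=9,
    scope="national",
    raw_description="Flash sale with free shipping above a minimum basket.",
    duration_days=7,
)
```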
Design the LLM event-extraction prompt. Construct a parameterized prompt template that provides the LLM with: (a) national context and timezone, (b) temporal properties (day of week, month, quarter), (c) active events with relative positioning (e.g., "Day 3 of 7-day sale"), (d) promotional intensity levels and discount tiers, (e) logistics incentives with eligibility thresholds. Instruct the LLM to reason step-by-step and output semicolon-separated fields inside <result>...</result> tags for reliable parsing.
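The temporal properties and relative positioning in items (b) and (c) can be derived with the standard library. The helper names below are assumptions for illustration, and the wording of the returned strings is one possible choice rather than the paper's template.

```python
from datetime import date


def relative_position(today: date, start: date, end: date, label: str = "sale") -> str:
    """Describe where `today` falls within an event window (bounds inclusive)."""
    total_days = (end - start).days + 1
    if today < start:
        return f"starts in {(start - today).days} day(s) ({total_days}-day {label})"
    if today > end:
        return f"ended {(today - end).days} day(s) ago ({total_days}-day {label})"
    return f"Day {(today - start).days + 1} of {total_days}-day {label}"


def temporal_properties(today: date) -> dict:
    """Day of week, month, and quarter for the prompt header."""
    return {
        "day_of_week": today.strftime("%A"),
        "month": today.strftime("%B"),
        "quarter": f"Q{(today.month - 1) // 3 + 1}",
    }


# Hypothetical 7-day sale running 14-20 March 2025.
position = relative_position(date(2025, 3, 16), date(2025, 3, 14), date(2025, 3, 20))
header = temporal_properties(date(2025, 3, 16))
```

Precomputing these strings keeps date arithmetic out of the LLM's hands, so the model only has to reason about what the position implies for demand.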
Run LLM extraction in batch. For each (date, region) pair in your forecast horizon, call the LLM with the populated prompt. Parse the <result> block into structured fields: event_summary, expected_impact_direction (up/down/neutral), cultural_relevance, interaction_effects. Cache results keyed by (date, region, event_hash) to avoid redundant calls. Use a frozen model (e.g., GPT-4, Claude, or ChatGLM-4) with temperature=0 so that outputs are as repeatable as the provider allows; hosted models can still vary slightly between calls, which is another reason to cache.
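The caching scheme keyed by (date, region, event_hash) might look like the following sketch. The in-memory dict, the function names, and the `call_llm` callable are all assumptions; in practice the cache would live in a table and `call_llm` would wrap whichever provider client is in use.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, str, str]


def event_hash(events: List[dict]) -> str:
    """Stable hash of the active events, independent of list and key order."""
    canonical = sorted(json.dumps(e, sort_keys=True, ensure_ascii=False) for e in events)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()[:16]


def extract_with_cache(day: str, region: str, events: List[dict],
                       call_llm: Callable[[str], str],
                       cache: Dict[CacheKey, str]) -> str:
    """Call the LLM only when this (date, region, events) combination is new."""
    key = (day, region, event_hash(events))
    if key not in cache:
        # Placeholder prompt; substitute the real parameterized template here.
        prompt = f"Date: {day}\nRegion: {region}\nActive events: {json.dumps(events, sort_keys=True)}"
        cache[key] = call_llm(prompt)
    return cache[key]
```

Hashing the event content rather than the event name means an edited discount or threshold invalidates the cached summary, while an unchanged campaign is never re-sent.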
Prepare the time-series feature matrix. For each prediction target, construct an input matrix X of shape [14 x d] (14-day lookback, d features). Features should include: raw demand, lag-1/7/14 demand, 7-day rolling mean and std, day-of-week encoding, month encoding, and any numerical event features (binary flags for active campaigns). Transpose to [d x 14] for the inverted transformer input.
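A dependency-free sketch of the feature matrix follows. It assumes daily demand in a plain list, encodes day of week and month as ordinals rather than one-hot vectors, and omits campaign flags for brevity; the function name and constants are hypothetical. Note that a 14-day lookback with a lag-14 feature needs 28 days of raw history.

```python
from datetime import date, timedelta
from statistics import mean, pstdev
from typing import List

LOOKBACK = 14  # days in the input window
MAX_LAG = 14   # largest lag feature


def build_feature_matrix(demand: List[float], last_day: date) -> List[List[float]]:
    """Return the transposed [d x 14] matrix for the window ending on `last_day`.

    `demand[-1]` is the demand observed on `last_day`.
    """
    if len(demand) < LOOKBACK + MAX_LAG:
        raise ValueError("need at least 28 days of history")
    n = len(demand)
    rows = []  # one row per day, shape [14 x d]
    for offset in range(LOOKBACK):
        i = n - LOOKBACK + offset
        day = last_day - timedelta(days=LOOKBACK - 1 - offset)
        window = demand[i - 6:i + 1]  # trailing 7 days, including day i
        rows.append([
            demand[i],        # raw demand
            demand[i - 1],    # lag-1
            demand[i - 7],    # lag-7
            demand[i - 14],   # lag-14
            mean(window),     # 7-day rolling mean
            pstdev(window),   # 7-day rolling std (population)
            float(day.weekday()),
            float(day.month),
        ])
    # Transpose [14 x d] -> [d x 14] so each feature's history becomes one token.
    return [list(col) for col in zip(*rows)]


X = build_feature_matrix([float(v) for v in range(28)], date(2025, 3, 15))
```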
Encode event summaries. Tokenize each LLM-generated summary field using a standard tokenizer (not the LLM's own). Map tokens through a learnable embedding layer (randomly initialized, trained end-to-end) and add positional embeddings. Aggregate per-field embeddings by summation to produce h_sem of dimension d_model (1024).
Implement the dual-tower model. Build the time-series tower as a multi-head self-attention encoder over the transposed feature matrix, projecting the output to d_model via a linear layer. Build the event tower as the embedding pipeline from the previous step (Encode event summaries). Fuse via element-wise addition: h_aligned = h_hist + h_sem. Pass through a residual feed-forward block with batch normalization and LeakyReLU.
Add dual prediction heads. Create two linear heads from the fused representation: y_trend = W_trend @ z + b_trend and y_event = W_event @ z + b_event. Combine as y_final = 0.4 * y_trend + 0.6 * y_event. Train with L2 loss on actual demand. Use a weighted sampling strategy that oversamples event-period examples.
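The weighted sampling strategy can be sketched with the standard library alone. The 3x weight for event-period rows is an assumed starting point, not a value from the paper, and the function name is hypothetical; a PyTorch pipeline would typically pass the same per-row weights to a weighted sampler instead.

```python
import random
from typing import List, Sequence


def sample_batch_indices(is_event: Sequence[bool], batch_size: int,
                         event_weight: float = 3.0, seed: int = 0) -> List[int]:
    """Draw row indices with replacement, weighting event-period rows more heavily."""
    weights = [event_weight if flag else 1.0 for flag in is_event]
    rng = random.Random(seed)  # seeded for reproducible batches
    return rng.choices(range(len(is_event)), weights=weights, k=batch_size)


# Hypothetical dataset: 10% of rows fall in event periods.
flags = [True] * 10 + [False] * 90
batch = sample_batch_indices(flags, batch_size=10_000)
event_share = sum(flags[i] for i in batch) / len(batch)
```

With these weights the expected share of event rows in a batch is 30 / (30 + 90) = 25%, up from the 10% base rate.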
Train and validate. Train on a 13-month rolling window with the most recent month held out. Use NVIDIA GPU hardware; expect ~4 minutes per epoch on ~150M parameters. Evaluate with MAE and MSE, reporting separately for event periods vs. non-event periods to verify the event tower's contribution.
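Reporting metrics separately for event and non-event periods can be done with a few lines. This is a minimal sketch; the function names and the sample numbers are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> Optional[float]:
    """Mean absolute error; None when there are no samples."""
    if not y_true:
        return None
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> Optional[float]:
    """Mean squared error; None when there are no samples."""
    if not y_true:
        return None
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_by_period(y_true: Sequence[float], y_pred: Sequence[float],
                       is_event: Sequence[bool]) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute MAE and MSE separately for event and non-event days."""
    report = {}
    for name, flag in (("event", True), ("non_event", False)):
        t = [a for a, e in zip(y_true, is_event) if bool(e) == flag]
        p = [b for b, e in zip(y_pred, is_event) if bool(e) == flag]
        report[name] = {"mae": mae(t, p), "mse": mse(t, p)}
    return report


# Hypothetical held-out month reduced to four days.
report = evaluate_by_period(
    y_true=[100, 200, 300, 400],
    y_pred=[110, 190, 330, 360],
    is_event=[False, False, True, True],
)
```

Running the same report for a model with the event tower disabled gives a direct read on how much the event knowledge contributes during promotions.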
Deploy with weekly retraining. Set up a pipeline that (a) refreshes the event knowledge base from operational databases, (b) runs LLM extraction for the upcoming forecast horizon, (c) retrains or fine-tunes the model on the updated rolling window, and (d) serves forecasts with inference under 20 seconds. Log the LLM's event summaries alongside predictions for explainability.
Example 1: E-commerce flash sale forecasting
User: "I need to forecast daily order volume for our Indonesian marketplace. We have historical orders and a campaign calendar, but our current Prophet model misses demand spikes during flash sales."
Approach:
PROMPT_TEMPLATE = """
You are an e-commerce demand analyst for Indonesia.
Date: {date} ({day_of_week}, {month})
Region: {region}
Active campaigns:
{campaign_list}
Upcoming holidays within 7 days:
{holiday_list}
Active seller incentives:
{incentive_list}
Analyze the combined effect of these events on consumer demand.
Consider: Ramadan timing, payday cycles (25th-1st), cultural shopping habits.
Provide step-by-step reasoning, then output your analysis as:
<result>impact_direction;intensity_1_to_12;primary_driver;interaction_effects;cultural_notes</result>
"""
Output:
Forecast for Jakarta, 2025-03-15 (Day 2 of Ramadan Flash Sale):
Predicted orders: 142,300 (+67% vs. non-event baseline)
Event attribution: Ramadan Flash Sale (intensity 9/12) + Payday cycle overlap
Trend component: 85,200 | Event component: 180,400 | Lambda: 0.4
Confidence: [128,000 - 156,600] (80% interval)
Example 2: Adding event awareness to an existing time-series pipeline
User: "We already have an LSTM-based demand model. How do I add event knowledge without rewriting everything?"
Approach:
import torch
import torch.nn as nn


class EventTower(nn.Module):
    """Embeds tokenized LLM summary fields and sums them into one semantic vector."""

    def __init__(self, vocab_size, d_model=1024, n_fields=5, max_tokens=64):
        super().__init__()
        self.d_model = d_model
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_tokens, d_model)
        self.n_fields = n_fields

    def forward(self, token_ids_per_field):
        # token_ids_per_field: list of [batch, seq_len] tensors, one per field
        # (seq_len must not exceed max_tokens)
        first = token_ids_per_field[0]
        h_sem = torch.zeros(first.size(0), self.d_model, device=first.device)
        for tokens in token_ids_per_field:
            positions = torch.arange(tokens.size(1), device=tokens.device)
            emb = self.token_embed(tokens) + self.pos_embed(positions)
            h_sem += emb.sum(dim=1)  # sum tokens within a field, then across fields
        return h_sem


class EventCastWrapper(nn.Module):
    """Wraps an existing time-series encoder with an event tower and dual heads."""

    def __init__(self, existing_lstm, d_lstm, vocab_size, d_model=1024):
        super().__init__()
        self.ts_tower = existing_lstm
        self.ts_proj = nn.Linear(d_lstm, d_model)
        self.event_tower = EventTower(vocab_size, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.BatchNorm1d(d_model),
            nn.LeakyReLU(),
            nn.Linear(d_model, d_model),
        )
        self.head_trend = nn.Linear(d_model, 1)
        self.head_event = nn.Linear(d_model, 1)

    def forward(self, ts_input, event_token_fields, lam=0.4):
        ts_out = self.ts_tower(ts_input)
        if isinstance(ts_out, tuple):
            # A raw nn.LSTM returns (output, (h_n, c_n)); keep the last time step.
            # Assumes batch_first=True, i.e. output is [batch, seq_len, d_lstm].
            ts_out = ts_out[0][:, -1, :]
        h_hist = self.ts_proj(ts_out)
        h_sem = self.event_tower(event_token_fields)
        h_fused = h_hist + h_sem  # additive fusion
        z = self.ffn(h_fused) + h_fused  # residual connection
        y_trend = self.head_trend(z)
        y_event = self.head_event(z)
        return lam * y_trend + (1 - lam) * y_event
Example 3: Building the LLM event extraction pipeline
User: "I have campaign data in a Postgres database and need to extract event features using an LLM. How do I set this up?"
Approach:
SELECT date, region, event_name, event_type, description,
start_date, end_date, discount_pct, free_shipping_threshold
FROM campaign_calendar
WHERE date BETWEEN :forecast_start AND :forecast_end
UNION ALL
SELECT date, region, holiday_name, 'holiday', description,
date, date, NULL, NULL
FROM holiday_calendar
WHERE date BETWEEN :forecast_start AND :forecast_end
ORDER BY date, region;
import re
from typing import NamedTuple


class EventFeatures(NamedTuple):
    impact_direction: str
    intensity: int
    primary_driver: str
    interaction_effects: str
    cultural_notes: str


# Neutral fallback used when the response is missing or malformed.
DEFAULT_FEATURES = EventFeatures("neutral", 1, "none", "none", "none")


def parse_llm_output(response: str) -> EventFeatures:
    match = re.search(r"<result>(.*?)</result>", response, re.DOTALL)
    if not match:
        return DEFAULT_FEATURES
    fields = [f.strip() for f in match.group(1).split(";")]
    if len(fields) < 3:
        return DEFAULT_FEATURES  # the first three fields are required
    try:
        intensity = min(12, max(1, int(fields[1])))  # clamp to the 1-12 scale
    except ValueError:
        return DEFAULT_FEATURES  # non-numeric intensity
    return EventFeatures(
        impact_direction=fields[0],
        intensity=intensity,
        primary_driver=fields[2],
        interaction_effects=fields[3] if len(fields) > 3 else "none",
        cultural_notes=fields[4] if len(fields) > 4 else "none",
    )
Cache LLM outputs in an event_features table keyed by (date, region, event_hash) to avoid reprocessing unchanged events.
Additive fusion (h_hist + h_sem) in a shared latent space works as well as or better than cross-attention for this task, and trains faster with fewer parameters.
If lambda=0.4 overweights events for your domain (causing over-prediction during minor promotions), tune it on a validation set. Values between 0.3 and 0.6 are typical; lower values favor the event head.
Paper: EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge (Hu et al., 2026). Focus on Section 3 for the dual-tower architecture, Section 3.2 for the LLM prompt template design, and Section 4 for ablation studies showing the contribution of each component.