skills/signal-classification/SKILL.md
ML trading signal classifiers using XGBoost and LightGBM with walk-forward validation, SHAP feature importance, and threshold optimization
Install: `npx skillsauth add agiprolabs/claude-trading-skills signal-classification`

Installs this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Predict whether an asset's price will move up or down over a forward horizon using supervised machine learning classifiers. This skill covers the full pipeline: label creation, model training, walk-forward validation, feature importance analysis, and threshold optimization for trading applications.
XGBoost and LightGBM are the workhorses of quantitative trading ML for good reason:
Linear models and deep learning have their place, but for tabular trading features with fewer than 100k samples, gradient-boosted trees consistently outperform alternatives.
The simplest and most common setup. Predict whether forward returns exceed a threshold:
```python
import numpy as np


def create_binary_labels(
    prices: np.ndarray, horizon: int = 24, threshold: float = 0.01
) -> np.ndarray:
    """Create binary labels from forward returns.

    Args:
        prices: Array of prices.
        horizon: Forward return look-ahead in bars.
        threshold: Minimum return magnitude for a label.

    Returns:
        Array of labels: 1 (up), 0 (down), NaN (neutral or no forward data).
    """
    # np.roll wraps around, so the last `horizon` values are invalid and masked below.
    fwd_returns = np.roll(prices, -horizon) / prices - 1
    fwd_returns[-horizon:] = np.nan
    labels = np.where(fwd_returns > threshold, 1,
                      np.where(fwd_returns < -threshold, 0, np.nan))
    return labels
```
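Because neutral bars and the final `horizon` bars are labelled NaN, features and labels have to be filtered together before training. The following is a minimal sketch of that step, assuming features live in a pandas DataFrame aligned bar-for-bar with `prices`. The helper names `forward_returns` and `make_training_set`, the feature names, and the synthetic data are illustrative assumptions, not part of the skill's scripts.

```python
import numpy as np
import pandas as pd


def forward_returns(prices: np.ndarray, horizon: int) -> np.ndarray:
    """Return over the next `horizon` bars; the last `horizon` entries are NaN."""
    prices = np.asarray(prices, dtype=float)
    fwd = np.full(prices.shape, np.nan)
    fwd[:-horizon] = prices[horizon:] / prices[:-horizon] - 1
    return fwd


def make_training_set(
    features: pd.DataFrame, prices: np.ndarray, horizon: int = 24, threshold: float = 0.01
) -> tuple[pd.DataFrame, pd.Series]:
    """Label each bar 1 (up) or 0 (down) and drop neutral or unlabelled bars."""
    fwd = forward_returns(prices, horizon)
    labels = np.where(fwd > threshold, 1.0, np.where(fwd < -threshold, 0.0, np.nan))
    keep = ~np.isnan(labels)
    X = features.loc[keep]
    y = pd.Series(labels[keep].astype(int), index=X.index, name="label")
    return X, y


# Synthetic random-walk prices and two placeholder features (illustration only).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=500)))
features = pd.DataFrame({"f1": rng.normal(size=500), "f2": rng.normal(size=500)})

X, y = make_training_set(features, prices, horizon=24, threshold=0.01)
print(len(X), round(float(y.mean()), 3))  # rows kept, share of "up" labels
```

Keeping the original index on `X` and `y` makes it possible to map predictions back to bars later, for example when applying a walk-forward split.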
Three classes for finer signal granularity:
| Class | Condition | Interpretation |
|-------|-----------|----------------|
| Strong Up | fwd_return > +2% | High confidence long |
| Mild Up | +0.5% to +2% | Moderate confidence |
| Down | fwd_return < -0.5% | Avoid / short |
Multi-class reduces per-class sample size. Use only with large datasets (1000+ samples per class).
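The three-class scheme can be sketched as below. This assumes forward returns are already computed, that returns between -0.5% and +0.5% are left unlabelled (the table does not assign them a class), and that the integer class ids 2/1/0 are an arbitrary encoding chosen for illustration. The function name `create_three_class_labels` is hypothetical.

```python
import numpy as np


def create_three_class_labels(
    fwd_returns: np.ndarray,
    strong_up: float = 0.02,
    mild_up: float = 0.005,
    down: float = -0.005,
) -> np.ndarray:
    """Map forward returns to 2 (strong up), 1 (mild up), 0 (down), NaN (unlabelled)."""
    fwd = np.asarray(fwd_returns, dtype=float)
    labels = np.full(fwd.shape, np.nan)
    labels[fwd < down] = 0
    labels[(fwd >= mild_up) & (fwd <= strong_up)] = 1
    labels[fwd > strong_up] = 2
    return labels


example = np.array([0.031, 0.012, 0.001, -0.02, np.nan])
labels = create_three_class_labels(example)
print(labels)  # classes in order: 2, 1, NaN, 0, NaN

# Per-class counts help check the "1000+ samples per class" guideline.
classes, counts = np.unique(labels[~np.isnan(labels)], return_counts=True)
print(dict(zip(classes.astype(int).tolist(), counts.tolist())))
```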
Raw model probabilities from XGBoost/LightGBM are not well-calibrated. A predicted 0.7 probability does not mean 70% chance of being correct. Use calibration to fix this:
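```python
from sklearn.calibration import CalibratedClassifierCV

# base_model is an unfitted classifier (e.g. XGBClassifier); X_train/y_train are your training data.
calibrated = CalibratedClassifierCV(base_model, cv=5, method="isotonic")
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)[:, 1]
```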
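Isotonic calibration often works better than Platt scaling (`method="sigmoid"`) for tree models, provided there are enough calibration samples (roughly 1,000 or more). With fewer samples, isotonic regression tends to overfit and Platt scaling is the safer choice.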
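For time series, one option is to fit the calibrator on a held-out slice that comes after the training data, so calibration never sees shuffled or future bars. The sketch below illustrates this with `sklearn.isotonic.IsotonicRegression` applied directly to scores. The overconfident `raw` scores are simulated stand-ins for model output, and the Brier score (lower is better) is used as an assumed calibration metric.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
n = 10_000
p_true = rng.uniform(0.2, 0.8, size=n)          # true win probability per bar
y = (rng.uniform(size=n) < p_true).astype(int)  # realised outcome
# Simulated overconfident scores standing in for raw model probabilities.
raw = np.clip(0.5 + 2.0 * (p_true - 0.5), 0.01, 0.99)

# Time-ordered split: calibrate on the earlier half, evaluate on the later half.
cal, test = slice(0, n // 2), slice(n // 2, n)
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw[cal], y[cal])
calibrated = iso.predict(raw[test])

brier_raw = brier_score_loss(y[test], raw[test])
brier_cal = brier_score_loss(y[test], calibrated)
print(f"Brier raw={brier_raw:.4f} calibrated={brier_cal:.4f}")
```

In a real pipeline the calibration slice would sit between the training window and the test window, which shortens the data available for training; that trade-off is a design choice rather than a fixed rule.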
This is the single most important concept in trading ML. Standard cross-validation randomly shuffles data, which creates lookahead bias. Walk-forward validation respects time ordering.
```
Window 1: [===TRAIN===][GAP][=TEST=]
Window 2:    [===TRAIN===][GAP][=TEST=]
Window 3:       [===TRAIN===][GAP][=TEST=]
Window 4:          [===TRAIN===][GAP][=TEST=]
```
Each window:
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Train window | 30 days (720 hourly bars) | Enough data to learn, recent enough to be relevant |
| Test window | 7 days (168 hourly bars) | Enough predictions for statistical significance |
| Step size | 1 day (24 bars) | Overlap test windows for more data points |
| Gap (embargo) | Same as forward horizon | Prevents label leakage |
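The arithmetic behind these parameters can be checked with a short sketch. It assumes hourly bars and that the label at bar `i` uses the price at bar `i + horizon`; the helper names `count_windows` and `last_label_overlaps_test` are hypothetical.

```python
def count_windows(n_samples: int, train: int = 720, test: int = 168,
                  step: int = 24, gap: int = 24) -> int:
    """Number of walk-forward windows that fit into n_samples bars."""
    span = train + gap + test
    return 0 if n_samples < span else (n_samples - span) // step + 1


def last_label_overlaps_test(train_size: int, gap: int, horizon: int) -> bool:
    """True if the last training label looks at a price inside the test window."""
    last_train_idx = train_size - 1
    test_start = train_size + gap
    return last_train_idx + horizon >= test_start


print(count_windows(2160))                                 # 90 days of hourly bars -> 53
print(last_label_overlaps_test(720, gap=24, horizon=24))   # False
print(last_label_overlaps_test(720, gap=0, horizon=24))    # True
```

With a 24-bar horizon and no gap, the last training label is built from a price 23 bars into the test window, which is the leakage the embargo is meant to remove.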
```python
from typing import Iterator

import numpy as np


def walk_forward_splits(
    n_samples: int,
    train_size: int = 720,
    test_size: int = 168,
    step_size: int = 24,
    gap: int = 24,
) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Generate walk-forward train/test index splits.

    Args:
        n_samples: Total number of samples.
        train_size: Number of training samples per window.
        test_size: Number of test samples per window.
        step_size: Step between successive windows.
        gap: Gap between train end and test start.

    Yields:
        Tuples of (train_indices, test_indices).
    """
    start = 0
    # Stop once a full train + gap + test span no longer fits.
    while start + train_size + gap + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_start = start + train_size + gap
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx
        start += step_size
```
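A similar rolling split is available in scikit-learn through `TimeSeriesSplit`, shown here as an alternative sketch rather than a drop-in replacement. It supports a fixed-length training window (`max_train_size`) and an embargo (`gap`), but its test windows do not overlap: each window advances by `test_size`, not by a separate step size. The sample count of 2000 is an arbitrary value for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples = 2000
splitter = TimeSeriesSplit(n_splits=5, max_train_size=720, test_size=168, gap=24)

# Only the number of rows matters for index generation, so a dummy array is enough.
splits = list(splitter.split(np.zeros((n_samples, 1))))
for i, (train_idx, test_idx) in enumerate(splits, start=1):
    print(
        f"window {i}: train {train_idx[0]}-{train_idx[-1]}, "
        f"test {test_idx[0]}-{test_idx[-1]}"
    )
```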
See references/validation_methods.md for purged CV, CPCV, and evaluation metrics.
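Compute input features first (see the `feature-engineering` skill).

```python
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="logloss",
    random_state=42,
)
# X_val/y_val must come from a period after X_train to respect time ordering.
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=False,
)
probabilities = model.predict_proba(X_test)[:, 1]
```

The `use_label_encoder` argument is deprecated in recent XGBoost releases and has been dropped here.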
See references/model_guide.md for parameter recommendations and tuning.
SHAP (SHapley Additive exPlanations) attributes each prediction to per-feature contributions and is a widely used way to interpret tree-based models.
Which features matter most across all predictions:
```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot (top 15 features)
shap.summary_plot(shap_values, X_test, max_display=15)
```
Why a specific prediction was made:
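```python
# Explain a single prediction (X_test is assumed to be a pandas DataFrame).
# In a notebook, call shap.initjs() first; in a script, pass matplotlib=True.
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])
```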
Track how feature importance drifts over walk-forward windows. If a feature's importance drops significantly, the market regime may have shifted.
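One way to track this drift is to store a per-window importance score for each feature (for example mean |SHAP value|) and compare the latest window with an early baseline. The sketch below assumes such a table already exists; the function name `importance_drift`, the 50% drop threshold, the three-window baseline, and the feature names and scores are all made up for illustration.

```python
import pandas as pd


def importance_drift(importance: pd.DataFrame, baseline_windows: int = 3,
                     drop_threshold: float = 0.5) -> pd.DataFrame:
    """Compare each feature's latest importance share with its early-window average.

    `importance` has one row per walk-forward window and one column per feature,
    holding a non-negative score such as mean |SHAP value|.
    """
    shares = importance.div(importance.sum(axis=1), axis=0)  # normalise per window
    baseline = shares.iloc[:baseline_windows].mean()
    latest = shares.iloc[-1]
    report = pd.DataFrame({"baseline": baseline, "latest": latest})
    report["change"] = report["latest"] / report["baseline"] - 1
    report["flag"] = report["change"] <= -drop_threshold
    return report.sort_values("change")


# Made-up scores for illustration: `rsi` fades while `volume_z` stays stable.
scores = pd.DataFrame(
    {
        "rsi": [0.30, 0.28, 0.29, 0.20, 0.12, 0.08],
        "volume_z": [0.20, 0.21, 0.19, 0.20, 0.21, 0.20],
        "funding": [0.10, 0.11, 0.10, 0.12, 0.13, 0.14],
    }
)
report = importance_drift(scores)
print(report)
```

Because shares are normalised within each window, a drop in one feature raises the shares of the others; flagged drops are the more informative signal.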
The default 0.5 probability threshold is almost never optimal for trading.
```python
def optimize_threshold(
    probabilities: np.ndarray,
    returns: np.ndarray,
    thresholds: np.ndarray | None = None,
) -> tuple[float, float]:
    """Find threshold that maximizes profit factor.

    Args:
        probabilities: Model predicted probabilities.
        returns: Actual forward returns.
        thresholds: Thresholds to search over.

    Returns:
        Tuple of (best_threshold, best_profit_factor).
    """
    if thresholds is None:
        thresholds = np.arange(0.50, 0.85, 0.01)

    best_threshold, best_pf = 0.5, 0.0
    for t in thresholds:
        signals = probabilities >= t
        # Skip thresholds with too few signals to be meaningful.
        if signals.sum() < 10:
            continue
        signal_returns = returns[signals]
        wins = signal_returns[signal_returns > 0].sum()
        losses = abs(signal_returns[signal_returns < 0].sum())
        # A threshold with no losing trades scores 0.0 and is never selected.
        pf = wins / losses if losses > 0 else 0.0
        if pf > best_pf:
            best_pf = pf
            best_threshold = t
    return best_threshold, best_pf
```
Typical finding: optimal threshold is 0.60-0.75 for crypto trading signals.
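Raising the threshold trades signal count for signal quality, so it helps to look at the whole sweep and not only the single best value. The sketch below tabulates signal count, hit rate, and mean return per threshold on synthetic data in which expected return rises with predicted probability. The function name `threshold_sweep` and the data-generating assumptions are hypothetical.

```python
import numpy as np
import pandas as pd


def threshold_sweep(probabilities: np.ndarray, returns: np.ndarray,
                    thresholds: np.ndarray) -> pd.DataFrame:
    """Summarise the signals taken at each probability threshold."""
    rows = []
    for t in thresholds:
        taken = returns[probabilities >= t]
        rows.append({
            "threshold": round(float(t), 2),
            "n_signals": int(taken.size),
            "hit_rate": float((taken > 0).mean()) if taken.size else np.nan,
            "mean_return": float(taken.mean()) if taken.size else np.nan,
        })
    return pd.DataFrame(rows)


rng = np.random.default_rng(1)
probs = rng.uniform(0.3, 0.9, size=2000)
# Synthetic returns whose expected value rises with the predicted probability.
rets = 0.02 * (probs - 0.5) + rng.normal(0, 0.01, size=2000)
table = threshold_sweep(probs, rets, np.linspace(0.50, 0.80, 7))
print(table)
```

To avoid overfitting the threshold, the sweep would normally be run on a validation window and the chosen value then applied unchanged to the following test window.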
Crypto market regimes change fast. A model trained on 6 months of data may perform worse than one trained on 30 days. Use shorter training windows and retrain frequently.
Most time periods are "flat" (returns within the neutral zone). Strategies to handle this:
- `scale_pos_weight` in XGBoost

A model with 55% accuracy sounds good, but after 0.5% round-trip costs (slippage + fees), many signals become unprofitable. Always evaluate signals net of costs:

```python
net_return = gross_return - 0.005  # 50 bps round-trip
```
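The effect of costs on a given accuracy can be worked out directly. The sketch below assumes a fixed average win and average loss per trade (both as positive magnitudes) and a flat 50 bps round-trip cost; the function names `expectancy` and `breakeven_win_rate` are hypothetical. With symmetric 2% moves, a 55% win rate loses about 0.3% per trade after costs, and the break-even win rate is 62.5%.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float, cost: float = 0.005) -> float:
    """Expected net return per trade; avg_win and avg_loss are positive magnitudes."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss - cost


def breakeven_win_rate(avg_win: float, avg_loss: float, cost: float = 0.005) -> float:
    """Win rate at which expectancy is zero: p = (avg_loss + cost) / (avg_win + avg_loss)."""
    return (avg_loss + cost) / (avg_win + avg_loss)


net = expectancy(0.55, avg_win=0.02, avg_loss=0.02)
breakeven = breakeven_win_rate(avg_win=0.02, avg_loss=0.02)
print(f"net expectancy per trade: {net:.4f}")
print(f"break-even win rate: {breakeven:.3f}")
```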
Features lose predictive power over time as more participants discover and trade on them. Monitor rolling performance and retrain when metrics degrade.
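A rolling hit-rate monitor is one simple way to implement this. The sketch below flags the points where the hit rate over the last `window` signals falls below a floor. The function name `needs_retrain`, the 100-signal window, the 52% floor, and the synthetic outcome series are assumptions for illustration, not recommended values.

```python
import numpy as np
import pandas as pd


def needs_retrain(hits: pd.Series, window: int = 100, min_hit_rate: float = 0.52) -> pd.Series:
    """True where the rolling hit rate over the last `window` signals is below the floor."""
    rolling = hits.astype(float).rolling(window, min_periods=window).mean()
    # Comparisons with NaN are False, so no alert fires before a full window exists.
    return rolling < min_hit_rate


# Synthetic outcomes: about 60% hit rate for 300 signals, then decay to about 45%.
rng = np.random.default_rng(7)
hits = pd.Series(np.concatenate([rng.random(300) < 0.60, rng.random(300) < 0.45]))
alerts = needs_retrain(hits)
first_alert = int(alerts.idxmax()) if alerts.any() else None
print(f"first retrain alert at signal index: {first_alert}")
```

The same pattern works with other metrics, such as rolling profit factor or mean net return, in place of hit rate.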
| Skill | Integration |
|-------|-------------|
| feature-engineering | Compute input features for the classifier |
| vectorbt | Backtest trading strategies from ML signals |
| regime-detection | Train separate models per regime, or use regime as a feature |
| position-sizing | Size positions based on classifier confidence |
| risk-management | Apply portfolio-level risk limits to ML-generated signals |
- `references/model_guide.md`: XGBoost and LightGBM parameter guide, tuning, and ensembling
- `references/validation_methods.md`: Walk-forward, purged CV, CPCV, and evaluation metrics
- `scripts/train_classifier.py`: Train a signal classifier with walk-forward validation and feature importance
- `scripts/walk_forward_backtest.py`: Backtest ML signals vs buy-and-hold with walk-forward validation

```bash
# Core (required)
uv pip install pandas numpy scikit-learn

# Optional (recommended)
uv pip install xgboost lightgbm shap
```