skills/cointegration-analysis/SKILL.md
Cointegration testing for pairs trading using Engle-Granger, Johansen, and rolling stability analysis
Cointegration testing identifies pairs of assets that share a long-run equilibrium relationship, enabling statistical arbitrage and pairs trading strategies.
Two price series are cointegrated when they are individually non-stationary (random walks) but a linear combination of them is stationary (mean-reverting). Intuitively, the prices may wander apart temporarily but are pulled back to an equilibrium spread over time.
| Property | Correlation | Cointegration |
|---|---|---|
| Measures | Short-term co-movement | Long-run equilibrium |
| Stationarity | Requires stationary returns | Works with non-stationary prices |
| Time horizon | Can change rapidly | Stable over months/years |
| Trading use | Momentum/trend signals | Mean-reversion pairs trades |
| Failure mode | Breaks in regime changes | Breaks on structural shifts |
Two assets can be highly correlated but not cointegrated (e.g., two unrelated uptrends). Conversely, cointegrated assets may have low short-term correlation during temporary divergences — which is exactly when pairs trades are entered.
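To make the definition concrete, here is a minimal simulation sketch, assuming NumPy is available. The coefficient 2.0, the AR(1) persistence 0.8, the seed, and all variable names are illustrative choices, not values from this skill.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# x is a random walk: non-stationary on its own.
x = 100 + np.cumsum(rng.normal(0.0, 1.0, n))

# The deviation from equilibrium follows a stationary AR(1) process
# (persistence 0.8 is an illustrative assumption).
noise = np.zeros(n)
shocks = rng.normal(0.0, 1.0, n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + shocks[t]

# y inherits x's stochastic trend, so y is also non-stationary,
# but the linear combination y - 2.0 * x is the stationary deviation.
y = 2.0 * x + noise
spread = y - 2.0 * x

print(f"std of x:      {x.std():.2f}")
print(f"std of y:      {y.std():.2f}")
print(f"std of spread: {spread.std():.2f}")
```

Both price paths wander without bound, while the spread stays in a narrow band around zero, which is the property the tests in this skill look for.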
The Engle-Granger two-step method is the most common approach for two series.
Step 1 — Regress Y on X using OLS:
Y_t = α + β * X_t + ε_t
Step 2 — Test the residuals ε_t for stationarity using the ADF test.
Important: Engle-Granger critical values differ from standard ADF critical values. For n=2 series: 1% = -3.90, 5% = -3.34, 10% = -3.04.
Asymmetry warning: Testing Y~X can give a different result than X~Y. Always test both directions and use the stronger result.
from scipy import stats
import numpy as np
from statsmodels.tsa.stattools import adfuller
# Step 1: OLS regression
slope, intercept, _, _, _ = stats.linregress(x_prices, y_prices)
hedge_ratio = slope
# Step 2: Test residuals
residuals = y_prices - hedge_ratio * x_prices - intercept
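adf_stat, _, _, _, _, _ = adfuller(residuals, maxlag=None, autolag="AIC")
# adfuller's own p-value and critical values assume an observed series, not
# estimated residuals, so compare against the Engle-Granger 5% value instead.
eg_crit_5pct = -3.34
cointegrated = adf_stat < eg_crit_5pct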
The Johansen test evaluates multiple series simultaneously and returns the number of cointegrating relationships. It is better suited than Engle-Granger to more than two series because it does not require choosing a dependent variable.
from statsmodels.tsa.vector_ar.vecm import coint_johansen
# data: T×N array of price series
result = coint_johansen(data, det_order=0, k_ar_diff=1)
# Trace statistic vs critical values (90%, 95%, 99%)
trace_stats = result.lr1 # Trace statistics
trace_crit = result.cvt # Critical values
max_eigen_stats = result.lr2 # Max eigenvalue statistics
max_eigen_crit = result.cvm # Critical values
# Cointegrating vectors
coint_vectors = result.evec
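The Phillips-Ouliaris test is similar to Engle-Granger but applies Phillips-Perron style test statistics to the residuals instead of ADF, which makes it more robust to heteroskedasticity and serial correlation in the residuals. Note that statsmodels.tsa.stattools.coint implements the augmented Engle-Granger test with MacKinnon p-values, not Phillips-Ouliaris; a Phillips-Ouliaris implementation is available in the arch package (arch.unitroot.cointegration.phillips_ouliaris). The statsmodels convenience function is called as follows: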
from statsmodels.tsa.stattools import coint
# Returns: test statistic, p-value, critical values
t_stat, p_value, crit_values = coint(y_prices, x_prices)
cointegrated = p_value < 0.05
Pre-filter using Pearson correlation > 0.7 to reduce the number of cointegration tests (which are more expensive).
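A minimal sketch of the pre-filter, assuming NumPy and assuming the correlation is computed on log returns rather than raw prices (the text above does not specify which). The helper name `prefilter_pairs`, the tickers, and the synthetic prices are hypothetical.

```python
from itertools import combinations

import numpy as np


def prefilter_pairs(prices, min_corr=0.7):
    """Return (name_a, name_b, corr) for pairs whose return correlation exceeds min_corr."""
    # Assumption: correlate log returns, not price levels.
    returns = {
        name: np.diff(np.log(np.asarray(p, dtype=float)))
        for name, p in prices.items()
    }
    candidates = []
    for a, b in combinations(sorted(returns), 2):
        corr = np.corrcoef(returns[a], returns[b])[0, 1]
        if corr > min_corr:
            candidates.append((a, b, float(corr)))
    return candidates


# Synthetic prices: AAA and BBB share a common return factor, CCC does not.
rng = np.random.default_rng(7)
common = rng.normal(size=500)
demo_prices = {
    "AAA": 100 * np.exp(np.cumsum(0.01 * (common + 0.3 * rng.normal(size=500)))),
    "BBB": 100 * np.exp(np.cumsum(0.01 * (common + 0.3 * rng.normal(size=500)))),
    "CCC": 100 * np.exp(np.cumsum(0.01 * rng.normal(size=500))),
}
candidates = prefilter_pairs(demo_prices)
print(candidates)
```

Only the surviving pairs need to go through the more expensive cointegration test.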
Run Engle-Granger in both directions. Use p < 0.05 threshold.
Estimate the hedge ratio with OLS for simplicity. For production, consider Total Least Squares or Dynamic OLS (see references/methodology.md).
spread = y_prices - hedge_ratio * x_prices - intercept
z_score = (spread - spread.mean()) / spread.std()
If the spread is mean-reverting, it is a viable pairs trade candidate.
See references/pairs_trading.md for entry/exit rules and risk management.
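One way to quantify how quickly the spread mean-reverts is its half-life. The sketch below assumes the spread is reasonably described by an AR(1) process and uses only NumPy; the helper name `half_life` and the synthetic series are hypothetical, and this is not necessarily the method used in scripts/test_cointegration.py.

```python
import numpy as np


def half_life(spread):
    """Estimate mean-reversion half-life (in bars) from an AR(1) fit.

    Fits s_t = c + phi * s_{t-1} + e_t by least squares and returns
    -ln(2) / ln(phi). Returns inf when phi is outside (0, 1), i.e. when
    the fit shows no smooth mean reversion.
    """
    s = np.asarray(spread, dtype=float)
    phi, _ = np.polyfit(s[:-1], s[1:], 1)  # slope, intercept
    if not 0.0 < phi < 1.0:
        return float("inf")
    return float(-np.log(2.0) / np.log(phi))


# Synthetic check (assumption): an AR(1) series with phi = 0.9,
# whose theoretical half-life is ln(2) / -ln(0.9), about 6.6 bars.
rng = np.random.default_rng(3)
s = np.zeros(5000)
for t in range(1, len(s)):
    s[t] = 0.9 * s[t - 1] + rng.normal()
print(f"estimated half-life: {half_life(s):.1f} bars")
```

The result is in units of the bar size, so a value from daily data can be compared directly with day-based thresholds.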
Cointegration relationships can break down over time due to structural changes, regime shifts, or evolving market dynamics.
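Test cointegration on rolling 60–90 day windows. With daily bars, these windows hold fewer than the 100 observations recommended for a reliable test, so treat the rolling p-values as a stability monitor rather than as standalone evidence: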
window = 60
rolling_pvalues = []
rolling_hedges = []
for i in range(window, len(y_prices) + 1):
    # Trailing window of the most recent `window` observations
    y_win = y_prices[i - window:i]
    x_win = x_prices[i - window:i]
    _, p_val, _ = coint(y_win, x_win)
    slope, intercept, _, _, _ = stats.linregress(x_win, y_win)
    rolling_pvalues.append(p_val)
    rolling_hedges.append(slope)
| Signal | Healthy | Warning | Stop Trading |
|---|---|---|---|
| Rolling p-value | < 0.05 | 0.05–0.10 | > 0.10 |
| Hedge ratio drift | < 10% change | 10–25% change | > 25% change |
| Spread half-life | 5–60 days | 60–120 days | > 120 days or < 5 |
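The thresholds in the table can be turned into a simple status check. This is a sketch under two assumptions that the table does not state: the overall status is taken as the worst of the three signals, and a half-life of exactly 60 days counts as healthy. The function name `classify_pair_health` is hypothetical.

```python
def classify_pair_health(p_value, hedge_drift, half_life_days):
    """Map the three monitoring signals to 'healthy', 'warning', or 'stop'.

    hedge_drift is the fractional change in hedge ratio, e.g. 0.12 = 12%.
    """
    def grade(value, healthy_max, warning_max):
        # 0 = healthy, 1 = warning, 2 = stop
        if value < healthy_max:
            return 0
        return 1 if value <= warning_max else 2

    levels = [
        grade(p_value, 0.05, 0.10),
        grade(abs(hedge_drift), 0.10, 0.25),
    ]
    # Half-life also has a lower bound: below 5 days is a stop signal.
    if half_life_days < 5 or half_life_days > 120:
        levels.append(2)
    elif half_life_days > 60:
        levels.append(1)
    else:
        levels.append(0)
    # Assumption: the worst individual signal decides the overall status.
    return ("healthy", "warning", "stop")[max(levels)]


print(classify_pair_health(0.03, 0.05, 20))  # healthy
print(classify_pair_health(0.07, 0.05, 20))  # warning
print(classify_pair_health(0.03, 0.30, 20))  # stop
```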
Spurious cointegration — Two trending series (both up in a bull market) may appear cointegrated. Always test on sufficient data (>200 observations) and check out-of-sample stability.
Structural breaks — A fundamental change (protocol upgrade, tokenomics change) can permanently break cointegration. Monitor rolling p-values.
Look-ahead bias — Estimating the hedge ratio on the full sample and then backtesting on the same sample inflates results. Always use walk-forward estimation.
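Walk-forward estimation can be sketched as follows: fit the hedge ratio on a trailing window, then apply it only to the next bar. The example assumes NumPy and uses `np.polyfit` in place of `stats.linregress` to stay dependency-light; the helper name `walk_forward_spread`, the 60-bar window, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def walk_forward_spread(y, x, window=60):
    """Out-of-sample spread: fit on the trailing window, apply to the next bar."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    spread = np.full(len(y), np.nan)
    hedge = np.full(len(y), np.nan)
    for i in range(window, len(y)):
        # The fit uses bars [i - window, i); bar i is never seen by the fit.
        slope, intercept = np.polyfit(x[i - window:i], y[i - window:i], 1)
        hedge[i] = slope
        spread[i] = y[i] - slope * x[i] - intercept
    return spread, hedge


# Synthetic data (assumption): true hedge ratio 2.0, intercept 1.0.
rng = np.random.default_rng(5)
x_demo = 100 + np.cumsum(rng.normal(size=300))
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.5, size=300)
wf_spread, wf_hedge = walk_forward_spread(y_demo, x_demo, window=60)
print(f"mean walk-forward hedge ratio: {np.nanmean(wf_hedge):.3f}")
```

The first `window` entries are NaN because no out-of-sample estimate exists yet; a backtest should trade only on the remaining values.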
Too-short sample — Cointegration tests need >100 observations minimum, ideally >200, to have reasonable power.
Ignoring transaction costs — Pairs trades involve 4 transactions per round trip. At 0.3% per leg, that is 1.2% in costs that the spread must overcome.
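The cost arithmetic above can be checked with a few lines. This sketch assumes fees are a fixed fraction of notional on each leg and ignores slippage and borrow costs; the function names are hypothetical.

```python
def round_trip_cost(fee_per_leg, legs=4):
    """Total proportional cost of opening and closing both sides of a pair."""
    return fee_per_leg * legs


def net_return(gross_return, fee_per_leg, legs=4):
    """Gross spread capture minus round-trip costs, as fractions of notional."""
    return gross_return - round_trip_cost(fee_per_leg, legs)


cost = round_trip_cost(0.003)
print(f"round-trip cost: {cost:.2%}")                      # 1.20%
print(f"net on a 2% move: {net_return(0.02, 0.003):.2%}")  # 0.80%
```

A spread move smaller than the round-trip cost yields a negative net return, so entry thresholds should be set with this floor in mind.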
Asymmetric cointegration — The relationship may only hold in one direction or one regime. Consider threshold cointegration models for production use.
- correlation-analysis: Pre-screening pairs by correlation before cointegration testing
- mean-reversion: Trading the cointegrated spread using mean-reversion entry/exit rules
- vectorbt: Backtesting pairs strategies with walk-forward validation
- regime-detection: Identifying when cointegration regimes shift
- volatility-modeling: Spread volatility forecasting for dynamic position sizing

- references/methodology.md: Engle-Granger details, Johansen derivation, hedge ratio estimation methods, spread construction
- references/pairs_trading.md: Entry/exit rules, risk management, performance metrics, crypto-specific considerations
- scripts/test_cointegration.py: Full cointegration test pipeline with ADF, Hurst, half-life, rolling stability, and demo mode
- scripts/pairs_backtest.py: Walk-forward pairs trading backtest with synthetic data and performance reporting