skills/43-wentorai-research-plugins/skills/analysis/econometrics/panel-data-analyst/SKILL.md
Expert panel data regression analysis with fixed effects and GMM
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research panel-data-analyst
Perform expert-level panel data regression analysis including fixed effects, random effects, dynamic panel models (Arellano-Bond/Blundell-Bond GMM), and advanced diagnostic tests. This skill covers the full workflow from panel setup through model selection, estimation, and publication-ready reporting.
Panel data, meaning repeated observations on the same cross-sectional units over time, are the workhorse of modern empirical economics, finance, political science, and management research. Panel methods exploit both cross-sectional and temporal variation, enabling researchers to control for time-invariant unobserved heterogeneity that would bias ordinary cross-sectional estimates.
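As a sketch of why this works, consider a linear model with a unit-specific intercept. The notation follows the dynamic-panel equation used later in this skill, without the lagged term; strict exogeneity of the regressors and the bar notation for unit means are assumptions introduced here for illustration. Subtracting unit means removes the time-invariant heterogeneity:

```latex
\begin{align*}
y_{it} &= X_{it}\beta + \mu_i + \varepsilon_{it} \\
\bar{y}_i &= \bar{X}_i\beta + \mu_i + \bar{\varepsilon}_i,
  \qquad \bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it} \\
y_{it} - \bar{y}_i &= (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}
```

Because $\mu_i$ drops out of the last line, OLS on the demeaned data can recover $\beta$ even when $\mu_i$ is correlated with $X_{it}$, which a pooled cross-sectional regression cannot do.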
The choice between fixed effects, random effects, and dynamic panel estimators depends on the data structure, the nature of unobserved heterogeneity, and the identifying assumptions the researcher is willing to make. This skill provides a systematic decision framework and implementation in both Stata and R, with emphasis on the diagnostic tests that justify model selection.
Beyond basic FE/RE models, this skill covers the advanced techniques increasingly required by journal reviewers: instrumental variables within panel frameworks, Driscoll-Kraay standard errors for cross-sectional dependence, correlated random effects (Mundlak/Chamberlain), and system GMM for dynamic panels with endogenous regressors.
* Stata panel setup
xtset firm_id year
xtset // Verify panel structure
* Check panel balance
xtdescribe
* Shows: min/max/avg observations per panel, gaps
* Summary statistics by panel dimension
xtsum revenue profit employees rnd_spending
* Reports overall, between, and within variation
* Check for gaps in panel
xtset firm_id year
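* l.year is missing whenever the previous year is absent, so compare adjacent rows instead
bysort firm_id (year): gen gap = year - year[_n-1] if _n > 1
tab gap // All 1's means no unit has gaps in an annual panel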
* Create balanced subsample
bysort firm_id: gen T_i = _N
tab T_i
summarize T_i, meanonly
keep if T_i == r(max) // Keep units observed in all periods (assumes at least one unit is complete)
* Attrition analysis
gen in_panel = 1
xtset firm_id year
tsfill, full
replace in_panel = 0 if missing(in_panel)
reg in_panel l.revenue l.profit l.size, cluster(firm_id)
* Within estimator (entity fixed effects)
xtreg profit revenue rnd_spending employees i.year, fe robust
estimates store fe_model
* Entity and time fixed effects
reghdfe profit revenue rnd_spending employees, ///
absorb(firm_id year) cluster(firm_id)
estimates store twoway_fe
* First-differences (alternative to within estimator)
reg d.profit d.revenue d.rnd_spending d.employees i.year, ///
cluster(firm_id)
estimates store fd_model
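The within estimator above can be reproduced by hand, which makes clear what the firm fixed effects absorb. The sketch below assumes the data are already `xtset` and that none of the four variables has missing values (otherwise the means should be computed on the estimation sample only). The `mean_` and `dm_` prefixes are illustrative names, not part of the original workflow.

```stata
* Sketch: within transformation by hand (illustrative variable prefixes)
foreach var of varlist profit revenue rnd_spending employees {
    bysort firm_id: egen double mean_`var' = mean(`var')
    gen double dm_`var' = `var' - mean_`var'
}

* OLS on the demeaned data; no constant, since demeaning removes it
regress dm_profit dm_revenue dm_rnd_spending dm_employees, noconstant

* Built-in within estimator for comparison
xtreg profit revenue rnd_spending employees, fe
```

The slope coefficients from the two regressions should match. The standard errors from the manual regression will not, because `regress` does not account for the degrees of freedom used up by the firm means.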
* GLS random effects
xtreg profit revenue rnd_spending employees i.year, re robust
estimates store re_model
* Classic Hausman test
xtreg profit revenue rnd_spending employees, fe
estimates store fe_haus
xtreg profit revenue rnd_spending employees, re
estimates store re_haus
hausman fe_haus re_haus
* Robust Hausman test (preferred with heteroskedasticity)
* Mundlak (1978) approach: add group means to RE model
foreach var of varlist revenue rnd_spending employees {
bysort firm_id: egen m_`var' = mean(`var')
}
xtreg profit revenue rnd_spending employees ///
m_revenue m_rnd_spending m_employees i.year, re cluster(firm_id)
test m_revenue m_rnd_spending m_employees
* Rejection => FE preferred; failure to reject => RE acceptable
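For reference, the two tests above can be written as follows. This is the standard textbook form; $\hat\beta_{FE}$, $\hat\beta_{RE}$, $\bar{X}_i$, $\gamma$, and $a_i$ are notation introduced here and do not appear in the original code, and $k$ is the number of time-varying coefficients being compared.

```latex
\begin{align*}
H &= (\hat\beta_{FE} - \hat\beta_{RE})'
     \left[\widehat{\mathrm{Var}}(\hat\beta_{FE})
         - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
     (\hat\beta_{FE} - \hat\beta_{RE})
     \;\overset{a}{\sim}\; \chi^2_k
     \quad \text{under } H_0 \\[4pt]
\mu_i &= \bar{X}_i\gamma + a_i
\;\;\Longrightarrow\;\;
y_{it} = X_{it}\beta + \bar{X}_i\gamma + a_i + \varepsilon_{it},
\qquad H_0:\ \gamma = 0
\end{align*}
```

The classic statistic relies on RE being fully efficient under the null, which is why it can misbehave under heteroskedasticity or serial correlation. The Mundlak version tests $\gamma = 0$ with a cluster-robust Wald test instead.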
* When the lagged dependent variable is a regressor:
* y_it = alpha * y_{i,t-1} + X_it * beta + mu_i + epsilon_it
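* Difference GMM (Arellano & Bond 1991)
* xtabond adds the lagged dependent variable itself via lags(1),
* so l.profit is not listed among the regressors
xtabond profit revenue rnd_spending employees, ///
lags(1) twostep vce(robust) artests(2)
* Diagnostics
estat abond
* AR(1) should be significant, AR(2) should NOT be significant
* Overidentification: estat sargan is unavailable after vce(robust);
* re-estimate without it, or use xtabond2 for the Hansen J test (p > 0.10 desired)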
* System GMM (Blundell & Bond 1998)
* More efficient than difference GMM, especially with persistent series
xtabond2 profit l.profit revenue rnd_spending employees i.year, ///
gmm(l.profit, lag(2 4) collapse) ///
gmm(revenue rnd_spending, lag(2 3) collapse) ///
iv(employees i.year) ///
twostep robust orthogonal small
estimates store gmm_model
* Key diagnostics to report:
* 1. Number of instruments (should not exceed number of groups)
* 2. Hansen J test p-value (> 0.10; values close to 1.00 signal too many instruments, not strong validity)
* 3. AR(2) test p-value (> 0.10 for valid instruments)
* 4. Difference-in-Hansen test for subset of instruments
| Test | Null Hypothesis | Desired Result | Stata Command |
|------|----------------|----------------|---------------|
| AR(1) | No first-order autocorrelation | Reject (p < 0.05) | Reported automatically |
| AR(2) | No second-order autocorrelation | Fail to reject (p > 0.10) | Reported automatically |
| Hansen J | Instruments are valid | Fail to reject (p > 0.10) | Reported automatically |
| Diff-in-Hansen | Level instruments valid | Fail to reject (p > 0.10) | Reported automatically |
| Instrument count | -- | N_instruments < N_groups | Check output |
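The AR(1)/AR(2) pattern in the table follows from first-differencing the dynamic model. The sketch below assumes, for illustration, that $\varepsilon_{it}$ is serially uncorrelated with constant variance $\sigma_\varepsilon^2$; that variance symbol is notation added here.

```latex
\begin{align*}
y_{it} &= \alpha\, y_{i,t-1} + X_{it}\beta + \mu_i + \varepsilon_{it} \\
\Delta y_{it} &= \alpha\, \Delta y_{i,t-1} + \Delta X_{it}\beta + \Delta\varepsilon_{it} \\
\mathrm{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) &= -\sigma_\varepsilon^2 \neq 0,
\qquad
\mathrm{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-2}) = 0 \\
\mathbb{E}\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \right] &= 0
\quad \text{for } s \ge 2
\end{align*}
```

Differencing removes $\mu_i$ but makes $\Delta y_{i,t-1}$ correlated with $\Delta\varepsilon_{it}$ through $\varepsilon_{i,t-1}$, so levels dated $t-2$ and earlier serve as instruments. First-order correlation in the differenced residuals is therefore expected, while second-order correlation would indicate that $\varepsilon_{it}$ itself is serially correlated and that those lags are not valid instruments.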
* Entity-clustered (default choice for firm panels)
xtreg profit revenue rnd_spending, fe cluster(firm_id)
* Two-way clustering (firm and year)
reghdfe profit revenue rnd_spending, ///
absorb(firm_id) cluster(firm_id year)
* Driscoll-Kraay standard errors (cross-sectional dependence)
xtscc profit revenue rnd_spending i.year, fe lag(3)
* Panel-corrected standard errors with AR(1) errors (Prais-Winsten; best suited to long-T panels)
xtpcse profit revenue rnd_spending i.firm_id, correlation(ar1)
* Test for heteroskedasticity in FE model
xtreg profit revenue rnd_spending, fe
xttest3 // Modified Wald test (rejects => use robust/cluster SE)
* Test for serial correlation
xtserial profit revenue rnd_spending
* Wooldridge test (rejects => use cluster SE or Newey-West)
* Test for cross-sectional dependence
xtreg profit revenue rnd_spending, fe
xtcsd, pesaran abs
* Pesaran CD test (rejects => consider Driscoll-Kraay SE)
* Continuous x continuous interaction with FE
xtreg profit c.rnd_spending##c.market_share i.year, fe cluster(firm_id)
* Visualize marginal effect
margins, dydx(rnd_spending) at(market_share=(0(0.1)1))
marginsplot, title("Marginal Effect of R&D by Market Share")
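What `margins, dydx()` reports for this specification can be written out directly. The coefficient labels $\beta_1$, $\beta_2$, $\beta_3$ and the evaluation point $m$ are notation introduced here; $\lambda_t$ stands for the year effects.

```latex
\begin{align*}
\text{profit}_{it} &= \beta_1\,\text{rnd}_{it} + \beta_2\,\text{share}_{it}
  + \beta_3\,(\text{rnd}_{it}\times\text{share}_{it})
  + \lambda_t + \mu_i + \varepsilon_{it} \\
\left.\frac{\partial\,\mathbb{E}[\text{profit}_{it}]}{\partial\,\text{rnd}_{it}}
  \right|_{\text{share}=m} &= \beta_1 + \beta_3\, m \\
\mathrm{Var}\!\left(\hat\beta_1 + \hat\beta_3\, m\right) &=
  \mathrm{Var}(\hat\beta_1) + m^2\,\mathrm{Var}(\hat\beta_3)
  + 2m\,\mathrm{Cov}(\hat\beta_1,\hat\beta_3)
\end{align*}
```

The coefficient on `rnd_spending` alone is thus the marginal effect only at a market share of zero, which is why the effect is best reported across the observed range of `market_share`.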
* IV with fixed effects (xtivreg)
xtivreg profit (rnd_spending = tax_credit regulatory_change) ///
employees size i.year, fe first
* First-stage F-statistic check
* Report Kleibergen-Paap rk Wald F for weak instruments
* (xtivreg does not report it; the community-contributed xtivreg2 or ivreghdfe do)
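A rough relevance check that needs only official commands is to run the first stage by hand and test the excluded instruments jointly. This is a sketch that reuses the variable names from the `xtivreg` call above. It produces a cluster-robust F-statistic for the excluded instruments, which is not the Kleibergen-Paap statistic itself, so it is best treated as a complement to it.

```stata
* Sketch: manual first stage with firm fixed effects and clustered SEs
xtreg rnd_spending tax_credit regulatory_change ///
    employees size i.year, fe vce(cluster firm_id)

* Joint significance of the excluded instruments
test tax_credit regulatory_change
```

A small F-statistic here suggests weak instruments, in which case the second-stage estimates and their standard errors should be interpreted with caution.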
* Mundlak (1978) approach: include within-group means
foreach var of varlist revenue rnd_spending employees {
bysort firm_id: egen bar_`var' = mean(`var')
}
xtreg profit revenue rnd_spending employees ///
bar_revenue bar_rnd_spending bar_employees ///
i.year, re cluster(firm_id)
* Coefficients on time-varying vars are equivalent to FE estimates
* Coefficients on bar_ vars capture between-unit effects
* Comparison table: FE vs RE vs GMM
esttab fe_model re_model gmm_model using "tables/panel_comparison.tex", ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
label title("Panel Regression Results") ///
mtitles("Fixed Effects" "Random Effects" "System GMM") ///
stats(N N_g r2_w ar2p hansenp, ///
labels("Observations" "Firms" "Within R-squared" ///
"AR(2) p-value" "Hansen p-value") ///
fmt(0 0 3 3 3)) ///
addnotes("Clustered standard errors in parentheses." ///
"All models include year fixed effects.") ///
replace