skills/26-Data-Wise-scholar/skills/research/literature-gap-finder/SKILL.md
Method×Setting matrices and systematic gap identification
Systematic framework for identifying research opportunities in statistical methodology
Use this skill when: positioning research contributions, finding gaps in methodology literature, identifying unexplored combinations of methods and settings, building literature reviews, or deciding on research directions.
A publishable gap must be:
| Gap Type | Description | Example |
|----------|-------------|---------|
| Method Gap | No method exists for setting | No mediation analysis for network data |
| Theory Gap | Method exists but lacks theory | Bootstrap for mediation lacks consistency proof |
| Efficiency Gap | Methods exist but are inefficient | Doubly robust mediation more efficient |
| Robustness Gap | Methods fail under violations | Mediation under measurement error |
| Computational Gap | Existing methods don't scale | Mediation with high-dimensional confounders |
| Extension Gap | Existing method needs generalization | Binary → continuous mediator |
The method-setting matrix is the core tool for finding research gaps systematically:
```r
# Build a method-setting matrix programmatically
create_gap_matrix <- function() {
  methods <- c("Regression", "Weighting/IPW", "DR/AIPW", "TMLE", "ML-based")
  settings <- c("Binary treatment", "Continuous treatment",
                "Time-varying", "Clustered", "High-dimensional",
                "Measurement error", "Missing data", "Network")

  # One row per method-setting combination
  matrix_data <- expand.grid(method = methods, setting = settings)
  matrix_data$status <- "unknown"  # To be filled: "developed", "partial", "gap"
  matrix_data$priority <- NA
  matrix_data$references <- ""

  matrix_data
}

# Visualize the gap matrix
visualize_gaps <- function(gap_matrix) {
  library(ggplot2)

  ggplot(gap_matrix, aes(x = method, y = setting, fill = status)) +
    geom_tile(color = "white") +
    scale_fill_manual(values = c(
      "developed" = "#2ecc71",
      "partial" = "#f39c12",
      "gap" = "#e74c3c",
      "unknown" = "#95a5a6"
    )) +
    theme_minimal() +
    labs(title = "Method × Setting Gap Matrix",
         x = "Method", y = "Setting") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```
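Once the empty grid exists, each cell still has to be filled in from the literature. The sketch below shows one way that step might look. `set_cell_status()` is a hypothetical helper, not part of the original skill, and the statuses assigned here are placeholders rather than findings from a real search.

```r
# Hypothetical helper (an assumption, not from the original skill):
# record what a literature search found for one method-setting cell.
set_cell_status <- function(gap_matrix, method, setting, status,
                            references = "") {
  valid <- c("developed", "partial", "gap", "unknown")
  stopifnot(status %in% valid)
  idx <- gap_matrix$method == method & gap_matrix$setting == setting
  if (!any(idx)) stop("No such cell: ", method, " x ", setting)
  gap_matrix$status[idx] <- status
  gap_matrix$references[idx] <- references
  gap_matrix
}

# A small stand-in grid so the example runs on its own
gap_matrix <- expand.grid(
  method = c("Regression", "Weighting/IPW", "TMLE"),
  setting = c("Binary treatment", "Network"),
  stringsAsFactors = FALSE
)
gap_matrix$status <- "unknown"
gap_matrix$references <- ""

# Placeholder entries only; real statuses must come from a documented search
gap_matrix <- set_cell_status(gap_matrix, "Regression", "Binary treatment",
                              "developed")
gap_matrix <- set_cell_status(gap_matrix, "TMLE", "Network", "gap")

# Cells that still need attention: confirmed gaps and unreviewed cells
open_cells <- subset(gap_matrix, status %in% c("gap", "unknown"))
print(open_cells[, c("method", "setting", "status")])
```

Keeping "unknown" separate from "gap" distinguishes cells that have not been reviewed yet from cells where a search found nothing.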
Before claiming a gap, verify systematically:
| Step | Action | Tools |
|------|--------|-------|
| 1 | Search major databases | Google Scholar, Web of Science, Scopus |
| 2 | Search preprint servers | arXiv, bioRxiv, SSRN |
| 3 | Search R packages | CRAN, GitHub, R-universe |
| 4 | Check conference proceedings | ICML, NeurIPS, JSM, ENAR |
| 5 | Search dissertations | ProQuest, university repositories |
| 6 | Email domain experts | 2-3 experts for confirmation |
```r
# Systematic verification checklist
verify_gap <- function(topic, keywords) {
  checklist <- list(
    databases_searched = c("google_scholar", "web_of_science", "pubmed", "scopus"),
    search_terms = keywords,
    # Default window: roughly the last five years
    date_range = paste(Sys.Date() - 365*5, "to", Sys.Date()),
    results = list(
      papers_found = 0,
      closest_related = c(),
      why_not_the_same = ""
    ),
    expert_consultation = list(
      experts_contacted = c(),
      responses = c()
    ),
    verification_status = "pending"  # pending, confirmed, rejected
  )

  checklist
}

# Document the verification
document_verification <- function(gap_description, search_log) {
  cat("## Gap Verification Report\n\n")
  cat("**Gap:**", gap_description, "\n\n")
  cat("**Search Date:**", as.character(Sys.Date()), "\n\n")
  cat("**Databases Searched:**\n")
  for (db in search_log$databases_searched) {
    cat("- ", db, "\n")
  }
  cat("\n**Search Terms:**", paste(search_log$search_terms, collapse = ", "), "\n")
  cat("\n**Conclusion:**", search_log$verification_status, "\n")
}
```
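A documented search is easier to reproduce when the query string itself is generated rather than typed ad hoc. The following is a minimal sketch, assuming a hypothetical helper `build_search_query()` that ORs synonyms within a concept and ANDs the concepts together. Boolean syntax differs across databases, so the output is a starting point to adapt, not a query guaranteed to run unchanged everywhere.

```r
# Hypothetical helper (an assumption, not from the original skill):
# each argument is a character vector of synonyms for one concept.
# Synonyms are OR-ed within a concept; concepts are AND-ed together.
build_search_query <- function(...) {
  groups <- list(...)
  stopifnot(length(groups) > 0)
  clauses <- vapply(groups, function(terms) {
    quoted <- paste0('"', terms, '"')
    paste0("(", paste(quoted, collapse = " OR "), ")")
  }, character(1))
  paste(clauses, collapse = " AND ")
}

# Illustrative concepts for a "mediation with network data" search
query <- build_search_query(
  c("mediation analysis", "indirect effect"),
  c("network data", "interference")
)
cat(query, "\n")
```

Storing the generated string alongside the search terms in the verification log records exactly what was searched.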
| Criterion | Weight | Score 1-5 |
|-----------|--------|-----------|
| Impact (how many benefit?) | 0.25 | ___ |
| Novelty (how new?) | 0.20 | ___ |
| Tractability (can we solve it?) | 0.20 | ___ |
| Timeliness (is it hot now?) | 0.15 | ___ |
| Fit (matches our expertise?) | 0.10 | ___ |
| Publication potential | 0.10 | ___ |
Priority Score = Σ(weight × score)
```r
# Priority scoring function
# Requires the dplyr package for case_when()
score_research_gap <- function(
    impact,        # 1-5: How many researchers would benefit
    novelty,       # 1-5: How new/original is this
    tractability,  # 1-5: How likely can we solve it
    timeliness,    # 1-5: Is this currently hot
    fit,           # 1-5: Matches our expertise
    publication    # 1-5: Publication potential
) {
  weights <- c(0.25, 0.20, 0.20, 0.15, 0.10, 0.10)
  scores <- c(impact, novelty, tractability, timeliness, fit, publication)

  # Weighted sum; ranges from 1 to 5 because the weights sum to 1
  priority <- sum(weights * scores)

  list(
    priority_score = priority,
    interpretation = dplyr::case_when(
      priority >= 4.0 ~ "High priority - pursue immediately",
      priority >= 3.0 ~ "Medium priority - develop further",
      priority >= 2.0 ~ "Low priority - back burner",
      TRUE ~ "Skip - not worth pursuing"
    ),
    breakdown = data.frame(
      criterion = c("Impact", "Novelty", "Tractability",
                    "Timeliness", "Fit", "Publication"),
      weight = weights,
      score = scores,
      weighted = weights * scores
    )
  )
}

# Compare multiple gaps: returns indices of gaps_list, highest priority first
rank_gaps <- function(gaps_list) {
  scores <- sapply(gaps_list, function(g) g$priority_score)
  order(scores, decreasing = TRUE)
}
```
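To make the weighted sum concrete, the sketch below scores three candidate gaps at once in base R. The candidate names and their 1-5 scores are hypothetical values chosen for illustration. Only the weights come from the table above.

```r
# Weights from the scoring table; they must sum to 1
weights <- c(impact = 0.25, novelty = 0.20, tractability = 0.20,
             timeliness = 0.15, fit = 0.10, publication = 0.10)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical scores: one row per candidate gap, one column per criterion
candidate_scores <- rbind(
  "Mediation with network data"       = c(4, 5, 3, 4, 3, 4),
  "Mediation under measurement error" = c(4, 3, 4, 3, 5, 4),
  "Binary to continuous mediator"     = c(3, 2, 5, 2, 4, 3)
)
colnames(candidate_scores) <- names(weights)

# Priority score = sum(weight * score), computed for every row at once
priority <- drop(candidate_scores %*% weights)

# Highest priority first
ranking <- sort(priority, decreasing = TRUE)
print(round(ranking, 2))
```

For the first candidate the arithmetic is 0.25·4 + 0.20·5 + 0.20·3 + 0.15·4 + 0.10·3 + 0.10·4 = 3.9, which falls in the "Medium priority" band under the thresholds above.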
Systematically map methods against settings to find gaps:
```
                                     METHODS
                    │ Regression │ Weighting │ DR/TMLE │ ML-based │
          ──────────┼────────────┼───────────┼─────────┼──────────│
          Binary A  │     ✓      │     ✓     │    ✓    │    ✓     │
          Continuous│     ✓      │     ?     │    ✓    │    ?     │
SETTINGS            ├────────────┼───────────┼─────────┼──────────│
          Time-vary │     ?      │     ✓     │    ✓    │    ✗     │
          Clustered │     ✓      │     ?     │    ?    │    ✗     │
          High-dim  │     ✗      │     ✗     │    ?    │    ✓     │

✓ = Well-developed   ? = Partial/emerging   ✗ = Gap
```
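The same matrix can be held as data so that gap cells are listed and counted rather than read off by eye. This is a minimal sketch in base R. The labels "ok", "partial", and "gap" are assumed stand-ins for ✓, ?, and ✗, and the cell values are transcribed from the illustrative matrix above.

```r
# Transcription of the illustrative matrix:
# "ok" = well-developed, "partial" = partial/emerging, "gap" = gap
status <- matrix(
  c("ok",      "ok",      "ok",      "ok",
    "ok",      "partial", "ok",      "partial",
    "partial", "ok",      "ok",      "gap",
    "ok",      "partial", "partial", "gap",
    "gap",     "gap",     "partial", "ok"),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Binary A", "Continuous", "Time-varying", "Clustered", "High-dim"),
    c("Regression", "Weighting", "DR/TMLE", "ML-based")
  )
)

# Cells marked as gaps, as (setting, method) pairs
gap_idx <- which(status == "gap", arr.ind = TRUE)
gap_cells <- data.frame(
  setting = rownames(status)[unname(gap_idx[, 1])],
  method  = colnames(status)[unname(gap_idx[, 2])]
)
print(gap_cells)

# Number of cells per setting that are not yet well-developed
open_per_setting <- rowSums(status != "ok")
print(sort(open_per_setting, decreasing = TRUE))
```

Settings with many open cells may indicate a neglected area or an intrinsically hard one. The count alone does not say which, so it is a prompt for closer reading rather than a conclusion.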
Step 1: Identify Dimensions
For mediation analysis:
| Dimension | Variations |
|-----------|------------|
| Treatment | Binary, continuous, multi-level, time-varying |
| Mediator | Single, multiple, high-dimensional, latent |
| Outcome | Continuous, binary, count, survival, longitudinal |
| Confounding | Measured, unmeasured, time-varying |
| Structure | Single mediator, parallel, sequential, moderated |
| Data | Cross-sectional, longitudinal, clustered, network |
| Assumptions | Standard, relaxed positivity, measurement error |
Step 2: List Methods
| Method Family | Specific Methods |
|---------------|------------------|
| Regression | Baron-Kenny, product of coefficients, difference |
| Weighting | IPW, MSM, sequential g-estimation |
| Doubly Robust | AIPW, TMLE, cross-fitted |
| Semiparametric | Influence function-based |
| Bayesian | MCMC, variational |
| Machine Learning | Causal forests, DML, neural |
| Bounds | Partial identification, sensitivity |
Step 3: Fill and Analyze
Mark each cell:
```
                         │ Product │ Weighting │ DR │ Bounds │
─────────────────────────┼─────────┼───────────┼────┼────────│
2 mediators, linear      │    ✓    │     ✓     │ ✓  │   ?    │
2 mediators, nonlinear   │    ?    │     ✓     │ ?  │   ✗    │
3+ mediators, linear     │    ?    │     ?     │ ✗  │   ✗    │
3+ mediators, nonlinear  │    ✗    │     ?     │ ✗  │   ✗    │
With measurement error   │    ✗    │     ✗     │ ✗  │   ✗    │
With unmeasured conf.    │    ✗    │     ✗     │ ✗  │   ?    │
```
Gaps identified:
Map how assumptions have been relaxed over time:
```
               Standard Mediation (Baron-Kenny 1986)
                                │
              ┌─────────────────┼─────────────────┐
              ↓                 ↓                 ↓
        No unmeasured       Linearity       No interaction
         confounding         assumed           assumed
              │                 │                 │
              ↓                 ↓                 ↓
      ┌───────┴───────┐   Nonparametric      VanderWeele
      ↓               ↓    (Imai 2010)       4-way decomp
 Sensitivity       Bounds                         │
 (Imai 2010)    (partial ID)                      ↓
      │               │                  Multiple mediators?
      ↓               ↓                  Longitudinal?
   E-value      Sharp bounds?            Measurement error?
 (Ding 2016)          │                           │
      │               ↓                           ↓
      ↓          [YOUR GAP?]                 [YOUR GAP?]
 [YOUR GAP?]
```
Step 1: Identify Original Assumptions
For a classic method, list ALL assumptions:
Step 2: Trace Relaxation History
For each assumption, find papers that:
Step 3: Find Unexplored Branches
Look for:
```
        Positivity: P(A=a|X) > ε > 0 for all a, x
                          │
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
    Near-violation    Practical       Structural
                      positivity      violations
          │               │               │
          ↓               ↓               ↓
      Trimming         Overlap       Extrapolation
      weights         assessment       methods
          │               │               │
          ↓               ↓               ↓
    Truncation?      Diagnostics?    Bounds under
                                      violations?
```
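A relaxation tree like the one above can also be stored as a parent-child edge list, which makes it possible to list the open-ended leaves and the chain of relaxations behind each one. This is a sketch under assumptions: the node labels are transcribed from the positivity diagram, and `path_to_root()` is a hypothetical helper introduced for illustration.

```r
# Edge list transcribed from the positivity diagram (parent -> child)
edges <- data.frame(
  parent = c("Positivity", "Positivity", "Positivity",
             "Near-violation", "Practical positivity", "Structural violations",
             "Trimming weights", "Overlap assessment", "Extrapolation methods"),
  child  = c("Near-violation", "Practical positivity", "Structural violations",
             "Trimming weights", "Overlap assessment", "Extrapolation methods",
             "Truncation?", "Diagnostics?", "Bounds under violations?"),
  stringsAsFactors = FALSE
)

# Leaves are nodes that never appear as a parent: candidate open questions
leaves <- setdiff(edges$child, edges$parent)

# Hypothetical helper: walk from a node back to the root
# (assumes each node has at most one parent, as in a tree)
path_to_root <- function(node, edges) {
  path <- node
  while (node %in% edges$child) {
    node <- edges$parent[match(node, edges$child)]
    path <- c(node, path)
  }
  path
}

for (leaf in leaves) {
  cat(paste(path_to_root(leaf, edges), collapse = " -> "), "\n")
}
```

Each printed path reads as a one-line summary of how an assumption has been relaxed so far, ending at the question that remains open.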
Backward: From recent key paper, trace citations:
Forward: Using Google Scholar "Cited by":
For any topic, identify:
| Category | Description | How to Find |
|----------|-------------|-------------|
| Foundational | Original method papers | Most-cited, oldest |
| Textbook | Comprehensive treatments | Citations across subfields |
| Recent reviews | State-of-the-art summaries | "Review" in title, last 5 years |
| Frontier | Latest developments | Top journals, last 2 years |
| Your competition | Groups working on same gap | Recent similar titles |
```
1986: Baron & Kenny [foundations]
  │
  ├──→ 1990s: SEM extensions
  │
  ├──→ 1992: Robins & Greenland [causal foundations]
  │       │
  │       ├──→ 2010: Imai et al. [sensitivity]
  │       │
  │       ├──→ 2014: VanderWeele [4-way]
  │       │       │
  │       │       └──→ 2015: Book [comprehensive]
  │       │
  │       └──→ 2014: Tchetgen [semiparametric]
  │
  └──→ 2020s: ML integration [frontier]
```
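Backward and forward chaining can both be expressed as traversals of one edge list. The sketch below transcribes the timeline above into edges from earlier to later work and walks it in each direction. The node labels are shortened from the timeline, the arrows are treated as lineage rather than verified citation links, and `descendants()` and `ancestors()` are hypothetical helpers.

```r
# Edge list transcribed from the timeline (earlier work -> later work)
citations <- data.frame(
  from = c("Baron & Kenny", "Baron & Kenny", "Baron & Kenny",
           "Robins & Greenland", "Robins & Greenland", "Robins & Greenland",
           "VanderWeele 4-way"),
  to   = c("SEM extensions", "Robins & Greenland", "ML integration",
           "Imai et al.", "VanderWeele 4-way", "Tchetgen",
           "Book"),
  stringsAsFactors = FALSE
)

# Forward chaining: everything downstream of a starting node (breadth-first)
descendants <- function(start, edges) {
  found <- character(0)
  frontier <- start
  while (length(frontier) > 0) {
    nxt <- setdiff(edges$to[edges$from %in% frontier], found)
    found <- c(found, nxt)
    frontier <- nxt
  }
  found
}

# Backward chaining: the same traversal on the reversed edge list
ancestors <- function(start, edges) {
  reversed <- data.frame(from = edges$to, to = edges$from,
                         stringsAsFactors = FALSE)
  descendants(start, reversed)
}

print(descendants("Robins & Greenland", citations))
print(ancestors("Book", citations))
```

In practice the edge list would be built from exported citation data rather than typed by hand. The traversal logic stays the same.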
Before claiming a gap, verify:
When you identify a gap:
```markdown
## Gap: [Brief Title]

### Setting
[Precise description of the setting where the gap exists]

### Current State
- **What exists**: [Methods that partially address this]
- **What works**: [Aspects of the problem already solved]
- **What fails**: [Where current methods break down]

### The Gap
- **Precise statement**: [What is missing]
- **Why it matters**: [Who needs this, for what applications]
- **Why it's hard**: [Technical challenges]

### Evidence of Gap
- [ ] Literature search documented
- [ ] No existing solution found
- [ ] Experts consulted (optional)

### Potential Approaches
1. [Approach 1]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]

2. [Approach 2]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]

### Related Work
- [Paper 1]: [How it relates, why it doesn't solve gap]
- [Paper 2]: [How it relates, why it doesn't solve gap]

### Contribution Positioning
"While [existing work] addresses [related problem], no method currently
handles [specific gap]. We propose [approach] which provides [properties]."
```
Gap template: "[Method] assumes [simple structure], but in [application] data has [complex structure]"
Examples:
Gap template: "[Method] requires [assumption], which is violated when [situation]"
Examples:
Gap template: "When [complication], standard estimands [NDE/NIE] are not well-defined or interpretable"
Examples:
Gap template: "Efficient methods require [strong assumptions], while robust methods are inefficient"
Examples:
Gap template: "Theoretically valid approach exists but [computational limitation]"
Examples:
Strong positioning formula:
"Although [Author Year] developed [method] for [setting], their approach [limitation]. In contrast, our method [advantage] while maintaining [property]. Specifically, we contribute: (1) [theoretical contribution], (2) [methodological contribution], (3) [practical contribution]."
| Position | When to Use | Example Language |
|----------|-------------|------------------|
| Extension | Build on existing | "We extend [method] to [new setting]" |
| Synthesis | Combine approaches | "We unify [method A] and [method B]" |
| Alternative | Different approach | "We propose an alternative that [advantage]" |
| Correction | Fix limitation | "We address the limitation of [method]" |
| Generalization | Broader framework | "We develop a general framework that includes [special cases]" |

| Dimension | Competitor 1 | Competitor 2 | Our Method |
|-----------|--------------|--------------|------------|
| Setting | Binary A only | Any A | Any A |
| Theory | Consistency | + Normality | + Efficiency |
| Assumptions | Strong | Medium | Weaker |
| Computation | Fast | Slow | Medium |
| Software | R package | None | R + Python |
This skill works with:
Version: 1.0 Created: 2025-12-08 Domain: Research Strategy, Literature Review