Literature Gap Finder

Systematic framework for identifying research opportunities in statistical methodology

Use this skill when: positioning research contributions, finding gaps in methodology literature, identifying unexplored combinations of methods and settings, building literature reviews, or deciding on research directions.

The Gap-Finding Framework

What Makes a Good Research Gap?

A publishable gap must be:

Real - Not already addressed (check thoroughly!)
Important - Solves a problem researchers face
Tractable - Can be solved with available tools
Novel - Provides new insight, not just a combination of existing methods
Timely - Relevant to current research trends

Types of Gaps

| Gap Type | Description | Example |
|----------|-------------|---------|
| Method Gap | No method exists for setting | No mediation analysis for network data |
| Theory Gap | Method exists but lacks theory | Bootstrap for mediation lacks consistency proof |
| Efficiency Gap | Methods exist but are inefficient | Doubly robust mediation more efficient |
| Robustness Gap | Methods fail under violations | Mediation under measurement error |
| Computational Gap | Existing methods don't scale | Mediation with high-dimensional confounders |
| Extension Gap | Existing method needs generalization | Binary → continuous mediator |

Method-Setting Matrix

Systematic Gap Identification Framework

The method-setting matrix is the core tool for finding research gaps systematically:

# Build a method-setting matrix programmatically
create_gap_matrix <- function() {
  methods <- c("Regression", "Weighting/IPW", "DR/AIPW", "TMLE", "ML-based")
  settings <- c("Binary treatment", "Continuous treatment",
                "Time-varying", "Clustered", "High-dimensional",
                "Measurement error", "Missing data", "Network")

  matrix_data <- expand.grid(method = methods, setting = settings)
  matrix_data$status <- "unknown"  # To be filled: "developed", "partial", "gap"
  matrix_data$priority <- NA
  matrix_data$references <- ""

  matrix_data
}

# Visualize the gap matrix
visualize_gaps <- function(gap_matrix) {
  library(ggplot2)

  ggplot(gap_matrix, aes(x = method, y = setting, fill = status)) +
    geom_tile(color = "white") +
    scale_fill_manual(values = c(
      "developed" = "#2ecc71",
      "partial" = "#f39c12",
      "gap" = "#e74c3c",
      "unknown" = "#95a5a6"
    )) +
    theme_minimal() +
    labs(title = "Method × Setting Gap Matrix",
         x = "Method", y = "Setting") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
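The matrix starts with every cell marked "unknown". A minimal sketch of how cells might be filled in as the literature search proceeds is shown here. `set_status()` is a hypothetical helper, not part of the skill, and the statuses and reference string assigned below are placeholders rather than claims about the literature.

```r
# Hypothetical helper: record the status of one method-setting cell.
# Assumes a data frame with the columns produced by create_gap_matrix().
set_status <- function(gap_matrix, method, setting, status, reference = "") {
  allowed <- c("developed", "partial", "gap", "unknown")
  stopifnot(status %in% allowed)

  cell <- gap_matrix$method == method & gap_matrix$setting == setting
  if (sum(cell) != 1) {
    stop("No unique cell for method '", method, "' and setting '", setting, "'")
  }

  gap_matrix$status[cell] <- status
  gap_matrix$references[cell] <- reference
  gap_matrix
}

# Small stand-in grid with the same columns as create_gap_matrix()
gm <- expand.grid(
  method = c("Regression", "TMLE"),
  setting = c("Binary treatment", "Network"),
  stringsAsFactors = FALSE
)
gm$status <- "unknown"
gm$priority <- NA
gm$references <- ""

# Placeholder assignments for illustration only
gm <- set_status(gm, "Regression", "Binary treatment", "developed", "placeholder ref")
gm <- set_status(gm, "TMLE", "Network", "gap")

# Cells still to be checked, and cells flagged as gaps
remaining <- gm[gm$status == "unknown", c("method", "setting")]
gaps <- gm[gm$status == "gap", c("method", "setting")]
```

Rejecting unknown status values and non-existent cells keeps typos from silently creating a fifth category that the colour scale in the plot would not cover.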

Verification Process

Confirming a Gap is Real

Before claiming a gap, verify systematically:

| Step | Action | Tools |
|------|--------|-------|
| 1 | Search major databases | Google Scholar, Web of Science, Scopus |
| 2 | Search preprint servers | arXiv, bioRxiv, SSRN |
| 3 | Search R packages | CRAN, GitHub, R-universe |
| 4 | Check conference proceedings | ICML, NeurIPS, JSM, ENAR |
| 5 | Search dissertations | ProQuest, university repositories |
| 6 | Email domain experts | 2-3 experts for confirmation |

# Systematic verification checklist
verify_gap <- function(topic, keywords) {
  checklist <- list(
    topic = topic,  # record what the search was about
    databases_searched = c("google_scholar", "web_of_science", "pubmed", "scopus"),
    search_terms = keywords,
    date_range = paste(Sys.Date() - 365*5, "to", Sys.Date()),  # roughly the last five years
    results = list(
      papers_found = 0,
      closest_related = c(),
      why_not_the_same = ""
    ),
    expert_consultation = list(
      experts_contacted = c(),
      responses = c()
    ),
    verification_status = "pending"  # pending, confirmed, rejected
  )

  checklist
}

# Document the verification
document_verification <- function(gap_description, search_log) {
  cat("## Gap Verification Report\n\n")
  cat("**Gap:**", gap_description, "\n\n")
  cat("**Search Date:**", as.character(Sys.Date()), "\n\n")
  cat("**Databases Searched:**\n")
  for (db in search_log$databases_searched) {
    cat("-", db, "\n")  # cat() already separates arguments with a space
  }
  cat("\n**Search Terms:**", paste(search_log$search_terms, collapse = ", "), "\n")
  cat("\n**Conclusion:**", search_log$verification_status, "\n")
}

Priority Ranking

Scoring Research Gaps

| Criterion | Weight | Score 1-5 |
|-----------|--------|-----------|
| Impact (how many benefit?) | 0.25 | ___ |
| Novelty (how new?) | 0.20 | ___ |
| Tractability (can we solve it?) | 0.20 | ___ |
| Timeliness (is it hot now?) | 0.15 | ___ |
| Fit (matches our expertise?) | 0.10 | ___ |
| Publication potential | 0.10 | ___ |

Priority Score = Σ(weight × score)

# Priority scoring function
score_research_gap <- function(
  impact,        # 1-5: How many researchers would benefit
  novelty,       # 1-5: How new/original is this
  tractability,  # 1-5: How likely can we solve it
  timeliness,    # 1-5: Is this currently hot
  fit,           # 1-5: Matches our expertise
  publication    # 1-5: Publication potential
) {
  weights <- c(0.25, 0.20, 0.20, 0.15, 0.10, 0.10)
  scores <- c(impact, novelty, tractability, timeliness, fit, publication)

  priority <- sum(weights * scores)

  list(
    priority_score = priority,
    interpretation = dplyr::case_when(  # case_when() comes from dplyr, which must be installed
      priority >= 4.0 ~ "High priority - pursue immediately",
      priority >= 3.0 ~ "Medium priority - develop further",
      priority >= 2.0 ~ "Low priority - back burner",
      TRUE ~ "Skip - not worth pursuing"
    ),
    breakdown = data.frame(
      criterion = c("Impact", "Novelty", "Tractability",
                   "Timeliness", "Fit", "Publication"),
      weight = weights,
      score = scores,
      weighted = weights * scores
    )
  )
}

# Compare multiple gaps
rank_gaps <- function(gaps_list) {
  scores <- sapply(gaps_list, function(g) g$priority_score)
  order(scores, decreasing = TRUE)
}
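As a worked example of the priority formula, the sketch below scores several candidate gaps at once with a matrix product in base R. The gap names and the 1-5 scores are hypothetical placeholders. The weights and cut-offs are the ones given above.

```r
# Weights from the scoring table (they sum to 1)
weights <- c(impact = 0.25, novelty = 0.20, tractability = 0.20,
             timeliness = 0.15, fit = 0.10, publication = 0.10)

# Hypothetical 1-5 scores for three candidate gaps (illustration only)
scores <- rbind(
  "DR for 3+ mediators"        = c(4, 4, 3, 4, 5, 4),
  "Mediation with meas. error" = c(5, 3, 2, 3, 4, 4),
  "Sharp bounds"               = c(3, 4, 2, 2, 3, 3)
)
colnames(scores) <- names(weights)

# Priority score = sum(weight * score), computed for every row at once
priority <- drop(scores %*% weights)

# Same cut-offs as the interpretation bands: >= 4, >= 3, >= 2, otherwise skip
band <- cut(priority, breaks = c(-Inf, 2, 3, 4, Inf), right = FALSE,
            labels = c("Skip", "Low", "Medium", "High"))

ranking <- data.frame(gap = rownames(scores), priority = priority, band = band)
ranking <- ranking[order(ranking$priority, decreasing = TRUE), ]
print(ranking, row.names = FALSE)
```

With these placeholder scores the first gap works out to 0.25×4 + 0.20×4 + 0.20×3 + 0.15×4 + 0.10×5 + 0.10×4 = 3.9, which falls just short of the high-priority cut-off of 4.0.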

Method × Setting Matrix

The Core Framework

Systematically map methods against settings to find gaps:

                    METHODS
          │ Regression │ Weighting │ DR/TMLE │ ML-based │
──────────┼────────────┼───────────┼─────────┼──────────│
Binary A  │     ✓      │     ✓     │    ✓    │    ✓     │
Continuous│     ✓      │     ?     │    ✓    │    ?     │
SETTINGS  ├────────────┼───────────┼─────────┼──────────│
Time-vary │     ?      │     ✓     │    ✓    │    ✗     │
Clustered │     ✓      │     ?     │    ?    │    ✗     │
High-dim  │     ✗      │     ✗     │    ?    │    ✓     │

✓ = Well-developed    ? = Partial/emerging    ✗ = Gap

Building Your Matrix

Step 1: Identify Dimensions

For mediation analysis:

| Dimension | Variations |
|-----------|------------|
| Treatment | Binary, continuous, multi-level, time-varying |
| Mediator | Single, multiple, high-dimensional, latent |
| Outcome | Continuous, binary, count, survival, longitudinal |
| Confounding | Measured, unmeasured, time-varying |
| Structure | Single mediator, parallel, sequential, moderated |
| Data | Cross-sectional, longitudinal, clustered, network |
| Assumptions | Standard, relaxed positivity, measurement error |

Step 2: List Methods

| Method Family | Specific Methods |
|---------------|------------------|
| Regression | Baron-Kenny, product of coefficients, difference |
| Weighting | IPW, MSM, sequential g-estimation |
| Doubly Robust | AIPW, TMLE, cross-fitted |
| Semiparametric | Influence function-based |
| Bayesian | MCMC, variational |
| Machine Learning | Causal forests, DML, neural |
| Bounds | Partial identification, sensitivity |

Step 3: Fill and Analyze

Mark each cell:

✓ (green): Well-established with theory + software
? (yellow): Emerging or partial coverage
✗ (red): Clear gap

Example: Sequential Mediation Matrix

                         │ Product │ Weighting │ DR │ Bounds │
─────────────────────────┼─────────┼───────────┼────┼────────│
2 mediators, linear      │    ✓    │     ✓     │  ✓ │   ?    │
2 mediators, nonlinear   │    ?    │     ✓     │  ? │   ✗    │
3+ mediators, linear     │    ?    │     ?     │  ✗ │   ✗    │
3+ mediators, nonlinear  │    ✗    │     ?     │  ✗ │   ✗    │
With measurement error   │    ✗    │     ✗     │  ✗ │   ✗    │
With unmeasured conf.    │    ✗    │     ✗     │  ✗ │   ?    │

Gaps identified:

DR methods for 3+ mediators
Any method with measurement error
Bounds approach underdeveloped
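The same reading of the matrix can be done programmatically once the cells are transcribed. The sketch below encodes the sequential mediation matrix above in base R, using the labels "ok", "partial", and "gap" as stand-ins for ✓, ?, and ✗, and summarises where the gaps concentrate.

```r
# Statuses transcribed from the sequential mediation matrix above:
# "ok" = well-developed, "partial" = emerging, "gap" = clear gap
status <- matrix(
  c("ok",      "ok",      "ok",      "partial",
    "partial", "ok",      "partial", "gap",
    "partial", "partial", "gap",     "gap",
    "gap",     "partial", "gap",     "gap",
    "gap",     "gap",     "gap",     "gap",
    "gap",     "gap",     "gap",     "partial"),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("2 mediators, linear", "2 mediators, nonlinear",
      "3+ mediators, linear", "3+ mediators, nonlinear",
      "With measurement error", "With unmeasured conf."),
    c("Product", "Weighting", "DR", "Bounds")
  )
)

is_gap <- status == "gap"

# How many settings each method family leaves uncovered, and vice versa
gaps_per_method <- colSums(is_gap)
gaps_per_setting <- rowSums(is_gap)

# Settings with no method at all
uncovered_settings <- names(gaps_per_setting)[gaps_per_setting == ncol(status)]

# Long list of individual gap cells (row index in column 1, column index in column 2)
idx <- which(is_gap, arr.ind = TRUE)
gap_cells <- data.frame(
  setting = rownames(status)[idx[, 1]],
  method = colnames(status)[idx[, 2]]
)
```

Transcribed this way, the DR and Bounds columns each contain four gap cells, and the measurement-error row is a gap for all four method families, which matches the gaps listed above.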

Assumption Relaxation Trees

The Framework

Map how assumptions have been relaxed over time:

                    Standard Mediation (Baron-Kenny 1986)
                              │
            ┌─────────────────┼─────────────────┐
            ↓                 ↓                 ↓
    No unmeasured      Linearity         No interaction
    confounding        assumed           assumed
            │                 │                 │
            ↓                 ↓                 ↓
    ┌───────┴───────┐   Nonparametric    VanderWeele
    ↓               ↓     (Imai 2010)    4-way decomp
Sensitivity      Bounds                        │
(Imai 2010)   (partial ID)                    ↓
    │               │               Multiple mediators?
    ↓               ↓               Longitudinal?
 E-value        Sharp bounds?       Measurement error?
(Ding 2016)         │                    │
    │               ↓                    ↓
    ↓           [YOUR GAP?]         [YOUR GAP?]
[YOUR GAP?]

Building the Tree

Step 1: Identify Original Assumptions

For a classic method, list ALL assumptions:

Explicit assumptions (stated in paper)
Implicit assumptions (unstated but required)
Computational assumptions (required for implementation)

Step 2: Trace Relaxation History

For each assumption, find papers that:

Relax it partially
Relax it completely
Replace it with different assumption
Show consequences of violation

Step 3: Find Unexplored Branches

Look for:

Combinations of relaxations not yet explored
Relaxations in one method not applied to another
Partial relaxations that could be completed
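Finding unexplored combinations is a small enumeration problem. A minimal sketch in base R follows: list every pair of relaxations, then subtract the pairs the literature search has already turned up. The relaxation names and the "explored" pairs are hypothetical placeholders, not statements about what has been published.

```r
# Hypothetical relaxations of a classic method's assumptions
relaxations <- c("unmeasured confounding", "nonlinearity",
                 "treatment-mediator interaction", "measurement error")

# Order-independent label for a set of relaxations
combo_key <- function(x) paste(sort(x), collapse = " + ")

# Every pair of relaxations
all_pairs <- combn(relaxations, 2, FUN = combo_key)

# Pairs for which the search found existing work (placeholders, not claims)
explored <- c(
  combo_key(c("nonlinearity", "treatment-mediator interaction")),
  combo_key(c("unmeasured confounding", "nonlinearity"))
)

# Remaining pairs are candidate branches to verify, not confirmed gaps
candidate_branches <- setdiff(all_pairs, explored)
```

Changing the `2` in `combn()` to `3` enumerates triples instead. Each surviving combination still has to pass the verification steps before it is claimed as a gap.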

Example: Positivity Assumption

Positivity: P(A=a|X) > ε > 0 for all a, x
                    │
    ┌───────────────┼───────────────┐
    ↓               ↓               ↓
Near-violation  Practical      Structural
                positivity      violations
    │               │               │
    ↓               ↓               ↓
Trimming      Overlap         Extrapolation
weights       assessment       methods
    │               │               │
    ↓               ↓               ↓
Truncation?   Diagnostics?   Bounds under
                             violations?
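To make the branches of this tree concrete, the sketch below runs an overlap assessment, weight truncation, and trimming on simulated data in base R. The data-generating model, the threshold `eps = 0.05`, and the 99th-percentile cap are assumptions chosen for illustration, not recommendations from the source.

```r
set.seed(1)

# Simulated data (illustration only): one confounder, binary treatment
n <- 500
x <- rnorm(n)
a <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))

# Estimated propensity score P(A = 1 | X)
ps_model <- glm(a ~ x, family = binomial())
ps <- fitted(ps_model)

# Overlap assessment: for binary A, positivity needs eps < ps < 1 - eps
eps <- 0.05
near_violation <- ps < eps | ps > 1 - eps
share_flagged <- mean(near_violation)

# Inverse probability weights, then truncation at the 99th percentile
w <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))
cap <- quantile(w, 0.99)
w_truncated <- pmin(w, cap)

# Trimming instead drops the flagged units from the analysis sample
keep <- !near_violation
```

Truncation keeps every unit but bounds its influence, while trimming changes the target population. Which trade-off a method handles, and with what theory, is the kind of question the tree is meant to surface.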

Citation Network Analysis

Forward and Backward Searching

Backward: From recent key paper, trace citations:

What foundational papers are cited?
What parallel developments exist?
What's the intellectual lineage?

Forward: Using Google Scholar "Cited by":

Who has built on this work?
What extensions were made?
What gaps remain unaddressed?

Key Paper Identification

For any topic, identify:

| Category | Description | How to Find |
|----------|-------------|-------------|
| Foundational | Original method papers | Most-cited, oldest |
| Textbook | Comprehensive treatments | Citations across subfields |
| Recent reviews | State-of-the-art summaries | "Review" in title, last 5 years |
| Frontier | Latest developments | Top journals, last 2 years |
| Your competition | Groups working on same gap | Recent similar titles |

Building a Citation Map

1986: Baron & Kenny [foundations]
        │
        ├──→ 1990s: SEM extensions
        │
        ├──→ 1992: Robins & Greenland [causal foundations]
        │           │
        │           ├──→ 2010: Imai et al. [sensitivity]
        │           │
        │           ├──→ 2014: VanderWeele [4-way]
        │           │           │
        │           │           └──→ 2015: Book [comprehensive]
        │           │
        │           └──→ 2014: Tchetgen [semiparametric]
        │
        └──→ 2020s: ML integration [frontier]
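A citation map like the one above can be stored as an edge list, which turns forward and backward searching into graph traversal. The base R sketch below uses the node labels from the map without years. `trace_citations()` is a hypothetical helper name, and the edges only mirror the illustrative map, not a real citation database.

```r
# Edges point from an earlier work to a later work that builds on it.
# Node labels follow the citation map above.
edges <- data.frame(
  from = c("Baron & Kenny", "Baron & Kenny", "Baron & Kenny",
           "Robins & Greenland", "Robins & Greenland", "Robins & Greenland",
           "VanderWeele"),
  to   = c("SEM extensions", "Robins & Greenland", "ML integration",
           "Imai et al.", "VanderWeele", "Tchetgen",
           "Book"),
  stringsAsFactors = FALSE
)

# Follow edges repeatedly until no new works are reached
trace_citations <- function(start, edges, direction = c("forward", "backward")) {
  direction <- match.arg(direction)
  src <- if (direction == "forward") edges$from else edges$to
  dst <- if (direction == "forward") edges$to else edges$from

  found <- character(0)
  current <- start
  while (length(current) > 0) {
    reached <- setdiff(unique(dst[src %in% current]), found)
    found <- c(found, reached)
    current <- reached
  }
  found
}

# Forward: who has built on this work, directly or indirectly?
built_on <- trace_citations("Robins & Greenland", edges, "forward")

# Backward: what is the intellectual lineage of this work?
lineage <- trace_citations("Book", edges, "backward")

# Works with nothing built on them yet in this map
leaves <- setdiff(edges$to, edges$from)
```

The leaves are where a new contribution would attach, so they are a natural place to start looking for unaddressed extensions.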

Gap Verification Checklist

Before claiming a gap, verify:

1. Literature Search

[ ] Searched Google Scholar with multiple keyword combinations
[ ] Searched arXiv stat.ME and stat.TH
[ ] Searched JSTOR for older statistics journals
[ ] Searched bioRxiv/medRxiv for preprints
[ ] Checked reference lists of review papers
[ ] Checked "cited by" for key papers

2. Terminology Check

[ ] Same concept might have different names in different fields
[ ] Checked econometrics terminology
[ ] Checked biostatistics terminology
[ ] Checked machine learning terminology
[ ] Checked psychology/SEM terminology

3. Adjacent Literature

[ ] Checked related but not identical settings
[ ] Method might exist for similar problem
[ ] Checked if general framework applies

4. Working Papers

[ ] Checked key authors' websites
[ ] Checked conference proceedings (JSM, ENAR)
[ ] Asked collaborators/experts

5. Final Verification

[ ] Gap is not addressed in supplementary materials
[ ] Gap is not "obvious" extension reviewers will dismiss
[ ] Gap is important enough to publish
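The keyword-combination and terminology items in the checklist above can be partly mechanised. The sketch below builds search strings from synonym lists in base R. The synonym lists are hypothetical examples, and the generic `OR`/`AND` syntax is an assumption: each database has its own query rules, so the strings may need adjusting before use.

```r
# Hypothetical synonym lists; replace with the terms used in each field
method_terms <- c("mediation analysis", "indirect effect", "path analysis")
setting_terms <- c("measurement error", "misclassification", "errors-in-variables")

# Quote each phrase and join the alternatives with OR
or_block <- function(terms) {
  paste0("(", paste0('"', terms, '"', collapse = " OR "), ")")
}

# One broad query combining both concepts
broad_query <- paste(or_block(method_terms), "AND", or_block(setting_terms))

# One narrow query per term pair, with a column for logging hit counts by hand
pairs <- expand.grid(method = method_terms, setting = setting_terms,
                     stringsAsFactors = FALSE)
search_log <- data.frame(
  query = paste0('"', pairs$method, '" "', pairs$setting, '"'),
  hits = NA_integer_,
  stringsAsFactors = FALSE
)
```

Keeping one row per term pair makes it easy to document which combinations were actually searched, which is the evidence a reviewer will ask for when a gap is claimed.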

Gap Characterization Template

When you identify a gap:

## Gap: [Brief Title]

### Setting
[Precise description of the setting where the gap exists]

### Current State
- **What exists**: [Methods that partially address this]
- **What works**: [Aspects of the problem already solved]
- **What fails**: [Where current methods break down]

### The Gap
- **Precise statement**: [What is missing]
- **Why it matters**: [Who needs this, for what applications]
- **Why it's hard**: [Technical challenges]

### Evidence of Gap
- [ ] Literature search documented
- [ ] No existing solution found
- [ ] Experts consulted (optional)

### Potential Approaches
1. [Approach 1]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]

2. [Approach 2]: [Brief description]
   - Pros: [Advantages]
   - Cons: [Challenges]

### Related Work
- [Paper 1]: [How it relates, why it doesn't solve gap]
- [Paper 2]: [How it relates, why it doesn't solve gap]

### Contribution Positioning
"While [existing work] addresses [related problem], no method currently
handles [specific gap]. We propose [approach] which provides [properties]."

Common Gap Patterns in Mediation

Pattern 1: Data Structure Mismatch

Gap template: "[Method] assumes [simple structure], but in [application] data has [complex structure]"

Examples:

Methods assume iid, but data is clustered
Methods assume cross-sectional, but data is longitudinal
Methods assume complete data, but missingness exists

Pattern 2: Assumption Violation

Gap template: "[Method] requires [assumption], which is violated when [situation]"

Examples:

Unmeasured mediator-outcome confounding
Measurement error in mediator
Treatment-mediator interaction

Pattern 3: Estimand Ambiguity

Gap template: "When [complication], standard estimands [NDE/NIE] are not well-defined or interpretable"

Examples:

Post-treatment confounding
Time-varying treatments/mediators
Multiple versions of treatment

Pattern 4: Efficiency vs Robustness

Gap template: "Efficient methods require [strong assumptions], while robust methods are inefficient"

Examples:

Doubly robust methods for mediation
Semiparametric efficiency in complex settings
Adaptive methods

Pattern 5: Computational Barrier

Gap template: "Theoretically valid approach exists but [computational limitation]"

Examples:

High-dimensional settings
Continuous mediators requiring integration
Bootstrap in complex models

Research Positioning Strategies

The Contribution Statement

Strong positioning formula:

"Although [Author Year] developed [method] for [setting], their approach [limitation]. In contrast, our method [advantage] while maintaining [property]. Specifically, we contribute: (1) [theoretical contribution], (2) [methodological contribution], (3) [practical contribution]."

Positioning Types

| Position | When to Use | Example Language |
|----------|-------------|------------------|
| Extension | Build on existing | "We extend [method] to [new setting]" |
| Synthesis | Combine approaches | "We unify [method A] and [method B]" |
| Alternative | Different approach | "We propose an alternative that [advantage]" |
| Correction | Fix limitation | "We address the limitation of [method]" |
| Generalization | Broader framework | "We develop a general framework that includes [special cases]" |

Differentiation Matrix

| Dimension | Competitor 1 | Competitor 2 | Our Method |
|-----------|--------------|--------------|------------|
| Setting | Binary A only | Any A | Any A |
| Theory | Consistency | + Normality | + Efficiency |
| Assumptions | Strong | Medium | Weaker |
| Computation | Fast | Slow | Medium |
| Software | R package | None | R + Python |

Integration with Other Skills

This skill works with:

cross-disciplinary-ideation - Find solutions from other fields for identified gaps
method-transfer-engine - Transfer methods to fill gaps
identification-theory - Understand what assumptions are needed
methods-paper-writer - Write up the gap and solution

Key References

On Finding Research Gaps

Alvesson, M. & Sandberg, J. (2011). Generating research questions through problematization. Academy of Management Review
Sandberg, J. & Alvesson, M. (2011). Ways of constructing research questions: Gap-spotting or problematization? Organization

Mediation Reviews (for gap identification)

VanderWeele, T.J. (2016). Mediation analysis: A practitioner's guide. Annual Review of Public Health
Nguyen, T.Q. et al. (2021). Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn. Psychological Methods

Causal Inference Reviews

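Hernán, M.A. (2018). The C-word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health
Imbens, G.W. (2020). Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics. Journal of Economic Literature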

Version: 1.0
Created: 2025-12-08
Domain: Research Strategy, Literature Review
