literature-matrix

Systematic research idea discovery: collect N papers, evaluate all N×(N-1)/2 combinations via a 5-dimension scoring matrix, deep-analyze top candidates with full-text evidence, build unified theoretical frameworks, and generate code skeletons.

When to Use This Skill

Trigger when any of these applies:

User needs to systematically discover research ideas from literature
User wants to evaluate combination potential between multiple papers
User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
User needs to generate code skeletons for combined methods
User mentions: "文献矩阵", "论文组合", "找idea", "组合创新", "paper matrix"
User invokes /literature-matrix

Not For / Boundaries

Will NOT:

Make final research decisions for the user (provides analysis and suggestions only)
Guarantee any idea will be published (evaluates feasibility only)
Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
Generate complete papers (provides framework drafts and code skeletons only)
Fabricate data or analysis results
Replace domain expert judgment on theoretical correctness

Required inputs (ask if missing):

Research domain and keywords
Time range (default: last 2 years)
Paper count (default: 40)

Quick Reference

Workflow (6 Phases)

Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
  ↑                                                                                                                         |
  └─────────────────────────── Checkpoint resume (pause/resume at any phase) ───────────────────────────────────────────────┘

Pattern 1: Initialize Session

1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, timerange, paper count, source mode, weight preset
3. Create directory: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint

Pattern 2: Paper Search (Semantic Scholar API)

GET https://api.semanticscholar.org/graph/v1/paper/search
  ?query={keywords}&year={range}&fieldsOfStudy={domain}
  &fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds

Pattern 3: Paper Screening Criteria

Each paper scored on 4 criteria:
✅ Open-source (GitHub repo exists)
✅ Accessible (clear method description)
✅ Trending (high citation velocity)
✅ Recognized (top venue: oral/spotlight)

Pattern 4: 5-Dimension Evaluation (per combination)

| Dimension          | Default Weight | What it measures                    |
|--------------------|---------------|-------------------------------------|
| Complementarity    | 0.25          | A's method solves B's limitation?   |
| Data Compatibility | 0.20          | Shared data types/formats?          |
| Theory Unifiability| 0.20          | Natural unified framework exists?   |
| Innovation Delta   | 0.20          | 1+1>2 effect?                       |
| Implementation     | 0.15          | Code integration difficulty?        |

Weight presets:
- 理论导向: 0.20, 0.15, 0.30, 0.25, 0.10
- 工程导向: 0.25, 0.25, 0.10, 0.15, 0.25
- 快速发表: 0.30, 0.20, 0.15, 0.20, 0.15
- 自定义: user specifies all 5 weights

Pattern 5: Three-Layer Filtering

Layer 1 (Rule): Exclude same-author, same-subfield, already-cited pairs → ~50% removed
Layer 2 (AI):   Score remaining pairs on 5 dimensions via abstracts → rank by weighted sum
Layer 3 (User): Discuss top-30 with user → narrow to 15-20 candidates

Pattern 6: Paper Acquisition (3 Levels)

L1 Auto:   arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
L2 Assist: Provide DOI + download path, ask user to fetch via library
L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence

Pattern 7: Combination Types

Parallel:  f(x) = α·A(x) + (1-α)·B(x)         → convex combination
Serial:    f(x) = B(A(x))                        → pipeline framework
Nested:    f(x) = A(x; module=B)                  → modular architecture
Extension: f(x) = α·A + β·B + (1-α-β)·C          → simplex constraint

Pattern 8: Non-trivial Justification Templates

Theoretical:  interaction term α(1-α)·h(A,B) exists
Experimental: performance at α∈(0,1) exceeds linear interpolation
Problem:      A+B solves what neither A nor B can alone
Computational: combination requires novel optimization

Pattern 9: Provenance Tagging

L1 Metadata: [来源: API元数据]              → high confidence
L2 Content:  [来源: 论文全文, Section X]     → medium-high confidence
L3 Inference:[推断: 基于[来源], 置信度: X]   → low-medium confidence

Pattern 10: Checkpoint Save/Resume

{"version":"1.0", "current_phase":2, "config":{...},
 "phase_0":{"status":"completed"},
 "phase_2":{"status":"in_progress","evaluated":450,"total":780}}

Rules & Constraints

MUST

Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
Save checkpoint after each phase completion
Use Socratic dialogue: ask guiding questions, don't just present conclusions
Proactively acquire papers when top candidates are identified
Mark abstract-only analyses with ⚠️ low confidence warning

SHOULD

Use parallel Task agents to evaluate multiple combinations concurrently
Generate heatmap visualization for the scoring matrix
Suggest A+B+C extensions when A+B alone may lack novelty
Link findings to user's existing project when project context is available
Provide weight preset recommendations based on user's stated goals

NEVER

Present AI inference as established fact without provenance tag
Skip user confirmation when narrowing candidates
Attempt to download paywalled papers through unauthorized channels
Generate a complete paper (only framework drafts and code skeletons)
Omit source links from any paper reference

Role: Socratic Research Mentor

Act as a proactive, patient, rigorous research mentor throughout the entire workflow.

Behavioral principles:

Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
Rigorous: Every conclusion must have traceable evidence
Patient: Full dialogue at every step, discuss thoroughly with user
Empathetic: Understand student pressure, pragmatically advance research progress
Honest: Clearly mark confidence levels, admit uncertainty

Dialogue patterns by phase:

Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"

See references/dialogue-templates.md for complete dialogue examples.

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

Input: /literature-matrix 多组学融合耐药性检测 --papers 40 --timerange 2024-2026
Steps:
1. Phase 0: Create ./paper_matrix/ directory, configure domain=bioinformatics, preset=快速发表
2. Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
3. Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
4. Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
5. Phase 4: For selected idea (e.g., "graph attention + lipid profiling"), build unified framework: f(x) = α·GAT(x) + (1-α)·LipidNet(x), prove both are special cases
6. Phase 5: Generate base_framework.py, experiment.py with α grid search
Acceptance: Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton

Example 2: ML Top Conference (Seed Expansion)

Input: User provides 8 seed papers from NeurIPS 2025 oral presentations
Steps:
1. Phase 0: Configure source_mode=seed_expansion, domain=ML
2. Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
3. Phase 2: Build matrix with 理论导向 weights, filter and rank
4. Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
5. Phase 4: Build framework where diffusion and GNN are special cases of a "generative message-passing" framework
6. Phase 5: Generate PyTorch code skeleton with α-sweep experiment
Acceptance: Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs

Example 3: Resume from Checkpoint

Input: /literature-matrix --resume
Steps:
1. Read ./paper_matrix/checkpoint.json: Phase 2 in progress, 450/780 evaluated
2. Display progress: "检测到上次分析进度。Phase 2矩阵构建中，已评估450/780个组合。是否继续？"
3. User confirms → continue evaluating remaining 330 combinations
4. Complete Phase 2, proceed to Phase 3
Acceptance: Seamless continuation from checkpoint, no duplicate work

Example 4: Project-Linked Analysis

Input: /literature-matrix 脂质组学机器学习 --link-project
Steps:
1. Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
2. Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
3. Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
4. Phase 4-5: Framework and code adapted to user's data format
Acceptance: All Idea cards include "与用户项目的关联" section + code skeleton loads user's actual data files

Troubleshooting

| Symptom | Diagnosis | Fix | |---------|-----------|-----| | Semantic Scholar API returns empty | Keywords too specific or API rate limit | Broaden keywords, add retry with backoff | | Too few open-access papers | Domain has low OA rate | Use L2 acquisition (ask user to download), expand time range | | All combinations score low | Papers too similar or too different | Adjust paper selection: mix methods papers with application papers | | Checkpoint corrupted | Interrupted during write | Delete checkpoint.json, restart from Phase 0 | | α=0.5 not optimal | Combination is serial, not parallel | Switch to pipeline framework (serial type), not convex combination |

References

Detailed implementation guides:

references/index.md — Navigation hub
references/workflow-phases.md — Complete Phase 0-5 behavioral instructions
references/evaluation-system.md — 5-dimension scoring, weight presets, prompt templates
references/paper-acquisition.md — 3-level acquisition strategy with API details
references/theoretical-framework.md — Combination types, non-trivial templates, α analysis
references/provenance-system.md — 3-layer tracing, confidence levels, link requirements
references/checkpoint-system.md — JSON schema, resume flow, error recovery
references/dialogue-templates.md — Socratic dialogue examples per phase
references/output-templates.md — Idea card, framework draft, code skeleton templates

Maintenance

Sources: Brainstorming session requirements (see paper_matrix/REQUIREMENTS.md), Semantic Scholar API docs, academic publishing conventions
Last updated: 2026-02-17
Known limits:
- Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
- Theoretical framework auto-generation requires user verification of mathematical correctness
- Paper acquisition depends on open-access availability; paywalled papers need user intervention
- 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions

literature-matrix

When to Use This Skill

Trigger when any of these applies:

User needs to systematically discover research ideas from literature
User wants to evaluate combination potential between multiple papers
User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
User needs to generate code skeletons for combined methods
User mentions: "文献矩阵", "论文组合", "找idea", "组合创新", "paper matrix"
User invokes /literature-matrix

Not For / Boundaries

Will NOT:

Make final research decisions for the user (provides analysis and suggestions only)
Guarantee any idea will be published (evaluates feasibility only)
Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
Generate complete papers (provides framework drafts and code skeletons only)
Fabricate data or analysis results
Replace domain expert judgment on theoretical correctness

Required inputs (ask if missing):

Research domain and keywords
Time range (default: last 2 years)
Paper count (default: 40)

Quick Reference

Workflow (6 Phases)

Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
  ↑                                                                                                                         |
  └─────────────────────────── Checkpoint resume (pause/resume at any phase) ───────────────────────────────────────────────┘

Pattern 1: Initialize Session

1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, timerange, paper count, source mode, weight preset
3. Create directory: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint

Pattern 2: Paper Search (Semantic Scholar API)

GET https://api.semanticscholar.org/graph/v1/paper/search
  ?query={keywords}&year={range}&fieldsOfStudy={domain}
  &fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds

Pattern 3: Paper Screening Criteria

Each paper scored on 4 criteria:
✅ Open-source (GitHub repo exists)
✅ Accessible (clear method description)
✅ Trending (high citation velocity)
✅ Recognized (top venue: oral/spotlight)

Pattern 4: 5-Dimension Evaluation (per combination)

| Dimension          | Default Weight | What it measures                    |
|--------------------|---------------|-------------------------------------|
| Complementarity    | 0.25          | A's method solves B's limitation?   |
| Data Compatibility | 0.20          | Shared data types/formats?          |
| Theory Unifiability| 0.20          | Natural unified framework exists?   |
| Innovation Delta   | 0.20          | 1+1>2 effect?                       |
| Implementation     | 0.15          | Code integration difficulty?        |

Weight presets:
- 理论导向: 0.20, 0.15, 0.30, 0.25, 0.10
- 工程导向: 0.25, 0.25, 0.10, 0.15, 0.25
- 快速发表: 0.30, 0.20, 0.15, 0.20, 0.15
- 自定义: user specifies all 5 weights

Pattern 5: Three-Layer Filtering

Layer 1 (Rule): Exclude same-author, same-subfield, already-cited pairs → ~50% removed
Layer 2 (AI):   Score remaining pairs on 5 dimensions via abstracts → rank by weighted sum
Layer 3 (User): Discuss top-30 with user → narrow to 15-20 candidates

Pattern 6: Paper Acquisition (3 Levels)

L1 Auto:   arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
L2 Assist: Provide DOI + download path, ask user to fetch via library
L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence

Pattern 7: Combination Types

Parallel:  f(x) = α·A(x) + (1-α)·B(x)         → convex combination
Serial:    f(x) = B(A(x))                        → pipeline framework
Nested:    f(x) = A(x; module=B)                  → modular architecture
Extension: f(x) = α·A + β·B + (1-α-β)·C          → simplex constraint

Pattern 8: Non-trivial Justification Templates

Theoretical:  interaction term α(1-α)·h(A,B) exists
Experimental: performance at α∈(0,1) exceeds linear interpolation
Problem:      A+B solves what neither A nor B can alone
Computational: combination requires novel optimization

Pattern 9: Provenance Tagging

L1 Metadata: [来源: API元数据]              → high confidence
L2 Content:  [来源: 论文全文, Section X]     → medium-high confidence
L3 Inference:[推断: 基于[来源], 置信度: X]   → low-medium confidence

Pattern 10: Checkpoint Save/Resume

{"version":"1.0", "current_phase":2, "config":{...},
 "phase_0":{"status":"completed"},
 "phase_2":{"status":"in_progress","evaluated":450,"total":780}}

Rules & Constraints

MUST

Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
Save checkpoint after each phase completion
Use Socratic dialogue: ask guiding questions, don't just present conclusions
Proactively acquire papers when top candidates are identified
Mark abstract-only analyses with ⚠️ low confidence warning

SHOULD

Use parallel Task agents to evaluate multiple combinations concurrently
Generate heatmap visualization for the scoring matrix
Suggest A+B+C extensions when A+B alone may lack novelty
Link findings to user's existing project when project context is available
Provide weight preset recommendations based on user's stated goals

NEVER

Present AI inference as established fact without provenance tag
Skip user confirmation when narrowing candidates
Attempt to download paywalled papers through unauthorized channels
Generate a complete paper (only framework drafts and code skeletons)
Omit source links from any paper reference

Role: Socratic Research Mentor

Act as a proactive, patient, rigorous research mentor throughout the entire workflow.

Behavioral principles:

Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
Rigorous: Every conclusion must have traceable evidence
Patient: Full dialogue at every step, discuss thoroughly with user
Empathetic: Understand student pressure, pragmatically advance research progress
Honest: Clearly mark confidence levels, admit uncertainty

Dialogue patterns by phase:

Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"

See references/dialogue-templates.md for complete dialogue examples.

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

Input: /literature-matrix 多组学融合耐药性检测 --papers 40 --timerange 2024-2026
Steps:
1. Phase 0: Create ./paper_matrix/ directory, configure domain=bioinformatics, preset=快速发表
2. Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
3. Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
4. Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
5. Phase 4: For selected idea (e.g., "graph attention + lipid profiling"), build unified framework: f(x) = α·GAT(x) + (1-α)·LipidNet(x), prove both are special cases
6. Phase 5: Generate base_framework.py, experiment.py with α grid search
Acceptance: Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton

Example 2: ML Top Conference (Seed Expansion)

Input: User provides 8 seed papers from NeurIPS 2025 oral presentations
Steps:
1. Phase 0: Configure source_mode=seed_expansion, domain=ML
2. Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
3. Phase 2: Build matrix with 理论导向 weights, filter and rank
4. Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
5. Phase 4: Build framework where diffusion and GNN are special cases of a "generative message-passing" framework
6. Phase 5: Generate PyTorch code skeleton with α-sweep experiment
Acceptance: Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs

Example 3: Resume from Checkpoint

Input: /literature-matrix --resume
Steps:
1. Read ./paper_matrix/checkpoint.json: Phase 2 in progress, 450/780 evaluated
2. Display progress: "检测到上次分析进度。Phase 2矩阵构建中，已评估450/780个组合。是否继续？"
3. User confirms → continue evaluating remaining 330 combinations
4. Complete Phase 2, proceed to Phase 3
Acceptance: Seamless continuation from checkpoint, no duplicate work

Example 4: Project-Linked Analysis

Input: /literature-matrix 脂质组学机器学习 --link-project
Steps:
1. Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
2. Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
3. Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
4. Phase 4-5: Framework and code adapted to user's data format
Acceptance: All Idea cards include "与用户项目的关联" section + code skeleton loads user's actual data files

Troubleshooting

References

Detailed implementation guides:

references/index.md — Navigation hub
references/workflow-phases.md — Complete Phase 0-5 behavioral instructions
references/evaluation-system.md — 5-dimension scoring, weight presets, prompt templates
references/paper-acquisition.md — 3-level acquisition strategy with API details
references/theoretical-framework.md — Combination types, non-trivial templates, α analysis
references/provenance-system.md — 3-layer tracing, confidence levels, link requirements
references/checkpoint-system.md — JSON schema, resume flow, error recovery
references/dialogue-templates.md — Socratic dialogue examples per phase
references/output-templates.md — Idea card, framework draft, code skeleton templates

Maintenance

Sources: Brainstorming session requirements (see paper_matrix/REQUIREMENTS.md), Semantic Scholar API docs, academic publishing conventions
Last updated: 2026-02-17
Known limits:
- Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
- Theoretical framework auto-generation requires user verification of mathematical correctness
- Paper acquisition depends on open-access availability; paywalled papers need user intervention
- 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions

Adoption

foryourhealth111-pixel/literature-matrix

$ install --global

Security Scan Results

SKILL.md

literature-matrix

When to Use This Skill

Not For / Boundaries

Quick Reference

Workflow (6 Phases)

Pattern 1: Initialize Session

Pattern 2: Paper Search (Semantic Scholar API)

Pattern 3: Paper Screening Criteria

Pattern 4: 5-Dimension Evaluation (per combination)

Pattern 5: Three-Layer Filtering

Pattern 6: Paper Acquisition (3 Levels)

Pattern 7: Combination Types

Pattern 8: Non-trivial Justification Templates

Pattern 9: Provenance Tagging

Pattern 10: Checkpoint Save/Resume

Rules & Constraints

MUST

SHOULD

NEVER

Role: Socratic Research Mentor

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

Example 2: ML Top Conference (Seed Expansion)

Example 3: Resume from Checkpoint

Example 4: Project-Linked Analysis

Troubleshooting

References

Maintenance

Related Skills

foryourhealth111-pixel/zarr-python

foryourhealth111-pixel/yeet

foryourhealth111-pixel/xlsx

foryourhealth111-pixel/xan

foryourhealth111-pixel/literature-matrix

$ install --global

Security Scan Results

SKILL.md

literature-matrix

When to Use This Skill

Not For / Boundaries

Quick Reference

Workflow (6 Phases)

Pattern 1: Initialize Session

Pattern 2: Paper Search (Semantic Scholar API)

Pattern 3: Paper Screening Criteria

Pattern 4: 5-Dimension Evaluation (per combination)

Pattern 5: Three-Layer Filtering

Pattern 6: Paper Acquisition (3 Levels)

Pattern 7: Combination Types

Pattern 8: Non-trivial Justification Templates

Pattern 9: Provenance Tagging

Pattern 10: Checkpoint Save/Resume

Rules & Constraints

MUST

SHOULD

NEVER

Role: Socratic Research Mentor

Examples

Example 1: Bioinformatics Multi-omics (Full Auto Search)

Example 2: ML Top Conference (Seed Expansion)

Example 3: Resume from Checkpoint

Example 4: Project-Linked Analysis

Troubleshooting

References

Maintenance

Related Skills

foryourhealth111-pixel/zarr-python

foryourhealth111-pixel/yeet

foryourhealth111-pixel/xlsx

foryourhealth111-pixel/xan