skills/43-wentorai-research-plugins/skills/research/methodology/research-town-guide/SKILL.md
Simulate human research communities with multi-agent AI collaboration
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research research-town-guideInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Simulate human research communities using multi-agent AI systems. Research Town creates virtual research environments where AI agents take on the roles of researchers, reviewers, editors, and collaborators to generate, critique, refine, and peer-review research ideas through structured multi-agent interaction.
Research Town is an open-source framework for simulating the social dynamics of academic research communities. Rather than using a single AI model for idea generation or paper writing, Research Town instantiates multiple specialized agents -- each with a defined expertise profile, publication history, and behavioral model -- that interact through the same social structures as human researchers: lab meetings, peer review, conference discussions, and collaborative writing.
The key insight behind Research Town is that research quality emerges from the social process of science, not just individual brilliance. Peer review, adversarial critique, iterative refinement through rebuttal, and cross-disciplinary fertilization are all processes that can be simulated with multi-agent systems. By modeling these interactions, Research Town produces research outputs that have been stress-tested through simulated peer review before a human researcher ever sees them.
This approach is particularly valuable for three research tasks: (1) generating novel research ideas by simulating brainstorming sessions between agents with diverse expertise, (2) stress-testing research proposals by subjecting them to simulated peer review, and (3) identifying gaps in the literature by having agents independently survey and then synthesize findings from different subfields.
| Agent Role | Expertise | Behavior | |------------|-----------|----------| | Principal Investigator | Broad domain knowledge, research vision | Sets research direction, evaluates proposals | | Domain Expert | Deep knowledge in a specific area | Provides technical depth, identifies related work | | Methodologist | Statistical and experimental design expertise | Critiques methods, suggests improvements | | Reviewer | Journal review experience, quality standards | Evaluates novelty, significance, rigor | | Devil's Advocate | Critical thinking, identifying weaknesses | Challenges assumptions, finds counterexamples | | Synthesizer | Cross-disciplinary knowledge | Connects ideas across fields, identifies patterns |
# Research Town interaction structure
class ResearchTownSession:
def __init__(self, agents, topic):
self.agents = agents
self.topic = topic
self.rounds = []
def run_brainstorming(self, num_rounds=3):
"""Structured brainstorming with multiple agents."""
ideas = []
for round_num in range(num_rounds):
round_ideas = []
for agent in self.agents:
# Each agent generates ideas given prior context
idea = agent.generate_idea(
topic=self.topic,
prior_ideas=ideas,
round=round_num
)
round_ideas.append(idea)
# Cross-pollination: agents react to each other's ideas
for agent in self.agents:
reactions = agent.react_to_ideas(round_ideas)
ideas.extend(reactions)
self.rounds.append(round_ideas)
return ideas
def run_peer_review(self, paper_draft):
"""Simulate peer review with multiple reviewers."""
reviews = []
for reviewer in self.agents:
if reviewer.role == "reviewer":
review = reviewer.review_paper(
paper_draft,
criteria=["novelty", "significance",
"methodology", "clarity", "reproducibility"]
)
reviews.append(review)
# Meta-review: aggregate and identify consensus
meta_review = self.aggregate_reviews(reviews)
return meta_review
# agents.yaml - Agent configuration
agents:
- name: "Prof. ML Expert"
role: principal_investigator
expertise: ["machine learning", "deep learning", "optimization"]
style: "rigorous, quantitative, focused on scalability"
publication_venues: ["NeurIPS", "ICML", "JMLR"]
h_index: 45
- name: "Dr. Biology Specialist"
role: domain_expert
expertise: ["structural biology", "protein engineering", "bioinformatics"]
style: "experimental, emphasizes biological validity"
publication_venues: ["Nature", "Cell", "PNAS"]
h_index: 32
- name: "Dr. Statistics"
role: methodologist
expertise: ["causal inference", "experimental design", "Bayesian methods"]
style: "rigorous, demands proper statistical justification"
publication_venues: ["JASA", "Biometrika", "Statistical Science"]
h_index: 28
- name: "Reviewer Alpha"
role: reviewer
expertise: ["interdisciplinary research", "computational biology"]
style: "constructive but demanding, focuses on reproducibility"
review_experience: 200
- name: "Skeptic"
role: devils_advocate
expertise: ["philosophy of science", "replication crisis", "research methods"]
style: "challenges assumptions, demands strong evidence"
from research_town import ResearchTown, Agent
# Initialize agents from profiles
town = ResearchTown.from_config("agents.yaml")
# Define research topic
topic = {
"area": "AI for Drug Discovery",
"question": "How can we improve the efficiency of virtual screening "
"for novel antibiotics using foundation models?",
"constraints": [
"Must work with limited labeled data (<1000 compounds)",
"Must generalize across bacterial species",
"Should produce interpretable predictions"
]
}
# Run brainstorming (3 rounds of idea generation and critique)
session = town.create_session(topic)
ideas = session.run_brainstorming(num_rounds=3)
# Rank ideas by multi-agent consensus
ranked = session.rank_ideas(ideas, criteria=[
"novelty", # How different from existing approaches
"feasibility", # Can it be implemented with current resources
"impact", # Potential significance if successful
"rigor" # Methodological soundness
])
# Output top ideas with supporting analysis
for i, idea in enumerate(ranked[:5]):
print(f"\n=== Idea {i+1} (Score: {idea.score:.2f}) ===")
print(f"Title: {idea.title}")
print(f"Summary: {idea.summary}")
print(f"Proposed by: {idea.proposer.name}")
print(f"Endorsed by: {[a.name for a in idea.endorsers]}")
print(f"Critiques: {idea.critiques}")
The simulated peer review follows a structured protocol modeled on top-venue review processes:
review_criteria = {
"novelty": {
"description": "Does this paper present genuinely new ideas?",
"scale": "1-10",
"weight": 0.25
},
"significance": {
"description": "How important is the contribution to the field?",
"scale": "1-10",
"weight": 0.20
},
"soundness": {
"description": "Are the methods and analysis technically correct?",
"scale": "1-10",
"weight": 0.25
},
"clarity": {
"description": "Is the paper well-written and easy to follow?",
"scale": "1-10",
"weight": 0.15
},
"reproducibility": {
"description": "Could another researcher replicate the results?",
"scale": "1-10",
"weight": 0.15
}
}
## Review Summary
**Overall Score**: 6.5/10
**Recommendation**: Minor Revision
### Strengths
1. Novel application of contrastive learning to molecular fingerprints
2. Comprehensive ablation study across 4 benchmark datasets
3. Clear explanation of the biological motivation
### Weaknesses
1. Limited comparison with recent graph neural network baselines (2024+)
2. Statistical significance not reported for main results
3. Interpretability analysis is superficial
### Questions for Authors
1. How does performance scale with dataset size? The smallest dataset has 5000 compounds.
2. What is the computational cost compared to traditional docking methods?
### Minor Issues
- Table 3 formatting is inconsistent
- Reference [24] appears to be a preprint; has it been published?
Before investing months in a research direction, run your idea through a simulated review panel to identify weaknesses early.
Deploy agents with different domain expertise to independently survey a topic, then synthesize their findings to identify under-explored intersections.
Submit draft sections to simulated reviewers for constructive criticism on clarity, argumentation, and missing references.
Iterate on grant proposals by having agents role-play as review panel members with different priorities (novelty, feasibility, broader impact).
development
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
development
Guide users through writing a systematic literature review (SLR) following the PRISMA 2020 framework. Use this skill whenever the user mentions 'systematic review', 'systematic literature review', 'SLR', 'PRISMA', 'PRISMA 2020', 'PRISMA flow diagram', 'PRISMA checklist', or asks for help writing, structuring, or auditing a literature review that follows reporting guidelines. Also trigger when the user asks about inclusion/exclusion criteria for a review, search strategies for databases like Scopus/WoS/PubMed, study selection processes, risk of bias assessment, or narrative synthesis for a review paper. This skill covers the full PRISMA 2020 checklist (27 items), produces a Word document manuscript in strict journal article format, generates an annotated PRISMA flow diagram, and enforces APA 7th Edition referencing throughout. It does NOT cover meta-analysis or statistical pooling. By Chuah Kee Man.
testing
Performs placebo-in-time sensitivity analysis with hierarchical null model and optional Bayesian assurance. Use when checking model robustness, verifying lack of pre-intervention effects, or estimating study power.
data-ai
Fit, summarize, plot, and interpret a chosen CausalPy experiment. Use after the causal method has been selected, including when configuring PyMC/sklearn models and scale-aware custom priors.