skills/diverge-diversity-enhanced-rag-open-ended/SKILL.md
Diversity-enhanced RAG for open-ended queries with multiple valid answers. Uses reflection-guided generation and memory-augmented iterative refinement to produce diverse, high-quality responses instead of collapsing to a single dominant answer. Triggers: 'give me diverse perspectives on', 'explore different viewpoints', 'brainstorm multiple approaches to', 'what are the different ways to', 'open-ended search with diversity', 'DIVERGE-style RAG search'
npx skillsauth add ndpvt-web/arxiv-claude-skills diverge-diversity-enhanced-rag-open-ended
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to implement the DIVERGE agentic RAG framework, which generates multiple diverse yet high-quality responses to open-ended queries instead of collapsing to a single dominant answer. Standard RAG pipelines assume one correct answer per query; DIVERGE breaks this assumption through reflection-guided viewpoint generation, diversity-aware retrieval re-ranking, and a memory buffer that tracks explored perspectives across iterations. Use this when building search systems, recommendation engines, brainstorming tools, or any pipeline where query answers are legitimately plural.
The core insight: Standard RAG systems underutilize retrieved context diversity. Even when retrieval returns documents covering multiple perspectives, the LLM generation step collapses them into a single dominant response. Simply increasing retrieval diversity does not fix this — the bottleneck is in generation, not retrieval.
DIVERGE solves this with three interlocking mechanisms.
First, reflection-guided viewpoint generation: after producing an initial response, the system extracts the underlying viewpoints (e.g., "budget-focused", "health-focused", "convenience-focused"), then explicitly reflects on which perspectives remain unexplored and generates a new viewpoint to target.
Second, viewpoint-aware diversity retrieval: each iteration retrieves documents using a modified Maximal Marginal Relevance (MMR) score that penalizes similarity to previously retrieved contexts across all prior iterations, not just the current one. The scoring formula balances relevance to the viewpoint-conditioned query against cross-iteration and within-iteration diversity.
Third, memory-augmented iterative refinement: a lightweight buffer stores (query, documents, viewpoint, answer) tuples from each iteration, giving the system full awareness of what has already been explored.
The diversity-quality trade-off is measured using semantic diversity (pairwise cosine distance between response embeddings), viewpoint diversity (unique atomic claims across responses), and quality (factual accuracy, evidence support, consistency, relevance). The unified score UQ^D is the harmonic mean of normalized quality and diversity, providing a single metric for the trade-off.
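As a rough illustration of the metrics described above, the sketch below computes semantic diversity as the mean pairwise cosine distance between response embeddings and combines it with a quality score through a harmonic mean. The function names are hypothetical, and the assumption that both inputs to `unified_score` are already normalized to [0, 1] is ours; the text above does not say how the normalization is done.

```python
import numpy as np


def semantic_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance between response embeddings (one per row)."""
    n = embeddings.shape[0]
    if n < 2:
        return 0.0  # a single response has no pairs to compare
    # Normalize rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Strictly upper triangle: each unordered pair is counted once.
    upper = np.triu_indices(n, k=1)
    return float(np.mean(1.0 - sims[upper]))


def unified_score(quality: float, diversity: float) -> float:
    """Harmonic mean of quality and diversity (both assumed to lie in [0, 1])."""
    if quality + diversity == 0:
        return 0.0
    return 2.0 * quality * diversity / (quality + diversity)
```

Because the harmonic mean is dominated by the smaller input, a set of responses that is highly diverse but low in quality (or the reverse) scores poorly, which is what makes it usable as a single trade-off number.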
1. Classify the query as open-ended or closed. Check whether the query has a single factual answer ("What year was Python created?") or multiple valid answers ("What's the best way to learn Python?"). Only apply DIVERGE to open-ended queries; closed queries should use standard RAG.
2. Perform initial retrieval and generation. Run a standard retrieval step against your corpus (vector search, BM25, or hybrid). Generate the first response using the retrieved documents. This becomes iteration t=1.
3. Extract viewpoints from the initial response. Prompt the LLM to decompose the response into its underlying viewpoints: the distinct perspectives, assumptions, or angles it covers. Output these as a structured list (e.g., ["cost-effectiveness", "ease of use", "community support"]).
4. Initialize the memory buffer. Store the first iteration as a tuple: {query, retrieved_docs, viewpoints, answer}. This buffer persists across all iterations and is the system's "explored territory" map.
5. Reflect to identify an unexplored viewpoint. Prompt the LLM with the memory buffer contents and ask: "Given the query and the viewpoints already explored, identify a new meaningful perspective that has not been covered. Avoid restating existing viewpoints at a higher abstraction level." The reflection must produce a concrete, novel viewpoint.
6. Construct a viewpoint-conditioned query. Rewrite the original query to emphasize the new viewpoint. For example, if the original query is "How can I improve my sleep?" and the new viewpoint is "environmental factors", the conditioned query becomes "How can I improve my sleep by changing my environment?"
7. Retrieve with diversity-aware re-ranking. Retrieve candidate documents using the conditioned query, then re-rank using the modified MMR formula: score(d) = alpha * Rel(d, q_t) - beta * max_similarity_to_memory - (1-alpha) * max_similarity_within_batch. This ensures new documents are relevant to the target viewpoint while being distinct from all previously retrieved contexts.
8. Generate a viewpoint-conditioned response. Prompt the LLM to answer the original query from the specific viewpoint, grounded in the newly retrieved documents. Include a refinement instruction: "Ensure the response addresses the original question while focusing on the [viewpoint] perspective."
9. Update the memory buffer and repeat. Append the new iteration's tuple to memory. Return to step 5 and repeat for K iterations total (typically K=5-10 depending on the desired diversity breadth and compute budget).
10. Aggregate and deduplicate the final response set. Collect all K responses. Optionally compute pairwise semantic similarity and merge near-duplicates. Present the responses organized by viewpoint, or synthesize them into a single comprehensive answer that explicitly labels each perspective.
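The workflow above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `llm` and `retrieve` are hypothetical callables you would supply (any chat-completion wrapper, and any retriever that applies the diversity-aware re-ranking internally), and the `Iteration` record and the prompt wording are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Iteration:
    """One memory-buffer entry: what was asked, retrieved, targeted, and answered."""
    query: str
    docs: List[str]
    viewpoint: str
    answer: str


def diverge_loop(
    query: str,
    llm: Callable[[str], str],                        # hypothetical: prompt -> completion
    retrieve: Callable[[str, List[str]], List[str]],  # hypothetical: (query, seen docs) -> docs
    k: int = 5,
) -> List[Iteration]:
    """Run k iterations, each targeting a viewpoint not yet in the memory buffer."""
    memory: List[Iteration] = []

    # Initial pass: plain retrieval and generation, then seed the memory buffer.
    docs = retrieve(query, [])
    answer = llm(f"Answer using the documents.\nQuestion: {query}\nDocuments: {docs}")
    viewpoint = llm(f"List the viewpoints covered by this answer: {answer}")
    memory.append(Iteration(query, docs, viewpoint, answer))

    for _ in range(k - 1):
        explored = "; ".join(m.viewpoint for m in memory)
        seen_docs = [d for m in memory for d in m.docs]

        # Reflect on the buffer to pick an unexplored viewpoint.
        viewpoint = llm(
            f"Query: {query}\nViewpoints already explored: {explored}\n"
            "Identify one new, concrete perspective that has not been covered."
        )
        # Rewrite the query to emphasize that viewpoint.
        conditioned = llm(f"Rewrite '{query}' to emphasize: {viewpoint}")
        # Retrieve, passing prior documents so the retriever can penalize them.
        docs = retrieve(conditioned, seen_docs)
        # Answer the original question from the new viewpoint.
        answer = llm(
            f"Question: {query}\nViewpoint: {viewpoint}\nDocuments: {docs}\n"
            f"Ensure the response addresses the original question while "
            f"focusing on the {viewpoint} perspective."
        )
        # Record the iteration (storing the conditioned query is an assumption).
        memory.append(Iteration(conditioned, docs, viewpoint, answer))

    return memory
```

Passing the callables in keeps the loop independent of any particular model or vector store; the returned list doubles as the final response set to aggregate and deduplicate.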
Example 1: Building a diverse travel recommendation RAG
User: "I'm building a travel Q&A chatbot. When someone asks 'Where should I travel in Southeast Asia?', my RAG keeps returning the same Bangkok/Bali recommendations. Help me implement DIVERGE to get diverse suggestions."
Approach:
["popular tourist destinations", "food-focused travel", "beach vacations"]
Output (5 iterations):
Viewpoint 1 — Popular highlights: Bangkok, Bali, Singapore
Viewpoint 2 — Cultural immersion: Luang Prabang, Yogyakarta, Hoi An
Viewpoint 3 — Adventure & nature: Borneo rainforest, Komodo, Ha Giang Loop
Viewpoint 4 — Budget backpacking: Cambodia, Myanmar, rural Philippines
Viewpoint 5 — Wellness & retreat: Ubud, Chiang Mai, Koh Phangan
Example 2: Diverse code architecture suggestions
User: "I want to use DIVERGE to give developers multiple architectural options when they ask 'How should I structure my backend API?'"
Approach:
["REST paradigm", "Node.js ecosystem", "MVC pattern"]
Output:
Option A — Classic REST + MVC: Express.js with controllers/services/models separation
Option B — Event-driven CQRS: Commands and queries separated, message bus for async ops
Option C — Serverless functions: AWS Lambda per endpoint, API Gateway routing
Option D — Type-safe RPC: tRPC or gRPC with generated clients, no REST overhead
Each option includes: when to use it, trade-offs, and example folder structure.
Example 3: Implementing the diversity-aware retrieval re-ranker
User: "Show me how to implement the DIVERGE MMR re-ranking step in Python."
```python
import numpy as np
from typing import List, Dict


def diverge_mmr_rerank(
    candidates: List[Dict],               # [{id, embedding, text}, ...]
    query_embedding: np.ndarray,          # embedding of viewpoint-conditioned query
    memory_embeddings: List[np.ndarray],  # embeddings from all prior iterations
    alpha: float = 0.6,                   # relevance vs within-batch diversity
    beta: float = 0.3,                    # cross-iteration diversity penalty
    top_k: int = 5
) -> List[Dict]:
    """Re-rank candidates using DIVERGE's diversity-aware MMR."""
    selected = []
    remaining = list(range(len(candidates)))

    for _ in range(min(top_k, len(candidates))):
        best_score, best_idx = -float('inf'), -1

        for i in remaining:
            emb = candidates[i]['embedding']

            # Relevance to viewpoint-conditioned query
            relevance = np.dot(emb, query_embedding) / (
                np.linalg.norm(emb) * np.linalg.norm(query_embedding)
            )

            # Cross-iteration diversity: penalize similarity to memory
            cross_div = 0.0
            if memory_embeddings:
                sims = [np.dot(emb, m) / (np.linalg.norm(emb) * np.linalg.norm(m))
                        for m in memory_embeddings]
                cross_div = max(sims)

            # Within-iteration diversity: penalize similarity to already selected
            within_div = 0.0
            if selected:
                sel_embs = [candidates[s]['embedding'] for s in selected]
                within_sims = [np.dot(emb, s) / (np.linalg.norm(emb) * np.linalg.norm(s))
                               for s in sel_embs]
                within_div = max(within_sims)

            score = alpha * relevance - beta * cross_div - (1 - alpha) * within_div

            if score > best_score:
                best_score, best_idx = score, i

        selected.append(best_idx)
        remaining.remove(best_idx)

    return [candidates[i] for i in selected]
```
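To see what the `beta` penalty does, the following self-contained sketch applies the same scoring rule to three toy 2-D embeddings. It assumes unit-length embeddings (so cosine similarity reduces to a dot product) and uses a hypothetical helper, `mmr_scores`, that scores only the first pick, where nothing has been selected yet and the within-batch term is zero.

```python
import numpy as np


def mmr_scores(cands: np.ndarray, query: np.ndarray, memory: np.ndarray,
               alpha: float = 0.6, beta: float = 0.3) -> np.ndarray:
    """First-pick scores: alpha * relevance - beta * max similarity to memory.

    Assumes `query` and every row of `cands` and `memory` have unit length.
    """
    relevance = cands @ query
    if memory.size:
        cross = (cands @ memory.T).max(axis=1)
    else:
        cross = np.zeros(len(cands))  # empty memory: no cross-iteration penalty
    return alpha * relevance - beta * cross


query = np.array([1.0, 0.0])
cands = np.array([
    [1.0, 0.0],   # most relevant, but the closest of the three to the memory document
    [0.8, 0.6],   # slightly less relevant, orthogonal to the memory document
    [0.0, 1.0],   # irrelevant to the query
])
memory = np.array([[0.6, -0.8]])  # toy document retrieved in an earlier iteration

no_memory = mmr_scores(cands, query, np.empty((0, 2)))
with_memory = mmr_scores(cands, query, memory)
```

With an empty memory the first candidate scores highest; once the memory document is taken into account, the second candidate overtakes it because it is just as usable for the query while sharing nothing with what was already retrieved.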
DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking — Hu, Tandon, Arora (2026). Focus on Section 3 (the DIVERGE framework architecture), Section 4.2 (diversity-aware MMR formula), and Section 5 (the UQ^D unified diversity-quality metric).