skills/a2rag-adaptive-agentic-graph/SKILL.md
Build adaptive, cost-aware Graph-RAG pipelines that route queries through escalating retrieval stages (local -> bridge -> global) with triple-check verification and provenance map-back. Use when: 'build a graph RAG pipeline', 'implement adaptive retrieval for knowledge graphs', 'cost-aware multi-hop question answering', 'add evidence verification to RAG', 'handle mixed-difficulty queries efficiently', 'graph retrieval with source text grounding'.
npx skillsauth add ndpvt-web/arxiv-claude-skills a2rag-adaptive-agentic-graph
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to design and implement Graph Retrieval-Augmented Generation systems that adaptively route queries through escalating retrieval stages based on difficulty, verify evidence with a triple-check mechanism, and ground answers back to source text via provenance mapping. The core insight from the A2RAG paper is that ~55% of queries can be answered with cheap local graph lookups, ~27% need bridge discovery, and only ~15% require expensive global diffusion -- so a progressive escalation pipeline cuts token consumption and latency by ~50% while improving recall by 10 points over flat retrieval baselines.
A2RAG decouples adaptive control from agentic retrieval. The adaptive controller acts as the outer loop: it gates whether retrieval is needed at all (via summarized-KB similarity scoring), orchestrates answer generation, and enforces a triple-check verification before accepting any answer. The three checks are: (1) evidence relevance -- do retrieved passages actually address the query, (2) answer grounding -- are claims derivable from the evidence, and (3) query resolution -- does the answer adequately address the question. Only when all three pass does the system return a result. On failure, the controller identifies which check failed and rewrites the query accordingly (sharpening entities for relevance failures, requesting stricter grounding, or adding missing constraints for resolution failures), up to a bounded retry limit.
The agentic retriever inside this loop maintains a stateful evidence accumulator and escalates through three stages. Stage 1 (Local) extracts entity mentions from the query, aligns them to knowledge graph nodes via hybrid lexical-semantic matching, and collects 1-hop neighbor triples -- this resolves the majority of queries. Stage 2 (Bridge) discovers bridge entities that connect two or more query entity seeds within K hops, extracting the shortest connecting paths -- this handles multi-hop reasoning. Stage 3 (Global Fallback) runs degree-normalized Personalized PageRank from seed nodes, selects top-L nodes by PPR score, and critically performs provenance map-back: mapping each node back to its source text chunks to recover fine-grained qualifiers lost during graph construction. Escalation is monotonic (Local -> Bridge -> Global), with sufficiency checks gating each transition.
Build the knowledge graph index. Parse the corpus into documents, extract entities and relations (via NER + relation extraction or LLM-based extraction), and construct a graph G = (V, E). Store an offline provenance map pi: V -> 2^D linking each node back to its source text chunks. Precompute document summaries for gating.
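The indexing step can be sketched as follows. This is a minimal illustration, assuming an upstream extractor has already produced `(chunk_id, head, relation, tail)` tuples; the function name `build_index` and the tuple layout are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_index(extracted):
    """Build adjacency and the provenance map pi: V -> 2^D.

    extracted: iterable of (chunk_id, head, relation, tail) tuples
    (an assumed output format for the extraction step).
    """
    adjacency = defaultdict(set)   # node -> {(relation, neighbor)}
    provenance = defaultdict(set)  # node -> {chunk_id}
    for chunk_id, head, relation, tail in extracted:
        adjacency[head].add((relation, tail))
        adjacency.setdefault(tail, set())  # make sure the tail exists as a node
        # Record the source chunk for both endpoints at index time.
        provenance[head].add(chunk_id)
        provenance[tail].add(chunk_id)
    return dict(adjacency), dict(provenance)


triples = [
    ("c1", "Widget", "made_by", "Acme"),
    ("c2", "Acme", "based_in", "Berlin"),
]
graph, pi = build_index(triples)
print(sorted(pi["Acme"]))  # chunks that mention Acme
```

Recording provenance while triples are extracted avoids a separate pass over the corpus later, at the cost of requiring the extractor to carry chunk IDs through.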
Implement the gating check. For each incoming query, compute dense embedding similarity against precomputed document summaries. If max(similarity_scores) < tau_g (configurable threshold, typically 0.3-0.5), return "Abstain" immediately -- the corpus likely cannot answer this query.
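A minimal sketch of the gating check, assuming query and summary embeddings are plain lists of floats produced by some embedding model (not shown). The names `cosine` and `gate` are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gate(query_vec, summary_vecs, tau_g=0.4):
    """Return (should_retrieve, best_score); abstain when best_score < tau_g."""
    best = max((cosine(query_vec, s) for s in summary_vecs), default=0.0)
    return best >= tau_g, best


# Toy 2-d embeddings, for illustration only.
ok, score = gate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print("retrieve" if ok else "abstain", score)
```

Returning the score alongside the decision makes it easier to tune `tau_g` from logs.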
Implement Stage 1: Local Evidence Collection. Extract entity mentions from the query using NER or phrase extraction. Align them to graph nodes using a hybrid score (edit distance + embedding cosine similarity). Collect 1-hop neighbors of aligned nodes, optionally filtered by relation seed constraints. Package triples as evidence.
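The hybrid alignment score might look like the sketch below. It combines a normalized Levenshtein similarity with embedding cosine similarity; the equal weighting (`weight=0.5`) and the acceptance threshold (`threshold=0.6`) are assumptions chosen for illustration, not values from the paper.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lexical_sim(a, b):
    """Map edit distance to a similarity in [0, 1], case-insensitive."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(mention, mention_vec, nodes, weight=0.5, threshold=0.6):
    """nodes: dict of node label -> embedding. Returns the best label or None."""
    best_label, best_score = None, threshold
    for label, vec in nodes.items():
        score = weight * lexical_sim(mention, label) + (1 - weight) * cosine(mention_vec, vec)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


nodes = {"Acme Corp.": [1.0, 0.0], "Berlin": [0.0, 1.0]}
print(align("acme corp", [1.0, 0.0], nodes))
```

An alias table, as suggested in the failure-mode table, could be checked before falling back to this scoring.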
Implement Stage 2: Bridge Discovery. For multi-hop queries where local evidence is insufficient, construct an augmented graph with inverse edges. Find bridge candidates -- nodes reachable from 2+ query entity seeds within K hops (K >= 2). Extract shortest paths connecting bridges to seeds. Cap path count and hop length to control evidence size.
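Bridge discovery can be sketched with one bounded breadth-first search per seed. The sketch below assumes edges arrive as `(head, relation, tail)` tuples and that seeds themselves are not counted as bridges; `max_bridges` is a hypothetical cap standing in for the path-count limit mentioned above.

```python
from collections import defaultdict, deque


def add_inverse_edges(edges):
    """Augment the graph so every relation can be traversed in both directions."""
    adj = defaultdict(set)
    for head, _relation, tail in edges:
        adj[head].add(tail)
        adj[tail].add(head)  # inverse edge
    return adj


def bfs_within(adj, seed, k):
    """Return {node: parent} for all nodes within k hops of seed."""
    parent, depth = {seed: None}, {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # hop budget exhausted on this branch
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent


def path_to_seed(parent, node):
    """Follow BFS parents back to the seed; BFS guarantees a shortest path."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def discover_bridges(edges, seeds, k=2, max_bridges=10):
    adj = add_inverse_edges(edges)
    trees = {s: bfs_within(adj, s, k) for s in seeds}
    bridges = {}
    for node in sorted(adj):
        if node in seeds:
            continue
        reaching = [s for s in seeds if node in trees[s]]
        if len(reaching) >= 2:
            bridges[node] = {s: path_to_seed(trees[s], node) for s in reaching}
            if len(bridges) >= max_bridges:
                break
    return bridges


edges = [("Widget", "made_by", "Acme"), ("Gadget", "made_by", "Acme"),
         ("Acme", "based_in", "Berlin")]
print(discover_bridges(edges, ["Widget", "Gadget"], k=1))
```

Because each search stops at depth `k`, the cost is bounded by the size of the k-hop neighborhoods rather than the whole graph, which is what keeps this stage cheaper than global diffusion.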
Implement Stage 3: Global Fallback with PPR and Provenance Map-Back. When bridge discovery fails, run degree-normalized Personalized PageRank from seed nodes (lower-degree seeds get higher personalization weight to avoid hub bias). Select top-L nodes by PPR score. Map each selected node back to source text chunks via the provenance map pi(v). This recovers fine-grained qualifiers (dates, numbers, exceptions) that graph triples lost.
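A minimal power-iteration sketch of degree-normalized PPR follows. It assumes an undirected adjacency dict in which every neighbor also appears as a key, uses a fixed iteration count instead of a convergence test, and sends the mass of isolated nodes back to the personalization vector; these are simplifications for illustration, not the paper's exact procedure.

```python
def personalization(adj, seeds):
    """p0(u) = deg(u)^-1 / Z over the seeds; zero elsewhere."""
    inv = {s: 1.0 / max(len(adj[s]), 1) for s in seeds if s in adj}
    z = sum(inv.values())
    if z == 0:
        raise ValueError("no seed is present in the graph")
    return {u: inv.get(u, 0.0) / z for u in adj}


def degree_normalized_ppr(adj, seeds, alpha=0.15, iters=50, top_l=20):
    """Return the top-L (node, score) pairs; alpha is the teleport probability."""
    p0 = personalization(adj, seeds)
    scores = dict(p0)
    for _ in range(iters):
        nxt = {u: alpha * p0[u] for u in adj}  # teleport back to the seeds
        for u, neighbors in adj.items():
            walk_mass = (1.0 - alpha) * scores[u]
            if neighbors:
                share = walk_mass / len(neighbors)
                for v in neighbors:
                    nxt[v] += share
            else:
                # Isolated node: return its mass to the personalization vector.
                for v, weight in p0.items():
                    nxt[v] += walk_mass * weight
        scores = nxt
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_l]


adj = {"A": {"Hub"}, "B": {"Hub"}, "C": {"Hub"}, "Hub": {"A", "B", "C"}}
print(personalization(adj, ["A", "Hub"]))   # the low-degree seed A gets more weight
print(degree_normalized_ppr(adj, ["A", "Hub"], top_l=2))
```

The selected nodes would then be passed to the provenance map to recover source chunks.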
Implement the triple-check verification. After generating an answer from retrieved evidence, run three validators: (a) V_rel(q, E) -- are the passages relevant to the query? (b) V_grd(a, E) -- is the answer grounded in the evidence? (c) V_ans(q, a) -- does the answer resolve the question? Use NLI models or prompted LLM classifiers. Accept only when all three pass.
Implement failure-aware query rewriting. When verification fails, identify the first violated check. Rewrite the query accordingly: sharpen entity/relation expressions for relevance failures, request stricter evidence for grounding failures, add missing constraints for resolution failures. Retry up to I_max iterations (2-3).
Wire the outer adaptive control loop. Connect gating -> agentic retrieval (with escalation) -> answer generation -> triple-check -> conditional rewrite/retry. Track token consumption per stage for cost monitoring.
Add cost instrumentation. Log which retrieval stage resolved each query. Monitor the stage distribution (target: ~55% local, ~27% bridge, ~15% global). Alert if global fallback usage exceeds 20%, indicating potential graph quality issues.
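One way to track the stage distribution is a small in-memory monitor like the sketch below. The class name `StageMonitor` and its methods are hypothetical; a production system would more likely emit these counters to an existing metrics backend.

```python
from collections import Counter


class StageMonitor:
    """Counts which stage resolved each query and how many tokens it used."""

    def __init__(self, global_alert=0.20):
        self.counts = Counter()
        self.tokens = Counter()
        self.global_alert = global_alert  # alert when the global share exceeds this

    def record(self, stage, tokens_used):
        self.counts[stage] += 1
        self.tokens[stage] += tokens_used

    def distribution(self):
        total = sum(self.counts.values())
        return {s: c / total for s, c in self.counts.items()} if total else {}

    def alerts(self):
        share = self.distribution().get("global", 0.0)
        if share > self.global_alert:
            return [f"global fallback at {share:.0%} exceeds {self.global_alert:.0%}"]
        return []


monitor = StageMonitor()
for stage, used in [("local", 300), ("local", 280), ("bridge", 900), ("global", 2500)]:
    monitor.record(stage, used)
print(monitor.distribution())
print(monitor.alerts())
```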
Test with mixed-difficulty query sets. Evaluate on both single-hop and multi-hop questions. Verify that easy queries terminate at Stage 1, multi-hop queries use Stage 2, and only adversarial/incomplete-graph cases hit Stage 3.
Example 1: Building an A2RAG pipeline for a documentation QA system
User: "I have a knowledge graph built from our product documentation. I want to build a QA system that handles both simple factual questions and complex multi-hop questions efficiently."
Approach:
class A2RAGPipeline:
    """Outer adaptive control loop around the agentic retriever.

    Helper methods (_gate, _generate, _triple_check, _rewrite, and the
    per-stage collectors) are not shown here and must be supplied.
    """

    def __init__(self, kg, corpus, summaries, tau_g=0.4, i_max=2, alpha=0.15):
        self.kg = kg                          # Knowledge graph with adjacency
        self.corpus = corpus                  # Source text chunks indexed by doc ID
        self.summaries = summaries            # Precomputed doc summary embeddings
        self.tau_g = tau_g                    # Gating threshold
        self.i_max = i_max                    # Max rewrite retries
        self.alpha = alpha                    # PPR teleport probability
        self.provenance = kg.provenance_map   # node -> source chunk IDs

    def answer(self, query: str) -> dict:
        # Step 1: Gating
        if self._gate(query) < self.tau_g:
            return {"answer": None, "status": "abstain"}

        q = query
        for i in range(self.i_max + 1):
            # Step 2: Agentic retrieval with escalation
            evidence = self._agentic_retrieve(q)
            # Step 3: Generate answer
            answer = self._generate(q, evidence)
            # Step 4: Triple-check verification
            checks = self._triple_check(q, answer, evidence)
            if all(checks.values()):
                return {"answer": answer, "evidence": evidence,
                        "stage": evidence.source_stage, "retries": i}
            # Step 5: Failure-aware rewrite (skipped once the retry budget is spent)
            if i < self.i_max:
                failed = next(k for k, v in checks.items() if not v)
                q = self._rewrite(q, answer, evidence, failure_type=failed)

        return {"answer": answer, "status": "unverified", "retries": self.i_max}

    # "Evidence" is quoted because the accumulator class is defined elsewhere.
    def _agentic_retrieve(self, query: str) -> "Evidence":
        seeds = self._extract_and_align_entities(query)
        rel_seeds = self._extract_relation_seeds(query)

        # Stage 1: Local (1-hop neighbors of aligned entities)
        evidence = self._local_collect(seeds, rel_seeds)
        if self._evidence_sufficient(query, evidence):
            evidence.source_stage = "local"
            return evidence

        # Stage 2: Bridge discovery (K-hop connecting paths)
        bridge_evidence = self._bridge_discover(seeds, k_hops=2)
        evidence.merge(bridge_evidence)
        if self._evidence_sufficient(query, evidence):
            evidence.source_stage = "bridge"
            return evidence

        # Stage 3: Global PPR + provenance map-back
        ppr_nodes = self._degree_normalized_ppr(seeds, top_l=20)
        source_chunks = self._provenance_mapback(ppr_nodes)
        evidence.merge(source_chunks)
        evidence.source_stage = "global"
        return evidence
Example 2: Adding provenance map-back to an existing Graph-RAG system
User: "Our graph RAG often gives wrong answers because the KG triples don't capture specific dates and numbers from the source documents. How do I fix this?"
Approach:
class ProvenanceMapBack:
    def __init__(self, kg, corpus):
        # Build offline provenance map: node_id -> set of chunk_ids.
        # Case-insensitive substring matching is a simple heuristic.
        self.prov_map = {}
        for node_id, node in kg.nodes.items():
            label = node.label.lower()
            self.prov_map[node_id] = {
                chunk_id for chunk_id, chunk in corpus.items()
                if label in chunk.text.lower()
            }

    def mapback(self, selected_nodes: list[str], corpus) -> list[str]:
        """Given graph-selected nodes, retrieve original source text."""
        chunk_ids = set()
        for node_id in selected_nodes:
            chunk_ids.update(self.prov_map.get(node_id, set()))
        # Sort so the context order is deterministic across runs
        return [corpus[cid].text for cid in sorted(chunk_ids)]

# Usage: After PPR or bridge discovery selects graph nodes,
# ground the answer in source text instead of just triples.
# kg, corpus, seeds, q, ppr_select, and llm come from the surrounding pipeline.
provenance = ProvenanceMapBack(kg, corpus)
graph_nodes = ppr_select(seeds, top_l=20)
source_texts = provenance.mapback(graph_nodes, corpus)
answer = llm.generate(query=q, context=source_texts)  # Grounded in full text
Example 3: Implementing triple-check verification for answer quality
User: "I want to add verification to my RAG pipeline so it doesn't return hallucinated answers."
Approach:
class TripleCheck:
    def __init__(self, nli_model, llm):
        self.nli = nli_model
        self.llm = llm

    def verify(self, query: str, answer: str, evidence: list[str]) -> dict:
        context = "\n".join(evidence)
        return {
            "relevance": self._check_relevance(query, context),
            "grounding": self._check_grounding(answer, context),
            "resolution": self._check_resolution(query, answer),
        }

    def _check_relevance(self, query, context) -> bool:
        """Do the retrieved passages actually address the query?"""
        return self.nli.entails(premise=context,
                                hypothesis=f"This text is relevant to: {query}")

    def _check_grounding(self, answer, context) -> bool:
        """Is every claim in the answer supported by the evidence?"""
        return self.nli.entails(premise=context, hypothesis=answer)

    def _check_resolution(self, query, answer) -> bool:
        """Does the answer adequately resolve the question?"""
        prompt = f"Does this answer fully resolve the question?\nQ: {query}\nA: {answer}\nRespond YES or NO."
        # Match on the leading token so a reply like "NO. YES would require..." is not accepted.
        return self.llm.generate(prompt).strip().upper().startswith("YES")

    def rewrite_on_failure(self, query, answer, evidence, failure_type) -> str:
        strategies = {
            "relevance": f"Rephrase to be more specific about entities and relations: {query}",
            "grounding": f"Find stronger evidence for: {query}. Previous answer '{answer}' was not grounded.",
            "resolution": f"Add missing constraints. Original: {query}. Incomplete answer: {answer}",
        }
        return self.llm.generate(f"Rewrite this query. Strategy: {strategies[failure_type]}")
Build the provenance map (node -> source chunks) at index time, not query time. This is an offline operation and critical for the map-back stage to be fast.
Set tau_g conservatively (0.3-0.4). False negatives (abstaining on answerable queries) are worse than letting a few irrelevant queries through to the retriever.
Weight PPR seeds by inverse degree: p0(u) = deg(u)^{-1} / Z, where Z is the normalizing constant.

| Failure Mode | Symptom | Resolution |
|---|---|---|
| Gating too aggressive | Many answerable queries return "Abstain" | Lower tau_g threshold; check summary embedding quality |
| Entity alignment misses | Seeds don't match KG nodes, Stage 1 returns empty evidence | Improve hybrid matching; add alias tables; lower alignment threshold |
| Bridge discovery timeout | Stage 2 hangs on dense subgraphs | Cap path count and hop budget K; set explicit traversal limits |
| PPR convergence issues | Stage 3 returns low-quality nodes | Increase iteration count for fixed-point computation; verify graph connectivity |
| Triple-check too strict | All answers fail verification, retry budget exhausted | Relax individual check thresholds; consider soft scoring instead of binary |
| Provenance map gaps | Map-back returns empty for some nodes | Audit extraction pipeline; ensure all entities have source chunk links |
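
The "soft scoring" resolution in the table could take many forms. The sketch below shows one possible interpretation, assuming each validator returns a confidence in [0, 1] rather than a boolean; the per-check floors and the `min_mean` threshold are made-up illustrative values, and `soft_triple_check` is a hypothetical name.

```python
def soft_triple_check(scores, floors=None, min_mean=0.6):
    """scores: dict with 'relevance', 'grounding', 'resolution' values in [0, 1]."""
    floors = floors or {"relevance": 0.4, "grounding": 0.5, "resolution": 0.4}
    # A check fails only when it drops below its own floor.
    failed = [name for name, floor in floors.items() if scores[name] < floor]
    mean = sum(scores[name] for name in floors) / len(floors)
    accepted = not failed and mean >= min_mean
    # Report the weakest check so a rewrite step can target it.
    weakest = min(floors, key=lambda name: scores[name])
    return {"accepted": accepted, "failed": failed, "mean": mean, "weakest": weakest}


result = soft_triple_check({"relevance": 0.9, "grounding": 0.55, "resolution": 0.7})
print(result["accepted"], result["weakest"])
```

Keeping a per-check floor preserves the idea that grounding failures should never be averaged away by strong relevance scores.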
Paper: A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (Liu et al., 2026). Look for: the three-stage escalation policy (Section 3.2), the triple-check verification formulation (Section 3.1), degree-normalized PPR with provenance map-back (Section 3.2.3), and the cost-vs-recall ablation tables (Section 4.3) showing that ~55% of queries resolve at Stage 1.