skills/breaking-static-graph-context-aware/SKILL.md
Build query-adaptive knowledge graph retrieval systems using CatRAG's context-aware traversal. Transforms static KG-based RAG pipelines into dynamic, query-sensitive retrieval that recovers complete multi-hop evidence chains. Use when: 'build a multi-hop RAG pipeline', 'improve knowledge graph retrieval', 'fix semantic drift in graph search', 'implement context-aware graph traversal', 'retrieve complete evidence chains from a KG', 'add query-dependent edge weighting to my graph'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add ndpvt-web/arxiv-claude-skills breaking-static-graph-context-aware`
This skill enables Claude to build retrieval-augmented generation systems that use query-adaptive knowledge graph traversal instead of static graph search. Based on the CatRAG framework, it addresses the "Static Graph Fallacy" — where fixed edge weights cause random walks to drift into high-degree hub nodes, retrieving partial context but missing the complete evidence chain needed for multi-hop reasoning. The technique modifies Personalized PageRank with three mechanisms: symbolic anchoring, dynamic edge weighting, and passage weight enhancement, turning any knowledge graph into a query-sensitive navigation structure.
The Static Graph Fallacy. Systems like HippoRAG construct a knowledge graph during indexing and use Personalized PageRank (PPR) with fixed transition probabilities to retrieve relevant passages. The problem: edge weights are set once and never change. A relation edge between "Christopher Nolan" and "film" gets the same weight regardless of whether the query is about his filmography or his childhood. High-degree nodes accumulate disproportionate random walk probability, causing "semantic drift" — the walk reaches a hub and scatters instead of following the evidence chain.
CatRAG's Three Interventions. (1) Symbolic Anchoring extracts named entities from the query via NER and injects them as weak reset seeds (weight epsilon=0.2, scaled by inverse passage frequency) into the PPR teleportation vector. This creates gravitational pull toward query-specific entities without overwhelming the semantic signal from triple-based retrieval. (2) Query-Aware Dynamic Edge Weighting uses a coarse-to-fine strategy: first prune edges on dense nodes (>15 outgoing edges) by embedding similarity, then score surviving edges with an LLM classifier into tiers {Irrelevant: 0, Weak: 0.2-0.3, High: 2.0-3.0, Direct: 5.0}, multiplied against the static weight. This makes the transition matrix query-specific. (3) Key-Fact Passage Weight Enhancement boosts edges between seed entities and passages containing verified seed triples by factor beta=2.5, requiring zero LLM calls — a pure algorithmic shortcut that anchors the walk to likely evidence passages.
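The coarse stage of the second intervention can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `coarse_prune`, `cosine`, and the `edges` layout are hypothetical names, and it assumes that "demoting" an edge means multiplying its static weight by the Weak multiplier rather than overwriting it.

```python
import math

K_EDGE = 15   # density threshold and number of edges kept (from the text)
WEAK = 0.2    # multiplier for demoted edges (assumed to scale the static weight)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_prune(query_emb, edges, k_edge=K_EDGE, weak=WEAK):
    """Prune the outgoing edges of one seed node.

    edges: dict mapping neighbor -> (static_weight, fact_embedding).
    Returns (weights, survivors): the adjusted weight per neighbor and the
    neighbors that should go on to the LLM classifier.
    """
    if len(edges) <= k_edge:
        # Sparse node: nothing to prune, every edge survives
        return {n: w for n, (w, _) in edges.items()}, list(edges)
    ranked = sorted(edges, key=lambda n: cosine(query_emb, edges[n][1]), reverse=True)
    survivors = ranked[:k_edge]
    keep = set(survivors)
    weights = {n: (w if n in keep else w * weak) for n, (w, _) in edges.items()}
    return weights, survivors
```

Only the survivors need an LLM call, which is what keeps the fine-grained stage affordable on hub nodes.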
Why it matters for implementation. The three mechanisms are modular. You can apply symbolic anchoring alone for a quick improvement, add passage weight enhancement for zero-cost gains, and layer on dynamic edge weighting when LLM inference budget allows. The PPR equation stays the same — only the transition matrix and teleportation vector change per query.
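To make the point about the unchanged PPR equation concrete, here is a small pure-Python power iteration in which the edge weights and the teleportation vector are the only per-query inputs. It is a sketch under assumptions: the function names are hypothetical, `damping` is read as the probability of following an edge (with 0.5 the two common conventions coincide), and dangling nodes return their mass to the reset distribution.

```python
def normalize_rows(weights):
    """Turn dict src -> {dst: weight} into row-stochastic transition probabilities."""
    probs = {}
    for src, nbrs in weights.items():
        total = sum(nbrs.values())
        probs[src] = {d: w / total for d, w in nbrs.items() if w > 0} if total > 0 else {}
    return probs


def personalized_pagerank(weights, teleport, damping=0.5, tol=1e-8, max_iter=100):
    """Power iteration for PPR over a weighted directed graph."""
    nodes = set(weights) | {d for nbrs in weights.values() for d in nbrs} | set(teleport)
    z = sum(teleport.values())
    if z <= 0:
        raise ValueError("teleport vector must have positive mass")
    reset = {n: teleport.get(n, 0.0) / z for n in nodes}
    probs = normalize_rows(weights)
    score = dict(reset)
    for _ in range(max_iter):
        nxt = {n: (1.0 - damping) * reset[n] for n in nodes}
        for src in nodes:
            out = probs.get(src)
            if not out:
                # Dangling node: send its mass back to the reset distribution
                for n in nodes:
                    nxt[n] += damping * score[src] * reset[n]
                continue
            for dst, p in out.items():
                nxt[dst] += damping * score[src] * p
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score
```

Running it twice on the same graph, once with static weights and once with a query-specific multiplier on a single edge, shifts probability mass toward the passages behind that edge while the iteration itself stays identical.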
1. Construct the knowledge graph from your document corpus. Use OpenIE (or an LLM extraction prompt) to extract (subject, relation, object) triples from each passage. Create two node types: entity nodes (V_E) and passage nodes (V_P). Create three edge types: relation edges between entities from the same triple, synonym edges between entities with embedding cosine similarity > 0.8 (weight = 2.0 * similarity), and context edges from passage nodes to every entity they contain.
2. Build the static transition matrix. Normalize outgoing edge weights for each node to form transition probabilities. Store the graph in an adjacency structure that supports per-query weight overrides (e.g., a sparse matrix where weights can be masked or multiplied).
3. Process the incoming query with dual-channel seed identification. Channel A: embed the query and retrieve the top-N_seed (default 5) entity nodes by cosine similarity between the query embedding and triple embeddings. Channel B: run NER on the query to extract explicit entity mentions, then fuzzy-match them to graph entity nodes.
4. Apply Symbolic Anchoring. Construct the PPR teleportation vector. Assign Channel A seeds their retrieval-score-based probabilities. For Channel B (NER) entities, inject each with weight epsilon=0.2, scaled by the inverse of the entity's passage count (entities appearing in fewer passages get relatively higher anchor weight). Normalize the combined teleportation vector to sum to 1.
5. Apply Coarse-Grained Edge Pruning. For each of the top-N_seed entity nodes, check whether the node has more than K_edge (default 15) outgoing relation edges. If so, compute cosine similarity between the query embedding and each neighbor's fact embedding. Keep the top-K_edge edges; demote the rest to "Weak" weight (0.2).
6. Apply Fine-Grained Dynamic Edge Weighting. For surviving edges from step 5, call an LLM with the query, source entity, target entity, and target entity's context (summarized if the entity has more than tau triples, concatenated raw otherwise). The LLM classifies the relationship relevance as {Irrelevant, Weak, High, Direct}. Multiply the static edge weight by the tier multiplier: Irrelevant=0, Weak=0.25, High=2.5, Direct=5.0.
7. Apply Key-Fact Passage Weight Enhancement. For every context edge connecting a seed entity to a passage node, check whether the passage contains a verified seed triple (a triple that contributed to the seed identification in step 3). If yes, multiply the edge weight by (1 + beta), where beta=2.5. This is a pure lookup; no LLM calls are needed.
8. Run modified PPR. Execute Personalized PageRank on the modified graph with damping factor d=0.5 and the anchored teleportation vector from step 4. Iterate until convergence (typically 10-20 iterations). Rank passage nodes by their final PPR score.
9. Retrieve and assemble context. Select the top-K passage nodes (default K=5). Extract their source text. Pass them to the downstream LLM as retrieved context for answering the query.
10. Evaluate with chain-complete metrics. Beyond standard Recall@K, measure Full Chain Retrieval (FCR), the fraction of queries where all required evidence passages appear in the top-K, and Joint Success Rate (JSR), FCR multiplied by answer correctness. These metrics expose the partial-retrieval failures that standard recall hides.
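The chain-complete metrics from the evaluation step are simple to compute once gold evidence sets are available. The sketch below uses hypothetical function names and assumes JSR is computed per query, as the product of the full-chain indicator and answer correctness, then averaged; check the paper's exact definition before reporting numbers.

```python
def recall_at_k(retrieved, gold, k=5):
    """Mean fraction of gold passages found in the top-k, per query."""
    per_query = [len(set(g) & set(r[:k])) / len(g) for r, g in zip(retrieved, gold)]
    return sum(per_query) / len(per_query)


def full_chain_retrieval(retrieved, gold, k=5):
    """Fraction of queries whose top-k contains every gold passage."""
    hits = [set(g) <= set(r[:k]) for r, g in zip(retrieved, gold)]
    return sum(hits) / len(hits)


def joint_success_rate(retrieved, gold, answer_correct, k=5):
    """Fraction of queries with a complete chain AND a correct answer (assumed reading)."""
    joint = [
        set(g) <= set(r[:k]) and bool(ok)
        for r, g, ok in zip(retrieved, gold, answer_correct)
    ]
    return sum(joint) / len(joint)


# Two queries, each needing two evidence passages (made-up ids)
retrieved = [["p1", "p2", "p9"], ["p3", "p8", "p7"]]
gold = [{"p1", "p2"}, {"p3", "p4"}]
print(recall_at_k(retrieved, gold, k=3))           # 0.75
print(full_chain_retrieval(retrieved, gold, k=3))  # 0.5
```

The toy data shows the gap the text describes: recall looks healthy at 0.75 even though only one of the two queries has its full chain retrieved.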
Example 1: Multi-hop question answering over a company knowledge base
User: "Build a retrieval pipeline that can answer questions like 'What award did the CEO of the company that acquired Tableau receive in 2020?' across our 10K corporate filings."
Approach:
Output: Retrieved passages cover all three hops — acquisition, CEO identity, and award — enabling a correct grounded answer.
Example 2: Fixing an existing HippoRAG pipeline with hub drift
User: "Our KG retrieval keeps returning Wikipedia-style overview passages instead of specific evidence. How do I fix it?"
Approach:
Output:
```python
EPSILON = 0.2  # anchor weight for NER entities
BETA = 2.5     # boost factor for key-fact passages

# Before: teleportation vector based only on triple retrieval
teleport = retrieval_scores / retrieval_scores.sum()

# After: inject NER anchors with symbolic anchoring
ner_entities = extract_entities(query)
for ent in ner_entities:
    node_id = fuzzy_match(ent, graph.entity_nodes)
    if node_id is not None:
        # Entities that appear in fewer passages get a relatively larger anchor;
        # max(..., 1) guards against a zero passage count
        inv_freq = 1.0 / max(graph.passage_count(node_id), 1)
        teleport[node_id] += EPSILON * inv_freq
teleport = teleport / teleport.sum()  # renormalize to sum to 1

# Passage weight enhancement (no LLM calls)
for seed_triple in seed_triples:
    for passage_id in graph.passages_containing(seed_triple):
        edge = graph.get_edge(seed_triple.entity, passage_id)
        edge.weight *= (1 + BETA)
```
Example 3: Implementing the LLM edge classifier
User: "How do I implement the dynamic edge weighting LLM call?"
Approach:
Output:
```python
# Tier multipliers applied to the static edge weight
TIER_WEIGHTS = {"Irrelevant": 0.0, "Weak": 0.25, "High": 2.5, "Direct": 5.0}

EDGE_CLASSIFY_PROMPT = """Given the query: "{query}"
Evaluate the relevance of traversing from entity "{source}" to entity "{target}".
Context about target entity: {context}
Classify this edge as exactly one of: Irrelevant, Weak, High, Direct.
- Direct: target entity is explicitly needed to answer the query
- High: target entity provides important supporting context
- Weak: target entity is tangentially related
- Irrelevant: target entity has no bearing on the query
Classification:"""


def score_edges(query, seed_node, neighbors, graph, llm):
    # TAU (the triple-count threshold for summarizing) must be defined by the
    # caller; no value is specified here.
    # Build one context string per neighbor: summarize long fact lists, join short ones
    context_map = {}
    for nbr in neighbors:
        facts = graph.get_triples(nbr)
        if len(facts) > TAU:
            context_map[nbr] = llm.summarize(facts)
        else:
            context_map[nbr] = " | ".join(str(f) for f in facts)

    # Classify each edge; unrecognized labels fall back to the Weak multiplier
    for nbr in neighbors:
        prompt = EDGE_CLASSIFY_PROMPT.format(
            query=query, source=seed_node,
            target=nbr, context=context_map[nbr]
        )
        tier = llm.classify(prompt, temperature=0.0)
        multiplier = TIER_WEIGHTS.get(tier.strip(), 0.25)
        # Mutates weights in place, so apply this to a per-query copy of the graph
        graph.edges[seed_node][nbr]["weight"] *= multiplier
```
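One practical wrinkle with the lookup above is that an exact-match `TIER_WEIGHTS.get(...)` silently falls back to Weak whenever the model adds punctuation or changes case. A possible hardening, offered as an assumption rather than part of CatRAG, is to normalize the raw output first. `parse_tier` is a hypothetical helper that takes the first tier word it finds.

```python
import re

TIER_WEIGHTS = {"Irrelevant": 0.0, "Weak": 0.25, "High": 2.5, "Direct": 5.0}
_TIER_RE = re.compile(r"\b(irrelevant|weak|high|direct)\b", re.IGNORECASE)


def parse_tier(raw, default="Weak"):
    """Map free-form classifier output to a tier name, or to `default` if none is found."""
    match = _TIER_RE.search(raw or "")
    return match.group(1).capitalize() if match else default


print(parse_tier(" Direct.\n"))   # Direct
print(parse_tier("no idea"))      # Weak
```

Because it keys on the first tier word, an answer such as "not direct, but weak" would be read as Direct, so keeping the prompt's "exactly one of" constraint still matters.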
Paper: Breaking the Static Graph: Context-Aware Traversal for Robust Retrieval-Augmented Generation — Lau et al., 2026. Focus on Section 3 (method) for the three mechanisms, Section 4.3 for the Full Chain Retrieval metric that reveals partial-retrieval failures, and Table 2 for the JSR improvements that standard recall metrics hide. Code: github.com/kwunhang/CatRAG.