skills/dep-search-learning-dependency-aware-reasoning/SKILL.md
Dependency-aware multi-step reasoning with persistent memory for complex questions requiring information retrieval across multiple sources. Use when: 'answer this multi-hop question', 'research this topic step by step', 'find information that depends on other lookups', 'break down this complex question', 'trace the reasoning chain for this query', 'search and synthesize across multiple sources'.
npx skillsauth add ndpvt-web/arxiv-claude-skills dep-search-learning-dependency-aware-reasoning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to tackle complex multi-hop questions by decomposing them into dependency-aware sub-questions organized as a directed acyclic graph (DAG), retrieving information in topological order, and storing intermediate findings in a persistent memory buffer for reuse. Based on the Dep-Search framework (Liu et al., 2026), this approach replaces implicit "chain of thought" reasoning with explicit structured operations: Decompose, Retrieve, Memory Access, and Conclude. The result is more reliable answers to questions where the answer to one sub-question is a prerequisite for formulating the next.
Dependency-Aware Decomposition. Instead of generating sub-questions sequentially (where each implicitly depends on the previous), Dep-Search decomposes the original question into K sub-questions forming a DAG. Each sub-question explicitly declares which prior sub-questions it depends on. For example, "What is the population of the capital of France?" decomposes into: (1) "What is the capital of France?" and (2) "What is the population of [result of step 1]?" -- where step 2 explicitly depends on step 1. Sub-questions are then resolved in topological order, guaranteeing prerequisites are satisfied before dependent steps execute.
Persistent Memory Buffer. Dep-Search maintains an LRU (least-recently-used) memory buffer (capacity ~20 entries) that accumulates compressed facts across reasoning steps. After each retrieval, a Conclude operation summarizes the retrieved documents into short, reusable fact sentences stored in memory. Later steps can access these facts via embedding similarity search combined with recency weighting, avoiding redundant re-retrieval. This is critical when context windows would otherwise overflow with raw retrieved documents.
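The eviction behaviour described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `LRUFactMemory` and its methods are hypothetical, and lookups use exact keys as a stand-in for the embedding similarity search with recency weighting.

```python
from collections import OrderedDict
from typing import Optional


class LRUFactMemory:
    """Fixed-capacity fact store that evicts the least-recently-used entry."""

    def __init__(self, capacity: int = 20):
        self.capacity = capacity
        self._facts = OrderedDict()  # key -> fact, ordered oldest to newest use

    def store(self, key: str, fact: str) -> None:
        # Re-storing an existing key counts as a use and refreshes its recency.
        if key in self._facts:
            self._facts.move_to_end(key)
        self._facts[key] = fact
        if len(self._facts) > self.capacity:
            self._facts.popitem(last=False)  # drop the least recently used entry

    def access(self, key: str) -> Optional[str]:
        # ASSUMPTION: exact-key lookup replaces embedding similarity search.
        fact = self._facts.get(key)
        if fact is not None:
            self._facts.move_to_end(key)  # reading a fact also refreshes it
        return fact

    def keys(self) -> list:
        return list(self._facts)
```

Reads refresh recency here, so a fact that later steps keep reusing survives longer than one that was stored once and never consulted.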
Reward-Driven Optimization. The framework uses GRPO (Group Relative Policy Optimization) with a composite reward: R = R_answer - 0.1 * R_retrieval - 0.05 * R_decomposition. This penalizes excessive retrieval calls (>10) and over-decomposition (>8 sub-questions), training the model to be both accurate and efficient. For application without RL training, the structured format itself provides the key benefit -- you can implement the decomposition and memory pattern directly.
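As a rough illustration of the composite reward, the sketch below computes R from an answer score and two usage counts. The coefficients (0.1, 0.05) and thresholds (10, 8) come from the description above; the shape of each penalty term (one unit per call or sub-question over the threshold) is an assumption made for this example, and the function name `composite_reward` is hypothetical.

```python
def composite_reward(
    answer_reward: float,
    n_retrievals: int,
    n_sub_questions: int,
    max_retrievals: int = 10,
    max_sub_questions: int = 8,
) -> float:
    """R = R_answer - 0.1 * R_retrieval - 0.05 * R_decomposition."""
    # ASSUMPTION: each penalty counts the units spent beyond its threshold.
    r_retrieval = max(0, n_retrievals - max_retrievals)
    r_decomposition = max(0, n_sub_questions - max_sub_questions)
    return answer_reward - 0.1 * r_retrieval - 0.05 * r_decomposition
```

Under this assumption, a trace that stays within both thresholds keeps its full answer reward, while one with 12 retrievals and 10 sub-questions loses 0.2 and 0.1 respectively.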
1. Analyze the question for multi-hop structure. Determine whether the question requires information that can only be found by first resolving a prerequisite question. Single-hop questions don't need this framework -- just answer directly.
2. Decompose into sub-questions with explicit dependencies. Write out each sub-question and annotate which prior steps it depends on. Use the format: Sub-Q K (depends on: step N, step M): [question text]. Aim for the minimum number of sub-questions needed and avoid over-decomposition.
3. Build the dependency DAG and determine topological order. Identify which sub-questions have no dependencies (roots) and which depend on others. Resolve root questions first, then proceed to dependent questions only after their prerequisites are resolved.
4. For each sub-question in topological order, check memory first. Before performing any new retrieval, check whether previously stored conclusions already answer the current sub-question. If a stored fact resolves the sub-question, use it directly and skip retrieval.
5. Retrieve information for unresolved sub-questions. Issue a targeted search query for the current sub-question, incorporating resolved dependency values. For example, if step 1 resolved "capital of France = Paris", step 2's query becomes "population of Paris" rather than "population of capital of France".
6. Substitute resolved dependencies into query templates. Replace placeholder references in dependent sub-questions with the actual resolved values before querying. This produces more specific, effective searches.
7. Conclude: compress retrieved context into reusable facts. After each successful retrieval, extract 1-3 concise fact sentences from the retrieved documents and store them in the memory buffer. Evict the least-recently-used entries if the buffer exceeds capacity.
8. Repeat steps 4-7 for each sub-question in topological order. Track which sub-questions are resolved and which are still pending. If a retrieval fails, note the failure in memory to avoid repeating the same failed strategy.
9. Synthesize the final answer from memory. Once all sub-questions are resolved, combine the accumulated facts from memory into a coherent answer to the original question. Cite which sub-question each fact came from.
10. Validate dependency satisfaction. Before finalizing, verify that every sub-question that had dependencies actually used the resolved values from those dependencies. Flag any gaps in the reasoning chain.
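The decomposition format and the topological ordering can be sketched together. The example below parses lines written as `Sub-Q K (depends on: ...): ...` and groups them into layers whose members could be resolved in parallel. It uses the standard-library `graphlib.TopologicalSorter`; the helper names `parse_sub_questions` and `resolution_layers` are hypothetical, and the parser assumes one sub-question per line.

```python
import re
from graphlib import TopologicalSorter

# Matches "Sub-Q 3 (depends on: step 1, step 2): question text"
LINE_RE = re.compile(r"Sub-Q (\d+) \(depends on: ([^)]*)\): (.+)")


def parse_sub_questions(text: str) -> dict:
    """Map each sub-question ID to its text and the IDs it depends on."""
    sub_qs = {}
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the format
        step_id, deps, question = match.groups()
        sub_qs[int(step_id)] = {
            "question": question,
            # "none" contains no "step N" token, so it yields an empty list
            "depends_on": [int(n) for n in re.findall(r"step (\d+)", deps)],
        }
    return sub_qs


def resolution_layers(sub_qs: dict) -> list:
    """Group sub-question IDs into layers that respect every dependency."""
    graph = {sid: set(sq["depends_on"]) for sid, sq in sub_qs.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites already resolved
        layers.append(ready)
        sorter.done(*ready)
    return layers
```

Because `prepare()` rejects cyclic graphs, a decomposition whose dependencies loop back on themselves fails before any retrieval is attempted.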
Example 1: Multi-hop factual question
User: "What is the birth year of the person who directed the first Marvel Cinematic Universe film?"
Approach:
Decompose:
Sub-Q 1 (depends on: none): What was the first Marvel Cinematic Universe film?
Sub-Q 2 (depends on: step 1): Who directed [result of step 1]?
Sub-Q 3 (depends on: step 2): What is the birth year of [result of step 2]?
Topological order: 1 -> 2 -> 3
Step 1: Retrieve "first Marvel Cinematic Universe film"
-> Retrieved: Iron Man (2008)
-> Conclude & store: "The first MCU film was Iron Man, released in 2008."
Step 2: Check memory -> no info on director. Retrieve "director of Iron Man 2008"
-> Retrieved: Jon Favreau directed Iron Man.
-> Conclude & store: "Jon Favreau directed Iron Man (2008)."
Step 3: Check memory -> no birth year. Retrieve "Jon Favreau birth year"
-> Retrieved: Jon Favreau was born October 19, 1966.
-> Conclude & store: "Jon Favreau was born in 1966."
Final answer: 1966. Jon Favreau, who directed Iron Man (the first MCU film),
was born in 1966.
Example 2: Research task with shared dependencies
User: "Compare the GDP per capita of the countries where Tesla and Toyota are headquartered."
Approach:
Decompose:
Sub-Q 1 (depends on: none): Where is Tesla headquartered?
Sub-Q 2 (depends on: none): Where is Toyota headquartered?
Sub-Q 3 (depends on: step 1): What is the GDP per capita of [result of step 1]?
Sub-Q 4 (depends on: step 2): What is the GDP per capita of [result of step 2]?
Sub-Q 5 (depends on: step 3, step 4): Compare [result of step 3] and [result of step 4].
Topological order: {1, 2} (parallel) -> {3, 4} (parallel) -> 5
Steps 1-2 (no dependencies, resolve in parallel):
Step 1: Retrieve -> Tesla is headquartered in Austin, Texas, USA.
Conclude & store: "Tesla HQ: Austin, Texas, USA."
Step 2: Retrieve -> Toyota is headquartered in Toyota City, Japan.
Conclude & store: "Toyota HQ: Toyota City, Japan."
Steps 3-4 (depend on 1 and 2 respectively):
Step 3: Retrieve "GDP per capita of USA"
-> Conclude & store: "USA GDP per capita: ~$85,000 (2025)."
Step 4: Retrieve "GDP per capita of Japan"
-> Conclude & store: "Japan GDP per capita: ~$34,000 (2025)."
Step 5: Synthesize from memory -- all 4 facts available.
Final answer: Tesla (USA) has a GDP per capita of ~$85,000 vs Toyota
(Japan) at ~$34,000. The USA's GDP per capita is roughly 2.5x Japan's.
Example 3: Building a search agent with dependency tracking
User: "Build me a function that answers multi-hop questions using this dependency-aware approach."
Approach:
```python
from collections import deque
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class MemoryEntry:
    fact: str
    step_id: int
    query: str


class DepSearchReasoner:
    def __init__(self, search_fn, max_memory=20):
        self.search_fn = search_fn  # callable: query -> list[str]
        # deque(maxlen=...) drops the oldest entry when full, which is a
        # FIFO approximation of the LRU buffer described above.
        self.memory: deque[MemoryEntry] = deque(maxlen=max_memory)
        self.resolved: dict[int, str] = {}

    def decompose(self, question: str) -> list[dict]:
        """Returns list of {id, question_template, depends_on: list[int]}."""
        # LLM call to decompose question into sub-questions with dependencies
        ...

    def topological_order(self, sub_questions: list[dict]) -> list[int]:
        """Return sub-question IDs in dependency-respecting order."""
        graph = {sq["id"]: set(sq["depends_on"]) for sq in sub_questions}
        return list(TopologicalSorter(graph).static_order())

    def substitute_deps(self, template: str, depends_on: list[int]) -> str:
        """Replace [result of step N] placeholders with resolved values."""
        query = template
        for dep_id in depends_on:
            # Placeholder syntax matches the decompositions in Examples 1 and 2.
            query = query.replace(f"[result of step {dep_id}]", self.resolved[dep_id])
        return query

    def check_memory(self, query: str) -> str | None:
        """Search memory for a fact that answers the query."""
        # Embedding similarity search over self.memory entries
        ...

    def conclude(self, step_id: int, query: str, docs: list[str]) -> str:
        """Summarize retrieved docs into a reusable fact sentence."""
        # LLM call to extract key fact from docs
        fact = ...
        self.memory.append(MemoryEntry(fact=fact, step_id=step_id, query=query))
        return fact

    def synthesize(self, question: str, resolved: dict[int, str]) -> str:
        """Combine the resolved facts into an answer to the original question."""
        # LLM call to compose the final answer from the resolved facts
        ...

    def answer(self, question: str) -> str:
        sub_qs = self.decompose(question)
        order = self.topological_order(sub_qs)
        sq_map = {sq["id"]: sq for sq in sub_qs}
        for step_id in order:
            sq = sq_map[step_id]
            query = self.substitute_deps(sq["question_template"], sq["depends_on"])
            cached = self.check_memory(query)  # reuse a stored fact if one fits
            if cached:
                self.resolved[step_id] = cached
                continue
            docs = self.search_fn(query)
            fact = self.conclude(step_id, query, docs)
            self.resolved[step_id] = fact
        return self.synthesize(question, self.resolved)
```
Declare (depends on: step N) for every sub-question. Implicit dependencies are the primary failure mode of naive chain-of-thought decomposition.

Liu, Y., Peng, X., Yan, Z., Shen, Y., & Xu, W. (2026). Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory. arXiv:2601.18771v1. https://arxiv.org/abs/2601.18771v1
Key sections to read: Section 3 (framework architecture with Decompose/Retrieve/Memory/Conclude operations), Section 3.3 (persistent memory buffer with LRU eviction), Section 4 (GRPO training with composite reward R = R_answer - 0.1 * R_retrieval - 0.05 * R_decomposition), and Table 2 (results across 7 QA benchmarks showing ~3-point average improvement over HierSearch).