skills/skillxiv-v0.0.2-claude-opus-4.6/dover-auto-debugging/SKILL.md
Diagnose and fix multi-agent system failures through targeted interventions (message edits, plan changes) rather than static log analysis. DoVer recovers 18-28% of failed trials with a 30-60% hypothesis validation rate, making it essential for autonomous multi-agent reliability.
npx skillsauth add ADu2021/skillXiv dover-auto-debugging

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
DoVer augments traditional log-based debugging with active verification through targeted system interventions. Rather than accepting single-point attributions, the framework systematically tests modifications to agent communications and planning to determine which changes resolve failures, providing practical mechanisms for improving multi-agent reliability.
Hypothesis generation and verification through targeted interventions:
```python
# Intervention-Driven Debugging Framework
#
# Helper methods called on `self` but not defined here (get_agent_pairs,
# generate_alternative_actions, run_with_intervention, extract_messages,
# improve_message, run_with_message_edit, identify_problem_stage,
# generate_alternative_plans, run_with_plan_intervention, measure_progress)
# are framework-specific and must be supplied by the integrator.
class DoVerDebugger:
    def __init__(self, agent_framework):
        self.framework = agent_framework  # e.g., AG2, Anthropic Framework
        self.failed_trials = []

    def analyze_failure(self, trial):
        """
        Analyze failed trial to generate debugging hypotheses.
        Goes beyond log analysis to test interventions.
        """
        hypothesis_candidates = []

        # Extract relevant context
        history = trial.execution_trace
        agents = trial.agents
        final_state = trial.final_state
        task = trial.task

        # Hypothesis 1: Agent A made incorrect decision
        for agent in agents:
            hypothesis = {
                'type': 'agent_decision',
                'agent': agent,
                'hypothesis': f"Agent {agent.name} made suboptimal decision"
            }
            hypothesis_candidates.append(hypothesis)

        # Hypothesis 2: Communication failure
        for agent_pair in self.get_agent_pairs(agents):
            agent_a, agent_b = agent_pair
            hypothesis = {
                'type': 'communication',
                'agents': agent_pair,
                'hypothesis': (
                    f"Communication between {agent_a.name} "
                    f"and {agent_b.name} failed"
                )
            }
            hypothesis_candidates.append(hypothesis)

        # Hypothesis 3: Plan was suboptimal
        hypothesis = {
            'type': 'plan',
            'hypothesis': 'Task decomposition or planning was incorrect'
        }
        hypothesis_candidates.append(hypothesis)

        return hypothesis_candidates

    def test_hypothesis_via_intervention(self, trial, hypothesis):
        """
        Verify hypothesis by intervening in system and observing outcome.
        Multiple modification types enable comprehensive testing.
        """
        if hypothesis['type'] == 'agent_decision':
            # Intervention: Suggest alternative action to agent
            return self.test_agent_intervention(trial, hypothesis)
        elif hypothesis['type'] == 'communication':
            # Intervention: Edit messages between agents
            return self.test_message_intervention(trial, hypothesis)
        elif hypothesis['type'] == 'plan':
            # Intervention: Alter task decomposition
            return self.test_plan_intervention(trial, hypothesis)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown hypothesis type: {hypothesis['type']!r}")

    def test_agent_intervention(self, trial, hypothesis):
        """
        Re-run trial with suggested action changes for problem agent.
        """
        agent = hypothesis['agent']
        original_trial = trial

        # Generate alternative actions agent could take
        alternatives = self.generate_alternative_actions(
            agent,
            original_trial.execution_trace
        )

        results = []
        for alternative_action in alternatives:
            # Run modified trial with alternative action
            modified_trial = self.run_with_intervention(
                original_trial,
                agent,
                alternative_action
            )

            # Check if intervention resolved failure
            success = modified_trial.completed_successfully
            results.append({
                'intervention': alternative_action,
                'outcome': 'success' if success else 'failure',
                'trial': modified_trial
            })

        return results

    def test_message_intervention(self, trial, hypothesis):
        """
        Test whether editing agent messages resolves failure.
        """
        agent_pair = hypothesis['agents']
        agent_a, agent_b = agent_pair

        # Identify messages between agents
        messages = self.extract_messages(
            trial.execution_trace,
            agent_a,
            agent_b
        )

        results = []
        for message in messages:
            # Generate improved message versions
            improved_messages = self.improve_message(message)

            for improved_msg in improved_messages:
                # Re-run trial with modified message
                modified_trial = self.run_with_message_edit(
                    trial,
                    message,
                    improved_msg
                )

                success = modified_trial.completed_successfully
                results.append({
                    'original_message': message,
                    'improved_message': improved_msg,
                    'outcome': 'success' if success else 'failure'
                })

        return results

    def test_plan_intervention(self, trial, hypothesis):
        """
        Test if altering task decomposition/plan resolves failure.
        """
        original_plan = trial.plan
        problem_stage = self.identify_problem_stage(trial)

        # Generate alternative decompositions
        alternative_plans = self.generate_alternative_plans(
            trial.task,
            original_plan,
            problem_stage
        )

        results = []
        for alt_plan in alternative_plans:
            # Re-run trial with modified plan
            modified_trial = self.run_with_plan_intervention(
                trial,
                alt_plan
            )

            success = modified_trial.completed_successfully
            progress = self.measure_progress(modified_trial)
            results.append({
                'alternative_plan': alt_plan,
                'outcome': 'success' if success else 'failure',
                'progress': progress
            })

        return results

    def validate_hypothesis(self, hypothesis, intervention_results):
        """
        Determine if hypothesis is validated/refuted based on intervention results.
        Focus on outcomes rather than single causation.
        """
        successful_interventions = [
            r for r in intervention_results
            if r['outcome'] == 'success'
        ]
        if successful_interventions:
            return 'validated', successful_interventions

        # Check for partial progress (only some result types record it)
        progress_results = [
            r for r in intervention_results
            if r.get('progress', 0) > 0
        ]
        if progress_results:
            return 'partial', progress_results
        return 'refuted', []

    def apply_successful_intervention(self, trial_id, successful_intervention):
        """
        Apply validated intervention to recover from failure.
        """
        self.framework.apply_intervention(
            trial_id,
            successful_intervention
        )

        # Re-run task with modification
        recovered_trial = self.framework.retry_with_intervention(
            trial_id,
            successful_intervention
        )
        return recovered_trial
```
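
The methods above can be chained into a single debugging pass: generate hypotheses, test each one through intervention, classify the results, and stop at the first validated fix. The following is a minimal sketch under stated assumptions, not DoVer's actual control loop. `debug_trial`, `classify`, `StubDebugger`, and the canned results are hypothetical names and data introduced only for illustration; `classify` mirrors the logic of `validate_hypothesis`.

```python
def classify(results):
    """Mirror of validate_hypothesis: success, then partial progress, then refuted."""
    successes = [r for r in results if r["outcome"] == "success"]
    if successes:
        return "validated", successes
    partial = [r for r in results if r.get("progress", 0) > 0]
    if partial:
        return "partial", partial
    return "refuted", []


def debug_trial(debugger, trial):
    """Run one pass over all hypotheses.

    Returns (report, fix): report lists (hypothesis type, verdict) pairs in
    the order tested; fix is the first successful intervention, or None.
    """
    report = []
    for hypothesis in debugger.analyze_failure(trial):
        results = debugger.test_hypothesis_via_intervention(trial, hypothesis)
        verdict, evidence = classify(results)
        report.append((hypothesis["type"], verdict))
        if verdict == "validated":
            return report, evidence[0]
    return report, None


class StubDebugger:
    """Hypothetical stand-in that returns canned intervention results."""

    CANNED = {
        "agent_decision": [{"outcome": "failure"}],
        "communication": [{"outcome": "failure", "progress": 0.4}],
        "plan": [{"outcome": "success", "progress": 1.0}],
    }

    def analyze_failure(self, trial):
        return [{"type": t} for t in self.CANNED]

    def test_hypothesis_via_intervention(self, trial, hypothesis):
        return self.CANNED[hypothesis["type"]]


report, fix = debug_trial(StubDebugger(), trial=None)
print(report)
print(fix)
```

Stopping at the first validated hypothesis keeps the number of re-runs low, which matters because every intervention replays part of a trial. A variant that tests every hypothesis would give a fuller picture at a higher cost.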
Evaluation is outcome-oriented: it measures the failure recovery rate and progress toward task completion.
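
One way to compute these two metrics from a batch of re-run trials is sketched below. The record layout (`recovered`, `progress`) and the function name `summarize_outcomes` are assumptions made for illustration. The source does not specify how progress is scored, so the sketch treats it as a number between 0 and 1.

```python
from statistics import mean


def summarize_outcomes(records):
    """Return the failure recovery rate and mean progress for intervened trials.

    Each record is a dict with (hypothetical) keys:
      recovered: True if the intervention turned the failed trial into a success
      progress:  fraction of the task completed after the intervention, in [0, 1]
    """
    if not records:
        return {"recovery_rate": 0.0, "mean_progress": 0.0}
    recovered = sum(1 for r in records if r["recovered"])
    return {
        "recovery_rate": recovered / len(records),
        "mean_progress": mean(r["progress"] for r in records),
    }


# Illustrative data only, not results reported for DoVer.
records = [
    {"recovered": True, "progress": 1.0},
    {"recovered": False, "progress": 0.5},
    {"recovered": False, "progress": 0.0},
    {"recovered": False, "progress": 0.5},
]
summary = summarize_outcomes(records)
print(summary)
```

Reporting progress alongside the recovery rate gives credit to interventions that move a trial closer to completion without fully fixing it, which corresponds to the `partial` verdict in `validate_hypothesis`.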