skills/skillxiv-v0.0.2-claude-opus-4.6/deep-search-hmc/SKILL.md
Monitor search agent reasoning quality via hierarchical uncertainty detection. Fast consistency checks identify anomalies; slow experience-driven feedback provides corrections. Minimal overhead while catching misalignment.
npx skillsauth add ADu2021/skillXiv deep-search-hmc
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Deep search agents can pursue incorrect trajectories for many steps before detection. Standard uncertainty metrics like token entropy are ambiguous—high entropy may reflect legitimate exploration rather than errors.
Most methods for improving search apply global interventions at every step, which is expensive. Fine-grained, targeted monitoring is needed instead.
The system implements two monitoring layers: a fast consistency monitor checking every step, and a slow experience-driven monitor activated on anomalies. The fast monitor compares reasoning uncertainty against evidence uncertainty—high misalignment triggers slow monitoring.
Rather than treating entropy in isolation, the system calibrates reasoning entropy to external evidence entropy, distinguishing legitimate exploration from actual failures.
Measure evidence and reasoning uncertainty independently.
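import numpy as np
import torch.nn.functional as F
from sklearn.metrics import pairwise_distances


def compute_searching_entropy(retrieved_results, embedding_model):
    """Compute semantic diversity of retrieved evidence."""
    # Embed retrieved results into an (n_results, dim) array
    embeddings = np.asarray([embedding_model.encode(r) for r in retrieved_results])
    if len(embeddings) < 2:
        return 0.0  # fewer than two results carry no diversity signal
    # Mean pairwise distance, excluding the zero self-distances on the diagonal
    distances = pairwise_distances(embeddings)
    avg_distance = distances[np.triu_indices(len(embeddings), k=1)].mean()
    # Higher average distance = higher entropy; normalize by the mean
    # embedding norm rather than the norm of one arbitrary result
    scale = np.linalg.norm(embeddings, axis=1).mean()
    return float(avg_distance / (scale + 1e-8))


def compute_reasoning_entropy(model_logits):
    """Compute model prediction uncertainty."""
    # Per-token entropy of the predicted distribution, averaged over
    # positions so the value does not grow with sequence length
    log_probs = F.log_softmax(model_logits, dim=-1)
    token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    return token_entropy.mean().item()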
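To make the two quantities concrete, here is a minimal standard-library sketch with toy inputs. The helper names (`mean_pairwise_distance`, `softmax_entropy`) and all numbers are illustrative assumptions, not part of the skill; the sketch only shows that scattered evidence scores higher than focused evidence, and that a flat token distribution scores higher than a peaked one.

```python
import math
from itertools import combinations


def mean_pairwise_distance(vectors):
    """Mean Euclidean distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy 2-D "embeddings" (hypothetical values, not from a real model).
focused = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1)]
scattered = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

focused_diversity = mean_pairwise_distance(focused)
scattered_diversity = mean_pairwise_distance(scattered)

# Toy logits over a 4-token vocabulary.
confident_entropy = softmax_entropy([8.0, 0.0, 0.0, 0.0])
uncertain_entropy = softmax_entropy([0.0, 0.0, 0.0, 0.0])

print(f"evidence diversity: focused={focused_diversity:.3f}, scattered={scattered_diversity:.3f}")
print(f"reasoning entropy: confident={confident_entropy:.3f}, uncertain={uncertain_entropy:.3f}")
```

Because the two measurements come from different sources (retrieved documents versus model logits), they can disagree, and that disagreement is the signal the monitor looks for.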
Check entropy calibration at each search step.
def fast_consistency_monitor(reasoning_entropy, searching_entropy, threshold=1.0):
    """Detect misalignment between reasoning and evidence entropy."""
    # Expected reasoning entropy given evidence diversity
    expected_re = searching_entropy * 0.8  # Empirical calibration
    # Anomaly detection: reasoning entropy >> expected
    deviation = reasoning_entropy - expected_re
    is_anomaly = deviation > threshold
    return is_anomaly, deviation
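A short worked example may help show why calibrating against evidence matters. The sketch below reuses the 0.8 scale and the 1.0 threshold from the monitor above; the helper name `entropy_deviation` and the entropy values are illustrative assumptions. Both cases have the same reasoning entropy, and only the one with consistent evidence is flagged.

```python
def entropy_deviation(reasoning_entropy, searching_entropy, scale=0.8):
    """Gap between observed reasoning entropy and the level the evidence would justify."""
    return reasoning_entropy - scale * searching_entropy


THRESHOLD = 1.0  # same default as the monitor above

# Same reasoning entropy, different evidence diversity (illustrative numbers).
exploring = entropy_deviation(2.5, searching_entropy=2.5)  # diverse evidence
drifting = entropy_deviation(2.5, searching_entropy=0.5)   # consistent evidence

print(f"exploring: deviation={exploring:.2f}, anomaly={exploring > THRESHOLD}")
print(f"drifting:  deviation={drifting:.2f}, anomaly={drifting > THRESHOLD}")
```

With diverse evidence the deviation is 0.5, so high reasoning entropy is treated as legitimate exploration. With consistent evidence the deviation is 2.1, which exceeds the threshold and would trigger the slow monitor.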
Retrieve and apply past experiences when anomalies are detected.
import numpy as np


class ExperienceMemory:
    def __init__(self, embedding_model):
        self.success_experiences = []
        self.failure_experiences = []
        self.embedding_model = embedding_model

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity between two 1-D vectors."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def retrieve_relevant_experience(self, current_state, k=3):
        """Find similar past experiences."""
        all_experiences = self.success_experiences + self.failure_experiences
        if not all_experiences:
            return []
        current_embedding = self.embedding_model.encode(current_state)
        # Compute similarities to stored experiences
        similarities = [
            self._cosine(current_embedding, self.embedding_model.encode(exp['state']))
            for exp in all_experiences
        ]
        # Return top-k most similar, best match first
        top_indices = np.argsort(similarities)[::-1][:k]
        return [all_experiences[i] for i in top_indices]

    def generate_correction(self, current_state, retrieved_experiences, correction_model):
        """Generate corrective action from past experiences."""
        context = f"Current state: {current_state}\n\n"
        context += "Similar past experiences:\n"
        for exp in retrieved_experiences:
            context += f"- {exp['description']}\n"
        # correction_model is any object exposing generate(prompt) -> str
        return correction_model.generate(context)
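The retrieval step can be illustrated without any model. The following is a minimal sketch, assuming each stored experience carries an embedding computed once at insertion time (a design choice that avoids re-encoding the whole store on every query). The `embedding` and `outcome` fields, the helper names, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, experiences, k=3):
    """Return the k experiences most similar to query_vec, best match first."""
    ranked = sorted(
        experiences,
        key=lambda exp: cosine(query_vec, exp["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical store; embeddings are toy 3-D vectors, not real model output.
experiences = [
    {"description": "Narrowed the query after contradictory sources",
     "embedding": (0.9, 0.1, 0.0), "outcome": "success"},
    {"description": "Kept following an outdated source",
     "embedding": (0.8, 0.3, 0.1), "outcome": "failure"},
    {"description": "Switched to a different search tool",
     "embedding": (0.0, 0.2, 0.9), "outcome": "success"},
]

matches = top_k_similar((1.0, 0.0, 0.0), experiences, k=2)
for exp in matches:
    print(f"[{exp['outcome']}] {exp['description']}")
```

Keeping the outcome label alongside each experience lets the correction prompt mix successful and failed precedents, which matches the separate success and failure lists in the class above.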
Integrate fast and slow monitoring in the search loop.
def search_with_hmc(model, initial_query, experience_memory, correction_model, max_steps=50):
    """Execute search with hierarchical meta-cognitive monitoring."""
    current_state = initial_query
    search_trajectory = []
    for step in range(max_steps):
        # Retrieve evidence (retrieve_top_k is the search backend, defined elsewhere)
        retrieved = retrieve_top_k(current_state, k=5)
        # Generate reasoning (get_logits is assumed to return the logits for this step)
        model_output = model.generate(current_state)
        logits = model.get_logits(current_state)
        # Compute entropies, reusing the memory's embedding model for the evidence
        se = compute_searching_entropy(retrieved, experience_memory.embedding_model)
        re = compute_reasoning_entropy(logits)
        # Fast monitoring
        is_anomaly, deviation = fast_consistency_monitor(re, se)
        if is_anomaly:
            # Slow monitoring activation
            state_representation = f"Query: {current_state}\nDeviation: {deviation}"
            similar_experiences = experience_memory.retrieve_relevant_experience(state_representation)
            correction = experience_memory.generate_correction(
                state_representation, similar_experiences, correction_model
            )
            # Apply correction
            model_output = correction
        search_trajectory.append({
            'step': step,
            'state': current_state,
            'anomaly': is_anomaly,
            'output': model_output
        })
        current_state = model_output
    return search_trajectory
| Parameter | Value | Notes |
|-----------|-------|-------|
| Fast monitor threshold (τ) | 1.0 * std | Calibrate on clean runs |
| Entropy similarity margin | k-sigma | k=1 for 68% confidence |
| Experience retrieval k | 3-5 | Balance coverage and cost |
| Searching entropy normalization | Embedding dim | Scale to embedding space |
| Slow monitor overhead | 3-7% | Additional latency acceptable |
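The table suggests setting the fast-monitor threshold to a multiple of the standard deviation observed on clean runs. One way to do that is sketched below, assuming the per-step deviations (reasoning entropy minus expected entropy) were logged from runs known to be correct. The function name `calibrate_threshold` and the sample values are hypothetical.

```python
import statistics


def calibrate_threshold(clean_deviations, k=1.0):
    """Return k sample standard deviations of the deviations logged on clean runs."""
    if len(clean_deviations) < 2:
        raise ValueError("need at least two clean-run deviations")
    return k * statistics.stdev(clean_deviations)


# Illustrative deviations from clean runs (not real measurements).
clean_deviations = [-0.2, 0.1, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0]

tau = calibrate_threshold(clean_deviations, k=1.0)
print(f"calibrated threshold: {tau:.3f}")
```

The resulting value would be passed as `threshold` to the fast monitor. A larger `k` trades fewer slow-monitor activations for a higher chance of missing a drifting trajectory.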
Deep Search with Hierarchical Meta-Cognitive Monitoring https://arxiv.org/abs/2601.23188