skills/skillxiv-v0.0.2-claude-opus-4.6/attention-basin-reranking/SKILL.md
Demonstrates position bias where LLMs neglect middle content while over-attending to endpoints. Proposes Attention-Driven Reranking (AttnRank) to align content with model's intrinsic attention preferences.
npx skillsauth add ADu2021/skillXiv attention-basin-rerankingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Language models systematically misallocate attention based on input position, neglecting information in the middle while over-attending to beginning and end—a phenomenon termed "attention basin." Rather than trying to fix this bias, AttnRank exploits it by strategically repositioning critical information to positions where models naturally allocate high attention. This simple, plug-and-play approach requires no model modifications or training.
Measure how much attention each position receives in a model.
import numpy as np
from typing import List, Tuple, Dict
class AttentionBasinAnalyzer:
"""
Characterize position-specific attention bias in LLMs.
"""
def __init__(self, model):
self.model = model
self.position_weights = None
def estimate_position_attention(
self,
calibration_examples: List[str],
num_positions: int = 16
) -> np.ndarray:
"""
Estimate attention weight per position.
Args:
calibration_examples: Sample texts to analyze
num_positions: Number of position buckets
Returns:
Position attention weights [num_positions]
"""
position_scores = np.zeros(num_positions)
for example in calibration_examples:
# Split into chunks
chunks = self._split_into_chunks(example, num_positions)
# For each chunk, measure how much it affects model behavior
for pos_idx, chunk in enumerate(chunks):
# Create prompt with chunk at different positions
score = self._measure_chunk_importance(chunks, pos_idx)
position_scores[pos_idx] += score
# Average across examples
position_scores /= len(calibration_examples)
# Smooth to reduce noise
position_scores = self._smooth_weights(position_scores)
self.position_weights = position_scores / position_scores.sum()
return self.position_weights
def _split_into_chunks(self, text: str, num_chunks: int) -> List[str]:
"""Split text into equal-sized chunks."""
words = text.split()
chunk_size = len(words) // num_chunks
chunks = []
for i in range(num_chunks):
start = i * chunk_size
end = start + chunk_size if i < num_chunks - 1 else len(words)
chunks.append(" ".join(words[start:end]))
return chunks
def _measure_chunk_importance(self, chunks: List[str], position_idx: int) -> float:
"""
Measure how important chunk at position is to model.
Uses gradient-based importance: how much does removing this chunk
change the model's output?
"""
# Create full text
full_text = " ".join(chunks)
# Create ablated text (remove chunk at position)
ablated_chunks = chunks[:position_idx] + chunks[position_idx + 1:]
ablated_text = " ".join(ablated_chunks)
# Measure output difference
prompt = "Summarize: "
full_output = self.model.generate(prompt + full_text, max_length=50)
ablated_output = self.model.generate(prompt + ablated_text, max_length=50)
# Similarity of outputs
similarity = self._compute_similarity(full_output, ablated_output)
# Importance is inverse of similarity (different = important)
importance = 1.0 - similarity
return importance
def _compute_similarity(self, text1: str, text2: str) -> float:
"""Compute semantic similarity between texts."""
# Simple: word overlap
words1 = set(text1.lower().split())
words2 = set(text2.lower().split())
if not words1 or not words2:
return 0.0
intersection = len(words1 & words2)
union = len(words1 | words2)
return intersection / union if union > 0 else 0.0
def _smooth_weights(self, weights: np.ndarray) -> np.ndarray:
"""Smooth weights to reduce noise."""
# Moving average
kernel_size = 3
kernel = np.ones(kernel_size) / kernel_size
smoothed = np.convolve(weights, kernel, mode='same')
return smoothed
Create reranking algorithm based on estimated attention patterns.
class AttentionDrivenReranking:
"""
Reorder content to match model's attention preferences.
"""
def __init__(self, analyzer: AttentionBasinAnalyzer):
self.analyzer = analyzer
def rerank_documents(
self,
documents: List[str],
query: str = None,
target_positions: int = None
) -> List[str]:
"""
Rerank documents to align with attention basin.
Args:
documents: List of documents/examples to rerank
query: Optional query for context
target_positions: Number of positions (len(documents) if None)
Returns:
Reranked documents
"""
if target_positions is None:
target_positions = len(documents)
# Get model's attention preferences
attention_weights = self.analyzer.position_weights
if attention_weights is None:
raise ValueError("Analyzer must be calibrated first")
# Rank documents by importance
doc_scores = self._compute_document_relevance(documents, query)
# Assign documents to positions based on attention weights
# High-attention positions get high-relevance documents
ranked_indices = np.argsort(-np.array(doc_scores)) # Descending
# Create assignment: position_importance -> document_rank
position_importance = -np.sort(-attention_weights[:target_positions])
position_indices = np.argsort(-attention_weights[:target_positions])
# Assign documents to positions
assignment = [None] * target_positions
for pos_rank, pos_idx in enumerate(position_indices):
if pos_rank < len(ranked_indices):
assignment[pos_idx] = documents[ranked_indices[pos_rank]]
# Fill remaining positions
assigned_docs = set(ranked_indices[:target_positions])
remaining_docs = [i for i in range(len(documents)) if i not in assigned_docs]
for i in range(target_positions):
if assignment[i] is None and remaining_docs:
assignment[i] = documents[remaining_docs.pop()]
return assignment
def _compute_document_relevance(self, documents: List[str], query: str = None) -> List[float]:
"""Compute relevance of each document."""
if query is None:
# If no query, use document length as proxy for informativeness
return [len(doc.split()) for doc in documents]
# Otherwise use semantic similarity to query
scores = []
for doc in documents:
similarity = self._compute_similarity(query, doc)
scores.append(similarity)
return scores
def _compute_similarity(self, text1: str, text2: str) -> float:
"""Compute similarity between texts."""
words1 = set(text1.lower().split())
words2 = set(text2.lower().split())
if not words1 or not words2:
return 0.0
intersection = len(words1 & words2)
union = len(words1 | words2)
return intersection / union if union > 0 else 0.0
Create efficient calibration procedure using small example set.
class ZeroShotCalibration:
"""
Calibrate attention basin with minimal examples.
"""
def __init__(self, model):
self.model = model
self.analyzer = AttentionBasinAnalyzer(model)
def calibrate(self, num_calibration_examples: int = 10) -> AttentionBasinAnalyzer:
"""
Calibrate attention basin with few examples.
Args:
num_calibration_examples: Number of examples for calibration
Returns:
Calibrated analyzer
"""
# Generate synthetic calibration examples
calibration_examples = self._generate_calibration_examples(
num_calibration_examples
)
# Estimate position attention
self.analyzer.estimate_position_attention(calibration_examples)
return self.analyzer
def _generate_calibration_examples(self, num_examples: int) -> List[str]:
"""Generate diverse calibration examples."""
topics = [
"scientific discoveries",
"historical events",
"technology advances",
"cultural phenomena",
"natural phenomena"
]
examples = []
for i in range(num_examples):
topic = topics[i % len(topics)]
prompt = f"""
Write a 200-word passage about {topic}.
Include diverse information and multiple facts.
"""
example = self.model.generate(prompt, max_length=200)
examples.append(example)
return examples
Integrate calibration and reranking into single pipeline.
def attention_driven_pipeline(
model,
task: str,
documents_or_examples: List[str],
query: str = None,
calibration_size: int = 10
) -> Tuple[List[str], Dict]:
"""
Complete pipeline: calibrate, rerank, and use.
Args:
model: Language model
task: Task type ("retrieval", "few-shot", "context")
documents_or_examples: Items to rerank
query: Optional query for ranking
calibration_size: Number of calibration examples
Returns:
(reranked_items, metrics)
"""
# Step 1: Calibrate
calibrator = ZeroShotCalibration(model)
analyzer = calibrator.calibrate(calibration_size)
# Step 2: Rerank
reranker = AttentionDrivenReranking(analyzer)
reranked = reranker.rerank_documents(
documents_or_examples,
query=query,
target_positions=len(documents_or_examples)
)
# Step 3: Evaluate improvement
metrics = evaluate_reranking_improvement(
model,
documents_or_examples,
reranked,
task
)
return reranked, metrics
def evaluate_reranking_improvement(model, original, reranked, task) -> Dict:
"""Evaluate improvement from reranking."""
# For retrieval: measure ranking quality
# For few-shot: measure accuracy with reranked examples
# For context: measure answer quality
metrics = {
"task": task,
"original_order": original,
"reranked_order": reranked,
"improvement": 0.0
}
# Task-specific evaluation
if task == "retrieval":
# Measure ranking metrics
pass
elif task == "few-shot":
# Measure few-shot accuracy
pass
elif task == "context":
# Measure QA accuracy
pass
return metrics
The critical insight is that position bias is intrinsic to transformer attention and not easily fixable. Rather than fight the model's natural preferences, AttnRank exploits them. By placing high-relevance content in high-attention positions, significant performance gains emerge without model modification. The approach is universally applicable across tasks.
Attention Basin: Why Contextual Position Matters in LLMs (arXiv:2508.05128)
Characterizes systematic position bias where models neglect middle content. Proposes Attention-Driven Reranking that strategically repositions information to high-attention zones, improving performance across retrieval, few-shot, and multi-hop reasoning tasks without model changes.
testing
Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.
testing
Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.
data-ai
Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.
devops
Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures—achieving 23% improvement on reasoning without gradient-based parameter updates or external training.