skills/skillxiv-v0.0.2-claude-opus-4.6/agent-ocr-history-compression/SKILL.md
Compress agent interaction history by converting observation-action sequences into compact visual representations (images), leveraging the higher information density of visual tokens. Implements segment optical caching with a 20x rendering speedup and supports dynamic compression rates. Preserves over 95% of agent performance while reducing token consumption by more than 50%, so agents can maintain longer interaction histories within a fixed token budget.
Agent interaction histories grow rapidly, creating bottlenecks:
Agents need ways to compress histories without losing critical information for decision-making.
AgentOCR introduces Optical Self-Compression for agent histories:
The framework operates on the principle that visual encoding is information-dense:
Implementing agent history compression:
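# Conceptual: optical self-compression for agent histories
import hashlib
import json


class CompressedAgentMemory:
    def __init__(self):
        self.segments = []      # one dict per recorded (state, action, outcome) step
        self.visual_cache = {}  # segment hash -> rendered image

    def render_segment(self, segment):
        # Placeholder renderer: a real implementation would rasterize the
        # segment text into an image; here a tagged string stands in for it.
        state, action, outcome = segment
        return f"[IMG] {state} | {action} | {outcome}"

    def select_important(self, num_keep):
        # Placeholder importance heuristic: keep the most recent segments.
        return self.segments[-num_keep:] if num_keep > 0 else []

    def record_step(self, state, action, outcome):
        segment = (state, action, outcome)

        # Compute a content hash for caching; unlike hash(), this also
        # works when a field is unhashable (e.g. a dict observation).
        payload = json.dumps(segment, sort_keys=True, default=str)
        segment_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()

        # Render or retrieve from cache
        if segment_hash in self.visual_cache:
            visual = self.visual_cache[segment_hash]
        else:
            visual = self.render_segment(segment)
            self.visual_cache[segment_hash] = visual

        self.segments.append({
            'segment': segment,
            'visual': visual,
            'hash': segment_hash
        })

    def compress_history(self, compression_rate):
        """
        compression_rate: 0.0 (no compression) to 1.0 (maximum)
        """
        if not 0.0 <= compression_rate <= 1.0:
            raise ValueError("compression_rate must be in [0.0, 1.0]")

        if compression_rate == 0.0:
            # Full history as text
            return [seg['segment'] for seg in self.segments]

        # Sample segments based on importance
        num_keep = int(len(self.segments) * (1 - compression_rate))
        important_segments = self.select_important(num_keep)

        # Convert kept segments to visuals
        compressed = [seg['visual'] for seg in important_segments]
        return compressed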
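The segment caching idea can be illustrated in isolation. The sketch below is a minimal, hypothetical helper rather than AgentOCR's actual implementation: `SegmentRenderCache`, `segment_key`, and `fake_render` are names introduced here for illustration, and the renderer returns bytes as a stand-in for a real image. It keys each segment by a content hash and counts how often a render is reused.

```python
import hashlib
import json


def segment_key(state, action, outcome):
    """Stable content hash for one (state, action, outcome) segment."""
    payload = json.dumps([state, action, outcome], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SegmentRenderCache:
    """Caches rendered segments and counts hits and misses (hypothetical helper)."""

    def __init__(self, render_fn):
        self.render_fn = render_fn
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, state, action, outcome):
        key = segment_key(state, action, outcome)
        if key in self.cache:
            self.hits += 1
        else:
            # Only render when this exact segment has not been seen before.
            self.misses += 1
            self.cache[key] = self.render_fn(state, action, outcome)
        return self.cache[key]


def fake_render(state, action, outcome):
    # Stand-in for an image renderer (assumption): returns bytes, not pixels.
    return f"{state}|{action}|{outcome}".encode("utf-8")


cache = SegmentRenderCache(fake_render)
steps = [
    ("search page", "type query", "results shown"),
    ("results page", "click link 1", "article opened"),
    ("search page", "type query", "results shown"),  # repeated segment
]
for step in steps:
    cache.get(*step)

print(cache.hits, cache.misses)  # prints: 1 2
```

How much rendering a cache like this saves depends on how often segments repeat in a given task, so the hit and miss counters are worth logging before relying on it.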
Key mechanisms:
For a long-horizon web search agent:
This maintains decision-making quality while managing token budget.
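To make the token-budget trade-off concrete, the following sketch relates a compression rate to the number of retained segments, using the same `num_keep` formula as `compress_history`, and derives a rate from a budget. The per-segment token cost and the budget values are illustrative assumptions, not measurements from AgentOCR, and `segments_kept` and `rate_for_budget` are hypothetical helper names.

```python
def segments_kept(num_segments, compression_rate):
    """Number of segments retained, mirroring int(n * (1 - rate))."""
    return int(num_segments * (1 - compression_rate))


def rate_for_budget(num_segments, tokens_per_segment, token_budget):
    """Return a compression rate whose kept segments fit in token_budget.

    tokens_per_segment is an assumed average cost of one rendered segment.
    """
    if num_segments == 0 or tokens_per_segment <= 0:
        return 0.0
    max_segments = token_budget // tokens_per_segment
    if max_segments >= num_segments:
        return 0.0  # everything fits, no compression needed
    return 1.0 - max_segments / num_segments


# Illustrative numbers only: 40 recorded segments, 100 tokens per segment.
for rate in (0.3, 0.4, 0.5):
    print(rate, segments_kept(40, rate))

rate = rate_for_budget(num_segments=40, tokens_per_segment=100, token_budget=1000)
print(rate, segments_kept(40, rate))
```

Because `int()` truncates, the number of kept segments can land one below the budget limit for some rates; the result never exceeds the budget.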
Start with a compression rate of 0.3-0.5 for most tasks:
AgentOCR advances agent efficiency by recognizing that history storage and processing are a core bottleneck. By shifting from text-based logs to visual representations, it enables longer-horizon agents within fixed computational budgets. This infrastructure improvement indirectly supports more complex agent behavior.