ADu2021/agent-ocr-history-compression

Name: agent-ocr-history-compression
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/agent-ocr-history-compression/SKILL.md

npx skillsauth add ADu2021/skillXiv agent-ocr-history-compression

Problem

Agent interaction histories grow rapidly, creating bottlenecks:

Token Explosion: Long sequences of observations and actions consume enormous token counts
Memory Pressure: Multi-turn agent executions accumulate context that exceeds token limits
Computational Cost: Each token in history requires reprocessing in forward passes
Limited Horizon: Token constraints force agents to drop earlier history or truncate interactions
Inefficient Encoding: Text-based history is redundant (JSON records, timestamped logs)

Agents need ways to compress histories without losing critical information for decision-making.

Solution

AgentOCR introduces Optical Self-Compression for agent histories:

Visual Representation: Convert observation-action sequences into rendered images that capture state visually
- Renders agent state, actions taken, and outcomes as structured visual layouts
- Leverages the fact that vision models can take in a rendered image at a higher information density per token than the equivalent text
Segment Optical Caching: Decomposes history into hashable segments with visual cache
- Segments: [state → action → outcome] → render to image
- Cache: map segment hash → cached image rendering
- 20x speedup through cache hits and vectorized rendering
Agentic Self-Compression: Agent learns to dynamically emit compression rates
- Trade-off: compress aggressively to preserve computation budget, or maintain detail for critical decisions
- Agent optimizes compression rate given current task demands

When to Use

Long-Horizon Agents: Tasks spanning 100+ interaction steps
Token-Constrained Deployment: Systems with fixed token budgets (mobile, inference servers)
Multi-Turn Applications: Dialogue agents, interactive tools with extended conversations
Expensive Computation: GPU-limited systems where per-token cost matters
History-Heavy Reasoning: Agents whose future decisions depend on cumulative history

When NOT to Use

For short-horizon tasks (compression overhead exceeds savings)
When exact textual history is required for auditing
In systems with unlimited token budgets
For tasks where visual compression might lose critical information

Core Concepts

The framework operates on the principle that visual encoding is information-dense:

Visual > Text: Images compress agent states more efficiently than textual logs
Segment Caching: Repeated state patterns can be cached and reused
Dynamic Compression: Agents learn when to compress aggressively vs. preserve detail
Performance Preservation: Compression is reported to retain over 95% of agent performance

Key Implementation Pattern

Implementing agent history compression:

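# Conceptual sketch: optical self-compression for agent histories
import hashlib
import json


class CompressedAgentMemory:
    def __init__(self):
        self.segments = []        # one dict per recorded step
        self.visual_cache = {}    # segment hash -> rendered image

    def render_segment(self, segment):
        # Placeholder renderer: a real system would rasterize the segment
        # into an image; here the "image" is the segment's text as bytes
        return json.dumps(segment, default=str).encode("utf-8")

    def select_important(self, num_keep):
        # Placeholder importance policy: keep the most recent segments
        return self.segments[-num_keep:] if num_keep > 0 else []

    def record_step(self, state, action, outcome):
        segment = (state, action, outcome)

        # Content hash for caching; unlike built-in hash(), this is stable
        # across runs and works for unhashable states such as dicts
        payload = json.dumps(segment, sort_keys=True, default=str)
        segment_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()

        # Render only on a cache miss
        if segment_hash not in self.visual_cache:
            self.visual_cache[segment_hash] = self.render_segment(segment)
        visual = self.visual_cache[segment_hash]

        self.segments.append({
            'segment': segment,
            'visual': visual,
            'hash': segment_hash
        })

    def compress_history(self, compression_rate):
        """
        compression_rate: 0.0 (no compression) to 1.0 (maximum)
        """
        if not 0.0 <= compression_rate <= 1.0:
            raise ValueError("compression_rate must be in [0.0, 1.0]")
        if compression_rate == 0.0:
            return [seg['segment'] for seg in self.segments]  # Full history as text

        # Keep a fraction of the segments, chosen by importance
        num_keep = int(len(self.segments) * (1 - compression_rate))
        important_segments = self.select_important(num_keep)

        # Return the rendered visuals of the kept segments
        return [seg['visual'] for seg in important_segments]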

Key mechanisms:

Segment rendering: state → structured visual layout
Hash-based caching: avoid re-rendering duplicate states
Importance sampling: prioritize critical decision points
Dynamic compression: agent chooses compression rate
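
The importance sampling mechanism is not spelled out in the source. A minimal sketch of one possible policy follows, assuming each segment carries a numeric `importance` score (a hypothetical field) and that the newest segments are always kept. The names `select_important` (here a standalone function) and `always_keep_recent` are illustrative only.

```python
def select_important(segments, num_keep, always_keep_recent=1):
    """Pick up to num_keep segments, preserving chronological order.

    Assumption: each segment is a dict with a numeric 'importance' score.
    The newest `always_keep_recent` segments are kept first; remaining
    slots go to the highest-scoring older segments.
    """
    if num_keep <= 0:
        return []
    n = len(segments)
    recent = min(always_keep_recent, num_keep, n)
    keep = set(range(n - recent, n))

    # Rank older segments by score, breaking ties in favor of newer ones.
    older = sorted(range(n - recent),
                   key=lambda i: (segments[i]["importance"], i),
                   reverse=True)
    keep.update(older[:num_keep - recent])

    return [segments[i] for i in sorted(keep)]


history = [
    {"step": 0, "importance": 0.9},
    {"step": 1, "importance": 0.1},
    {"step": 2, "importance": 0.5},
    {"step": 3, "importance": 0.2},
]
kept = select_important(history, num_keep=2)
print([seg["step"] for seg in kept])  # [0, 3]
```

Keeping the output in chronological order matters here because the agent reads the compressed history as a sequence, not as a ranked list.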

Expected Outcomes

50%+ Token Reduction: Compressed visual histories use at most half the tokens of the text history
95%+ Performance Retention: Agent task performance barely degrades despite compression
20x Rendering Speedup: Caching eliminates redundant visual rendering
Longer Horizons: Fixed token budget enables more interaction steps
Flexible Trade-Offs: Agents adapt compression dynamically
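
To make the longer-horizon claim concrete, the arithmetic can be sketched as below. The budget and per-step token counts are made-up illustration values, not measurements from the source; only the 50% reduction figure comes from the text.

```python
def max_steps(token_budget, tokens_per_step, token_reduction=0.0):
    """Number of history steps that fit in a fixed token budget.

    token_reduction is the fraction of per-step tokens saved by
    compression (0.5 means each step costs half as many tokens).
    """
    if not 0.0 <= token_reduction < 1.0:
        raise ValueError("token_reduction must be in [0.0, 1.0)")
    effective_cost = tokens_per_step * (1.0 - token_reduction)
    return int(token_budget // effective_cost)


# Hypothetical numbers chosen only for illustration.
budget = 32_000
per_step = 400

print(max_steps(budget, per_step))                       # 80 steps as text
print(max_steps(budget, per_step, token_reduction=0.5))  # 160 steps at 50% reduction
```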

Limitations and Considerations

Visual rendering adds computational overhead (though caching mitigates this)
Some information loss is inevitable; compression rate must be tuned per task
Visual encoding assumes vision-capable agents (text-only models don't benefit)
Cached segments may become stale if environment state representation changes
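
One way to guard against stale cache entries is to fold a renderer or schema version into the cache key, so that a change in representation invalidates old entries. This is a sketch under that assumption, not something the source prescribes; `RENDER_VERSION` and `cache_key` are hypothetical names.

```python
import hashlib
import json

RENDER_VERSION = "v1"  # hypothetical: bump whenever the layout or state schema changes


def cache_key(segment, render_version=RENDER_VERSION):
    """Hash the segment together with the renderer version."""
    payload = json.dumps({"version": render_version, "segment": segment},
                         sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


segment = ({"url": "https://example.com"}, "click", "page loaded")
old_key = cache_key(segment, render_version="v1")
new_key = cache_key(segment, render_version="v2")

print(old_key == cache_key(segment, render_version="v1"))  # True: same version, same key
print(old_key == new_key)                                  # False: version bump, new key
```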

Integration Pattern

For a long-horizon web search agent:

Record Steps: Each search iteration records [query, results, decision]
Visual Encoding: Render query-results-decision as structured image
Cache Hit: Repeated search patterns hit cache, skip rendering
Compress When Needed: As history grows, compress older segments visually
Use Compressed History: Future reasoning uses compressed visuals + recent text

This maintains decision-making quality while managing token budget.
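
The last step, mixing compressed visuals with recent text, might look like the following sketch. It assumes each segment is a dict with a `text` field and a `visual` field (the rendered image, represented here by a placeholder string); `build_context` and `recent_window` are illustrative names, not part of the source.

```python
def build_context(segments, recent_window=3):
    """Split history into compressed older entries and verbatim recent ones.

    Assumption: each segment is a dict with a 'text' field and a 'visual'
    field (the rendered image, represented here by a placeholder string).
    """
    if recent_window < 0:
        raise ValueError("recent_window must be non-negative")
    split = max(len(segments) - recent_window, 0)
    older, recent = segments[:split], segments[split:]

    context = [{"type": "image", "content": seg["visual"]} for seg in older]
    context += [{"type": "text", "content": seg["text"]} for seg in recent]
    return context


history = [
    {"text": f"query {i} -> results -> decision", "visual": f"<image {i}>"}
    for i in range(5)
]
context = build_context(history, recent_window=2)
print([item["type"] for item in context])
# ['image', 'image', 'image', 'text', 'text']
```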

Compression Rate Tuning

Start with 0.3-0.5 compression rate for most tasks:

0.0-0.2: Minimal compression, preserve detail (critical decisions)
0.3-0.5: Moderate compression, good balance
0.6-0.8: Aggressive compression, tight token budgets
0.9+: Extreme compression, only for tasks tolerant of history loss
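
The tiers can be expressed as a small helper, shown here as a sketch. The listed ranges leave gaps (for example 0.25), so this version assumes a rate in a gap belongs to the next, more aggressive tier; `describe_rate` and `segments_kept` are hypothetical names, and the segment count is an illustration value.

```python
def describe_rate(compression_rate):
    """Map a compression rate to a tier name.

    Assumption: rates falling between the listed ranges (for example 0.25)
    are assigned to the next, more aggressive tier.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must be in [0.0, 1.0]")
    if compression_rate <= 0.2:
        return "minimal"
    if compression_rate <= 0.5:
        return "moderate"
    if compression_rate <= 0.8:
        return "aggressive"
    return "extreme"


def segments_kept(num_segments, compression_rate):
    """Segments retained under the compress_history rule: int(n * (1 - rate))."""
    return int(num_segments * (1 - compression_rate))


for rate in (0.0, 0.5, 0.75, 1.0):
    print(rate, describe_rate(rate), segments_kept(200, rate))
```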

Related Work Context

AgentOCR advances agent efficiency by recognizing that history storage and processing are a core bottleneck. By shifting from text-based logs to visual representations, it enables longer-horizon agents within fixed computational budgets, which in turn supports more complex agent behavior.

Category: development
Stars: 2
Updated: Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add ADu2021/skillXiv agent-ocr-history-compression

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 16, 2026, 3:02 PM (4.1s, 1 file scanned)

SKILL.md

name: agent-ocr-history-compression
title: AgentOCR: Reimagining Agent History via Optical Self-Compression
version: 0.0.2
engine: skillxiv-v0.0.2-claude-opus-4.6
license: MIT
url: https://arxiv.org/abs/2601.04786
keywords: [agent-efficiency, history-compression, token-optimization, visual-encoding, long-context]
description: Compress agent interaction history by converting observation-action sequences into compact visual representations (images), leveraging visual tokens' superior information density. Implements segment optical caching with 20x rendering speedup and enables dynamic compression rates. Preserves over 95% of agent performance while reducing token consumption by 50%+, enabling agents to maintain longer interaction histories within fixed budgets.

Install manually

# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Copy into Claude Code skills folder (global)
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/agent-ocr-history-compression ~/.claude/skills/

Repository: ADu2021/skillXiv

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT