AgentOCR: Reimagining Agent History via Optical Self-Compression
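1️⃣ One-sentence summary
This paper proposes AgentOCR, a framework that renders an agent's interaction history as compact images instead of lengthy text and lets the agent learn for itself how to balance task success against compute cost, substantially reducing the compute and memory overhead of running AI agents while preserving strong performance.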
Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with a compression-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95% of text-based agent performance while substantially reducing token consumption (>50%), yielding consistent token and memory efficiency. Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.
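The idea behind segment optical caching (hash each history segment, render it once, reuse the cached result on later turns) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `SegmentCache` class, the `fake_render` placeholder (which stands in for a real text-to-image rasterizer), and the example turns are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Callable, Dict, List


class SegmentCache:
    """Caches rendered output per history segment, keyed by a content hash."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render
        self._cache: Dict[str, bytes] = {}
        self.render_calls = 0  # counts how often the renderer actually runs

    def get(self, segment: str) -> bytes:
        # Identical segment text always maps to the same key.
        key = hashlib.sha256(segment.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.render_calls += 1
            self._cache[key] = self._render(segment)
        return self._cache[key]

    def render_history(self, segments: List[str]) -> List[bytes]:
        # Only segments not seen before trigger a new render.
        return [self.get(s) for s in segments]


def fake_render(segment: str) -> bytes:
    # Hypothetical placeholder: a real system would rasterize text to an image.
    return segment.encode("utf-8")


cache = SegmentCache(fake_render)
history: List[str] = []
for turn in ["obs: kitchen", "act: open fridge", "obs: fridge is open"]:
    history.append(turn)
    tiles = cache.render_history(history)  # full history requested every turn
```

In this toy run the full history is requested on each of three turns, yet the renderer runs only three times rather than six, because earlier segments are served from the cache. The savings grow with trajectory length, since an uncached approach re-renders every prior segment at every turn.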
Source: arXiv: 2601.04786