arXiv submission date: 2026-01-15
📄 Abstract - Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
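The abstract describes HCC only at a high level. The sketch below is a minimal illustration, under our own assumptions, of how a three-tier cache in that spirit might be organized: a small, bounded tier of raw execution traces, a tier of distilled task-level knowledge, and a tier of cross-task wisdom. The class name, the tier capacity, the promotion rules, and the keyword-based "distillation" (keeping only traces prefixed with `insight:`) are all hypothetical stand-ins; the paper presumably distills with an LLM, and this is not its implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HierarchicalCognitiveCache:
    """Hypothetical three-tier context store; tier names and policies are illustrative."""
    trace_capacity: int = 4                        # size of the fast, transient tier
    traces: deque = field(default_factory=deque)   # tier 1: raw execution traces
    knowledge: list = field(default_factory=list)  # tier 2: distilled, task-level knowledge
    wisdom: list = field(default_factory=list)     # tier 3: cross-task lessons

    def record(self, trace: str) -> None:
        """Append a raw trace; when tier 1 overflows, distill it into tier 2."""
        self.traces.append(trace)
        if len(self.traces) > self.trace_capacity:
            self._distill_traces()

    def _distill_traces(self) -> None:
        # Placeholder distillation (hypothetical): keep only traces flagged as insights.
        # A real agent would presumably summarize traces with an LLM instead.
        insights = [t for t in self.traces if t.startswith("insight:")]
        summary = "; ".join(t.removeprefix("insight:").strip() for t in insights)
        if summary:
            self.knowledge.append(summary)
        self.traces.clear()  # transient tier is emptied after distillation

    def finish_task(self) -> None:
        """At task end, flush remaining traces and promote knowledge to wisdom."""
        self._distill_traces()
        self.wisdom.extend(self.knowledge)
        self.knowledge.clear()

    def context(self) -> str:
        """Assemble a bounded prompt context: stable tiers first, recent traces last."""
        parts = [f"[wisdom] {w}" for w in self.wisdom]
        parts += [f"[knowledge] {k}" for k in self.knowledge]
        parts += [f"[trace] {t}" for t in self.traces]
        return "\n".join(parts)


cache = HierarchicalCognitiveCache(trace_capacity=2)
cache.record("ran baseline model")
cache.record("insight: feature scaling helps")
cache.record("ran tuned model")   # third trace overflows tier 1 and triggers distillation
print(cache.context())            # prints "[knowledge] feature scaling helps"
```

The design mirrors a hardware cache hierarchy only loosely: the small trace tier is cheap to fill and is evicted often, while the knowledge and wisdom tiers grow slowly and persist, so the assembled context stays short even as raw experience accumulates.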

Top-level tags: agents, systems, model training
Detailed tags: autonomous agents, long-horizon planning, context management, machine learning engineering, hierarchical caching

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering


1️⃣ One-sentence summary

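This paper presents ML-Master 2.0, an autonomous agent whose novel Hierarchical Cognitive Caching architecture lets an AI, much like a human scientist, keep accumulating experience, adjusting its strategy, and staying aligned with its goals over complex machine learning engineering tasks lasting days or weeks, thereby overcoming a key bottleneck of current AI in ultra-long-horizon autonomous exploration.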

Source: arXiv 2601.10402