arXiv submission date: 2026-05-19
📄 Abstract - Spectral Souping: A Unified Framework for Online Preference Alignment

Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this issue, we introduce Spectral Souping, a unified framework for efficient, online preference alignment. Our contribution is the discovery of a universal spectral representation within LLMs, which is proven to be highly amenable to model merging. This theoretical insight enables a two-phase methodology: we first learn a basis of specialized policies offline, each focused on a distinct, fine-grained preference dimension. An online adaptation algorithm then efficiently "soups" these policies at inference time, by merging either their outputs or their parameters, enabling rapid model adaptation without the need for costly online retraining w.r.t. tailored preference rewards. Experiments on online preference alignment benchmarks demonstrate that our method achieves significant performance improvements over existing state-of-the-art approaches, presenting a scalable and computationally efficient solution for dynamically adapting LLMs to individual user preferences.

Top-level tags: llm, reinforcement learning, model training
Detailed tags: rlhf, preference alignment, model merging, online adaptation, llm alignment

Spectral Souping: A Unified Framework for Online Preference Alignment


1️⃣ One-sentence summary

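This paper proposes a new method called "Spectral Souping." Based on the finding that large language models contain a universal spectral structure that is easy to merge, it first trains multiple specialized models offline, each focused on a different preference, and then quickly combines them at inference time, so that a single model can efficiently and dynamically adapt to the personalized needs of different users without retraining.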
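
The two ways of combining policies at inference time (merging parameters or merging outputs) can be illustrated with a minimal sketch. It assumes the combination is a plain convex mixture of the basis policies, which is a generic model-souping scheme and not necessarily the paper's spectral construction; the abstract does not specify that. The function names, the `name -> list of floats` parameter layout, and the user weights are all hypothetical.

```python
import math


def _normalize(weights):
    """Scale non-negative user weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def soup_parameters(basis_params, weights):
    """Parameter-space souping: weighted average of the basis policies' parameters.

    basis_params: list of dicts mapping a parameter name to a flat list of floats
                  (hypothetical layout, one dict per offline-trained policy).
    weights:      one non-negative preference weight per policy.
    """
    norm = _normalize(weights)
    merged = {}
    for name in basis_params[0]:
        size = len(basis_params[0][name])
        merged[name] = [
            sum(w * params[name][i] for w, params in zip(norm, basis_params))
            for i in range(size)
        ]
    return merged


def soup_outputs(basis_logits, weights):
    """Output-space souping: mix next-token logits, then apply a softmax.

    basis_logits: list of logit vectors, one per policy, over the same vocabulary.
    """
    norm = _normalize(weights)
    vocab = len(basis_logits[0])
    mixed = [
        sum(w * logits[i] for w, logits in zip(norm, basis_logits))
        for i in range(vocab)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(mixed)
    exps = [math.exp(x - peak) for x in mixed]
    z = sum(exps)
    return [e / z for e in exps]


# Two toy "policies", weighted equally for a hypothetical user.
merged = soup_parameters([{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}], [1, 1])
probs = soup_outputs([[1.0, 0.0], [0.0, 1.0]], [1, 1])
```

In this sketch, adapting to a new user only changes the weight vector; no gradient step touches the basis policies, which matches the "no online retraining" claim in the summary. How the weights are chosen online is left out because the source does not describe it.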

Source: arXiv: 2605.20408