Spectral Souping: A Unified Framework for Online Preference Alignment

📄 Abstract - Spectral Souping: A Unified Framework for Online Preference Alignment

Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this issue, we introduce Spectral Souping, a unified framework for efficient, online preference alignment. Our contribution is the discovery of a universal spectral representation within LLMs, which is proven to be highly amenable to model merging. This theoretical insight enables a two-phase methodology. First, we learn a basis of specialized policies offline, each focused on a distinct, fine-grained preference dimension. Then, an online adaptation algorithm efficiently "soups" these policies at inference time by merging either their outputs or their parameters, enabling rapid model adaptation without costly online retraining with respect to tailored preference rewards. Experiments on online preference alignment benchmarks demonstrate that our method achieves significant performance improvements over existing state-of-the-art approaches, presenting a scalable and computationally efficient solution for dynamically adapting LLMs to individual user preferences.
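
To make the two merging modes mentioned in the abstract concrete, here is a minimal Python sketch of "souping" a set of already-trained policies with user-specific mixing weights. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not describe the spectral representation or the online adaptation rule, so the sketch simply assumes that parameter merging is a weighted average of parameter vectors and that output merging is a weighted average of next-token logits. The function names `soup_parameters` and `soup_outputs`, the dictionary layout, and the weights are all hypothetical.

```python
import math


def _normalize(weights):
    """Scale non-negative mixing weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixing weights must have a positive sum")
    return [w / total for w in weights]


def soup_parameters(policies, weights):
    """Parameter-space souping (assumed form): weighted average of parameters.

    `policies` is a list of dicts mapping a parameter name to a flat list of
    floats; every policy is assumed to share the same names and shapes.
    """
    if len(policies) != len(weights):
        raise ValueError("need exactly one weight per policy")
    norm = _normalize(weights)
    merged = {}
    for name, reference in policies[0].items():
        merged[name] = [
            sum(w * policy[name][i] for w, policy in zip(norm, policies))
            for i in range(len(reference))
        ]
    return merged


def soup_outputs(logits_per_policy, weights):
    """Output-space souping (assumed form): average logits, then softmax."""
    if len(logits_per_policy) != len(weights):
        raise ValueError("need exactly one weight per policy")
    norm = _normalize(weights)
    vocab_size = len(logits_per_policy[0])
    mixed = [
        sum(w * logits[i] for w, logits in zip(norm, logits_per_policy))
        for i in range(vocab_size)
    ]
    # Numerically stable softmax over the mixed logits.
    peak = max(mixed)
    exps = [math.exp(x - peak) for x in mixed]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    # Two toy "basis policies", each standing in for one preference dimension.
    helpful = {"w": [1.0, 0.0]}
    concise = {"w": [0.0, 1.0]}
    # A hypothetical user who weights the first preference three times as much.
    print(soup_parameters([helpful, concise], [3, 1]))
    print(soup_outputs([[2.0, 0.0], [0.0, 2.0]], [3, 1]))
```

In this sketch the only per-user state is the weight vector, which reflects why merging at inference time avoids retraining. How those weights are estimated online is left open here because the abstract does not specify it.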


1️⃣ One-Sentence Summary

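This paper proposes a new method called "Spectral Souping." Building on the finding that large language models contain a universal spectral structure that is easy to merge, it first trains several specialized models offline, each focused on a different preference, and then quickly combines them at inference time, so that a single model can adapt efficiently and dynamically to different users' individual needs without retraining.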
