Efficient Federated RLHF via Zeroth-Order Policy Optimization
1️⃣ One-Sentence Summary
This paper proposes an efficient federated learning algorithm, Par-S^2ZPO, that lets resource-constrained devices (such as phones and sensors) collaborate on reinforcement learning from human feedback; it preserves learning quality while sharply cutting communication and computation overhead, converging faster and performing better than existing methods.
This paper considers reinforcement learning from human feedback in a federated learning setting with resource-constrained agents, such as edge devices. We propose an efficient federated RLHF algorithm, named Partitioned, Sign-based Stochastic Zeroth-order Policy Optimization (Par-S$^2$ZPO). The algorithm is built on zeroth-order optimization with binary perturbation, resulting in low communication, computation, and memory complexity by design. Our theoretical analysis establishes an upper bound on the convergence rate of Par-S$^2$ZPO, revealing that it is as efficient as its centralized counterpart in terms of sample complexity but converges faster in terms of policy update iterations. Our experimental results show that it outperforms a FedAvg-based RLHF baseline on four MuJoCo RL tasks.
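To make the mechanics concrete, here is a minimal Python sketch of the two ingredients the abstract names: a zeroth-order gradient estimate built from binary (+/-1) perturbations, and sign-based communication between agents and a server. This is not the paper's actual Par-S$^2$ZPO; the function names `zo_gradient_estimate` and `federated_sign_round`, the majority-vote aggregation, the hyperparameters, and the omission of the "Partitioned" component are all illustrative assumptions.

```python
import numpy as np

def zo_gradient_estimate(reward_fn, theta, mu=0.01, num_samples=8, rng=None):
    """Zeroth-order gradient estimate of reward_fn at theta using binary
    (Rademacher, i.e. +/-1) perturbations. Each sample needs only two
    reward evaluations -- no backpropagation, no stored activations."""
    if rng is None:
        rng = np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=theta.shape)   # binary perturbation
        diff = (reward_fn(theta + mu * z) - reward_fn(theta - mu * z)) / (2 * mu)
        grad += diff * z
    return grad / num_samples

def federated_sign_round(reward_fns, theta, lr=0.05, mu=0.01):
    """One communication round: each agent uploads only the sign of its
    local zeroth-order gradient (one bit per coordinate); the server
    aggregates by majority vote and takes a sign-based ascent step."""
    local_signs = [np.sign(zo_gradient_estimate(f, theta, mu)) for f in reward_fns]
    vote = np.sign(np.sum(local_signs, axis=0))          # majority vote
    return theta + lr * vote                             # maximize reward

# Toy usage: two "agents" whose stand-in human-feedback rewards disagree slightly.
f1 = lambda th: -np.sum((th - 1.0) ** 2)
f2 = lambda th: -np.sum((th - 1.2) ** 2)
theta = np.zeros(4)
for _ in range(200):
    theta = federated_sign_round([f1, f2], theta)
print(theta)  # settles near ~1.1, between the two agents' optima
```

Transmitting only signs compresses each agent's upload to one bit per coordinate, and the two-point reward evaluations avoid backpropagation entirely, which is consistent with the low communication, computation, and memory complexity the abstract claims by design.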
Source: arXiv: 2604.17747