[Re] FairDICE: A Gap Between Theory And Practice
1️⃣ One-sentence summary
Through a replication study, this paper finds that FairDICE — a new method intended to let offline reinforcement learning algorithms automatically trade off multiple objectives to achieve fairness — is theoretically sound, but a bug in the original code renders it ineffective in continuous environments, and the experimental validation requires substantial revision before it can support the method's practical value.
Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives, e.g. to incentivise fairness among them. As this would be a valuable contribution, this replication study examines the replicability of the claims made about FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and several important hyperparameters were originally underspecified. After rectifying this, we show in experiments extending the original paper that FairDICE can scale to complex environments and high-dimensional rewards, though it can be reliant on (online) hyperparameter tuning. We conclude that FairDICE is a theoretically interesting method, but the experimental justification requires significant revision.
From arXiv: 2603.03454