Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning
1️⃣ One-Sentence Summary
This paper proposes a framework called TCE that uses theoretically guided, target-aligned generation to make intelligent use of source-domain data for expanding state coverage when target-domain data is extremely limited, thereby addressing the policy-adaptation difficulties that dynamics gaps cause in cross-domain offline reinforcement learning.
Cross-domain offline reinforcement learning aims to adapt a policy from a source domain to a target domain using only pre-collected datasets, where environment dynamics may differ. A key challenge is to leverage source data while reducing distributional mismatch, particularly when the target dataset is extremely limited. To address this, we propose Target-aligned Coverage Expansion (TCE), a framework that decides how source data should be used, either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation, guided by theoretical analysis. TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
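The abstract describes a routing decision at the heart of TCE: source transitions that are already close to the target dynamics are incorporated directly, while the rest are handled by target-aligned generation. The sketch below illustrates that decision with hypothetical names (`route_source_transitions`, threshold `tau`) and a plain Euclidean dynamics gap as a stand-in for the paper's theoretical criterion; it is an assumption-laden illustration, not the authors' implementation.

```python
import numpy as np

def route_source_transitions(source_next_states, target_pred_next_states, tau):
    """Hypothetical sketch of TCE's routing step.

    For each source transition, compare its observed next state with the
    next state predicted by a (learned) target-domain dynamics model.
    Transitions with a small gap are kept directly ("target-near");
    the rest are routed to target-aligned generation instead.
    The Euclidean gap and threshold `tau` are illustrative stand-ins
    for the paper's theoretically derived criterion.
    """
    gaps = np.linalg.norm(source_next_states - target_pred_next_states, axis=1)
    keep_direct = gaps <= tau        # use these source transitions as-is
    to_generate = ~keep_direct       # expand coverage via generation here
    return keep_direct, to_generate

# Toy usage: one near-target transition, one far-off transition.
src = np.array([[0.0, 0.0], [1.0, 1.0]])
tgt = np.array([[0.0, 0.1], [3.0, 3.0]])
keep, gen = route_source_transitions(src, tgt, tau=0.5)
```

In this toy call, the first transition (gap 0.1) is kept directly and the second (gap ≈ 2.83) is routed to generation.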
Source: arXiv: 2605.13054