Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

📄 Abstract

Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be insufficient for safety-critical real-time control. We propose Safe Flow Q-Learning (SafeFQL), which extends Flow Q-Learning (FQL) to safe offline RL by combining a Hamilton-Jacobi reachability-inspired safety value function with an efficient one-step flow policy. SafeFQL learns the safety value via a self-consistency Bellman recursion, trains a flow policy by behavioral cloning, and distills it into a one-step actor for reward-maximizing safe action selection without rejection sampling at deployment. To account for finite-data approximation error in the learned safety boundary, we add a conformal prediction calibration step that adjusts the safety threshold and provides finite-sample probabilistic safety coverage. Empirically, SafeFQL trades modestly higher offline training cost for substantially lower inference latency than diffusion-style safe generative baselines, which is advantageous for real-time safety-critical deployment. Across boat navigation and Safety Gymnasium MuJoCo tasks, SafeFQL matches or exceeds prior offline safe RL performance while substantially reducing constraint violations.
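The conformal calibration step mentioned in the abstract can be illustrated with a generic split-conformal sketch. This is not the paper's exact procedure: the function names `conformal_margin` and `is_certified_safe`, the sign convention (a safety value at or below zero means "safe"), and the choice of score (true safety value minus predicted safety value on held-out calibration data) are all assumptions made for illustration.

```python
import math


def conformal_margin(scores, alpha):
    """Split-conformal quantile of calibration scores (hypothetical helper).

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score. If a new
    score is exchangeable with the calibration scores, it falls at or
    below this value with probability at least 1 - alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def is_certified_safe(v_hat, margin):
    """Assumed convention: a safety value <= 0 means safe.

    The learned value v_hat is inflated by the conformal margin before
    it is compared against the nominal threshold of zero.
    """
    return v_hat + margin <= 0.0


# Assumed score: how much the learned safety value underestimated the
# realized one on held-out calibration states (true minus predicted).
calibration_scores = [0.3, -0.1, 0.2, 0.0]
margin = conformal_margin(calibration_scores, alpha=0.5)
```

With this margin, a state is accepted only if its predicted safety value stays at or below zero after the correction, which is one simple way to turn finite-sample calibration error into a shifted threshold.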


1️⃣ One-Sentence Summary

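This paper proposes a new method called SafeFQL, which combines reachability-based safety evaluation with efficient one-step decision-making to pursue high reward while strictly enforcing safety in offline reinforcement learning, making it particularly well suited to control tasks with demanding real-time and safety requirements.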
