arXiv submission date: 2026-03-01
📄 Abstract - Integrating LTL Constraints into PPO for Safe Reinforcement Learning

This paper proposes Proximal Policy Optimization with Linear Temporal Logic Constraints (PPO-LTL), a framework that integrates safety constraints expressed in LTL into PPO for safe reinforcement learning. LTL offers a rigorous representation of complex safety requirements, such as the regulations pervasive in robotics, and enables their systematic monitoring. Violations of the LTL constraints are detected by limit-deterministic Büchi automata and translated by a logic-to-cost mechanism into penalty signals, which then guide policy optimization via a Lagrangian scheme. Extensive experiments in the Zones and CARLA environments show that PPO-LTL consistently reduces safety violations while maintaining task performance competitive with state-of-the-art methods. The code is at this https URL.
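The abstract describes coupling a violation-cost signal to PPO through a Lagrange multiplier. A minimal sketch of that idea is below; the function names, the clipped-surrogate form, and the dual-ascent update rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lagrangian_ppo_loss(ratio, advantage, cost_advantage, lam, clip_eps=0.2):
    """Clipped PPO surrogate minus a Lagrangian penalty on violation cost.

    Hypothetical sketch of the scheme the abstract outlines:
    ratio          -- pi_new(a|s) / pi_old(a|s) per sample
    advantage      -- task-reward advantage estimate
    cost_advantage -- advantage of the penalty signal derived from LTL violations
    lam            -- Lagrange multiplier (>= 0), updated by dual ascent
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    reward_term = np.minimum(ratio * advantage, clipped * advantage)
    # The multiplier trades off task reward against expected violation cost.
    cost_term = ratio * cost_advantage
    return -(reward_term - lam * cost_term).mean()

def dual_ascent_step(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Push the multiplier up while E[cost] exceeds the limit, down otherwise."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))
```

With `lam = 0` the loss reduces to standard clipped PPO; as observed violation cost exceeds the limit, dual ascent grows `lam` and the penalty term dominates, steering the policy toward satisfying the constraint.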

Top-level tags: reinforcement learning · robotics · systems
Detailed tags: safe reinforcement learning · linear temporal logic · ppo · constraint satisfaction · lagrangian method

Integrating LTL Constraints into PPO for Safe Reinforcement Learning


1️⃣ One-sentence summary

This paper proposes a new method, PPO-LTL, which translates linear temporal logic formulas describing complex safety rules (such as obstacle avoidance for robots) into penalty signals and folds them into the reinforcement learning training loop, significantly reducing the agent's unsafe behaviors during both training and deployment while preserving task performance.
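The summary mentions turning a temporal-logic safety rule into a runtime penalty signal. As a toy illustration, the invariance formula G(¬collision) ("never collide") can be monitored by a two-state automaton that emits a penalty once violated; this is a deliberately simplified stand-in, not the paper's limit-deterministic Büchi automaton construction, which handles full LTL.

```python
class SafetyMonitor:
    """Toy monitor for G(!collision): states 'safe' and 'violated'.

    Hypothetical sketch; the violated state is absorbing, so every step
    after the first collision keeps producing a penalty.
    """

    def __init__(self):
        self.state = "safe"

    def step(self, collision: bool) -> float:
        """Advance on one observation; return the penalty signal."""
        if self.state == "safe" and collision:
            self.state = "violated"
        return 1.0 if self.state == "violated" else 0.0
```

In a logic-to-cost setup of this kind, the returned penalty would play the role of the per-step cost that the Lagrangian-constrained optimizer drives toward zero.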

Source: arXiv 2603.01292