Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

📄 Abstract - Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
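As a rough illustration of the two ingredients named in the abstract (an affine correction around a nominal trajectory, and an empirical upper-tail quantile computed from rollouts), the following Python sketch uses a toy scalar system. Everything in it is an assumption made for illustration and is not taken from the paper: the dynamics `x + u + w`, the zero nominal trajectory, the hand-picked gains, the limit `0.5`, the risk level `epsilon = 0.05`, and all function names. The reinforcement-learning step that would tune the gains, and the covariance-feasibility penalties, are omitted.

```python
import numpy as np

def affine_control(u_nom, x_nom, x, du_ff, K):
    """Nominal control plus a feedforward adjustment and feedback on the state deviation."""
    return u_nom + du_ff + K * (x - x_nom)

def terminal_constraint_samples(sample_x0, sample_w, K, du_ff,
                                n_rollouts=2000, limit=0.5):
    """Roll out a toy scalar system and return g = |x_N| - limit for each rollout.

    The uncertainty is only ever accessed through the samplers, so any
    distribution that can be sampled may be plugged in.
    """
    n_steps = len(K)
    g = np.empty(n_rollouts)
    for i in range(n_rollouts):
        x = sample_x0()
        for k in range(n_steps):
            # Hypothetical nominal trajectory: x_nom = 0 and u_nom = 0 at every step.
            u = affine_control(0.0, 0.0, x, du_ff[k], K[k])
            x = x + u + sample_w()  # assumed toy dynamics, not the paper's model
        g[i] = abs(x) - limit
    return g

def upper_tail_quantile(g, epsilon):
    """Empirical (1 - epsilon) quantile; a value <= 0 suggests g <= 0 holds
    in about a (1 - epsilon) fraction of the sampled rollouts."""
    return float(np.quantile(g, 1.0 - epsilon))

rng = np.random.default_rng(0)
sample_x0 = lambda: rng.normal(0.0, 0.5)      # Gaussian initial-state error
sample_w = lambda: rng.uniform(-0.09, 0.09)   # bounded uniform process noise

n_steps = 20
du_ff = np.zeros(n_steps)
epsilon = 0.05

# No feedback versus a fixed stabilizing gain (both hand-picked, not learned).
q_open = upper_tail_quantile(
    terminal_constraint_samples(sample_x0, sample_w, np.zeros(n_steps), du_ff), epsilon)
q_closed = upper_tail_quantile(
    terminal_constraint_samples(sample_x0, sample_w, np.full(n_steps, -0.8), du_ff), epsilon)

print(f"open-loop quantile:   {q_open:.3f}")
print(f"closed-loop quantile: {q_closed:.3f}")
```

In this toy setting the quantile is positive without feedback and negative with the stabilizing gain, which is the kind of signal a learning procedure could penalize or reward when adjusting the feedforward terms and gains.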

1️⃣ One-Sentence Summary

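This paper proposes a robust trajectory-optimization method that does not depend on any particular type of uncertainty distribution. A nominal trajectory is first computed offline, and reinforcement learning is then used to robustify it through corrections (a feedforward control adjustment and time-varying feedback gains), which effectively reduces fuel cost while preserving mission feasibility. The method's generality is validated on several different space missions, such as an Earth-Mars transfer and a rocket landing.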
