Abstract - Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
1️⃣ One-Sentence Summary
This paper proposes a sampling framework called tempered sequential Monte Carlo (TSMC) that recasts controller design as statistical inference, enabling efficient search for optimal trajectories and policies under differentiable dynamics. Its core idea is to use annealing together with Hamiltonian Monte Carlo to draw samples concentrated on low-cost parameter solutions from a complex, high-dimensional distribution; experiments show that the method outperforms state-of-the-art algorithms on multiple benchmark tasks.
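The annealing loop described above (reweight along a tempering path, resample when particle diversity degrades, then rejuvenate with an MCMC move) can be sketched as follows. This is a minimal illustration on a hypothetical one-dimensional quadratic "trajectory cost", not the paper's benchmarks, and it uses a random-walk Metropolis rejuvenation step in place of the paper's Hamiltonian Monte Carlo; the particle count, step sizes, and temperature are illustrative choices.

```python
import numpy as np

def cost(theta):
    # Toy stand-in for an expected trajectory cost from a differentiable
    # rollout; minimum at theta = 2.0 (hypothetical, for illustration only)
    return (theta - 2.0) ** 2

def tempered_smc(n_particles=500, n_steps=20, temp=0.05, seed=0):
    rng = np.random.default_rng(seed)
    prior_std = 3.0
    # Initialize particles from a broad Gaussian prior over the parameter
    theta = rng.normal(0.0, prior_std, size=n_particles)
    logw = np.zeros(n_particles)
    # Tempering path: beta goes from 0 (prior) to 1 (Boltzmann-tilted target)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweight: incremental importance weight for raising the tilt
        # from exp(-b_prev * cost / temp) to exp(-b * cost / temp)
        logw += -(b - b_prev) * cost(theta) / temp
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample when the effective sample size drops below half
        ess = 1.0 / np.sum(w ** 2)
        if ess < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            theta, logw = theta[idx], np.zeros(n_particles)
        # Rejuvenate: one Metropolis step targeting the current tempered
        # distribution prior(theta) * exp(-b * cost(theta) / temp)
        prop = theta + rng.normal(0.0, 0.3, size=n_particles)
        log_ratio = (-b * (cost(prop) - cost(theta)) / temp
                     - (prop ** 2 - theta ** 2) / (2 * prior_std ** 2))
        accept = np.log(rng.uniform(size=n_particles)) < log_ratio
        theta = np.where(accept, prop, theta)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Weighted posterior mean of the parameter under the final target
    return float(np.sum(w * theta))

estimate = tempered_smc()
```

At low temperature the final target concentrates near the cost minimizer, so the weighted mean lands close to 2.0. In the paper's setting the Metropolis move would be replaced by HMC driven by gradients obtained by differentiating through the rollout.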