arXiv submission date: 2026-05-03
📄 Abstract - DexSim2Real: Foundation Model-Guided Sim-to-Real Transfer for Generalizable Dexterous Manipulation

Sim-to-real transfer remains a critical bottleneck for deploying dexterous manipulation policies learned in simulation to real-world robots. Existing approaches rely on manually designed domain randomization or task-specific adaptation, limiting their generalizability across diverse manipulation scenarios. We present DexSim2Real, an integrated framework that leverages vision-language foundation models to bridge the sim-to-real gap for dexterous manipulation. Our system combines three components: (1) Foundation Model-Guided Domain Randomization (FM-DR), which uses a vision-language model as a visual realism critic to optimize simulation parameters via closed-loop CMA-ES, complementing text-based approaches like DrEureka with direct visual feedback; (2) a Tactile-Visual Cross-Attention Policy (TVCAP) that adapts cross-attention visuo-tactile fusion to zero-shot sim-to-real RL; and (3) a Progressive Skill Curriculum (PSC) that builds on LLM-based task decomposition with a difficulty scheduler tailored to contact-rich dexterous tasks. Extensive experiments on six challenging manipulation tasks with blinded evaluation demonstrate that DexSim2Real achieves a 78.2% average real-world success rate, outperforming DrEureka and DeXtreme while reducing the sim-to-real performance gap to only 8.3%.
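
The closed-loop optimization described for FM-DR can be illustrated with a minimal sketch. It is not the authors' implementation. It swaps in a simplified cross-entropy-style evolution strategy for CMA-ES (there is no full covariance adaptation) and replaces the vision-language realism critic with a hypothetical `realism_score` function that rewards closeness to a made-up target parameter vector. The function names, the three-element parameter vector, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def realism_score(params, target):
    """Hypothetical stand-in for a VLM realism critic (higher is better).

    A real critic would render the simulator with `params` and ask a
    vision-language model to rate the image; here we simply use the
    negative squared distance to an assumed target vector.
    """
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def optimize_sim_params(score_fn, mean, sigma=0.5, popsize=16,
                        elite=4, iters=80, min_std=0.05, seed=0):
    """Simplified evolution strategy (cross-entropy style, NOT full CMA-ES).

    Each iteration samples candidate simulation parameters, scores them
    with the critic, and refits the sampling mean and per-dimension
    standard deviation to the best-scoring candidates.
    """
    rng = random.Random(seed)
    mean = list(mean)
    std = [sigma] * len(mean)
    for _ in range(iters):
        # Sample a population around the current mean.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(popsize)
        ]
        # Closed loop: rank candidates by the critic's score.
        population.sort(key=score_fn, reverse=True)
        elites = population[:elite]
        # Refit the sampling distribution to the elites.
        columns = list(zip(*elites))
        mean = [sum(col) / elite for col in columns]
        std = [
            max(min_std, (sum((x - m) ** 2 for x in col) / elite) ** 0.5)
            for col, m in zip(columns, mean)
        ]
    return mean


if __name__ == "__main__":
    # Assumed "realistic" values for three unnamed simulation parameters.
    target = [0.8, 0.3, 1.2]
    best = optimize_sim_params(lambda p: realism_score(p, target),
                               mean=[0.0, 0.0, 0.0])
    print([round(b, 2) for b in best])
```

In a real pipeline the scoring call would be the expensive step, since each evaluation involves rendering and a model query, so population size and iteration count would likely be tuned around that cost.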

Top-level tags: robotics, machine learning
Detailed tags: sim-to-real, dexterous manipulation, vision-language model, domain randomization, reinforcement learning

DexSim2Real: Foundation Model-Guided Sim-to-Real Transfer for Generalizable Dexterous Manipulation


1️⃣ One-Sentence Summary

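This paper proposes DexSim2Real, an integrated framework that combines vision-language foundation models for automatically optimizing simulation parameters, a tactile-visual cross-attention policy, and a progressive skill curriculum. Together these significantly improve how well sim-to-real transfer generalizes for dexterous manipulation tasks, achieving a 78.2% average real-world success rate across six complex tasks.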

Source: arXiv: 2605.05241