arXiv submission date: 2026-04-09
📄 Abstract - LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

Human-like generalization in the open world remains a fundamental challenge for robotic manipulation. Existing learning-based methods, including reinforcement learning, imitation learning, and vision-language-action models (VLAs), often struggle with novel tasks and unseen environments. Another promising direction is to explore generalizable representations that capture fine-grained spatial and geometric relations for open-world manipulation. While large language models (LLMs) and vision-language models (VLMs) provide strong semantic reasoning over language or annotated 2D representations, their limited 3D awareness restricts their applicability to fine-grained manipulation. To address this, we propose LAMP, which lifts image editing into 3D priors, extracting inter-object 3D transformations as continuous, geometry-aware representations. Our key insight is that image editing inherently encodes rich 2D spatial cues, and lifting these implicit cues into 3D transformations provides fine-grained and accurate guidance for open-world manipulation. Extensive experiments demonstrate that LAMP delivers precise 3D transformations and achieves strong zero-shot generalization in open-world manipulation. Project page: this https URL.
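The abstract does not spell out how a 2D edit is lifted into a 3D transformation. Below is a minimal sketch of one plausible instantiation, assuming pixel correspondences between the source and edited images and per-pixel depth are already available from off-the-shelf models: back-project matched object points with the camera intrinsics, then solve for the rigid SE(3) transform in closed form via the Kabsch/Procrustes algorithm. The pipeline, function names, and camera parameters here are illustrative assumptions, not LAMP's actual implementation.

```python
# Hypothetical sketch: recover an inter-object rigid transform (R, t)
# implied by an image edit, given matched pixels and depths in the
# source and edited images. NOT the paper's method; an assumed pipeline.
import numpy as np

def backproject(pixels, depths, K):
    """Lift pixels (N, 2) with depths (N,) to 3D camera-frame points
    (N, 3) using pinhole intrinsics K (3x3)."""
    ones = np.ones((pixels.shape[0], 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T  # (N, 3)
    return rays * depths[:, None]

def estimate_rigid_transform(src_pts, dst_pts):
    """Least-squares rigid transform with dst ≈ R @ src + t, solved in
    closed form via SVD (Kabsch algorithm)."""
    src_c, dst_c = src_pts.mean(0), dst_pts.mean(0)
    H = (src_pts - src_c).T @ (dst_pts - dst_c)    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = np.array([[500.0, 0.0, 320.0],   # assumed intrinsics
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    # Synthetic object points ~1 m in front of the camera.
    obj = rng.uniform(-0.2, 0.2, size=(50, 3)) + np.array([0.0, 0.0, 1.0])
    # The rigid motion the "edit" implies: 30-degree yaw plus a small shift.
    th = np.deg2rad(30.0)
    R_gt = np.array([[np.cos(th), -np.sin(th), 0.0],
                     [np.sin(th),  np.cos(th), 0.0],
                     [0.0, 0.0, 1.0]])
    t_gt = np.array([0.10, -0.05, 0.02])
    moved = obj @ R_gt.T + t_gt

    def project(pts):
        """Stand-in for matched pixels + depths from the two images."""
        uvw = pts @ K.T
        return uvw[:, :2] / uvw[:, 2:3], pts[:, 2]

    (src_px, src_d), (dst_px, dst_d) = project(obj), project(moved)
    R, t = estimate_rigid_transform(
        backproject(src_px, src_d, K), backproject(dst_px, dst_d, K))
    print("rotation error:", np.linalg.norm(R - R_gt))   # ~0
    print("translation estimate:", t)                    # ~t_gt
```

In practice the correspondences and depths would be noisy, so a robust variant (e.g., RANSAC around the same closed-form solve) would replace the direct least-squares fit; the closed-form step shown here is the standard core of such pipelines.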

Top-level tags: robotics multi-modal computer vision
Detailed tags: 3d manipulation image editing priors zero-shot generalization spatial reasoning open-world robotics

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation


1️⃣ One-Sentence Summary

This paper proposes LAMP, a method that lifts the 2D spatial information implicit in image edits into fine-grained 3D geometric transformations, thereby providing strong and general guidance for zero-shot open-world robotic manipulation tasks.

Source: arXiv:2604.08475