NewtPhys: Do Foundation Models Understand Newtonian Physics?

📄 Abstract

Previous work has evaluated physics reasoning in foundation models using synthetic or semi-synthetic scenes and visual question-answering tasks. However, these benchmarks emphasize high-level events and lack the visual fidelity required to assess true low-level Newtonian understanding. We introduce NewtPhys, a 4D physically annotated dataset built from multiview images of real-world scenes with physics-grounded simulations. The dataset provides dense, fine-grained annotations across timesteps -- including 3D forces and amodal per-pixel quantities covering physics, tracking, semantics and geometry -- bridging the gap between simplistic synthetic setups and realistic visual complexity. Using NewtPhys, we systematically evaluate 56 vision-language models (VLMs), comprising 54 open-weight models and 2 closed-source frontier models, along with 10 vision foundation models (VFMs), and reveal limitations in their low-level physics reasoning. Beyond benchmarking, our dataset enables future research in physics-grounded vision and the development of next-generation physics-aware evaluations. Code and datasets are available at this https URL.


1️⃣ One-Sentence Summary

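The paper builds NewtPhys, a high-fidelity 4D physics dataset that provides fine-grained 3D force and per-pixel annotations derived from multiview images of real scenes and physics simulations. It systematically tests 56 vision-language models and 10 vision foundation models, and reveals that existing models fall significantly short in low-level Newtonian mechanics reasoning.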
