CauSight:学习超感知以实现视觉因果发现 / CauSight: Learning to Supersense for Visual Causal Discovery
1️⃣ 一句话总结
这篇论文提出了一个名为CauSight的新模型,它能够像人一样从图片中识别出事物之间的因果关系,而不仅仅是看到它们,并通过一个包含3.2万张带标注图片的新数据集和一套特殊的训练方法,在视觉因果发现任务上显著超越了GPT-4等现有模型。
Causal thinking enables humans to understand not just what is seen, but why it happens. To replicate this capability in modern AI systems, we introduce the task of visual causal discovery. It requires models to infer cause-and-effect relations among visual entities across diverse scenarios instead of merely perceiving their presence. To this end, we first construct the Visual Causal Graph dataset (VCG-32K), a large-scale collection of over 32,000 images annotated with entity-level causal graphs, and further develop CauSight, a novel vision-language model to perform visual causal discovery through causally aware reasoning. Our training recipe integrates three components: (1) training data curation from VCG-32K, (2) Tree-of-Causal-Thought (ToCT) for synthesizing reasoning trajectories, and (3) reinforcement learning with a designed causal reward to refine the reasoning policy. Experiments show that CauSight outperforms GPT-4.1 on visual causal discovery, achieving over a threefold performance boost (21% absolute gain). Our code, model, and dataset are fully open-sourced at project page: this https URL.
CauSight:学习超感知以实现视觉因果发现 / CauSight: Learning to Supersense for Visual Causal Discovery
这篇论文提出了一个名为CauSight的新模型,它能够像人一样从图片中识别出事物之间的因果关系,而不仅仅是看到它们,并通过一个包含3.2万张带标注图片的新数据集和一套特殊的训练方法,在视觉因果发现任务上显著超越了GPT-4等现有模型。