📄 Abstract - Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click

State-of-the-art Video Scene Graph Generation (VSGG) systems provide structured visual understanding but operate as closed, feed-forward pipelines with no ability to incorporate human guidance. In contrast, promptable segmentation models such as SAM2 enable precise user interaction but lack semantic or relational reasoning. We introduce Click2Graph, the first interactive framework for Panoptic Video Scene Graph Generation (PVSG) that unifies visual prompting with spatial, temporal, and semantic understanding. From a single user cue, such as a click or bounding box, Click2Graph segments and tracks the subject across time, autonomously discovers interacting objects, and predicts <subject, object, predicate> triplets to form a temporally consistent scene graph. Our framework introduces two key components: a Dynamic Interaction Discovery Module that generates subject-conditioned object prompts, and a Semantic Classification Head that performs joint entity and predicate reasoning. Experiments on the OpenPVSG benchmark demonstrate that Click2Graph establishes a strong foundation for user-guided PVSG, showing how human prompting can be combined with panoptic grounding and relational inference to enable controllable and interpretable video scene understanding.
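The abstract describes a three-stage pipeline: (1) segment and track the clicked subject, (2) discover interacting objects via the Dynamic Interaction Discovery Module, and (3) classify entities and predicates into per-frame triplets. The sketch below is purely illustrative — a toy control flow under assumed interfaces, with stubbed stand-ins for the learned stages; none of the function names or data shapes come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    object: str

# --- toy stand-ins for the three learned stages (illustrative only) ---

def segment_and_track(video, click_xy):
    # Stage 1: SAM2-style promptable segmentation + tracking.
    # Toy version: pretend the click always selects "person" in every frame.
    return [("person", frame_idx) for frame_idx in range(len(video))]

def discover_interactions(frame, subject):
    # Stage 2: Dynamic Interaction Discovery Module —
    # subject-conditioned object prompts. Toy: return objects in the frame.
    return frame["objects"]

def classify(subject, obj):
    # Stage 3: Semantic Classification Head — joint entity/predicate
    # reasoning. Toy: a fixed predicate lookup table.
    predicates = {("person", "bicycle"): "riding", ("person", "cup"): "holding"}
    return Triplet(subject, predicates.get((subject, obj), "near"), obj)

def click2graph(video, click_xy):
    """End-to-end sketch: one click -> tracked subject -> per-frame triplets."""
    graph = {}
    for subject, t in segment_and_track(video, click_xy):
        frame = video[t]
        graph[t] = [classify(subject, o) for o in discover_interactions(frame, subject)]
    return graph

# tiny two-frame "video"
video = [{"objects": ["bicycle"]}, {"objects": ["bicycle", "cup"]}]
graph = click2graph(video, click_xy=(120, 80))
```

The key structural point the sketch captures is that object discovery is conditioned on the tracked subject at each frame, so the resulting scene graph stays centered on the user's single prompt across time.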

Top-level tags: computer vision, multi-modal systems
Detailed tags: video scene graph generation, interactive segmentation, panoptic segmentation, video understanding, human-in-the-loop

Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click


1️⃣ One-sentence summary

This paper proposes an interactive framework called Click2Graph: from a single click or bounding box on a target in a video, the system automatically tracks it, discovers the other objects it interacts with, and infers the relationships between them, producing a structured, interpretable, and controllable scene graph of the video.

