📄 Abstract - Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click

State-of-the-art Video Scene Graph Generation (VSGG) systems provide structured visual understanding but operate as closed, feed-forward pipelines with no ability to incorporate human guidance. In contrast, promptable segmentation models such as SAM2 enable precise user interaction but lack semantic or relational reasoning. We introduce Click2Graph, the first interactive framework for Panoptic Video Scene Graph Generation (PVSG) that unifies visual prompting with spatial, temporal, and semantic understanding. From a single user cue, such as a click or bounding box, Click2Graph segments and tracks the subject across time, autonomously discovers interacting objects, and predicts <subject, object, predicate> triplets to form a temporally consistent scene graph. Our framework introduces two key components: a Dynamic Interaction Discovery Module that generates subject-conditioned object prompts, and a Semantic Classification Head that performs joint entity and predicate reasoning. Experiments on the OpenPVSG benchmark demonstrate that Click2Graph establishes a strong foundation for user-guided PVSG, showing how human prompting can be combined with panoptic grounding and relational inference to enable controllable and interpretable video scene understanding.
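The abstract describes a three-stage pipeline: (1) segment and track the clicked subject, (2) discover interacting objects via the Dynamic Interaction Discovery Module, and (3) classify entities and predicates into per-frame triplets. The sketch below is purely illustrative — a toy control flow under assumed interfaces, with stubbed stand-ins for the learned stages; none of the function names or data shapes come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    object: str

# --- toy stand-ins for the three learned stages (illustrative only) ---

def segment_and_track(video, click_xy):
    # Stage 1: SAM2-style promptable segmentation + tracking.
    # Toy version: pretend the click always selects "person" in every frame.
    return [("person", frame_idx) for frame_idx in range(len(video))]

def discover_interactions(frame, subject):
    # Stage 2: Dynamic Interaction Discovery Module —
    # subject-conditioned object prompts. Toy: return objects in the frame.
    return frame["objects"]

def classify(subject, obj):
    # Stage 3: Semantic Classification Head — joint entity/predicate
    # reasoning. Toy: a fixed predicate lookup table.
    predicates = {("person", "bicycle"): "riding", ("person", "cup"): "holding"}
    return Triplet(subject, predicates.get((subject, obj), "near"), obj)

def click2graph(video, click_xy):
    """End-to-end sketch: one click -> tracked subject -> per-frame triplets."""
    graph = {}
    for subject, t in segment_and_track(video, click_xy):
        frame = video[t]
        graph[t] = [classify(subject, o) for o in discover_interactions(frame, subject)]
    return graph

# tiny two-frame "video"
video = [{"objects": ["bicycle"]}, {"objects": ["bicycle", "cup"]}]
graph = click2graph(video, click_xy=(120, 80))
```

The key structural point the sketch captures is that object discovery is conditioned on the tracked subject at each frame, so the resulting scene graph stays centered on the user's single prompt across time.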

Top-level tags: computer vision, multi-modal systems
Detailed tags: video scene graph generation, interactive segmentation, panoptic segmentation, video understanding, human-in-the-loop

Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click


1️⃣ One-sentence summary

This paper proposes an interactive framework called Click2Graph: from a single click or bounding box on a target in a video, the system automatically tracks it, discovers the other objects it interacts with, and infers the relationships between them, producing a structured, interpretable, and controllable scene graph of the video.

