arXiv submission date: 2025-12-02
📄 Abstract - ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Reasoning-centric video object segmentation is an inherently complex task: the query often refers to dynamics, causality, and temporal interactions, rather than static appearances. Yet existing solutions generally collapse these factors into simplified reasoning with latent embeddings, rendering the reasoning chain opaque and essentially intractable. We therefore adopt an explicit decomposition perspective and introduce ReVSeg, which executes reasoning as sequential decisions in the native interface of pretrained vision language models (VLMs). Rather than folding all reasoning into a single-step prediction, ReVSeg executes three explicit operations -- semantics interpretation, temporal evidence selection, and spatial grounding -- aligning pretrained capabilities. We further employ reinforcement learning to optimize the multi-step reasoning chain, enabling the model to self-refine its decision quality from outcome-driven signals. Experimental results demonstrate that ReVSeg attains state-of-the-art performance on standard video object segmentation benchmarks and yields interpretable reasoning trajectories. Project page is available at this https URL.

Top-level tags: computer vision, multi-modal, reinforcement learning
Detailed tags: video object segmentation, reasoning chain, vision language models, reinforcement learning, interpretability

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning


1️⃣ One-sentence summary

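This paper proposes ReVSeg, a new method that decomposes the complex task of video object segmentation into three explicit steps (semantic understanding, temporal evidence selection, and spatial grounding) and uses reinforcement learning to optimize this multi-step reasoning chain, improving segmentation performance while making the model's reasoning process interpretable.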
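The three-step decomposition can be pictured as a small pipeline in which each stage's output is kept, so the whole chain can be inspected afterwards. The Python sketch below is illustrative only and is not the authors' implementation. Every name in it (`ReasoningTrace`, `run_reasoning_chain`, `interpret`, `select_frame`, `ground`, `box_iou`) is hypothetical. It assumes that spatial grounding returns a bounding box on one selected frame and omits any later mask-generation step. The IoU function is a stand-in for an outcome-driven signal, since this summary does not specify the paper's actual reward.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class ReasoningTrace:
    """Intermediate decisions, kept so the chain stays inspectable."""
    target_description: str  # output of semantic interpretation
    keyframe_index: int      # output of temporal evidence selection
    box: Box                 # output of spatial grounding


def run_reasoning_chain(
    query: str,
    frames: Sequence[object],
    interpret: Callable[[str], str],
    select_frame: Callable[[str, Sequence[object]], int],
    ground: Callable[[str, object], Box],
) -> ReasoningTrace:
    """Run the three steps in order; each callable stands in for a VLM call."""
    if not frames:
        raise ValueError("frames must not be empty")
    description = interpret(query)              # step 1: what is being asked for
    index = select_frame(description, frames)   # step 2: which frame holds the evidence
    if not 0 <= index < len(frames):
        raise IndexError("select_frame returned an out-of-range index")
    box = ground(description, frames[index])    # step 3: where the target is
    return ReasoningTrace(description, index, box)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; an assumed stand-in outcome signal."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Stub stages with fixed answers, just to show the data flow.
    trace = run_reasoning_chain(
        query="the cat that jumps first",
        frames=["frame0", "frame1", "frame2"],
        interpret=lambda q: "a cat in mid-jump",
        select_frame=lambda desc, fs: 1,
        ground=lambda desc, frame: (10.0, 10.0, 50.0, 50.0),
    )
    print(trace)
    print(box_iou(trace.box, (20.0, 20.0, 60.0, 60.0)))
```

Passing the stages in as callables keeps the sketch independent of any particular model API. In a reinforcement learning setup, a scalar such as the IoU above could score a whole sampled trace, but how ReVSeg actually defines and applies its reward is not described here.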


Source: arXiv: 2512.02835