Tokenizing Semantic Segmentation with RLE
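1️⃣ One-sentence summary
This paper proposes a novel, general approach that converts semantic segmentation masks in images and videos into language-like sequences of discrete tokens and predicts them with an autoregressive model, unifying image and video segmentation while also extending to panoptic segmentation.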
This paper presents a new unified approach to semantic segmentation in both images and videos that uses language modeling to output the masks as sequences of discrete tokens. We use run-length encoding (RLE) to discretize the segmentation masks and then train a modified version of Pix2Seq \cite{p2s} to output these RLE tokens autoregressively. We propose novel tokenization strategies that compress the token sequence enough to make it practical to extend this approach to videos. We also show how instance information can be incorporated into the tokenization process to perform panoptic segmentation. Evaluations on two datasets show that the proposed models are competitive with the state of the art despite being bottlenecked by our limited computational resources.
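To make the idea of discretizing a mask with RLE concrete, here is a minimal sketch of generic run-length encoding applied to a small class mask. It is not the paper's tokenization scheme: the `(start, length, class_id)` run layout, the row-major flattening, the background convention, and the helper names `rle_encode`, `rle_decode`, and `to_tokens` are all assumptions made for illustration.

```python
from itertools import groupby


def rle_encode(mask, background=0):
    """Return (start, length, class_id) runs for the non-background pixels.

    The 2D mask is flattened row-major, so a run may span a row boundary.
    This run layout is an illustrative assumption, not the paper's format.
    """
    flat = [class_id for row in mask for class_id in row]
    runs = []
    pos = 0
    for class_id, group in groupby(flat):
        length = sum(1 for _ in group)
        if class_id != background:
            runs.append((pos, length, class_id))
        pos += length
    return runs


def rle_decode(runs, height, width, background=0):
    """Rebuild the 2D mask from (start, length, class_id) runs."""
    flat = [background] * (height * width)
    for start, length, class_id in runs:
        flat[start:start + length] = [class_id] * length
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def to_tokens(runs):
    """Flatten the runs into one integer sequence, the kind of discrete
    sequence an autoregressive model could be trained to emit."""
    return [value for run in runs for value in run]


mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
]
runs = rle_encode(mask)
tokens = to_tokens(runs)
restored = rle_decode(runs, height=3, width=4)
```

In this toy layout every run costs three integers, so the sequence length grows with the number of runs rather than the number of pixels. That growth is why compressing the token sequence matters when moving from single images to video.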
Source: arXiv: 2602.21627