
arXiv submission date: 2026-02-25

📄 Abstract - Tokenizing Semantic Segmentation with RLE

This paper presents a new unified approach to semantic segmentation in both images and videos by using language modeling to output the masks as sequences of discrete tokens. We use run length encoding (RLE) to discretize the segmentation masks and then train a modified version of Pix2Seq \cite{p2s} to output these RLE tokens through autoregression. We propose novel tokenization strategies to compress the length of the token sequence to make it practicable to extend this approach to videos. We also show how instance information can be incorporated into the tokenization process to perform panoptic segmentation. We evaluate our proposed models on two datasets to show that they are competitive with the state of the art in spite of being bottlenecked by our limited computational resources.
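To illustrate what discretizing a mask with RLE can look like, here is a minimal sketch in Python. It is not the paper's tokenizer: the `(start, length, class_id)` run layout, the row-major flattening, the use of class 0 as background, and the function names `rle_encode`, `rle_decode` and `runs_to_tokens` are all assumptions made for this example.

```python
def rle_encode(mask):
    """Encode a 2D class mask (rows of ints, 0 = background) as
    (start, length, class_id) runs over the row-major flattened mask.
    The run layout is an assumption, not the paper's exact scheme."""
    flat = [v for row in mask for v in row]
    runs = []
    i, n = 0, len(flat)
    while i < n:
        cls = flat[i]
        if cls == 0:  # background pixels are skipped, not encoded
            i += 1
            continue
        j = i
        while j < n and flat[j] == cls:  # extend the run while the class matches
            j += 1
        runs.append((i, j - i, cls))
        i = j
    return runs


def rle_decode(runs, height, width):
    """Rebuild the 2D mask from (start, length, class_id) runs."""
    flat = [0] * (height * width)
    for start, length, cls in runs:
        for k in range(start, start + length):
            flat[k] = cls
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def runs_to_tokens(runs):
    """Flatten runs into one integer sequence (hypothetical token layout)."""
    return [t for run in runs for t in run]


mask = [
    [0, 1, 1, 0],
    [0, 0, 2, 2],
    [2, 0, 0, 0],
]
runs = rle_encode(mask)
tokens = runs_to_tokens(runs)
restored = rle_decode(runs, height=3, width=4)
```

In this sketch a run may cross a row boundary because the mask is flattened before encoding, and the token sequence grows with the number of runs rather than the number of pixels. An autoregressive model would be trained to emit such a sequence one token at a time.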

Top tags: computer vision, natural language processing, model training

Tokenizing Semantic Segmentation with RLE

1️⃣ One-sentence summary

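This paper proposes a novel, unified approach that converts semantic segmentation masks in images and videos into language-like sequences of discrete tokens and predicts them with an autoregressive model, handling image and video segmentation in a single framework while also extending to panoptic segmentation.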



Source: arXiv: 2602.21627
