Boundary-Centric Active Learning for Temporal Action Segmentation
1️⃣ One-Sentence Summary
This paper proposes an active learning method called B-ACT, which intelligently concentrates the annotation budget on the hard-to-determine, error-prone action boundary regions of videos, significantly improving temporal action segmentation performance when labeled data is limited.
Temporal action segmentation (TAS) demands dense temporal supervision, yet most of the annotation cost in untrimmed videos is spent identifying and refining action transitions, where segmentation errors concentrate and small temporal shifts disproportionately degrade segmental metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these high-leverage boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queries unlabeled videos using predictive uncertainty, and (ii) within each selected video, it detects candidate transitions from the current model predictions and selects the top-$K$ boundaries via a novel boundary score that fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Importantly, our annotation protocol requests labels for only the boundary frames while still training on boundary-centered clips to exploit temporal context through the model's receptive field. Extensive experiments on GTEA, 50Salads, and Breakfast demonstrate that boundary-centric supervision delivers strong label efficiency and consistently surpasses representative TAS active learning baselines and prior state of the art under sparse budgets, with the largest gains on datasets where boundary placement dominates edit and overlap-based F1 scores.
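The abstract's second stage (detect candidate transitions, then rank them by a boundary score fusing neighborhood uncertainty, class ambiguity, and temporal predictive dynamics) can be sketched roughly as below. This is a minimal illustration, not the paper's actual formulation: the fusion weights `w`, the window `radius`, and the concrete choices for each term (entropy, top-2 margin, L1 change of the predictive distribution) are assumptions.

```python
import numpy as np

def select_boundaries(probs, k=3, radius=8, w=(1.0, 1.0, 1.0)):
    """Hypothetical sketch of B-ACT-style boundary selection.

    probs: (T, C) per-frame class probabilities from the current model.
    Returns the indices of the top-k candidate boundary frames.
    """
    T, C = probs.shape
    preds = probs.argmax(axis=1)
    # Candidate transitions: frames where the predicted label changes.
    cands = np.flatnonzero(preds[1:] != preds[:-1]) + 1
    # Neighborhood uncertainty: per-frame predictive entropy.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Class ambiguity: small margin between the top-2 class probabilities.
    sorted_p = np.sort(probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]
    # Temporal predictive dynamics: L1 change of the distribution.
    dyn = np.concatenate([[0.0], np.abs(np.diff(probs, axis=0)).sum(axis=1)])
    scores = []
    for t in cands:
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        scores.append(w[0] * entropy[lo:hi].mean()
                      + w[1] * (1.0 - margin[lo:hi].mean())
                      + w[2] * dyn[lo:hi].mean())
    order = np.argsort(scores)[::-1][:k]
    return cands[order]
```

Per the annotation protocol described above, only the returned boundary frames would be sent to the annotator, while training would still use clips centered on them to exploit temporal context.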
Source: arXiv:2604.15173